Skip to content

Commit 9c731cb

Browse files
committed
TEP-0127: Larger Results via Sidecar Logs
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [4808]: tektoncd/pipeline#4808 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
1 parent 524ca13 commit 9c731cb

File tree

2 files changed

+293
-0
lines changed

2 files changed

+293
-0
lines changed
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,292 @@
1+
---
2+
status: implementable
3+
title: Larger Results via Sidecar Logs
4+
creation-date: '2022-11-30'
5+
last-updated: '2022-12-01'
6+
authors:
7+
- '@chitrangpatel'
8+
- '@jerop'
9+
see-also:
10+
- TEP-0086
11+
---
12+
13+
# TEP-0127: Larger Results via Sidecar Logs
14+
15+
<!-- toc -->
16+
- [Summary](#summary)
17+
- [Motivation](#motivation)
18+
- [Goals](#goals)
19+
- [Non-Goals](#non-goals)
20+
- [Use Cases](#use-cases)
21+
- [Proposal](#proposal)
22+
- [Feature Gate](#feature-gate)
23+
- [Logs Access](#logs-access)
24+
- [Size Limit](#size-limit)
25+
- [PipelineRun Status](#pipelinerun-status)
26+
- [Design Details](#design-details)
27+
- [Design Evaluation](#design-evaluation)
28+
- [Reusability](#reusability)
29+
- [Interoperability](#interoperability)
30+
- [Simplicity](#simplicity)
31+
- [Performance](#performance)
32+
- [Security](#security)
33+
- [Experiment Questions](#experiment-questions)
34+
- [References](#references)
35+
<!-- /toc -->
36+
37+
This TEP builds on the hard work of many people who have been tackling the problem over the past couple years,
38+
including but not limited to:
39+
- '@abayer'
40+
- '@afrittoli'
41+
- '@bobcatfish'
42+
- '@dibyom'
43+
- '@imjasonh'
44+
- '@pritidesai'
45+
- '@ScrapCodes'
46+
- '@skaegi'
47+
- '@tlawrie'
48+
- '@tomcli'
49+
- '@vdemeester'
50+
- '@wlynch'
51+
52+
## Summary
53+
54+
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012].
55+
56+
The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many
57+
alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs.
58+
This allows us to support larger `Results` which are stored within `TaskRun` CRDs.
59+
60+
## Motivation
61+
62+
`Results` are too small - see [issue][4012]. The current implementation of `Results` involves parsing from disk and
63+
storing as part of the `Termination Message` which has a limit of 4KB per `Container` and 12KB per `Pod`. As such,
64+
the size limit of `Results` is 12KB per `TaskRun` and 4KB per `Step` at best.
65+
66+
To make matters worse, the limit is divided equally among all `Containers` in a `Pod` - see [issue][4808]. Therefore,
67+
the more the `Steps` in a `Task` the less the size limit for `Results`. For example, if there are 12 `Steps` then each
68+
has 1KB in `Termination Message` storage to produce `Results`.
69+
70+
[TEP-0086][tep-0086] aims to support larger `Results`. [TEP-0086][tep-0086] has many [alternatives][tep-0086-alts] with
71+
no proposal because there's no obvious "best" solution that would meet all the requirements.
72+
73+
This TEP proposes experimenting with `Sidecar` logs to support larger `Results` that are stored within `TaskRun` CRDs.
74+
This allows us to provide an immediate solution to the urgent needs of users, while not blocking pursuit of the other
75+
alternatives.
76+
77+
In addition, the [documented guidance][docs] is that `Results` are used for outputs less than 1KB while `Workspaces`
78+
are used for larger data. Supporting larger `Results` up to the CRD limit allows users to reuse `Tasks` in more
79+
scenarios without having to change the specification to use `Workspaces` upon hitting the current low size limit of
80+
4KB per `TaskRun`.
81+
82+
> As a general rule-of-thumb, if a `Result` needs to be larger than a kilobyte, you should likely use a `Workspace`
83+
to store and pass it between `Tasks` within a `Pipeline`.
84+
85+
### Goals
86+
87+
The main goal is to support larger `Results` that are limited by the size of a `TaskRun` CRD (1.5MB) via `Sidecar` logs.
88+
89+
### Non-Goals
90+
91+
The following are out of scope for this TEP:
92+
93+
1. Solving use cases that requires really large `Results` beyond the size limit of a CRD (1.5MB).
94+
95+
2. Addressing other [alternatives][tep-0086-alts] for larger `Results` that are listed in [TEP-0086][tep-0086].
96+
97+
### Use Cases
98+
99+
1. Support signing `Results` using SPIRE for non-falsifiable provenance that's required for [SLSA L3][L3]. As described
100+
in [TEP-0089][tep-0089], the signatures and certificates used to verify `Results` are stored alongside the `Results`
101+
themselves in the `Termination Message` which further exacerbates the size limit issues. The size of the certificate
102+
is 800 bytes and the size of signatures is approximately 100 bytes per `Result`.
103+
104+
> For now, signatures of the `Results` will be contained within the `Termination Message` of the `Pod`, alongside
105+
any additional material required to perform verification. One consideration of this is the size of the additional
106+
fields required. The size of the certificate needed for verification is about 800 bytes, and the size of the
107+
signatures is about 100 bytes * (number of `Results` + 1). The current `Termination Message` size is 4K, but there
108+
is TEP-0086 looking at supporting larger results.
109+
110+
2. Support emitting structured `Results`. For example, store the images produced by `ko` and all their copies to the
111+
various regional registries in a `TaskRun`. The release `Pipeline` has this use case, and it ran into the current
112+
size limit of 4096 bytes. As described in the [issue][4282], the images produced were 4572 bytes which didn't fit
113+
into `Results`. For further details about structured `Results`, see [TEP-0075][tep-0075] and [TEP-0076][tep-0076].
114+
115+
## Proposal
116+
117+
We propose using stdout logs from a dedicated `Sidecar` to return a json `Result` object to support larger `Results`.
118+
The `Pipeline` controller would wait for the `Sidecar` to exit and then read the logs based on a particular query and
119+
append `Results` to the `TaskRun`.
120+
121+
We propose injecting the dedicated `Sidecar` alongside other `Steps`. The `Sidecar` will watch the `Results` paths of
122+
the `Steps`. This `Sidecar` will output the name of the `Result` and its contents to stdout in a parsable pattern. The
123+
`TaskRun` controller will access the stdout logs of the `Sidecar` then extract the `Results` and its contents.
124+
125+
After the `Steps` have terminated, the `TaskRun` controller will write the `Results` from the logs of the `Sidecar`
126+
instead of using the `Termination Message`, hence bypassing the 4KB limit. This method keeps the rest of the current
127+
functionality identical and does not require any external storage mechanism.
128+
129+
For further details, see the [demonstration][demo] of the [implementation][poc].
130+
131+
This provides an opportunity to experiment with this solution to provide `Results` within the CRDs as we continue to
132+
explore other alternatives, including those that involve external storage.
133+
134+
### Feature Gate
135+
136+
This feature will be gated using a `result-extraction-method` feature flag.
137+
138+
Users have to set `result-extraction-method` to `"sidecar-logs"` to enable the feature:
139+
```shell
140+
kubectl patch cm feature-flags -n tekton-pipelines -p '{"data":{"result-extraction-method":"sidecar-logs"}}'
141+
```
142+
143+
This feature is disabled by default. When disabled, `Results` continue to pass through `Termination Message`.
144+
145+
### Logs Access
146+
147+
This feature requires that the `Pipeline` controller has access to `Pod` logs.
148+
149+
Users have to grant `get` access to all `pods/log` to the `Pipeline` controller:
150+
```shell
151+
kubectl apply -f config/enable-log-access-to-controller/
152+
```
153+
154+
### Size Limit
155+
156+
The size limit per `Result` can be configured using the `max-result-size` feature flag, which takes the integer value
157+
of the bytes.
158+
159+
The `max-result-size` feature flag defaults to 4096 bytes. This ensures that we support existing `Tasks` with only one
160+
`Result` that uses up the whole size limit of the `Termination Message`.
161+
162+
If users want to set the size limit per `Result` to be something other than 4096 bytes, they can set `max-result-size`
163+
by setting `max-result-size: <VALUE-IN-BYTES>`. The value set here cannot exceed the CRD size limit of 1.5MB.
164+
165+
```shell
166+
kubectl patch cm feature-flags -n tekton-pipelines -p '{"data":{"max-result-size":"<VALUE-IN-BYTES>"}}'
167+
```
168+
169+
Even though the size limit per `Result` is configurable, the size of `Results` are limited by CRD size limit of 1.5MB.
170+
If the size of `Results` exceeds this limit, then the `TaskRun` will fail with a message indicating the size limit has
171+
been exceeded.
172+
173+
### PipelineRun Status
174+
175+
In [TEP-0100][tep-0100], we proposed changes to `PipelineRun` status to reduce the amount of information stored about
176+
the status of `TaskRuns` and `Runs`. As a result, the `PipelineRun` status is set up to handle larger `Results` without
177+
exacerbating the storage issues that were there before.
178+
179+
For `ChildReferences` to be populated, the `embedded-status` must be set to `"minimal"`. We recommend that the minimal
180+
embedded status - `ChildReferences` - is enabled while migration is ongoing until it becomes the only supported status.
181+
This will ensure that larger `Results` from all its `TaskRuns` will not bloat the `PipelineRun` CRD.
182+
183+
## Design Details
184+
185+
The `Sidecar` will run a binary that:
186+
- receives argument for `Results`' paths and names which are identified from `taskSpec.results` field. This allows the
187+
`Sidecar` to know the `Results` it needs to read.
188+
- has `/tekton/run` volume mounted as [read-only][4227] where status of each `Step` is written.
189+
- periodically checks for `Step` status in the path `/tekton/run`.
190+
- when all `Steps` have completed, it immediately parses all the `Results` in paths and prints them to stdout in a
191+
parsable pattern.
192+
193+
For further details, see the [demonstration][demo] of the [implementation][poc].
194+
195+
## Design Evaluation
196+
197+
### Reusability
198+
199+
This proposal does not introduce any API changes to specification `Results`. The changes are in implementation details
200+
of `Results`. The existing `Tasks` will continue to function as they are, only that they can support larger `Results`.
201+
202+
Even more, supporting larger `Results` upto the CRD limit allows users to reuse `Tasks` in more scenarios without
203+
having to change the specification to use `Workspaces` upon hitting the current low size limit of 4KB per `TaskRun`.
204+
This allows users to control execution as needed by their context without having to modify `Tasks` and `Pipelines`.
205+
206+
### Interoperability
207+
208+
Users may write `Tasks` that assume larger `Results` support. These `Tasks` would only work on *Tekton Pipelines*
209+
installations that are configured to support it. This is a risk to `Task` interoperability which is mitigated by:
210+
1. We still have a hard limit on the size of `Results` which is the CRD size limit.
211+
2. We still plan to support larger `Results` in the long run regardless of the implementation details.
212+
213+
### Simplicity
214+
215+
This proposal provides a simple solution that solves most use cases:
216+
- Users don't need additional infrastructure, such as server or object storage, to support larger `Results`.
217+
- Existing `Tasks` will continue to function as they do now, while supporting larger `Results`, without any API changes.
218+
219+
### Performance
220+
221+
Performance benchmarking with 20-30 `PipelineRuns`, each with 3 `TaskRuns` each with two `Steps`:
222+
- Average `Pipeline` controller's CPU difference during `PipelineRun` execution: 1%
223+
- Average `Pipeline` controller's Memory usage difference during `PipelineRun` execution: 0.2%
224+
- Average `Pod` startup time (time to get to running state) difference: 3s per `TaskRun`
225+
226+
In the experiment, we will continue to measure the startup overhead and explore ways to improve it.
227+
228+
For further details, see the [performance metrics][performance].
229+
230+
### Security
231+
232+
This approach requires that the `Pipeline` controller has access to `Pod` logs. The `Pipeline` controller already has
233+
extensive permissions in the cluster, such as read access to `Secrets`. Expanding the access even further is a concern
234+
for some users, but is also acceptable for some users given the advantages. We will document the extended permissions
235+
so that users can make the right choice for their own use cases and requirements.
236+
237+
When users have enabled larger `Results` via `Sidecar` logs, there's a validation check to ensure that users cannot
238+
inject their own `Sidecar` overtop of the one specified by Tekton.
239+
240+
## Experiment Questions
241+
242+
These are some questions we plan to answer in the experiment:
243+
244+
- What impact does this change have on the startup and execution time of `TaskRuns` and `PipelineRuns`? Can we
245+
improve the performance impact?
246+
247+
- How reliable is using `Sidecar` logs to process `Results`?
248+
249+
- How many users adopt this solution? How many are satisfied with it given the advantages and disadvantages?
250+
We will conduct a user survey soon after the feature has been released.
251+
252+
## References
253+
254+
- Implementation:
255+
- [Implementation Pull Request][poc]
256+
- [Demonstration by Chitrang][demo]
257+
- Tekton Enhancement Proposals:
258+
- [TEP-0075: Object Parameters and Results][tep-0075]
259+
- [TEP-0076: Array Results][tep-0076]
260+
- [TEP-0086: Larger Results][tep-0086]
261+
- [TEP-0089: Non-Falsifiable Provenance][tep-0089]
262+
- [TEP-0100: PipelineRun Status][tep-0100]
263+
- Issues:
264+
- [Changing the way Results are stored][4012]
265+
- [Results, TerminationMessage and Containers][4808]
266+
- [Publish task fails, IMAGES result too large][4282]
267+
- Prior Work:
268+
- [TEP-0086: Using logs emitted by the Task][tep-0086-logs]
269+
- [TEP-0086: Larger Results via Sidecar Logs][745]
270+
- [TEP-0086 Working Group Meeting Notes][notes]
271+
- [Tekton Data Interface - Problem Space][data-interface]
272+
273+
[docs]: https://tekton.dev/docs/pipelines/tasks/#emitting-results
274+
[4012]: https://github.com/tektoncd/pipeline/issues/4012
275+
[4808]: https://github.com/tektoncd/pipeline/issues/4808
276+
[4282]: https://github.com/tektoncd/pipeline/issues/4282
277+
[4227]: https://github.com/tektoncd/pipeline/issues/4227
278+
[745]: https://github.com/tektoncd/community/pull/745
279+
[L3]: https://slsa.dev/spec/v0.1/levels
280+
[crd-size]: https://github.com/kubernetes/kubernetes/issues/82292
281+
[demo]: https://drive.google.com/file/d/1NrWudE_XBqweomiY24DP2Txnl1yN0yD9/view
282+
[poc]: https://github.com/tektoncd/pipeline/pull/5695
283+
[performance]: https://github.com/tektoncd/community/pull/745#issuecomment-1206668381
284+
[tep-0075]: ./0075-object-param-and-result-types.md
285+
[tep-0076]: ./0076-array-result-types.md
286+
[tep-0086]: ./0086-changing-the-way-result-parameters-are-stored.md
287+
[tep-0086-alts]: ./0086-changing-the-way-result-parameters-are-stored.md#alternatives
288+
[tep-0086-logs]: ./0086-changing-the-way-result-parameters-are-stored.md#using-logs-emitted-by-the-task
289+
[tep-0089]: ./0089-nonfalsifiable-provenance-support.md
290+
[tep-0100]: ./0100-embedded-taskruns-and-runs-status-in-pipelineruns.md
291+
[notes]: https://docs.google.com/document/d/1z2ME1o_XHvqv6cVEeElljvqVqHV8-XwtXdTkNklFU_8/edit?usp=sharing
292+
[data-interface]: https://docs.google.com/document/d/1XbeI-_4bFaBize3adQmSL6Czo8MXVvoYn6dJgFZJ_So/edit?usp=sharing

teps/README.md

+1
Original file line numberDiff line numberDiff line change
@@ -292,3 +292,4 @@ This is the complete list of Tekton teps:
292292
|[TEP-0123](0123-specifying-on-demand-retry-in-pipelinetask.md) | Specifying on-demand-retry in a PipelineTask | proposed | 2022-10-11 |
293293
|[TEP-0124](0124-distributed-tracing-for-tasks-and-pipelines.md) | Distributed tracing for Tasks and Pipelines | implementable | 2022-10-16 |
294294
|[TEP-0125](0125-add-credential-filter-to-entrypoint-logger.md) | Add credential filter to entrypoint logger | proposed | 2022-10-27 |
295+
|[TEP-0127](0127-larger-results-via-sidecar-logs.md) | Larger Results via Sidecar Logs | implementable | 2022-12-01 |

0 commit comments

Comments
 (0)