Skip to content

Commit 11052f3

Browse files
committed
TEP-0127: Larger Results via Sidecar Logs
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. This allows us to support larger `Results` that are stored within `TaskRun` CRDs. [4012]: tektoncd/pipeline#4012 [4808]: tektoncd/pipeline#4808 [tep-0086]: https://github.com/tektoncd/community/blob/main/teps/0086-changing-the-way-result-parameters-are-stored.md
1 parent 524ca13 commit 11052f3

File tree

2 files changed

+221
-0
lines changed

2 files changed

+221
-0
lines changed
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,220 @@
1+
---
2+
status: implementable
3+
title: Larger Results via Sidecar Logs
4+
creation-date: '2022-11-30'
5+
last-updated: '2022-11-30'
6+
authors:
7+
- '@chitrangpatel'
8+
- '@jerop'
9+
---
10+
11+
# TEP-0127: Larger Results via Sidecar Logs
12+
13+
<!-- toc -->
14+
- [Summary](#summary)
15+
- [Motivation](#motivation)
16+
- [Goals](#goals)
17+
- [Non-Goals](#non-goals)
18+
- [Use Cases](#use-cases)
19+
- [Proposal](#proposal)
20+
- [Feature Gate](#feature-gate)
21+
- [Logs Access](#logs-access)
22+
- [Size Limit](#size-limit)
23+
- [Script in `Sidecar`](#script-in-sidecar)
24+
- [`PipelineRun` Status](#pipelinerun-status)
25+
- [Design Evaluation](#design-evaluation)
26+
- [Reusability](#reusability)
27+
- [Simplicity](#simplicity)
28+
- [Performance](#performance)
29+
- [Security](#security)
30+
- [References](#references)
31+
<!-- /toc -->
32+
33+
This TEP builds on the hard work of many people who have been tackling the problem over the past couple years,
34+
including but not limited to:
35+
- '@tlawrie'
36+
- '@imjasonh'
37+
- '@bobcatfish'
38+
- '@pritidesai'
39+
- '@tomcli'
40+
- '@ScrapCodes'
41+
- '@wlynch'
42+
43+
## Summary
44+
45+
Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012].
46+
47+
The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many
48+
alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs.
49+
This allows us to support larger `Results` that are stored within `TaskRun` CRDs.
50+
51+
## Motivation
52+
53+
`Results` are too small - see [issue][4012]. The current implementation of `Results` involves parsing from disk and
54+
storing as part of the `Termination Message` which has a limit of 4KB per `Container` and 12KB per `Pod`. As such,
55+
the size limit of `Results` is 12KB per `TaskRun` and 4KB per `Step` at best.
56+
57+
To make matters worse, the limit is divided equally among all `Containers` in a `Pod` - see [issue][4808]. Therefore,
58+
the more the `Steps` in a `Task` the less the size limit for `Results`. For example, if there are 12 `Steps` then each
59+
has 1KB in `Termination Message` storage to produce `Results`.
60+
61+
[TEP-0086][tep-0086] aims to support larger `Results`. [TEP-0086][tep-0086] has many [alternatives][tep-0086-alts] with
62+
no proposal because there's no obvious "best" solution that would meet all the requirements.
63+
64+
This TEP proposes experimenting with `Sidecar` logs to support larger `Results` that are stored with `TaskRun` CRDs.
65+
This allows us to provide an immediate solution to the urgent needs of users, while not blocking pursuit of the other
66+
alternatives.
67+
68+
### Goals
69+
70+
The main goal is to support larger `Results` that are limited by the size of a `TaskRun` CRD (1.5MB) via `Sidecar` logs.
71+
72+
### Non-Goals
73+
74+
The following are out of scope for this TEP:
75+
76+
1. Solving use cases that requires really large `Results` beyond the size limit of a CRD (1.5MB).
77+
78+
2. Addressing other [alternatives][tep-0086-alts] for larger `Results` that are listed in [TEP-0086][tep-0086].
79+
80+
### Use Cases
81+
82+
1. Store larger `Results`, such as JSON payloads from an HTTP call or a Cloud Event, that can be inspected later or
83+
passed to other `Tasks` without the need to understand storage mechanisms.
84+
85+
3. The [documented guidance][docs] is that `Results` are used for outputs less than 1KB while `Workspaces` are used
86+
for larger data.
87+
88+
> As a general rule-of-thumb, if a `Result` needs to be larger than a kilobyte, you should likely use a `Workspace`
89+
to store and pass it between `Tasks` within a `Pipeline`.
90+
91+
Supporting larger `Results` upto the CRD limit allows users to reuse `Tasks` in more scenarios without having to
92+
change the specification to use `Workspaces` upon hitting the current low size limit of 4KB per `TaskRun`.
93+
94+
## Proposal
95+
96+
We propose using stdout logs from a dedicated `Sidecar` to return a json `Result` object to support larger `Results`.
97+
The `Pipeline` controller would wait for the `Sidecar` to exit and then read the logs based on a particular query and
98+
append `Results` to the `TaskRun`.
99+
100+
We propose injecting the dedicated `Sidecar` alongside other `Steps`. The `Sidecar` will watch the `Results` paths of
101+
the `Steps`. This `Sidecar` will output the name of the `Result` and its contents to stdout in a parsable pattern. The
102+
`TaskRun` controller will access the stdout logs of the `Sidecar` then extract the `Results` and its contents.
103+
104+
After the `Steps` have terminated, the `TaskRun` controller will write the `Results` from the logs of the `Sidecar`
105+
instead of using the `Termination Message`, hence bypassing the 4KB limit. This method keeps the rest of the current
106+
functionality identical and does not require any external storage mechanism.
107+
108+
For further details, see the [demonstration][demo] and [proof of concept][poc].
109+
110+
This provides an opportunity to experiment with this solution to provide `Results` within the CRDs as we continue to
111+
explore other alternatives, including those that involve external storage.
112+
113+
### Feature Gate
114+
115+
This feature will be gated using a `result-extraction-method` feature flag.
116+
117+
Users have to set `result-extraction-method` to `"sidecar-logs"` to enable the feature:
118+
```shell
119+
kubectl patch cm feature-flags -n tekton-pipelines -p '{"data":{"result-extraction-method":"sidecar-logs"}}'
120+
```
121+
122+
This feature is disabled by default. When disabled, `Results` continue to pass through `Termination Message`.
123+
124+
### Logs Access
125+
126+
This feature requires that the `Pipeline` controller has access to `Pod` logs.
127+
128+
Users have to grant `get` access to all `pods/log` to the `Pipeline` controller:
129+
```shell
130+
kubectl apply -f config/enable-log-access-to-controller/
131+
```
132+
133+
### Size Limit
134+
135+
The size limit per `Result` can be `max-result-size` feature flag, which takes the integer value of the bytes.
136+
137+
The `max-result-size` feature flag defaults to 4096 bytes. This ensures that we support existing `Tasks` with only one
138+
`Result` that uses up the whole size limit of the `Termination Message`.
139+
140+
If users want to set the size limit per `Result` to be something other than 4096 bytes, they can set `max-result-size`
141+
by setting `max-result-size: <VALUE-IN-BYTES>`. The value set here cannot exceed the CRD size limit of 1.5MB.
142+
143+
```shell
144+
kubectl patch cm feature-flags -n tekton-pipelines -p '{"data":{"max-result-size":"<VALUE-IN-BYTES>"}}'
145+
```
146+
147+
Even though the size limit per `Result` is configurable, the size of `Results` are limited by CRD size limit of 1.5MB.
148+
If the size of `Results` exceeds this limit, then the `TaskRun` will fail with a message indicating the size limit has
149+
been exceeded.
150+
151+
### Script in `Sidecar`
152+
153+
The `Sidecar` will run a script that:
154+
- is auto-generated based on the required `Results` - identified from `taskSpec.results` field.
155+
- periodically checks for files in the directory `/tekton/results`.
156+
- when a `Result` is found, it prints it to stdout in a parsable pattern.
157+
- when all the expected `Results` are found, it breaks out of the periodic loop and exits.
158+
159+
For further details, see the [demonstration][demo] of the proof of concept.
160+
161+
### `PipelineRun` Status
162+
163+
In [TEP-0100][tep-0100], we proposed changes to `PipelineRun` status to reduce the amount of information stored about
164+
the status of `TaskRuns` and `Runs` to improve performance and reduce memory bloat. The `PipelineRun` status is set up
165+
to handle larger `Results` without exacerbating the performance and storage issues that were there before.
166+
167+
For `ChildReferences` to be populated, the `embedded-status` must be set to `"minimal"`. We recommend that the minimal
168+
embedded status - `ChildReferences` - is enabled while migration is ongoing until it becomes the only supported status.
169+
170+
## Design Evaluation
171+
172+
### Reusability
173+
174+
This proposal does not introduce any API changes to specification `Results`, the changes are in implementation details
175+
of `Results`. The existing `Tasks` will continue to function as they are, only that they can support larger `Results`.
176+
177+
Even more, this supporting larger `Results` upto the CRD limit allows users to reuse `Tasks` in more scenarios without
178+
having to change the specification to use `Workspaces` upon hitting the current low size limit of 4KB per `TaskRun`.
179+
This allows users to control execution as needed by their context without having to modify `Tasks` and `Pipelines`.
180+
181+
### Simplicity
182+
183+
This proposal provides a simple solution that solves most use cases:
184+
- Users don't need additional infrastructure, such as server or object storage, to support larger `Results`.
185+
- Existing `Tasks` will continue to function as they are, while supporting larger `Results`, without any API changes.
186+
187+
### Performance
188+
189+
Performance benchmarking with 20-30 `PipelineRuns`, each with 3 `TaskRuns` each with two `Steps`:
190+
- Average `Pipeline` controller's CPU difference during `PipelineRun` execution: 1%
191+
- Average `Pipeline` controller's Memory usage difference during `PipelineRun` execution: 0.2%
192+
- Average `Pod` startup time (time to get to running state) difference: 3s per `TaskRun`
193+
194+
In the experiment, we will continue to measure the startup overhead and explore ways to improve it.
195+
196+
For further details, see the [performance metrics][performance].
197+
198+
### Security
199+
200+
This approach requires that the `Pipeline` controller has access to `Pod` logs. The `Pipeline` controller already has
201+
extensive permissions in the cluster, such as read access to `Secrets`. Expanding the access even further is a concern
202+
for some users, but is also acceptable for some users given the advantages. We will document the extended permissions
203+
so that users can make the right choice for their own use cases and requirements.
204+
205+
## References
206+
207+
- [TEP-0086: Larger Results][tep-0086]
208+
- [TEP-0086: Larger Results via Sidecar Logs][745]
209+
210+
[docs]: https://tekton.dev/docs/pipelines/tasks/#emitting-results
211+
[4012]: https://github.com/tektoncd/pipeline/issues/4012
212+
[4808]: https://github.com/tektoncd/pipeline/issues/4808
213+
[tep-0086]: ./0086-changing-the-way-result-parameters-are-stored.md
214+
[tep-0086-alts]: ./0086-changing-the-way-result-parameters-are-stored.md#alternatives
215+
[crd-size]: https://github.com/kubernetes/kubernetes/issues/82292
216+
[tep-0100]: ./0100-embedded-taskruns-and-runs-status-in-pipelineruns.md
217+
[demo]: https://drive.google.com/file/d/1NrWudE_XBqweomiY24DP2Txnl1yN0yD9/view
218+
[poc]: https://github.com/tektoncd/pipeline/pull/5695
219+
[performance]: https://github.com/tektoncd/community/pull/745#issuecomment-1206668381
220+
[745]: https://github.com/tektoncd/community/pull/745

teps/README.md

+1
Original file line numberDiff line numberDiff line change
@@ -292,3 +292,4 @@ This is the complete list of Tekton teps:
292292
|[TEP-0123](0123-specifying-on-demand-retry-in-pipelinetask.md) | Specifying on-demand-retry in a PipelineTask | proposed | 2022-10-11 |
293293
|[TEP-0124](0124-distributed-tracing-for-tasks-and-pipelines.md) | Distributed tracing for Tasks and Pipelines | implementable | 2022-10-16 |
294294
|[TEP-0125](0125-add-credential-filter-to-entrypoint-logger.md) | Add credential filter to entrypoint logger | proposed | 2022-10-27 |
295+
|[TEP-0127](0127-larger-results-via-sidecar-logs.md) | Larger Results via Sidecar Logs | implementable | 2022-11-30 |

0 commit comments

Comments
 (0)