|
| 1 | +--- |
| 2 | +status: implementable |
| 3 | +title: Larger Results via Sidecar Logs |
| 4 | +creation-date: '2022-11-30' |
| 5 | +last-updated: '2022-12-01' |
| 6 | +authors: |
| 7 | +- '@chitrangpatel' |
| 8 | +- '@jerop' |
| 9 | +see-also: |
| 10 | +- TEP-0086 |
| 11 | +--- |
| 12 | + |
| 13 | +# TEP-0127: Larger Results via Sidecar Logs |
| 14 | + |
| 15 | +<!-- toc --> |
| 16 | +- [Summary](#summary) |
| 17 | +- [Motivation](#motivation) |
| 18 | + - [Goals](#goals) |
| 19 | + - [Non-Goals](#non-goals) |
| 20 | + - [Use Cases](#use-cases) |
| 21 | +- [Proposal](#proposal) |
| 22 | + - [Feature Gate](#feature-gate) |
| 23 | + - [Logs Access](#logs-access) |
| 24 | + - [Size Limit](#size-limit) |
| 25 | + - [PipelineRun Status](#pipelinerun-status) |
| 26 | +- [Design Details](#design-details) |
| 27 | +- [Design Evaluation](#design-evaluation) |
| 28 | + - [Reusability](#reusability) |
| 29 | + - [Interoperability](#interoperability) |
| 30 | + - [Simplicity](#simplicity) |
| 31 | + - [Performance](#performance) |
| 32 | + - [Security](#security) |
| 33 | +- [Experiment Questions](#experiment-questions) |
| 34 | +- [References](#references) |
| 35 | +<!-- /toc --> |
| 36 | + |
| 37 | +This TEP builds on the hard work of many people who have been tackling the problem over the past couple years, |
| 38 | +including but not limited to: |
| 39 | +- '@abayer' |
| 40 | +- '@afrittoli' |
| 41 | +- '@bobcatfish' |
| 42 | +- '@dibyom' |
| 43 | +- '@imjasonh' |
| 44 | +- '@pritidesai' |
| 45 | +- '@ScrapCodes' |
| 46 | +- '@skaegi' |
| 47 | +- '@tlawrie' |
| 48 | +- '@tomcli' |
| 49 | +- '@vdemeester' |
| 50 | +- '@wlynch' |
| 51 | + |
| 52 | +## Summary |
| 53 | + |
| 54 | +Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. |
| 55 | + |
| 56 | +The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many |
| 57 | +alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. |
| 58 | +This allows us to support larger `Results` which are stored within `TaskRun` CRDs. |
| 59 | + |
| 60 | +## Motivation |
| 61 | + |
| 62 | +`Results` are too small - see [issue][4012]. The current implementation of `Results` involves parsing from disk and |
| 63 | +storing as part of the `Termination Message` which has a limit of 4KB per `Container` and 12KB per `Pod`. As such, |
| 64 | +the size limit of `Results` is 12KB per `TaskRun` and 4KB per `Step` at best. |
| 65 | + |
| 66 | +To make matters worse, the limit is divided equally among all `Containers` in a `Pod` - see [issue][4808]. Therefore, |
| 67 | +the more the `Steps` in a `Task` the less the size limit for `Results`. For example, if there are 12 `Steps` then each |
| 68 | +has 1KB in `Termination Message` storage to produce `Results`. |
| 69 | + |
| 70 | +[TEP-0086][tep-0086] aims to support larger `Results`. [TEP-0086][tep-0086] has many [alternatives][tep-0086-alts] with |
| 71 | +no proposal because there's no obvious "best" solution that would meet all the requirements. |
| 72 | + |
| 73 | +This TEP proposes experimenting with `Sidecar` logs to support larger `Results` that are stored within `TaskRun` CRDs. |
| 74 | +This allows us to provide an immediate solution to the urgent needs of users, while not blocking pursuit of the other |
| 75 | +alternatives. |
| 76 | + |
| 77 | +In addition, the [documented guidance][docs] is that `Results` are used for outputs less than 1KB while `Workspaces` |
| 78 | +are used for larger data. Supporting larger `Results` up to the CRD limit allows users to reuse `Tasks` in more |
| 79 | +scenarios without having to change the specification to use `Workspaces` upon hitting the current low size limit of |
| 80 | +4KB per `TaskRun`. |
| 81 | + |
| 82 | +> As a general rule-of-thumb, if a `Result` needs to be larger than a kilobyte, you should likely use a `Workspace` |
| 83 | +to store and pass it between `Tasks` within a `Pipeline`. |
| 84 | + |
| 85 | +### Goals |
| 86 | + |
| 87 | +The main goal is to support larger `Results` that are limited by the size of a `TaskRun` CRD (1.5MB) via `Sidecar` logs. |
| 88 | + |
| 89 | +### Non-Goals |
| 90 | + |
| 91 | +The following are out of scope for this TEP: |
| 92 | + |
| 93 | +1. Solving use cases that requires really large `Results` beyond the size limit of a CRD (1.5MB). |
| 94 | + |
| 95 | +2. Addressing other [alternatives][tep-0086-alts] for larger `Results` that are listed in [TEP-0086][tep-0086]. |
| 96 | + |
| 97 | +### Use Cases |
| 98 | + |
| 99 | +1. Support signing `Results` using SPIRE for non-falsifiable provenance that's required for [SLSA L3][L3]. As described |
| 100 | + in [TEP-0089][tep-0089], the signatures and certificates used to verify `Results` are stored alongside the `Results` |
| 101 | + themselves in the `Termination Message` which further exacerbates the size limit issues. The size of the certificate |
| 102 | + is 800 bytes and the size of signatures is approximately 100 bytes per `Result`. |
| 103 | + |
| 104 | + > For now, signatures of the `Results` will be contained within the `Termination Message` of the `Pod`, alongside |
| 105 | + any additional material required to perform verification. One consideration of this is the size of the additional |
| 106 | + fields required. The size of the certificate needed for verification is about 800 bytes, and the size of the |
| 107 | + signatures is about 100 bytes * (number of `Results` + 1). The current `Termination Message` size is 4K, but there |
| 108 | + is TEP-0086 looking at supporting larger results. |
| 109 | + |
| 110 | +2. Support emitting structured `Results`. For example, store the images produced by `ko` and all their copies to the |
| 111 | + various regional registries in a `TaskRun`. The release `Pipeline` has this use case, and it ran into the current |
| 112 | + size limit of 4096 bytes. As described in the [issue][4282], the images produced were 4572 bytes which didn't fit |
| 113 | + into `Results`. For further details about structured `Results`, see [TEP-0075][tep-0075] and [TEP-0076][tep-0076]. |
| 114 | + |
| 115 | +## Proposal |
| 116 | + |
| 117 | +We propose using stdout logs from a dedicated `Sidecar` to return a json `Result` object to support larger `Results`. |
| 118 | +The `Pipeline` controller would wait for the `Sidecar` to exit and then read the logs based on a particular query and |
| 119 | +append `Results` to the `TaskRun`. |
| 120 | + |
| 121 | +We propose injecting the dedicated `Sidecar` alongside other `Steps`. The `Sidecar` will watch the `Results` paths of |
| 122 | +the `Steps`. This `Sidecar` will output the name of the `Result` and its contents to stdout in a parsable pattern. The |
| 123 | +`TaskRun` controller will access the stdout logs of the `Sidecar` then extract the `Results` and its contents. |
| 124 | + |
| 125 | +After the `Steps` have terminated, the `TaskRun` controller will write the `Results` from the logs of the `Sidecar` |
| 126 | +instead of using the `Termination Message`, hence bypassing the 4KB limit. This method keeps the rest of the current |
| 127 | +functionality identical and does not require any external storage mechanism. |
| 128 | + |
| 129 | +For further details, see the [demonstration][demo] of the [implementation][poc]. |
| 130 | + |
| 131 | +This provides an opportunity to experiment with this solution to provide `Results` within the CRDs as we continue to |
| 132 | +explore other alternatives, including those that involve external storage. |
| 133 | + |
| 134 | +### Feature Gate |
| 135 | + |
| 136 | +This feature will be gated using a `result-extraction-method` feature flag. |
| 137 | + |
| 138 | +Users have to set `result-extraction-method` to `"sidecar-logs"` to enable the feature: |
| 139 | +```shell |
| 140 | +kubectl patch cm feature-flags -n tekton-pipelines -p '{"data":{"result-extraction-method":"sidecar-logs"}}' |
| 141 | +``` |
| 142 | + |
| 143 | +This feature is disabled by default. When disabled, `Results` continue to pass through `Termination Message`. |
| 144 | + |
| 145 | +### Logs Access |
| 146 | + |
| 147 | +This feature requires that the `Pipeline` controller has access to `Pod` logs. |
| 148 | + |
| 149 | +Users have to grant `get` access to all `pods/log` to the `Pipeline` controller: |
| 150 | +```shell |
| 151 | +kubectl apply -f config/enable-log-access-to-controller/ |
| 152 | +``` |
| 153 | + |
| 154 | +### Size Limit |
| 155 | + |
| 156 | +The size limit per `Result` can be configured using the `max-result-size` feature flag, which takes the integer value |
| 157 | +of the bytes. |
| 158 | + |
| 159 | +The `max-result-size` feature flag defaults to 4096 bytes. This ensures that we support existing `Tasks` with only one |
| 160 | +`Result` that uses up the whole size limit of the `Termination Message`. |
| 161 | + |
| 162 | +If users want to set the size limit per `Result` to be something other than 4096 bytes, they can set `max-result-size` |
| 163 | +by setting `max-result-size: <VALUE-IN-BYTES>`. The value set here cannot exceed the CRD size limit of 1.5MB. |
| 164 | + |
| 165 | +```shell |
| 166 | +kubectl patch cm feature-flags -n tekton-pipelines -p '{"data":{"max-result-size":"<VALUE-IN-BYTES>"}}' |
| 167 | +``` |
| 168 | + |
| 169 | +Even though the size limit per `Result` is configurable, the size of `Results` are limited by CRD size limit of 1.5MB. |
| 170 | +If the size of `Results` exceeds this limit, then the `TaskRun` will fail with a message indicating the size limit has |
| 171 | +been exceeded. |
| 172 | + |
| 173 | +### PipelineRun Status |
| 174 | + |
| 175 | +In [TEP-0100][tep-0100], we proposed changes to `PipelineRun` status to reduce the amount of information stored about |
| 176 | +the status of `TaskRuns` and `Runs`. As a result, the `PipelineRun` status is set up to handle larger `Results` without |
| 177 | +exacerbating the storage issues that were there before. |
| 178 | + |
| 179 | +For `ChildReferences` to be populated, the `embedded-status` must be set to `"minimal"`. We recommend that the minimal |
| 180 | +embedded status - `ChildReferences` - is enabled while migration is ongoing until it becomes the only supported status. |
| 181 | +This will ensure that larger `Results` from all its `TaskRuns` will not bloat the `PipelineRun` CRD. |
| 182 | + |
| 183 | +## Design Details |
| 184 | + |
| 185 | +The `Sidecar` will run a binary that: |
| 186 | +- receives argument for `Results`' paths and names which are identified from `taskSpec.results` field. This allows the |
| 187 | + `Sidecar` to know the `Results` it needs to read. |
| 188 | +- has `/tekton/run` volume mounted as [read-only][4227] where status of each `Step` is written. |
| 189 | +- periodically checks for `Step` status in the path `/tekton/run`. |
| 190 | +- when all `Steps` have completed, it immediately parses all the `Results` in paths and prints them to stdout in a |
| 191 | + parsable pattern. |
| 192 | + |
| 193 | +For further details, see the [demonstration][demo] of the [implementation][poc]. |
| 194 | + |
| 195 | +## Design Evaluation |
| 196 | + |
| 197 | +### Reusability |
| 198 | + |
| 199 | +This proposal does not introduce any API changes to specification `Results`. The changes are in implementation details |
| 200 | +of `Results`. The existing `Tasks` will continue to function as they are, only that they can support larger `Results`. |
| 201 | + |
| 202 | +Even more, supporting larger `Results` upto the CRD limit allows users to reuse `Tasks` in more scenarios without |
| 203 | +having to change the specification to use `Workspaces` upon hitting the current low size limit of 4KB per `TaskRun`. |
| 204 | +This allows users to control execution as needed by their context without having to modify `Tasks` and `Pipelines`. |
| 205 | + |
| 206 | +### Interoperability |
| 207 | + |
| 208 | +Users may write `Tasks` that assume larger `Results` support. These `Tasks` would only work on *Tekton Pipelines* |
| 209 | +installations that are configured to support it. This is a risk to `Task` interoperability which is mitigated by: |
| 210 | +1. We still have a hard limit on the size of `Results` which is the CRD size limit. |
| 211 | +2. We still plan to support larger `Results` in the long run regardless of the implementation details. |
| 212 | + |
| 213 | +### Simplicity |
| 214 | + |
| 215 | +This proposal provides a simple solution that solves most use cases: |
| 216 | +- Users don't need additional infrastructure, such as server or object storage, to support larger `Results`. |
| 217 | +- Existing `Tasks` will continue to function as they do now, while supporting larger `Results`, without any API changes. |
| 218 | + |
| 219 | +### Performance |
| 220 | + |
| 221 | +Performance benchmarking with 20-30 `PipelineRuns`, each with 3 `TaskRuns` each with two `Steps`: |
| 222 | +- Average `Pipeline` controller's CPU difference during `PipelineRun` execution: 1% |
| 223 | +- Average `Pipeline` controller's Memory usage difference during `PipelineRun` execution: 0.2% |
| 224 | +- Average `Pod` startup time (time to get to running state) difference: 3s per `TaskRun` |
| 225 | + |
| 226 | +In the experiment, we will continue to measure the startup overhead and explore ways to improve it. |
| 227 | + |
| 228 | +For further details, see the [performance metrics][performance]. |
| 229 | + |
| 230 | +### Security |
| 231 | + |
| 232 | +This approach requires that the `Pipeline` controller has access to `Pod` logs. The `Pipeline` controller already has |
| 233 | +extensive permissions in the cluster, such as read access to `Secrets`. Expanding the access even further is a concern |
| 234 | +for some users, but is also acceptable for some users given the advantages. We will document the extended permissions |
| 235 | +so that users can make the right choice for their own use cases and requirements. |
| 236 | + |
| 237 | +When users have enabled larger `Results` via `Sidecar` logs, there's a validation check to ensure that users cannot |
| 238 | +inject their own `Sidecar` overtop of the one specified by Tekton. |
| 239 | + |
| 240 | +## Experiment Questions |
| 241 | + |
| 242 | +These are some questions we plan to answer in the experiment: |
| 243 | + |
| 244 | +- What impact does this change have on the startup and execution time of `TaskRuns` and `PipelineRuns`? Can we |
| 245 | + improve the performance impact? |
| 246 | + |
| 247 | +- How reliable is using `Sidecar` logs to process `Results`? |
| 248 | + |
| 249 | +- How many users adopt this solution? How many are satisfied with it given the advantages and disadvantages? |
| 250 | + We will conduct a user survey soon after the feature has been released. |
| 251 | + |
| 252 | +## References |
| 253 | + |
| 254 | +- Implementation: |
| 255 | + - [Implementation Pull Request][poc] |
| 256 | + - [Demonstration by Chitrang][demo] |
| 257 | +- Tekton Enhancement Proposals: |
| 258 | + - [TEP-0075: Object Parameters and Results][tep-0075] |
| 259 | + - [TEP-0076: Array Results][tep-0076] |
| 260 | + - [TEP-0086: Larger Results][tep-0086] |
| 261 | + - [TEP-0089: Non-Falsifiable Provenance][tep-0089] |
| 262 | + - [TEP-0100: PipelineRun Status][tep-0100] |
| 263 | +- Issues: |
| 264 | + - [Changing the way Results are stored][4012] |
| 265 | + - [Results, TerminationMessage and Containers][4808] |
| 266 | + - [Publish task fails, IMAGES result too large][4282] |
| 267 | +- Prior Work: |
| 268 | + - [TEP-0086: Using logs emitted by the Task][tep-0086-logs] |
| 269 | + - [TEP-0086: Larger Results via Sidecar Logs][745] |
| 270 | + - [TEP-0086 Working Group Meeting Notes][notes] |
| 271 | + - [Tekton Data Interface - Problem Space][data-interface] |
| 272 | + |
| 273 | +[docs]: https://tekton.dev/docs/pipelines/tasks/#emitting-results |
| 274 | +[4012]: https://github.com/tektoncd/pipeline/issues/4012 |
| 275 | +[4808]: https://github.com/tektoncd/pipeline/issues/4808 |
| 276 | +[4282]: https://github.com/tektoncd/pipeline/issues/4282 |
| 277 | +[4227]: https://github.com/tektoncd/pipeline/issues/4227 |
| 278 | +[745]: https://github.com/tektoncd/community/pull/745 |
| 279 | +[L3]: https://slsa.dev/spec/v0.1/levels |
| 280 | +[crd-size]: https://github.com/kubernetes/kubernetes/issues/82292 |
| 281 | +[demo]: https://drive.google.com/file/d/1NrWudE_XBqweomiY24DP2Txnl1yN0yD9/view |
| 282 | +[poc]: https://github.com/tektoncd/pipeline/pull/5695 |
| 283 | +[performance]: https://github.com/tektoncd/community/pull/745#issuecomment-1206668381 |
| 284 | +[tep-0075]: ./0075-object-param-and-result-types.md |
| 285 | +[tep-0076]: ./0076-array-result-types.md |
| 286 | +[tep-0086]: ./0086-changing-the-way-result-parameters-are-stored.md |
| 287 | +[tep-0086-alts]: ./0086-changing-the-way-result-parameters-are-stored.md#alternatives |
| 288 | +[tep-0086-logs]: ./0086-changing-the-way-result-parameters-are-stored.md#using-logs-emitted-by-the-task |
| 289 | +[tep-0089]: ./0089-nonfalsifiable-provenance-support.md |
| 290 | +[tep-0100]: ./0100-embedded-taskruns-and-runs-status-in-pipelineruns.md |
| 291 | +[notes]: https://docs.google.com/document/d/1z2ME1o_XHvqv6cVEeElljvqVqHV8-XwtXdTkNklFU_8/edit?usp=sharing |
| 292 | +[data-interface]: https://docs.google.com/document/d/1XbeI-_4bFaBize3adQmSL6Czo8MXVvoYn6dJgFZJ_So/edit?usp=sharing |
0 commit comments