|
| 1 | +--- |
| 2 | +status: implementable |
| 3 | +title: Larger Results via Sidecar Logs |
| 4 | +creation-date: '2022-11-30' |
| 5 | +last-updated: '2022-11-30' |
| 6 | +authors: |
| 7 | +- '@chitrangpatel' |
| 8 | +- '@jerop' |
| 9 | +--- |
| 10 | + |
| 11 | +# TEP-0127: Larger Results via Sidecar Logs |
| 12 | + |
| 13 | +<!-- toc --> |
| 14 | +- [Summary](#summary) |
| 15 | +- [Motivation](#motivation) |
| 16 | + - [Goals](#goals) |
| 17 | + - [Non-Goals](#non-goals) |
| 18 | + - [Use Cases](#use-cases) |
| 19 | +- [Proposal](#proposal) |
| 20 | + - [Feature Gate](#feature-gate) |
| 21 | + - [Logs Access](#logs-access) |
| 22 | + - [Size Limit](#size-limit) |
| 23 | + - [Script in `Sidecar`](#script-in-sidecar) |
| 24 | + - [`PipelineRun` Status](#pipelinerun-status) |
| 25 | +- [Design Evaluation](#design-evaluation) |
| 26 | + - [Reusability](#reusability) |
| 27 | + - [Simplicity](#simplicity) |
| 28 | + - [Performance](#performance) |
| 29 | + - [Security](#security) |
| 30 | +- [References](#references) |
| 31 | +<!-- /toc --> |
| 32 | + |
| 33 | +This TEP builds on the hard work of many people who have been tackling the problem over the past couple years, |
| 34 | +including but not limited to: |
| 35 | +- '@tlawrie' |
| 36 | +- '@imjasonh' |
| 37 | +- '@bobcatfish' |
| 38 | +- '@pritidesai' |
| 39 | +- '@tomcli' |
| 40 | +- '@ScrapCodes' |
| 41 | +- '@wlynch' |
| 42 | + |
| 43 | +## Summary |
| 44 | + |
| 45 | +Today, `Results` have a size limit of 4KB per `Step` and 12KB per `TaskRun` in the best case - see [issue][4012]. |
| 46 | + |
| 47 | +The goal of [TEP-0086][tep-0086] is to support larger `Results` beyond the current size limits. TEP-0086 has many |
| 48 | +alternatives but no proposal. This TEP is proposes experimenting with one of the alternatives - `Sidecar` logs. |
| 49 | +This allows us to support larger `Results` that are stored within `TaskRun` CRDs. |
| 50 | + |
| 51 | +## Motivation |
| 52 | + |
| 53 | +`Results` are too small - see [issue][4012]. The current implementation of `Results` involves parsing from disk and |
| 54 | +storing as part of the `Termination Message` which has a limit of 4KB per `Container` and 12KB per `Pod`. As such, |
| 55 | +the size limit of `Results` is 12KB per `TaskRun` and 4KB per `Step` at best. |
| 56 | + |
| 57 | +To make matters worse, the limit is divided equally among all `Containers` in a `Pod` - see [issue][4808]. Therefore, |
| 58 | +the more the `Steps` in a `Task` the less the size limit for `Results`. For example, if there are 12 `Steps` then each |
| 59 | +has 1KB in `Termination Message` storage to produce `Results`. |
| 60 | + |
| 61 | +[TEP-0086][tep-0086] aims to support larger `Results`. [TEP-0086][tep-0086] has many [alternatives][tep-0086-alts] with |
| 62 | +no proposal because there's no obvious "best" solution that would meet all the requirements. |
| 63 | + |
| 64 | +This TEP proposes experimenting with `Sidecar` logs to support larger `Results` that are stored with `TaskRun` CRDs. |
| 65 | +This allows us to provide an immediate solution to the urgent needs of users, while not blocking pursuit of the other |
| 66 | +alternatives. |
| 67 | + |
| 68 | +### Goals |
| 69 | + |
| 70 | +The main goal is to support larger `Results` that are limited by the size of a `TaskRun` CRD (1.5MB) via `Sidecar` logs. |
| 71 | + |
| 72 | +### Non-Goals |
| 73 | + |
| 74 | +The following are out of scope for this TEP: |
| 75 | + |
| 76 | +1. Solving use cases that requires really large `Results` beyond the size limit of a CRD (1.5MB). |
| 77 | + |
| 78 | +2. Addressing other [alternatives][tep-0086-alts] for larger `Results` that are listed in [TEP-0086][tep-0086]. |
| 79 | + |
| 80 | +### Use Cases |
| 81 | + |
| 82 | +1. Store larger `Results`, such as JSON payloads from an HTTP call or a Cloud Event, that can be inspected later or |
| 83 | + passed to other `Tasks` without the need to understand storage mechanisms. |
| 84 | + |
| 85 | +3. The [documented guidance][docs] is that `Results` are used for outputs less than 1KB while `Workspaces` are used |
| 86 | + for larger data. |
| 87 | + |
| 88 | + > As a general rule-of-thumb, if a `Result` needs to be larger than a kilobyte, you should likely use a `Workspace` |
| 89 | + to store and pass it between `Tasks` within a `Pipeline`. |
| 90 | + |
| 91 | + Supporting larger `Results` upto the CRD limit allows users to reuse `Tasks` in more scenarios without having to |
| 92 | + change the specification to use `Workspaces` upon hitting the current low size limit of 4KB per `TaskRun`. |
| 93 | + |
| 94 | +## Proposal |
| 95 | + |
| 96 | +We propose using stdout logs from a dedicated `Sidecar` to return a json `Result` object to support larger `Results`. |
| 97 | +The `Pipeline` controller would wait for the `Sidecar` to exit and then read the logs based on a particular query and |
| 98 | +append `Results` to the `TaskRun`. |
| 99 | + |
| 100 | +We propose injecting the dedicated `Sidecar` alongside other `Steps`. The `Sidecar` will watch the `Results` paths of |
| 101 | +the `Steps`. This `Sidecar` will output the name of the `Result` and its contents to stdout in a parsable pattern. The |
| 102 | +`TaskRun` controller will access the stdout logs of the `Sidecar` then extract the `Results` and its contents. |
| 103 | + |
| 104 | +After the `Steps` have terminated, the `TaskRun` controller will write the `Results` from the logs of the `Sidecar` |
| 105 | +instead of using the `Termination Message`, hence bypassing the 4KB limit. This method keeps the rest of the current |
| 106 | +functionality identical and does not require any external storage mechanism. |
| 107 | + |
| 108 | +For further details, see the [demonstration][demo] and [proof of concept][poc]. |
| 109 | + |
| 110 | +This provides an opportunity to experiment with this solution to provide `Results` within the CRDs as we continue to |
| 111 | +explore other alternatives, including those that involve external storage. |
| 112 | + |
| 113 | +### Feature Gate |
| 114 | + |
| 115 | +This feature will be gated using a `result-extraction-method` feature flag. |
| 116 | + |
| 117 | +Users have to set `result-extraction-method` to `"sidecar-logs"` to enable the feature: |
| 118 | +```shell |
| 119 | +kubectl patch cm feature-flags -n tekton-pipelines -p '{"data":{"result-extraction-method":"sidecar-logs"}}' |
| 120 | +``` |
| 121 | + |
| 122 | +This feature is disabled by default. When disabled, `Results` continue to pass through `Termination Message`. |
| 123 | + |
| 124 | +### Logs Access |
| 125 | + |
| 126 | +This feature requires that the `Pipeline` controller has access to `Pod` logs. |
| 127 | + |
| 128 | +Users have to grant `get` access to all `pods/log` to the `Pipeline` controller: |
| 129 | +```shell |
| 130 | +kubectl apply -f config/enable-log-access-to-controller/ |
| 131 | +``` |
| 132 | + |
| 133 | +### Size Limit |
| 134 | + |
| 135 | +The size limit per `Result` can be `max-result-size` feature flag, which takes the integer value of the bytes. |
| 136 | + |
| 137 | +The `max-result-size` feature flag defaults to 4096 bytes. This ensures that we support existing `Tasks` with only one |
| 138 | +`Result` that uses up the whole size limit of the `Termination Message`. |
| 139 | + |
| 140 | +If users want to set the size limit per `Result` to be something other than 4096 bytes, they can set `max-result-size` |
| 141 | +by setting `max-result-size: <VALUE-IN-BYTES>`. The value set here cannot exceed the CRD size limit of 1.5MB. |
| 142 | + |
| 143 | +```shell |
| 144 | +kubectl patch cm feature-flags -n tekton-pipelines -p '{"data":{"max-result-size":"<VALUE-IN-BYTES>"}}' |
| 145 | +``` |
| 146 | + |
| 147 | +Even though the size limit per `Result` is configurable, the size of `Results` are limited by CRD size limit of 1.5MB. |
| 148 | +If the size of `Results` exceeds this limit, then the `TaskRun` will fail with a message indicating the size limit has |
| 149 | +been exceeded. |
| 150 | + |
| 151 | +### Script in `Sidecar` |
| 152 | + |
| 153 | +The `Sidecar` will run a script that: |
| 154 | +- is auto-generated based on the required `Results` - identified from `taskSpec.results` field. |
| 155 | +- periodically checks for files in the directory `/tekton/results`. |
| 156 | +- when a `Result` is found, it prints it to stdout in a parsable pattern. |
| 157 | +- when all the expected `Results` are found, it breaks out of the periodic loop and exits. |
| 158 | + |
| 159 | +For further details, see the [demonstration][demo] of the proof of concept. |
| 160 | + |
| 161 | +### `PipelineRun` Status |
| 162 | + |
| 163 | +In [TEP-0100][tep-0100], we proposed changes to `PipelineRun` status to reduce the amount of information stored about |
| 164 | +the status of `TaskRuns` and `Runs` to improve performance and reduce memory bloat. The `PipelineRun` status is set up |
| 165 | +to handle larger `Results` without exacerbating the performance and storage issues that were there before. |
| 166 | + |
| 167 | +For `ChildReferences` to be populated, the `embedded-status` must be set to `"minimal"`. We recommend that the minimal |
| 168 | +embedded status - `ChildReferences` - is enabled while migration is ongoing until it becomes the only supported status. |
| 169 | + |
| 170 | +## Design Evaluation |
| 171 | + |
| 172 | +### Reusability |
| 173 | + |
| 174 | +This proposal does not introduce any API changes to specification `Results`, the changes are in implementation details |
| 175 | +of `Results`. The existing `Tasks` will continue to function as they are, only that they can support larger `Results`. |
| 176 | + |
| 177 | +Even more, this supporting larger `Results` upto the CRD limit allows users to reuse `Tasks` in more scenarios without |
| 178 | +having to change the specification to use `Workspaces` upon hitting the current low size limit of 4KB per `TaskRun`. |
| 179 | +This allows users to control execution as needed by their context without having to modify `Tasks` and `Pipelines`. |
| 180 | + |
| 181 | +### Simplicity |
| 182 | + |
| 183 | +This proposal provides a simple solution that solves most use cases: |
| 184 | +- Users don't need additional infrastructure, such as server or object storage, to support larger `Results`. |
| 185 | +- Existing `Tasks` will continue to function as they are, while supporting larger `Results`, without any API changes. |
| 186 | + |
| 187 | +### Performance |
| 188 | + |
| 189 | +Performance benchmarking with 20-30 `PipelineRuns`, each with 3 `TaskRuns` each with two `Steps`: |
| 190 | +- Average `Pipeline` controller's CPU difference during `PipelineRun` execution: 1% |
| 191 | +- Average `Pipeline` controller's Memory usage difference during `PipelineRun` execution: 0.2% |
| 192 | +- Average `Pod` startup time (time to get to running state) difference: 3s per `TaskRun` |
| 193 | + |
| 194 | +In the experiment, we will continue to measure the startup overhead and explore ways to improve it. |
| 195 | + |
| 196 | +For further details, see the [performance metrics][performance]. |
| 197 | + |
| 198 | +### Security |
| 199 | + |
| 200 | +This approach requires that the `Pipeline` controller has access to `Pod` logs. The `Pipeline` controller already has |
| 201 | +extensive permissions in the cluster, such as read access to `Secrets`. Expanding the access even further is a concern |
| 202 | +for some users, but is also acceptable for some users given the advantages. We will document the extended permissions |
| 203 | +so that users can make the right choice for their own use cases and requirements. |
| 204 | + |
| 205 | +## References |
| 206 | + |
| 207 | +- [TEP-0086: Larger Results][tep-0086] |
| 208 | +- [TEP-0086: Larger Results via Sidecar Logs][745] |
| 209 | + |
| 210 | +[docs]: https://tekton.dev/docs/pipelines/tasks/#emitting-results |
| 211 | +[4012]: https://github.com/tektoncd/pipeline/issues/4012 |
| 212 | +[4808]: https://github.com/tektoncd/pipeline/issues/4808 |
| 213 | +[tep-0086]: ./0086-changing-the-way-result-parameters-are-stored.md |
| 214 | +[tep-0086-alts]: ./0086-changing-the-way-result-parameters-are-stored.md#alternatives |
| 215 | +[crd-size]: https://github.com/kubernetes/kubernetes/issues/82292 |
| 216 | +[tep-0100]: ./0100-embedded-taskruns-and-runs-status-in-pipelineruns.md |
| 217 | +[demo]: https://drive.google.com/file/d/1NrWudE_XBqweomiY24DP2Txnl1yN0yD9/view |
| 218 | +[poc]: https://github.com/tektoncd/pipeline/pull/5695 |
| 219 | +[performance]: https://github.com/tektoncd/community/pull/745#issuecomment-1206668381 |
| 220 | +[745]: https://github.com/tektoncd/community/pull/745 |
0 commit comments