# Application Specific Staging Resources

* **Original Author(s):** @rix0rrr
* **Tracking Issue**: #513
* **API Bar Raiser**: -

Currently, to deploy any interesting application, the CDK requires an account to be bootstrapped: it requires the
provisioning of roles and staging resources to hold "assets" (files and Docker images) before any application can
be deployed.

If those staging resources could be created as part of a normal application deployment, the requirement to precreate
those resources would be dropped. Users could choose to provision roles if they want to enable CI/CD or cross-account
deployments, or choose not to bootstrap at all if they want to use existing credentials.

## A brief history of synthesizers and bootstrapping

The AWS CDK needs some infrastructure to deploy applications into an account and region. What supporting resources
exist and what their names are is a contract between the CDK application and the AWS account. "Synthesizers" are the
part of a CDK application that encodes this contract: users prepare their account a certain way, and then pick a
synthesizer that matches the resources they have provisioned (optionally configuring it with non-default parameters).
Synthesizers were introduced in CDKv2; before that, there was only "the" default set of assumptions that the CDK
would make about "the" account, and none of it was configurable.

The process of preparing an AWS account to be used with a synthesizer is called "bootstrapping".

### V1

In the original bootstrapping stack, we create an S3 bucket to hold files: large CloudFormation templates and assets
such as Lambda code. ECR repositories are created on demand by the CLI if Docker images need to be uploaded.
Originally, we added a Custom Resource to the template that would clean up the ECR repository when the Stack got
cleaned up. In 1.21.0 we removed this, and now leave cleanup of dynamically created ECR repositories to users. Asset
locations are completely controlled by the CLI via parameters.

All deployments are done with the credentials of the user that runs the CLI.

DOWNSIDES

* Assets take up template parameters, of which there is a limited number (~50 when we built this system).
* The dynamism and arbitrary ECR repository creation do not work well in CI/CD systems.
* The user must have CLI credentials for each account they want to deploy to, and if a single app deployment should
  go into multiple accounts, they must selectively deploy stacks into different accounts using different sets of
  credentials.

### V2

The bootstrap resources were redesigned as part of the development of CDK Pipelines, an opinionated construct that
allows trivial deployment of any number of CDK stacks to any number of accounts and regions. The design was intended
to work for the CLI, a CodePipeline-based solution, and other CI/CD solutions in general. It also allows
cross-region deployments.

To that end, the bootstrap stack now creates (for each account and region combination):

* A single S3 bucket and a single ECR repository with well-known names (which need to be reflected in the CDK app if
  they are non-standard).
* An encryption key for the S3 bucket.
* An Execution Role for the CloudFormation deployment.
* A role to trigger the deployment, a role to write to the S3 bucket, and a role to write to the ECR repository.
* A role to look up context in the account.
* An SSM parameter with the version number of the bootstrap stack.

This solution supports CI/CD and cross-environment deployments through pre-provisioned roles, and removes
the need for parameters by rendering the location of each asset directly into the template.

DOWNSIDES

* Some users don’t like the pre-provisioned roles and prefer the v1 situation, where their existing credentials were
  used for permissions.
* A common complaint about the bootstrap stack is that the resources we create by default do not comply with a given
  corporate policy, followed by an endless stream of feature requests to add this-and-that feature to the bootstrap
  stack (block public access, enforce SSL, tag immutability, image scanning, etc.). We solve this by telling customers
  to take the bootstrap template and customize it themselves, but CloudFormation templates can’t be patched simply, so
  this requires users to effectively “fork” our bootstrap stack and manually keep it up-to-date with incoming changes.
* Because all staging resources need to be provisioned a priori and need to serve all types of applications, we can't
  depend on application knowledge. Specifically, we won't know how many Docker images will be used in the application,
  so we create a single ECR repository to hold all images. This has a number of downsides:
  * Docker caching relies on pulling the “latest” image from a repository and skipping layers that were already built.
    This doesn’t work if images built from various different Dockerfiles are in the same repository.
  * Lifecycle policies cannot be used, because different images from potentially different applications with very
    different life cycles are all in the same repository. The same was already true for S3, but the problem is
    less severe there because S3 is pretty cheap while ECR is not.
  * Some people were using the V1 Docker image publishing mechanism not as a vehicle for uploading Docker images to be
    used by the CDK’s CloudFormation deployment, but simply as a mechanism for building and publishing Docker images,
    to be used by a completely different deployment later. The lack of control over the target ECR repository breaks
    this use case (it required the development of an `aws-ecr-deployments` construct module, which does give the
    necessary control but racks up costs by doubling ECR storage requirements, and still does not allow staging
    resource cleanup).
  * We always create an empty ECR repository because we cannot know whether apps deployed into the account will need
    it or not, so the ECR repository may go unused. AWS Security Hub will throw warnings about empty ECR repositories,
    which makes customers uneasy.
* Bootstrap stacks are expected to be account-wide, and mix assets from all applications. Some customers that deploy
  multiple applications into the same account are very sensitive to this mixing, and would rather keep these resources
  separate. They can deploy multiple bootstrap stacks in the same account, but this is all a bit onerous.

## A new proposal: application specific staging resources

The bootstrap stack contains two classes of resources: staging resources, which hold assets (bucket and ECR
repository), and roles, which allow for unattended (CI/CD) and cross-account access. In the new proposal, we will
separate the staging resources from the roles. Roles will still be bootstrapped (if used), but staging resources
will not.

* Staging resources will be created on a per-CDK-app basis. We will create one S3 bucket with different object prefixes
  for different types of assets (see Appendix A: two types of assets), and an ECR repository per Docker image. Resource
  access roles can also be created on an as-needed basis. This solves the problem of asset resources of different
  applications mixing together, and it also removes the need for garbage collection by allowing the use of life cycle
  rules.
* Since the roles are now the only things that need to be bootstrapped, this has a number of advantages:
  * Bootstrapping will be faster, since the slow-to-provision KMS key is no longer involved.
  * Because IAM roles are global resources, every account now only needs to be bootstrapped once. Not needing control
    over regions also works a lot better with Control Tower and automatic StackSets (which do not allow region
    control).

If we can make the bootstrapping resources part of the CDK application, then users have a familiar way to customize
them to their heart’s content, so the treadmill of bootstrap stack customization requests will disappear, and
customers will no longer need to customize the bootstrap template (assuming their customizations concern the
resources rather than the roles).

A potential downside is that we lose the ability to have a version number on the bootstrapped resources (because SSM
is not global), but we might say that’s no longer necessary, since the Roles are unlikely to change often.

> If we wanted to maintain versioning on the Roles, we could say that the stack must always be deployed in `us-east-1`
> and that’s where we look for the version; however, this may require cross-internet traffic and therefore be
> considered dodgy from a reliability perspective, and we could only do the versioning check using the CLI, not from
> the CloudFormation template. Of course we’d have to pick the correct leader region per partition: `aws-cn`,
> `aws-iso`, etc.
| 121 | +
|
### How it will work in practice

Bootstrapping resources are currently designed the way they are because the CLI relies on the assumption that the
bootstrap resources are present with a well-known name before the first CloudFormation deployment starts. In other
words, this is purely a limitation of the orchestration, and one we can take away.

Here’s what we’re going to do:

* We will introduce a new Stack Synthesizer, called `AppStagingSynthesizer`.
* This synthesizer will create a support stack with the bucket, and an ECR repository per Docker image.
* Assets will have a dependency on the support stack. This is a new concept that doesn’t currently exist: assets are
  an orchestration artifact that looks as independent as stacks are, but they aren't really. In practice the
  orchestration ignores everything except stacks, and treats assets as being part of a stack.
  * Docker assets may still be built before the first deployment (although for proper caching we need the repository
    to exist first), but they will only be uploaded when it’s their turn in the orchestration workflow.
* For a minimal diff these resources could have fixed names, but we could add support for Stack Outputs, and assets
  could have support for Parameters, so that we can thread generated bucket and repository names through the system.
  For now, we will use fixed names for the staging resources.

### What the API looks like

To use the new synthesizer:

```ts
import { App, Duration } from 'aws-cdk-lib';
import { AppStagingSynthesizer, DeploymentIdentities } from '@aws-cdk/app-staging-synthesizer';

const app = new App({
  defaultStackSynthesizer: AppStagingSynthesizer.defaultResources({
    appId: 'my-app-id', // put a unique id here
    deploymentIdentities: DeploymentIdentities.defaultBootstrapRoles({ bootstrapRegion: 'us-east-1' }),

    // How long to keep File and Docker assets around for rollbacks (without requiring resynth)
    deployTimeFileAssetLifetime: Duration.days(100),
    imageAssetVersionCount: 10,
  }),
});
```

For any additional customization (such as using custom buckets or ECR repositories), `DefaultStagingStack`
can be subclassed, or a full reimplementation of `IStagingResources` can be provided:

```ts
import { App, FileAssetSource } from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import {
  AppStagingSynthesizer,
  DefaultStagingStack,
  FileStagingLocation,
} from '@aws-cdk/app-staging-synthesizer';

class MyStagingStack extends DefaultStagingStack {
  private bucket?: s3.Bucket;

  public addFile(asset: FileAssetSource): FileStagingLocation {
    this.createOrGetBucket();

    return {
      bucketName: 'my-asset-bucket',
      dependencyStack: this,
    };
  }

  // Create the bucket lazily, the first time a file asset is added
  private createOrGetBucket() {
    if (!this.bucket) {
      this.bucket = new s3.Bucket(this, 'Bucket', {
        bucketName: 'my-asset-bucket',
      });
    }
    return this.bucket;
  }
}

const app = new App({
  defaultStackSynthesizer: AppStagingSynthesizer.customFactory({
    factory: {
      obtainStagingResources(stack, context) {
        const myApp = App.of(stack);
        return new MyStagingStack(myApp, `CustomStagingStack-${context.environmentString}`, {});
      },
    },
  }),
});
```

---

Ticking the box below indicates that the public API of this RFC has been
signed-off by the API bar raiser (the `api-approved` label was applied to the
RFC pull request):

```
[ ] Signed-off by API Bar Raiser @xxxxx
```

## Public FAQ

### What are we launching today?

We are launching a new synthesizer that places fewer demands on the AWS account that CDK apps are deployed into. It
only needs preprovisioned Roles, and those are only necessary for CI/CD or cross-account deployments. For
same-account CLI deployments, no bootstrapping is necessary anymore. If you are using bootstrapped roles anyway,
they only need to be provisioned in one region, making them easier to use with StackSets.

The new staging resources are specific to an application and can be cleaned up alongside the application. In
addition, the way the staging resources are structured now allows the use of lifecycle rules, keeping costs down for
CDK applications that run over a long period of time.
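
As an illustration, such a rule could be an ordinary S3 lifecycle configuration on the per-app staging bucket. The
object prefix and retention period below are hypothetical examples, not fixed parts of this proposal:

```json
{
  "Rules": [
    {
      "ID": "ExpireHandoffAssets",
      "Status": "Enabled",
      "Filter": { "Prefix": "deploy-time/" },
      "Expiration": { "Days": 100 }
    }
  ]
}
```

Because the bucket belongs to a single application, a rule like this cannot accidentally expire another application's
assets, which is exactly what made lifecycle rules unusable on the shared v2 bucket.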

### Why should I use this feature?

You should use this feature if you:

- Want to take advantage of lifecycle rules on asset staging resources;
- Do not use ECR and don't want to see the Security Hub warning that tells you you have an empty ECR repository;
- Need to deploy to multiple regions in a set of accounts and want to use StackSets to bootstrap the accounts;
- Want to deploy an application, remove it, and be sure that the assets have been cleaned up as well.

## Internal FAQ

### Why should we _not_ do this?

Users generally don't appreciate change, especially if it saddles them with busywork. While the migration path will be
purely optional, and there are definite benefits to be had, synthesis and bootstrapping are already a sore spot for
users (they’re hard to explain and therefore a bit under-documented), and introducing more churn may lead to backlash.

### What is the high-level project plan?

- We will release the new synthesizer as an optional feature, initially only for the CLI.
- CDK Pipelines support can be added later. When Pipelines support is added, it should be taken into
  account that the time interval between stage deployments may be significant, especially if it involves manual
  approval steps. We must take care that the Docker images published to the Testing stage are not rebuilt for
  the Production stage, but are replicated.
- We have to clearly explain the concept of Synthesizers, the account contract, and Bootstrapping in the Developer
  Guide, along with the choices users have and how they should navigate them.
- Customization by subclassing is possible, but we will probably have to selectively expose some protected helper
  functions to make it more convenient. We will do that when feature requests start coming in.
- After a tryout period, we will move the synthesizer into the core library and document it as a possible alternative
  in the Developer Guide, and we will probably vend a bootstrap template specifically for this synthesizer.

### New bootstrap template

By introducing a new template, we technically have an opportunity to rename the roles and get rid of the `hnb659fds`
identifier that customers hate. However, to make the migration from the current bootstrap stack as smooth as possible,
we probably should NOT take this opportunity, and should just keep the same role names.

The new bootstrap template will contain exactly the **CloudFormation Execution Role**, **Deployment Role**, and
**Lookup Role** from the current template, and nothing else.

We can put a version on it for informational purposes, but that version will not be checkable by CloudFormation
deployments; perhaps it could be made checkable by the CLI during `cdk deploy`. At the very least, `cdk bootstrap`
will be able to look at the version to prevent downgrading.

The bootstrap template will be selected either by running `cdk bootstrap` in an app directory that uses the
`AppStagingSynthesizer`, or by passing a command-line flag: `cdk bootstrap --synthesizer=[legacy|default|appstaging]`.
If `cdk bootstrap` detects that it is changing the "type" of bootstrap stack, it will show a confirmation prompt with
an explanation of the consequences:

```
$ cdk bootstrap --synthesizer=appstaging
This operation will change the style of bootstrap stack from "default" version 18 to "appstaging" version 1.
This bootstrap stack style has been designed for the AppStagingSynthesizer; make sure that you are using that synthesizer
in the CDK apps you plan to deploy to this environment. For more information, see http://amzn.to/5vjQYrtejA.
Continue (y/N)?
```

### Are there any open issues that need to be addressed later?

- The template for the staging resources stack must be small enough to fit into a CloudFormation API call, which means
  it may not exceed 50kB. Since every ECR repository adds to this size, we have to limit the count. We may need
  to create multiple stacks using an overflow strategy to lift this limit.
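
One possible shape for such an overflow strategy is a greedy packing of per-image repository definitions into as many
staging stacks as needed. This is only a sketch: the byte sizes and the 50kB budget are assumptions, and the real cost
per repository would have to be measured from synthesized templates.

```typescript
interface RepoDefinition {
  name: string;
  templateBytes: number; // estimated size this repository adds to the template
}

// Greedily assign repository definitions to staging stacks so that no
// stack's template exceeds the CloudFormation API body size limit.
function planStagingStacks(
  repos: RepoDefinition[],
  maxTemplateBytes: number,
  baseTemplateBytes: number, // size of the stack skeleton (bucket, outputs, ...)
): RepoDefinition[][] {
  const stacks: RepoDefinition[][] = [];
  let current: RepoDefinition[] = [];
  let size = baseTemplateBytes;
  for (const repo of repos) {
    // Start a new overflow stack when this repository would not fit
    if (current.length > 0 && size + repo.templateBytes > maxTemplateBytes) {
      stacks.push(current);
      current = [];
      size = baseTemplateBytes;
    }
    current.push(repo);
    size += repo.templateBytes;
  }
  if (current.length > 0) {
    stacks.push(current);
  }
  return stacks;
}
```

A fixed naming scheme along the lines of `StagingStack-<appId>-<index>` would keep the overflow stacks addressable
without having to thread generated names through the system.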

## Appendix A: two types of assets

There are two types of assets:

* “Handoff” assets: these are temporarily put somewhere so that, in the course of a service call, we can point to
  them. The service will make its own copy of these assets. Large CloudFormation templates and Lambda code bundles
  are examples of this: the CloudFormation service will only read the template once during the deployment, and the
  Lambda service will make a private copy of the S3 file.
  * Rollbacks by means of a pure CloudFormation deployment (so, not a fresh deployment that involves a CLI call) may
    require the presence of the old handoff asset for a while, so it shouldn’t be deleted right away; but it is
    reasonable to put a lifecycle policy on handoff assets, equal to the longest period of time a user should still
    reasonably expect to want to do a rollback in (see the BONES sev2 and damage control campaign from a couple of
    years ago, when the BONES team decided a month was a reasonable period and some service team wanted to roll back
    to a version 2 months old).
* “Live” assets: these get continuously accessed in their staged location by the running application. Examples are ALL
  Docker images (ECS will constantly pull from the user’s ECR repository, and never make its own copy), and some
  asset-assisted conveniences like CodeBuild shellables or cfn-init scripts.
  * These can in principle only be garbage collected by mark-and-sweep: we must know they are not needed by any
    current CDK stacks, nor by any CDK stack revisions the user might want to roll back to.
  * However, for ECR images we can do slightly better: since we have an ECR repository per Docker image per
    application, we can use a lifecycle policy of the form “keep only the most recent 5 images.”
  * That leaves only certain eccentric types of file assets which are not collectible (until the entire application
    gets deleted). This might be a “good enough” position to be in.
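
The “keep only the most recent 5 images” rule maps directly onto a standard ECR lifecycle policy that each per-image
repository could carry. The count of 5 here is illustrative, not a decided default:

```json
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Keep only the most recent 5 images",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 5
      },
      "action": { "type": "expire" }
    }
  ]
}
```

This only works because each repository holds the images of a single Dockerfile from a single application; on the
shared v2 repository, the same rule would delete other applications' images.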