Commit 55d07e0 (parent cefd2d7), authored Dec 19, 2023

RFC 513: Application Specific Staging Resources (#515)

Retrospective RFC #513. Not the best write-up, but it'll give us a location to start discussing. [Rendered version](https://github.com/aws/aws-cdk-rfcs/blob/huijbers/app-specific-bootstrapping/text/0513-app-specific-staging.md)

_By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license_

1 file changed (+305, -0 lines): `text/0513-app-specific-staging.md`

# Application Specific Staging Resources

* **Original Author(s)**: @rix0rrr
* **Tracking Issue**: #513
* **API Bar Raiser**: -

Currently, to deploy any non-trivial application, the CDK requires the target account to be bootstrapped. Roles and
staging resources that hold "assets" (files and Docker images) must be provisioned before any application can be
deployed.

If those staging resources could be created as part of a normal application deployment, the requirement to pre-create
them would disappear. Users could then provision roles only if they want to enable CI/CD or cross-account deployments.
Alternatively, they could skip bootstrapping altogether if they want to use their existing credentials.

## A brief history of synthesizers and bootstrapping

The AWS CDK needs some infrastructure to deploy applications into an account and region. Which supporting resources
exist, and what they are named, is a contract between the CDK application and the AWS account. "Synthesizers" are the
part of a CDK application that encodes this contract. Users prepare their account in a certain way, and then pick a
synthesizer that matches the resources they have provisioned (optionally configuring it with non-default parameters).
Synthesizers were introduced in CDKv2. Before that, there were only "the" default assumptions that the CDK would make
about "the" account, and none of it was configurable.

The process of preparing an AWS account to be used with a synthesizer is called "bootstrapping".

### V1

The original bootstrap stack creates an S3 bucket to hold files: large CloudFormation templates and assets such as
Lambda code. ECR repositories are created on demand by the CLI if Docker images need to be uploaded. Originally, we
added a Custom Resource to the template that would clean up the ECR repository when the stack was deleted. In 1.21.0
we removed this, and now leave cleanup of dynamically created ECR repositories to users. Asset locations are completely
controlled by the CLI via template parameters.

All deployments are performed with the credentials of the user running the CLI.

DOWNSIDES

* Assets take up template parameters, of which there is a limited number (~50 when we built this system).
* The dynamism and arbitrary ECR repository creation do not work well in CI/CD systems.
* The user must have CLI credentials for each account they want to deploy to. If a single app deployment spans
  multiple accounts, they must selectively deploy stacks into different accounts using different sets of credentials.

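The first downside can be made concrete with a small calculation of the parameter budget. This sketch assumes each file asset consumes three parameters: bucket, object key and artifact hash. That matches our understanding of the legacy scheme, but treat it as an assumption. The sketch also uses the ~50-parameter limit mentioned above. `maxFileAssets` is a hypothetical helper, not part of any CDK API.

```typescript
// Hypothetical helper: how many file assets fit into a template's parameter budget
// under the legacy (V1) scheme, where asset locations are passed in as parameters.
// Assumptions for illustration: 3 parameters per file asset (bucket, object key,
// artifact hash) and a fixed per-template parameter limit.
const PARAMS_PER_FILE_ASSET = 3;

function maxFileAssets(parameterLimit: number, userParameters: number = 0): number {
  // Parameters the user declares compete with asset parameters for the same budget.
  const remaining = Math.max(0, parameterLimit - userParameters);
  return Math.floor(remaining / PARAMS_PER_FILE_ASSET);
}

console.log(maxFileAssets(50));     // 16
console.log(maxFileAssets(50, 10)); // 13
```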
### V2

The bootstrap resources were redesigned as part of the development of CDK Pipelines. CDK Pipelines is an opinionated
construct that allows trivial deployment of any number of CDK stacks to any number of accounts and regions. The new
design was meant to work with the CLI, a CodePipeline-based solution, and other CI/CD solutions in general. It also
allows cross-region deployments.

To that end, the bootstrap stack now creates (for each account and region combination):

* A single S3 bucket and a single ECR repository with well-known names (which need to be reflected in the CDK app if
  they are non-standard)
* An encryption key for the S3 bucket
* An execution role for the CloudFormation deployment
* A role to trigger the deployment, a role to write to the S3 bucket, and a role to write to the ECR repository
* A role to look up context in the account
* An SSM parameter with the version number of the bootstrap stack

This design supports CI/CD and cross-environment deployments through pre-provisioned roles. It also removes the need
for parameters by rendering the location of each asset directly into the template.

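The "well-known names" contract can be illustrated with a small function that derives the V2 resource names from a qualifier, account and region. The name patterns below follow the default bootstrap template as we understand it, with default qualifier `hnb659fds`. Treat them as an illustrative sketch rather than an authoritative reference. `bootstrapNames` is hypothetical. Because these names are fixed before any app is synthesized, every app deployed into an environment shares the same bucket and repository.

```typescript
// Illustrative sketch of the V2 ("modern") bootstrap naming contract.
// The name patterns mirror the default bootstrap template as we understand it;
// they are assumptions for illustration, not an authoritative reference.
interface BootstrapNames {
  bucket: string;
  repository: string;
  deployRole: string;
  cfnExecRole: string;
  filePublishingRole: string;
  imagePublishingRole: string;
  lookupRole: string;
  versionParameter: string;
}

const DEFAULT_QUALIFIER = 'hnb659fds';

function bootstrapNames(account: string, region: string, qualifier: string = DEFAULT_QUALIFIER): BootstrapNames {
  // Every per-environment resource name embeds the qualifier, account and region.
  const name = (kind: string) => `cdk-${qualifier}-${kind}-${account}-${region}`;
  return {
    bucket: name('assets'),
    repository: name('container-assets'),
    deployRole: name('deploy-role'),
    cfnExecRole: name('cfn-exec-role'),
    filePublishingRole: name('file-publishing-role'),
    imagePublishingRole: name('image-publishing-role'),
    lookupRole: name('lookup-role'),
    // The SSM parameter lives in each region, but its name is keyed only by qualifier.
    versionParameter: `/cdk-bootstrap/${qualifier}/version`,
  };
}

const names = bootstrapNames('123456789012', 'eu-west-1');
console.log(names.bucket); // cdk-hnb659fds-assets-123456789012-eu-west-1
```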
DOWNSIDES

* Some users don't like the pre-provisioned roles. They prefer the V1 situation, where their existing credentials were
  used for permissions.
* A common complaint about the bootstrap stack is that the resources we create by default do not comply with a given
  corporate policy. This is followed by an endless stream of feature requests to add this-and-that feature to the
  bootstrap stack (block public access, enforce SSL, tag immutability, image scanning, etc.). We address this by
  telling customers to take the bootstrap template and customize it themselves. However, CloudFormation templates
  can't simply be patched, so users effectively have to “fork” our bootstrap stack and manually keep it up-to-date with
  incoming changes.
* Because all staging resources need to be provisioned a priori and need to serve all types of applications, we can't
  depend on application knowledge. Specifically, we don't know how many Docker images an application will use, so we
  create a single ECR repository to hold all images. This has a number of downsides:
  * Docker caching relies on pulling the “latest” image from a repository and skipping layers that were already built.
    This doesn't work if images built from various different Dockerfiles are in the same repository.
  * Lifecycle policies cannot be used, because images from potentially different applications with very different
    lifecycles all live in the same repository. The same was already true for S3, but the problem is less severe there
    because S3 storage is cheap while ECR storage is not.
  * Some people were using the V1 Docker image publishing mechanism not as a vehicle for uploading Docker images to be
    used by the CDK's CloudFormation deployment. Instead, they used it simply to build and publish Docker images for a
    completely different deployment later. The lack of control over the target ECR repository breaks this use case.
    It required the development of an `aws-ecr-deployments` construct module. That module does give the necessary
    control, but it racks up costs by doubling ECR storage requirements, and it still does not allow staging resource
    cleanup.
* We always create an ECR repository because we cannot know whether apps deployed into the account will need it, so
  the repository may go unused. AWS Security Hub will throw warnings about empty ECR repositories, which makes customers
  uneasy.
* Bootstrap stacks are expected to be account-wide, and they mix assets from all applications. Some customers that
  deploy multiple applications into the same account are very sensitive to this mixing, and would rather keep these
  resources separate. They can deploy multiple bootstrap stacks (with different qualifiers) into the same account, but
  this is onerous.

## A new proposal: application specific staging resources

The bootstrap stack contains two classes of resources: staging resources and roles. Staging resources hold assets (the
bucket and the ECR repository). Roles allow for unattended (CI/CD) and cross-account access. In the new proposal, we
will separate the staging resources from the roles. Roles will still be bootstrapped (if used), but staging resources
will not.

* Staging resources will be created on a per-CDK-app basis. We will create one S3 bucket with different object prefixes
  for different types of assets (see Appendix A: two types of assets), and one ECR repository per Docker image.
  Resource access roles can also be created on an as-needed basis. This solves the problem of asset resources from
  different applications mixing together. By allowing the use of lifecycle rules, it also largely removes the need for
  garbage collection.
* Since roles are now the only things that need to be bootstrapped, this has a number of advantages:
  * Bootstrapping will be faster, since the heavy resource of a KMS key is no longer involved.
  * Because IAM roles are global resources, every account only needs to be bootstrapped once, instead of once per
    region. Not having to control which regions get bootstrapped also works a lot better with Control Tower and
    automatic StackSets, which do not allow region control.

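The following minimal sketch shows how per-application staging locations could be derived. It uses one bucket per app and environment, with a prefix that depends on the asset type from Appendix A, and one ECR repository per Docker image. The naming scheme is a hypothetical illustration, not a committed design. This covers `cdk-<appId>-staging-<account>-<region>` buckets, a `deploy-time/` prefix for handoff assets, and `<appId>/<imageName>` repositories, as well as the helper and type names. Keeping handoff assets under their own prefix is what lets a lifecycle rule target them without touching live assets.

```typescript
// Hypothetical sketch: mapping assets to application-specific staging locations.
// All names and prefixes below are illustrative assumptions.
type AssetKind = 'handoff' | 'live';

interface FileAsset { kind: 'file'; hash: string; assetType: AssetKind; extension: string }
interface ImageAsset { kind: 'image'; hash: string; imageName: string }
type Asset = FileAsset | ImageAsset;

type StagingLocation =
  | { bucket: string; key: string }
  | { repository: string; tag: string };

function stagingLocation(appId: string, account: string, region: string, asset: Asset): StagingLocation {
  if (asset.kind === 'file') {
    // One bucket per app and environment; the prefix encodes the asset type so that
    // a lifecycle rule can expire handoff assets only.
    const prefix = asset.assetType === 'handoff' ? 'deploy-time/' : '';
    return {
      bucket: `cdk-${appId}-staging-${account}-${region}`,
      key: `${prefix}${asset.hash}${asset.extension}`,
    };
  }
  // One repository per Docker image, so "keep the N most recent images" is meaningful.
  return { repository: `${appId}/${asset.imageName}`, tag: asset.hash };
}
```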
If the staging resources become part of the CDK application, users get a familiar way to customize them to their
heart's content. The treadmill of bootstrap stack customization requests will disappear. Customers will also no longer
need to customize the bootstrap template, assuming their customizations concern the staging resources rather than the
roles.

A potential downside is that we lose the ability to put a version number on the bootstrapped resources. This is because
SSM parameters are regional, not global. However, we might argue that versioning is no longer necessary, since the roles
are unlikely to change often.

> If we wanted to maintain versioning on the roles, we could require that the stack always be deployed in `us-east-1`
> and look for the version there. However, this may require cross-region traffic and therefore be considered dodgy
> from a reliability perspective. We could also only do the versioning check using the CLI, not from the
> CloudFormation template. Of course, we'll have to pick the correct leader region per partition (`aws-cn`, `aws-iso`,
> etc.).

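Picking a leader region per partition could look like the hypothetical sketch below. Only `us-east-1` for the commercial partition comes from the text above. The leader regions for the other partitions are assumptions chosen for illustration. Partition detection uses a simplified region-name prefix check. `leaderRegionFor` and `partitionOf` are hypothetical helpers, not an existing CDK or SDK API.

```typescript
// Hypothetical sketch: choosing a "leader" region per partition in which the
// bootstrap version would be recorded. Leader regions other than us-east-1
// are illustrative assumptions.
type Partition = 'aws' | 'aws-cn' | 'aws-us-gov' | 'aws-iso' | 'aws-iso-b';

const LEADER_REGION: Record<Partition, string> = {
  'aws': 'us-east-1',
  'aws-cn': 'cn-north-1',
  'aws-us-gov': 'us-gov-west-1',
  'aws-iso': 'us-iso-east-1',
  'aws-iso-b': 'us-isob-east-1',
};

function partitionOf(region: string): Partition {
  // Simplified: derive the partition from the region name prefix.
  // Any new partition would need to be added here explicitly.
  if (region.startsWith('cn-')) return 'aws-cn';
  if (region.startsWith('us-gov-')) return 'aws-us-gov';
  if (region.startsWith('us-isob-')) return 'aws-iso-b';
  if (region.startsWith('us-iso-')) return 'aws-iso';
  return 'aws';
}

function leaderRegionFor(region: string): string {
  return LEADER_REGION[partitionOf(region)];
}
```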
### How it will work in practice

The bootstrapping resources are currently designed the way they are because the CLI assumes that they already exist,
with well-known names, before the first CloudFormation deployment starts. In other words, this is purely a limitation
of the orchestration, and one we can remove.

Here's what we're going to do:

* We will introduce a new stack synthesizer, called `AppStagingSynthesizer`.
* This synthesizer will create a support stack with the bucket, and an ECR repository per Docker image.
* Assets will have a dependency on the support stack. This is a new concept. Assets are orchestration artifacts that
  look independent, just like stacks, but they aren't really. In practice the orchestration ignores everything except
  stacks, and treats assets as part of a stack.
* Docker assets may still be built before the first deployment (although for proper caching we need the repository
  to exist first). However, they will only be uploaded when their turn comes in the orchestration workflow.
* For a minimal diff, these resources could have fixed names. Alternatively, we could add support for stack outputs,
  and for parameters on assets, so that generated bucket and repository names can be threaded through the system. For
  now, we will use fixed names for the staging resources.

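The ordering constraints above can be sketched as a small dependency graph executed in topological order. This uses a level-by-level variant of Kahn's algorithm. The node names `deploy-staging`, `build-image`, `publish-file`, `publish-image` and `deploy-app` are hypothetical. The point is that publishing now depends on the support stack, while building a Docker image can still start before the first deployment.

```typescript
// Hypothetical sketch of the deployment work graph: each node lists the nodes it depends on.
type WorkGraph = Record<string, string[]>;

const graph: WorkGraph = {
  'deploy-staging': [], // support stack with the bucket and ECR repositories
  'build-image': [], // building may happen before the first deployment
  'publish-file': ['deploy-staging'], // needs the bucket to exist
  'publish-image': ['deploy-staging', 'build-image'], // needs the repository and the built image
  'deploy-app': ['publish-file', 'publish-image'], // the app stack references the published assets
};

// Variant of Kahn's algorithm: repeatedly emit all nodes whose dependencies have already
// been emitted. Nodes that become ready in the same round are sorted for a deterministic order.
function topologicalOrder(g: WorkGraph): string[] {
  const remaining = new Map<string, Set<string>>();
  for (const [node, deps] of Object.entries(g)) remaining.set(node, new Set(deps));

  const order: string[] = [];
  while (remaining.size > 0) {
    const ready = Array.from(remaining.keys()).filter((n) => remaining.get(n)!.size === 0).sort();
    if (ready.length === 0) throw new Error('cycle (or unknown dependency) in work graph');
    for (const n of ready) {
      order.push(n);
      remaining.delete(n);
      remaining.forEach((deps) => deps.delete(n));
    }
  }
  return order;
}

console.log(topologicalOrder(graph).join(' -> '));
// build-image -> deploy-staging -> publish-file -> publish-image -> deploy-app
```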
### What the API looks like

To use the new synthesizer:

```ts
import { App, Duration } from 'aws-cdk-lib';
import { AppStagingSynthesizer, DeploymentIdentities } from '@aws-cdk/app-staging-synthesizer';

const app = new App({
  defaultStackSynthesizer: AppStagingSynthesizer.defaultResources({
    appId: 'my-app-id', // put a unique id here
    deploymentIdentities: DeploymentIdentities.defaultBootstrapRoles({ bootstrapRegion: 'us-east-1' }),

    // How long to keep file assets (by age) and Docker images (by count) around for rollbacks
    // (without requiring resynth)
    deployTimeFileAssetLifetime: Duration.days(100),
    imageAssetVersionCount: 10,
  }),
});
```

For any additional customization (such as using custom buckets or ECR repositories), `DefaultStagingStack`
can be subclassed, or a full reimplementation of `IStagingResources` can be provided:

```ts
import { App, FileAssetSource } from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import {
  AppStagingSynthesizer,
  DefaultStagingStack,
  FileStagingLocation,
} from '@aws-cdk/app-staging-synthesizer';

class MyStagingStack extends DefaultStagingStack {
  private bucket?: s3.Bucket;

  public addFile(_asset: FileAssetSource): FileStagingLocation {
    // Make sure the bucket exists in this stack before handing out its name.
    this.createOrGetBucket();

    return {
      bucketName: 'my-asset-bucket',
      dependencyStack: this, // stacks using this asset deploy after this staging stack
    };
  }

  private createOrGetBucket(): s3.Bucket {
    // Create the bucket lazily, and only once, the first time a file asset is added.
    if (!this.bucket) {
      this.bucket = new s3.Bucket(this, 'Bucket', {
        bucketName: 'my-asset-bucket',
      });
    }
    return this.bucket;
  }
}

const app = new App({
  defaultStackSynthesizer: AppStagingSynthesizer.customFactory({
    factory: {
      obtainStagingResources(stack, context) {
        // App.of() returns the enclosing Stage (or undefined); here we know it is the App.
        const myApp = App.of(stack) as App;
        return new MyStagingStack(myApp, `CustomStagingStack-${context.environmentString}`, {});
      },
    },
  }),
});
```

---

Ticking the box below indicates that the public API of this RFC has been
signed-off by the API bar raiser (the `api-approved` label was applied to the
RFC pull request):

```
[ ] Signed-off by API Bar Raiser @xxxxx
```

## Public FAQ

### What are we launching today?

We are launching a new synthesizer that places fewer demands on the AWS account that CDK apps are deployed into. It
only needs pre-provisioned roles, and those are only necessary for CI/CD deployments or for cross-account deployments.
For same-account CLI deployments, no bootstrapping is necessary anymore. If you do use bootstrapped roles, they only
need to be provisioned in one region, which makes it easier to bootstrap with StackSets.

The new staging resources are specific to an application and can be cleaned up alongside the application. In addition,
the staging resources are structured so that they allow the use of lifecycle rules. This keeps costs down for CDK
applications that run over a long period of time.

### Why should I use this feature?

You should use this feature if you:

- Want to take advantage of lifecycle rules on asset staging resources;
- Do not use ECR and don't want to see the Security Hub warning that tells you you have an empty ECR repository;
- Need to deploy to multiple regions in a set of accounts and want to use StackSets to bootstrap the accounts;
- Want to deploy an application, remove it, and be sure that its assets have been cleaned up as well.

## Internal FAQ

### Why should we _not_ do this?

Users generally don't appreciate change, especially if it saddles them with busywork. Migrating will be purely optional,
and there are definite benefits to be had. Still, synthesis and bootstrapping are already a sore spot for users (they
are hard to explain and therefore somewhat under-documented), and introducing more churn may lead to backlash.

### What is the high-level project plan?

- We will release the new synthesizer as an optional feature, initially only for the CLI.
- CDK Pipelines support can be added later. When Pipelines support is added, we must take into account that the time
  interval between stage deployments may be significant, especially if it involves manual approval steps. We must take
  care that the Docker images published to the Testing stage are not rebuilt for the Production stage, but are
  replicated.
- In the Developer Guide, we have to clearly explain the concept of synthesizers, the account contract, and
  bootstrapping, along with the choices users have and how they should navigate them.
- Customization by subclassing is possible, but we will probably have to selectively expose some protected helper
  functions to make it more convenient. We will do that when feature requests start coming in.
- After a tryout period, we will move the synthesizer into the core library and document it as a possible alternative
  in the Developer Guide. We will probably also vend a bootstrap template specifically for this synthesizer.

### New bootstrap template

By introducing a new template, we technically have an opportunity to rename roles and get rid of the `hnb659fds`
identifier that customers hate. However, to make the migration from the current bootstrap stack as smooth as possible,
we probably should NOT take this opportunity, and should just keep the same role names.

The new bootstrap template will contain exactly the **CloudFormation Execution Role**, **Deployment Role**, and
**Lookup Role** from the current template, and nothing else.

We can put a version on it for informational purposes, but that version will not be checkable by CloudFormation
deployments. Perhaps it could be made checkable by the CLI at `cdk deploy` time. At the very least, `cdk bootstrap`
will be able to look at the version to prevent downgrading.

The bootstrap template will be selected in one of two ways. You can run `cdk bootstrap` in an app directory that uses
the `AppStagingSynthesizer`. Alternatively, you can pass a command-line flag to `cdk bootstrap`:
`cdk bootstrap --synthesizer=[legacy|default|appstaging]`. If `cdk bootstrap` detects that it is changing the "type"
of bootstrap stack, it will show a confirmation prompt explaining the consequences:

```
$ cdk bootstrap --synthesizer=appstaging
This operation will change the style of bootstrap stack from "default" version 18 to "appstaging" version 1.
This bootstrap stack style has been designed for the AppStagingSynthesizer; make sure that you are using that synthesizer
in the CDK apps you plan to deploy to this environment. For more information, see http://amzn.to/5vjQYrtejA.
Continue (y/N)?
```

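Below is a sketch of the decision `cdk bootstrap` could make before updating an already-bootstrapped environment. It combines the downgrade protection and the style-change confirmation described above. The `BootstrapStackInfo` and `BootstrapDecision` shapes and the `decideBootstrap` function are hypothetical. The style names mirror the proposed `--synthesizer` flag values. The confirmation message reuses the first line of the prompt shown above.

```typescript
// Hypothetical sketch: what `cdk bootstrap` could decide when an environment is
// already bootstrapped. Style names follow the proposed --synthesizer flag values.
type BootstrapStyle = 'legacy' | 'default' | 'appstaging';

interface BootstrapStackInfo {
  style: BootstrapStyle;
  version: number;
}

type BootstrapDecision =
  | { action: 'proceed' }
  | { action: 'confirm'; message: string }
  | { action: 'refuse'; message: string };

function decideBootstrap(current: BootstrapStackInfo | undefined, requested: BootstrapStackInfo): BootstrapDecision {
  // Nothing deployed yet: just create the stack.
  if (!current) return { action: 'proceed' };

  // Changing the style is allowed, but only after an explicit confirmation.
  if (current.style !== requested.style) {
    return {
      action: 'confirm',
      message: `This operation will change the style of bootstrap stack from "${current.style}" version ${current.version} to "${requested.style}" version ${requested.version}.`,
    };
  }

  // Within the same style, never downgrade.
  if (requested.version < current.version) {
    return {
      action: 'refuse',
      message: `Not downgrading "${current.style}" bootstrap stack from version ${current.version} to ${requested.version}.`,
    };
  }
  return { action: 'proceed' };
}
```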
### Are there any open issues that need to be addressed later?

- The template for the staging resources stack must be small enough to be passed directly in a CloudFormation API call.
  This means it may not exceed 51,200 bytes (about 50 KB). Since every ECR repository adds to this size, we have to
  limit the number of repositories per stack. We may need to create multiple stacks using an overflow strategy to lift
  this limit.

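One possible overflow strategy is a greedy "first fit" packing of repositories into as few staging stacks as possible. Packing is based on an estimated template size per repository. Only the 51,200-byte inline template body limit is a real CloudFormation constraint. The per-repository and per-stack byte estimates are assumptions supplied by the caller, and `packRepositories` is hypothetical. A real implementation would more likely measure the synthesized templates.

```typescript
// Hypothetical sketch of an overflow strategy for the staging stack.
// Assumption: we can estimate how many bytes each ECR repository adds to the template.
const TEMPLATE_BODY_LIMIT = 51200; // max size of a template body passed inline to CloudFormation

interface RepoEstimate { name: string; bytes: number }

// Greedy first-fit: put each repository into the first stack that still has room,
// opening a new stack when none does. `baseBytes` estimates the fixed resources that
// every staging stack carries (assumed identical for every stack here).
function packRepositories(repos: RepoEstimate[], baseBytes: number, limit: number = TEMPLATE_BODY_LIMIT): string[][] {
  const stacks: { used: number; repos: string[] }[] = [];
  for (const repo of repos) {
    if (baseBytes + repo.bytes > limit) {
      throw new Error(`repository ${repo.name} alone exceeds the template size limit`);
    }
    let target = stacks.find((s) => s.used + repo.bytes <= limit);
    if (!target) {
      target = { used: baseBytes, repos: [] };
      stacks.push(target);
    }
    target.used += repo.bytes;
    target.repos.push(repo.name);
  }
  return stacks.map((s) => s.repos);
}
```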
## Appendix A: two types of assets

There are two types of assets:

* “Handoff” assets: these are put somewhere temporarily, so that we can point to them in the course of a service call.
  The service makes its own copy of these assets. Large CloudFormation templates and Lambda code bundles are examples.
  CloudFormation reads the template only once during the deployment, and the Lambda service makes a private copy of
  the S3 file.
  * Rollbacks by means of a pure-CloudFormation deployment (that is, not a fresh deployment that involves a CLI call)
    may require the old handoff asset to be present for a while, so it shouldn't be deleted right away. It is
    reasonable, however, to put a lifecycle policy on handoff assets. The lifetime should equal the longest period within
    which a user could still reasonably expect to want to roll back. See the BONES sev2 and damage control campaign from
    a couple of years ago: the BONES team decided a month was a reasonable period, and some service team wanted to roll
    back to a version that was 2 months old.
* “Live” assets: these are continuously accessed in their staged location by the running application. Examples are
  all Docker images (ECS constantly pulls from the user's ECR repository and never makes its own copy), and some
  asset-assisted conveniences like CodeBuild shellables or CFN-init scripts.
  * These can in principle only be garbage collected by mark-and-sweep. We must know that they are not needed by any
    current CDK stack, nor by any CDK stack revision the user might want to roll back to.
  * However, for ECR images we can do slightly better. Since we have an ECR repository per Docker image per
    application, we can use a lifecycle policy of the form “keep only the most recent 5 images.”
  * That leaves only certain eccentric types of file assets which are not collectible (until the entire application
    gets deleted). This might be a “good enough” position to be in.

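
The two garbage-collection approaches above can be sketched as follows. `markAndSweep` marks every asset referenced by a current stack template or by a retained revision the user might roll back to. It returns the unmarked assets as deletion candidates. `keepMostRecentImages` builds an ECR lifecycle policy document of the "keep only the most recent N images" form. The data shapes and function names are hypothetical. The policy document follows the ECR lifecycle policy JSON format.

```typescript
// Hypothetical sketch of the two garbage-collection strategies from Appendix A.

// Mark-and-sweep for "live" assets: anything referenced by a current stack template
// or by a retained template revision (within the rollback window) is marked; the rest is swept.
interface TemplateRevision { stackName: string; referencedAssetHashes: string[] }

function markAndSweep(storedAssetHashes: string[], retained: TemplateRevision[]): string[] {
  const marked = new Set<string>();
  for (const revision of retained) {
    for (const hash of revision.referencedAssetHashes) marked.add(hash);
  }
  // Sweep: return deletion candidates instead of deleting, so a caller can review them.
  return storedAssetHashes.filter((hash) => !marked.has(hash));
}

// ECR lifecycle policy of the form "keep only the most recent N images".
// This only makes sense because each repository holds builds of a single Docker image.
function keepMostRecentImages(count: number): string {
  return JSON.stringify({
    rules: [
      {
        rulePriority: 1,
        description: `Keep only the most recent ${count} images`,
        selection: { tagStatus: 'any', countType: 'imageCountMoreThan', countNumber: count },
        action: { type: 'expire' },
      },
    ],
  });
}
```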
