This commit includes a major revision to the project.
* All IAM policies revised to provide least-privilege access
* Configuration parameters renamed and re-documented to simplify the configuration process
* "Putting it together" section updated with instructions & AWS CLI commands for copying Sales and Marketing sample data sets
* Updated Python 2.7-style print statements
* Minor bug fixes to GlueRunner and AthenaRunner Lambda functions
README.md (+46, -30)
@@ -161,11 +161,11 @@ Specifies parameters for creation of the `gluerunner-lambda` CloudFormation stac
 ```json
 [
   {
-    "ParameterKey": "SourceS3BucketName",
+    "ParameterKey": "ArtifactBucketName",
     "ParameterValue": "<NO-DEFAULT>"
   },
   {
-    "ParameterKey": "SourceS3Key",
+    "ParameterKey": "LambdaSourceS3Key",
     "ParameterValue": "src/gluerunner.zip"
   },
   {
@@ -180,9 +180,9 @@ Specifies parameters for creation of the `gluerunner-lambda` CloudFormation stac
 ```
 #### Parameters:
 
-* `SourceS3BucketName` - The Amazon S3 bucket name (without the `s3://...` prefix) from which the Glue Runner AWS Lambda function package (.zip file) will be fetched by AWS CloudFormation.
+* `ArtifactBucketName` - The Amazon S3 bucket name (without the `s3://...` prefix) in which Glue scripts and Lambda function source will be stored. **If a bucket with such a name does not exist, the `deploylambda` build command will create it for you with appropriate permissions.**
 
-* `SourceS3Key` - The Amazon S3 key (e.g. `src/gluerunner.zip`) pointing to your AWS Lambda function's .zip package.
+* `LambdaSourceS3Key` - The Amazon S3 key (e.g. `src/gluerunner.zip`) pointing to your AWS Lambda function's .zip package in the artifact bucket.
 
 * `DDBTableName` - The Amazon DynamoDB table in which the state of active AWS Glue jobs is tracked between Glue Runner AWS Lambda function invocations.
 
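For reference, once `cloudformation/gluerunner-lambda-params.json` is filled in, the stack can be created directly with the AWS CLI. This is an illustrative sketch only: the stack name and template file path below are assumptions, and the project's build commands may wrap this step.

```bash
# Sketch: create the Glue Runner Lambda stack from the parameters file.
# Stack name and template path are assumptions; adapt them to the project's actual layout.
aws cloudformation create-stack \
  --stack-name gluerunner-lambda \
  --template-body file://cloudformation/gluerunner-lambda.yaml \
  --parameters file://cloudformation/gluerunner-lambda-params.json \
  --capabilities CAPABILITY_NAMED_IAM
```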
@@ -196,11 +196,11 @@ Specifies parameters for creation of the `gluerunner-lambda` CloudFormation stac
 ```json
 [
   {
-    "ParameterKey": "SourceS3BucketName",
+    "ParameterKey": "ArtifactBucketName",
     "ParameterValue": "<NO-DEFAULT>"
   },
   {
-    "ParameterKey": "SourceS3Key",
+    "ParameterKey": "LambdaSourceS3Key",
     "ParameterValue": "src/athenarunner.zip"
   },
   {
@@ -215,9 +215,9 @@ Specifies parameters for creation of the `gluerunner-lambda` CloudFormation stac
 ```
 #### Parameters:
 
-* `SourceS3BucketName` - The Amazon S3 bucket name (without the `s3://...` prefix) from which the Athena Runner AWS Lambda function package (.zip file) will be fetched by AWS CloudFormation.
+* `ArtifactBucketName` - The Amazon S3 bucket name (without the `s3://...` prefix) in which Glue scripts and Lambda function source will be stored. **If a bucket with such a name does not exist, the `deploylambda` build command will create it for you with appropriate permissions.**
 
-* `SourceS3Key` - The Amazon S3 key (e.g. `src/athenarunner.zip`) pointing to your AWS Lambda function's .zip package.
+* `LambdaSourceS3Key` - The Amazon S3 key (e.g. `src/athenarunner.zip`) pointing to your AWS Lambda function's .zip package.
 
 * `DDBTableName` - The Amazon DynamoDB table in which the state of active AWS Athena queries is tracked between Athena Runner AWS Lambda function invocations.
 
@@ -234,20 +234,20 @@ Sample content:
 ```json
 {
   "gluerunner": {
-    "SourceS3BucketName": "<NO-DEFAULT>",
-    "SourceS3Key":"src/gluerunner.zip"
+    "ArtifactBucketName": "<NO-DEFAULT>",
+    "LambdaSourceS3Key":"src/gluerunner.zip"
   },
   "ons3objectcreated": {
-    "SourceS3BucketName": "<NO-DEFAULT>",
-    "SourceS3Key":"src/ons3objectcreated.zip"
+    "ArtifactBucketName": "<NO-DEFAULT>",
+    "LambdaSourceS3Key":"src/ons3objectcreated.zip"
   }
 }
 ```
 #### Parameters:
 
-* `SourceS3BucketName` - The Amazon S3 bucket name (without the `s3://...` prefix) to which the Glue Runner AWS Lambda function package (.zip file) will be deployed. If a bucket with such a name does not exist, the `deploylambda` build command will create it for you with appropriate permissions.
+* `ArtifactBucketName` - The Amazon S3 bucket name (without the `s3://...` prefix) in which Glue scripts and Lambda function source will be stored. **If a bucket with such a name does not exist, the `deploylambda` build command will create it for you with appropriate permissions.**
 
-* `SourceS3Key` - The Amazon S3 key (e.g. `src/gluerunner.zip`) for your AWS Lambda function's .zip package.
+* `LambdaSourceS3Key` - The Amazon S3 key (e.g. `src/gluerunner.zip`) for your AWS Lambda function's .zip package.
 
 >**NOTE: The values set here must match values set in `cloudformation/gluerunner-lambda-params.json`.**
 
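Conceptually, the `deploylambda` build command uses these values to place each function package in the artifact bucket. A rough sketch of the equivalent AWS CLI steps for the `gluerunner` entry (the bucket name and local package path are placeholders, not the actual build implementation):

```bash
# Rough equivalent of the deploylambda step for the gluerunner entry.
# Bucket name and local .zip path are placeholders; the build command also handles
# bucket creation and permissions when the bucket does not already exist.
aws s3 mb s3://<your-artifact-bucket>
aws s3 cp build/gluerunner.zip s3://<your-artifact-bucket>/src/gluerunner.zip
```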
@@ -260,26 +260,34 @@ Specifies parameters for creation of the `glue-resources` CloudFormation stack (
 ```json
 [
   {
-    "ParameterKey": "S3ETLScriptPath",
+    "ParameterKey": "ArtifactBucketName",
     "ParameterValue": "<NO-DEFAULT>"
   },
   {
-    "ParameterKey": "S3ETLOutputPath",
-    "ParameterValue": "<NO-DEFAULT>"
+    "ParameterKey": "ETLScriptsPrefix",
+    "ParameterValue": "scripts"
   },
   {
-    "ParameterKey": "SourceDataBucketName",
+    "ParameterKey": "DataBucketName",
     "ParameterValue": "<NO-DEFAULT>"
+  },
+  {
+    "ParameterKey": "ETLOutputPrefix",
+    "ParameterValue": "output"
   }
 ]
 ```
 #### Parameters:
 
-* `S3ETLScriptPath` - The Amazon S3 path (including bucket name and prefix in ``s3://example/path`` format) to which AWS Glue scripts under `glue-scripts` directory will be dpeloyed.
+* `ArtifactBucketName` - The Amazon S3 bucket name (without the `s3://...` prefix) that will be created by the `step-functions-resources.yaml` CloudFormation template. **If a bucket with such a name does not exist, the `deploylambda` build command will create it for you with appropriate permissions.**
+
+* `ETLScriptsPrefix` - The Amazon S3 prefix (in the format ``example/path`` without leading or trailing '/') to which AWS Glue scripts will be deployed in the artifact bucket. Glue scripts can be found under the `glue-scripts` project directory.
+
+* `DataBucketName` - The Amazon S3 bucket name (without the `s3://...` prefix) that will be created by the `step-functions-resources.yaml` CloudFormation template. This is the bucket to which Sales and Marketing datasets must be uploaded. It is also the bucket in which output will be created. **This bucket is created by `step-functions-resources` CloudFormation. CloudFormation stack creation will fail if the bucket already exists.**
+
+* `ETLOutputPrefix` - The Amazon S3 prefix (in the format ``example/path`` without leading or trailing '/') to which AWS Glue jobs will produce their intermediary outputs. This path will be created in the data bucket.
 
-* `S3ETLOutputPath` - The Amazon S3 path to which AWS Glue jobs will produce their intermediary outputs.
 
-* `SourceDataBucketName` - The Amazon S3 bucket name (without the `s3://...` prefix) that will be created by the `step-functions-resources.yaml` CloudFormation template. This is the bucket to which Sales and Marketing datasets must be uploaded.
 
 
 The parameters are used by AWS CloudFormation during the creation of `glue-resources` stack.
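With the default prefixes above, Glue scripts land under `scripts/` in the artifact bucket and job outputs under `output/` in the data bucket. A quick way to inspect the resulting layout (the bucket names below are placeholders):

```bash
# Bucket names are placeholders; list the deployed Glue scripts and the ETL outputs.
aws s3 ls s3://<artifact-bucket>/scripts/
aws s3 ls s3://<data-bucket>/output/
```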
@@ -290,7 +298,7 @@ Specifies the parameters used by Glue Runner AWS Lambda function at run-time.
@@ -299,7 +307,7 @@ Specifies the parameters used by Glue Runner AWS Lambda function at run-time.
 ```
 #### Parameters:
 
-* `sfn_activity_arn` - AWS Step Functions activity task ARN. This ARN is used to query AWS Step Functions for new tasks (i.e. new AWS Glue jobs to run). The ARN is a combination of the AWS region, your AWS account Id, and the name property of the [AWS::StepFunctions::Activity](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-stepfunctions-activity.html) resource in the `stepfunctions-resources.yaml` CloudFormation template. An ARN looks as follows `arn:aws:states:<AWS-REGION>:<YOUR-AWS-ACCOUNT-ID>:activity:<STEPFUNCTIONS-ACTIVITY-NAME>`. By default, the activity name is `GlueRunnerActivity`.
+* `sfn_activity_arn` - AWS Step Functions activity task ARN. This ARN is used to query AWS Step Functions for new tasks (i.e. new AWS Glue jobs to run). The ARN is a combination of the AWS region, your AWS account Id, and the name property of the [AWS::StepFunctions::Activity](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-stepfunctions-activity.html) resource in the `stepfunctions-resources.yaml` CloudFormation template. An ARN looks as follows `arn:aws:states:<AWS-REGION>:<AWS-ACCOUNT-ID>:activity:<STEPFUNCTIONS-ACTIVITY-NAME>`. By default, the activity name is `GlueRunnerActivity`.
 
 * `sfn_worker_name` - A property that is passed to AWS Step Functions when getting activity tasks.
 
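If you need to confirm the activity ARN after the `step-functions-resources` stack is created, the AWS CLI can look it up. A sketch, assuming the default activity name:

```bash
# Look up the Glue Runner activity ARN (assumes the default activity name).
aws stepfunctions list-activities \
  --query "activities[?name=='GlueRunnerActivity'].activityArn" \
  --output text
```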
@@ -317,19 +325,19 @@ Specifies parameters for creation of the `step-functions-resources` CloudFormati
 ```json
 [
   {
-    "ParameterKey": "SourceS3BucketName",
+    "ParameterKey": "ArtifactBucketName",
     "ParameterValue": "<NO-DEFAULT>"
   },
   {
-    "ParameterKey": "SourceS3Key",
+    "ParameterKey": "LambdaSourceS3Key",
     "ParameterValue": "src/ons3objectcreated.zip"
   },
   {
     "ParameterKey": "GlueRunnerActivityName",
     "ParameterValue": "GlueRunnerActivity"
   },
   {
-    "ParameterKey": "SourceDataBucketName",
+    "ParameterKey": "DataBucketName",
     "ParameterValue": "<NO-DEFAULT>"
   }
 ]
@@ -340,11 +348,11 @@ Specifies parameters for creation of the `step-functions-resources` CloudFormati
 
 Both parameters are also used by AWS CloudFormation during stack creation.
 
-* `SourceS3BucketName` - The Amazon S3 bucket name (without the `s3://...` prefix) to which the `ons3objectcreated` AWS Lambda function package (.zip file) will be deployed. If a bucket with such a name does not exist, the `deploylambda` build command will create it for you with appropriate permissions.
+* `ArtifactBucketName` - The Amazon S3 bucket name (without the `s3://...` prefix) to which the `ons3objectcreated` AWS Lambda function package (.zip file) will be deployed. If a bucket with such a name does not exist, the `deploylambda` build command will create it for you with appropriate permissions.
 
-* `SourceS3Key` - The Amazon S3 key (e.g. `src/ons3objectcreated.zip`) for your AWS Lambda function's .zip package.
+* `LambdaSourceS3Key` - The Amazon S3 key (e.g. `src/ons3objectcreated.zip`) for your AWS Lambda function's .zip package.
 
-* `SourceDataBucketName` - The Amazon S3 bucket name (without the `s3://...` prefix). All OnS3ObjectCreated CloudWatch Events for the bucket will be handled by the `ons3objectcreated` AWS Lambda function. **This bucket will be created by CloudFormation. CloudFormation stack creation will fail if the bucket already exists.**
+* `DataBucketName` - The Amazon S3 bucket name (without the `s3://...` prefix). All OnS3ObjectCreated CloudWatch Events for the bucket will be handled by the `ons3objectcreated` AWS Lambda function. **This bucket will be created by CloudFormation. CloudFormation stack creation will fail if the bucket already exists.**
 
 Note that the `step-functions-resources` stack **must** be created first, before the `glue-resources` stack.
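Because stack creation fails if the data bucket already exists, it can help to check the name before creating the stack. A sketch; the bucket name is a placeholder:

```bash
# A "Not Found" error here means the bucket name is free for CloudFormation to create;
# success or "Forbidden" means the name is already taken (bucket name is a placeholder).
aws s3api head-bucket --bucket <data-bucket-name>
```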
 
-Now head to the AWS Step Functions console. Start and observe an execution of the 'MarketingAndSalesETLOrchestrator' state machine. Execution should halt at the 'Wait for XYZ Data' states. At this point, you should upload the sample .CSV files under the `samples` directory to the S3 bucket you specified as the `SourceDataBucketName` parameter value in `step-functions-resources-config.json` configuration file. This should allow the state machine to move on to next steps -- Process Sales Data and Process Marketing Data.
+Now head to the AWS Step Functions console. Start and observe an execution of the 'MarketingAndSalesETLOrchestrator' state machine. Execution should halt at the 'Wait for XYZ Data' states. At this point, you should upload the sample .CSV files under the `samples` directory to the S3 bucket you specified as the `SourceDataBucketName` parameter value in `step-functions-resources-config.json` configuration file. **Upload the marketing sample file under prefix 'marketing' and the sales sample file under prefix 'sales'. To do that, you may issue the following AWS CLI commands while at the project's root directory:**
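The exact commands are not reproduced in this hunk. As an illustrative sketch only (the sample file names and the bucket name below are placeholders, not the commands from the README):

```bash
# Placeholders only: substitute the actual sample file names from the samples/ directory
# and the data bucket name you configured for the stack.
aws s3 cp samples/<marketing-sample>.csv s3://<data-bucket-name>/marketing/
aws s3 cp samples/<sales-sample>.csv s3://<data-bucket-name>/sales/
```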