Enrichment - Python Lambdas.
The enrichment wrangler is the start of the process. It first picks up the sng data from S3 and invokes the method lambda with this data. The method response contains two dataframes (data and anomalies), which the wrangler splits out. The data is sent on to the SQS queue, whereas the anomalies are sent via an SNS topic.
The method is generic. As well as the data, it receives information about the lookups to use and survey-specific parameters. Example:
"RuntimeVariables": {
    "data": { ...},
    "lookups": {
        "0": {
            "file_name": "responder_county_lookup_prod.json",
            "columns_to_keep": [
                "responder_id",
                "county"
            ],
            "join_column": "responder_id",
            "required": [
                "county"
            ]
        },
        "1": {
            "file_name": "county_lookup_county.json",
            "columns_to_keep": [
                "county_name",
                "region",
                "county",
                "marine"
            ],
            "join_column": "county",
            "required": [
                "region",
                "marine"
            ]
        }
    },
    "marine_mismatch_check": true,
    "period_column": "period",
    "identifier_column": "responder_id"
}
The 'file_name' dictates which lookup file to get from S3.
The 'columns_to_keep' lists the columns from the lookup that are kept when it is joined onto the data.
The 'join_column' is the column to use to join onto the data.
The 'required' columns are used later in integrity tests, checking that no nulls exist in any required columns.
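The way these four settings drive a join can be sketched as follows. This is a minimal illustration, not the method's actual code: it uses plain lists of dictionaries in place of dataframes so that it has no dependencies, the function names (apply_lookup, enrich) are hypothetical, the lookup contents are made up, and it assumes the lookups are applied in key order as a left join.

```python
def apply_lookup(rows, lookup_rows, columns_to_keep, join_column):
    """Left-join lookup_rows onto rows using join_column (hypothetical helper).

    Only the columns named in columns_to_keep are copied from the lookup.
    Rows with no match in the lookup get None for those columns.
    """
    # Index the lookup by its join key, keeping only the requested columns.
    index = {}
    for lookup_row in lookup_rows:
        key = lookup_row[join_column]
        index[key] = {col: lookup_row.get(col) for col in columns_to_keep}

    added_columns = [col for col in columns_to_keep if col != join_column]
    joined = []
    for row in rows:
        match = index.get(row.get(join_column), {})
        new_row = dict(row)
        for col in added_columns:
            new_row[col] = match.get(col)
        joined.append(new_row)
    return joined


def enrich(rows, lookups, lookup_files):
    """Apply each lookup in key order ("0", then "1", ...). Assumed ordering."""
    for key in sorted(lookups, key=int):
        spec = lookups[key]
        rows = apply_lookup(
            rows,
            lookup_files[spec["file_name"]],  # stands in for the file read from S3
            spec["columns_to_keep"],
            spec["join_column"],
        )
    return rows


lookups = {
    "0": {
        "file_name": "responder_county_lookup_prod.json",
        "columns_to_keep": ["responder_id", "county"],
        "join_column": "responder_id",
        "required": ["county"],
    },
    "1": {
        "file_name": "county_lookup_county.json",
        "columns_to_keep": ["county_name", "region", "county", "marine"],
        "join_column": "county",
        "required": ["region", "marine"],
    },
}

# Made-up lookup contents and data, for illustration only.
lookup_files = {
    "responder_county_lookup_prod.json": [{"responder_id": 1, "county": 10}],
    "county_lookup_county.json": [
        {"county": 10, "county_name": "Exampleshire", "region": "A", "marine": "y"}
    ],
}
data = [
    {"responder_id": 1, "period": 201809, "land_or_marine": "M"},
    {"responder_id": 2, "period": 201809, "land_or_marine": "L"},
]

enriched = enrich(data, lookups, lookup_files)
```

Responder 2 has no entry in the first lookup, so its 'county' is left null and the second lookup cannot fill 'region' or 'marine' either. With dataframes, the same join would typically be a left merge on the join column.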
Parameters are taken from environment variables in the wrangler, packaged and sent over to the method. The 'marine_mismatch_check' parameter determines whether or not the marine mismatch check is run.
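As a rough sketch of that packaging step, the wrangler might build the payload like this. The environment variable names are assumptions (here they simply mirror the payload keys), as is the idea that 'lookups' is stored as a JSON string; the function name build_runtime_variables is hypothetical.

```python
import json
import os


def build_runtime_variables(data, environ=os.environ):
    """Package environment settings into the payload sent to the method.

    The environment variable names used here are assumed, not confirmed.
    """
    return {
        "RuntimeVariables": {
            "data": data,
            # Assumed to be stored as a JSON string in the environment.
            "lookups": json.loads(environ["lookups"]),
            # Environment values are strings, so convert to a real boolean.
            "marine_mismatch_check": environ["marine_mismatch_check"].lower() == "true",
            "period_column": environ["period_column"],
            "identifier_column": environ["identifier_column"],
        }
    }


# A dictionary stands in for the real environment in this example.
example_environ = {
    "lookups": json.dumps(
        {
            "0": {
                "file_name": "responder_county_lookup_prod.json",
                "columns_to_keep": ["responder_id", "county"],
                "join_column": "responder_id",
                "required": ["county"],
            }
        }
    ),
    "marine_mismatch_check": "true",
    "period_column": "period",
    "identifier_column": "responder_id",
}

payload = build_runtime_variables([], example_environ)
```

Passing the environment in as an argument keeps the function easy to test without touching the real process environment.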
There are two integrity checks in the method.
The missing column detector uses a list of required columns, constructed from the lookups section of the input. It filters the original dataset for any instances where a required column is null for a reference, and outputs a list of references with the columns that are missing data.
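A minimal sketch of the missing column check is shown here, again using lists of dictionaries rather than dataframes. The function names and the shape of the anomaly records ('missing_columns') are assumptions made for illustration.

```python
def required_columns(lookups):
    """Collect the 'required' columns from every lookup, without duplicates."""
    columns = []
    for key in sorted(lookups, key=int):
        for col in lookups[key]["required"]:
            if col not in columns:
                columns.append(col)
    return columns


def missing_column_detector(rows, required, identifier_column):
    """Return one anomaly per reference that has a null required column.

    The anomaly record layout is hypothetical.
    """
    anomalies = []
    for row in rows:
        missing = [col for col in required if row.get(col) is None]
        if missing:
            anomalies.append(
                {identifier_column: row[identifier_column], "missing_columns": missing}
            )
    return anomalies


# Only the 'required' entries matter for this check.
example_lookups = {
    "0": {"required": ["county"]},
    "1": {"required": ["region", "marine"]},
}

# Made-up enriched data: responder 2 was not found in the lookups.
example_rows = [
    {"responder_id": 1, "county": 10, "region": "A", "marine": "y"},
    {"responder_id": 2, "county": None, "region": None, "marine": None},
]

required = required_columns(example_lookups)
missing_anomalies = missing_column_detector(example_rows, required, "responder_id")
```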
The marine mismatch detector finds references that are producing marine but are from a county that doesn't produce marine. It checks the 'land_or_marine' column against a specified column (marine) to confirm that if the value is M, the marine column is y.
The marine mismatch detector is only suitable for sand and gravel. So far that is the only survey that differentiates between land and marine, so it is the only survey that would benefit from this check.
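The marine mismatch rule can be sketched as below. The function name, its default arguments and the anomaly record layout are hypothetical. The sketch also assumes that a null marine value counts as a mismatch, since it is not y; the real method may treat that case differently.

```python
def marine_mismatch_detector(
    rows, identifier_column, survey_column="land_or_marine", check_column="marine"
):
    """Flag references reporting marine (M) from a county whose marine value is not y.

    A null in check_column is also flagged here, which is an assumption.
    """
    return [
        {identifier_column: row[identifier_column], "issue": "marine mismatch"}
        for row in rows
        if row.get(survey_column) == "M" and row.get(check_column) != "y"
    ]


# Made-up data for illustration.
marine_rows = [
    {"responder_id": 1, "land_or_marine": "M", "marine": "y"},  # consistent
    {"responder_id": 2, "land_or_marine": "M", "marine": "n"},  # mismatch
    {"responder_id": 3, "land_or_marine": "L", "marine": "n"},  # land, not checked
]

run_marine_check = True  # stands in for the 'marine_mismatch_check' parameter
marine_anomalies = (
    marine_mismatch_detector(marine_rows, "responder_id") if run_marine_check else []
)
```

Gating the call on the parameter means surveys that do not distinguish land from marine simply skip the check.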