(This is a hypothetical and synthetic requirement formulated for the zoomcamp project).
The City of Boston wishes to review how its residents request city services and how departments respond to those requests. The city also aims to identify areas of improvement in its newly deployed 311 system and to enhance the efficiency of the city departments that deliver basic services.
To meet this need, the data engineering / analytics team needs to build a pipeline that converts the raw data collected into actionable dashboards the management team can analyze to make informed decisions.
KPIs
- Simplify and shorten the time between a resident reporting a problem and its resolution.
- Enhance the 311 system to become an effective management tool, generating real-time reports that help departments manage staff, track trends, highlight and monitor performance, target efficiency needs and maximize its resources.
- Determine the impact of the 311 system on the 911 emergency system (311 should make 911 more efficient by diverting non-emergency calls that could impede the City's emergency response).
This project produces a pipeline which:
- Uses Terraform to manage the infrastructure
- Extracts the raw data into GCS on Google Cloud
- Transforms the raw data into standard tables using Apache Spark
- Loads the transformed data into BigQuery
- Repartitions and writes the raw data to archival storage in Parquet format
- Produces dashboard tiles in Google Data Studio
- Orchestrates the pipeline using Airflow
The BashOperator is used to download the file from the API, and the data is ingested into GCS with the PythonOperator. A SparkSubmitOperator reads the data from GCS, transforms it, and loads the transformed data into BigQuery. Data Studio connects to the transformed data, and the dashboard is built from it. The pipeline refreshes the data monthly with the 311 request information received: the raw data is first extracted into the data lake for storage, then transformed and loaded into the data warehouse (BigQuery) for easier dashboard construction and analytics.
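A minimal sketch of how this DAG could be wired together is shown below; the DAG id, download URL, bucket name, file paths, and Spark script path are illustrative assumptions, not the repo's actual values.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
from google.cloud import storage

BUCKET = "boston-311-datalake"                      # assumed bucket name
LOCAL_FILE = "/opt/airflow/data/311_{{ ds_nodash }}.csv"
GCS_OBJECT = "raw/311_{{ ds_nodash }}.csv"
DATA_URL = "https://data.boston.gov/..."            # placeholder 311 export URL


def upload_to_gcs(bucket_name: str, object_name: str, local_file: str) -> None:
    """Push the downloaded file into the GCS data lake."""
    client = storage.Client()
    client.bucket(bucket_name).blob(object_name).upload_from_filename(local_file)


with DAG(
    dag_id="boston_311_pipeline",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@monthly",   # one run per month, as described above
    catchup=False,
) as dag:

    # Download the raw monthly extract from the API.
    download = BashOperator(
        task_id="download_raw_file",
        bash_command=f"curl -sSL '{DATA_URL}' -o {LOCAL_FILE}",
    )

    # Ingest the raw file into the GCS data lake.
    to_gcs = PythonOperator(
        task_id="upload_raw_to_gcs",
        python_callable=upload_to_gcs,
        op_kwargs={
            "bucket_name": BUCKET,
            "object_name": GCS_OBJECT,
            "local_file": LOCAL_FILE,
        },
    )

    # Transform the raw data with Spark and load it into BigQuery.
    transform_and_load = SparkSubmitOperator(
        task_id="spark_transform_load",
        application="/opt/airflow/spark/transform_311.py",  # assumed script path
        conn_id="spark_default",
    )

    download >> to_gcs >> transform_and_load
```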
- Cloud: GCP
- Datalake: GCS Bucket
- Infrastructure as code (IaC): Terraform
- Workflow orchestration: Airflow
- Data Warehouse: BigQuery
- Transformations: PySpark
- Create a gcp account
- Configure the Google Cloud SDK, create a service account, and download your secret key (JSON)
- Clone the repo
- Provision your Google infrastructure (GCS, BigQuery) with Terraform; a quick verification sketch follows this list
- Update the location of your secret key in the Dockerfile
- Run the docker-compose.yaml file, which contains Airflow, Spark, and PostgreSQL (docker-compose up -d)
- Confirm all services are started.
- Connect to Airflow and turn on the DAG.
- Trigger the DAG or wait for its scheduled run (once a month)
- Connect to BigQuery to explore the loaded data and connect Data Studio to build dashboards.
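A minimal sketch for confirming the provisioned infrastructure is reachable with the service-account key; the bucket and dataset names below are hypothetical, so replace them with whatever your Terraform variables create.

```python
from google.cloud import bigquery, storage

# Assumed names -- substitute the bucket / dataset provisioned by Terraform.
BUCKET_NAME = "boston-311-datalake"
DATASET_ID = "boston_311"

# Verify the data lake bucket exists and the key has access to it.
storage_client = storage.Client()
bucket = storage_client.lookup_bucket(BUCKET_NAME)
print("GCS bucket found:", bucket is not None)

# Verify the BigQuery dataset was provisioned.
bq_client = bigquery.Client()
datasets = [d.dataset_id for d in bq_client.list_datasets()]
print("BigQuery dataset found:", DATASET_ID in datasets)
```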
- Data ingestion - Use Airflow to download the dataset and place it in Google Cloud Storage (GCS)
- Data warehouse - BigQuery will be used to host the tables
- Transformations - Use PySpark to transform the data from the GCS bucket and load it into BigQuery (partitioned and clustered); see the sketch after this list
- Dashboard - Use Google Data Studio to build the dashboards
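A sketch of what the PySpark job could look like, assuming the spark-bigquery and GCS connectors are available to the Spark cluster; the bucket, dataset, and column names are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("boston-311-transform").getOrCreate()

# Read the raw monthly extract from the data lake.
raw = spark.read.csv("gs://boston-311-datalake/raw/*.csv", header=True, inferSchema=True)

# Standardise timestamp types and derive the resolution time in days.
cases = (
    raw.withColumn("open_dt", F.to_timestamp("open_dt"))
       .withColumn("closed_dt", F.to_timestamp("closed_dt"))
       .withColumn(
           "resolution_days",
           (F.unix_timestamp("closed_dt") - F.unix_timestamp("open_dt")) / 86400,
       )
)

# Archive the raw data as Parquet, repartitioned into fewer, larger files.
raw.repartition(4).write.mode("append").parquet("gs://boston-311-archive/parquet/")

# Load the standardised table into BigQuery; partitioning and clustering are
# passed as options to the spark-bigquery connector.
(
    cases.write.format("bigquery")
         .option("table", "boston_311.service_requests")
         .option("temporaryGcsBucket", "boston-311-tmp")
         .option("partitionField", "open_dt")
         .option("clusteredFields", "type")
         .mode("append")
         .save()
)
```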
- Most used channel for requesting service
- Percentage of tickets resolved on time (see the query sketch after this list)
- Maximum number of days to resolve a ticket in each category
- Most frequent request
- Average resolution time of a case type
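As an example, the on-time resolution metric can be computed directly in BigQuery from Python; the dataset/table name and the closed_dt / target_dt columns are assumptions about the transformed schema.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Share of closed tickets that were resolved by their target date.
query = """
    SELECT
      COUNTIF(closed_dt <= target_dt) / COUNT(*) * 100 AS pct_resolved_on_time
    FROM `boston_311.service_requests`
    WHERE closed_dt IS NOT NULL
"""

for row in client.query(query).result():
    print(f"Tickets resolved on time: {row.pct_resolved_on_time:.1f}%")
```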