
city_of_boston_services

Problem statement - 311 Service Request Analytics

(This is a hypothetical, synthetic requirement formulated for the Zoomcamp project.)

The City of Boston wishes to review how its residents request city services and how departments respond to those requests. The city also aims to identify areas of improvement in its newly deployed 311 system and to enhance the efficiency of the city departments that deliver basic city services.

To address this need, the data engineering/analytics team needs to create a pipeline that converts the collected raw data into actionable dashboards the management team can analyze to make informed decisions.

KPIs

  1. Simplify and shorten the time between a resident reporting a problem and its resolution.
  2. Enhance the 311 system to become an effective management tool, generating real-time reports that help departments manage staff, track trends, highlight and monitor performance, target efficiency needs, and maximize resources.
  3. Determine the impact of the 311 system on the 911 emergency system (making it more efficient by diverting non-emergency calls that could impede the city's emergency response).

Project high-level design

This project produces a pipeline which:

  1. Uses Terraform to manage the infrastructure
  2. Extracts the raw data into GCS on Google Cloud
  3. Transforms the raw data into standard tables using Apache Spark
  4. Loads the transformed data into BigQuery
  5. Repartitions the raw data and writes it to archival storage in Parquet format (a sketch of this step follows the list)
  6. Produces dashboard tiles in Google Data Studio
  7. Orchestrates the pipeline using Airflow
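
Below is a minimal sketch of the archival step (step 5), assuming PySpark running with the GCS connector available; the bucket names, file name, and partition count are illustrative placeholders, not values taken from this repo.

```python
# Hypothetical sketch of step 5: repartition the raw data and archive it
# as Parquet in GCS. Bucket/path names and the partition count are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("boston_311_archive").getOrCreate()

# Read the raw CSV that the DAG previously landed in the data lake.
raw = spark.read.option("header", True).csv("gs://<raw-bucket>/311_requests.csv")

# Repartition into a fixed number of evenly sized partitions so the
# archived Parquet files stay a manageable size.
(raw.repartition(24)
    .write.mode("overwrite")
    .parquet("gs://<archive-bucket>/311_requests/"))
```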

Architecture image

How the Orchestration/DAG Works

A BashOperator is used to download the file from the API, and the data is ingested into GCS with a PythonOperator. A SparkSubmitOperator then reads the data from GCS, transforms it, and loads the transformed data into BigQuery. Data Studio connects to the transformed data, and the dashboards are built from it. The pipeline refreshes monthly with newly received 311 request information: it first extracts the raw data into the data lake for storage, then transforms the data and loads it into the data warehouse (BigQuery) for easier dashboard construction and analytics.
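
A minimal sketch of a DAG along these lines, assuming Airflow 2.x with the Google and Apache Spark provider packages installed; the DAG ID, task IDs, file paths, and the dataset URL placeholder are illustrative, not the repo's actual code.

```python
# Illustrative Airflow 2.x DAG mirroring the flow described above.
# IDs, paths, and the download URL are placeholders, not the repo's code.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
from airflow.providers.google.cloud.hooks.gcs import GCSHook


def upload_to_gcs():
    # PythonOperator callable: push the downloaded file into the data lake.
    GCSHook(gcp_conn_id="google_cloud_default").upload(
        bucket_name="<raw-bucket>",
        object_name="raw/311_requests.csv",
        filename="/tmp/311_requests.csv",
    )


with DAG(
    dag_id="boston_311_pipeline",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@monthly",  # the pipeline refreshes once a month
    catchup=False,
) as dag:
    download = BashOperator(
        task_id="download_dataset",
        bash_command="curl -sSL '<dataset-api-url>' -o /tmp/311_requests.csv",
    )
    ingest = PythonOperator(task_id="upload_to_gcs", python_callable=upload_to_gcs)
    transform = SparkSubmitOperator(
        task_id="transform_and_load",
        application="/opt/spark/jobs/transform_311.py",  # reads GCS, writes BigQuery
        conn_id="spark_default",
    )

    download >> ingest >> transform
```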

Dataset

City of Boston Dataset

Technology choices

  1. Cloud: GCP
  2. Data lake: GCS bucket
  3. Infrastructure as code (IaC): Terraform
  4. Workflow orchestration: Airflow
  5. Data Warehouse: BigQuery
  6. Transformations: PySpark

Steps to reproduce

  1. Create a GCP account
  2. Configure the Google Cloud SDK, create a service account, and download your secret key (JSON)
  3. Clone the repo
  4. Provision your Google infrastructure (GCS, BigQuery) with Terraform
  5. Update the location of your secret key in the Dockerfile
  6. Run the docker-compose.yaml file, which contains Airflow, Spark, and PostgreSQL (docker-compose up -d)
  7. Confirm all services are started
  8. Connect to Airflow and turn on the DAG
  9. Trigger the DAG (see the sketch after this list) or wait for its scheduled run (once a month)
  10. Connect to BigQuery to explore the loaded data, and connect Data Studio to build dashboards
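
For step 9, one way to trigger the DAG manually is Airflow 2.x's stable REST API. This is a hedged sketch that assumes the webserver is exposed on localhost:8080 with the basic-auth API backend enabled and the default docker-compose credentials; the DAG ID matches the illustrative one used above.

```python
# Hedged sketch: trigger one run of the DAG via Airflow 2.x's stable REST API.
# Assumes localhost:8080, basic-auth enabled, and default airflow/airflow creds.
import requests

resp = requests.post(
    "http://localhost:8080/api/v1/dags/boston_311_pipeline/dagRuns",
    auth=("airflow", "airflow"),
    json={"conf": {}},
)
resp.raise_for_status()
print(resp.json()["dag_run_id"])
```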

Proposal to address the requirements

  1. Data ingestion - Use Airflow to download the dataset and place it in Google Cloud Storage (GCS)
  2. Data warehouse - BigQuery will be used to host the tables
  3. Transformations - Use PySpark to transform the data from the GCS bucket and load it into BigQuery (partitioned and clustered; a sketch follows this list)
  4. Dashboard - Use Google Data Studio to build the dashboards
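
A sketch of what the partitioned and clustered load (item 3) could look like with the spark-bigquery connector; the project, dataset, table, and column names (open_dt, subject) are assumptions about this dataset's schema, not verified repo code.

```python
# Hypothetical load step using the spark-bigquery connector; names are
# placeholders and the partition/cluster columns are assumed, not confirmed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("boston_311_load").getOrCreate()
transformed_df = spark.read.parquet("gs://<archive-bucket>/311_requests/")

(transformed_df.write.format("bigquery")
    .option("table", "<project>.boston_311.service_requests")
    .option("temporaryGcsBucket", "<staging-bucket>")  # for the indirect write
    .option("partitionField", "open_dt")     # daily partitions on open date
    .option("clusteredFields", "subject")    # cluster by responsible department
    .mode("overwrite")
    .save())
```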

Questions answered

  1. Most used channel for requesting service
  2. Percentage of tickets resolved on time
  3. Maximum number of days to resolve a ticket in each category
  4. Most frequent request
  5. Average resolution time per case type (an example query follows this list)
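
As an illustration of question 5, a hedged example using the google-cloud-bigquery client; the table and column names are the same assumed ones as above.

```python
# Example of question 5 (average resolution time per case type); the table
# and the type/open_dt/closed_dt columns are assumptions, not verified schema.
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT type,
           AVG(TIMESTAMP_DIFF(closed_dt, open_dt, HOUR)) AS avg_hours_to_close
    FROM `<project>.boston_311.service_requests`
    WHERE closed_dt IS NOT NULL
    GROUP BY type
    ORDER BY avg_hours_to_close DESC
"""
for row in client.query(query).result():
    print(f"{row.type}: {row.avg_hours_to_close:.1f} h")
```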

Dashboard Tiles

Visualization
