world-headlines is a service that provides headline news from various countries. The web service is available at world-headlines.vercel.app.
This repository contains the code that automatically updates the world-headlines service using Apache Airflow.
- The default schedule interval is one hour.
- Google News provides news as RSS feeds at URLs starting with news.google.com/rss
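Because the source data is plain RSS, headlines can be pulled out with Python's standard library alone. The following is a minimal sketch, assuming the standard RSS 2.0 layout (rss/channel/item, each item carrying title, link and pubDate). Headline and parse_rss_headlines are hypothetical names, not this repository's code, and the feed is passed in as a string so the example needs no network access.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Headline:
    title: str
    link: str
    pub_date: str


def parse_rss_headlines(rss_xml: str) -> List[Headline]:
    """Extract title, link and pubDate from every <item> of an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)  # root element is <rss>
    headlines = []
    # RSS 2.0 nests entries as <rss><channel><item>...</item></channel></rss>
    for item in root.iterfind("channel/item"):
        headlines.append(
            Headline(
                # findtext returns None for a missing child, so fall back to ""
                title=(item.findtext("title") or "").strip(),
                link=(item.findtext("link") or "").strip(),
                pub_date=(item.findtext("pubDate") or "").strip(),
            )
        )
    return headlines


# Hypothetical sample feed, shaped like an RSS 2.0 document.
SAMPLE = """<rss version="2.0"><channel><title>Sample feed</title>
<item><title>Example headline</title><link>https://example.com/a</link>
<pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate></item>
</channel></rss>"""
```

In a scheduled pipeline, the string would come from an HTTP fetch of the feed URL on each hourly run; keeping parsing separate from fetching makes the parser easy to unit-test.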
Repository layout:
- airflow: Airflow code
- airflow/dags/custom: custom code (e.g. operators)
- airflow/tests: test code (pytest)
- sql: SQL queries that initialize the database
The world-headlines service uses country codes to distinguish country-specific information.
It uses ISO 3166-1 alpha-2 codes, which identify each country with two letters, but written in lowercase (e.g. us rather than US).
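Since the service expects lowercase two-letter codes, input such as "US" has to be normalized first. A minimal sketch under that assumption; normalize_country_code is a hypothetical helper, and it only checks the shape of the code (two ASCII letters), not whether the code is actually assigned in ISO 3166-1.

```python
import re

# Two ASCII letters in either case, e.g. "us" or "KR".
_ALPHA2 = re.compile(r"[A-Za-z]{2}")


def normalize_country_code(code: str) -> str:
    """Return a lowercase ISO 3166-1 alpha-2 style code, or raise ValueError.

    Only the format is validated; membership in the ISO 3166-1 list is not checked.
    """
    candidate = code.strip()
    if not _ALPHA2.fullmatch(candidate):
        raise ValueError(f"not a 2-letter country code: {code!r}")
    return candidate.lower()
```

A stricter variant could compare the result against an explicit set of the country codes the service actually supports.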
- Clone the frontend GitHub repository and create your GitHub Pages site with it.
- Clone this repository and create an airflow/.env file like the one below:
AIRFLOW_IMAGE_NAME=world-headlines-airflow
AIRFLOW_UID=50000
MSSQL_CONN_STR=<your SQL Server connection string for the SQLAlchemy library>
GITHUB_REPO_CONN_STR=<your GitHub Pages repository connection string>
- Run run.sh (or run.bat on Windows).
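Before running the script, it can help to confirm that airflow/.env defines every key from the example above. A minimal sketch; parse_env, missing_keys and check_env_file are hypothetical helpers, not part of this repository. The parser is deliberately simplified (plain KEY=VALUE lines, # comments, no quoting or variable expansion), so it does not reproduce exactly how whatever tool consumes the file interprets it.

```python
from pathlib import Path
from typing import Dict, List

# Keys taken from the example airflow/.env file in this README.
REQUIRED_KEYS = (
    "AIRFLOW_IMAGE_NAME",
    "AIRFLOW_UID",
    "MSSQL_CONN_STR",
    "GITHUB_REPO_CONN_STR",
)


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # partition splits on the first '=' only, so '=' characters inside
        # values (common in connection strings) are preserved
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return required keys that are absent or have an empty value."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def check_env_file(path: str = "airflow/.env") -> List[str]:
    """Read the .env file at `path` and report which required keys are missing."""
    return missing_keys(parse_env(Path(path).read_text(encoding="utf-8")))
```

An empty list from check_env_file("airflow/.env") means every key in the example is set; it says nothing about whether the connection strings themselves are valid.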