# people-db-py-data-dump

A collection of Python scripts for processing, splitting, and importing contact data from Excel files into a PostgreSQL database.
This project provides tools to:
- Split large Excel files into manageable chunks
- Process and transform contact data
- Import contacts and company information into a PostgreSQL database
- Clean up data by removing empty records
- Update and maintain database integrity
## Requirements

- Python 3.6+
- PostgreSQL database
- Required Python packages (see below)
## Installation

1. Clone this repository:

   ```bash
   git clone <repository-url>
   cd people-db-py-data-dump
   ```

2. Set up a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install the required packages:

   ```bash
   pip install pandas psycopg2-binary python-dotenv openpyxl
   ```

4. Configure your database connection by creating a `.env` file:

   ```
   DATABASE_URL=postgres://username:password@hostname:port/database_name
   ```
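To verify the connection settings, a quick check like the following should work (this is a sketch for sanity-checking your `.env`, not one of the repository's scripts):

```python
import os

import psycopg2
from dotenv import load_dotenv

load_dotenv()  # reads DATABASE_URL from the .env file in the current directory

# psycopg2 accepts a libpq connection URL directly
conn = psycopg2.connect(os.environ["DATABASE_URL"])
print("Connected to:", conn.get_dsn_parameters()["dbname"])
conn.close()
```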
## Usage

### Splitting Large Excel Files

Use `split.py` to split large Excel files into smaller chunks:

```bash
python split.py
```

This will process files in the `input_files` directory and output chunks to the `split_output` directory.
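The splitting logic is roughly the following (a hedged sketch; the actual chunk size and file naming in `split.py` may differ):

```python
# Sketch of a row-based splitting approach, assuming split.py works
# along these lines; the real script's constants may differ.
from pathlib import Path

import pandas as pd

CHUNK_SIZE = 10_000  # rows per output file (adjustable; see Notes below)

input_dir = Path("input_files")
output_dir = Path("split_output")
output_dir.mkdir(exist_ok=True)

for source in input_dir.glob("*.xlsx"):
    df = pd.read_excel(source)  # requires openpyxl for .xlsx files
    for i, start in enumerate(range(0, len(df), CHUNK_SIZE)):
        chunk = df.iloc[start : start + CHUNK_SIZE]
        chunk.to_excel(output_dir / f"{source.stem}_part{i + 1}.xlsx", index=False)
```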
### Importing Data

Use `import_to_db.py` to process and import data:

```bash
python import_to_db.py
```

This script:

- Processes Excel files from the specified directory
- Extracts contact and company information
- Transforms the data into the required format
- Imports the data into the PostgreSQL database
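A hedged sketch of how such an import loop might look (the column names `name` and `email` are illustrative assumptions, not the script's actual schema):

```python
# Simplified sketch of the import flow: read each split chunk,
# then insert rows into the contacts table.
import os
from pathlib import Path

import pandas as pd
import psycopg2
from dotenv import load_dotenv

load_dotenv()
conn = psycopg2.connect(os.environ["DATABASE_URL"])

with conn, conn.cursor() as cur:  # the with-block commits on success
    for chunk_file in sorted(Path("split_output").glob("*.xlsx")):
        df = pd.read_excel(chunk_file)
        for rec in df.to_dict("records"):
            cur.execute(
                "INSERT INTO contacts (name, email) VALUES (%s, %s)",
                (rec["name"], rec["email"]),
            )
conn.close()
```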
### Cleaning Up Empty Records

Use `delete_empty_rows.py` to remove empty or invalid records:

```bash
python delete_empty_rows.py
```
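What counts as "empty" is defined inside the script; the idea is along these lines (a sketch only, with assumed column names):

```python
# Hedged sketch: delete contacts whose key fields are NULL or blank.
# The actual criteria in delete_empty_rows.py may differ.
import os

import psycopg2
from dotenv import load_dotenv

load_dotenv()
conn = psycopg2.connect(os.environ["DATABASE_URL"])

with conn, conn.cursor() as cur:
    cur.execute(
        """
        DELETE FROM contacts
        WHERE (name IS NULL OR name = '')
          AND (email IS NULL OR email = '')
        """
    )
    print(f"Deleted {cur.rowcount} empty contact rows")
conn.close()
```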
### Updating Company IDs

Use `update_company_ids.py` to generate and update company UUIDs:

```bash
python update_company_ids.py
```
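Generating a UUID per company might look like the following sketch (the columns `id` and `company_name` are assumptions for illustration only):

```python
# Hedged sketch: assign a fresh UUID to each company row that lacks one.
import os
import uuid

import psycopg2
from dotenv import load_dotenv

load_dotenv()
conn = psycopg2.connect(os.environ["DATABASE_URL"])

with conn, conn.cursor() as cur:
    # Mixed-case table names must be quoted in PostgreSQL
    cur.execute('SELECT company_name FROM "companyProfilesData" WHERE id IS NULL')
    for (company_name,) in cur.fetchall():
        cur.execute(
            'UPDATE "companyProfilesData" SET id = %s WHERE company_name = %s',
            (str(uuid.uuid4()), company_name),
        )
conn.close()
```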
## Database Schema

The scripts process Excel files with contact information and import them into two main database tables:

- `contacts`: Individual contact information
- `companyProfilesData`: Company information
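The exact column definitions live in the scripts themselves; purely for illustration, a minimal shape might look like this (every column below is an assumption):

```python
# Illustrative only: a minimal shape for the two tables.
# The real columns are defined by the import scripts and will differ.
import os

import psycopg2
from dotenv import load_dotenv

load_dotenv()
conn = psycopg2.connect(os.environ["DATABASE_URL"])

with conn, conn.cursor() as cur:
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS "companyProfilesData" (
            id UUID,  -- populated by update_company_ids.py
            company_name TEXT
        )
        """
    )
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS contacts (
            name TEXT,
            email TEXT,
            company_id UUID
        )
        """
    )
conn.close()
```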
## File Structure

- `input_files/`: Directory containing large Excel files to be processed
- `split_output/`: Directory containing split Excel chunks
- `split.py`: Script for splitting large Excel files
- `import_to_db.py`: Script for processing and importing data
- `delete_empty_rows.py`: Script for cleaning up empty records
- `update_company_ids.py`: Script for updating company IDs
## Notes

- The scripts are designed to handle large datasets efficiently
- Make sure your database connection is properly configured in the `.env` file
- For very large files, adjust the chunk size in `split.py` as needed
## License

[Specify your license here]
## Contributors

[List contributors here]