scriptscrypt/db-dump-py

People DB - Data Processing Tools

A collection of Python scripts for processing, splitting, and importing contact data from Excel files into a PostgreSQL database.

Overview

This project provides tools to:

  1. Split large Excel files into manageable chunks
  2. Process and transform contact data
  3. Import contacts and company information into a PostgreSQL database
  4. Clean up data by removing empty records
  5. Update and maintain database integrity

Requirements

  • Python 3.6+
  • PostgreSQL database
  • Required Python packages (see below)

Installation

  1. Clone this repository:

    git clone <repository-url>
    cd db-dump-py
    
  2. Set up a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install required packages:

    pip install pandas psycopg2-binary python-dotenv openpyxl
    
  4. Configure your database connection by creating a .env file:

    DATABASE_URL=postgres://username:password@hostname:port/database_name
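
A script could read and sanity-check this setting roughly as follows. This is a minimal sketch, not the repository's code: the function name `read_database_settings` is hypothetical, and it assumes `DATABASE_URL` is already in the environment (python-dotenv's `load_dotenv()` can load it from `.env` first).

```python
import os
from urllib.parse import urlsplit

# With python-dotenv installed, the .env file can be loaded first:
#   from dotenv import load_dotenv
#   load_dotenv()


def read_database_settings(env=os.environ):
    """Return the non-secret parts of DATABASE_URL, or raise a clear error."""
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set; check your .env file")

    parts = urlsplit(url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError("unexpected URL scheme: {!r}".format(parts.scheme))

    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is the PostgreSQL default port
        "dbname": parts.path.lstrip("/"),
        "user": parts.username,
    }
```

`psycopg2.connect()` accepts the URL string directly as its connection string, so parsing like this is only needed for validation or logging.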
    

Usage

Splitting Large Excel Files

Use split.py to split large Excel files into smaller chunks:

    python split.py

This will process files in the input_files directory and output chunks to the split_output directory.
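
The splitting step can be pictured with the sketch below. It is an illustration under assumptions, not the contents of split.py: the function names, the default chunk size of 50,000 rows, and the `_partNNN` output naming are all hypothetical. It also reads the whole workbook into memory before slicing, which the real script may avoid.

```python
from pathlib import Path


def chunk_ranges(total_rows, chunk_size):
    """Yield (start, stop) row offsets that cover total_rows in chunk_size steps."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_rows, chunk_size):
        yield start, min(start + chunk_size, total_rows)


def split_excel(path, out_dir, chunk_size=50_000):
    """Write one .xlsx file per chunk of rows and return the paths written."""
    # Imported here so chunk_ranges() stays usable without pandas installed.
    import pandas as pd

    frame = pd.read_excel(path)  # reads the first sheet; .xlsx needs openpyxl
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    ranges = chunk_ranges(len(frame), chunk_size)
    for number, (start, stop) in enumerate(ranges, start=1):
        target = out_dir / "{}_part{:03d}.xlsx".format(Path(path).stem, number)
        frame.iloc[start:stop].to_excel(target, index=False)
        written.append(target)
    return written
```

Under these assumptions, `split_excel("input_files/contacts.xlsx", "split_output")` would write files such as `split_output/contacts_part001.xlsx`; the input file name here is made up for illustration.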

Importing Data to Database

Use import_to_db.py to process and import data:

    python import_to_db.py

This script:

  • Processes Excel files from the specified directory
  • Extracts contact and company information
  • Transforms the data into the required format
  • Imports the data into the PostgreSQL database
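
The extract-and-transform steps above might look like the following sketch. The spreadsheet headers and field names in `CONTACT_COLUMNS` and `COMPANY_COLUMNS` are hypothetical, since the actual Excel layout and table columns are not documented here; only the idea of separating each row into a contact record and a company record comes from the description.

```python
# Hypothetical mapping from spreadsheet headers to database fields.
CONTACT_COLUMNS = {"First Name": "first_name", "Last Name": "last_name", "Email": "email"}
COMPANY_COLUMNS = {"Company": "name", "Website": "website"}


def clean(value):
    """Normalize a cell: None and NaN become '', anything else a stripped string."""
    if value is None or value != value:  # NaN is the only value unequal to itself
        return ""
    return str(value).strip()


def split_row(row):
    """Split one spreadsheet row (a dict) into a contact record and a company record."""
    contact = {field: clean(row.get(column)) for column, field in CONTACT_COLUMNS.items()}
    company = {field: clean(row.get(column)) for column, field in COMPANY_COLUMNS.items()}
    return contact, company
```

Rows read with pandas can be passed through this kind of function via `DataFrame.to_dict("records")`, which yields one dict per row with empty cells as NaN.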

Data Cleanup

Use delete_empty_rows.py to remove empty or invalid records:

    python delete_empty_rows.py
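
What counts as "empty or invalid" is not spelled out here. One plausible reading, sketched below, is a row whose key columns are all NULL or blank. The helper names and the column names are assumptions for illustration, not the script's actual logic.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_delete_empty_sql(table, columns):
    """Build a DELETE matching rows where every listed column is NULL or blank."""
    if not columns:
        raise ValueError("at least one column is required")
    conditions = " AND ".join(
        "NULLIF(TRIM({}::text), '') IS NULL".format(quote_ident(column))
        for column in columns
    )
    return "DELETE FROM {} WHERE {}".format(quote_ident(table), conditions)


# Hypothetical column names; adjust them to the real contacts schema.
print(build_delete_empty_sql("contacts", ["email", "first_name"]))
```

Quoting preserves the case of an identifier. That matters only if a table such as companyProfilesData was created with a quoted mixed-case name, because PostgreSQL folds unquoted identifiers to lowercase.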

Updating Company IDs

Use update_company_ids.py to generate and update company UUIDs:

    python update_company_ids.py
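
As a rough illustration of generating company UUIDs, the sketch below assigns one random (version 4) UUID per distinct company name. The function name and the case-insensitive matching rule are assumptions; the real script may identify companies differently.

```python
import uuid


def assign_company_ids(company_names):
    """Map each distinct, non-blank company name to a freshly generated UUID string."""
    ids = {}
    for name in company_names:
        key = name.strip().lower()  # assumption: names match case-insensitively
        if key and key not in ids:
            ids[key] = str(uuid.uuid4())
    return ids
```

Because `uuid.uuid4()` is random, running this twice produces different IDs for the same company, so any real update needs to store the generated values rather than regenerate them.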

Data Structure

The scripts process Excel files with contact information and import them into two main database tables:

  • contacts: Individual contact information
  • companyProfilesData: Company information

File Structure

  • input_files/: Directory containing large Excel files to be processed
  • split_output/: Directory containing split Excel chunks
  • split.py: Script for splitting large Excel files
  • import_to_db.py: Script for processing and importing data
  • delete_empty_rows.py: Script for cleaning up empty records
  • update_company_ids.py: Script for updating company IDs

Notes

  • The scripts are designed for large datasets: split.py breaks large Excel files into smaller chunks so they can be handled in manageable pieces
  • Make sure your database connection is properly configured in the .env file
  • For very large files, adjust the chunk size in split.py as needed

License

[Specify your license here]

Contributors

[List contributors here]
