This project involves analyzing and predicting the health index of power transformers using various machine learning models. The dataset used for this analysis includes key features like gas concentrations, dielectric rigidity, and health-related factors, which are essential in evaluating transformer health.
├── README.md
├── dataset
│   └── health-index.csv  # Raw dataset
├── main.py  # Entry point of the project
├── outputs
│   ├── graphs
│   │   ├── boxplots  # Boxplot visualizations
│   │   ├── correlation_heatmap.png  # Heatmap of feature correlations
│   │   ├── countplots  # Countplot visualizations
│   │   ├── histograms  # Histogram visualizations
│   │   ├── kdeplots  # KDE plot visualizations
│   │   ├── pairplots  # Pairplot visualizations
│   │   └── scatterplots  # Scatterplot visualizations
│   └── models
│       ├── scaled_data.csv  # Preprocessed scaled data
│       └── scaler.pkl  # Scaler used for data normalization
├── requirements.txt  # Python dependencies
└── src
    ├── __pycache__  # Compiled Python files
    ├── data_loader.py  # Script to load the dataset
    ├── eda.py  # Script for exploratory data analysis (EDA)
    ├── models
    │   └── linear_regression.py  # Linear regression model script
    ├── preprocess.py  # Data preprocessing script
    ├── scaler.py  # Script for scaling data
    ├── train.py  # Script to train models
    └── visualizations.py  # Script for generating visualizations
- Clone the repository:
  git clone https://github.com/SanjoyPator1/power-transformer-ml.git
- Install the required dependencies from the repository root:
  pip install -r requirements.txt
The dataset for this project is stored in dataset/health-index.csv. It contains gas concentrations, other transformer-related measurements, and the health index to be predicted, and it is preprocessed before being passed to the machine learning models. The features are:
- Acetylene
- CO2
- CO
- DBDS
- Dielectric rigidity
- Ethane
- Ethylene
- Hydrogen
- Interfacial Voltage
- Life Expectation
- Methane
- Nitrogen
- Oxygen
- Power Factor
- Water Content
- Health Index
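Before any analysis it can help to confirm that the CSV header actually contains the features listed above. The following is a minimal sketch using only the standard library; the exact spelling of the column names in health-index.csv is an assumption, and `missing_columns` is a hypothetical helper, not part of the repository.

```python
import csv
import io

# Feature names as listed in this README; the exact header spelling in
# health-index.csv is an assumption.
EXPECTED_COLUMNS = [
    "Acetylene", "CO2", "CO", "DBDS", "Dielectric rigidity", "Ethane",
    "Ethylene", "Hydrogen", "Interfacial Voltage", "Life Expectation",
    "Methane", "Nitrogen", "Oxygen", "Power Factor", "Water Content",
    "Health Index",
]


def missing_columns(csv_file):
    """Return the expected columns that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip().lower() for name in header}
    return [col for col in EXPECTED_COLUMNS if col.lower() not in present]


# In-memory stand-in for the real file, with a deliberately short header.
sample = io.StringIO("Hydrogen,Oxygen,Nitrogen,Methane,Health Index\n1,2,3,4,5\n")
print(missing_columns(sample))
```

With the real file, the same helper could be called as `missing_columns(open("dataset/health-index.csv", newline=""))`; an empty list would mean every listed feature is present.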
The first step is to load and preprocess the data: handling missing values, scaling features, and splitting the dataset into training and test sets. Preprocessing is handled by the preprocess.py script.
Script to run:
src/preprocess.py
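As an illustration of two of the preprocessing steps described here, the sketch below fills missing values with the column median and then splits the rows into training and test sets. It uses only the standard library and toy numbers; the imputation strategy, the 80/20 split, and the function names are assumptions rather than what preprocess.py necessarily does.

```python
import random
import statistics


def impute_median(rows):
    """Return a copy of rows with None entries replaced by the column median."""
    columns = list(zip(*rows))
    medians = [statistics.median(v for v in col if v is not None) for col in columns]
    return [[medians[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out the last test_fraction of them."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]


# Toy rows: (hydrogen, water content, health index); None marks a missing value.
raw = [
    [12.0, 5.0, 95.0],
    [None, 7.0, 80.0],
    [30.0, None, 60.0],
    [45.0, 12.0, 40.0],
    [60.0, 15.0, 25.0],
]
clean = impute_median(raw)
train, test = train_test_split(clean, test_fraction=0.2)
print(len(train), len(test))  # 4 1
```

Imputing before the split lets the held-out rows influence the medians. A stricter pipeline would compute the medians on the training rows only and reuse them for the test rows.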
In this step, the dataset is explored by visualizing relationships between features with boxplots, histograms, scatterplots, and other relevant graphs. The resulting plots are saved in the outputs/graphs folder.
Script to run:
src/eda.py
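One of the EDA outputs in the project tree is a correlation heatmap. The numbers behind such a heatmap are pairwise Pearson correlations, which the following sketch computes with the standard library alone. The values are toy data, and `pearson` and `correlation_matrix` are hypothetical helpers; eda.py may compute the matrix differently.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map every ordered pair of column names to its Pearson correlation."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Toy values, not taken from the real dataset.
toy = {
    "Hydrogen": [10.0, 20.0, 30.0, 40.0],
    "Water Content": [5.0, 9.0, 14.0, 21.0],
    "Health Index": [90.0, 70.0, 50.0, 30.0],
}
corr = correlation_matrix(toy)
print(round(corr[("Hydrogen", "Health Index")], 3))  # -1.0
```

A column with no variation makes the denominator zero, so constant features would need to be dropped or handled separately before calling `pearson`.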
This stage trains machine learning models, such as linear regression, to predict the health index. The trained models are saved in the outputs/models folder as .pkl files for future use.
Script to run:
src/train.py
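To make the train-then-save idea concrete, here is a minimal sketch that fits a one-feature least-squares line and round-trips it through a .pkl file. It is deliberately dependency-free and uses toy data; the real train.py presumably fits on all features with a library model, and the file name and helper functions below are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, x):
    """Evaluate the fitted line at x."""
    return model["slope"] * x + model["intercept"]


# Toy training data: health index falling as a gas concentration rises.
gas = [10.0, 20.0, 30.0, 40.0]
health_index = [90.0, 70.0, 50.0, 30.0]
model = fit_simple_linear_regression(gas, health_index)

# Save the fitted model as a .pkl file and load it back. A temporary directory
# stands in for outputs/models here.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "linear_regression.pkl"
    model_path.write_bytes(pickle.dumps(model))
    restored = pickle.loads(model_path.read_bytes())

print(predict(restored, 25.0))  # 60.0
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.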
To put all features on a comparable scale, feature scaling is applied. This step is handled by the scaler.py script, which also saves the fitted scaler as a .pkl file for later use in predictions.
Script to run:
src/scaler.py
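The reason for saving the scaler is that new data must be transformed with the statistics learned from the training data, not with its own. The sketch below shows this with a z-score scaler written in plain Python. Which scaling method scaler.py actually uses is not stated, so standardization is an assumption, as are the function names.

```python
import pickle
import statistics


def fit_scaler(rows):
    """Learn per-column mean and standard deviation from the training rows."""
    columns = list(zip(*rows))
    return {
        "means": [statistics.fmean(col) for col in columns],
        # Population standard deviation; 1.0 is used for constant columns
        # to avoid dividing by zero.
        "stds": [statistics.pstdev(col) or 1.0 for col in columns],
    }


def transform(scaler, rows):
    """Apply (value - mean) / std column by column using stored statistics."""
    return [
        [(v - m) / s for v, m, s in zip(row, scaler["means"], scaler["stds"])]
        for row in rows
    ]


train_rows = [[10.0, 1.0], [20.0, 1.0], [30.0, 1.0]]
scaler = fit_scaler(train_rows)

# Serialize the fitted scaler and restore it, as would happen between a
# training run and a later prediction run.
restored = pickle.loads(pickle.dumps(scaler))
print(transform(restored, [[20.0, 1.0]]))  # [[0.0, 0.0]]
```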
Visualizations such as boxplots, scatter plots, and pair plots are generated to better understand the data and the results of the analysis. These plots are saved in the outputs/graphs directory for further examination.
Script to run:
src/visualizations.py
The main.py script ties everything together and runs the entire project. It calls the other scripts in sequence to preprocess the data, perform EDA, train models, scale features, and generate visualizations.
Script to run:
main.py
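The sequential flow described here can be sketched as a list of step functions run in order. The step names and their stub bodies are hypothetical placeholders standing in for the real modules under src; only the ordering is taken from the description above.

```python
# Each stub stands in for one stage of the project and simply records that it
# ran; real steps would load data, write plots, save models, and so on.
def preprocess(log):
    log.append("preprocess")


def run_eda(log):
    log.append("run_eda")


def train_models(log):
    log.append("train_models")


def scale_features(log):
    log.append("scale_features")


def make_visualizations(log):
    log.append("make_visualizations")


# Order follows the description of main.py in this README.
PIPELINE = [preprocess, run_eda, train_models, scale_features, make_visualizations]


def main():
    """Run every pipeline step in order and return the names of the steps run."""
    log = []
    for step in PIPELINE:
        print(f"Running {step.__name__} ...")
        step(log)
    return log


if __name__ == "__main__":
    main()
```

Keeping the steps in a list makes the order explicit and easy to change, for example to run a single stage while debugging.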
The results of your analysis, including generated graphs and trained models, will be available in the
Feel free to fork the repository and contribute by creating pull requests. If you encounter any issues or have suggestions for improvements, please create an issue on GitHub.
This project is licensed under the MIT License.