Supervised Learning for Gene Expression Analysis

This repository showcases the application of Logistic Regression and Random Forest for gene expression analysis. It automates data processing, model training, and evaluation.

🚀 Features

Preprocessing: Formats gene expression data
Machine Learning: Uses Logistic Regression & Random Forest
Evaluation: Generates accuracy reports and confusion matrices
Automation: Includes a shell script & GitHub Actions

📂 Repository Structure

📂 **ml_gene_expression_project**  
 ┣ 📂 **data/**               → Stores gene expression data  
 ┣ 📂 **results/**            → Model reports and visualizations  
 ┣ 📜 **preprocess.py**       → Data processing script  
 ┣ 📜 **train.py**            → Model training script  
 ┣ 📜 **evaluate.py**         → Model evaluation with visualization  
 ┣ 📜 **run_pipeline.sh**     → Shell script to automate pipeline execution  
 ┣ 📜 **run_pipeline.yml**    → GitHub Actions workflow  
 ┣ 📜 **app.py**              → Streamlit app for interactive visualization  
 ┣ 📜 **README.md**           → Project documentation

🏃 Run the Pipeline

bash run_pipeline.sh

🖼️ Sample Outputs

Model Accuracy Reports in results/
Confusion Matrices saved as images

🤖 GitHub Actions

This repository supports automatic execution when new data is pushed.

📌 How to Use

Clone the repository:

git clone https://github.com/sivkri/GeneExpression-MachineLearning.git

Navigate to the directory:
```
cd GeneExpression-MachineLearning
```
Run the pipeline:
```
bash run_pipeline.sh
```

To launch the Streamlit app for interactive visualization:

streamlit run app.py

🔥 Key Enhancements:

✔ Streamlit app is properly emphasized
✔ Instructions for running the app are added
✔ Clear explanation of app features

This version sells your ML + Streamlit project effectively. Let me know if you need further refinements! 🚀

Results & Findings

Identified top genes differentiating wild-type (WT) vs knockout (KO) conditions
Evaluated the impact of Eltrombopag (E20) treatment
Achieved 75% accuracy with the Random Forest classifier

Project Overview

This project applies machine learning techniques to analyze gene expression data under different experimental conditions. Using logistic regression and random forest classifiers, we identify genes differentially expressed due to HuR knockout (ELAVL1 deletion) and Eltrombopag (E20) drug treatment.

Additionally, a Streamlit web application is integrated to provide an interactive visualization of the results.

Citation

If you use this dataset or findings, please cite the following study:
📖 DOI: 10.1186/s12915-025-02131-z

Experimental Design

This study investigates how HuR knockout (KO) and Eltrombopag (E20) treatment influence gene expression compared to wild-type (WT) and mock treatment (DMSO).

Sample Groups

The dataset consists of the following experimental conditions:

Sample Group	Description
WT-DMSO	Wild-type (WT) cells treated with mock (DMSO)
WT-E20	Wild-type (WT) cells treated with Eltrombopag (E20)
KO-DMSO	HuR knockout (KO) cells treated with mock (DMSO)
KO-E20	HuR knockout (KO) cells treated with Eltrombopag (E20)

Each sample contains gene expression data across thousands of genes. HuR (ELAVL1) is a key RNA-binding protein, and its knockout may significantly alter gene expression. Eltrombopag is a thrombopoietin receptor agonist that may influence transcriptional programs.

Comparisons & Research Questions

I have performed three key comparisons using supervised learning to classify gene expression profiles.

1️⃣ Effect of HuR Knockout (KO vs. WT)

Comparison: WT-DMSO vs. KO-DMSO
Objective: Identify genes affected by HuR deletion.
Machine Learning Approach:
- Features: Gene expression levels
- Labels: WT-DMSO (class 0) vs. KO-DMSO (class 1)

2️⃣ Effect of Eltrombopag in Wild-Type Cells

Comparison: WT-DMSO vs. WT-E20
Objective: Determine gene expression changes due to Eltrombopag in normal cells.
Machine Learning Approach:
- Features: Gene expression levels
- Labels: WT-DMSO (class 0) vs. WT-E20 (class 1)

3️⃣ Effect of Eltrombopag in HuR Knockout Cells

Comparison: KO-DMSO vs. KO-E20
Objective: Understand the HuR-dependent response to Eltrombopag.
Machine Learning Approach:
- Features: Gene expression levels
- Labels: KO-DMSO (class 0) vs. KO-E20 (class 1)

Data Processing & Machine Learning Workflow

Preprocessing:
- Normalize expression data
- Convert into a machine-learning-ready format
Model Training & Feature Selection:
- Train logistic regression and random forest classifiers
- Perform Principal Component Analysis (PCA)
Evaluation:
- Compute accuracy, confusion matrices, classification reports
- Identify top differentially expressed genes
Visualization & Reporting:
- Generate PCA scatter plots
- Save model performance metrics

📧 Contact

For queries, feel free to reach out! 🚀

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Supervised Learning for Gene Expression Analysis

🚀 Features

📂 Repository Structure

🏃 Run the Pipeline

🖼️ Sample Outputs

🤖 GitHub Actions

📌 How to Use

To launch the Streamlit app for interactive visualization:

🔥 Key Enhancements:

Results & Findings

Project Overview

Citation

Experimental Design

Sample Groups

Comparisons & Research Questions

1️⃣ Effect of HuR Knockout (KO vs. WT)

2️⃣ Effect of Eltrombopag in Wild-Type Cells

3️⃣ Effect of Eltrombopag in HuR Knockout Cells

Data Processing & Machine Learning Workflow

📧 Contact

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.github/workflows		.github/workflows
app		app
data		data
results		results
GeneExpression-MachineLearning_colab.ipynb		GeneExpression-MachineLearning_colab.ipynb
LICENSE		LICENSE
README.md		README.md
evaluate.py		evaluate.py
preprocess.py		preprocess.py
requirements.txt		requirements.txt
run_pipeline.sh		run_pipeline.sh
run_pipeline.yml		run_pipeline.yml
train.py		train.py

License

sivkri/GeneExpression-MachineLearning

Folders and files

Latest commit

History

Repository files navigation

Supervised Learning for Gene Expression Analysis

🚀 Features

📂 Repository Structure

🏃 Run the Pipeline

🖼️ Sample Outputs

🤖 GitHub Actions

📌 How to Use

To launch the Streamlit app for interactive visualization:

🔥 Key Enhancements:

Results & Findings

Project Overview

Citation

Experimental Design

Sample Groups

Comparisons & Research Questions

1️⃣ Effect of HuR Knockout (KO vs. WT)

2️⃣ Effect of Eltrombopag in Wild-Type Cells

3️⃣ Effect of Eltrombopag in HuR Knockout Cells

Data Processing & Machine Learning Workflow

📧 Contact

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages