A Transformer-based model for generating molecules from gene expression.
TransGEM is a phenotype-based de novo drug design model that generates new bioactive molecules without relying on disease target information.
- Create a conda environment:
conda env create -f environment.yaml
- Activate the environment:
conda activate TransGEM
The data related to this study can be downloaded here.
- train on the subLINCS dataset:
python train.py --data_path ./data/ --dataset subLINCS --gene_encoder tenfold_binary --gpu cuda:0 --epochs 200
- train and fine-tune on the HCC515 dataset:
python train.py --data_path ./data/ --dataset HCC515 --gene_encoder tenfold_binary --gpu cuda:0 --epochs 200
python ft_train.py --data_path ./data/ --dataset HCC515 --gene_encoder tenfold_binary --gpu cuda:0
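`ft_train.py` continues training on HCC515 from previously trained weights. For orientation only, the usual fine-tuning pattern looks roughly like the sketch below; the checkpoint path, model construction, and learning rate are placeholders and not the repository's actual code.

```python
import torch
import torch.nn as nn

# Minimal sketch of a typical fine-tuning setup (placeholder names and values,
# not the repository's actual code): load pre-trained weights, then keep
# training with a smaller learning rate on the new cell line.
decoder_layer = nn.TransformerDecoderLayer(d_model=64, nhead=8, dim_feedforward=512, dropout=0.1)
model = nn.TransformerDecoder(decoder_layer, num_layers=6)

state = torch.load("./out/pretrained_tenfold_binary.pt", map_location="cpu")  # hypothetical checkpoint
model.load_state_dict(state)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # lower lr than pre-training
model.train()
```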
- test on the subLINCS dataset:
python test.py --data_path ./data/ --dataset subLINCS --gene_encoder tenfold_binary --gpu cuda:0
- test the fine-tuned model on the HCC515 dataset:
python ft_test.py --data_path ./data/ --dataset HCC515 --gene_encoder tenfold_binary --gpu cuda:0
- generate molecules for prostate cancer:
python app.py --data_path ./data/ --dataset PC --cell_line PC3 --gene_encoder tenfold_binary --gpu cuda:0 --seq_num 1000
- generate molecules for non-small cell lung cancer:
python app.py --data_path ./data/ --dataset nsclc --cell_line A549 --gene_encoder tenfold_binary --gpu cuda:0 --seq_num 1000
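`app.py` retains `--seq_num` candidate molecules per run, presumably as SMILES strings. As an optional sanity check (not part of this repository), RDKit can filter out strings that do not parse into valid molecules; the output file name below is a placeholder, not the actual output path of `app.py`.

```python
from rdkit import Chem

# Optional post-processing sketch (not part of the repository):
# keep only the generated SMILES that RDKit can parse into a molecule.
# "generated_smiles.txt" is a placeholder file name.
with open("generated_smiles.txt") as f:
    smiles = [line.strip() for line in f if line.strip()]

valid = [s for s in smiles if Chem.MolFromSmiles(s) is not None]
print(f"{len(valid)}/{len(smiles)} generated strings are valid molecules")
```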
- get attention weights:
python get_attention.py --data_path ./data/ --gene_encoder tenfold_binary --gpu cuda:0
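`get_attention.py` presumably extracts the attention weights linking generated tokens to the input genes. For orientation only, the generic PyTorch pattern for pulling such weights out of a multi-head attention layer looks roughly like this (dummy shapes; this is not the repository's code):

```python
import torch
import torch.nn as nn

# Generic illustration (not the repository's code): nn.MultiheadAttention can return
# its attention weights directly, which is the kind of signal get_attention.py reports.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)
tokens = torch.randn(1, 10, 64)   # dummy molecule-token embeddings
genes = torch.randn(1, 978, 64)   # dummy gene embeddings (978 landmark genes in LINCS)

_, weights = attn(tokens, genes, genes, need_weights=True)
print(weights.shape)  # (1, 10, 978): attention of each generated token over the genes
```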
- usage:
python train.py --help
- optional arguments:
-h, --help show this help message and exit
--data_path  directory containing the input data
--out_path  directory where the training results are written
--dataset  dataset used by the model (subLINCS/HCC515/PC/nsclc)
--gene_encoder  encoding form of gene expression (value/one_hot/binary/tenfold_binary; see the sketch after the parameter table below)
--gpu  CUDA device id (e.g. cuda:0)
--hidden_dim  hidden size of the transformer decoder
--ff_dim  dimension of the feed-forward layer
--PE_dropout  dropout of the positional encoding
--TF_dropout  dropout of the transformer layers
--TF_N  number of transformer decoder layers
--TF_H  number of transformer decoder heads
--TF_act  activation function of the transformer layers
--batch_size  batch size
--epochs  number of epochs
--lr  learning rate of the Adam optimizer
--cell_line  cell line name of the disease (e.g. PC3, A549)
--pad_idx  index of the pad token
--start_idx  index of the start token
--end_idx  index of the end token
--max_len  maximum length of a generated molecule
--vocab_size  vocabulary size
--k  number of molecules generated in a single beam search
--alpha  weight balancing the length and score of molecules generated by beam search (see the sketch after this list)
--seq_num number of molecules ultimately retained
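--k, --alpha, and --seq_num control the beam search that generates molecules. A common way such a length/score trade-off is scored (TransGEM's exact formula may differ) is to normalize a candidate's accumulated log-probability by its length raised to alpha, as in this sketch:

```python
def beam_score(token_log_probs, alpha=0.75):
    """Length-normalized beam-search score: sum of token log-probabilities divided by
    length**alpha. Larger alpha gives longer candidates a better chance. This is a
    common convention, not necessarily the exact formula used in TransGEM."""
    return sum(token_log_probs) / (len(token_log_probs) ** alpha)

# Toy comparison of a short and a longer candidate (dummy log-probabilities).
short = [-0.2, -0.3, -0.4]
longer = [-0.2, -0.3, -0.4, -0.3, -0.2]
print(beam_score(short), beam_score(longer))
```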
- Model parameters for the 4 encoding forms
| Encoding form | hidden_dim | ff_dim | TF_N | TF_H |
| --- | --- | --- | --- | --- |
| value | 64 | 2048 | 6 | 8 |
| one_hot | 64 | 512 | 6 | 8 |
| binary | 64 | 512 | 6 | 8 |
| tenfold_binary | 64 | 512 | 6 | 8 |
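For intuition, here is one plausible reading of what the four --gene_encoder options above could do to a single differential-expression value. The bin edges, bit width, and sign handling below are assumptions made for illustration; the repository's actual encoders may differ.

```python
import numpy as np

def encode_value(x):
    # "value": use the expression value itself as a 1-dimensional feature.
    return np.array([x], dtype=np.float32)

def encode_one_hot(x, bins=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    # "one_hot": one-hot vector over coarse expression bins (bin edges are assumptions).
    out = np.zeros(len(bins) + 1, dtype=np.float32)
    out[int(np.digitize(x, bins))] = 1.0
    return out

def encode_binary(x, n_bits=8):
    # "binary": sign bit followed by the binary representation of the rounded
    # absolute value (bit width and rounding are assumptions).
    v = int(round(abs(x)))
    bits = [(v >> i) & 1 for i in reversed(range(n_bits))]
    return np.array([1 if x < 0 else 0] + bits, dtype=np.float32)

def encode_tenfold_binary(x, n_bits=8):
    # "tenfold_binary": as "binary", but the value is first multiplied by 10 so that
    # one decimal place survives the rounding.
    return encode_binary(x * 10, n_bits=n_bits)

print(encode_tenfold_binary(1.37))  # e.g. a z-scored differential expression value
```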