
Comparing changes

base repository: myzhengSIMM/PertKGE
base: v1.0.0
head repository: myzhengSIMM/PertKGE
compare: main
  • 2 commits
  • 2 files changed
  • 1 contributor

Commits on Nov 30, 2024

  1. Update README.md

    Leafaeolian authored Nov 30, 2024

    Commit 61ebe63

Commits on Dec 19, 2024

  1. Update README.md

    Leafaeolian authored Dec 19, 2024
    Commit c8e4fc3
Showing with 80 additions and 0 deletions.
  1. +2 −0 README.md
  2. +78 −0 src/README.md
2 changes: 2 additions & 0 deletions README.md
@@ -84,6 +84,8 @@ To get other biomedical knowledge graphs for comparation, plz refer to [HetioNet
3/20 Upload data and model weight for "PertKGE identified five new scaffold hits for ALDH1B1"
4/24 Upload src code and data.
```
## Cite
Shengkun Ni, Xiangtai Kong, Yingying Zhang, et al. Identifying compound-protein interactions with knowledge graph embedding of perturbation transcriptomics. Cell Genom. (2024).

## Contact
nishengkun@simm.ac.cn
78 changes: 78 additions & 0 deletions src/README.md
@@ -1 +1,79 @@
## For target inference

If you have a new compound of interest with perturbation transcriptomics data and want to infer its targets, please follow the steps below.

### Step 1: Processing your own transcriptomics data

This step depends entirely on the format of your data, and you can process it with your own differentially expressed gene (DEG) analysis code. Two strategies are generally recommended:

1. Use the "top strategy" to identify DEGs, as described in our paper.
2. Alternatively, use a standard DEG pipeline such as limma.

In either case, the final data should be a set of triples in the following format:

```
<Interested_compound_name Downregulates mRNA:ATP1B1>
# Note: align gene names using our map_file
```
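The exact "top strategy" is defined in the paper; the sketch below shows one plausible reading of it for turning per-gene scores into Step 1 triples. It assumes a higher score means stronger up-regulation, keeps the k highest- and k lowest-scoring genes, and uses an "Upregulates" relation by analogy with "Downregulates". The function name and `gene_map` (standing in for the map_file) are hypothetical, so check relation names and gene symbols against the existing effect.txt before using it.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def top_k_deg_triples(
    compound: str,
    gene_scores: Dict[str, float],
    k: int,
    gene_map: Optional[Dict[str, str]] = None,
) -> List[Triple]:
    """Label the k most up- and k most down-regulated genes as triples.

    Assumptions (not from the PertKGE code): a higher score means stronger
    up-regulation, "Upregulates" is the relation for up-regulated genes, and
    gene_map is a hypothetical dict built from the map_file.
    """
    gene_map = gene_map or {}
    # Rank genes from most up-regulated to most down-regulated.
    ranked = sorted(gene_scores, key=lambda g: gene_scores[g], reverse=True)
    # Cap k so that no gene ends up in both the up and down sets.
    k = min(k, len(ranked) // 2)
    up = ranked[:k]
    down = ranked[len(ranked) - k:]
    triples: List[Triple] = []
    for relation, genes in (("Upregulates", up), ("Downregulates", down)):
        for gene in genes:
            # Fall back to the original symbol when the gene is not in gene_map.
            triples.append((compound, relation, f"mRNA:{gene_map.get(gene, gene)}"))
    return triples
```

With scores `{"ATP1B1": -2.5, "HMOX1": 3.1, "GAPDH": 0.1, "MYC": -1.2}` and `k=1`, this yields one "Upregulates" triple for HMOX1 and one "Downregulates" triple for ATP1B1.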

### Step 2: Preparing the training files

We need to prepare four files: **"cause.txt", "process.txt", "effect.txt", and "test.txt"**.

The simplest approach is to start from the files in "../processed_data/target_inference_2/" and change only one thing: **append the triples identified in Step 1 to "effect.txt"**.

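Then place "cause.txt", "effect.txt", and "test.txt" into a new folder such as "../processed_data/<name>/", where <name> is any task name you choose. "process.txt" does not need to be copied, since the training command in Step 3 reads it from "../processed_data/knowledge_graph/".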
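A minimal sketch of this step, assuming each file holds one tab-separated triple per line (confirm this against the files in "../processed_data/target_inference_2/" first). The helper name `prepare_task_dir` is hypothetical.

```python
import shutil
from pathlib import Path
from typing import Iterable, Tuple


def prepare_task_dir(
    template_dir: str,
    task_dir: str,
    new_triples: Iterable[Tuple[str, str, str]],
) -> Path:
    """Copy cause/effect/test from a template folder and append new effect triples.

    Assumption: one tab-separated (head, relation, tail) triple per line.
    """
    src, dst = Path(template_dir), Path(task_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for name in ("cause.txt", "effect.txt", "test.txt"):
        shutil.copyfile(src / name, dst / name)
    effect = dst / "effect.txt"
    text = effect.read_text()
    # Make sure the appended triples start on a fresh line.
    if text and not text.endswith("\n"):
        text += "\n"
    text += "".join("\t".join(t) + "\n" for t in new_triples)
    effect.write_text(text)
    return dst
```

For example, `prepare_task_dir("../processed_data/target_inference_2/", "../processed_data/my_task/", triples)` creates a task folder named `my_task`, where `triples` is the list of (head, relation, tail) tuples from Step 1.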

### Step 3: Training stage

Use the following command to train PertKGE, replacing **<name>** with your task name:

```
$ python train_pertkge.py --cause_file "../processed_data/<name>/cause.txt" \
    --process_file "../processed_data/knowledge_graph/process.txt" \
    --effect_file "../processed_data/<name>/effect.txt" \
    --test_file "../processed_data/<name>/test.txt" \
    --h_dim 300 \
    --margin 1.0 \
    --lr 1e-4 \
    --wd 1e-5 \
    --n_neg 100 \
    --mode 'reproduce' \
    --batch_size 2048 \
    --patients 5 \
    --warm_up 10 \
    --processed_data_file "../processed_data/<name>/" \
    --save_model_path "../best_model/<name>/" \
    --task "target_inference" \
    --run_name "new_target_inference"
```
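If you prefer to launch training from Python (for example, to loop over several task folders), passing the arguments as a list avoids shell quoting and line-continuation pitfalls. The sketch below reuses the flag values from the command above; `build_train_cmd` is a hypothetical helper, and it assumes the working directory is `src/`, since the paths are relative.

```python
import sys
from typing import List


def build_train_cmd(name: str) -> List[str]:
    """Assemble the Step 3 training command for task `name` as an argv list."""
    data = f"../processed_data/{name}/"
    return [
        # Use the current interpreter so a virtualenv is respected.
        sys.executable, "train_pertkge.py",
        "--cause_file", data + "cause.txt",
        "--process_file", "../processed_data/knowledge_graph/process.txt",
        "--effect_file", data + "effect.txt",
        "--test_file", data + "test.txt",
        "--h_dim", "300",
        "--margin", "1.0",
        "--lr", "1e-4",
        "--wd", "1e-5",
        "--n_neg", "100",
        "--mode", "reproduce",
        "--batch_size", "2048",
        "--patients", "5",
        "--warm_up", "10",
        "--processed_data_file", data,
        "--save_model_path", f"../best_model/{name}/",
        "--task", "target_inference",
        "--run_name", "new_target_inference",
    ]
```

Then `subprocess.run(build_train_cmd("my_task"), check=True)` starts training on `../processed_data/my_task/` and raises an error if the script exits with a non-zero status.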

### Step 4: Inference stage

Run the cells in `target_inference.ipynb` in order, and adjust the paths in the user-defined cell accordingly:

```python
# ---- This section is user defined: edit these values before running ----
h_dim = 300  # must match the --h_dim used for training in Step 3
data_path = "../processed_data/<name>/"
save_model_path = "../best_model/<name>/"
output_path = "../results/<name>/"
# Compounds to run target inference for; use the same names as in your Step 1 triples.
ent_list = ["Interested_compound_name"]
```
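Before running the notebook, a quick check of these settings can save a failed run. The helper below is hypothetical (not part of PertKGE) and assumes effect.txt holds one tab-separated triple per line with the compound as the head entity.

```python
from pathlib import Path
from typing import List


def check_inference_inputs(data_path: str, save_model_path: str,
                           ent_list: List[str]) -> List[str]:
    """Return human-readable problems found in the user-defined settings."""
    problems: List[str] = []
    data = Path(data_path)
    # Step 2 places these three files in the task folder.
    for name in ("cause.txt", "effect.txt", "test.txt"):
        if not (data / name).is_file():
            problems.append(f"missing file: {data / name}")
    # Step 3 saves the trained model under --save_model_path.
    if not Path(save_model_path).is_dir():
        problems.append(f"missing model directory: {save_model_path}")
    effect = data / "effect.txt"
    if effect.is_file():
        heads = {line.split("\t", 1)[0]
                 for line in effect.read_text().splitlines() if line.strip()}
        # Every compound in ent_list should have Step 1 triples in effect.txt.
        problems += [f"no effect.txt triples for: {c}"
                     for c in ent_list if c not in heads]
    return problems
```

`check_inference_inputs(data_path, save_model_path, ent_list)` should return an empty list before you continue with the notebook.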

### A note on the output metrics

In target inference, we provide two metrics: "ti_score" and "confidence".

- **"ti_score"** is the score directly predicted by the model.
- **"confidence"** is the number of other compounds ranked lower than this compound for the same target. It helps filter out potential false positives caused by representational bias, for example a target that receives high scores for almost every compound.
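A minimal sketch of the "confidence" idea, assuming `scores[c][t]` is the ti_score of compound c for target t and that a higher ti_score means a more likely interaction. Whether PertKGE reports a raw count or a normalized fraction is not stated here, so this version returns the count; the function and toy data are illustrative only.

```python
from typing import Dict


def confidence(scores: Dict[str, Dict[str, float]], compound: str, target: str) -> int:
    """Count other compounds whose ti_score for `target` is below `compound`'s."""
    mine = scores[compound][target]
    return sum(
        1
        for other, row in scores.items()
        if other != compound and row[target] < mine
    )


scores = {
    "A": {"T1": 0.9, "T2": 0.8},
    "B": {"T1": 0.2, "T2": 0.95},
    "C": {"T1": 0.1, "T2": 0.99},
}
```

Here `confidence(scores, "A", "T1")` is 2, while `confidence(scores, "A", "T2")` is 0: A scores 0.8 on T2, but every other compound scores T2 even higher, which is the kind of broadly high-scoring target the metric is meant to flag.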



## Ligand virtual screening

The simplest approach is to use the model weights we provide directly for virtual screening.