🪞🧬 Text representation for biomedical entities via PyOBO (pykeen#1055)
Similar to `WikidataTextRepresentation`, this uses `PyOBO` as a
service for looking up labels for entities encoded as `CURIE`s in
biomedical knowledge graphs. Unfortunately, the existing built-in
biomedical knowledge graphs have poorly defined semantics and don't
use standardized identifiers, so this isn't applicable to any
built-in dataset at the moment. An example of generating a graph
where this works is given below.
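As a rough illustration of what "standardized identifiers" means here, the following sketch checks whether every entity in a set of triples is a CURIE with a recognized prefix. The helper names and the `KNOWN_PREFIXES` allow-list are hypothetical and exist only for this example; they are not part of PyKEEN or PyOBO.

```python
def split_curie(curie: str) -> tuple[str, str]:
    """Split a CURIE such as 'go:0007608' into (prefix, local identifier)."""
    prefix, sep, identifier = curie.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix.lower(), identifier


def uses_known_prefixes(triples, known_prefixes) -> bool:
    """Return True if every head and tail entity has a prefix in known_prefixes."""
    for head, _relation, tail in triples:
        for entity in (head, tail):
            try:
                prefix, _ = split_curie(entity)
            except ValueError:
                return False
            if prefix not in known_prefixes:
                return False
    return True


# Hypothetical allow-list, chosen for this illustration only
KNOWN_PREFIXES = {"uberon", "go", "mondo"}

good = [("uberon:0000004", "ro:0002216", "go:0007608")]
bad = [("nose", "related to", "smell")]  # free-text labels, not CURIEs

print(uses_known_prefixes(good, KNOWN_PREFIXES))
print(uses_known_prefixes(bad, KNOWN_PREFIXES))
```

A graph that fails a check like this would need its identifiers mapped to CURIEs before label lookup could work.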
Requirements:
```shell
python -m pip install pyobo bioontologies
```
Example with a very tiny dataset:
```python
import numpy as np
from pykeen.datasets import EagerDataset
from pykeen.nn import BiomedicalCURIERepresentation
from pykeen.triples import TriplesFactory
triples = [
    ('uberon:0000004', 'ro:0002216', 'go:0007608'),
]
triples = TriplesFactory.from_labeled_triples(np.array(triples))
dataset = EagerDataset(triples, triples, triples)
dataset.summarize()
entity_representations = BiomedicalCURIERepresentation.from_dataset(dataset=dataset, encoder="transformer")
print(entity_representations)
```
Example with full training:
```python
import bioontologies
import numpy as np

from pykeen.datasets import Dataset
from pykeen.models import ERModel
from pykeen.nn import BiomedicalCURIERepresentation
from pykeen.pipeline import pipeline
from pykeen.triples import TriplesFactory

# Generate graph dataset from the Monarch Disease Ontology (MONDO)
obograph = bioontologies.get_obograph_by_prefix("mondo").squeeze(standardize=True)
triples = (edge.as_tuple() for edge in obograph.edges)
# Keep only edges whose subject, predicate, and object are all present
triples = [t for t in triples if all(t)]
triples_factory = TriplesFactory.from_labeled_triples(np.array(triples))
dataset = Dataset.from_tf(triples_factory)

entity_representations = BiomedicalCURIERepresentation.from_dataset(dataset=dataset, encoder="transformer")
result = pipeline(
    dataset=dataset,
    model=ERModel,
    model_kwargs=dict(
        interaction="distmult",
        entity_representations=entity_representations,
        relation_representations_kwargs=dict(
            shape=entity_representations.shape,
        ),
    ),
)
```
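The `all(t)` filter in the example above drops any edge with a missing component. A minimal sketch of that step, assuming unresolved components show up as `None` or empty strings (the edge tuples and the `ex:` prefix below are made up for illustration):

```python
# Hypothetical edge tuples; None or "" marks a component that could not be resolved
raw_edges = [
    ("ex:a", "ex:rel", "ex:b"),
    ("ex:c", None, "ex:d"),
    ("", "ex:rel", "ex:e"),
]

# all(t) is True only if every element of the tuple is truthy,
# so tuples containing None or an empty string are removed
complete = [t for t in raw_edges if all(t)]
print(complete)
```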
---------
Co-authored-by: Max Berrendorf <[email protected]>