
Commit ad449ec

cthoyt, sbonner0, and PyKEEN-bot authored Feb 13, 2021
♻️ 💉 Update docs and automate doctests (pykeen#291)
* Update docs
* Update more examples
* Add doctests
* Update doctests
* Add doctests to GHA (Trigger CI)
* Update tutorials
* Pass mypy
* Update README acknowledgement
* Update license year
* Update AUTHORS.md: add link to GitHub authors (Trigger CI)
* Update typing.py (Co-Authored-By: Stephen Bonner <[email protected]>)
* Update typing.py (Trigger CI; Co-Authored-By: Stephen Bonner <[email protected]>)
* Update checkpoints.rst (Trigger CI; Co-Authored-By: Stephen Bonner <[email protected]>)
* Split out doctests (Trigger CI)
* Bump version: 1.2.0-dev → 1.2.0
* Bump version: 1.2.0 → 1.2.1-dev
* Bump versions (Trigger CI following the previous release kerfuffle...)
* Trigger CI

Co-authored-by: Stephen Bonner <[email protected]>
Co-authored-by: PyKEEN_bot <[email protected]>
1 parent 8616369 commit ad449ec

20 files changed (+292 -317 lines)

.bumpversion.cfg (+1 -1)
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 1.2.0-dev
+current_version = 1.3.0-dev
 commit = True
 tag = False
 parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(?:-(?P<release>[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?(?:\+(?P<build>[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?
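
A quick, hedged sanity check (an editor's sketch, not part of the commit) of how the ``parse`` pattern above decomposes the new version string; the variable names are illustrative only:

    import re

    SEMVER_PATTERN = (
        r'(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)'
        r'(?:-(?P<release>[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?'
        r'(?:\+(?P<build>[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?'
    )

    # '1.3.0-dev' splits into major=1, minor=3, patch=0, release='dev'
    match = re.match(SEMVER_PATTERN, '1.3.0-dev')
    assert match is not None and match.groupdict()['release'] == 'dev'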

.github/workflows/tests.yml (+2)
@@ -85,6 +85,8 @@ jobs:
         run: tox -e py
       - name: Run slow tests
         run: tox -e integration
+      - name: Run doctests
+        run: tox -e doctests
   windows:
     if: "contains(github.event.head_commit.message, 'Trigger CI')"
     name: Windows

.github/workflows/tests_master.yml (+2)
@@ -84,6 +84,8 @@ jobs:
         run: tox -e py
       - name: Run slow tests
         run: tox -e integration
+      - name: Run doctests
+        run: tox -e doctests
   windows:
     if: "!contains(github.event.head_commit.message, 'skip ci')"
     name: Windows

.gitignore (+1)
@@ -117,3 +117,4 @@ docs/source/api/*
 scratch/*
 wandb/*
 mlruns
+doctests/

AUTHORS.md (+2)
@@ -16,3 +16,5 @@
 - [Michael Galkin](https://github.com/migalkin)
 - [Felix Hamann](https://github.com/kantholtz)
 - [Sankranti Joshi](https://github.com/sunny1401)
+
+See also: https://github.com/pykeen/pykeen/graphs/contributors

LICENSE (+1 -1)
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2019-2020 PyKEEN Project Team
+Copyright (c) 2019-2021 PyKEEN Project Team

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

README.md (+1 -1)
@@ -300,7 +300,7 @@ See [CONTRIBUTING.md](/CONTRIBUTING.md) for more information on getting involved
 This project has been supported by several organizations (in alphabetical order):

 - [Bayer](https://www.bayer.com/)
-- [Enveda Therapeutics](https://envedatherapeutics.com/)
+- [Enveda Biosciences](https://www.envedabio.com/)
 - [Fraunhofer Institute for Algorithms and Scientific Computing](https://www.scai.fraunhofer.de)
 - [Fraunhofer Institute for Intelligent Analysis and Information Systems](https://www.iais.fraunhofer.de)
 - [Fraunhofer Center for Machine Learning](https://www.cit.fraunhofer.de/de/zentren/maschinelles-lernen.html)

docs/source/conf.py (+1 -1)
@@ -52,7 +52,7 @@
 author = 'PyKEEN Project Team'

 # The full version, including alpha/beta/rc tags.
-release = '1.2.0-dev'
+release = '1.3.0-dev'

 # The short X.Y version.
 parsed_version = re.match(

docs/source/reference/constants.rst (+3)
@@ -2,3 +2,6 @@ Constants
 =========
 .. automodule:: pykeen.constants
     :members:
+
+.. automodule:: pykeen.typing
+    :members:

docs/source/tutorial/byod.rst (+97 -121)
@@ -1,28 +1,26 @@
 Bring Your Own Data
 ===================
 As an alternative to using a pre-packaged dataset, the training and testing can be set explicitly
-by file path or with instances of :class:`pykeen.triples.TriplesFactory`.
+by file path or with instances of :class:`pykeen.triples.TriplesFactory`. Throughout this
+tutorial, the paths to the training, testing, and validation sets for built-in
+:class:`pykeen.datasets.Nations` will be used as examples.

 Pre-stratified Dataset
 ----------------------
 You've got a training and testing file as 3-column TSV files, all ready to go. You're sure that there aren't
 any entities or relations appearing in the testing set that don't appear in the training set. Load them in the
 pipeline like this:

-.. code-block:: python
-
-    from pykeen.triples import TriplesFactory
-    from pykeen.pipeline import pipeline
-
-    training_path: str = ...
-    testing_path: str = ...
-
-    result = pipeline(
-        training_triples_factory=training_path,
-        testing_triples_factory=testing_path,
-        model='TransE',
-    )
-    result.save_to_directory('test_pre_stratified_transe')
+>>> from pykeen.triples import TriplesFactory
+>>> from pykeen.pipeline import pipeline
+>>> from pykeen.datasets.nations import NATIONS_TRAIN_PATH, NATIONS_TEST_PATH
+>>> result = pipeline(
+...     training=NATIONS_TRAIN_PATH,
+...     testing=NATIONS_TEST_PATH,
+...     model='TransE',
+...     training_kwargs=dict(num_epochs=5),  # short epochs for testing - you should go higher
+... )
+>>> result.save_to_directory('doctests/test_pre_stratified_transe')

 PyKEEN will take care of making sure that the entities are mapped from their labels to appropriate integer
 (technically, 0-dimensional :class:`torch.LongTensor`) indexes and that the different sets of triples
@@ -31,68 +29,54 @@ share the same mapping.
 This is equally applicable for the :func:`pykeen.hpo.hpo_pipeline`, which has a similar interface to
 the :func:`pykeen.pipeline.pipeline` as in:

-.. code-block:: python
-
-    from pykeen.triples import TriplesFactory
-    from pykeen.hpo import hpo_pipeline
-
-    training_path: str = ...
-    testing_path: str = ...
-
-    result = hpo_pipeline(
-        n_trials=30,
-        training_triples_factory=training_path,
-        testing_triples_factory=testing_path,
-        model='TransE',
-    )
-    result.save_to_directory('test_hpo_pre_stratified_transe')
+>>> from pykeen.hpo import hpo_pipeline
+>>> from pykeen.datasets.nations import NATIONS_TRAIN_PATH, NATIONS_TEST_PATH, NATIONS_VALIDATE_PATH
+>>> result = hpo_pipeline(
+...     n_trials=3,  # you probably want more than this
+...     training=NATIONS_TRAIN_PATH,
+...     testing=NATIONS_TEST_PATH,
+...     validation=NATIONS_VALIDATE_PATH,
+...     model='TransE',
+...     training_kwargs=dict(num_epochs=5),  # short epochs for testing - you should go higher
+... )
+>>> result.save_to_directory('doctests/test_hpo_pre_stratified_transe')

 The remainder of the examples will be for :func:`pykeen.pipeline.pipeline`, but all work exactly the same
 for :func:`pykeen.hpo.hpo_pipeline`.

 If you want to add dataset-wide arguments, you can use the ``dataset_kwargs`` argument
 to the :class:`pykeen.pipeline.pipeline` to enable options like ``create_inverse_triples=True``.

-.. code-block:: python
-
-    from pykeen.triples import TriplesFactory
-    from pykeen.pipeline import pipeline
-
-    training_path: str = ...
-    testing_path: str = ...
-
-    result = pipeline(
-        training_triples_factory=training_path,
-        testing_triples_factory=testing_path,
-        dataset_kwargs={'create_inverse_triples': True},
-        model='TransE',
-    )
-    result.save_to_directory('test_pre_stratified_transe')
+>>> from pykeen.pipeline import pipeline
+>>> from pykeen.datasets.nations import NATIONS_TRAIN_PATH, NATIONS_TEST_PATH
+>>> result = pipeline(
+...     training=NATIONS_TRAIN_PATH,
+...     testing=NATIONS_TEST_PATH,
+...     dataset_kwargs={'create_inverse_triples': True},
+...     model='TransE',
+...     training_kwargs=dict(num_epochs=5),  # short epochs for testing - you should go higher
+... )
+>>> result.save_to_directory('doctests/test_pre_stratified_transe')

 If you want finer control over how the triples are created, for example, if they are not all coming from
 TSV files, you can use the :class:`pykeen.triples.TriplesFactory` interface.

-.. code-block:: python
-
-    from pykeen.triples import TriplesFactory
-    from pykeen.pipeline import pipeline
-
-    training_path: str = ...
-    testing_path: str = ...
-
-    training = TriplesFactory(path=training_path)
-    testing = TriplesFactory(
-        path=testing_path,
-        entity_to_id=training.entity_to_id,
-        relation_to_id=training.relation_to_id,
-    )
-
-    result = pipeline(
-        training_triples_factory=training,
-        testing_triples_factory=testing,
-        model='TransE',
-    )
-    pipeline_result.save_to_directory('test_pre_stratified_transe')
+>>> from pykeen.triples import TriplesFactory
+>>> from pykeen.pipeline import pipeline
+>>> from pykeen.datasets.nations import NATIONS_TRAIN_PATH, NATIONS_TEST_PATH
+>>> training = TriplesFactory.from_path(NATIONS_TRAIN_PATH)
+>>> testing = TriplesFactory.from_path(
+...     NATIONS_TEST_PATH,
+...     entity_to_id=training.entity_to_id,
+...     relation_to_id=training.relation_to_id,
+... )
+>>> result = pipeline(
+...     training=training,
+...     testing=testing,
+...     model='TransE',
+...     training_kwargs=dict(num_epochs=5),  # short epochs for testing - you should go higher
+... )
+>>> result.save_to_directory('doctests/test_pre_stratified_transe')

 .. warning::

@@ -106,31 +90,26 @@ The ``dataset_kwargs`` argument is ignored when passing your own :class:`pykeen.
 sure to include the ``create_inverse_triples=True`` in the instantiation of those classes if that's your
 desired behavior as in:

-.. code-block:: python
-
-    from pykeen.triples import TriplesFactory
-    from pykeen.pipeline import pipeline
-
-    training_path: str = ...
-    testing_path: str = ...
-
-    training = TriplesFactory(
-        path=training_path,
-        create_inverse_triples=True,
-    )
-    testing = TriplesFactory(
-        path=testing_path,
-        entity_to_id=training.entity_to_id,
-        relation_to_id=training.relation_to_id,
-        create_inverse_triples=True,
-    )
-
-    result = pipeline(
-        training_triples_factory=training,
-        testing_triples_factory=testing,
-        model='TransE',
-    )
-    result.save_to_directory('test_pre_stratified_transe')
+>>> from pykeen.triples import TriplesFactory
+>>> from pykeen.pipeline import pipeline
+>>> from pykeen.datasets.nations import NATIONS_TRAIN_PATH, NATIONS_TEST_PATH
+>>> training = TriplesFactory.from_path(
+...     NATIONS_TRAIN_PATH,
+...     create_inverse_triples=True,
+... )
+>>> testing = TriplesFactory.from_path(
+...     NATIONS_TEST_PATH,
+...     entity_to_id=training.entity_to_id,
+...     relation_to_id=training.relation_to_id,
+...     create_inverse_triples=True,
+... )
+>>> result = pipeline(
+...     training=training,
+...     testing=testing,
+...     model='TransE',
+...     training_kwargs=dict(num_epochs=5),  # short epochs for testing - you should go higher
+... )
+>>> result.save_to_directory('doctests/test_pre_stratified_transe')

 Triples factories can also be instantiated using the ``triples`` keyword argument instead of the ``path`` argument
 if you already have triples loaded in a :class:`numpy.ndarray`.
@@ -141,37 +120,34 @@ It's more realistic your real-world dataset is not already stratified into train
 PyKEEN has you covered with :func:`pykeen.triples.TriplesFactory.split`, which will allow you to create
 a stratified dataset.

-.. code-block:: python
-
-    from pykeen.triples import TriplesFactory
-    from pykeen.pipeline import pipeline
-
-    tf = TriplesFactory(path=...)
-    training, testing = tf.split()
-
-    result = pipeline(
-        training_triples_factory=training,
-        testing_triples_factory=testing,
-        model='TransE',
-    )
-    pipeline_result.save_to_directory('test_unstratified_transe')
+>>> from pykeen.triples import TriplesFactory
+>>> from pykeen.pipeline import pipeline
+>>> from pykeen.datasets.nations import NATIONS_TRAIN_PATH
+>>> tf = TriplesFactory.from_path(NATIONS_TRAIN_PATH)
+>>> training, testing = tf.split()
+>>> result = pipeline(
+...     training=training,
+...     testing=testing,
+...     model='TransE',
+...     training_kwargs=dict(num_epochs=5),  # short epochs for testing - you should go higher
+... )
+>>> result.save_to_directory('doctests/test_unstratified_transe')

 By default, this is an 80/20 split. If you want to use early stopping, you'll also need a validation set, so
 you should specify the splits:

-.. code-block:: python
-
-    from pykeen.triples import TriplesFactory
-    from pykeen.pipeline import pipeline
-
-    tf = TriplesFactory(path=...)
-    training, testing, validation = tf.split([.8, .1, .1])
-
-    result = pipeline(
-        training_triples_factory=training,
-        testing_triples_factory=testing,
-        validation_triples_factory=validation,
-        model='TransE',
-        stopper='early',
-    )
-    pipeline_result.save_to_directory('test_unstratified_stopped_transe')
+>>> from pykeen.triples import TriplesFactory
+>>> from pykeen.pipeline import pipeline
+>>> from pykeen.datasets.nations import NATIONS_TRAIN_PATH
+>>> tf = TriplesFactory.from_path(NATIONS_TRAIN_PATH)
+>>> training, testing, validation = tf.split([.8, .1, .1])
+>>> result = pipeline(
+...     training=training,
+...     testing=testing,
+...     validation=validation,
+...     model='TransE',
+...     stopper='early',
+...     training_kwargs=dict(num_epochs=5),  # short epochs for testing - you should go
+...     # higher, especially with early stopper enabled
+... )
+>>> result.save_to_directory('doctests/test_unstratified_stopped_transe')

docs/source/tutorial/checkpoints.rst (+99 -126)
@@ -17,55 +17,46 @@ Regular Checkpoints
 The tutorial :ref:`first_steps` showed how the :func:`pykeen.pipeline.pipeline` function can be used to set up an entire
 KGEM for training and evaluation in just two lines of code. A slightly extended example is shown below:

-.. code-block:: python
-
-    from pykeen.pipeline import pipeline
-
-    pipeline_result = pipeline(
-        dataset='Nations',
-        model='TransE',
-        optimizer='Adam',
-        training_kwargs=dict(
-            num_epochs=1000,
-        ),
-    )
+>>> from pykeen.pipeline import pipeline
+>>> pipeline_result = pipeline(
+...     dataset='Nations',
+...     model='TransE',
+...     optimizer='Adam',
+...     training_kwargs=dict(
+...         num_epochs=1000,
+...     ),
+... )

 To enable checkpoints, all you have to do is add a ``checkpoint_name`` argument to the ``training_kwargs``.
 This argument should have the name you would like the checkpoint files saved on your computer to be called.

-.. code-block:: python
-
-    from pykeen.pipeline import pipeline
-
-    pipeline_result = pipeline(
-        dataset='Nations',
-        model='TransE',
-        optimizer='Adam',
-        training_kwargs=dict(
-            num_epochs=1000,
-            checkpoint_name='my_checkpoint.pt',
-        ),
-    )
+>>> from pykeen.pipeline import pipeline
+>>> pipeline_result = pipeline(
+...     dataset='Nations',
+...     model='TransE',
+...     optimizer='Adam',
+...     training_kwargs=dict(
+...         num_epochs=1000,
+...         checkpoint_name='my_checkpoint.pt',
+...     ),
+... )

 Furthermore, you can set the checkpoint frequency, i.e. how often checkpoints should be saved given in minutes, by
 setting the argument ``checkpoint_frequency`` with an integer. The default frequency is 30 minutes and setting it to
 ``0`` will cause the training loop to save a checkpoint after each epoch.
 Let's look at an example.

-.. code-block:: python
-
-    from pykeen.pipeline import pipeline
-
-    pipeline_result = pipeline(
-        dataset='Nations',
-        model='TransE',
-        optimizer='Adam',
-        training_kwargs=dict(
-            num_epochs=1000,
-            checkpoint_name='my_checkpoint.pt',
-            checkpoint_frequency=5,
-        ),
-    )
+>>> from pykeen.pipeline import pipeline
+>>> pipeline_result = pipeline(
+...     dataset='Nations',
+...     model='TransE',
+...     optimizer='Adam',
+...     training_kwargs=dict(
+...         num_epochs=1000,
+...         checkpoint_name='my_checkpoint.pt',
+...         checkpoint_frequency=5,
+...     ),
+... )

 Here we have defined a pipeline that will save training loop checkpoints in the checkpoint file called
 ``my_checkpoint.pt`` every time an epoch finishes and at least `5` minutes have passed since saving previously.
@@ -78,20 +69,17 @@ or the early stopper stops it. Assuming that you successfully trained the KGEM a
 that you would like to test the model with `2000` epochs, all you have to do is to change the number of epochs and
 execute the code like:

-.. code-block:: python
-
-    from pykeen.pipeline import pipeline
-
-    pipeline_result = pipeline(
-        dataset='Nations',
-        model='TransE',
-        optimizer='Adam',
-        training_kwargs=dict(
-            num_epochs=2000,  # more epochs than before
-            checkpoint_name='my_checkpoint.pt',
-            checkpoint_frequency=5,
-        ),
-    )
+>>> from pykeen.pipeline import pipeline
+>>> pipeline_result = pipeline(
+...     dataset='Nations',
+...     model='TransE',
+...     optimizer='Adam',
+...     training_kwargs=dict(
+...         num_epochs=2000,  # more epochs than before
+...         checkpoint_name='my_checkpoint.pt',
+...         checkpoint_frequency=5,
+...     ),
+... )

 The above code will load the saved state after finishing `1000` epochs and continue to train to `2000` epochs, giving
 the exact same results as if you would have run it for `2000` epochs in the first place.
@@ -101,20 +89,17 @@ which is a subdirectory in your home directory, e.g. ``~/.data/pykeen/checkpoint
 Optionally, you can set the path to where you want the checkpoints to be saved by setting the ``checkpoint_directory``
 argument with a string or a :class:`pathlib.Path` object containing your desired root path, as shown in this example:

-.. code-block:: python
-
-    from pykeen.pipeline import pipeline
-
-    pipeline_result = pipeline(
-        dataset='Nations',
-        model='TransE',
-        optimizer='Adam',
-        training_kwargs=dict(
-            num_epochs=2000,
-            checkpoint_name='my_checkpoint.pt',
-            checkpoint_directory='/my/secret/dir',
-        ),
-    )
+>>> from pykeen.pipeline import pipeline
+>>> pipeline_result = pipeline(
+...     dataset='Nations',
+...     model='TransE',
+...     optimizer='Adam',
+...     training_kwargs=dict(
+...         num_epochs=2000,
+...         checkpoint_name='my_checkpoint.pt',
+...         checkpoint_directory='doctests/checkpoint_dir',
+...     ),
+... )

 .. _failure_checkpoints_how_to:

@@ -123,16 +108,16 @@ Checkpoints on Failure
 In cases where you only would like to save checkpoints whenever the training loop might fail, you can use the argument
 ``checkpoint_on_failure=True``, like:

-.. code-block:: python
-
-    from pykeen.pipeline import pipeline
-
-    pipeline_result = pipeline(
-        dataset='Nations',
-        model='TransE',
-        optimizer='Adam',
-        training_kwargs=dict(num_epochs=2000, checkpoint_on_failure=True),
-    )
+>>> from pykeen.pipeline import pipeline
+>>> pipeline_result = pipeline(
+...     dataset='Nations',
+...     model='TransE',
+...     optimizer='Adam',
+...     training_kwargs=dict(
+...         num_epochs=2000,
+...         checkpoint_on_failure=True,
+...     ),
+... )

 This option differs from regular checkpoints, since regular checkpoints are only saved
 after a successful epoch. When saving checkpoints due to failure of the training loop there is no guarantee that all
@@ -141,19 +126,17 @@ specific training loop. Therefore, these checkpoints are saved with a distinct c
 ``PyKEEN_just_saved_my_day_{datetime}.pt`` in the given ``checkpoint_directory``, even when you also opted to use
 regular checkpoints as defined above, e.g. with this code:

-.. code-block:: python
-
-    from pykeen.pipeline import pipeline
-    pipeline_result = pipeline(
-        dataset='Nations',
-        model='TransE',
-        optimizer='Adam',
-        training_kwargs=dict(
-            num_epochs=2000,
-            checkpoint_name='my_checkpoint.pt',
-            checkpoint_on_failure=True,
-        ),
-    )
+>>> from pykeen.pipeline import pipeline
+>>> pipeline_result = pipeline(
+...     dataset='Nations',
+...     model='TransE',
+...     optimizer='Adam',
+...     training_kwargs=dict(
+...         num_epochs=2000,
+...         checkpoint_name='my_checkpoint.pt',
+...         checkpoint_on_failure=True,
+...     ),
+... )

 Note: Use this argument with caution, since every failed training loop will create a distinct checkpoint file.

@@ -193,21 +176,17 @@ the same compared to running uninterrupted without checkpoints, also for the eva

 To show how to use the checkpoint functionality without the pipeline, we define a KGEM first:

-.. code-block:: python
-
-    from pykeen.models import TransE
-    from pykeen.training import SLCWATrainingLoop
-    from pykeen.triples import TriplesFactory
-    from torch.optim import Adam
-
-    triples_factory = Nations().training
-    model = TransE(
-        triples_factory=triples_factory,
-        random_seed=123,
-    )
-
-    optimizer = Adam(params=model.get_grad_params())
-    training_loop = SLCWATrainingLoop(model=model, optimizer=optimizer)
+>>> from pykeen.datasets import Nations
+>>> from pykeen.models import TransE
+>>> from pykeen.training import SLCWATrainingLoop
+>>> from torch.optim import Adam
+>>> triples_factory = Nations().training
+>>> model = TransE(
+...     triples_factory=triples_factory,
+...     random_seed=123,
+... )
+>>> optimizer = Adam(params=model.get_grad_params())
+>>> training_loop = SLCWATrainingLoop(model=model, optimizer=optimizer)

 At this point we have a model, dataset and optimizer all setup in a training loop and are ready to train the model with
 the ``training_loop``'s method :func:`pykeen.training.TrainingLoop.train`. To enable checkpoints all you have to do is
@@ -222,13 +201,11 @@ argument with a string or a :class:`pathlib.Path` object containing your desired

 Here is an example:

-.. code-block:: python
-
-    losses = training_loop.train(
-        num_epochs=1000,
-        checkpoint_name='my_checkpoint.pt',
-        checkpoint_frequency=5,
-    )
+>>> losses = training_loop.train(
+...     num_epochs=1000,
+...     checkpoint_name='my_checkpoint.pt',
+...     checkpoint_frequency=5,
+... )

 With this code we have started the training loop with the above defined KGEM. The training loop will save a checkpoint
 in the ``my_checkpoint.pt`` file, which will be saved in the ``~/.data/pykeen/checkpoints/`` directory, since we haven't
@@ -249,26 +226,22 @@ E.g. the above training loop finished successfully after 1000 epochs, but you wo
 train the same model from that state for 2000 epochs. All you have to do is to change the argument
 ``num_epochs`` in the above code to:

-.. code-block:: python
-
-    losses = training_loop.train(
-        num_epochs=2000,
-        checkpoint_name='my_checkpoint.pt',
-        checkpoint_frequency=5,
-    )
+>>> losses = training_loop.train(
+...     num_epochs=2000,
+...     checkpoint_name='my_checkpoint.pt',
+...     checkpoint_frequency=5,
+... )

 and now the training loop will resume from the state at 1000 epochs and continue to train until 2000 epochs.

 As shown in :ref:`failure_checkpoints_how_to`, you can also save checkpoints only in cases where the
 training loop fails. To do this you just have to set the argument `checkpoint_on_failure=True`, like:

-.. code-block:: python
-
-    losses = training_loop.train(
-        num_epochs=2000,
-        checkpoint_directory='/my/secret/dir',
-        checkpoint_on_failure=True,
-    )
+>>> losses = training_loop.train(
+...     num_epochs=2000,
+...     checkpoint_directory='/my/secret/dir',
+...     checkpoint_on_failure=True,
+... )

 This code will save a checkpoint in case the training loop fails. Note how we also chose a new checkpoint directory by
 setting the `checkpoint_directory` argument to ``/my/secret/dir``.

docs/source/tutorial/making_predictions.rst (+21 -34)
@@ -26,30 +26,22 @@ This example shows using the :func:`pykeen.pipeline.pipeline` to train a model
 which will already be in memory. Each of the high-level interfaces is exposed through the
 model:

-.. code-block:: python
-
-    from pykeen.pipeline import pipeline
-
-    pipeline_result = pipeline(dataset='Nations', model='RotatE')
-    model = pipeline_result.model
-
-    # Predict tails
-    predicted_tails_df = model.get_tail_prediction_df('brazil', 'intergovorgs')
-
-    # Predict relations
-    predicted_relations_df = model.get_relation_prediction_df('brazil', 'uk')
-
-    # Predict heads
-    predicted_heads_df = model.get_head_prediction_df('conferences', 'brazil')
-
-    # Score all triples (memory intensive)
-    predictions_df = model.get_all_prediction_df()
-
-    # Score top K triples
-    predictions_df = model.get_all_prediction_df(k=150)
-
-    # save the model
-    pipeline_result.save_to_directory('nations_rotate')
+>>> from pykeen.pipeline import pipeline
+>>> # Run the pipeline
+>>> pipeline_result = pipeline(dataset='Nations', model='RotatE')
+>>> model = pipeline_result.model
+>>> # Predict tails
+>>> predicted_tails_df = model.get_tail_prediction_df('brazil', 'intergovorgs')
+>>> # Predict relations
+>>> predicted_relations_df = model.get_relation_prediction_df('brazil', 'uk')
+>>> # Predict heads
+>>> predicted_heads_df = model.get_head_prediction_df('conferences', 'brazil')
+>>> # Score all triples (memory intensive)
+>>> predictions_df = model.get_all_prediction_df()
+>>> # Score top K triples
+>>> top_k_predictions_df = model.get_all_prediction_df(k=150)
+>>> # save the model
+>>> pipeline_result.save_to_directory('doctests/nations_rotate')

 Loading a Model
 ~~~~~~~~~~~~~~~
@@ -58,16 +50,11 @@ This example shows how to reload a previously trained model. The
 a file named ``trained_model.pkl``, so we will use the one from the
 previous example.

-.. code-block:: python
-
-    import torch
-
-    model = torch.load('nations_rotate/trained_model.pkl')
-
-    # Predict tails
-    predicted_tails_df = model.get_tail_prediction_df('brazil', 'intergovorgs')
-
-    # everything else is the same as above
+>>> import torch
+>>> model = torch.load('doctests/nations_rotate/trained_model.pkl')
+>>> # Predict tails
+>>> predicted_tails_df = model.get_tail_prediction_df('brazil', 'intergovorgs')
+>>> # everything else is the same as above

 There's an example model available at
 https://github.com/pykeen/pykeen/blob/master/notebooks/hello_world/nations_transe/trained_model.pkl

src/pykeen/datasets/__init__.py (+7 -2)
@@ -125,7 +125,7 @@ def get_dataset(
         raise TypeError(f'Dataset is invalid type: {type(dataset)}')

     if isinstance(training, str) and isinstance(testing, str):
-        if isinstance(validation, str):
+        if validation is None or isinstance(validation, str):
             return PathDataset(
                 training_path=training,
                 testing_path=testing,
@@ -146,7 +146,12 @@
             validation=validation,
         )

-    raise TypeError('Training and testing must both be given as strings or Triples Factories')
+    raise TypeError(
+        f'''Training and testing must both be given as strings or Triples Factories.
+        - Training: {type(training)}: {training}
+        - Testing: {type(testing)}: {testing}
+        ''',
+    )


def has_dataset(key: str) -> bool:
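
The relaxed check above is what lets training and testing be given as file paths with no validation set at all. A minimal sketch of the new behavior (an editor's illustration, not part of the diff; it assumes the Nations path constants are plain strings, as they are used in the tutorials):

    from pykeen.datasets import get_dataset
    from pykeen.datasets.nations import NATIONS_TRAIN_PATH, NATIONS_TEST_PATH

    # Before this change, string paths without a validation path fell through
    # to the TypeError at the bottom of get_dataset().
    dataset = get_dataset(training=NATIONS_TRAIN_PATH, testing=NATIONS_TEST_PATH)
    assert dataset.validation is None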

src/pykeen/datasets/base.py (+13 -11)
@@ -177,13 +177,12 @@ def testing(self) -> TriplesFactory:  # type:ignore # noqa: D401
         return self._testing

     @property
-    def validation(self) -> TriplesFactory:  # type:ignore # noqa: D401
+    def validation(self) -> Optional[TriplesFactory]:  # type:ignore # noqa: D401
         """The validation triples factory that shares indices with the training triples factory."""
         if not self._loaded:
             self._load()
         if not self._loaded_validation:
             self._load_validation()
-        assert self._validation is not None
         return self._validation

     @property
@@ -224,7 +223,7 @@ def __init__(
         self,
         training_path: Union[str, TextIO],
         testing_path: Union[str, TextIO],
-        validation_path: Union[str, TextIO],
+        validation_path: Union[None, str, TextIO],
         eager: bool = False,
         create_inverse_triples: bool = False,
         load_triples_kwargs: Optional[Mapping[str, Any]] = None,
@@ -269,14 +268,17 @@ def _load_validation(self) -> None:
         # don't call this function by itself. assumes called through the `validation`
         # property and the _training factory has already been loaded
         assert self._training is not None
-        self._validation = TriplesFactory.from_path(
-            path=self.validation_path,
-            entity_to_id=self._training.entity_to_id,  # share entity index with training
-            relation_to_id=self._training.relation_to_id,  # share relation index with training
-            # do not explicitly create inverse triples for testing; this is handled by the evaluation code
-            create_inverse_triples=False,
-            load_triples_kwargs=self.load_triples_kwargs,
-        )
+        if self.validation_path is None:
+            self._validation = None
+        else:
+            self._validation = TriplesFactory.from_path(
+                path=self.validation_path,
+                entity_to_id=self._training.entity_to_id,  # share entity index with training
+                relation_to_id=self._training.relation_to_id,  # share relation index with training
+                # do not explicitly create inverse triples for testing; this is handled by the evaluation code
+                create_inverse_triples=False,
+                load_triples_kwargs=self.load_triples_kwargs,
+            )

     def __repr__(self) -> str:  # noqa: D105
         return (
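
Because ``validation`` is now ``Optional[TriplesFactory]`` and the assertion is gone, downstream code should guard for ``None`` before use. A hedged sketch (not part of the diff):

    from pykeen.datasets import Nations

    dataset = Nations()
    # The property may now legitimately return None for validation-free datasets.
    if dataset.validation is not None:
        print(dataset.validation.num_triples)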

src/pykeen/datasets/dbpedia.py (+1 -5)
@@ -45,8 +45,4 @@ def __init__(self, create_inverse_triples: bool = False, **kwargs):


 if __name__ == '__main__':
-    _d = DBpedia50()
-    _d.summarize()
-    print(_d.training.triples[:5])
-    print(_d.testing.triples[:5])
-    print(_d.validation.triples[:5])
+    DBpedia50().summarize()

src/pykeen/pipeline.py (+7 -1)
@@ -174,6 +174,7 @@
 import pickle
 import time
 from dataclasses import dataclass, field
+from pathlib import Path
 from typing import Any, Collection, Dict, Iterable, List, Mapping, Optional, Set, Type, Union

 import pandas as pd
@@ -423,7 +424,12 @@ def _get_results(self) -> Mapping[str, Any]:
             results['stopper'] = self.stopper.get_summary_dict()
         return results

-    def save_to_directory(self, directory: str, save_metadata: bool = True, save_replicates: bool = True) -> None:
+    def save_to_directory(
+        self,
+        directory: Union[str, Path],
+        save_metadata: bool = True,
+        save_replicates: bool = True,
+    ) -> None:
         """Save all artifacts in the given directory."""
         os.makedirs(directory, exist_ok=True)

src/pykeen/templates/README.md (+1 -1)
@@ -202,7 +202,7 @@ See [CONTRIBUTING.md](/CONTRIBUTING.md) for more information on getting involved
 This project has been supported by several organizations (in alphabetical order):

 - [Bayer](https://www.bayer.com/)
-- [Enveda Therapeutics](https://envedatherapeutics.com/)
+- [Enveda Biosciences](https://www.envedabio.com/)
 - [Fraunhofer Institute for Algorithms and Scientific Computing](https://www.scai.fraunhofer.de)
 - [Fraunhofer Institute for Intelligent Analysis and Information Systems](https://www.iais.fraunhofer.de)
 - [Fraunhofer Center for Machine Learning](https://www.cit.fraunhofer.de/de/zentren/maschinelles-lernen.html)

src/pykeen/typing.py (+21 -11)
@@ -12,18 +12,20 @@
     'Hint',
     'Mutation',
     'OneOrSequence',
-    # Others
+    # Triples
     'LabeledTriples',
     'MappedTriples',
     'EntityMapping',
     'RelationMapping',
+    # Others
+    'DeviceHint',
+    'TorchRandomHint',
+    # Tensor Functions
     'Initializer',
     'Normalizer',
     'Constrainer',
     'cast_constrainer',
-    'InteractionFunction',
-    'DeviceHint',
-    'TorchRandomHint',
+    # Tensors
     'HeadRepresentation',
     'RelationRepresentation',
     'TailRepresentation',
@@ -34,6 +36,7 @@

 X = TypeVar('X')
 Hint = Union[None, str, X]
+#: A function that mutates the input and returns a new object of the same type as output
 Mutation = Callable[[X], X]
 OneOrSequence = Union[X, Sequence[X]]

@@ -42,25 +45,32 @@
 EntityMapping = Mapping[str, int]
 RelationMapping = Mapping[str, int]

-# comment: TypeVar expects none, or at least two super-classes
-TensorType = TypeVar("TensorType", torch.Tensor, torch.FloatTensor)
-InteractionFunction = Callable[[TensorType, TensorType, TensorType], TensorType]
-
-Initializer = Mutation[TensorType]
-Normalizer = Mutation[TensorType]
-Constrainer = Mutation[TensorType]
+#: A function that can be applied to a tensor to initialize it
+Initializer = Mutation[torch.FloatTensor]
+#: A function that can be applied to a tensor to normalize it
+Normalizer = Mutation[torch.FloatTensor]
+#: A function that can be applied to a tensor to constrain it
+Constrainer = Mutation[torch.FloatTensor]


 def cast_constrainer(f) -> Constrainer:
     """Cast a constrainer function with :func:`typing.cast`."""
     return cast(Constrainer, f)


+#: A hint for a :class:`torch.device`
 DeviceHint = Hint[torch.device]
+#: A hint for a :class:`torch.Generator`
 TorchRandomHint = Hint[torch.Generator]

+#: A type variable for head representations used in :class:`pykeen.models.Model`,
+#: :class:`pykeen.nn.modules.Interaction`, etc.
 HeadRepresentation = TypeVar("HeadRepresentation", bound=OneOrSequence[torch.FloatTensor])
+#: A type variable for relation representations used in :class:`pykeen.models.Model`,
+#: :class:`pykeen.nn.modules.Interaction`, etc.
 RelationRepresentation = TypeVar("RelationRepresentation", bound=OneOrSequence[torch.FloatTensor])
+#: A type variable for tail representations used in :class:`pykeen.models.Model`,
+#: :class:`pykeen.nn.modules.Interaction`, etc.
 TailRepresentation = TypeVar("TailRepresentation", bound=OneOrSequence[torch.FloatTensor])
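
The reworked aliases are plain callables over ``torch.FloatTensor``, so standard torch functions already satisfy them. A hedged illustration (an editor's sketch, not part of the diff):

    import torch

    from pykeen.typing import Initializer, Normalizer

    init: Initializer = torch.nn.init.xavier_uniform_  # fills the tensor in place and returns it
    norm: Normalizer = torch.nn.functional.normalize   # returns a new, L2-normalized tensor

    x = torch.empty(3, 5)
    init(x)
    y = norm(x)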

src/pykeen/version.py (+1 -1)
@@ -11,7 +11,7 @@
     'get_git_hash',
 ]

-VERSION = '1.2.0-dev'
+VERSION = '1.3.0-dev'


 def get_git_hash() -> str:

tox.ini (+10)
@@ -17,6 +17,7 @@ envlist =
     doc8
     docs
     # the actual tests
+    doctests
     py
     integration
     # always keep coverage-report last
@@ -49,6 +50,15 @@ deps =
 extras =
     mlflow

+[testenv:doctests]
+commands =
+    # TODO make this automatic for all RST in a loop (but not using xargs since doctest uses multiprocessing)
+    python -m doctest docs/source/tutorial/first_steps.rst
+    python -m doctest docs/source/tutorial/byod.rst
+    python -m doctest docs/source/tutorial/making_predictions.rst
+    # python -m doctest src/pykeen/pipeline.py
+    # python -m doctest src/pykeen/hpo/__init__.py
+
 [testenv:coverage-clean]
 deps = coverage
 skip_install = true
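
On the TODO above, one possible shape for the automated loop (an editor's sketch, not part of the commit; the glob pattern and output format are assumptions):

    import doctest
    from pathlib import Path

    for rst in sorted(Path('docs/source/tutorial').glob('*.rst')):
        # module_relative=False lets doctest.testfile take a filesystem path,
        # mirroring `python -m doctest <file>` for each tutorial in turn.
        failed, attempted = doctest.testfile(str(rst), module_relative=False)
        print(f'{rst}: {failed} failed of {attempted} examples')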