PyKEEN uses a combination of techniques to promote efficient calculations during training and evaluation and tries to maximize the utilization of the available hardware (currently focused on single-GPU usage).
Entities and relations in triples are usually stored as strings. Because KGEMs aim to learn vector representations for these entities and relations such that the chosen interaction function can compute useful scores on top of them, we need a mapping from the string representations to vectors. Moreover, for computational efficiency, we would like to store all entity/relation embeddings in matrices. Thus, the mapping process comprises two parts: mapping strings to IDs, and using the IDs to access the embeddings (i.e., as row indices).
In PyKEEN, the mapping process takes place in :class:`pykeen.triples.TriplesFactory`. The triples factory maintains the sets of unique entity and relation labels and ensures that they are mapped to unique integer IDs in :data:`pykeen.triples.TriplesFactory.entity_label_to_id` and :data:`pykeen.triples.TriplesFactory.relation_label_to_id`. To improve performance, the mapping process takes place only once, and the ID-based triples are stored in a tensor :data:`pykeen.triples.TriplesFactory.mapped_triples`.
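The following is a minimal sketch of this two-step mapping using :class:`pykeen.triples.TriplesFactory` (attribute names such as ``entity_to_id`` and ``mapped_triples`` may differ slightly between PyKEEN versions):

.. code-block:: python

    import numpy as np
    from pykeen.triples import TriplesFactory

    # Label-based triples as they might appear in a raw dataset file.
    labeled = np.array(
        [
            ["brussels", "located_in", "belgium"],
            ["belgium", "part_of", "eu"],
        ],
        dtype=str,
    )

    # The factory assigns a unique integer ID to every entity/relation label once ...
    tf = TriplesFactory.from_labeled_triples(triples=labeled)
    print(tf.entity_to_id)    # e.g. {'belgium': 0, 'brussels': 1, 'eu': 2}
    print(tf.relation_to_id)  # e.g. {'located_in': 0, 'part_of': 1}

    # ... and stores the ID-based triples in a single tensor, whose entries can be
    # used directly as row indices into the embedding matrices.
    print(tf.mapped_triples)  # LongTensor of shape (num_triples, 3)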
Interaction functions are usually only given for the standard case of scoring a single triple :math:`(h, r, t)`. During training and evaluation, however, we often need to score many related triples at once, e.g., we want to rank all entities as tails for a single tuple :math:`(h, r)`. Instead of materializing all of these triples explicitly, the scoring can be broadcast over the entire entity (or relation) embedding matrix. To make this technique possible, PyKEEN models have to provide an explicit broadcasting function via the following methods in the model class:

- :func:`pykeen.models.base.Model.score_h` - scoring all possible head entities for a given :math:`(r, t)` tuple
- :func:`pykeen.models.base.Model.score_r` - scoring all possible relations for a given :math:`(h, t)` tuple
- :func:`pykeen.models.base.Model.score_t` - scoring all possible tail entities for a given :math:`(h, r)` tuple
The PyKEEN architecture natively supports these methods and makes use of this technique wherever possible without any additional modifications. Providing these methods is completely optional and not required when implementing new models.
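As an illustration, the following is a small, self-contained sketch (not PyKEEN's actual implementation) of what such a broadcasting ``score_t`` does for a DistMult-style interaction: instead of scoring each :math:`(h, r, t)` triple separately, the scores of all possible tails are obtained with a single matrix multiplication against the entity embedding matrix.

.. code-block:: python

    import torch

    # Toy sizes; in practice these come from the dataset and model configuration.
    num_entities, num_relations, dim = 100, 10, 50
    entity_emb = torch.nn.Embedding(num_entities, dim)
    relation_emb = torch.nn.Embedding(num_relations, dim)

    def score_hrt(h, r, t):
        """Score a batch of single triples (the standard interaction)."""
        return (entity_emb(h) * relation_emb(r) * entity_emb(t)).sum(dim=-1)

    def score_t(hr_batch):
        """Score all possible tail entities for each (h, r) pair at once."""
        h, r = hr_batch[:, 0], hr_batch[:, 1]
        # (batch, dim) @ (dim, num_entities) -> (batch, num_entities)
        return (entity_emb(h) * relation_emb(r)) @ entity_emb.weight.t()

    hr_batch = torch.tensor([[0, 1], [2, 3]])
    all_tail_scores = score_t(hr_batch)  # shape: (2, num_entities)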
In this example, we are given a knowledge graph whose triples are split into a training and a test part. During rank-based evaluation, two calculations are performed for each test triple :math:`(h, r, t)`:

- :math:`(h, r)` is combined with all possible tail entities :math:`t'` to make triples :math:`(h, r, t')`
- :math:`(r, t)` is combined with all possible head entities :math:`h'` to make triples :math:`(h', r, t)`

Finally, the rank of the test triple :math:`(h, r, t)` is computed among the scores of these candidate triples.
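For illustration, assuming higher scores are better, an (unfiltered) tail rank can be derived from the score vector over all candidate tails roughly as follows; PyKEEN's actual evaluator additionally handles ties and treats the head side analogously.

.. code-block:: python

    import torch

    def unfiltered_tail_rank(all_tail_scores: torch.Tensor, true_tail: int) -> int:
        """Rank of the true tail among all candidate tails; 1 is the best possible rank."""
        true_score = all_tail_scores[true_tail]
        # Count the candidates that score strictly better than the test triple.
        return int((all_tail_scores > true_score).sum().item()) + 1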
In the filtered setting, candidate triples that are already contained in the knowledge graph are removed before computing the rank, since they are known to be true and should not be counted as ranking errors. While this is easily defined theoretically, it poses several practical challenges. For example, every candidate triple has to be checked against the set of already known triples, which quickly becomes a computational bottleneck when done naively. To obtain very fast filtering, PyKEEN combines the techniques presented above in :ref:`entity_and_relation_ids` and :ref:`tuple_broadcasting` with the following mechanism, which in our case has led to a 600,000-fold increase in speed for the filtered evaluation compared to the mechanisms used in previous versions.
As a starting point, PyKEEN will always compute scores for all candidate triples, e.g., via :func:`pykeen.models.base.Model.score_t` for all possible tails of a given :math:`(h, r)` tuple, using the broadcasting technique described above. Following this, the sparse filters are constructed and applied:

- Take the relation :math:`r` and compare it to the relations of all triples in the train dataset, leading to a boolean vector of the size of the number of triples contained in the train dataset, being true wherever a triple has the relation :math:`r`
- Take the head entity :math:`h` and compare it to the head entities of all triples in the train dataset, leading to a boolean vector of the size of the number of triples contained in the train dataset, being true wherever a triple has the head entity :math:`h`
- Combine both boolean vectors, leading to a boolean vector of the size of the number of triples contained in the train dataset, being true wherever a triple has both the head entity :math:`h` and the relation :math:`r`
- Convert the boolean vector to a non-zero index vector, stating at which indices the train dataset contains triples with both the head entity :math:`h` and the relation :math:`r`; this vector has the size of the number of non-zero elements
- Apply the index vector to the tail entity column of the train dataset, returning all tail entity IDs that, combined with :math:`h` and :math:`r`, lead to triples contained in the train dataset
- Finally, apply the resulting tail entity ID index vector to the initially mentioned score vector returned by :func:`pykeen.models.base.Model.score_t` for all possible triples and set all affected scores to ``float('nan')`` following the IEEE-754 specification, which makes these scores non-comparable, effectively leading to the score vector for all possible novel triples
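The steps above can be sketched in a few lines of PyTorch; this is an illustrative re-implementation of the idea, not PyKEEN's internal code:

.. code-block:: python

    import torch

    def filter_tail_scores(
        all_tail_scores: torch.Tensor,    # (num_entities,) scores from score_t for one (h, r)
        train_triples: torch.LongTensor,  # (num_train, 3) ID-based training triples
        h: int,
        r: int,
    ) -> torch.Tensor:
        # Boolean vectors over the training triples, true where the relation / head matches.
        relation_mask = train_triples[:, 1] == r
        head_mask = train_triples[:, 0] == h
        # True only where a training triple has both the head h and the relation r.
        both_mask = relation_mask & head_mask
        # Sparse index vector pointing to the matching training triples.
        indices = both_mask.nonzero(as_tuple=True)[0]
        # Tail entity IDs that together with (h, r) form known training triples.
        known_tails = train_triples[indices, 2]
        # NaN scores are non-comparable, so the known triples drop out of the ranking.
        filtered = all_tail_scores.clone()
        filtered[known_tails] = float("nan")
        return filtered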
With growing model and dataset sizes, the KGEM at hand is likely to exceed the memory provided by GPUs. Especially during training, it might be desired to train with a certain batch size. When this batch size is too big for the hardware at hand, PyKEEN allows setting a sub-batch size in the range :math:`[1, \text{batch\_size}]`. When the sub-batch size is set, PyKEEN automatically accumulates the gradients after each sub-batch and clears the computational graph during training. This makes it possible to train KGEMs on a GPU for which they would otherwise be too big, while the obtained results are identical to training without sub-batching.
.. note::

    In order to guarantee equivalent results, not all models support sub-batching, since certain components, e.g., batch normalization, require the entire batch to be calculated in a single pass to avoid altering their statistics.
.. note::

    Sub-batching is sometimes also called gradient accumulation, e.g., by Hugging Face's transformers library, since the gradients over multiple sub-batches are accumulated before the parameters are updated.
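A minimal sketch of this gradient accumulation scheme, assuming a mean-reduced loss (this is not PyKEEN's internal training loop; ``model``, ``loss_fn``, ``batch``, and ``targets`` are placeholders):

.. code-block:: python

    import torch

    def train_step_with_sub_batching(model, loss_fn, optimizer, batch, targets, sub_batch_size):
        """One optimizer step over a batch, computed as several smaller sub-batches."""
        optimizer.zero_grad()
        batch_size = batch.shape[0]
        for start in range(0, batch_size, sub_batch_size):
            sub_x = batch[start:start + sub_batch_size]
            sub_y = targets[start:start + sub_batch_size]
            # Scale so that the accumulated gradient equals the full-batch gradient.
            loss = loss_fn(model(sub_x), sub_y) * (sub_x.shape[0] / batch_size)
            loss.backward()  # frees this sub-batch's computational graph after accumulation
        optimizer.step()     # a single parameter update, identical to full-batch training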
For some large configurations, even after applying the sub-batching trick, out-of-memory errors may still occur. In this case, PyKEEN implements another technique, called slicing. Note that we often compute more than one score for each batch element: in the sLCWA, we have :math:`1 + \text{num\_negative\_samples}` scores, and in the LCWA, we have :math:`\text{num\_entities}` scores for each batch element. With slicing, we do not compute all of these scores at once, but rather in smaller "batches". For old-style models, i.e., those subclassing from :class:`pykeen.models.base._OldAbstractModel`, this has to be implemented individually for each of them. New-style models, i.e., those deriving from :class:`pykeen.models.nbase.ERModel`, have a generic implementation enabling slicing for all interactions.
.. note::

    Slicing computes the scores in smaller batches, but still needs to compute the gradient over all scores, since some loss functions require access to all of them.
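For illustration, slicing over the candidate tail entities could look roughly as follows for a DistMult-style interaction (an illustrative sketch; ``slice_size`` here is just a local parameter, not a claim about PyKEEN's configuration):

.. code-block:: python

    import torch

    def sliced_all_tail_scores(h_emb, r_emb, all_entity_emb, slice_size):
        """Score a single (h, r) pair against all candidate tails in memory-bounded slices."""
        chunks = []
        for tail_chunk in all_entity_emb.split(slice_size, dim=0):
            # DistMult-style scoring of the (h, r) pair against this chunk of candidate tails.
            chunks.append((h_emb * r_emb) @ tail_chunk.t())
        # Concatenating keeps the full score vector (and its gradient) over all candidates.
        return torch.cat(chunks, dim=-1)

    dim, num_entities = 50, 1000
    all_entity_emb = torch.randn(num_entities, dim, requires_grad=True)
    scores = sliced_all_tail_scores(all_entity_emb[0], torch.randn(dim), all_entity_emb, slice_size=128)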
Allowing high computational throughput while ensuring that the available hardware memory is not exceeded during training and evaluation requires knowledge of the maximum possible training and evaluation batch size for the current model configuration. However, determining these batch sizes is a tedious process, and not feasible when a large set of heterogeneous experiments is run. Therefore, PyKEEN provides an automatic memory optimization step that computes the maximum possible training and evaluation batch sizes for the current model configuration and available hardware before the actual calculation starts. If the user-provided batch size is too large for the available hardware, the automatic memory optimization determines the maximum sub-batch size for training and accumulates the gradients with the process described in :ref:`sub_batching`. The batch sizes are determined using binary search, taking the CUDA architecture into consideration to ensure that the chosen batch size is CUDA-efficient.
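A much-simplified sketch of the idea behind this search (PyKEEN's actual implementation performs a proper binary search and also tunes sub-batch and slice sizes; ``try_batch`` is a placeholder for running a trial forward/backward pass):

.. code-block:: python

    import torch

    def maximize_batch_size(try_batch, upper):
        """Halve the batch size until one trial forward/backward pass fits into memory."""
        batch_size = upper
        while batch_size >= 1:
            try:
                try_batch(batch_size)  # run a trial pass with this batch size
                return batch_size
            except RuntimeError as error:  # CUDA reports out-of-memory as a RuntimeError
                if "out of memory" not in str(error):
                    raise
                torch.cuda.empty_cache()
                batch_size //= 2
        raise MemoryError("even a batch size of 1 does not fit into memory")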
Usually, the evaluation is performed on the GPU for better speed. In addition, users might choose a batch size upfront in their evaluation configuration to fully utilize the GPU and achieve the fastest possible evaluation. However, in larger setups that test different model configurations and dataset partitions, e.g., during hyper-parameter optimization (HPO), the hardware requirements might change drastically, so the evaluation might no longer be possible with the pre-set batch size, or, for larger datasets and memory-intensive models, not on the GPU at all. Since PyKEEN abides by the user configuration, the evaluation would crash in these cases even though the training finished successfully, thus losing the achieved progress and/or leaving trials unfinished. Given that the batch size and the device have no impact on the evaluation results, PyKEEN offers a way to overcome this problem through the evaluation fallback option of the pipeline. It causes the evaluation to fall back to a smaller batch size when the evaluation failed on the GPU with the set batch size, and, as a last resort, to evaluate on the CPU if even the smallest possible batch size is too big for the GPU. Note: This can lead to significantly longer evaluation times when the evaluation falls back to the CPU.
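The fallback behavior can be pictured roughly like this (an illustrative sketch; ``evaluate`` is a placeholder for the actual evaluation call, and PyKEEN exposes this behavior via the pipeline's evaluation fallback option rather than such a function):

.. code-block:: python

    import torch

    def evaluate_with_fallback(evaluate, batch_size, device):
        """Retry evaluation with smaller batch sizes and, as a last resort, on the CPU."""
        while True:
            try:
                return evaluate(batch_size=batch_size, device=device)
            except RuntimeError as error:
                if device.type != "cuda" or "out of memory" not in str(error):
                    raise
                torch.cuda.empty_cache()
                if batch_size > 1:
                    batch_size //= 2              # retry on the GPU with a smaller batch
                else:
                    device = torch.device("cpu")  # even batch size 1 does not fit: use the CPU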