datasets/
  CIFAR100/: toy dataset, 100 classes, 6-level hierarchy
    classes/: contains classes.txt (one class name per line)
    descriptions/: contains descriptions of classes generated by writers
    embeddings/: contains embeddings of descriptions generated by embedders
    encodings/: contains encodings (i.e. targets) generated by encoders from the hierarchy or from embeddings
    hierarchy/: contains various representations of the hierarchy
    inputs/: empty directory to fill with the actual dataset (e.g. a soft link to a directory containing images)
  iNaturalist19/: 1010 classes, 8-level hierarchy
    ...
  tieredImageNet/: 608 classes, 13-level hierarchy
    ...
embedders/: take descriptions and produce embeddings
encoders/: take embeddings or the hierarchy and produce encodings
writers/: generate descriptions of classes from classes.txt
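As a minimal sketch of how this layout can be consumed, the helper below reads the class names of a dataset. The function name `load_class_names` and its `root` argument are hypothetical and not part of the repository. The only thing taken from the description above is that classes.txt lives in datasets/[DATASET]/classes/ and holds one class name per line.

```python
from pathlib import Path


def load_class_names(dataset: str, root: str = "datasets") -> list[str]:
    """Return the class names listed in <root>/<dataset>/classes/classes.txt.

    Hypothetical helper: assumes one class name per line, as described above.
    """
    path = Path(root) / dataset / "classes" / "classes.txt"
    with path.open(encoding="utf-8") as f:
        # Strip surrounding whitespace and skip blank lines.
        return [line.strip() for line in f if line.strip()]
```

For example, `load_class_names("CIFAR100")` would read datasets/CIFAR100/classes/classes.txt, provided that file has already been generated.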
- python utils/hierarchy.py
  generates lca.npy and hierarchy.npy for all datasets
- python utils/classes.py
  generates classes.txt for all datasets
- python writers/[writer].py --dataset [DATASET]
  generates descriptions of the class names for the given dataset
- python embedders/[embedder].py --dataset [DATASET] --writer [WRITER]
  generates embeddings from the descriptions for the given dataset and writer
- python encoders/[encoder].py --dataset [DATASET] {--writer [WRITER]}
  generates encodings from description embeddings or from the hierarchy for the given dataset
- bash encode.sh [DATASET]
  wrapper that encodes a dataset using all encoders with fixed hyperparameters
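The commands above feed each other: writers read classes.txt, embedders read descriptions, and encoders read embeddings or the hierarchy. The sketch below assembles the per-dataset commands in that order. It is only an illustration: `pipeline_commands`, its parameters, and the placeholder script names are hypothetical, and it assumes `--writer` is passed to an encoder only when it works from description embeddings.

```python
import shlex


def pipeline_commands(dataset, writer, embedder, encoder, encoder_uses_writer=True):
    """Build the command lines listed above, in data-flow order.

    Hypothetical helper: script names are placeholders supplied by the caller.
    """
    encode = ["python", f"encoders/{encoder}.py", "--dataset", dataset]
    if encoder_uses_writer:
        # Assumption: --writer is only needed when encoding from embeddings.
        encode += ["--writer", writer]
    return [
        ["python", "utils/hierarchy.py"],
        ["python", "utils/classes.py"],
        ["python", f"writers/{writer}.py", "--dataset", dataset],
        ["python", f"embedders/{embedder}.py", "--dataset", dataset, "--writer", writer],
        encode,
    ]


if __name__ == "__main__":
    # "my_writer", "my_embedder" and "my_encoder" are placeholder script names.
    for cmd in pipeline_commands("CIFAR100", "my_writer", "my_embedder", "my_encoder"):
        print(shlex.join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to execute the step instead of printing it.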
Suppose that you have the CIFAR100 dataset, downloaded by torchvision, at /data/user/dataset-inputs/cifar-100-python. To create the softlink, run:
ln -s /data/user/dataset-inputs/cifar-100-python datasets/CIFAR100/inputs/cifar-100-python
Using softlinks is a convenient way to have all the datasets in one location, allowing you to access them from different locations without the need to copy them.
Softlinks can also be created for datasets stored as folders, e.g. iNaturalist19:
ln -s /data/user/datasets-inputs/iNaturalist19/train datasets/iNaturalist19/inputs/train
ln -s /data/user/datasets-inputs/iNaturalist19/val datasets/iNaturalist19/inputs/val
ln -s /data/user/datasets-inputs/iNaturalist19/test datasets/iNaturalist19/inputs/test
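The same links can be created from Python, which may be convenient when setting up several datasets at once. This is a minimal sketch that mirrors the `ln -s` commands above. The helper `link_inputs` and its `root` argument are hypothetical and not part of the repository.

```python
import os
from pathlib import Path


def link_inputs(source: str, dataset: str, root: str = "datasets") -> Path:
    """Create <root>/<dataset>/inputs/<name of source> as a symlink to source.

    Hypothetical helper mirroring `ln -s SOURCE datasets/DATASET/inputs/NAME`.
    """
    # Use an absolute target so the link does not depend on the working directory.
    src = Path(source).absolute()
    inputs_dir = Path(root) / dataset / "inputs"
    inputs_dir.mkdir(parents=True, exist_ok=True)
    link = inputs_dir / src.name
    if not link.is_symlink():
        # target_is_directory only matters on Windows; it is ignored elsewhere.
        os.symlink(src, link, target_is_directory=True)
    return link
```

For example, `link_inputs("/data/user/dataset-inputs/cifar-100-python", "CIFAR100")` would create datasets/CIFAR100/inputs/cifar-100-python, and an existing link is left untouched.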
Encodings can be visualized by projecting them onto a 2D space with a dimensionality-reduction algorithm. Here we provide projections computed with UMAP, which can be explored using
python projectors/explore.py --dataset CIFAR100
This will start a web interface at http://localhost:8050/ where you can select an encoding and color its points by level of the hierarchy by moving a slider.