
🔥🔥🔥 KDD2024 Best Student Paper




Dataset Regeneration for Sequential Recommendation (KDD'2024 Best Student Paper)

Framework overview of DR4SR & DR4SR+


Key idea of our work




The exact versions used for the paper are listed below. Newer versions should also work.

  • python == 3.9.19
  • torch == 1.13.1+cu117
  • seq2pat == 1.4.0
  • numpy == 1.26.4
  • scipy == 1.12.0

You can install the required packages with:

conda create -n DR4SR python=3.9
conda activate DR4SR
pip install -r requirements.txt
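
After installation, it can help to confirm that the environment matches the pinned versions listed above. The following is a minimal sketch, not part of the repository: the `PINNED` dictionary and the `check_versions` helper are hypothetical names introduced here for illustration, and the check relies only on the standard-library `importlib.metadata` module.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions copied from the list above (hypothetical helper data, not a repo file).
PINNED = {
    "torch": "1.13.1+cu117",
    "seq2pat": "1.4.0",
    "numpy": "1.26.4",
    "scipy": "1.12.0",
}


def check_versions(pinned: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package name to (expected version, installed version or None)."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        status = "ok" if installed == expected else "differs"
        print(f"{name}: expected {expected}, found {installed} ({status})")
```

A mismatch is not necessarily a problem, since newer versions should also work; the report only makes differences from the paper's setup visible.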

Dataset preprocessing

We have uploaded the preprocessed datasets to dataset/. You can reproduce these preprocessed datasets with the following steps.

  1. Download the Amazon and Yelp datasets and put them in dataset/.

  2. Preprocess these datasets with the notebooks in dataset/:

  • Amazon: dataset/preprocess_amazon.ipynb
  • Yelp: dataset/preprocess_yelp.ipynb

Model-agnostic dataset regeneration

# 0. Select a target dataset, e.g., Amazon-toys

# 1. Build the pre-training dataset.
python --root_path $ROOT_PATH

# 2. Generate pre-trained item embeddings.
python --model SASRec --dataset $DATASET

# 3. Move the corresponding ckpt file to the dataset folder, and rename it to pre-trained_embedding.ckpt.
mv CKPT_FILE_PATH ${ROOT_PATH}/pre-trained_embedding.ckpt

# 4. Pre-train the data regenerator.
python --root_path $ROOT_PATH --K 5

# 5. Obtain regenerated dataset with hybrid inference.
# Note: This process can be greatly accelerated with multi-processing. 
# We will provide a clean implementation of multi-processing in the future.
python 3.Hybrid_inference --root_path $ROOT_PATH

# 6. (Optional) Transform datasets for FMLP with dataset/dataset_transform.ipynb

# 7. (DR4SR) Train a target model on the regenerated dataset.
# (Note 1) Please set the `train_file` option to '_regen' in the corresponding config file `configs/amazon-toys.yaml`.
# (Note 2) You can test the original dataset by setting the `train_file` option to '_ori'.
python -m SASRec -d amazon-toys

# 8. (DR4SR+) Train a target model on the regenerated and personalized dataset.
# First set the `sub_model` option in `configs/metamodel.yaml` to one of the target models.
python -m MetaModel -d amazon-toys

Note: We use post-padding ([1,2,3] -> [1,2,3,0,0]) for all target models except FMLP. For FMLP we use pre-padding ([1,2,3] -> [0,0,1,2,3]), which is consistent with the original implementation of FMLP. We found that post-padding leads to very poor results for FMLP, which may be related to properties of the FFT operation. Therefore, run dataset/dataset_transform.ipynb to transform all datasets before training FMLP.
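
The two padding schemes in the note can be illustrated with a short sketch. This is not the code in dataset/dataset_transform.ipynb: the helper names `pad_sequence` and `post_to_pre` are hypothetical, and the sketch assumes that item ID 0 is reserved for padding and that overly long sequences are truncated to their most recent items.

```python
from typing import List

PAD_ID = 0  # assumption: ID 0 is reserved for padding, as in the note's example


def pad_sequence(items: List[int], max_len: int, pre: bool = False) -> List[int]:
    """Keep the most recent max_len items, then pad with PAD_ID to max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    items = items[-max_len:]  # assumed truncation policy: keep latest items
    padding = [PAD_ID] * (max_len - len(items))
    # Pre-padding puts zeros first; post-padding puts zeros last.
    return padding + items if pre else items + padding


def post_to_pre(padded: List[int]) -> List[int]:
    """Convert a post-padded sequence to a pre-padded one of the same length."""
    items = [i for i in padded if i != PAD_ID]
    return [PAD_ID] * (len(padded) - len(items)) + items


print(pad_sequence([1, 2, 3], 5))            # [1, 2, 3, 0, 0]  (post-padding)
print(pad_sequence([1, 2, 3], 5, pre=True))  # [0, 0, 1, 2, 3]  (pre-padding, FMLP)
print(post_to_pre([1, 2, 3, 0, 0]))          # [0, 0, 1, 2, 3]
```

With pre-padding, the most recent item always sits at the last position regardless of sequence length, which is the layout the original FMLP implementation expects.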


  1. Poster presented at KDD 2024.
  2. Slides presented at KDD 2024.


If you find DR4SR useful, please cite it as:

  title={Dataset Regeneration for Sequential Recommendation},
  author={Yin, Mingjia and Wang, Hao and Guo, Wei and Liu, Yong and Zhang, Suojuan and Zhao, Sirui and Lian, Defu and Chen, Enhong},
  booktitle={Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},


This project is primarily built upon the foundation provided by RecStudio, which is a unified, highly-modularized and recommendation-efficient recommendation library based on PyTorch. The creation of the pre-training dataset for data regeneration relies on the capabilities of Seq2Pat. The implicit gradient optimization framework is modified from AuxiLearn. We extend our gratitude to the developers of these outstanding repositories for their dedicated efforts and contributions.

