Ankit Satpute*, Andre Greiner-Petter, Noah Gießing, Olaf Teschke, Moritz Schubotz, Akiko Aizawa, and Bela Gipp
*Corresponding Author
This repository contains the code and datasets for an aspect-aware, content-based research paper recommendation system designed specifically for mathematics. In mathematics, relatedness between papers is often conceptual rather than based on textual similarity or citation overlap. Existing approaches work well in domains such as computer science or biomedicine; this project addresses the distinct challenges of mathematical literature by modeling connections through shared proof techniques, logical implications, and theoretical generalizations. The project introduces GoldRiM and SilverRiM, the first datasets for aspect-aware mathematical paper recommendation, and presents AchGNN, an aspect-conditioned heterogeneous graph neural network that integrates textual semantics, citation networks, and author relationships. In our experiments, AchGNN consistently outperforms prior recommendation methods across multiple aspects and also generalizes effectively to machine learning literature. The system has been deployed on the MaRDI platform to support mathematical research discovery; an example document with recommendations is available there.
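To make "aspect-conditioned" concrete, the following toy sketch shows how the same query paper can receive different recommendations depending on the aspect. It is not AchGNN: nothing is learned, the graph, embeddings, and aspect vectors are hand-made and hypothetical, and the aggregation is a plain relation-wise mean over citation neighbours and co-authored papers followed by an aspect-specific reweighting and cosine ranking.

```python
import math

# HYPOTHETICAL toy data: 3-dimensional "text" embeddings per paper.
text_emb = {
    "P1": [1.0, 0.0, 0.0],
    "P2": [0.9, 0.1, 0.0],
    "P3": [0.0, 1.0, 0.0],
    "P4": [0.0, 0.2, 0.9],
}
citations = [("P1", "P3"), ("P2", "P4")]             # treated as undirected here
authorship = {"A1": ["P1", "P4"], "A2": ["P2", "P3"]}  # author -> papers

# HYPOTHETICAL aspect vectors that reweight embedding dimensions
# (a learned model would produce these; here they are fixed by hand).
aspect_emb = {
    "proof_technique": [1.0, 0.2, 0.2],
    "generalization": [0.2, 0.2, 1.0],
}


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def propagate():
    """One round of heterogeneous message passing: average a paper's own
    embedding with the mean of its citation neighbours and the mean of
    papers that share an author with it (one message per relation type)."""
    out = {}
    for p, h in text_emb.items():
        cit_nb = [text_emb[b] for a, b in citations if a == p]
        cit_nb += [text_emb[a] for a, b in citations if b == p]
        auth_nb = [text_emb[q] for papers in authorship.values()
                   if p in papers for q in papers if q != p]
        parts = [h] + [mean(nb) for nb in (cit_nb, auth_nb) if nb]
        out[p] = mean(parts)
    return out


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def recommend(query, aspect, k=2):
    """Rank all other papers by cosine similarity after conditioning every
    propagated embedding on the chosen aspect (element-wise reweighting)."""
    h = propagate()
    w = aspect_emb[aspect]
    cond = {p: [x * y for x, y in zip(vec, w)] for p, vec in h.items()}
    scores = {p: cosine(cond[query], cond[p]) for p in cond if p != query}
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend("P1", "proof_technique"))
print(recommend("P1", "generalization"))
```

With these made-up numbers, the top recommendation for `P1` stays the same, but the second one changes with the aspect (`P3` for `proof_technique`, `P4` for `generalization`), which is the behaviour an aspect-aware recommender is meant to capture.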
This repository includes the complete source code (src), datasets (data), and supplementary materials (material).
This repository includes scripts to obtain and prepare the two datasets:
- SilverRiM: Automatically generated aspect-aware recommendations
- GoldRiM: Manually curated gold-standard recommendations
See data/README.md for detailed setup instructions and dataset descriptions.
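Conceptually, an aspect-aware recommendation pairs a query paper with a related paper *and* the aspect that explains the relation, so one query can have a different list of related papers per aspect. The minimal sketch below pictures the data this way; the `AspectRec` class, its field names, the aspect labels, and the `source` tag distinguishing gold from silver entries are all hypothetical. The actual file formats are described in data/README.md, and the real aspect labels are defined in the supplementary material.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AspectRec:
    # HYPOTHETICAL schema, for illustration only.
    query_id: str      # paper the user is currently reading
    candidate_id: str  # recommended paper
    aspect: str        # why the two papers are related
    source: str        # "gold" (manually curated) or "silver" (automatic)


def recommendations_by_aspect(records, query_id, source=None):
    """Group the recommended papers for one query by aspect,
    optionally keeping only gold or only silver records."""
    grouped = defaultdict(list)
    for r in records:
        if r.query_id == query_id and (source is None or r.source == source):
            grouped[r.aspect].append(r.candidate_id)
    return dict(grouped)


records = [
    AspectRec("Q1", "C1", "proof_technique", "gold"),
    AspectRec("Q1", "C2", "generalization", "gold"),
    AspectRec("Q1", "C3", "proof_technique", "silver"),
    AspectRec("Q2", "C1", "implication", "silver"),
]
print(recommendations_by_aspect(records, "Q1"))
```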
Supplementary materials in `material` include a summary table of existing content-based research paper recommendation (CbRPR) datasets, alternative visualizations of the results, definitions of the aspect labels, and additional plots.
All source code is available in `src`. Any experiments, scripts, or other attempts to reproduce the data require the Python virtual environment set up as follows:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

All results from the paper can be reproduced using the scripts below:
| Figure/Table | Script | Description |
|---|---|---|
| Figure 3 | `python src/data_stats/n_gram_overlap.py` | N-gram overlap analysis |
| Figure 4 | `python src/data_stats/cosScores_overlap.py` | Cosine similarity scores (pre-computed; run `src/eval/createIndexes.py` to compute fresh scores) |
| Table 1 | `python src/data_stats/print_data_stats.py` | Dataset statistics (requires the dataframes from `data/` to be loaded) |
| Table 2 | `bash src/get_eval_goldRiM_silverRiM.sh` | GoldRiM vs. SilverRiM evaluation |
| Table 3 | `bash src/get_eval_pwc.sh` | Papers with Code evaluation |
| Figures 7–8 | `bash src/ablation_.sh` | Ablation study results |
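The overlap analyses behind Figures 3 and 4 ask how much related papers resemble each other on the surface. As a rough, self-contained illustration of the two measures, here is a sketch using Jaccard overlap of word n-gram sets and cosine similarity of term-frequency vectors. These are stand-ins chosen for illustration: the repository scripts may tokenize differently, and the cosine scores for Figure 4 may come from embedding indexes built by `src/eval/createIndexes.py` rather than from word counts.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Set of lower-cased, whitespace-tokenized word n-grams."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(a, b, n=2):
    """Jaccard overlap of word n-gram sets (0 = disjoint, 1 = identical)."""
    ga, gb = word_ngrams(a, n), word_ngrams(b, n)
    if not ga and not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def cosine_bow(a, b):
    """Cosine similarity of bag-of-words term-frequency vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


a = "we prove the conjecture using a compactness argument"
b = "a compactness argument yields the main theorem"
print(ngram_overlap(a, b))  # 2 shared bigrams out of 11 distinct ones
print(cosine_bow(a, b))
```

Two abstracts can share a proof technique while sharing only a couple of bigrams, which is why low surface overlap alone does not indicate that mathematical papers are unrelated.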
If you use or refer to our paper in your research or applications, please cite it using the following BibTeX:
```bibtex
@inproceedings{Satpute2026,
  title = {Aspect-Aware Content-Based Recommendations for Mathematical Research Papers},
  author = {Satpute, Ankit and Greiner-Petter, Andre and Giessing, Noah and Teschke, Olaf and Schubotz, Moritz and Aizawa, Akiko and Gipp, Bela},
  year = 2026,
  month = {July},
  booktitle = {Proceedings of 49th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’26)},
  publisher = {ACM},
  address = {Melbourne | Naarm, Australia},
  topic = {rec}
}
```

License: CC-BY-SA 4.0. The dataset includes non-copyrighted bibliographic metadata and reference data derived from I4OSC (CC0).