Official implementation of "LLM as Graph Kernel: Rethinking Message Passing on Text-Rich Graphs".
Overview of RAMP: Raw-text Anchored Message Passing.
Text-rich graphs integrate complex structural dependencies with abundant textual information. Conventional methods compress text into static embeddings before structural reasoning, creating an information bottleneck.
We introduce RAMP (Raw-text Anchored Message Passing), which recasts the LLM as a graph-native aggregation operator. RAMP anchors inference on each node's raw text while propagating optimized messages from neighbors, handling both discriminative and generative tasks under a unified formulation.
- [2026-03] Code and pretrained models released.
- Python 3.8+
- PyTorch 2.0+
- CUDA 11.7+ (for GPU training)
```bash
# Clone the repository
git clone https://github.com/codefuse-ai/CodeFuse-RAMP.git
cd CodeFuse-RAMP

# Install dependencies
pip install -r requirements.txt
```

We provide pretrained RAMP weights based on Qwen2.5-7B-Instruct:
| Model | Base Model | Pretrain Data | Link |
|---|---|---|---|
| RAMP_7B_mp_1_ratio_0.1 | Qwen2.5-7B-Instruct | Academic Graphs | HuggingFace |
| RAMP_7B_mp_2_ratio_0.1 | Qwen2.5-7B-Instruct | Academic Graphs | HuggingFace |
Note: RAMP is built on the Qwen2.5 architecture, so other Qwen2.5 variants should also be compatible.
The `mp` in model names refers to the number of message passing layers. Due to an initialization layer in the implementation, the corresponding training argument is `mp + 1` (e.g., `RAMP_7B_mp_1` corresponds to `--mp 2` in training scripts).
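The checkpoint-name-to-training-argument convention can be captured in a small helper. This is an illustrative sketch only: the function name and parsing logic are ours, not part of the released code.

```python
import re

def training_mp_arg(model_name: str) -> int:
    """Map a checkpoint name like 'RAMP_7B_mp_1_ratio_0.1' to the value
    passed as --mp during training: the mp in the name, plus one for the
    initialization layer."""
    match = re.search(r"_mp_(\d+)", model_name)
    if match is None:
        raise ValueError(f"no mp field in model name: {model_name}")
    return int(match.group(1)) + 1

# 'RAMP_7B_mp_1_ratio_0.1' was trained with --mp 2
print(training_mp_arg("RAMP_7B_mp_1_ratio_0.1"))  # → 2
```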
The pretraining datasets are available on HuggingFace. Download the files and place them in ./data/pretrained_data/:
```
data/
└── pretrained_data/
    ├── pretrained_Economics.json
    ├── pretrained_Mathematics.json
    └── pretrained_Geology.json
```
The Cora finetuning dataset is available on HuggingFace. Download the files and place them in ./data/finetuned_data/cora/:
```
data/
└── finetuned_data/
    └── cora/
        ├── finetuned_cora_v1.json      # Training set
        ├── finetuned_cora_val_v1.json  # Validation set
        └── eval_cora_v1.json           # Test set
```
We also provide the data preprocessing script in data_preprocess/cora/ as a reference for constructing your own text-rich graph datasets.
Download Raw Data for Cora:
Download the raw Cora dataset from here and place it in ./data_preprocess/cora/raw/.
Provided Raw Data:
```
data_preprocess/cora/raw/
├── data.csv       # Paper labels
├── graph.csv      # Citation edges
└── node_info.csv  # Paper titles and abstracts
```
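As a rough illustration of what a preprocessing step over these three files looks like, the sketch below joins labels, edges, and node text into per-node records. The column names (`node_id`, `label`, `src`, `dst`, `title`, `abstract`) and the output schema are assumptions for illustration; the released generate_fine_tuned_data_cora.py defines the actual prompt format.

```python
import csv
import json
from collections import defaultdict

def build_records(data_csv, graph_csv, node_info_csv):
    # Node labels from data.csv (assumed columns: node_id, label)
    labels = {}
    with open(data_csv, newline="") as f:
        for row in csv.DictReader(f):
            labels[row["node_id"]] = row["label"]

    # Citation edges from graph.csv (assumed columns: src, dst)
    neighbors = defaultdict(list)
    with open(graph_csv, newline="") as f:
        for row in csv.DictReader(f):
            neighbors[row["src"]].append(row["dst"])

    # Titles and abstracts from node_info.csv (assumed columns:
    # node_id, title, abstract); one JSON-serializable record per node.
    records = []
    with open(node_info_csv, newline="") as f:
        for row in csv.DictReader(f):
            nid = row["node_id"]
            records.append({
                "node_id": nid,
                "text": f'{row["title"]}. {row["abstract"]}',
                "neighbors": neighbors[nid],
                "label": labels.get(nid),
            })
    return records
```

Each record keeps the node's raw text alongside its neighbor list, which is the information RAMP's message passing operates on.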
Generate Finetuning Data:
```bash
cd data_preprocess/cora
python generate_fine_tuned_data_cora.py
```

RAMP training consists of two stages: pretraining and finetuning.
Pretrain RAMP on academic text-rich graphs:
```bash
bash pretrain_multi_node.sh qwen_7b 2 0.1
```

Arguments:
- `$1`: Base model name (`qwen_7b`)
- `$2`: Number of message passing layers
- `$3`: Compression ratio
Finetune on downstream text-rich graph tasks:
```bash
bash finetune_multi_node.sh qwen_7b cora 2
```

Arguments:
- `$1`: Base model name (`qwen_7b`)
- `$2`: Dataset name (`cora`)
- `$3`: Number of message passing layers
After finetuning, merge LoRA weights into the base model:
```bash
bash merge_model.sh qwen_7b cora 2
```

Evaluate the finetuned model on the test set:

```bash
bash eval_graph.sh qwen_7b cora 2
```

Arguments:
- `$1`: Model name
- `$2`: Dataset name
- `$3`: Number of message passing layers
Results will be saved to ./qwen_7b/preds_academic_cora_v1.jsonl.
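A quick way to score the saved predictions is a small accuracy pass over the JSONL file. The field names `pred` and `label` here are assumptions, not the confirmed schema; inspect one line of the actual output file before relying on this.

```python
import json

def accuracy(jsonl_path: str) -> float:
    """Fraction of records whose predicted label matches the gold label,
    compared case-insensitively. Assumes each line is a JSON object with
    'pred' and 'label' string fields (hypothetical schema)."""
    correct = total = 0
    with open(jsonl_path) as f:
        for line in f:
            rec = json.loads(line)
            total += 1
            correct += rec["pred"].strip().lower() == rec["label"].strip().lower()
    return correct / total if total else 0.0
```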
If you find RAMP useful in your research, please consider citing:
```bibtex
@article{zhang2026llm,
  title={LLM as Graph Kernel: Rethinking Message Passing on Text-Rich Graphs},
  author={Zhang, Ying and Yu, Hang and Zhang, Haipeng and Di, Peng},
  journal={arXiv preprint arXiv:2603.14937},
  year={2026}
}
```

This codebase is developed based on FocusLLM and Activation-Beacon. We thank the authors for their great work.
For questions and feedback, please open an issue on GitHub.
