Official implementation of "LLM as Graph Kernel: Rethinking Message Passing on Text-Rich Graphs".
Overview of RAMP: Raw-text Anchored Message Passing.
Text-rich graphs integrate complex structural dependencies with abundant textual information. Conventional methods compress text into static embeddings before structural reasoning, creating an information bottleneck.
We introduce RAMP (Raw-text Anchored Message Passing), which recasts the LLM as a graph-native aggregation operator. RAMP anchors inference on each node's raw text while propagating optimized messages from neighbors, handling both discriminative and generative tasks under a unified formulation.
- [2026-03] Code and pretrained models released.
- Python 3.8+
- PyTorch 2.0+
- CUDA 11.7+ (for GPU training)
```bash
# Clone the repository
git clone https://github.com/codefuse-ai/CodeFuse-RAMP.git
cd CodeFuse-RAMP

# Install dependencies
pip install -r requirements.txt
```

We provide pretrained RAMP weights based on Qwen2.5-7B-Instruct:
| Model | Base Model | Pretrain Data | Link |
|---|---|---|---|
| RAMP_7B_mp_1_ratio_0.1 | Qwen2.5-7B-Instruct | Academic Graphs | HuggingFace |
| RAMP_7B_mp_2_ratio_0.1 | Qwen2.5-7B-Instruct | Academic Graphs | HuggingFace |
Note: RAMP is built on the Qwen2.5 architecture, so other Qwen2.5 variants should also be compatible.
The `mp` in model names refers to the number of message passing layers. Due to an initialization layer in the implementation, the corresponding training argument is `mp + 1` (e.g., `RAMP_7B_mp_1` corresponds to `--mp 2` in training scripts).
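The checkpoint-name-to-training-argument convention can be captured in a small helper. This is an illustrative sketch only: the function name and parsing logic are ours, not part of the released code.

```python
import re

def training_mp_arg(model_name: str) -> int:
    """Map a checkpoint name like 'RAMP_7B_mp_1_ratio_0.1' to the value
    passed as --mp during training: the mp in the name, plus one for the
    initialization layer."""
    match = re.search(r"_mp_(\d+)", model_name)
    if match is None:
        raise ValueError(f"no mp field in model name: {model_name}")
    return int(match.group(1)) + 1

# 'RAMP_7B_mp_1_ratio_0.1' was trained with --mp 2
print(training_mp_arg("RAMP_7B_mp_1_ratio_0.1"))  # → 2
```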
The pretraining datasets are available on HuggingFace. Download the files and place them in ./data/pretrained_data/:
```
data/
└── pretrained_data/
    ├── pretrained_Economics.json
    ├── pretrained_Mathematics.json
    └── pretrained_Geology.json
```
The Cora finetuning dataset is available on HuggingFace. Download the files and place them in ./data/finetuned_data/cora/:
```
data/
└── finetuned_data/
    └── cora/
        ├── finetuned_cora_v1.json      # Training set
        ├── finetuned_cora_val_v1.json  # Validation set
        └── eval_cora_v1.json           # Test set
```
We also provide the data preprocessing script in data_preprocess/cora/ as a reference for constructing your own text-rich graph datasets.
Download Raw Data for Cora:
Download the raw Cora dataset from here and place it in ./data_preprocess/cora/raw/.
Provided Raw Data:
```
data_preprocess/cora/raw/
├── data.csv       # Paper labels
├── graph.csv      # Citation edges
└── node_info.csv  # Paper titles and abstracts
```
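As a rough illustration of what a preprocessing step over these three files looks like, the sketch below joins labels, edges, and node text into per-node records. The column names (`node_id`, `label`, `src`, `dst`, `title`, `abstract`) and the output schema are assumptions for illustration; the released generate_fine_tuned_data_cora.py defines the actual prompt format.

```python
import csv
import json
from collections import defaultdict

def build_records(data_csv, graph_csv, node_info_csv):
    # Node labels from data.csv (assumed columns: node_id, label)
    labels = {}
    with open(data_csv, newline="") as f:
        for row in csv.DictReader(f):
            labels[row["node_id"]] = row["label"]

    # Citation edges from graph.csv (assumed columns: src, dst)
    neighbors = defaultdict(list)
    with open(graph_csv, newline="") as f:
        for row in csv.DictReader(f):
            neighbors[row["src"]].append(row["dst"])

    # Titles and abstracts from node_info.csv (assumed columns:
    # node_id, title, abstract); one JSON-serializable record per node.
    records = []
    with open(node_info_csv, newline="") as f:
        for row in csv.DictReader(f):
            nid = row["node_id"]
            records.append({
                "node_id": nid,
                "text": f'{row["title"]}. {row["abstract"]}',
                "neighbors": neighbors[nid],
                "label": labels.get(nid),
            })
    return records
```

Each record keeps the node's raw text alongside its neighbor list, which is the information RAMP's message passing operates on.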
Generate Finetuning Data:
```bash
cd data_preprocess/cora
python generate_fine_tuned_data_cora.py
```

RAMP training consists of two stages: pretraining and finetuning.
Pretrain RAMP on academic text-rich graphs:
```bash
bash pretrain_multi_node.sh qwen_7b 2 0.1
```

Arguments:
- `$1`: Base model name (`qwen_7b`)
- `$2`: Number of message passing layers
- `$3`: Compression ratio
Finetune on downstream text-rich graph tasks:
```bash
bash finetune_multi_node.sh qwen_7b cora 2
```

Arguments:
- `$1`: Base model name (`qwen_7b`)
- `$2`: Dataset name (`cora`)
- `$3`: Number of message passing layers
After finetuning, merge LoRA weights into the base model:
```bash
bash merge_model.sh qwen_7b cora 2
```

Evaluate the finetuned model on the test set:

```bash
bash eval_graph.sh qwen_7b cora 2
```

Arguments:
- `$1`: Model name
- `$2`: Dataset name
- `$3`: Number of message passing layers
Results will be saved to ./qwen_7b/preds_academic_cora_v1.jsonl.
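A quick way to score the saved predictions is a small accuracy pass over the JSONL file. The field names `pred` and `label` here are assumptions, not the confirmed schema; inspect one line of the actual output file before relying on this.

```python
import json

def accuracy(jsonl_path: str) -> float:
    """Fraction of records whose predicted label matches the gold label,
    compared case-insensitively. Assumes each line is a JSON object with
    'pred' and 'label' string fields (hypothetical schema)."""
    correct = total = 0
    with open(jsonl_path) as f:
        for line in f:
            rec = json.loads(line)
            total += 1
            correct += rec["pred"].strip().lower() == rec["label"].strip().lower()
    return correct / total if total else 0.0
```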
If you find RAMP useful in your research, please consider citing:
```bibtex
@article{zhang2026llm,
  title={LLM as Graph Kernel: Rethinking Message Passing on Text-Rich Graphs},
  author={Zhang, Ying and Yu, Hang and Zhang, Haipeng and Di, Peng},
  journal={arXiv preprint arXiv:2603.14937},
  year={2026}
}
```

This codebase is developed based on FocusLLM and Activation-Beacon. We thank the authors for their great work.
For questions and feedback, please open an issue on GitHub.
