
LLM as Graph Kernel: Rethinking Message Passing on Text-Rich Graphs


Official implementation of "LLM as Graph Kernel: Rethinking Message Passing on Text-Rich Graphs".

Figure: Overview of RAMP (Raw-text Anchored Message Passing).

Abstract

Text-rich graphs integrate complex structural dependencies with abundant textual information. Conventional methods compress text into static embeddings before structural reasoning, creating an information bottleneck.

We introduce RAMP (Raw-text Anchored Message Passing), which recasts the LLM as a graph-native aggregation operator. RAMP anchors inference on each node's raw text while propagating optimized messages from neighbors, handling both discriminative and generative tasks under a unified formulation.

🔥 News

  • [2026-03] Code and pretrained models released.

Installation

Requirements

  • Python 3.8+
  • PyTorch 2.0+
  • CUDA 11.7+ (for GPU training)

Setup

# Clone the repository
git clone https://github.com/codefuse-ai/CodeFuse-RAMP.git
cd CodeFuse-RAMP

# Install dependencies
pip install -r requirements.txt
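Before training, it can help to confirm the environment satisfies the requirements above. The snippet below is an illustrative check, not part of the repository:

```python
import sys

def meets_python_requirement(minimum=(3, 8)):
    """Return True when the running interpreter satisfies the minimum version."""
    return sys.version_info[:2] >= minimum

if not meets_python_requirement():
    raise SystemExit("RAMP requires Python 3.8+")

# PyTorch (2.0+) and CUDA are only needed for GPU training, so this check is soft.
try:
    import torch
    print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch not installed yet; run: pip install -r requirements.txt")
```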

Pretrained Models

We provide pretrained RAMP weights based on Qwen2.5-7B-Instruct:

| Model | Base Model | Pretrain Data | Link |
|---|---|---|---|
| RAMP_7B_mp_1_ratio_0.1 | Qwen2.5-7B-Instruct | Academic Graphs | HuggingFace |
| RAMP_7B_mp_2_ratio_0.1 | Qwen2.5-7B-Instruct | Academic Graphs | HuggingFace |

Note: RAMP is built on the Qwen2.5 architecture; other Qwen2.5 variants should also be compatible.

The mp in model names refers to the number of message passing layers. Due to an initialization layer in the implementation, the corresponding training argument is mp + 1 (e.g., RAMP_7B_mp_1 corresponds to --mp 2 in training scripts).
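The naming convention can be captured in a small helper. This is illustrative only; `training_mp_arg` is not part of the repository:

```python
import re

def training_mp_arg(checkpoint_name):
    """Map a released checkpoint name such as 'RAMP_7B_mp_1_ratio_0.1'
    to the --mp value the training scripts expect (mp + 1, since the
    implementation counts an extra initialization layer)."""
    match = re.search(r"_mp_(\d+)", checkpoint_name)
    if match is None:
        raise ValueError(f"no _mp_<n> field in {checkpoint_name!r}")
    return int(match.group(1)) + 1

print(training_mp_arg("RAMP_7B_mp_1_ratio_0.1"))  # mp_1 checkpoint -> --mp 2
```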

Data Preparation

Pretrain Data

The pretrain dataset is available on HuggingFace. Download the files and place them in ./data/pretrained_data/:

data/
└── pretrained_data/
    ├── pretrained_Economics.json
    ├── pretrained_Mathematics.json
    └── pretrained_Geology.json
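A quick way to verify the downloads is to peek at each file's record count and schema. This sketch assumes each file is a JSON array of objects; inspect your download to confirm the actual field names:

```python
import json
from pathlib import Path

def summarize_pretrain_file(path):
    """Return (record count, sorted keys of the first record) for one
    JSON file. Assumes the file holds a JSON array of objects."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    keys = sorted(records[0]) if records else []
    return len(records), keys

for name in ("pretrained_Economics.json", "pretrained_Mathematics.json",
             "pretrained_Geology.json"):
    path = Path("data/pretrained_data") / name
    if path.exists():
        count, keys = summarize_pretrain_file(path)
        print(f"{name}: {count} records, keys: {keys}")
```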

Finetune Data (Cora Example)

The Cora finetuning dataset is available on HuggingFace. Download the files and place them in ./data/finetuned_data/cora/:

data/
└── finetuned_data/
    └── cora/
        ├── finetuned_cora_v1.json      # Training set
        ├── finetuned_cora_val_v1.json  # Validation set
        └── eval_cora_v1.json           # Test set

We also provide the data preprocessing script in data_preprocess/cora/ as a reference for constructing your own text-rich graph datasets.

Download Raw Data for Cora:

Download the raw Cora dataset from here and place it in ./data_preprocess/cora/raw/.

Provided Raw Data:

data_preprocess/cora/raw/
├── data.csv          # Paper labels
├── graph.csv         # Citation edges
└── node_info.csv     # Paper titles and abstracts
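When adapting the preprocessing to your own graph, the three raw files can be assembled into labels, edges, and node texts along these lines. The column names used below (`paper_id`, `label`, `src`, `dst`, `title`, `abstract`) are assumptions; check the real headers in your download and adjust before use:

```python
import csv
from pathlib import Path

def read_rows(path):
    """Load a CSV into a list of dicts keyed by its header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def load_cora_raw(raw_dir):
    """Assemble labels, citation edges, and node texts from the three
    raw files. Column names here are illustrative assumptions."""
    raw = Path(raw_dir)
    labels = {r["paper_id"]: r["label"] for r in read_rows(raw / "data.csv")}
    edges = [(r["src"], r["dst"]) for r in read_rows(raw / "graph.csv")]
    texts = {r["paper_id"]: (r["title"], r["abstract"])
             for r in read_rows(raw / "node_info.csv")}
    return labels, edges, texts
```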

Generate Finetuning Data:

cd data_preprocess/cora
python generate_fine_tuned_data_cora.py

Training

RAMP training consists of two stages: pretrain and finetune.

Stage 1: Pretrain

Pretrain RAMP on academic text-rich graphs:

bash pretrain_multi_node.sh qwen_7b 2 0.1

Arguments:

  • $1: Base model name (qwen_7b)
  • $2: Number of message passing layers
  • $3: Compression ratio

Stage 2: Finetune

Finetune on downstream text-rich graph tasks:

bash finetune_multi_node.sh qwen_7b cora 2

Arguments:

  • $1: Base model name (qwen_7b)
  • $2: Dataset name (cora)
  • $3: Number of message passing layers

After finetuning, merge LoRA weights into the base model:

bash merge_model.sh qwen_7b cora 2

Evaluation

Evaluate the finetuned model on the test set:

bash eval_graph.sh qwen_7b cora 2

Arguments:

  • $1: Model name
  • $2: Dataset name
  • $3: Number of message passing layers

Results will be saved to ./qwen_7b/preds_academic_cora_v1.jsonl.
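A prediction file in this JSONL form can be scored with a few lines of Python. The field names below (`pred`, `label`) are assumptions; adjust them to whatever eval_graph.sh actually writes:

```python
import json

def exact_match_accuracy(jsonl_path, pred_key="pred", gold_key="label"):
    """Compute exact-match accuracy over a JSONL predictions file,
    skipping blank lines. Field names are illustrative assumptions."""
    correct = total = 0
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            total += 1
            correct += record[pred_key].strip() == record[gold_key].strip()
    return correct / total if total else 0.0
```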

Citation

If you find RAMP useful in your research, please consider citing:

@article{zhang2026llm,
  title={LLM as Graph Kernel: Rethinking Message Passing on Text-Rich Graphs},
  author={Zhang, Ying and Yu, Hang and Zhang, Haipeng and Di, Peng},
  journal={arXiv preprint arXiv:2603.14937},
  year={2026}
}

Acknowledgments

This codebase builds on FocusLLM and Activation-Beacon. We thank the authors for their great work.

Contact

For questions and feedback, please open an issue on GitHub.
