GLOW: Graph-Language Co-Reasoning for Agentic Workflow Performance Prediction

Official Implementation of "GLOW: Graph-Language Co-Reasoning for Agentic Workflow Performance Prediction"

[Figure: GLOW architecture (architecture.png)]

πŸš€ Getting Started

1. Set Up the Environment

πŸ’» Software Requirements:

  • Python: 3.11.13
  • CUDA: 12.1

πŸ“¦ Install dependencies:

Install from requirements.txt

 conda install --yes --file requirements.txt # You may need to reinstall torch with pip to match your CUDA version
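After installing, it is worth confirming that the PyTorch build matches your CUDA setup. A minimal sanity check (not part of the repository) is:

 # Sanity check: confirm Python/PyTorch/CUDA versions before training.
 import sys
 import torch

 print("Python:", sys.version.split()[0])           # expected: 3.11.x
 print("PyTorch:", torch.__version__)
 print("CUDA build:", torch.version.cuda)           # expected: 12.1
 print("GPU available:", torch.cuda.is_available())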

2. Download Pretrained Models

Download the following pretrained models from Hugging Face and place them under the llmmodel directory used by the commands below (a download sketch follows the file listing):

 β”œβ”€β”€ Qwen3-1.7B
 β”‚ β”œβ”€β”€ config.json
 β”‚ β”œβ”€β”€ generation_config.json
 β”‚ β”œβ”€β”€ LICENSE
 β”‚ β”œβ”€β”€ merges.txt
 β”‚ β”œβ”€β”€ model-00001-of-00002.safetensors
 β”‚ β”œβ”€β”€ model-00002-of-00002.safetensors
 β”‚ β”œβ”€β”€ model.safetensors.index.json
 β”‚ β”œβ”€β”€ tokenizer.json
 β”‚ β”œβ”€β”€ tokenizer_config.json
 β”‚ └── vocab.json
 β”œβ”€β”€ all-MiniLM-L6-v2
 β”‚ β”œβ”€β”€ config.json
 β”‚ β”œβ”€β”€ config_sentence_transformers.json
 β”‚ β”œβ”€β”€ data_config.json
 β”‚ β”œβ”€β”€ model.safetensors
 β”‚ β”œβ”€β”€ modules.json
 β”‚ β”œβ”€β”€ sentence_bert_config.json
 β”‚ β”œβ”€β”€ special_tokens_map.json
 β”‚ β”œβ”€β”€ tokenizer.json
 β”‚ β”œβ”€β”€ tokenizer_config.json
 β”‚ β”œβ”€β”€ train_script.py
 β”‚ └── vocab.txt
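The models can be fetched with the huggingface_hub client. The snippet below is a sketch that assumes the standard Hugging Face repo IDs (Qwen/Qwen3-1.7B and sentence-transformers/all-MiniLM-L6-v2) and downloads them into the llmmodel directory referenced later in this README:

 # Sketch: download both pretrained models into llmmodel/.
 # Repo IDs are assumed; adjust if you mirror the models elsewhere.
 from huggingface_hub import snapshot_download

 snapshot_download(repo_id="Qwen/Qwen3-1.7B", local_dir="llmmodel/Qwen3-1.7B")
 snapshot_download(repo_id="sentence-transformers/all-MiniLM-L6-v2", local_dir="llmmodel/all-MiniLM-L6-v2")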

3. Prepare Training and Testing Data

πŸ“₯ Download Dataset

Download the FLORA-Bench dataset from here and place it in the data directory.

 β”œβ”€β”€ data
 β”‚ β”œβ”€β”€ Coding-AF
 β”‚ β”‚ β”œβ”€β”€ test.jsonl
 β”‚ β”‚ β”œβ”€β”€ train.jsonl
 β”‚ β”‚ β”œβ”€β”€ val.jsonl
 β”‚ β”œβ”€β”€ Coding-GD
 β”‚ β”œβ”€β”€ Math-AF
 β”‚ β”œβ”€β”€ Math-GD
 β”‚ β”œβ”€β”€ Reason-AF
 β”‚ └── Reason-GD

πŸ› οΈ Construct Pre-finetuning Data

To build the dataset for LLM Pre-finetuning, run the following command:

 python make_llm_prefinetuning_data.py --data_path ./data

βš™οΈ Arguments

  • --data_path: Path to the dataset directory.

This script processes the raw dataset and constructs formatted data suitable for LLM Pre-finetuning.
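As a quick sanity check, you can inspect the first record of the generated file (the path below matches the --dataset value used in the next step; the exact field names depend on the script):

 # Sketch: peek at the generated pre-finetuning data.
 import json

 with open("data/prefinetuning.jsonl", "r", encoding="utf-8") as f:
     first = json.loads(f.readline())
 print(list(first.keys()))   # field names are defined by make_llm_prefinetuning_data.py
 print(str(first)[:500])     # preview of the first record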

4. LLM Pre-Finetuning (Generating the graph-oriented LLM)

βš™οΈ Configuration

Open the script pre-finetuning_LLM.sh and set the following parameters:

  1. Visible GPUs. Modify the line:

    export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

    to match the GPUs you want to use.

  2. Model path β€” specify your base LLM model, e.g.:

    --model llmmodel/Qwen3-1.7B
  3. Dataset path β€” specify the dataset path generated in the previous step, e.g.:

    --dataset GLOW/data/prefinetuning.jsonl

πŸ“ˆ Run

To pre-finetune the LLM using LoRA, simply run:

bash pre-finetuning_LLM.sh

This script automatically detects the number of available GPUs and launches distributed training.
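If you want to confirm what the script will see before launching, GPU detection generally reduces to counting the visible devices. The snippet below is illustrative only; the actual launch logic lives in pre-finetuning_LLM.sh:

 # Illustration: how many GPUs will be visible to the distributed launcher.
 import os
 import torch

 visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
 print("CUDA_VISIBLE_DEVICES:", visible or "(unset: all GPUs visible)")
 print("GPUs detected:", torch.cuda.device_count())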

🧾 Output

All training logs and checkpoints will be saved in:

outputs/prefinetuning/

🧠 Merge LoRA Weights with the Base Model

After pre-finetuning is complete, merge the LoRA-adapted parameters into the base LLM checkpoint to obtain a standalone pretrained model.

Run the following Python script:

python combine_lora.py \
 --peft FLORA/outputs/prefinetuning/v0-20251015-105151/checkpoint-7300 \
 --checkpoint llmmodel/Qwen3-1.7B \
 --save_path FLORA/outputs/prefinetuning/graph_oriented_LLM

βš™οΈ Arguments

  • --peft: Path to the LoRA fine-tuned checkpoint, e.g., FLORA/outputs/prefinetuning/v0-20251015-105151/checkpoint-7300
  • --checkpoint: Path to the base LLM model, e.g., llmmodel/Qwen3-1.7B
  • --save_path: Path to save the merged model, e.g., FLORA/outputs/prefinetuning/graph_oriented_LLM

This script merges the LoRA adapter weights into the base model and saves a standalone model (graph-oriented LLM) ready for downstream training or inference.
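For reference, the merge step typically looks like the sketch below, assuming combine_lora.py relies on the PEFT merge_and_unload API (the paths mirror the example command above):

 # Sketch of a LoRA merge with PEFT; combine_lora.py is the authoritative implementation.
 from transformers import AutoModelForCausalLM, AutoTokenizer
 from peft import PeftModel

 base = AutoModelForCausalLM.from_pretrained("llmmodel/Qwen3-1.7B")
 tokenizer = AutoTokenizer.from_pretrained("llmmodel/Qwen3-1.7B")

 # Load the LoRA adapter on top of the base model, then fold its weights into the base weights.
 adapter_path = "FLORA/outputs/prefinetuning/v0-20251015-105151/checkpoint-7300"
 merged = PeftModel.from_pretrained(base, adapter_path).merge_and_unload()

 merged.save_pretrained("FLORA/outputs/prefinetuning/graph_oriented_LLM")
 tokenizer.save_pretrained("FLORA/outputs/prefinetuning/graph_oriented_LLM")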

5. Train and Evaluate

Run the following command (e.g., for the Coding-AF domain):

python train.py \
 --data_path ./data/Coding-AF \
 --llm_model_path FLORA/outputs/prefinetuning/graph_oriented_LLM \
 --st_model_path llmmodel/all-MiniLM-L6-v2

βš™οΈ Arguments

  • --data_path: Path to the root dataset directory, e.g., ./data/Coding-AF
  • --llm_model_path: Path to the pre-finetuned (merged) graph-oriented LLM, e.g., FLORA/outputs/prefinetuning/graph_oriented_LLM
  • --st_model_path: Path to the sentence transformer model, e.g., llmmodel/all-MiniLM-L6-v2 (see the sketch after this list)
  • --hidden_dim: Hidden layer dimension (default: 256)
  • --n_gnn_layers: Number of GNN layers (default: 2)
  • --dropout: Dropout rate (default: 0.2)
  • --batch_size: Batch size for training and evaluation (default: 512)
  • --pretrain_batch_size: Batch size for pretraining (default: 64)
  • --epochs: Number of training epochs (default: 200)
  • --lr: Learning rate (default: 1e-4)
  • --weight_decay: Weight decay for the optimizer (default: 1e-4)
  • --seed: Random seed for reproducibility (default: 42)
  • --n_mlplayers: Number of MLP layers (default: 2)
  • --patience: Early stopping patience (default: 30)
  • --contrastive_weight: Weight of the contrastive loss (default: 1)
  • --margin: Contrastive loss margin (default: 0.2)
  • --pretrain_steps: Number of pretraining steps (default: 1000)
  • --eval_steps: Evaluation interval in epochs (default: 1)
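The sentence transformer given by --st_model_path is a standard text-embedding model. As a rough illustration of what it contributes on the graph side (the actual node-feature pipeline is defined in train.py), it maps each node's text to a fixed-size vector:

 # Illustration only: encode node text with the sentence transformer.
 # The node text below is a made-up example; train.py builds the real node features.
 from sentence_transformers import SentenceTransformer

 st = SentenceTransformer("llmmodel/all-MiniLM-L6-v2")
 node_texts = ["Example agent node: solve the given math problem step by step"]
 embeddings = st.encode(node_texts)   # all-MiniLM-L6-v2 produces 384-dimensional embeddings
 print(embeddings.shape)              # (1, 384)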
