Official Implementation of "GLOW: Graph-Language Co-Reasoning for Agentic Workflow Performance Prediction"
💻 Software Requirements:
- Python: 3.11.13
- CUDA: 12.1
📦 Install dependencies:
Install from requirements.txt:
conda install --yes --file requirements.txt  # You may need to downgrade torch with pip to match your CUDA version
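If you are unsure whether your PyTorch build matches CUDA 12.1, a quick check like the following (a minimal sketch, not part of this repo) prints the versions PyTorch was built against:

```python
# Quick sanity check (not part of the repo): verify the installed PyTorch
# build targets the expected CUDA 12.1 toolkit before training.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA build version:", torch.version.cuda)   # expected: "12.1"
print("CUDA available:", torch.cuda.is_available())
```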
Download the following pretrained models from Hugging Face:
- LLM: Qwen3-1.7B
├── Qwen3-1.7B
│   ├── config.json
│   ├── generation_config.json
│   ├── LICENSE
│   ├── merges.txt
│   ├── model-00001-of-00002.safetensors
│   ├── model-00002-of-00002.safetensors
│   ├── model.safetensors.index.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   └── vocab.json
- Sentence-transformer: all-MiniLM-L6-v2
├── all-MiniLM-L6-v2
│   ├── config.json
│   ├── config_sentence_transformers.json
│   ├── data_config.json
│   ├── model.safetensors
│   ├── modules.json
│   ├── sentence_bert_config.json
│   ├── special_tokens_map.json
│   ├── tokenizer.json
│   ├── tokenizer_config.json
│   ├── train_script.py
│   └── vocab.txt
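If you prefer to fetch the models programmatically, the sketch below uses `huggingface_hub.snapshot_download`; the `llmmodel/` target directory follows the paths used later in this README, and the repo IDs are the standard Hugging Face ones for these models.

```python
# Minimal sketch: download both pretrained models into llmmodel/
# (the directory referenced by later commands in this README).
from huggingface_hub import snapshot_download

snapshot_download(repo_id="Qwen/Qwen3-1.7B",
                  local_dir="llmmodel/Qwen3-1.7B")
snapshot_download(repo_id="sentence-transformers/all-MiniLM-L6-v2",
                  local_dir="llmmodel/all-MiniLM-L6-v2")
```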
📥 Download Dataset
Download the FLORA-Bench dataset from here and place it in the data directory.
├── data
│   ├── Coding-AF
│   │   ├── test.jsonl
│   │   ├── train.jsonl
│   │   └── val.jsonl
│   ├── Coding-GD
│   ├── Math-AF
│   ├── Math-GD
│   ├── Reason-AF
│   └── Reason-GD
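As a quick way to confirm the dataset is in place, you can count the examples in each split. This is a hypothetical check, not a script shipped with the repo:

```python
# Hypothetical sanity check (not part of the repo): confirm each domain
# directory contains the expected train/val/test JSONL splits.
from pathlib import Path

data_root = Path("./data")
for domain in ["Coding-AF", "Coding-GD", "Math-AF", "Math-GD", "Reason-AF", "Reason-GD"]:
    for split in ["train", "val", "test"]:
        path = data_root / domain / f"{split}.jsonl"
        count = sum(1 for _ in path.open()) if path.exists() else "MISSING"
        print(f"{domain}/{split}.jsonl: {count}")
```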
🛠️ Construct Pre-finetuning Data
To build the dataset for LLM Pre-finetuning, run the following command:
python make_llm_prefinetuning_data.py --data_path ./data
⚙️ Arguments
--data_path: Path to the dataset directory.
This script processes the raw dataset and constructs formatted data suitable for LLM Pre-finetuning.
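To spot-check the generated file, you can print the fields of the first record. The output path below is an assumption: it mirrors the GLOW/data/prefinetuning.jsonl path referenced in the next step.

```python
# Hypothetical spot-check (not part of the repo): inspect the first record
# of the generated pre-finetuning data. The output path is assumed to be
# ./data/prefinetuning.jsonl, matching the path used in the next step.
import json

with open("./data/prefinetuning.jsonl") as f:
    first = json.loads(f.readline())
print("Fields:", list(first.keys()))
```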
Open the script pre-finetuning_LLM.sh and set the following parameters:
- Visible GPUs: modify the line
  export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
  to match the GPUs you want to use.
- Model path: specify your base LLM model, e.g.:
  --model llmmodel/Qwen3-1.7B
- Dataset path: specify the dataset path generated in the previous step, e.g.:
  --dataset GLOW/data/prefinetuning.jsonl
To pre-finetune the LLM using LoRA, simply run:
bash pre-finetuning_LLM.sh
This script automatically detects the number of available GPUs and launches distributed training.
All training logs and checkpoints will be saved in:
outputs/prefinetuning/
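Since the merge step below expects a specific checkpoint directory (e.g., checkpoint-7300 inside a timestamped run folder), a small helper like the following (hypothetical, not a repo script) can locate the most recent one:

```python
# Hypothetical helper (not part of the repo): find the most recently
# written checkpoint directory under outputs/prefinetuning/ so it can be
# passed to combine_lora.py via --peft.
from pathlib import Path

ckpts = sorted(Path("outputs/prefinetuning").glob("*/checkpoint-*"),
               key=lambda p: p.stat().st_mtime)
print("Latest checkpoint:", ckpts[-1] if ckpts else "none found")
```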
After pre-finetuning is complete, merge the LoRA-adapted parameters into the base LLM checkpoint to obtain a standalone pretrained model.
Run the following Python script:
python combine_lora.py \
--peft FLORA/outputs/prefinetuning/v0-20251015-105151/checkpoint-7300 \
--checkpoint llmmodel/Qwen3-1.7B \
--save_path FLORA/outputs/prefinetuning/graph_oriented_LLM
⚙️ Arguments
| Argument | Description | Example |
|---|---|---|
| `--peft` | Path to the LoRA fine-tuned checkpoint | `FLORA/outputs/prefinetuning/v0-20251015-105151/checkpoint-7300` |
| `--checkpoint` | Path to the base LLM model | `llmmodel/Qwen3-1.7B` |
| `--save_path` | Path to save the merged model | `FLORA/outputs/prefinetuning/graph_oriented_LLM` |
This script merges the LoRA adapter weights into the base model and saves a standalone model (graph-oriented LLM) ready for downstream training or inference.
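For reference, the sketch below shows roughly what a LoRA merge looks like with the PEFT library's merge_and_unload; it assumes combine_lora.py follows this pattern and is not a substitute for the provided script:

```python
# Conceptual sketch of a LoRA merge (assumes the PEFT library; this is not
# the actual combine_lora.py implementation).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("llmmodel/Qwen3-1.7B")
merged = PeftModel.from_pretrained(
    base, "FLORA/outputs/prefinetuning/v0-20251015-105151/checkpoint-7300"
).merge_and_unload()  # fold adapter weights into the base model

save_path = "FLORA/outputs/prefinetuning/graph_oriented_LLM"
merged.save_pretrained(save_path)
AutoTokenizer.from_pretrained("llmmodel/Qwen3-1.7B").save_pretrained(save_path)
```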
To train the model, run the following command (e.g., for the Coding-AF domain):
python train.py \
--data_path ./data/Coding-AF \
--llm_model_path FLORA/outputs/prefinetuning/graph_oriented_LLM \
--st_model_path llmmodel/all-MiniLM-L6-v2
⚙️ Arguments
| Argument | Description | Example / Default |
|---|---|---|
| `--data_path` | Path to the root dataset directory | `./data/Coding-AF` |
| `--llm_model_path` | Path to the pre-finetuned (merged) LLM model (graph-oriented LLM) | `FLORA/outputs/prefinetuning/graph_oriented_LLM` |
| `--st_model_path` | Path to the sentence-transformer model | `llmmodel/all-MiniLM-L6-v2` |
| `--hidden_dim` | Hidden layer dimension | 256 |
| `--n_gnn_layers` | Number of GNN layers | 2 |
| `--dropout` | Dropout rate | 0.2 |
| `--batch_size` | Batch size for training and evaluation | 512 |
| `--pretrain_batch_size` | Batch size for pretraining | 64 |
| `--epochs` | Number of training epochs | 200 |
| `--lr` | Learning rate | 1e-4 |
| `--weight_decay` | Weight decay for the optimizer | 1e-4 |
| `--seed` | Random seed for reproducibility | 42 |
| `--n_mlplayers` | Number of MLP layers | 2 |
| `--patience` | Early stopping patience | 30 |
| `--contrastive_weight` | Weight for the contrastive loss | 1 |
| `--margin` | Contrastive loss margin | 0.2 |
| `--pretrain_steps` | Number of pretraining steps | 1000 |
| `--eval_steps` | Evaluation interval (in epochs) | 1 |
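Before launching a long training run, it can help to confirm that both model paths load correctly. The smoke test below is a minimal sketch, not a repo script:

```python
# Minimal smoke test (not part of the repo): confirm the merged LLM and the
# sentence transformer can be loaded from the paths passed to train.py.
from transformers import AutoModelForCausalLM, AutoTokenizer
from sentence_transformers import SentenceTransformer

llm_path = "FLORA/outputs/prefinetuning/graph_oriented_LLM"
tokenizer = AutoTokenizer.from_pretrained(llm_path)
llm = AutoModelForCausalLM.from_pretrained(llm_path)

st_model = SentenceTransformer("llmmodel/all-MiniLM-L6-v2")
print("Sentence embedding dim:", st_model.get_sentence_embedding_dimension())
```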