FlyingFeather/DEA-SQL

Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm

🔥🔥 2024.05. DEA-SQL is accepted by Findings of ACL 2024!

Based on the idea of Decomposition for Enhancing Attention, we propose DEA-SQL, a workflow-paradigm method with five major steps, as shown in the figure. Check out our paper for more information.

[Figure: model]

Set Up

Environment

# 1. Clone the repo
git clone https://github.com/FlyingFeather/DEA-SQL.git
cd DEA-SQL && mkdir data
# 2. Make a conda environment
conda create -n deasql python=3.9
conda activate deasql
# 3. Install requirements
pip install -r requirements.txt
python nltk_downloader.py

Dataset

Download the dataset from the official Spider website into the DEA-SQL directory, then unzip it into the data folder. If the dataset cannot be downloaded from the official Spider website, we also provide a copy on drive.

mkdir -p data
unzip spider.zip -d data

The directory structure should be as follows:

.
├── argsparser.py
├── common
├── correct_sql.py
├── data
│   └── spider
│       ├── ...
│       └── database
├── data_preprocess.py
├── docs
├── evaluation
├── fewshot
├── filter_characters.py
├── gen_sql.py
├── get_ner.py
├── hardness_eval.py
├── __init__.py
├── LICENSE
├── llm
├── logger.py
├── main.py
├── nltk_downloader.py
├── outputs
├── prompt
├── README.md
├── requirements.txt
└── single_eval.py
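
Before running anything, it can help to confirm that the Spider files landed where the tree expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, and the two paths checked are the `data/spider/database` folder from the tree and the `data/spider/tables.json` file that the evaluation command in this README refers to.

```python
from pathlib import Path

# Paths taken from the directory tree and from the evaluation command
# in this README (./data/spider/database and ./data/spider/tables.json).
EXPECTED_PATHS = (
    "data/spider/database",
    "data/spider/tables.json",
)


def missing_paths(repo_root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the DEA-SQL directory.
    missing = missing_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Spider data layout looks complete.")
```

An empty result means both paths exist; anything listed usually points to unzipping into the wrong folder.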

Usage

Please modify the OpenAI configuration in common/static_config.py and configure the relevant environment variables for the Azure OpenAI API.
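
A quick check for unset environment variables can save a failed run. This is only a sketch under an assumption: the README does not list the variable names, so the defaults below are the ones the `openai` Python package's `AzureOpenAI` client reads by default. DEA-SQL may expect different names; check common/static_config.py and pass the real ones to the hypothetical helper `missing_env_vars`.

```python
import os

# Assumption: default names read by the `openai` package's AzureOpenAI client.
# The repository may use other names; see common/static_config.py.
ASSUMED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "OPENAI_API_VERSION",
)


def missing_env_vars(names=ASSUMED_VARS, env=None):
    """Return the variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Set these variables first:", ", ".join(missing))
```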

Several important parameters:

  • dataset: The name of the dataset.
  • few_shot_mode: The few-shot retrieval method; one of [random, ques_tim, masked_ques_sim].
  • few_shot_data: The data used for few-shot retrieval; one of [train_merge_v1, train_merge_v5].
  • insert_value: The number of lines inserted into the database prompt.
  • embedding_base_model: The base embedding model used in the few-shot retrieval step.
  • sc_filter_nums: The number of information filter layers.
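
To give a feel for what a masked-question-similarity mode such as `masked_ques_sim` might do, here is a rough sketch. It is not the repository's implementation. The masking rule (replace quoted strings and numbers with placeholders) is a guess, and a bag-of-words cosine similarity stands in for the embedding model selected by `embedding_base_model`. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def mask_question(question):
    """Replace quoted strings and numbers with placeholders (hypothetical rule)."""
    masked = re.sub(r"'[^']*'|\"[^\"]*\"", "<value>", question)
    masked = re.sub(r"\b\d+(?:\.\d+)?\b", "<num>", masked)
    return masked.lower()


def bow_vector(text):
    """Bag-of-words counts; a simple stand-in for a real embedding model."""
    return Counter(re.findall(r"<\w+>|\w+", text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_few_shot(question, pool, k=3):
    """Return the k pool questions most similar to the masked input question."""
    query = bow_vector(mask_question(question))
    scored = [(cosine(query, bow_vector(mask_question(q))), q) for q in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [q for _, q in scored[:k]]


if __name__ == "__main__":
    pool = [
        "How many singers are older than 45?",
        "List the names of all stadiums.",
        "How many concerts were held in 2014?",
    ]
    print(retrieve_few_shot("How many singers are older than 30?", pool, k=1))
```

Masking makes the two "older than" questions identical, so retrieval keys on question structure rather than on the specific values mentioned.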

Quick Start

Prediction on the Spider dev set

python main.py --save_file_name "dea-sql.txt" --dataset "spider" --mode "dev" --sample "False" --few_shot_mode "masked_ques_sim" --insert_value 3 --embedding_base_model "openai" --sc_filter_nums 3 --few_shot_data "train_merge_v5"
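
The command above passes `--sample "False"` as a string, which matters if you wrap or extend the CLI: with plain `type=bool`, argparse would treat any non-empty string, including "False", as true. The sketch below shows one way such flags could be parsed. It is hypothetical and is not the repository's argsparser.py: the flag names come from the command above, while the types and defaults are assumptions.

```python
import argparse


def str2bool(value):
    """Convert strings such as "False" or "true" to a bool."""
    text = str(value).strip().lower()
    if text in {"true", "1", "yes"}:
        return True
    if text in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    # Hypothetical parser: flag names are taken from the README command,
    # types and defaults are guesses.
    parser = argparse.ArgumentParser(description="DEA-SQL prediction (sketch)")
    parser.add_argument("--save_file_name", type=str, required=True)
    parser.add_argument("--dataset", type=str, default="spider")
    parser.add_argument("--mode", type=str, default="dev")
    parser.add_argument("--sample", type=str2bool, default=False)
    parser.add_argument("--few_shot_mode", type=str, default="masked_ques_sim")
    parser.add_argument("--few_shot_data", type=str, default="train_merge_v5")
    parser.add_argument("--insert_value", type=int, default=3)
    parser.add_argument("--embedding_base_model", type=str, default="openai")
    parser.add_argument("--sc_filter_nums", type=int, default=3)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```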

Evaluation on the Spider dev set

Before the first evaluation, please run: python nltk_downloader.py

python evaluation/test-suite-sql-eval/evaluation.py --gold "evaluation/gold_files/spider_dev_gold.sql" --pred "outputs/spider/dea-sql.txt" --db ./data/spider/database --print_file_name "outputs/spider/spider-dea-sql.txt" --table './data/spider/tables.json' --etype exec
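
The `--etype exec` flag selects execution-based evaluation, where a prediction counts as correct if it returns the same result as the gold query when run against the database. The sketch below illustrates that idea on a tiny in-memory SQLite database. It is a simplification and not the test-suite's implementation: it compares rows as an unordered multiset on a single database, and the table, queries, and function names are made up for illustration.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Execute sql and return its rows as a multiset, or None if it fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_match(conn, gold_sql, pred_sql):
    """True when both queries run and return the same multiset of rows."""
    gold = run_query(conn, gold_sql)
    pred = run_query(conn, pred_sql)
    return gold is not None and gold == pred


def demo():
    """Compare one gold query with an equivalent and a wrong prediction."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE singer (name TEXT, age INTEGER);
        INSERT INTO singer VALUES ('Ann', 25), ('Bob', 41), ('Cy', 33);
        """
    )
    gold = "SELECT count(*) FROM singer WHERE age > 30"
    good = "SELECT count(name) FROM singer WHERE age >= 31"
    bad = "SELECT count(*) FROM singer"
    return execution_match(conn, gold, good), execution_match(conn, gold, bad)


if __name__ == "__main__":
    print(demo())
```

The second prediction differs textually from the gold query but still matches, which is why execution-based scoring is more forgiving than exact string matching.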

Citing DEA-SQL

@article{xie2024decomposition,
 title={Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm}, 
 author={Yuanzhen Xie and Xinzhou Jin and Tao Xie and MingXiong Lin and Liang Chen and Chenyun Yu and Lei Cheng and ChengXiang Zhuo and Bo Hu and Zang Li},
 journal={arXiv preprint arXiv:2402.10671},
 year={2024}
}

About

[ACL Findings 2024] Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm
