avnlp/biothink

BioThink: Self-Reflective Reasoning for Biomedical Question Answering

Introduction

Large Language Models (LLMs) perform well across a wide range of tasks, including question answering, text generation, and reasoning. However, without extensive pre-training on domain-specific data, they often struggle with specialized tasks such as biomedical question answering.

Inspired by Self-RAG and building upon Self-BioRAG, we introduce BioThink, a framework that enhances LLMs for biomedical question answering through self-reflection, context grading, relevance assessment, and utility rating. BioThink uses a novel training approach with GRPO (Group Relative Policy Optimization) to fine-tune LLMs to generate structured outputs that include step-by-step reasoning, concise answers, and self-reflection tokens.

Key Features

  • Self-Reflective Generation: BioThink generates outputs in a structured format that includes:
    • Step-by-step reasoning (<think>)
    • Concise answer (<answer>)
    • Contextual relevance assessment (<contextual-relevance>)
    • Answer utility rating (<answer-utility>)
    • Groundness evaluation (<groundness>)
  • Training with GRPO: We use Group Relative Policy Optimization (GRPO) to train the model, incorporating multiple reward functions to ensure:
    • Correctness of the answer
    • Accuracy of self-reflection tokens (utility, relevance, groundness)
    • Proper XML structure and order of tags
    • Faithfulness and relevancy of the answer
  • Efficiency: The model is trained using QLoRA and Unsloth for efficient fine-tuning.
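
To make the GRPO idea above more concrete, here is a minimal sketch of the group-relative advantage computation that gives the method its name. It is not taken from this repository's train.py. The function name `group_relative_advantages` and the example reward values are assumptions for illustration, and the choice of population standard deviation is also an assumption (implementations differ on this detail).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions sampled for the same prompt.

    Each completion's advantage is its reward minus the group mean,
    divided by the group standard deviation (plus eps to avoid division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical total rewards for four completions of a single prompt.
advantages = group_relative_advantages([2.0, 0.5, 0.5, 1.0])
print(advantages)
```

Completions that score above the group average receive a positive advantage and are reinforced. Those below average are discouraged. No separate value model is needed, because the baseline is the group mean.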

Quick Start

Install dependencies and run the main BioThink workflow scripts:

make sync
uv run python src/biothink/self_reflection/data_process/process_data.py
uv run python src/biothink/self_reflection/train.py
uv run python src/biothink/self_reflection/inference/inference_biothink_qwen3.py

Training Steps

1. Data Processing

The Self-BioRAG dataset is processed using the script process_data.py. This script extracts questions, answers, and context, and also prepares labels for groundness, relevance, and utility tokens. The processed dataset is available at avnlp/self_biorag_processed.

2. Model Training

The model is trained using the script train.py. The training process involves:

Structured Generation: The model is trained to generate outputs in the following format:

<think>
... step-by-step reasoning ...
</think>
<answer>
... concise answer ...
</answer>
<contextual-relevance>
[Relevant] or [Irrelevant]
</contextual-relevance>
<answer-utility>
[Utility:5] or [Utility:4] or ... [Utility:1]
</answer-utility>
<groundness>
[Fully supported] or [Partially supported] or [No support/Contradictory]
</groundness>
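
As an illustration of how output in this format could be consumed downstream, the following sketch splits a completion into its five fields with the standard library. The helper name `parse_biothink_output` and the sample text are hypothetical and are not part of the repository.

```python
import re

# Tag names taken from the output format shown above.
TAGS = ["think", "answer", "contextual-relevance", "answer-utility", "groundness"]


def parse_biothink_output(text):
    """Return {tag: inner text} for each tag; tags that are missing map to None."""
    fields = {}
    for tag in TAGS:
        name = re.escape(tag)
        match = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
        fields[tag] = match.group(1).strip() if match else None
    return fields


# Made-up completion used only to exercise the parser.
sample = """<think>
The context names metformin as the first-line drug for type 2 diabetes.
</think>
<answer>
Metformin
</answer>
<contextual-relevance>
[Relevant]
</contextual-relevance>
<answer-utility>
[Utility:5]
</answer-utility>
<groundness>
[Fully supported]
</groundness>"""

fields = parse_biothink_output(sample)
print(fields["answer"], fields["answer-utility"], fields["groundness"])
```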

Reward Functions: The training uses GRPO with the following rewards:

  • Correctness Reward: Measures answer correctness using DeepEval's GEval metric with a custom LLM-as-a-Judge instruction tailored for biomedical question answering.
  • Utility Reward: Ensures the correct Utility token is generated.
  • Relevance Reward: Ensures the correct Relevance token is generated.
  • Groundness Reward: Ensures the correct Groundness token is generated.
  • XML Structure Reward: Checks for the presence and proper opening/closing of all required tags.
  • Structure Order Reward: Ensures the tags appear in the correct order and that no extra text is present outside the tags.
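
The two structural rewards in the list above can be sketched as plain string checks. This is an assumed implementation, not the code in reward_functions.py. The function names, the 0-to-1 score scale, and the per-completion signature are all hypothetical, and a trainer would typically need a wrapper that scores a batch of completions.

```python
import re

# Required tags, in the order given by the output format.
TAGS = ["think", "answer", "contextual-relevance", "answer-utility", "groundness"]


def xml_structure_reward(text):
    """Fraction of required tags that appear exactly once as an open/close pair."""
    ok = sum(
        1
        for tag in TAGS
        if text.count(f"<{tag}>") == 1 and text.count(f"</{tag}>") == 1
    )
    return ok / len(TAGS)


# One pattern that only matches when every tag pair appears in order,
# separated by nothing but whitespace.
_ORDERED = re.compile(
    r"\s*"
    + r"\s*".join(rf"<{re.escape(t)}>.*?</{re.escape(t)}>" for t in TAGS)
    + r"\s*",
    re.DOTALL,
)


def structure_order_reward(text):
    """1.0 if tags are in the required order with no text outside them, else 0.0."""
    return 1.0 if _ORDERED.fullmatch(text) else 0.0


good = (
    "<think>r</think>\n<answer>a</answer>\n"
    "<contextual-relevance>[Relevant]</contextual-relevance>\n"
    "<answer-utility>[Utility:4]</answer-utility>\n"
    "<groundness>[Fully supported]</groundness>"
)
print(xml_structure_reward(good), structure_order_reward(good))
```

Keeping the two checks separate means a completion that has every tag but adds stray text or reorders sections still earns partial credit from the first reward and gets a clear signal from the second.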

3. Model

We fine-tune the Qwen3-1.7B model using GRPO and QLoRA. The trained model is available on Hugging Face: avnlp/BioThink-Qwen3-1.7B.

Training defaults are defined in train_config.py. Set the model choice, dataset source, LoRA parameters, and GRPO settings before launching a run.

4. Evaluation

The model is evaluated using the following metrics:

  1. XML Structure: Checks that the opening and closing tags for reasoning (<think>), answer, contextual-relevance, answer-utility, and groundness are all present.
  2. Utility: Checks that the correct utility token has been generated.
  3. Relevance: Checks that the correct relevance token has been generated.
  4. Groundness: Checks that the correct groundness token has been generated.
  5. Answer Correctness: Checks that the answer is correct using DeepEval's GEval metric with a custom instruction for LLM-as-a-Judge.
  6. Faithfulness: Checks that the answer is faithful to the provided context using DeepEval's Faithfulness LLM-as-a-Judge metric.
  7. Answer Relevancy: Checks that the answer is relevant to the original question using DeepEval's Answer Relevancy LLM-as-a-Judge metric.
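
The token-based metrics (Utility, Relevance, Groundness) amount to exact-match accuracy over the evaluation set. A minimal sketch follows, assuming each prediction is the raw model output and each gold label is the bracketed token as a string. The helper names `extract_token` and `token_accuracy` are hypothetical and are not the functions in metrics.py.

```python
import re


def extract_token(text, tag):
    """Return the bracketed token inside <tag>...</tag>, or None if it is absent."""
    name = re.escape(tag)
    match = re.search(rf"<{name}>\s*(\[[^\]]+\])\s*</{name}>", text)
    return match.group(1) if match else None


def token_accuracy(predictions, gold_tokens, tag):
    """Share of predictions whose token inside `tag` equals the gold token exactly."""
    if not predictions:
        return 0.0
    hits = sum(
        extract_token(pred, tag) == gold
        for pred, gold in zip(predictions, gold_tokens)
    )
    return hits / len(predictions)


# Made-up predictions and labels, used only to show the call.
preds = [
    "<answer-utility>[Utility:5]</answer-utility>",
    "<answer-utility>\n[Utility:3]\n</answer-utility>",
    "no tags at all",
]
gold = ["[Utility:5]", "[Utility:4]", "[Utility:1]"]
print(token_accuracy(preds, gold, "answer-utility"))
```

A prediction with a missing or malformed tag counts as a miss here, so structural failures also lower the token accuracy.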

Repository Structure

src/biothink/
├── __init__.py
└── self_reflection/
    ├── __init__.py
    ├── data_process/
    │   ├── __init__.py
    │   ├── process_data.py
    │   └── subset_data.py
    ├── evaluation/
    │   ├── __init__.py
    │   ├── evaluate_biothink_qwen3.py
    │   ├── evaluate_qwen3.py
    │   └── metrics.py
    ├── inference/
    │   ├── __init__.py
    │   ├── inference_biothink_qwen3.py
    │   └── inference_qwen3.py
    ├── prompts.py
    ├── reward_functions.py
    ├── train.py
    └── train_config.py

Development

Run the local quality checks with:

make lint-check
make lint-typing
make lint-typos

Security checks are available with:

make security-bandit
make security-audit

License

This project is licensed under the MIT License - see the LICENSE file for details.
