Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Tooooa/AgentMarkBeta

Repository files navigation

AgentMark Logo AgentMark

Behavioral Watermarking Framework for LLM Agents

简体中文 | English

Python Version License


AgentMark is an experimental and evaluation framework for behavioral watermarking of LLM agents, implementing the utility-preserving and distribution-preserving watermark algorithms proposed in the Agent Mark paper.

The project provides a reproducible, modular, and extensible codebase to evaluate watermark performance, robustness, and stealth in complex agent tasks. It decomposes agent decision-making into planning behavior and execution action, embedding watermarks at the planning stage via distribution-preserving sampling to maintain downstream utility while enabling verifiable ownership protection.

✨ Key Features

  • 💎 Utility Preservation: Strict distribution-preserving sampling keeps watermarked behavior statistically indistinguishable from the original.
  • 🛡️ Robustness: Erasure-resilient coding and context-bound randomness handle missing logs and truncated trajectories.
  • 🌍 Multi-environment Support: Covers tool use, embodied intelligence, and social simulations.

🎮 Supported Environments

  • 🛠️ ToolBench: Complex tool-using scenarios with real-world API calls.
  • 🏠 ALFWorld: Text-based interactive household decision tasks.
  • 📱 Oasis (Twitter/Reddit): Social-media behavior watermarking experiments.

📖 Table of Contents


📂 Project Structure

AgentMark/
├── assets/ # Project assets (images, PDF)
├── agentmark/ # Core library: watermark algorithms
│ ├── core/ # Core watermark logic (ECC, sampling)
│ ├── environments/ # Environment adapters (ToolBench, ALFWorld)
│ └── data/ # Bitstreams and configuration data
├── experiments/ # Experimental implementations
│ ├── toolbench/ # ToolBench API tool-use experiments
│ │ ├── scripts/ # Pipeline and analysis scripts
│ │ ├── configs/ # Pipeline config files
│ │ ├── tools/ # Evaluation tools (StableToolBench)
│ │ ├── MarkLLM/ # SynthID watermark library (local mode)
│ ├── alfworld/ # ALFWorld embodied intelligence experiments
│ │ ├── scripts/ # Experiment and analysis scripts
│ │ └── configs/ # Config files
│ ├── oasis_watermark/ # Social-media experiments
│ │ ├── twitter_watermark_experiment/ # Twitter simulation
│ │ ├── reddit_watermark_experiment/ # Reddit simulation
│ │ └── oasis/ # Modified Oasis framework
│ ├── rlnc_trajectory/ # RLNC robustness evaluation
│ │ ├── scripts/ # Erasure eval and FPR analysis
│ │ └── *.json # Config files
│ └── semantic_rewriting/ # Semantic rewriting robustness tests
│ ├── scripts/ # Robustness test scripts
│ └── data/ # Sample task data
├── output/ # Logs, predictions, analysis outputs
├── environment.yml # Conda environment (Python 3.9)
├── requirements.txt # Python dependencies (pip)
├── .env.example # Environment variable template
├── LICENSE # MIT License
└── README.md # English README

🚀 Quick Start

1. ⚙️ Environment Setup

For ToolBench and ALFWorld experiments (Python 3.9)

Use Conda to manage the environment:

# Create and activate environment
conda env create -f environment.yml
conda activate AgentMark
# Or install manually
pip install -r requirements.txt

2. Environment Variables

Copy and edit the environment template:

cp .env.example .env
vim .env
# Fill in your API key (OpenAI / DeepSeek etc.)
# Use 'export KEY=VALUE' format or apply with:
export $(grep -v '^#' .env | xargs)

3. Dataset Preparation

ToolBench

Important

ToolBench dataset is required! You must complete the steps below before running ToolBench experiments.

Download steps:

  1. Download the ToolBench dataset

    From the ToolBench repository, download the full dataset including:

    • queries: test query tasks
    • tools: API tool definitions (16,000+ tools)
    • reference answers: evaluation references
    # Recommended: use Git LFS or download from Releases
    # Dataset size ~2-3 GB
  2. Place into the correct directory

    Put the extracted data folder under experiments/toolbench/data/:

    # Expected structure
    AgentMark/
    └── experiments/
     └── toolbench/
     └── data/
     └── data/ # extracted data folder
     ├── test_query/
     ├── toolenv/
     │ └── tools/ # tool JSON definitions
     └── answer/
  3. Verify dataset

    Make sure experiments/toolbench/data/data/toolenv/tools contains multiple category subfolders (e.g., Search/, Social_Media/) and JSON tool files inside.

ALFWorld

The dataset is downloaded automatically to ~/.cache/alfworld, or run manually:

alfworld-download

experiments/alfworld/configs/base_config.yaml is preconfigured to /root/.cache/alfworld.

Note

Oasis (social media) experiments require a separate environment (Python 3.10+). Please refer to the Oasis Social Media Experiments section below.

4. Dashboard Visualization

The dashboard provides interactive watermark experiments with real-time comparison and decoding analysis.

Requirements

  • Node.js: 18.0+ (LTS recommended)
  • NPM: comes with Node.js
  • Python: backend runs in AgentMark environment

Steps

Step 1: Start backend

# Ensure you are in the project root
conda activate AgentMark
python dashboard/server/app.py

When you see Uvicorn running on http://0.0.0.0:8000, the backend is running.

Note: backend listens on port 8000 by default.

Step 2: Start frontend

cd dashboard
npm install # first time only
npm run dev

You will see a local URL, typically: http://localhost:5173

Step 3: Open the app

Visit http://localhost:5173 or http://127.0.0.1:5173 in your browser.

Common Issues

  • Port in use: if 8000 or 5173 is occupied, stop the conflicting process or change config (frontend: dashboard/vite.config.ts, backend: dashboard/server/app.py).
  • Missing dependency: if you see ModuleNotFoundError, install the missing package with pip install <package>.

✅ Using Our Plugin

This flow validates the "tool calling + watermark sampling" plugin route: external agents don't modify business code, only change the endpoint address (OPENAI_BASE_URL).

Workflow: User input (Add Agent mode) → Gateway performs watermark sampling → Tool calls executed.

Step 1: Start Gateway Proxy (AgentMark Proxy)

Open Terminal 1:

Linux/macOS:

cd AgentMark
source ~/miniconda3/etc/profile.d/conda.sh && conda activate AgentMark
export DEEPSEEK_API_KEY=sk-your-key
export TARGET_LLM_MODEL=deepseek-chat
export AGENTMARK_DEBUG=1
export AGENTMARK_TOOL_MODE=proxy # Use "proxy constructs tool_calls" plugin mode
uvicorn agentmark.proxy.server:app --host 0.0.0.0 --port 8001

Windows PowerShell:

cd AgentMark
conda activate AgentMark
$env:DEEPSEEK_API_KEY="sk-your-key"
$env:TARGET_LLM_MODEL="deepseek-chat"
$env:AGENTMARK_DEBUG="1"
$env:AGENTMARK_TOOL_MODE="proxy"
uvicorn agentmark.proxy.server:app --host 0.0.0.0 --port 8001

Step 2: Start Dashboard Backend

Open Terminal 2:

cd AgentMark
conda activate AgentMark
python dashboard/server/app.py

Step 3: Start Frontend (Visualization Dashboard)

Open Terminal 3:

cd AgentMark
cd dashboard
npm install # Only needed first time
npm i @react-three/fiber @react-three/drei three
npm run dev

Browser access: http://localhost:5173

You can view sessions and watermark visualizations on the frontend.

Step 4: Use Add Agent Mode in the Dashboard

  • Open the dashboard in your browser.
  • Select Add Agent on the welcome screen.
  • Fill in your API key (DeepSeek/OpenAI) and optional repo URL, then send a message.

Step 5: Verify Watermark Injection

In the gateway proxy terminal you should see:

  • [agentmark:scoring_request]: Scoring instruction injection
  • [agentmark:tool_calls_proxy]: Gateway-constructed tool calls (with parameters)
  • [watermark]: Watermark results and visualization data

In the frontend dashboard you can:

  • View current session and conversation history
  • Visualize watermark distribution and statistics
  • Analyze watermark decoding results

Note: The gateway extracts candidate tools from the request's tools parameter and performs watermark sampling selection.

Troubleshooting

  • 502 Bad Gateway Error: If you encounter 502 Bad Gateway when calling the API, it is often caused by a global proxy configuration (e.g., http_proxy) interfering with localhost connections.

    Fix: set no_proxy when starting the services to ensure local traffic bypasses the proxy.

    export no_proxy=localhost,127.0.0.1,0.0.0.0
    # Then restart the proxy and backend

📚 Experiment Guide

Detailed experimental guides are as follows:

1. ToolBench Tool Calling Experiment

  • Overview: Simulates real-world API calling scenarios to evaluate watermark impact on tool usage and robustness.
  • Directory: experiments/toolbench/
  • Two Running Modes:
    Mode Config (use_local_model) Description
    API Mode false (default) Calls remote LLM APIs (e.g., DeepSeek, OpenAI), watermark embedded via behavioral sampling
    Local Mode true Loads local models (e.g., Llama-3), combines with SynthID text watermarking
  • Run Pipeline:
    conda activate AgentMark
    # Run full pipeline (baseline/watermark/evaluation)
    python experiments/toolbench/scripts/run_pipeline.py
  • Key Config: experiments/toolbench/configs/pipeline_config.json
    • Switch mode: modify common_config.use_local_model to true or false
    • Local mode requires local_model_path pointing to model weights

2. ALFWorld Embodied Intelligence Experiment

  • Overview: Text-based interactive household decision tasks, evaluating watermark impact on agent planning and execution.
  • Directory: experiments/alfworld/
  • Environment Install:
    pip install alfworld # Install on top of AgentMark environment
  • Run Pipeline:
    conda activate AgentMark
    # Run full pipeline (baseline/watermark/evaluation)
    python experiments/alfworld/scripts/run_experiment.py --config experiments/alfworld/configs/config.json
  • Key Config: experiments/alfworld/configs/config.json

3. Oasis Social Media Experiment

Note

  1. The oasis/ directory is a modified submodule containing customized watermark logic.
  2. Use a separate oasis environment (Python 3.10+).
  • Environment Install:

    # 1. Create environment (Python 3.10+ recommended)
    conda create -n oasis python=3.10 -y
    conda activate oasis
    # 2. Install Oasis package
    pip install camel-oasis

    See Oasis README for details.

  • Overview: Simulates user behavior and watermark injection on Twitter and Reddit.

  • Directory: experiments/oasis_watermark/

  • Twitter Experiment:

    • Directory: experiments/oasis_watermark/twitter_watermark_experiment/
    • Run:
      cd experiments/oasis_watermark/twitter_watermark_experiment
      # Configure config.py or set DEEPSEEK_API_KEY environment variable
      python run_experiment.py
      # Run evaluation
      python evaluate_metrics_llm.py
  • Reddit Experiment:

    • Directory: experiments/oasis_watermark/reddit_watermark_experiment/
    • Run:
      cd experiments/oasis_watermark/reddit_watermark_experiment
      python run_experiment.py
      # Run evaluation
      python evaluate_metrics_llm.py
    • Note: Simulates AI-related discussions in the r/TechFuture community.

4. RLNC Robustness Evaluation

  • Overview: Tests RLNC (Random Linear Network Coding) watermark scheme recovery under packet loss/erasure scenarios.
  • Directory: experiments/rlnc_trajectory/
  • Core Scripts:
    Script Function
    scripts/rlnc_step_erasure_eval.py Erasure robustness evaluation (simulates various packet loss rates)
    scripts/analyze_fpr.py False Positive Rate (FPR) analysis - simulates "no watermark" and "wrong key" attack scenarios
  • Run Robustness Evaluation:
    cd experiments/rlnc_trajectory
    python scripts/rlnc_step_erasure_eval.py --config rlnc_eval_config.json
  • Run FPR Analysis:
    python scripts/analyze_fpr.py --config rlnc_fpr_config.json
  • Key Configs: rlnc_eval_config.json, rlnc_fpr_config.json

5. Semantic Rewriting Robustness Evaluation

  • Overview: Tests differential watermark robustness against semantic rewriting attacks.
  • Directory: experiments/semantic_rewriting/
  • Run:
    cd experiments/semantic_rewriting
    python scripts/robustness_test.py \
     --task data/001_task_0.json \
     --bits data/decoded_bits.json \
     --steps 5

License

This project is licensed under the MIT License.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

Contributors

AltStyle によって変換されたページ (->オリジナル) /