LLMPentest

Chinese version: README_CN.md

Project Structure

Agent/: our proposed agent scaffolding. It contains the LLM agent, local tools, MCP integrations, the batch runner, and log reporting logic.
CVEs/: the target server side. Each CVE-* directory is a runnable vulnerable environment.
AgentServer/: the definition of the base image that runs the agent scaffolding. Agent/batch_main.py expects the built image to be tagged agentserver.

Recommended Execution Order

Prepare and run the system in this order:

1. Build the AgentServer image.
2. Create the pennet Docker networks.
3. Pull the CVE images in advance (recommended; they can also be pulled at runtime).
4. Start the log receiver Agent/log_monitor.py.
5. Start the batch entrypoint Agent/batch_main.py.

Quick Start

1. Build the agent scaffolding base image

Run from the repository root:

docker build -f AgentServer/Dockerfile -t agentserver AgentServer

The current AgentServer/Dockerfile installs the latest Metasploit Framework through the official msfinstall script. Our experiments used Metasploit 6.4.102-dev-; for strict reproduction, replace the Metasploit install step in the Dockerfile with the corresponding 6.4.102-dev- version.

batch_main.py is hardcoded to use the image agentserver. If you change the tag or image name, update Agent/batch_main.py accordingly.

2. Install host-side Python dependencies

log_monitor.py and batch_main.py run on the host, not inside AgentServer.

pip install -r Agent/requirements.txt

On Windows, make sure both docker and bash are available in PATH, because batch_main.py directly invokes bash start.sh.
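As a quick pre-flight check, the snippet below reports which of the required executables cannot be found on PATH. It is a minimal sketch, not part of the repository. The function name missing_host_tools is hypothetical. It relies only on the standard library's shutil.which.

```python
import shutil


def missing_host_tools(tools=("docker", "bash")):
    """Return the names of required executables that are not found on PATH."""
    # shutil.which returns None when the executable cannot be resolved via PATH.
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling missing_host_tools() on the host should return an empty list before you start batch_main.py.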

3. Fill the agent environment config

Before starting log_monitor.py and batch_main.py, fill in Agent/configs/env.yaml:

target_server:
  server_ip: "10.1.2.143"
  server_port: 9000
  attack_ip: "10.1.2.143"
logging:
  server_url: "http://10.1.2.143:8823/log"
  batch_name: "ljq_new_v1_0108"

Field meanings:

target_server.server_ip: server IP passed into the CVE target server.
target_server.server_port: server port passed into the CVE target server.
target_server.attack_ip: attacker IP passed into the CVE target server. In many setups this can be the same as server_ip.
logging.server_url: log reporting endpoint used by the Agent. It should point to the /log endpoint exposed by log_monitor.py.
logging.batch_name: batch log directory name used to separate experiment runs.
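The field rules above can be checked mechanically before a long batch run. The sketch below validates a mapping shaped like env.yaml. It assumes the file has already been parsed into a dict, for example with PyYAML's yaml.safe_load; that loader is an assumption, not necessarily what the repository uses. The function validate_env is hypothetical.

```python
import ipaddress
from urllib.parse import urlparse


def validate_env(config: dict) -> list:
    """Return human-readable problems found in a parsed env.yaml mapping."""
    problems = []
    target = config.get("target_server") or {}
    logging_cfg = config.get("logging") or {}

    # Both IPs should be literal IPv4/IPv6 addresses.
    for key in ("server_ip", "attack_ip"):
        try:
            ipaddress.ip_address(str(target.get(key, "")))
        except ValueError:
            problems.append(f"target_server.{key} is not a valid IP address")

    port = target.get("server_port")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append("target_server.server_port must be an integer in 1-65535")

    # server_url should point at the /log endpoint exposed by log_monitor.py.
    url = urlparse(str(logging_cfg.get("server_url", "")))
    if url.scheme not in ("http", "https") or url.path.rstrip("/") != "/log":
        problems.append("logging.server_url should be http(s)://<host>:<port>/log")

    if not logging_cfg.get("batch_name"):
        problems.append("logging.batch_name must be a non-empty string")
    return problems
```

Assuming PyYAML is installed, a typical call from Agent/ would be `validate_env(yaml.safe_load(open("configs/env.yaml")))`.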

4. Create the experiment networks

cd CVEs
bash create_pennet.sh

This creates pennet01 through pennet50. The current Agent/batch_main.py uses pennet11 through pennet30.
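To confirm that the networks batch_main.py needs actually exist, you can compare them against `docker network ls`. The sketch below assumes the names are zero-padded, matching pennet01 through pennet50. The helper names REQUIRED_NETWORKS, list_docker_networks, and missing_networks are hypothetical.

```python
import subprocess

# Networks the current Agent/batch_main.py draws from (pennet11..pennet30).
REQUIRED_NETWORKS = [f"pennet{i:02d}" for i in range(11, 31)]


def list_docker_networks() -> list:
    """Return the names of all Docker networks known to the local daemon."""
    result = subprocess.run(
        ["docker", "network", "ls", "--format", "{{.Name}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def missing_networks(existing, required=REQUIRED_NETWORKS) -> list:
    """Return required network names that are absent from `existing`."""
    present = set(existing)
    return [name for name in required if name not in present]
```

With Docker running, `missing_networks(list_docker_networks())` should be empty after create_pennet.sh has run.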

5. Pre-pull the CVE images

cd CVEs
bash pull_all_images.sh

This step is optional: most CVE-*/start.sh scripts pull missing images on demand, but the first run will then be slower.

To only list the images:

bash pull_all_images.sh --list

6. Start the log monitor

Start it from Agent/, because it relies on relative paths:

cd Agent
python log_monitor.py

By default it listens on 0.0.0.0:8823 and writes logs into ../traj/.
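log_monitor.py must be running before batch_main.py starts. You can check that it is listening with a plain TCP connect. This is a minimal sketch that only tests the socket, not the /log handler itself. The function is_listening is hypothetical.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 8823, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```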

7. Agent Model Configuration

Model config files live under Agent/configs/. Environment parameters are filled in Agent/configs/env.yaml, and the model config template is Agent/configs/model.yaml.

Each model config file (for example, Agent/configs/model.yaml) must contain at least:

api_base_url: "https://your-api-base/v1"
model_name: "your-model-name"
api_key: "sk-xxxx"
conda_base: "/opt/conda/envs/pentest"
toolkit:
  - done

Field meanings:

api_base_url: base URL of your model service.
model_name: model name passed into ChatOpenAI.
api_key: API key for the model service.
conda_base: Conda path passed into the toolkit. This should match the Conda path inside the scaffolding container.
toolkit: enabled local tool bundles. The repo currently includes done, while MCP tools are registered automatically by Agent/main.py.
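A model config can be checked for the required keys in the same way. The sketch below assumes the YAML file has been parsed into a dict. It flags missing keys, a toolkit that is not a list, and a base URL without an http(s) scheme. The names REQUIRED_MODEL_KEYS and validate_model_config are hypothetical.

```python
REQUIRED_MODEL_KEYS = ("api_base_url", "model_name", "api_key", "conda_base", "toolkit")


def validate_model_config(config: dict) -> list:
    """Return problems found in a parsed model config mapping."""
    problems = [
        f"missing or empty key: {key}" for key in REQUIRED_MODEL_KEYS if not config.get(key)
    ]
    # toolkit is a YAML list of bundle names, e.g. ["done"].
    toolkit = config.get("toolkit")
    if toolkit and not isinstance(toolkit, list):
        problems.append("toolkit must be a list such as ['done']")
    base_url = str(config.get("api_base_url", ""))
    if base_url and not base_url.startswith(("http://", "https://")):
        problems.append("api_base_url must start with http:// or https://")
    return problems
```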

8. Configure Runtime Parameters (`batch_main.py`)

Location: Agent/batch_main.py

1. `DOCKER_NETWORK`

Location: Agent/batch_main.py

Purpose: defines the reusable experiment network pool. Each entry contains:

available: whether the network is currently free.
target: fixed IP for the vulnerable target container.
starter_ip: fixed IP for the agent scaffolding container.
index: scheduler-side index marker.

Notes:

Every network name listed here must already exist, so run CVEs/create_pennet.sh first.
MAX_WORKERS should not exceed the number of usable networks.
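The available flag makes the pool behave like a set of leases that concurrent workers take and give back. The sketch below shows that idea with a lock around the flag updates. The dict-of-dicts shape, the subnets, and the IPs are hypothetical placeholders, not the values in Agent/batch_main.py. The functions acquire_network and release_network are also hypothetical.

```python
import threading

# Hypothetical pool entries; real names/IPs are defined in Agent/batch_main.py.
DOCKER_NETWORK = {
    "pennet11": {"available": True, "target": "172.30.11.2", "starter_ip": "172.30.11.3", "index": 0},
    "pennet12": {"available": True, "target": "172.30.12.2", "starter_ip": "172.30.12.3", "index": 1},
}
_pool_lock = threading.Lock()


def acquire_network(pool=DOCKER_NETWORK):
    """Mark the first free network as busy and return (name, entry), or None if all are busy."""
    with _pool_lock:
        for name, entry in pool.items():
            if entry["available"]:
                entry["available"] = False
                return name, entry
    return None


def release_network(name, pool=DOCKER_NETWORK):
    """Return a network to the pool once its experiment has finished."""
    with _pool_lock:
        pool[name]["available"] = True
```

The lock matters because up to MAX_WORKERS workers may try to claim a network at the same moment.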

2. `ROOTDIC`

Location: Agent/batch_main.py

Purpose: absolute path to the repository root. batch_main.py uses it to:

locate CVEs/<CVE-ID>/start.sh;
copy the local Agent/ directory into the scaffolding container.

You must change it to the actual repository path on your machine.
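One alternative to hardcoding ROOTDIC is to derive it from the script's own location. The sketch below assumes batch_main.py lives directly in <repo>/Agent/. The helpers repo_root_from and cve_start_script are hypothetical, and CVE-0000-0000 is a placeholder ID.

```python
from pathlib import Path


def repo_root_from(script_path) -> Path:
    """Assuming the script sits in <repo>/Agent/, return <repo>."""
    return Path(script_path).resolve().parent.parent


def cve_start_script(root, cve_id: str) -> Path:
    """Path to CVEs/<CVE-ID>/start.sh under the repository root."""
    return Path(root) / "CVEs" / cve_id / "start.sh"
```

Inside batch_main.py this would read `ROOTDIC = str(repo_root_from(__file__))`, if the rest of the script expects a string.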

3. `MAX_WORKERS`

Location: Agent/batch_main.py

Purpose: number of concurrent workers.

Do not set it higher than:

the number of available experiment networks;
the host's Docker, CPU, memory, and model-service throughput capacity.
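A rough way to apply these limits is to clamp the requested value to the number of free networks and host CPUs. This is only a heuristic sketch: memory and model-service throughput are not modeled. The function safe_max_workers is hypothetical.

```python
import os


def safe_max_workers(requested: int, network_pool: dict, cpu_count=None) -> int:
    """Clamp the requested worker count to free networks and host CPUs."""
    free_networks = sum(1 for entry in network_pool.values() if entry.get("available"))
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Returns 0 when no network is free; callers should treat that as "cannot run".
    return max(0, min(requested, free_networks, cpus))
```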

4. `MODEL_CONFIGS`

Location: Agent/batch_main.py

Purpose: mapping from model display name to config file path.

key: model name shown in logs.
value: config file path relative to the Agent/ working directory.
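Because the values are resolved relative to Agent/, a typo only surfaces once a worker starts. The sketch below checks every mapped file up front. The example mapping entry "example-model" and the function missing_model_configs are hypothetical.

```python
from pathlib import Path

# Hypothetical mapping: display name -> config path relative to Agent/.
MODEL_CONFIGS = {
    "example-model": "configs/model.yaml",
}


def missing_model_configs(agent_dir, model_configs=MODEL_CONFIGS) -> dict:
    """Return {display_name: resolved_path} for config files that do not exist."""
    base = Path(agent_dir)
    return {
        name: str(base / rel_path)
        for name, rel_path in model_configs.items()
        if not (base / rel_path).is_file()
    }
```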

5. `SERVICE_CONFIGS`

Location: Agent/batch_main.py

Purpose: list of supervisord_*.conf files used to control target-side service layouts.

These files live under 1A1/ or 3A1/ inside each CVE-* directory. Each item is combined with task_des to create different experiment conditions.
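To see which service layouts exist for a given CVE and config directory, you can glob for supervisord_*.conf. The sketch assumes the layout CVEs/<CVE-ID>/<1A1|3A1>/supervisord_*.conf described above. The helpers service_config_path and list_service_configs are hypothetical.

```python
from pathlib import Path


def service_config_path(root, cve_id: str, cve_config: str, service_config: str) -> Path:
    """Assumed layout: CVEs/<CVE-ID>/<1A1|3A1>/supervisord_*.conf."""
    return Path(root) / "CVEs" / cve_id / cve_config / service_config


def list_service_configs(root, cve_id: str, cve_config: str) -> list:
    """Return the supervisord_*.conf file names available for one CVE/config pair."""
    folder = Path(root) / "CVEs" / cve_id / cve_config
    # glob on a missing folder simply yields nothing.
    return sorted(p.name for p in folder.glob("supervisord_*.conf"))
```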

6. `TEST_TIMES`

Location: Agent/batch_main.py

Purpose: repeat count for each task_des entry.

7. `task_des`

Location: Agent/batch_main.py

Purpose: defines which CVEs and which target environments to run.

Each item contains:

cve_id: target vulnerability directory name.
cve_config: experiment config directory, currently mainly 1A1 or 3A1.
test_times: repeat count for that entry, typically reusing TEST_TIMES.
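To estimate how many runs a batch will launch, you can expand task_des against SERVICE_CONFIGS and the repeat count. This sketch assumes each task_des entry is crossed with every service config. It does not include models; if batch_main.py also iterates over MODEL_CONFIGS, multiply by the number of models. All values shown, and the function expand_runs, are hypothetical.

```python
from itertools import product

TEST_TIMES = 3  # hypothetical value

# Hypothetical entries following the documented fields.
task_des = [
    {"cve_id": "CVE-0000-0001", "cve_config": "1A1", "test_times": TEST_TIMES},
    {"cve_id": "CVE-0000-0002", "cve_config": "3A1", "test_times": TEST_TIMES},
]
SERVICE_CONFIGS = ["supervisord_a.conf", "supervisord_b.conf"]


def expand_runs(tasks, service_configs):
    """Yield one dict per run, assuming every task is crossed with every service config."""
    for task, service_config in product(tasks, service_configs):
        for attempt in range(task["test_times"]):
            yield {
                "cve_id": task["cve_id"],
                "cve_config": task["cve_config"],
                "service_config": service_config,
                "attempt": attempt,
            }
```

With the placeholder values above, the batch expands to 2 tasks × 2 service configs × 3 repeats = 12 runs.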

9. Start the batch runner

Also start this from Agent/:

cd Agent
python batch_main.py

batch_main.py will:

acquire an available pennet network;
start an agentserver-based agent scaffolding container;
copy the local Agent/ directory into that container;
start the corresponding CVE target server;
run python main.py ... inside the scaffolding container.
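The steps above map roughly onto four commands per run. The sketch below only builds the command lists; it does not execute them. The container name, the in-container path /root/Agent, the `sleep infinity` keep-alive, and the start.sh and main.py argument lists are all assumptions supplied by the caller, not values taken from batch_main.py.

```python
import shlex


def build_run_commands(container, network, starter_ip, agent_dir, start_script,
                       start_args, main_args, image="agentserver", workdir="/root/Agent"):
    """Return the four commands for one run, in the order batch_main.py performs the steps."""
    return [
        # Start the scaffolding container with a fixed IP on the leased pennet network.
        ["docker", "run", "-d", "--name", container, "--network", network,
         "--ip", starter_ip, image, "sleep", "infinity"],
        # Copy the local Agent/ directory into the container.
        ["docker", "cp", agent_dir, f"{container}:{workdir}"],
        # Start the CVE target server (arguments are caller-supplied assumptions).
        ["bash", start_script, *start_args],
        # Run the agent entrypoint inside the container.
        ["docker", "exec", "-w", workdir, container, "python", "main.py", *main_args],
    ]


def render(cmd) -> str:
    """Format a command for dry-run logging."""
    return shlex.join(cmd)
```

To execute the commands, pass each list to subprocess.run(cmd, check=True). start.sh may also need cwd set to its CVE directory; check the script before relying on this.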

Logs and Outputs

log_monitor.py listens on port 8823 by default.
Logs are written under traj/ at the repository root (../traj/ relative to Agent/).
utils.py includes CVE_NUMBER, MODEL, CVE_CONFIG, and SERVICE_CONFIG in the log payload.
The batch log directory name comes from logging.batch_name in Agent/configs/env.yaml.
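To smoke-test the pipeline by hand, you can build a request that carries the four fields utils.py reports. Only CVE_NUMBER, MODEL, CVE_CONFIG, and SERVICE_CONFIG come from this README. The JSON encoding, the batch_name and message fields, and the function build_log_request are assumptions. Align them with Agent/utils.py before sending anything to a real batch.

```python
import json
import urllib.request


def build_log_request(server_url, batch_name, cve_number, model, cve_config,
                      service_config, message):
    """Build (but do not send) a POST request carrying one log record as JSON."""
    payload = {
        "batch_name": batch_name,          # assumed field
        "CVE_NUMBER": cve_number,
        "MODEL": model,
        "CVE_CONFIG": cve_config,
        "SERVICE_CONFIG": service_config,
        "message": message,                # assumed field
    }
    return urllib.request.Request(
        server_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a matter of urllib.request.urlopen(req, timeout=5) against the logging.server_url from env.yaml.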

Cleanup

Remove the pennet networks:

cd CVEs
bash remove_pennet.sh

Remove experiment containers created within the last day:

cd Agent
bash docker_remove.sh

Minimum Pre-Run Checklist

Before running a full batch experiment, verify at least the following:

The agentserver image has been built successfully.
Agent/configs/env.yaml has been filled with the correct target_server and logging values.
bash CVEs/create_pennet.sh has been executed.
bash CVEs/pull_all_images.sh has been run if you want warm image caches.
The required model config files under Agent/configs/ are ready.
ROOTDIC, MODEL_CONFIGS, MAX_WORKERS, and DOCKER_NETWORK in Agent/batch_main.py have been reviewed.
log_monitor.py is already running before batch_main.py starts.
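Part of this checklist can be automated with the Docker CLI. `docker image inspect` and `docker network inspect` both exit non-zero when the object does not exist. The sketch below takes the command runner as a parameter so the checks can be dry-run without Docker. The function preflight is hypothetical. It raises FileNotFoundError if the docker binary itself is missing.

```python
import subprocess


def preflight(networks, image="agentserver", run=subprocess.run) -> list:
    """Return a list of failed checks; `run` is injectable for dry runs."""
    failures = []
    # A non-zero exit code means the image is not present locally.
    if run(["docker", "image", "inspect", image], capture_output=True).returncode != 0:
        failures.append(f"image {image!r} not found; build AgentServer first")
    for name in networks:
        if run(["docker", "network", "inspect", name], capture_output=True).returncode != 0:
            failures.append(f"network {name!r} missing; run CVEs/create_pennet.sh")
    return failures
```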

License

This project is licensed under the MIT License. See LICENSE for details.
