This project has moved to internal development.

Dodo

Main features:

- Dump schemas and queries
- Generate fake data for tables, optionally AI-powered
- Replay audit logs
- Anonymize databases, tables, columns and comments in SQL

Important: see Introduction & FAQ (introduction.md) or the Chinese version (introduction-zh.md) for more details.

Install

```
curl -sSL https://raw.githubusercontent.com/Thearas/dodo/master/install.sh | bash
```

Usage

There are two types of workflows, with each step representing a dodo command:

- No data generation needed: Dump -> Replay -> Diff Replay Results
- Data generation needed: Dump -> Create Schemas (Optional) -> Generate and Import Data -> Replay -> Diff Replay Results
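
As a rough illustration, the longer workflow can be chained in a small script. This is only a sketch: `DODO`, `run_workflow` and its positional arguments are hypothetical names introduced here, and it assumes the flag combinations shown in the command listing of this section can be mixed as written. Adjust the audit log and SQL file names to your own setup.

```shell
# Sketch of the "data generation needed" workflow for one database.
# DODO and run_workflow are hypothetical names, not part of dodo itself.
DODO="${DODO:-dodo}"

run_workflow() {
  local db="$1" src_host="$2" port="$3" dst_host="$4" http_port="$5" password="$6"

  # 1. Dump schemas and queries from the source cluster (SELECT only by default).
  "$DODO" dump --dump-schema --dump-query --dbs "$db" \
    --host "$src_host" --port "$port" --user root --password "$password" \
    --audit-logs fe.audit.log || return 1

  # 2. (Optional) Create the dumped schemas on the target cluster.
  "$DODO" create --dbs "$db" \
    --host "$dst_host" --port "$port" --user root --password "$password" || return 1

  # 3. Generate data from the dumped schemas under 'output/'.
  "$DODO" gendata --dbs "$db" || return 1

  # 4. Import the generated data into the target cluster.
  "$DODO" import --dbs "$db" \
    --host "$dst_host" --http-port "$http_port" --user root --password "$password" || return 1

  # 5. Replay the dumped queries against the target cluster.
  "$DODO" replay --host "$dst_host" --port "$port" --user root --password "$password" \
    -f output/sql/q0.sql --result-dir output/replay --clean || return 1

  # 6. Diff the replay results against the original queries.
  "$DODO" diff --min-duration-diff 200ms --original-sqls 'output/sql/*.sql' output/replay
}
```

Setting `DODO=echo` before calling `run_workflow` prints the six commands instead of running them, which is a cheap way to review the sequence first.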

By default, only SELECT statements will be dumped. Use --only-select=false to dump all.

```
# Dump
dodo dump --help
# dump schemas of database db1 and db2
dodo dump --dump-schema --dbs db1,db2 --host <host> --port <port> --user root --password '***'
# also dump queries from audit logs of db1 and db2
dodo dump --dump-schema --dump-query --dbs db1,db2 --audit-logs 'fe.audit.log,fe.audit.log.20240802-1'
# dump queries from the audit log table instead of files; requires enabling the audit plugin: <https://doris.apache.org/docs/admin-manual/audit-plugin>
dodo dump --dump-query --audit-log-table <db.table> --from '2024-11-14 18:45:25' --to '2024-11-14 18:45:26'

# Create dumped schemas in another DB server
dodo create --help
# create all tables and views of db1 and db2; automatically finds dumped schemas under the 'output/' dir
dodo create --dbs db1,db2 --host <host> --port <port> --user root --password '***'
# run any create table/view SQL in db1
dodo create --ddl 'dir/*.sql' --db db1

# Generate data (Totally offline!)
dodo gendata --help
# gen data from any create-table SQL (MySQL, Hive, ...)
dodo gendata --ddl table.sql
# gen data for db1 and db2; automatically finds dumped schemas under the 'output/' dir
dodo gendata --dbs db1,db2 --host <host> --port <port> --user root --password '***'
# gen data with config
dodo gendata --dbs db1 --genconf example/gendata.yaml
# gen data with AI (DeepSeek LLM)
dodo gendata -l 'deepseek-chat' -k '<deepseek-api-key>' --ddl table.sql --query 'select xxx'

# Import data (requires the curl command)
dodo import --help
# import data for db1 and db2; automatically finds generated data under the 'output/' dir
dodo import --dbs db1,db2 --host <host> --http-port <http-port> --user root --password '***'
# import data for t1 and t2 in db1
dodo import --dbs db1 --table t1,t2
# import data from any CSV file
dodo import --tables db1.t1 --data data.csv

# Replay
dodo replay --help
# replay queries in a dumped sql file (from audit logs)
dodo replay --host <host> --port <port> --user root --password '***' -f output/sql/q0.sql
# replay with args:
#   --users/--dbs  filter sql by users and databases
#   --speed        scale the time between two serial sqls proportionally:
#                  < 1.0 increases it, > 1.0 decreases it, default 1
#   --clean        clean the 'output/replay' dir before replay
dodo replay -f output/sql/q0.sql \
  --from '2024-09-20 08:00:00' --to '2024-09-20 09:00:00' \
  --users 'readonly,root' --dbs 'db1,db2' \
  --speed 0.5 \
  --result-dir output/replay \
  --clean

# Diff replay result
dodo diff --help
# show replayed queries that are more than 200ms slower than the original
dodo diff --min-duration-diff 200ms --original-sqls 'output/sql/*.sql' output/replay
# diff two replay result directories
dodo diff replay1/ replay2/

# Export table data
dodo export --help
```
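
To get a feel for the `--speed` option of `dodo replay`, the sketch below computes how the wait between two serial statements might change. It assumes the original gap is simply divided by the speed value, which matches the documented direction (values below 1.0 lengthen the gap, values above 1.0 shorten it); dodo's exact formula may differ, and `scaled_gap_ms` is a hypothetical helper, not a dodo command.

```shell
# Hypothetical helper: estimated wait (in ms) between two serial statements.
# Assumption: replayed gap = original gap / speed.
scaled_gap_ms() {
  awk -v gap="$1" -v speed="$2" 'BEGIN { printf "%d\n", gap / speed }'
}

scaled_gap_ms 1000 0.5   # prints 2000: the gap doubles, replay is slower
scaled_gap_ms 1000 2     # prints 500: the gap halves, replay is faster
```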

Config

Parameters can also be passed through a config file or environment variables; see Environment Variables and Configuration Files.

Generate Data

Generate CSV data from create-table SQL statements. Databases whose syntax is similar to Doris, such as MySQL and Hive, are supported.

Here is an example. See Custom Generation Rules and AI Generation for more:

```
echo 'create table t1 (
 a varchar(2),
 b struct<foo:tinyint>,
 c date
)' > t1.sql
dodo gendata --ddl t1.sql --rows 5
cat output/gendata/t1/*
sO☆{"foo":-66}☆2020-07-23
lg☆{"foo":-121}☆2021-06-15
4☆{"foo":-117}☆2015-06-17
8h☆{"foo":-83}☆2024-09-06
KW☆{"foo":7}☆2019-02-02
```
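
The sample output above separates columns with `☆`. If you want to inspect a single column of the generated files, a small awk wrapper is enough. This is a sketch that assumes `☆` is the separator in your files as well and that it never appears inside a value; `gendata_column` is a hypothetical helper name.

```shell
# Hypothetical helper: print column N of one or more generated files.
# Assumes '☆' is the column separator, as in the sample output above.
gendata_column() {
  local col="$1"
  shift
  awk -F'☆' -v col="$col" '{ print $col }' "$@"
}
```

For example, `gendata_column 3 output/gendata/t1/*` would list only the values generated for column `c`.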

Anonymize

This feature is experimental. Anonymization is case-insensitive, so table1 and TABLE1 produce the same result. There are two ways to use it:

Use dodo anonymize:

```
echo "select * from table1" | dodo anonymize -f -
```

Use --anonymize flag while dumping:
```
dodo dump ... --anonymize
```

Note

Keep ./dodo_hashdict.yaml if you want the result to stay consistent across runs (keep it in the current directory, or specify its path with --anonymize-minihash-dict).

Build

1. Install optional dependencies:
   - On macOS: vectorscan with Chimera support
   - On Linux: hyperscan with Chimera support
2. Run make (or make build-hyper if the dependencies in step 1 are installed)

Update Doris Parser

```
make gen
```
