tarantool/sdvg

Synthetic Data Values Generator (SDVG)

Description

SDVG (Synthetic Data Values Generator) is a tool for generating synthetic data. It supports various run modes, data types for generation, and output formats.

[Figure: scheme.png]

Run modes:

  • CLI - generate data, create configs, and validate them via the console;
  • HTTP server - accepts generation requests through an HTTP API.

Data types:

  • strings (English, Russian);
  • integers and floating-point numbers;
  • dates with timestamps;
  • UUID.

String subtypes:

  • random strings;
  • texts;
  • first names;
  • last names;
  • phone numbers;
  • patterns.

Each data type can be generated with the following options:

  • a set percentage or number of unique values per column;
  • ordered generation (sequence);
  • foreign key reference;
  • idempotent generation using a seed number;
  • value generation from ranges with percentage-based distribution.
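
To illustrate what idempotent generation from a seed means, here is a minimal sketch in plain shell. It does not use SDVG and says nothing about how SDVG implements seeding; `gen_ints` is a hypothetical helper built on awk's `srand`/`rand`. The point is only that a fixed seed reproduces the same values on every run:

```shell
# Hypothetical helper, not part of SDVG: print N pseudo-random integers
# in [0, 1000) from a fixed seed using awk's srand/rand.
gen_ints() {
  awk -v seed="$1" -v n="$2" 'BEGIN {
    srand(seed)
    for (i = 0; i < n; i++) print int(rand() * 1000)
  }'
}

# Two runs with the same seed produce identical output.
first=$(gen_ints 42 5)
second=$(gen_ints 42 5)
[ "$first" = "$second" ] && echo "same seed, same values"
```

The actual numbers depend on the awk implementation, so only equality between runs is checked here.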

Output formats:

  • devnull;
  • CSV files;
  • Parquet files;
  • HTTP API;
  • Tarantool Column Store HTTP API.

Installation

Standard installation

You can install SDVG by downloading the appropriate binary version from the GitHub Releases page.

Download the binary for your OS:

# Linux (x86-64)
curl -Lo sdvg https://github.com/tarantool/sdvg/releases/latest/download/sdvg-linux-amd64
# Linux (ARM64)
curl -Lo sdvg https://github.com/tarantool/sdvg/releases/latest/download/sdvg-linux-arm64
# macOS (x86-64)
curl -Lo sdvg https://github.com/tarantool/sdvg/releases/latest/download/sdvg-darwin-amd64
# macOS (ARM64)
curl -Lo sdvg https://github.com/tarantool/sdvg/releases/latest/download/sdvg-darwin-arm64
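
If you would rather not pick the command by hand, the choice can be scripted. The sketch below assumes `uname -s` and `uname -m` report your platform in the usual way; `sdvg_asset_name` is a hypothetical helper, not something SDVG ships. It only covers the four binaries listed above and prints the matching download command instead of running it:

```shell
# Hypothetical helper: map an OS name and CPU architecture (as reported
# by uname) to one of the four release asset names listed above.
sdvg_asset_name() {
  os=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$os" in
    linux|darwin) ;;
    *) echo "unsupported OS: $1" >&2; return 1 ;;
  esac
  case "$2" in
    x86_64|amd64) arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: $2" >&2; return 1 ;;
  esac
  printf 'sdvg-%s-%s\n' "$os" "$arch"
}

# Print the download command for the current machine, if it is supported.
if asset=$(sdvg_asset_name "$(uname -s)" "$(uname -m)"); then
  echo "curl -Lo sdvg https://github.com/tarantool/sdvg/releases/latest/download/$asset"
fi
```

Review the printed command before running it; on any other platform the helper reports an error and prints nothing.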

Install the binary on your system:

chmod +x sdvg
sudo mv sdvg /usr/local/bin/sdvg

Check that everything works correctly:

sdvg version

Compile and install from sources

To compile and install the tool from source, use the go install command:

# To get a specific version
go install github.com/tarantool/sdvg@0.0.2
# To get a version from the master branch
go clean -modcache
go install github.com/tarantool/sdvg@latest

Check that everything works correctly:

sdvg version

Quick Start

Here's an example of a data model that generates 10,000 user rows and writes them to a CSV file:

output:
  type: csv
models:
  user:
    rows_count: 10000
    columns:
      - name: id
        type: uuid
      - name: name
        type: string
        type_params:
          logical_type: first_name

Save this as simple_model.yml, then run:

sdvg generate simple_model.yml

This creates a CSV file of fake user data with id and name columns:

id,name
c8a53cfd-1089-4154-9627-560fbbea2fef,Sutherlan
b5c024f8-3f6f-43d3-b021-0bb2305cc680,Hilton
5adf8218-7b53-41bb-873d-c5768ca6afa2,Craggy
...
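
A quick sanity check on the result can be sketched in shell. `check_users_csv` is a hypothetical helper, not an SDVG command. It assumes the output is a single CSV file with one header line, as in the sample above; the file location is not stated here, so the path is passed as an argument:

```shell
# Hypothetical helper: verify that a CSV file has the "id,name" header
# and the expected number of non-empty data rows.
check_users_csv() {
  file=$1
  expected_rows=$2

  header=$(head -n 1 "$file")
  if [ "$header" != "id,name" ]; then
    echo "unexpected header: $header" >&2
    return 1
  fi

  # Count non-empty lines after the header.
  rows=$(awk 'NR > 1 && NF { n++ } END { print n + 0 }' "$file")
  if [ "$rows" -ne "$expected_rows" ]; then
    echo "expected $expected_rows rows, found $rows" >&2
    return 1
  fi
}

# Example call (replace the path with wherever your CSV was written):
# check_users_csv path/to/user.csv 10000
```

The function returns a non-zero status and prints the reason to stderr when the header or the row count does not match.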

To launch the generator in interactive mode:

sdvg

To view available commands and arguments:

sdvg -h
sdvg --help
sdvg generate -h

More information can be found in the user guide.

Next Steps

Maintainers
