I recently decided to try to learn the Go language. In order to do this, I wrote a small program which parses a JSON configuration file, and generates N bson documents according to its properties.
For example, with this config file:
{
"collection": "test",
"count": 100,
"content" : {
"name": {
"type": "string",
"nullPercentage": 10,
"length": 8
},
"count": {
"type": "int",
"nullPercentage": 30,
"min": 1,
"max": 200
},
"verified": {
"type": "boolean",
"nullPercentage": 0
},
"firstArray": {
"type": "array",
"nullPercentage": 10,
"size": 3,
"arrayContent": {
"type": "string",
"length": 3
}
},
"firstObject": {
"type": "object",
"nullPercentage": 10,
"objectContent": {
"key1": {
"type": "string",
"nullPercentage": 0,
"length": 12
},
"key2": {
"type": "int",
"nullPercentage": 50,
"min": 10,
"max": 20
}
}
}
}
}
the program will generate 100 bson objects (the "count" value in the config), which look like this:
{
"count": 55,
"firstArray": [
"tco",
"nua",
"uim"
],
"name": "zninfepa",
"verified": false
},
{
"count": 67,
"firstArray": [
"djt",
"cei",
"lty"
],
"firstObject": {
"key1": "nbbogspsvqsw",
"key2": 19
},
"verified": true
},
...
These documents are then stored in a database (MongoDB), but I skip that part in this question to focus on the generation of the bson documents. I wrote a small prototype in Java, and then tried to convert it to Go. Here is what I have so far:
package main
import (
"encoding/json"
"fmt"
"gopkg.in/mgo.v2/bson"
"io/ioutil"
"math/rand"
"os"
"time"
)
const (
letterBytes = "abcdefghijklmnopqrstuvwxyz"
letterIdxBits = 6 // 6 bits to represent a letter index
letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits
letterIdxMax = 63 / letterIdxBits // # of letter indices fitting in 63 bits
)
// GeneratorJSON structure containing all possible options
type GeneratorJSON struct {
// Type of object to generate. string | int | boolean supported for the moment
Type string `json:"type"`
// For `string` type only. Specify the length of the string to generate
Length int `json:"length"`
// Percentage of documents that won't contain this field
NullPercentage int `json:"nullPercentage"`
// For `int` type only. Lower bound for the int to generate
Min int `json:"min"`
// For `int` type only. Higher bound for the int to generate
Max int `json:"max"`
// For `array` only. Size of the array
Size int `json:"size"`
// For `array` only. GeneratorJSON to fill the array. Need to
// pass a pointer here to avoid 'invalid recursive type' error
ArrayContent *GeneratorJSON `json:"arrayContent"`
// For `object` only. List of GeneratorJSON to generate the content
// of the object
ObjectContent map[string]GeneratorJSON `json:"objectContent"`
}
// Collection structure storing global collection info
type Collection struct {
// Collection name in the database
Collection string `json:"collection"`
// Number of documents to insert in the collection
Count int `json:"count"`
// Schema of the documents for this collection
Content map[string]GeneratorJSON `json:"content"`
}
// Generatorer interface for all generator objects
type Generatorer interface {
// Get a random value according to the generator type. string | int | boolean supported for the moment
getValue(r rand.Rand) interface{}
getCommonProperties() CommonProperties
}
// CommonProper interface for commonProperties object ( methods with same behavior for each generator)
type CommonProper interface {
// Generate a pseudo-random boolean with `nullPercentage` chance of being false
exist(r rand.Rand) bool
// Get the key of the generator
getKey() string
}
// CommonProperties store
type CommonProperties struct {
key string
nullPercentage int
}
// StringGenerator struct that implements Generatorer. Used to
// generate random string of `length` length
type StringGenerator struct {
common CommonProperties
length int
}
// IntGenerator struct that implements Generatorer. Used to
// generate random int between `min` and `max`
type IntGenerator struct {
common CommonProperties
min int
max int
}
// BoolGenerator struct that implements Generatorer. Used to
// generate random bool
type BoolGenerator struct {
common CommonProperties
}
// ArrayGenerator struct that implements Generatorer. Used to
// generate random array
type ArrayGenerator struct {
common CommonProperties
size int
generator Generatorer
}
// ObjectGenerator struct that implements Generatorer. Used to
// generate random object
type ObjectGenerator struct {
common CommonProperties
generatorList []Generatorer
}
func (c CommonProperties) exist(r rand.Rand) bool { return r.Intn(100) > c.nullPercentage }
func (c CommonProperties) getKey() string { return c.key }
// getValue returns a random String of `g.length` length
func (g StringGenerator) getValue(r rand.Rand) interface{} {
by := make([]byte, g.length)
for i, cache, remain := g.length-1, r.Int63(), letterIdxMax; i >= 0; {
if remain == 0 {
cache, remain = r.Int63(), letterIdxMax
}
if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
by[i] = letterBytes[idx]
i--
}
cache >>= letterIdxBits
remain--
}
return string(by)
}
// getValue returns a random int between `g.min` and `g.max`
func (g IntGenerator) getValue(r rand.Rand) interface{} { return r.Intn(g.max-g.min) + g.min }
// getValue returns a random boolean
func (g BoolGenerator) getValue(r rand.Rand) interface{} { return r.Int()%2 == 0 }
// getValue returns a random array
func (g ArrayGenerator) getValue(r rand.Rand) interface{} {
array := make([]interface{}, g.size)
for i := 0; i < g.size; i++ {
array[i] = g.generator.getValue(r)
}
return array
}
// getValue returns a random object
func (g ObjectGenerator) getValue(r rand.Rand) interface{} {
m := bson.M{}
for _, gen := range g.generatorList {
if gen.getCommonProperties().exist(r) {
m[gen.getCommonProperties().getKey()] = gen.getValue(r)
}
}
return m
}
func (g StringGenerator) getCommonProperties() CommonProperties { return g.common }
func (g IntGenerator) getCommonProperties() CommonProperties { return g.common }
func (g BoolGenerator) getCommonProperties() CommonProperties { return g.common }
func (g ArrayGenerator) getCommonProperties() CommonProperties { return g.common }
func (g ObjectGenerator) getCommonProperties() CommonProperties { return g.common }
// GetPropertyGenerator returns an array of generator from
// a list of GeneratorJSON
func GetPropertyGenerator(k string, v GeneratorJSON) Generatorer {
switch v.Type {
case "string":
return StringGenerator{common: CommonProperties{key: k, nullPercentage: v.NullPercentage}, length: v.Length}
case "int":
return IntGenerator{common: CommonProperties{key: k, nullPercentage: v.NullPercentage}, min: v.Min, max: v.Max}
case "boolean":
return BoolGenerator{common: CommonProperties{key: k, nullPercentage: v.NullPercentage}}
case "array":
return ArrayGenerator{common: CommonProperties{key: k, nullPercentage: v.NullPercentage}, size: v.Size, generator: GetPropertyGenerator("", *v.ArrayContent)}
case "object":
var genArr = GeneratePropertyGeneratorList(v.ObjectContent)
return ObjectGenerator{common: CommonProperties{key: k, nullPercentage: v.NullPercentage}, generatorList: genArr}
default:
return BoolGenerator{common: CommonProperties{key: k, nullPercentage: v.NullPercentage}}
}
}
// GeneratePropertyGeneratorList create an array of generators from a JSON GeneratorJSON document
func GeneratePropertyGeneratorList(content map[string]GeneratorJSON) []Generatorer {
genArr := make([]Generatorer, 0)
for k, v := range content {
genArr = append(genArr, GetPropertyGenerator(k, v))
}
return genArr
}
func main() {
// Create a rand.Rand object to generate our random values
var randSource = rand.New(rand.NewSource(time.Now().UnixNano()))
// read the json config file
file, e := ioutil.ReadFile("./config.json")
if e != nil {
fmt.Printf("File error: %v\n", e)
os.Exit(1)
}
// map to a json object
var collection Collection
err := json.Unmarshal(file, &collection)
if err != nil {
panic(err)
}
// arrays that store all generators
var genArr = GeneratePropertyGeneratorList(collection.Content)
// counter for already generated documents
count := 0
// array that store 10 bson documents
var docList [10]bson.M
for count < collection.Count {
for i := 0; i < 10; i++ {
m := bson.M{}
// iterate over generators to create values for each key of the bson document
for _, v := range genArr {
// check for exist before generating a value to avoid unnecessary computations
if v.getCommonProperties().exist(*randSource) {
m[v.getCommonProperties().getKey()] = v.getValue(*randSource)
}
}
docList[i] = m
// insert docs in database
count += 10
}
}
// pretty print last 10 generated Objects
rawjson, err := json.MarshalIndent(docList, "", " ")
if err != nil {
panic("failed")
}
fmt.Printf("generated: %s", string(rawjson))
}
go vet and golint both return no warnings for this code. How can this be improved, in terms of readability first, and then in terms of performance?
EDIT
I also created some benchmarks:
test file :
package main
import (
"encoding/json"
"fmt"
"gopkg.in/mgo.v2/bson"
"io/ioutil"
"math/rand"
"os"
"testing"
"time"
)
func BenchmarkRandomString(b *testing.B) {
var randSource = rand.New(rand.NewSource(time.Now().UnixNano()))
stringGenerator := StringGenerator{common: CommonProperties{key: "key", nullPercentage: 0}, length: 5}
for n := 0; n < b.N; n++ {
stringGenerator.getValue(*randSource)
}
}
func BenchmarkRandomInt(b *testing.B) {
var randSource = rand.New(rand.NewSource(time.Now().UnixNano()))
intGenerator := IntGenerator{common: CommonProperties{key: "key", nullPercentage: 0}, min: 0, max: 100}
for n := 0; n < b.N; n++ {
intGenerator.getValue(*randSource)
}
}
func BenchmarkRandomBool(b *testing.B) {
var randSource = rand.New(rand.NewSource(time.Now().UnixNano()))
boolGenerator := BoolGenerator{common: CommonProperties{key: "key", nullPercentage: 0}}
for n := 0; n < b.N; n++ {
boolGenerator.getValue(*randSource)
}
}
func BenchmarkJSONGeneration(b *testing.B) {
var randSource = rand.New(rand.NewSource(time.Now().UnixNano()))
file, e := ioutil.ReadFile("./config.json")
if e != nil {
fmt.Printf("File error: %v\n", e)
os.Exit(1)
}
var collection Collection
err := json.Unmarshal(file, &collection)
if err != nil {
panic(err)
}
var genArr = GeneratePropertyGeneratorList(collection.Content)
var docList [1000]bson.M
for n := 0; n < b.N; n++ {
for i := 0; i < 1000; i++ {
m := bson.M{}
for _, v := range genArr {
if v.getCommonProperties().exist(*randSource) {
m[v.getCommonProperties().getKey()] = v.getValue(*randSource)
}
}
docList[i] = m
}
}
}
and here are the results (go test -bench=.):
BenchmarkRandomString-4 10000000 221 ns/op
BenchmarkRandomInt-4 30000000 59.2 ns/op
BenchmarkRandomBool-4 50000000 37.8 ns/op
BenchmarkJSONGeneration-4 500 2516883 ns/op
And a 'real scenario' benchmark (1000000 documents with the above config.json file) gives this:
generating json doc only : 4.5s
generating json doc + inserting in db : 17s
Edit 2:
I've spent some time improving this program and open-sourced it on GitHub. It's available here: feliixx/mgodatagen
Thanks everybody!
2 Answers
In your main, you "iterate over generators to create values for each key". You could actually create a "base" generator for this!
It would be an array generator (10 elements, 0 nullPercentage) of objects (being the objects that you generate).
Using @ferada's answer also leads to much less code (I also renamed CommonProper to EmptyGenerator). It uses embedding (one could optimize further by using the BoolGenerator as the base instead of EmptyGenerator).
There was a typo in the exist function: it should be >= g.nullPercentage and not > (because r.Intn(100) returns a value between 0 and 99 inclusive).
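To make the off-by-one concrete, here is a small sketch that enumerates all 100 possible results of r.Intn(100) and counts how many of them keep the field under each comparison. The helper names (existsStrict, existsFixed, countPresent) are made up for this illustration and are not part of the original program.

```go
package main

import "fmt"

// existsStrict mirrors the original comparison (>), existsFixed the corrected one (>=).
// The drawn value is passed in explicitly so every outcome of r.Intn(100) can be enumerated.
func existsStrict(draw, nullPercentage int) bool { return draw > nullPercentage }
func existsFixed(draw, nullPercentage int) bool  { return draw >= nullPercentage }

// countPresent counts how many of the 100 possible draws (0..99) keep the field.
func countPresent(exists func(draw, nullPercentage int) bool, nullPercentage int) int {
	present := 0
	for draw := 0; draw < 100; draw++ {
		if exists(draw, nullPercentage) {
			present++
		}
	}
	return present
}

func main() {
	for _, p := range []int{0, 10, 100} {
		fmt.Printf("nullPercentage=%d: '>' keeps %d/100, '>=' keeps %d/100\n",
			p, countPresent(existsStrict, p), countPresent(existsFixed, p))
	}
}
```

With nullPercentage 0 the strict version still drops the field for one draw out of 100 (the draw 0), which is why a "verified" field configured with 0 could occasionally go missing.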
Here is what I came up with:
package main
import (
"encoding/json"
"fmt"
"io/ioutil"
"math/rand"
"os"
"time"
"gopkg.in/mgo.v2/bson"
)
const (
letterBytes = "abcdefghijklmnopqrstuvwxyz"
letterIdxBits = 6 // 6 bits to represent a letter index
letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits
letterIdxMax = 63 / letterIdxBits // # of letter indices fitting in 63 bits
)
// GeneratorJSON structure containing all possible options
type GeneratorJSON struct {
// Type of object to generate. string | int | boolean supported for the moment
Type string `json:"type"`
// For `string` type only. Specify the length of the string to generate
Length int `json:"length"`
// Percentage of documents that won't contain this field
NullPercentage int `json:"nullPercentage"`
// For `int` type only. Lower bound for the int to generate
Min int `json:"min"`
// For `int` type only. Higher bound for the int to generate
Max int `json:"max"`
// For `array` only. Size of the array
Size int `json:"size"`
// For `array` only. GeneratorJSON to fill the array. Need to
// pass a pointer here to avoid 'invalid recursive type' error
ArrayContent *GeneratorJSON `json:"arrayContent"`
// For `object` only. List of GeneratorJSON to generate the content
// of the object
ObjectContent map[string]GeneratorJSON `json:"objectContent"`
}
// Collection structure storing global collection info
type Collection struct {
// Collection name in the database
Collection string `json:"collection"`
// Number of documents to insert in the collection
Count int `json:"count"`
// Schema of the documents for this collection
Content map[string]GeneratorJSON `json:"content"`
}
// Generator interface for all generator objects
type Generator interface {
Key() string
// Get a random value according to the generator type. string | int | boolean supported for the moment
Value(r rand.Rand) interface{}
Exists(r rand.Rand) bool
}
// EmptyGenerator serves as base for the actual generators
type EmptyGenerator struct {
key string
nullPercentage int
}
// Key returns the key of the object
func (g EmptyGenerator) Key() string { return g.key }
// Exists returns true if the generation should be performed
func (g EmptyGenerator) Exists(r rand.Rand) bool { return r.Intn(100) >= g.nullPercentage }
// StringGenerator struct that implements Generator. Used to
// generate random string of `length` length
type StringGenerator struct {
EmptyGenerator
length int
}
// Value returns a random String of `g.length` length
func (g StringGenerator) Value(r rand.Rand) interface{} {
by := make([]byte, g.length)
for i, cache, remain := g.length-1, r.Int63(), letterIdxMax; i >= 0; {
if remain == 0 {
cache, remain = r.Int63(), letterIdxMax
}
if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
by[i] = letterBytes[idx]
i--
}
cache >>= letterIdxBits
remain--
}
return string(by)
}
// IntGenerator struct that implements Generator. Used to
// generate random int between `min` and `max`
type IntGenerator struct {
EmptyGenerator
min int
max int
}
// Value returns a random int between `g.min` and `g.max`
func (g IntGenerator) Value(r rand.Rand) interface{} { return r.Intn(g.max-g.min) + g.min }
// BoolGenerator struct that implements Generator. Used to
// generate random bool
type BoolGenerator struct {
EmptyGenerator
}
// Value returns a random boolean
func (g BoolGenerator) Value(r rand.Rand) interface{} { return r.Int()%2 == 0 }
// ArrayGenerator struct that implements Generator. Used to
// generate random array
type ArrayGenerator struct {
EmptyGenerator
size int
generator Generator
}
// Value returns a random array
func (g ArrayGenerator) Value(r rand.Rand) interface{} {
array := make([]interface{}, g.size)
for i := 0; i < g.size; i++ {
array[i] = g.generator.Value(r)
}
return array
}
// ObjectGenerator struct that implements Generator. Used to
// generate random object
type ObjectGenerator struct {
EmptyGenerator
generators []Generator
}
// Value returns a random object
func (g ObjectGenerator) Value(r rand.Rand) interface{} {
m := bson.M{}
for _, gen := range g.generators {
if gen.Exists(r) {
m[gen.Key()] = gen.Value(r)
}
}
return m
}
// NewGenerator returns a new Generator based on a JSON configuration
func NewGenerator(k string, v GeneratorJSON) Generator {
eg := EmptyGenerator{key: k, nullPercentage: v.NullPercentage}
switch v.Type {
case "string":
return StringGenerator{EmptyGenerator: eg, length: v.Length}
case "int":
return IntGenerator{EmptyGenerator: eg, min: v.Min, max: v.Max}
case "boolean":
return BoolGenerator{EmptyGenerator: eg}
case "array":
return ArrayGenerator{EmptyGenerator: eg, size: v.Size, generator: NewGenerator("", *v.ArrayContent)}
case "object":
return ObjectGenerator{EmptyGenerator: eg, generators: NewGeneratorsFromMap(v.ObjectContent)}
default:
return BoolGenerator{EmptyGenerator: eg}
}
}
// NewGeneratorsFromMap creates a slice of generators based on a JSON configuration map
func NewGeneratorsFromMap(content map[string]GeneratorJSON) []Generator {
genArr := make([]Generator, 0)
for k, v := range content {
genArr = append(genArr, NewGenerator(k, v))
}
return genArr
}
func main() {
// Create a rand.Rand object to generate our random values
var randSource = rand.New(rand.NewSource(time.Now().UnixNano()))
// read the json config file
file, e := ioutil.ReadFile("./config.json")
if e != nil {
fmt.Printf("File error: %v\n", e)
os.Exit(1)
}
// map to a json object
var collection Collection
err := json.Unmarshal(file, &collection)
if err != nil {
panic(err)
}
// arrays that store all generators
generator := baseGenerator(collection.Content)
// counter for already generated documents
count := 0
// array that store 10 bson documents
var docList []interface{}
for count < collection.Count {
docList = generator.Value(*randSource).([]interface{})
// insert docs in database
count += generator.size
}
// pretty print last 10 generated Objects
rawjson, err := json.MarshalIndent(docList, "", " ")
if err != nil {
panic("failed")
}
fmt.Printf("generated: %s", string(rawjson))
}
func baseGenerator(content map[string]GeneratorJSON) ArrayGenerator {
return ArrayGenerator{
size: 10,
generator: ObjectGenerator{
generators: NewGeneratorsFromMap(content),
},
}
}
Edit regarding performance
I don't see simple changes which could drastically improve the speed.
There are some minor changes (from looking at the source of rand.go):
func (g BoolGenerator) Value(r rand.Rand) interface{} { return r.Int63()&1 == 0 }
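The Exists method could also be faster if it didn't use Intn. For instance, it could use the last 7 bits (from 0 to 127) of Int63 (or multiply the probability by 10 and use the last 10 bits, from 0 to 1023). Since those ranges are not exactly 0 to 99, the resulting percentage is only approximate unless the threshold is rescaled.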
But there is probably a bigger optimization regarding the generation + insertion in the database.
Currently the data is generated and inserted into the database sequentially. You could do it concurrently with a record channel (a chan []interface{} in the code below, carrying one batch per message) and two goroutines:
- a producer which fills the record channel
- a consumer which inserts the batches into the database (could be the main goroutine)
Possible code:
record := make(chan []interface{}, 3) // 3 is the buffer size
// generate records concurrently
go func() {
for count < collection.Count {
record <- generator.Value(*randSource).([]interface{})
count += generator.size
}
close(record)
}()
// save the records
for r := range record {
_ = r
// insert record in DB
}
If you have a multicore processor, this should improve the overall performance: instead of having 4.5s + 12.5s, it should be much closer to 12.5s (with some overhead for the first run and the synchronization)
Thanks for the time and effort you put in your answer, it definitely helped me a lot! – felix, Jul 17, 2017 at 11:21
A few notes; in general it looks good to me.
- Generatorer should probably just be Generator, exist should be exists. CommonProper is harder, perhaps WithCommonProperties or HasCommonProperties.
- CommonProperties doesn't need a common name in all the structs, you could simply embed it:
type StringGenerator struct {
CommonProperties
length int
}
Though with that you'll have to mention the name twice when creating the objects, e.g. StringGenerator{CommonProperties: CommonProperties{key: k, nullPercentage: v.NullPercentage}, length: v.Length}.
- Also, you'll only need one getCommonProperties with that, basically:
func (g CommonProperties) getCommonProperties() CommonProperties { return g }
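The embedding suggestion can be tried in isolation with the sketch below. The struct and method names follow the question's code, while the hasCommonProperties interface and the sample values are assumptions added for the demonstration. Because CommonProperties is embedded, its single getCommonProperties method is promoted to every generator type, so each one satisfies the interface without its own copy.

```go
package main

import "fmt"

// CommonProperties holds what every generator shares.
type CommonProperties struct {
	key            string
	nullPercentage int
}

// Defined once; promoted to every struct that embeds CommonProperties.
func (c CommonProperties) getCommonProperties() CommonProperties { return c }

// StringGenerator and IntGenerator embed CommonProperties instead of naming a field.
type StringGenerator struct {
	CommonProperties
	length int
}

type IntGenerator struct {
	CommonProperties
	min, max int
}

// hasCommonProperties is satisfied by both generators through method promotion.
type hasCommonProperties interface {
	getCommonProperties() CommonProperties
}

func main() {
	gens := []hasCommonProperties{
		StringGenerator{CommonProperties: CommonProperties{key: "name", nullPercentage: 10}, length: 8},
		IntGenerator{CommonProperties: CommonProperties{key: "count", nullPercentage: 30}, min: 1, max: 200},
	}
	for _, g := range gens {
		c := g.getCommonProperties()
		fmt.Println(c.key, c.nullPercentage)
	}
}
```

Embedded fields are promoted as well, so g.key works directly on a StringGenerator value.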
// insert docs in database and count += 10 should probably be after the next } (outside the for i := 0; i < 10; i++ loop)
-cpuprofile shows that gc collection takes 1/3 of the time, followed by the getValue() functions, but I don't know how to optimize that. I'll take a look at buffered channels, it seems to be an interesting idea!