opt(stream): add option to directly copy over tables from lower levels (#1700) #1872


Open

mangalaman93 wants to merge 1 commit into main from aman/copy-levels

Conversation

@mangalaman93 (Member) commented Feb 14, 2023 (edited)

This also picks up a bug fix from PR #1712, commit 58d0674.

This PR adds a FullCopy option to Stream. It allows sending the tables entirely to the writer. If this option is set to true, we directly copy over the tables from the last two levels. This option increases the stream speed while also lowering the memory consumption on the DB that is streaming the KVs.

For a 71 GB compressed and encrypted DB, we observed a 3x improvement in speed. The DB contained ~65 GB in the last two levels, with the rest in the levels above.

To use this option, the following must also be set (see the sketch below):

stream.KeyToList = nil
stream.ChooseKey = nil
stream.SinceTs = 0
db.managedTxns = true

If we use the stream writer for receiving the KVs, the encryption mode has to be the same on the sender and the receiver. This restricts db.StreamDB() to using the same encryption mode for both the input and output DB. A TODO has been added for allowing different encryption modes.
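
Here is a minimal, hypothetical sketch of how a sender might wire this up, assuming the FullCopy field added by this PR and a DB opened in managed mode (which satisfies db.managedTxns = true). The directory path and the Send callback body are placeholders, not part of the PR.

package main

import (
	"context"
	"log"
	"math"

	"github.com/dgraph-io/badger/v3"
	"github.com/dgraph-io/ristretto/z"
)

func main() {
	// Managed mode satisfies the db.managedTxns = true requirement.
	db, err := badger.OpenManaged(badger.DefaultOptions("/path/to/source-db"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	stream := db.NewStreamAt(math.MaxUint64)
	stream.FullCopy = true // field added by this PR: copy tables from the last two levels directly
	stream.KeyToList = nil // required: no per-key transformation
	stream.ChooseKey = nil // required: no key filtering
	stream.SinceTs = 0     // required: stream everything
	stream.Send = func(buf *z.Buffer) error {
		// Placeholder: forward buf to the receiver, e.g. a StreamWriter on a DB
		// opened with the same encryption mode as this one.
		return nil
	}
	if err := stream.Orchestrate(context.Background()); err != nil {
		log.Fatal(err)
	}
}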

CLAassistant commented Feb 14, 2023 (edited)

CLA assistant check: All committers have signed the CLA.

mangalaman93 force-pushed the aman/block-length branch 2 times, most recently from e8583aa to a3c70d1, on February 14, 2023 14:39
mangalaman93 force-pushed the aman/copy-levels branch 2 times, most recently from 5e7a48d to 1f99a5c, on February 14, 2023 15:52
mangalaman93 force-pushed the aman/block-length branch 2 times, most recently from 1375846 to 12a0e42, on February 18, 2023 11:40
mangalaman93 force-pushed the aman/copy-levels branch 3 times, most recently from f1a05da to 63a4948, on February 20, 2023 05:37
Base automatically changed from aman/block-length to release/v4.0 on February 21, 2023 06:22
coveralls commented Feb 22, 2023 (edited)

Coverage Status: 61.149% (+0.4%) from 60.713% when pulling 081b41b on aman/copy-levels into a89c52c on main.

@mangalaman93 (Member, Author) commented:

This code has a data race; I am looking into it:

==================
WARNING: DATA RACE
Write at 0x00c000b100e8 by goroutine 2957:
 github.com/dgraph-io/badger/v3.(*StreamWriter).Write.func1()
 /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.6-0.20230214155941-0e7c6a7a614a/stream_writer.go:212 +0x167
 github.com/dgraph-io/ristretto/z.(*Buffer).SliceIterate()
 /home/aman/gocode/pkg/mod/github.com/dgraph-io/ristretto@v0.1.1/z/buffer.go:290 +0x30a
 github.com/dgraph-io/badger/v3.(*StreamWriter).Write()
 /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.6-0.20230214155941-0e7c6a7a614a/stream_writer.go:146 +0x194
 github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*countIndexer).writeIndex()
 /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/count_index.go:188 +0x9c7
 github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*countIndexer).addCountEntry.func1()
 /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/count_index.go:100 +0x47
Previous write at 0x00c000b100e8 by goroutine 228:
 github.com/dgraph-io/badger/v3.(*StreamWriter).Write.func1()
 /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.6-0.20230214155941-0e7c6a7a614a/stream_writer.go:212 +0x167
 github.com/dgraph-io/ristretto/z.(*Buffer).SliceIterate()
 /home/aman/gocode/pkg/mod/github.com/dgraph-io/ristretto@v0.1.1/z/buffer.go:290 +0x30a
 github.com/dgraph-io/badger/v3.(*StreamWriter).Write()
 /home/aman/gocode/pkg/mod/github.com/dgraph-io/badger/v3@v3.2103.6-0.20230214155941-0e7c6a7a614a/stream_writer.go:146 +0x194
 github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).startWriting.func2()
 /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:349 +0xd1
 github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).startWriting()
 /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:389 +0x284
 github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).reduce.func4()
 /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:475 +0x64
Goroutine 2957 (running) created at:
 github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*countIndexer).addCountEntry()
 /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/count_index.go:100 +0x4c9
 github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).startWriting.func1.2()
 /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:338 +0x4e
 github.com/dgraph-io/ristretto/z.(*Buffer).SliceIterate()
 /home/aman/gocode/pkg/mod/github.com/dgraph-io/ristretto@v0.1.1/z/buffer.go:290 +0x30a
 github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).startWriting.func1()
 /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:336 +0x1cb
 github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).startWriting()
 /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:392 +0x2bc
 github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).reduce.func4()
 /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:475 +0x64
Goroutine 228 (running) created at:
 github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).reduce()
 /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:475 +0x4c5
 github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).run.func1()
 /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:109 +0xbe4
 github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).run.func2()
 /home/aman/gocode/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:123 +0x66
==================
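
The trace above shows two goroutines executing (*StreamWriter).Write concurrently and both writing the same memory at stream_writer.go:212. As a rough, hypothetical illustration only (not the fix in this PR, which may instead protect the shared state inside StreamWriter itself), a caller could serialize those Write calls:

package sketch

import (
	"sync"

	"github.com/dgraph-io/badger/v3"
	"github.com/dgraph-io/ristretto/z"
)

// guardedWriter is a hypothetical wrapper that serializes concurrent callers of
// StreamWriter.Write, the calling pattern flagged by the race detector above.
type guardedWriter struct {
	mu sync.Mutex
	sw *badger.StreamWriter
}

// Write forwards to the underlying StreamWriter while holding the mutex, so two
// goroutines can no longer be inside StreamWriter.Write at the same time.
func (g *guardedWriter) Write(buf *z.Buffer) error {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.sw.Write(buf)
}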

mangalaman93 changed the base branch from main-deprecated-v4 to main on March 1, 2023 07:18
@ghost left a comment:

Can you write a benchmark for this?
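
Not from the PR itself, but as a rough sketch of what such a benchmark could look like: it assumes a pre-populated Badger directory (benchDir is a placeholder) and the FullCopy field introduced by this PR, and simply discards the streamed data so only the producer side is measured.

package bench

import (
	"context"
	"io"
	"math"
	"testing"

	"github.com/dgraph-io/badger/v3"
	"github.com/dgraph-io/ristretto/z"
)

// benchDir is a hypothetical, pre-populated Badger directory.
const benchDir = "/tmp/badger-bench"

func benchmarkStream(b *testing.B, fullCopy bool) {
	db, err := badger.OpenManaged(badger.DefaultOptions(benchDir))
	if err != nil {
		b.Fatal(err)
	}
	defer db.Close()

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		stream := db.NewStreamAt(math.MaxUint64)
		if fullCopy {
			// Settings required by the PR description for FullCopy.
			stream.FullCopy = true
			stream.KeyToList = nil
			stream.ChooseKey = nil
			stream.SinceTs = 0
		}
		// Discard the data; we only measure how fast the stream produces it.
		stream.Send = func(buf *z.Buffer) error {
			_, err := io.Discard.Write(buf.Bytes())
			return err
		}
		if err := stream.Orchestrate(context.Background()); err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkStreamFullCopy(b *testing.B) { benchmarkStream(b, true) }
func BenchmarkStreamRegular(b *testing.B)  { benchmarkStream(b, false) }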

mangalaman93 force-pushed the aman/sw branch 2 times, most recently from 77973c6 to 908259a, on June 12, 2023 04:07
Base automatically changed from aman/sw to aman/bug on June 12, 2023 04:07
Base automatically changed from aman/bug to main on June 12, 2023 04:37

This PR has been stale for 60 days and will be closed automatically in 7 days. Comment to keep it open.

netlify bot commented Jan 27, 2025 (edited)

Deploy Preview for badger-docs canceled.

🔨 Latest commit: ffd74f3
🔍 Latest deploy log: https://app.netlify.com/sites/badger-docs/deploys/67976e13711d860008cabd6a

mangalaman93 force-pushed the aman/copy-levels branch 2 times, most recently from c024d71 to ffd74f3, on January 27, 2025 11:29
A member commented:

Closing the PR for now; keeping the branch active.


Reviewers

@akon-dey: awaiting requested review
@billprovince: awaiting requested review
@skrdgraph: awaiting requested review
@joshua-goldstein: awaiting requested review
1 more reviewer (approvals may not affect merge requirements)

Assignees

No one assigned

Labels

None yet

Milestone

No milestone
