Adding csv2 file format. Fixing a major bug in the csv2.reader implementation
#179
Conversation
Because we use `encoding/csv.Reader.ReuseRecord`, the `[]string` returned by each call to `csv.Reader.Read()` is a cached slice. Since we may need to buffer multiple lines, and therefore make multiple calls to `Read()`, every entry in `linesBuf.record` ends up pointing at the same reused slice, so the buffered lines are all duplicates. We could fix this trivially by turning `ReuseRecord` off, but that would incur an allocation cost for the vast majority of single-line csv operations, which is completely undesired. Instead, we cache all the strings returned by the (potentially multiple) calls to `csv.Reader.Read()` into a `reader.records []string` slice and manage that buffer ourselves, practically eliminating the over-allocation problem. Accordingly, each entry in `reader.linesBuf`, instead of holding a `record []string`, now holds `recordStart` and `recordNum` indices that reference into `reader.records`.
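A minimal sketch of the buffering scheme described above, for illustration only: the type and method names (`lineBuf`, `reader`, `readLine`, `fields`) are assumptions and not the PR's actual identifiers. The key point is that each buffered line stores only a start index and a field count into a shared, self-owned `records` slice, so `ReuseRecord` can stay on without the buffered lines aliasing the csv reader's cached record slice.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// lineBuf references a window of fields inside reader.records instead of
// holding its own []string, so buffered lines never alias the csv.Reader's
// reused record slice. Names here are illustrative.
type lineBuf struct {
	recordStart int // index of this line's first field in reader.records
	recordNum   int // number of fields belonging to this line
}

type reader struct {
	csvReader *csv.Reader
	records   []string  // all fields of all buffered lines, owned by us
	linesBuf  []lineBuf // one entry per buffered line
}

// readLine buffers one more csv line. Because ReuseRecord is on, the slice
// returned by Read() is only valid until the next Read(), so we copy its
// string headers into our own records buffer and remember the window.
func (r *reader) readLine() error {
	rec, err := r.csvReader.Read()
	if err != nil {
		return err
	}
	start := len(r.records)
	r.records = append(r.records, rec...) // copy fields out of the cached slice
	r.linesBuf = append(r.linesBuf, lineBuf{recordStart: start, recordNum: len(rec)})
	return nil
}

// fields returns the i-th buffered line as a sub-slice of the shared buffer.
func (r *reader) fields(i int) []string {
	lb := r.linesBuf[i]
	return r.records[lb.recordStart : lb.recordStart+lb.recordNum]
}

func main() {
	cr := csv.NewReader(strings.NewReader("a,b\nc,d\n"))
	cr.ReuseRecord = true // avoid a fresh []string allocation per Read()
	r := &reader{csvReader: cr}
	_ = r.readLine()
	_ = r.readLine()
	fmt.Println(r.fields(0), r.fields(1)) // [a b] [c d], no aliasing
}
```

Copying only the string headers should be enough here, since with `ReuseRecord` the standard library reuses the record slice itself while the field strings are built fresh on each `Read()`.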
Codecov Report
@@            Coverage Diff            @@
##            master      #179   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           52        53    +1
  Lines         2971      3020   +49
=========================================
+ Hits          2971      3020   +49