Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Adding csv2 file format. Fixing a major bug in the csv2.reader implementation #179

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
jf-tech merged 1 commit into master from csv2f
Sep 19, 2022

Conversation

@jf-tech
Copy link
Owner

@jf-tech jf-tech commented Sep 19, 2022

Because we use encoding/csv.Reader.ReuseRecord, each call of csv.Reader.Read() return value []string is a cached slice. Given we need to potentially cache multiple lines thus multiple calls to Read(), what's in the linesBuf.record is always duplicate!!

We could fix this problem trivially by turning ReuseRecord off, but that would incur an allocation cost for vast majority of single-line csv operation. That is completely undesired.

So instead, we ourselves cache all the returned strings from (potentially multiple) calls to csv.Reader.Rdad() into reader.records []string slice, that managing that buffer ourselves, thus practically eliminate the over-allocation problem. Accordingly, in the reader.linesBuf, instead of having a record []string, we have recordStart and recordNum to reference into the reader.records.

...ementation
Because we use `encoding/csv.Reader.ReuseRecord`, each call of `csv.Reader.Read()` return value `[]string`
is a cached slice. Given we need to potentially cache multiple lines thus multiple calls to `Read()`,
what's in the `linesBuf.record` is always duplicate!!
We could fix this problem trivially by turning `ReuseRecord` off, but that would incur an allocation
cost for vast majority of single-line csv operation. That is completely undesired.
So instead, we ourselves cache all the returned strings from (potentially multiple) calls to
`csv.Reader.Rdad()` into `reader.records []string` slice, that managing that buffer ourselves, thus
practically eliminate the over-allocation problem. Accordingly, in the `reader.linesBuf`, instead of
having a `record []string`, we have `recordStart` and `recordNum` to reference into the `reader.records`.
Copy link

codecov bot commented Sep 19, 2022

Codecov Report

Merging #179 (f450a84) into master (e9d3be8) will not change coverage.
The diff coverage is 100.00%.

@@ Coverage Diff @@
## master #179 +/- ##
=========================================
 Coverage 100.00% 100.00% 
=========================================
 Files 52 53 +1 
 Lines 2971 3020 +49 
=========================================
+ Hits 2971 3020 +49 
Impacted Files Coverage Δ
extensions/omniv21/fileformat/flatfile/csv/decl.go 100.00% <100.00%> (ø)
...tensions/omniv21/fileformat/flatfile/csv/format.go 100.00% <100.00%> (ø)
...tensions/omniv21/fileformat/flatfile/csv/reader.go 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@jf-tech jf-tech merged commit d24ee6a into master Sep 19, 2022
@jf-tech jf-tech deleted the csv2f branch September 19, 2022 05:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@wangjia007bond wangjia007bond Awaiting requested review from wangjia007bond

@liangxibing liangxibing Awaiting requested review from liangxibing

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

1 participant

AltStyle によって変換されたページ (->オリジナル) /