Keyboard Shortcuts

File
u :up to issue
m :publish + mail comments
M :edit review message
j / k :jump to file after / before current file
J / K :jump to next file with a comment after / before current file
Side-by-side diff
i :toggle intra-line diffs
e :expand all comments
c :collapse all comments
s :toggle showing all comments
n / p :next / previous diff chunk or comment
N / P :next / previous comment
<Up> / <Down> :next / previous line
<Enter> :respond to / edit current comment
d :mark current comment as done
Issue
u :up to list of issues
m :publish + mail comments
j / k :jump to patch after / before current patch
o / <Enter> :open current patch in side-by-side view
i :open current patch in unified diff view
Issue List
j / k :jump to issue after / before current issue
o / <Enter> :open current issue
# : close issue
Comment/message editing
<Ctrl> + s or <Ctrl> + Enter :save comment
<Esc> :cancel edit
Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(195)
Issues Repositories Search
Open Issues | Closed Issues | All Issues | Sign in with your Google Account to create issues and add comments

Issue 13659043: code review 13659043: encoding/csv: Fix handling of quoted substrings at star...

Can't Edit
Can't Publish+Mail
Start Review
Created:
12 years, 4 months ago by Sudhir
Modified:
11 years ago
Reviewers:
CC:
golang-codereviews, rsc
Visibility:
Public.
encoding/csv: Fix handling of quoted substrings at start of field in a CSV file. The CSV reader could not properly parse a field that began with a quoted substring even when LazyQuotes was turned on. The field needed to be completely quoted. For example, the string |"Quoted Substring" other chars| when parsed would process the second quote as a bare quote and then consume characters (including newlines)until a closing quote before a Comma was encountered. This could sometimes result in memory exhaustion in the case of a large file. The change here fixes this behaviour if (and only if) LazyQuotes is true. If the character following the second quote character in a field is not a Comma or NewLine,the rest of the field is processed as a normal unquoted field until a Comma, Newline or EOF are reached (quotes are treated as BareQuotes and inserted in the field). If the last character in the field is a quote character, it is dropped (to match the quote at the beginning). If not, the leading quote is inserted as part of the field value. Change in behaviour: Input: `abc,"def"ghi,jkl\nMNO,PQR",STU` Old Output: `abc`, `def"ghi,jkl\nMNO,PQR`, `STU` New Output: `abc`, `"def"ghi`, `jkl` `MNO`, `PQR"`, `STU` Note how the previous behaviour sucks in a whole bunch of characters before terminating at the end of the second field on line 2. The new (fixed) handling terminates processing at the comma after 'i' and returns two records instead of one. Two new tests are added in the test package to describe this behaviour (all existing tests pass unchanged). Fixes issue 6352. Fixes issue 3150.

Patch Set 1 #

Patch Set 2 : diff -r 77a4d225cc7e https://code.google.com/p/go #

Patch Set 3 : diff -r 77a4d225cc7e https://code.google.com/p/go #

Total comments: 2

Patch Set 4 : diff -r d71a954bca35 https://code.google.com/p/go #

Patch Set 5 : diff -r d71a954bca35 https://code.google.com/p/go #

Created: 12 years, 4 months ago
Download [raw] [tar.bz2]
Unified diffs Side-by-side diffs Delta from patch set Stats (+44 lines, -1 line) Patch
M src/pkg/encoding/csv/reader.go View 1 2 3 1 chunk +29 lines, -1 line 0 comments Download
M src/pkg/encoding/csv/reader_test.go View 1 2 3 1 chunk +15 lines, -0 lines 0 comments Download
Total messages: 10
|
Sudhir
Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://code.google.com/p/go
12 years, 4 months ago (2013年09月11日 05:26:20 UTC) #1
Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com),
I'd like you to review this change to
https://code.google.com/p/go 
Sign in to reply to this message.
rsc
Please find a way to preserve the current behavior for the tests listed. I think ...
12 years, 4 months ago (2013年09月11日 19:07:24 UTC) #2
Please find a way to preserve the current behavior for the tests listed.
I think it is much more reasonable for the lazy mode to parse "x"y" as x"y or
"x"y" than as x"y". Why is the leading quote different from the trailing quote?
Also please add some tests of the behavior you are trying to check, in
particular the choice of termination at comma or \n.
https://codereview.appspot.com/13659043/diff/6001/src/pkg/encoding/csv/reader...
File src/pkg/encoding/csv/reader_test.go (right):
https://codereview.appspot.com/13659043/diff/6001/src/pkg/encoding/csv/reader...
src/pkg/encoding/csv/reader_test.go:115: Output: [][]string{{`a "word"`,
`1"2"`, `a"`, `b`}},
I don't believe this should change.
https://codereview.appspot.com/13659043/diff/6001/src/pkg/encoding/csv/reader...
src/pkg/encoding/csv/reader_test.go:121: Output: [][]string{{`a "word"`,
`1"2"`, `a"`}},
I don't believe this should change.
Sign in to reply to this message.
Sudhir
Thanks for the comment. The current behaviour on the tests cannot be preserved without leading ...
12 years, 4 months ago (2013年09月11日 22:43:14 UTC) #3
Thanks for the comment.
The current behaviour on the tests cannot be preserved without leading to
pathological cases as in the problem that triggered this issue (as well as issue
3150).
It is possible to do "x"y" => "x"y" or xy" but not x"y. I think the first option
is preferable, i.e., rewrite the byte.Buffer at the point the second quote is
read to include the leading quote.
I will try and modify the code to do that and modify the tests accordingly.
Cheers
Sudhir
On Sep 12, 2013, at 4:07 AM, rsc@golang.org wrote:
> Please find a way to preserve the current behavior for the tests listed.
> I think it is much more reasonable for the lazy mode to parse "x"y" as
> x"y or "x"y" than as x"y". Why is the leading quote different from the
> trailing quote?
> 
> Also please add some tests of the behavior you are trying to check, in
> particular the choice of termination at comma or \n.
> 
> 
> 
> 
>
https://codereview.appspot.com/13659043/diff/6001/src/pkg/encoding/csv/reader...
> File src/pkg/encoding/csv/reader_test.go (right):
> 
>
https://codereview.appspot.com/13659043/diff/6001/src/pkg/encoding/csv/reader...
> src/pkg/encoding/csv/reader_test.go:115: Output: [][]string{{`a
> "word"`, `1"2"`, `a"`, `b`}},
> I don't believe this should change.
> 
>
https://codereview.appspot.com/13659043/diff/6001/src/pkg/encoding/csv/reader...
> src/pkg/encoding/csv/reader_test.go:121: Output: [][]string{{`a
> "word"`, `1"2"`, `a"`}},
> I don't believe this should change.
> 
> https://codereview.appspot.com/13659043/
Sign in to reply to this message.
rsc
You don't have to rewrite the bytes.Buffer. Just put the " in always, and then ...
12 years, 4 months ago (2013年09月11日 23:47:28 UTC) #4
You don't have to rewrite the bytes.Buffer. Just put the " in always, and
then slice it away if you don't need it at the end.
Sign in to reply to this message.
Sudhir
Hello golang-dev@googlegroups.com, rsc@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
12 years, 4 months ago (2013年09月12日 11:40:44 UTC) #5
Sign in to reply to this message.
Sudhir
I have uploaded a new version of the code. It now preserves current behavior for ...
12 years, 4 months ago (2013年09月12日 11:40:45 UTC) #6
I have uploaded a new version of the code. It now preserves current behavior for
all existing tests. I have also added two new test cases to verify the current
functionality. The only downside in the code is that there is some garbage
created when parsing any field that has a quote in the middle of a quoted field.
This is because of copying []byte buffers - I couldn't find a way to implement
your suggestion above, possibly because I am still learning Go ...
To summarize the change now:
"x"y" -> x"y
"x"y -> "x"y
As before, the new behaviour applies when LazyQuotes is true.
Sign in to reply to this message.
gobot
Replacing golang-dev with golang-codereviews.
12 years ago (2013年12月20日 16:26:04 UTC) #7
Replacing golang-dev with golang-codereviews.
Sign in to reply to this message.
gobot
Replacing golang-dev with golang-codereviews.
12 years ago (2013年12月20日 16:26:10 UTC) #8
Replacing golang-dev with golang-codereviews.
Sign in to reply to this message.
gobot
Replacing golang-dev with golang-codereviews.
12 years ago (2013年12月20日 16:26:17 UTC) #9
Replacing golang-dev with golang-codereviews.
Sign in to reply to this message.
gobot
R=close To the author of this CL: The Go project has moved to Gerrit Code ...
11 years ago (2014年12月19日 05:12:50 UTC) #10
R=close
To the author of this CL:
The Go project has moved to Gerrit Code Review.
If this CL should be continued, please see the latest version of
https://golang.org/doc/contribute.html for instructions on
how to set up Git and the Go project's Gerrit codereview plugin,
and then create a new change with your current code.
If there has been discussion on this CL, please give a link to it
(golang.org/cl/13659043 is best) in the description in your
new CL.
Thanks very much.
Sign in to reply to this message.
|
This is Rietveld f62528b

AltStyle によって変換されたページ (->オリジナル) /