\$\begingroup\$

While trying to answer a question on Stack Overflow about using the gocarina/gocsv package to read a CSV whose header column name contains a comma, I got to thinking about how to preprocess the first record of a CSV as it's being read.

I thought of a reader/fixer that wraps a reader of CSV data, reads the first (header) record, removes the specified strings from its field names, and sends the cleaned header along; it then forwards all subsequent bytes with as little interference as possible.

I was able to code this up:

package main

import (
	"bufio"
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"strings"
)

// HeaderFixer removes certain strings from the field names in the header record of CSV data.
type HeaderFixer struct {
	rd      *bufio.Reader
	removes []string
	done    bool
}

func NewReader(r io.Reader, removes []string) *HeaderFixer {
	return &HeaderFixer{
		bufio.NewReader(r),
		removes,
		false,
	}
}

func (hf *HeaderFixer) Read(p []byte) (n int, err error) {
	if hf.done {
		n, err = hf.rd.Read(p)
		return
	}

	cr := csv.NewReader(hf.rd)
	header, err := cr.Read()
	if err != nil {
		return
	}

	for i, field := range header {
		for _, remove := range hf.removes {
			field = strings.Replace(field, remove, "", -1)
		}
		header[i] = field
	}

	var buf bytes.Buffer
	cw := csv.NewWriter(&buf)
	cw.Write(header)
	cw.Flush()

	copy(p, buf.Bytes())
	n = int(cr.InputOffset())
	hf.done = true
	return
}

var csvBlob = `"Col,1","Col
2"
a,b
c,d
e,f
g,h
`

func main() {
	sr := strings.NewReader(csvBlob)
	hr := NewReader(sr, []string{",", "\n"})
	cr := csv.NewReader(hr)
	for {
		record, err := cr.Read()
		if err != nil {
			if err == io.EOF {
				break
			}
			log.Fatal(err)
		}
		fmt.Println(record)
	}
}

It definitely removes the newline and comma from the header:

[Col1 Col2]
[a b]
[c d]
[e f]
[g h]

Using InputOffset() seems to be correct for reporting how far the header/fixer read into the original CSV data. I'm not so sure about the "least amount of interference" in the guard clause that just wants to forward bytes along as efficiently as possible.

I also started this exploration by looking at golang.org/x/text/transform, but I could not figure out how to make that work for me... the only example is deprecated.

Toby Speight
asked Jan 16, 2023 at 2:13
\$\endgroup\$

1 Answer

\$\begingroup\$

Complexity is more apparent to readers than writers. If you write a piece of code and it seems simple to you, but other people think it is complex, then it is complex.

"Obvious" is in the mind of the reader: it’s easier to notice that someone else’s code is nonobvious than to see problems with your own code. If someone reading your code says it’s not obvious, then it’s not obvious, no matter how clear it may seem to you.

A Philosophy of Software Design, John Ousterhout


I read your code and revised it to be simpler and more obvious. Instead of wrapping the byte stream in an io.Reader, it wraps csv.Reader and provides a simple, obvious replacement for the csv.Reader Read method.


package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"strings"
)

// HeaderReader is a csv.Reader that removes strings
// from the field names in the header record.
type HeaderReader struct {
	cr      *csv.Reader
	removes []string
	header  bool
}

func NewHeaderReader(r io.Reader, removes []string) *HeaderReader {
	return &HeaderReader{
		cr:      csv.NewReader(r),
		removes: removes,
		header:  false,
	}
}

func (hr *HeaderReader) Read() (record []string, err error) {
	if hr.header {
		return hr.cr.Read()
	}
	hr.header = true

	header, err := hr.cr.Read()
	if err != nil {
		return nil, err
	}
	for i, field := range header {
		for _, remove := range hr.removes {
			field = strings.ReplaceAll(field, remove, "")
		}
		header[i] = field
	}
	return header, nil
}

var csvBlob = `"Col,1","Col
2"
a,b
c,d
e,f
g,h
`

func main() {
	sr := strings.NewReader(csvBlob)
	hr := NewHeaderReader(sr, []string{",", "\n"})
	for {
		record, err := hr.Read()
		if err != nil {
			if err == io.EOF {
				break
			}
			log.Fatal(err)
		}
		fmt.Println(record)
	}
}

https://go.dev/play/p/MW8KScSr6uF

[Col1 Col2]
[a b]
[c d]
[e f]
[g h]

If we want a complete replacement for csv.Reader then we can add the remaining csv.Reader methods as pass-through wrappers.

func (hr *HeaderReader) FieldPos(field int) (line, column int) {
	return hr.cr.FieldPos(field)
}

func (hr *HeaderReader) InputOffset() int64 {
	return hr.cr.InputOffset()
}

func (hr *HeaderReader) ReadAll() (records [][]string, err error) {
	return hr.cr.ReadAll()
}
answered Jan 16, 2023 at 17:34
\$\endgroup\$
