Why do CSV file formats normally use quoting instead of escaping?

Question 1

There are a lot of variants of the CSV "standard" (or lack thereof). I've never personally seen any that use an escape character (like \) instead of surrounding each field with double quotes. Instead of foo,bar,"foo,bar" it would be foo,bar,foo\,bar.

This would be handy for situations where a file needs to be inspected or edited by hand. When counting commas to find the right field, it seems that it would be easier to tell which ones were not field separators if they were escaped instead of quoted.

I don't see how it would make a difference from a parsing perspective, though.

Why quote instead of escape?
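
To make the two styles concrete, here is a minimal sketch using Python's standard csv module. The choice of Python, the backslash as the escape character, and the helper name write_row are assumptions for illustration; they are not part of any CSV specification.

```python
import csv
import io

row = ["foo", "bar", "foo,bar"]


def write_row(row, **fmt):
    """Serialize one row with csv.writer and return it without the line terminator.

    write_row is a hypothetical helper for this example only.
    """
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerow(row)
    return buf.getvalue().rstrip("\n")


# Default dialect: fields are quoted only when they contain a special character.
quoted = write_row(row)

# No quoting at all: special characters are prefixed with the escape character.
escaped = write_row(row, quoting=csv.QUOTE_NONE, escapechar="\\")

print(quoted)   # foo,bar,"foo,bar"
print(escaped)  # foo,bar,foo\,bar
```

Both lines carry the same three fields; only the way the embedded comma is protected differs.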

Question 2

For text that doesn't contain quotation marks, quoting reduces the problem to searching the text for commas and enclosing it in quotation marks if any are found. That is simpler and faster than either creating a new string to insert the escapes into, or writing the text out in parts to emit the escape character where required.
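
A rough sketch of that "scan, then wrap" approach, in Python. The function names quote_field and format_row are made up for this example, and the handling of embedded quotes (doubling them) goes beyond what the comment describes; it follows the convention documented in RFC 4180.

```python
def quote_field(field: str) -> str:
    """Return the field unchanged unless it contains a character that needs quoting."""
    if any(ch in field for ch in ',"\r\n'):
        # Wrap the whole field once; embedded quotes are doubled.
        return '"' + field.replace('"', '""') + '"'
    return field


def format_row(fields) -> str:
    """Join already-protected fields with commas."""
    return ",".join(quote_field(f) for f in fields)


print(format_row(["foo", "bar", "foo,bar"]))  # foo,bar,"foo,bar"
print(format_row(['say "hi"', "x"]))          # "say ""hi""",x
```

In the common case (no special characters) the field is written out untouched, which is the shortcut the comment is pointing at.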

Question 3

I'm not sure it would be easier. For example, in foo,bar,foo\\,bar the last comma would be a field separator, because the backslash in front of it is itself escaped.
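
The ambiguity is easier to see with a small hand-written splitter. This is only a sketch of one possible backslash-escaping dialect (split_escaped is a hypothetical name, and real tools may treat edge cases differently):

```python
def split_escaped(line, delimiter=",", escape="\\"):
    """Split one line on the delimiter, treating escape + char as a literal char."""
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == escape:
            # Take the next character literally; a trailing lone escape is kept.
            current.append(next(chars, escape))
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


print(split_escaped(r"foo,bar,foo\,bar"))   # ['foo', 'bar', 'foo,bar']
print(split_escaped(r"foo,bar,foo\\,bar"))  # ['foo', 'bar', 'foo\\', 'bar']
```

The first line has three fields and the second has four (the third one ends in a single backslash), so a human counting commas has to count backslashes too.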

Question 4

You really need both. Escaping is shorter. Quoting is easier for humans to read (though, as you point out, for some jobs escaping makes it easier to count fields in a simple editor). Even with quotes you still need escaping; otherwise, how do you put a " into a cell? Quotes also let you include newlines more readably. If you are going to write a parser, you may as well implement both; it does not add much complexity.
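
As an illustration of quoting still needing some form of escaping, here is a sketch with Python's csv module, whose default dialect escapes an embedded quote by doubling it and keeps newlines inside quoted fields. The sample row is invented for the example.

```python
import csv
import io

row = ['He said "hi"', "line one\nline two", "plain"]

buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(row)
text = buf.getvalue()

print(text, end="")
# "He said ""hi""","line one
# line two",plain

# Reading it back recovers the original fields, including the newline.
parsed = next(csv.reader(io.StringIO(text, newline="")))
assert parsed == row
```

Note that the record spans two physical lines, so anything that reads such a file line by line has to be aware of the quoting.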

Bryan Oakley · Answer 1 · 2013-07-20 15:17:05Z

Your question includes the answer, when you wrote "I don't see how it would make a difference from a parsing perspective, though"

There is no compelling reason; it just is. CSV is a data format, so the main goal is to be parseable.

plykkegaard · Answer 2 · 2023-10-13 10:59:55Z

CSV originates from the early seventies: IBM's Fortran compiler for OS/360 supported it in 1972, and list-directed input/output was later defined in FORTRAN 77. It was introduced to make data transfer on punch cards less error-prone, as the previously used fixed-length format was prone to errors when one or more spaces were missing. The format is described in the IBM DB2 administrative guide: Load, Import and Export file formats
ref: https://www.columbia.edu/sec/acis/db2/db2d0/db2d053.htm

The format was more recently documented in RFC 4180 (2005), and a file needs to follow these guidelines to be compliant.
What is the RFC 4180 CSV file? RFC 4180 defines a standard dialect for CSV that specifies delimiters, quoting, and line breaks. As well as resolving these historical variations in CSV, RFC 4180 also resolves other potential inconsistencies, such as requiring the same number of fields on each line.
Ref: https://www.ietf.org/rfc/rfc4180.txt

The suggestions for a standard in RFC 4180 were later enhanced by the W3C in 2015.

The file type is used by all major players in the industry. Major changes are not easily applied.

A CSV file doesn't need to rely on commas as the separator between elements. The delimiter can be a semicolon, space, or some other character, though the comma is most common.
For example, in countries that use the comma as the decimal separator, the semicolon is used as the delimiter between elements.
This is why the escape character is not needed.
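
A small sketch of the semicolon variant, again with Python's csv module. The sample data is invented, and the delimiter has to be passed explicitly because the module does not guess it by default.

```python
import csv
import io

# Decimal commas in the values, semicolons between fields.
data = "name;price\nwidget;3,50\ngadget;12,00\n"

rows = list(csv.reader(io.StringIO(data, newline=""), delimiter=";"))
print(rows)
# [['name', 'price'], ['widget', '3,50'], ['gadget', '12,00']]
```

The commas pass through untouched here, although a field that itself contained a semicolon would still need to be quoted.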

This just moves the question around though - why did the RFC authors decide to use quoting rather than escaping?
@plykkegaard The RFC contains this comment: "This section documents the format that seems to be followed by most implementations", so it looks like the double-quote convention was already the de facto standard.
Call it something other than CSV and choose your own path, or follow the suggested implementation guidelines.
Yeah, you can always downvote if the answer does not suit you! Integration systems like Microsoft BizTalk Server or Seeburger Integration Suite will have a hard time with flat files that use escape characters rather than quotes. Use another separator, like tab or pipe, and you are good to go.
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.

btilly · Answer 3 · 2013-07-20 16:31:23Z

Chicken and egg.

If I was defining a spec, I'd escape for the reasons that you state. But any CSV parsing tool that you encounter (including Excel, sqloader, etc) is almost certain to use quoting, not escaping. So if you want to produce it, you need to produce it quoted. If you want to consume it, it is safe to assume that whoever generates it will quote.
