Imagine a text file where each csv record may have different numbers of fields. The task is to write code to output how many fields there are in each record of the file. You can assume there is no header line in the file and can read in from a file or standard input, as you choose.
You can assume a version of RFC 4180 for the CSV rules, explained below, as the definition of each record in the file. Here is a lightly edited version of the relevant part of the spec:
Definition of the CSV Format
1. Each record is located on a separate line, delimited by a line break (CRLF). For example:
   aaa,bbb,ccc CRLF
   zzz,yyy,xxx CRLF
2. The last record in the file may or may not have an ending line break. For example:
   aaa,bbb,ccc CRLF
   zzz,yyy,xxx
3. (Rule 3 does not apply in this challenge.)
4. Within each record, there may be one or more fields, separated by commas. Spaces are considered part of a field and should not be ignored.
5. Each field may or may not be enclosed in double quotes. If fields are not enclosed with double quotes, then double quotes may not appear inside the fields. For example:
   "aaa","bbb","ccc" CRLF
   zzz,yyy,xxx
6. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:
   "aaa","b CRLF
   bb","ccc" CRLF
   zzz,yyy,xxx
7. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:
   "aaa","b""bb","ccc"
Example
Input:
,"Hello, World!"
"aaa","b""bb","ccc"
zzz,yyy,
"aaa","b
bb","ccc","fish",""
Should give the output:
2, 3, 3, 5
You can give the output values in any way you find most convenient.
Libraries
You can use any library you like.
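As a library-free baseline for the rules above, here is a minimal, ungolfed sketch of a character-by-character scanner. The function name `count_fields` and its variable names are illustrative and not part of the challenge; the sketch assumes well-formed input and treats a bare LF the same as CRLF.

```python
def count_fields(text):
    """Return the number of fields in each record of RFC 4180-style CSV text."""
    counts = []
    fields = 1          # every record has at least one field
    in_quotes = False   # True while inside a double-quoted field
    pending = False     # True once the current record has any content
    for ch in text.replace("\r\n", "\n"):
        if ch == '"':
            # An escaped quote ("") toggles the state twice, leaving it unchanged.
            in_quotes = not in_quotes
            pending = True
        elif ch == "," and not in_quotes:
            fields += 1
            pending = True
        elif ch == "\n" and not in_quotes:
            # An unquoted line break ends the record.
            counts.append(fields)
            fields = 1
            pending = False
        else:
            pending = True
    if pending:
        # The last record had no trailing line break.
        counts.append(fields)
    return counts


SAMPLE = ',"Hello, World!"\n"aaa","b""bb","ccc"\nzzz,yyy,\n"aaa","b\nbb","ccc","fish",""'
print(count_fields(SAMPLE))  # [2, 3, 3, 5]
```

Toggling on every double quote is enough here because only field counts are needed, not field contents.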
12 Answers
Stax, 12 bytes (previously 19)
èJ§3!!}vAà○L>
Unpacked, ungolfed, and commented, it looks like this.
_'"/    split *all* of standard input by double quote characters
2::     keep only the even numbered elements
|j      split on newlines (implicitly concatenates array of "strings")
m       for each line, execute the rest of the program and output
',#^    count the number of commas occurring as substrings, and increment
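For readers who do not know Stax, the steps above can be sketched in Python roughly as follows. This is an illustrative translation, not the author's code; the function name `count_fields_by_quote_split` is made up for the example, and it assumes the input has no trailing line break (a trailing one would yield an extra count of 1).

```python
def count_fields_by_quote_split(text):
    # Split the whole input on double quotes. Chunks at even (0-based)
    # positions lie outside quotes; odd positions are quoted content.
    chunks = text.split('"')
    # Keep only the unquoted chunks and join them back together.
    unquoted = "".join(chunks[0::2])
    # Every remaining comma is a field separator and every remaining
    # newline is a record separator.
    return [line.count(",") + 1 for line in unquoted.split("\n")]


SAMPLE = ',"Hello, World!"\n"aaa","b""bb","ccc"\nzzz,yyy,\n"aaa","b\nbb","ccc","fish",""'
print(count_fields_by_quote_split(SAMPLE))  # [2, 3, 3, 5]
```

An escaped quote (`""`) simply produces an empty chunk at an even position, so it does not disturb the even/odd alternation.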
- How does it work? – user9207, Jun 10, 2018 at 10:03
- @Anush: I've added some more information. – recursive, Jun 10, 2018 at 15:27
R, 40 bytes
(x=count.fields(stdin(),","))[!is.na(x)]
Per the documentation of count.fields, fields with line breaks get a field count of NA for the initial line, so we filter them out.
JavaScript (ES2018), 59 bytes (previously 42)
s=>s.replace(/".+?"/sg).split`\n`.map(c=>c.split`,`.length)
f=
s=>s.replace(/".+?"/sg).split`\n`.map(c=>c.split`,`.length)
console.log(f(
`,"Hello, World!"
"aaa","b""bb","ccc"
zzz,yyy,
"aaa","b
bb","ccc","fish",""`))
- Technically this is ES2018 due to the s flag on the regex. Not that it matters that much ;-) And nice use of it, btw! – ETHproductions, Jun 10, 2018 at 2:07
- This function only appears to work on one record at a time. I think the problem description requires handling an entire file of multiple records. – recursive, Jun 10, 2018 at 3:46
- @ETHproductions, good point, will update. – Rick Hitchcock, Jun 10, 2018 at 11:07
- @recursive, you're right, I misunderstood the inputs. Now updated, at the loss of many many bytes. – Rick Hitchcock, Jun 10, 2018 at 12:17
Jelly, 12 bytes
ṣ”"m2FỴ=”,§‘
A port of recursive's Stax answer - go give credit!
How?
ṣ”"m2FỴ=”,§‘ - Link: list of characters, V
 ”"          - a double quote character = '"'
ṣ            - split (V) at ('"')
   m2        - modulo slice with two (1st, 3rd, 5th, ... elements of that)
     F       - flatten list of lists to a list
      Ỵ      - split at newlines
        ”,   - comma character = ','
       =     - equal? (vectorises)
          §  - sum each
           ‘ - increment (vectorises)
             - (as a full program implicit print)
Maybe you prefer ṣ”"m2ẎỴċ€”,‘ - Ẏ is tighten and ċ€ counts the commas in each.
Python, 63 bytes
import csv
def f(s):return map(len,csv.reader(s.split("\n")))
Returns the output in an iterable map object.
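An ungolfed variation on the same idea might look like the sketch below. It differs from the answer in that it feeds the text through `io.StringIO` instead of splitting on `"\n"`, and it returns a list rather than a `map` object; the function name `count_fields_with_csv` is only illustrative.

```python
import csv
import io


def count_fields_with_csv(text):
    # newline="" leaves line endings untouched so the csv module can
    # handle line breaks inside quoted fields itself.
    reader = csv.reader(io.StringIO(text, newline=""))
    # Each row is a list of field strings, so its length is the field count.
    return [len(row) for row in reader]


SAMPLE = ',"Hello, World!"\n"aaa","b""bb","ccc"\nzzz,yyy,\n"aaa","b\nbb","ccc","fish",""'
print(count_fields_with_csv(SAMPLE))  # [2, 3, 3, 5]
```

Note that `csv.reader` yields an empty list for a completely blank line, so such a line would be reported as 0 fields rather than 1.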
Perl 5.10.0, 53 bytes (previously 55)
$_=shift;s/"(""|[^"])*"//g;s/^.*$/1+$&=~y:,::/gem;say
Explanation:
$_=shift; # first command-line arg
s/"(""|[^"])*"//g; # remove quoted fields
s/^.*$/ # replace each line
1+$&=~y:,:: # by the number of commas plus 1
/gem;
say # print
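The same two-step idea (delete every quoted field, then count commas per line) can be sketched in Python. This is an approximate translation for illustration, not the answer's code; `QUOTED_FIELD` and `count_fields_by_regex` are names invented for the example, and the group is made non-capturing because its contents are not needed.

```python
import re

# A quoted field: an opening quote, then any run of escaped quotes ("")
# or non-quote characters (including line breaks), then a closing quote.
QUOTED_FIELD = re.compile(r'"(?:""|[^"])*"')


def count_fields_by_regex(text):
    # Removing quoted fields also removes any commas and line breaks
    # they contain, so only real separators remain.
    stripped = QUOTED_FIELD.sub("", text)
    # Each remaining line is one record: commas plus one gives its field count.
    return [line.count(",") + 1 for line in stripped.splitlines()]


SAMPLE = ',"Hello, World!"\n"aaa","b""bb","ccc"\nzzz,yyy,\n"aaa","b\nbb","ccc","fish",""'
print(count_fields_by_regex(SAMPLE))  # [2, 3, 3, 5]
```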
Java 10, 101 bytes
s->{for(var p:s.replaceAll("\"[^\"]*\"","x").split("\n"))System.out.println(p.split(",",-1).length);}
Explanation:
s->{ // Method with String parameter and no return-type
for(var p:s.replaceAll("\"[^\"]*\"","x")
// Replace all words within quotes with an "x"
.split("\n")) // Then split by new-line and loop over them:
System.out.println(p.split(",",-1) // Split the item by commas
.length);} // And print the length of this array
POSIX shell (ShellShoccar-jpn/Parsrs + usp-engineers-community/Open-usp-tukubai), 19 bytes
Dependencies are the two projects named in the heading: Parsrs and Open-usp-tukubai.
parsrc.sh|count 1 1
Output format
For each line, the following line is output:
<row number> <number of columns>
How it works
parsrc.sh is a CSV parser that converts such text into a list of cells and their values.
Open-usp-tukubai's count takes a range of field numbers as parameters and, reading space-separated values, counts how many rows share the same values in that range.
Here are some comments:
# raw CSV
parsrc.sh |
# 1: row number 2: column number 3: value where LFs are represented as "\n" while real backslashes as "\\"
count 1 1
# 1: row number 2: number of columns in the row
Example run on Termux
Because Termux is not completely POSIX-compatible, command -p getconf PATH fails, so I am dropping the boilerplate that sets the PATH environment variable so that the shell calls fully POSIX-compatible utilities first. Also, because I am too lazy to set up Python 2, I am using COMMANDS.SH/count.