Count the number of fields in each csv record

Question 1

Imagine a text file where each csv record may have different numbers of fields. The task is to write code to output how many fields there are in each record of the file. You can assume there is no header line in the file and can read in from a file or standard input, as you choose.

You can assume a version of RFC 4180 for the CSV rules, which I explain below and which defines each record of the file. Here is a lightly edited version of the relevant part of the spec:

Definition of the CSV Format

Each record is located on a separate line, delimited by a line break (CRLF). For example:
```
aaa,bbb,ccc CRLF
zzz,yyy,xxx CRLF
```

The last record in the file may or may not have an ending line break. For example:
```
aaa,bbb,ccc CRLF
zzz,yyy,xxx
```

(Rule 3. does not apply in this challenge)

Within each record, there may be one or more fields, separated by commas. Spaces are considered part of a field and should not be ignored.
Each field may or may not be enclosed in double quotes. If fields are not enclosed with double quotes, then double quotes may not appear inside the fields. For example:
```
"aaa","bbb","ccc" CRLF
zzz,yyy,xxx
```

Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:
```
"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
```

If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:
```
"aaa","b""bb","ccc"
```

Example

Input:

,"Hello, World!"
"aaa","b""bb","ccc"
zzz,yyy,
"aaa","b 
bb","ccc","fish",""

Should give the output:

2, 3, 3, 5

You can give the output values in any way you find most convenient.
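For reference, here is an ungolfed sketch of one way to apply the rules above: a small state machine that tracks whether the current character is inside a quoted field. The function name `count_fields` is made up for illustration, and the sketch assumes well-formed input (quotes are balanced).

```python
def count_fields(text):
    """Return the number of fields in each CSV record of `text`."""
    counts = []
    fields = 1         # a record always has at least one field
    in_quotes = False  # True while inside a double-quoted field
    pending = False    # True once the current record has consumed any input
    for ch in text:
        if ch == '"':
            # An escaped quote ("") toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
            pending = True
        elif ch == "," and not in_quotes:
            fields += 1
            pending = True
        elif ch == "\n" and not in_quotes:
            # An unquoted line break ends the record.
            counts.append(fields)
            fields = 1
            pending = False
        else:
            pending = True
    if pending:
        # The last record may have no trailing line break.
        counts.append(fields)
    return counts
```

A carriage return before the line feed is treated as an ordinary character, so CRLF and LF input give the same counts.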

Libraries

You can use any library you like.

Question 2

Stax, 12 bytes (previously 19)

èJ§3‼}vAà○L>

Run and debug it

Unpacked, ungolfed, and commented, it looks like this.

_'"/    split *all* of standard input by double quote characters
2::     keep only the even numbered elements
|j      split on newlines (implicitly concatenates array of "strings")
m       for each line, execute the rest of the program and output
  ',#^  count the number of commas occurring as substrings, and increment

Run this one
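For readers who don't know Stax, the same idea can be sketched in Python. This is a rough port, not a translation of the packed program, and the function name is invented for the example. Splitting on double quotes leaves the text outside quoted fields at the even-indexed positions, so any commas or newlines that remain are real separators.

```python
def count_fields_by_quote_split(text):
    # Split all of the input on double quotes.
    pieces = text.split('"')
    # Even-indexed pieces lie outside quoted fields; join them back together.
    outside = "".join(pieces[0::2])
    # Each remaining line is one record: fields = commas + 1.
    return [line.count(",") + 1 for line in outside.split("\n")]
```

An escaped quote (`""`) just produces an empty piece and doesn't disturb the even/odd alternation. Note that this sketch reports an extra record of 1 field if the input ends with a newline.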

Question 3

How does it work?

Question 4

@Anush: I've added some more information.

Question 5

R, 40 bytes

(x=count.fields(stdin(),","))[!is.na(x)]

Try it online!

Per the documentation of count.fields, fields with line breaks get a field count of NA for the initial line, so we filter them out.

Question 6

JavaScript (ES2018), 59 bytes (previously 42)

s=>s.replace(/".+?"/sg).split`\n`.map(c=>c.split`,`.length)

f=
s=>s.replace(/".+?"/sg).split`\n`.map(c=>c.split`,`.length)
console.log(f(
`,"Hello, World!"
"aaa","b""bb","ccc"
zzz,yyy,
"aaa","b 
bb","ccc","fish",""`))

Question 7

Technically this is ES2018 due to the s flag on the regex. Not that it matters that much ;-) And nice use of it, btw!

Question 8

This function only appears to work on one record at a time. I think the problem description requires handling an entire file of multiple records.

Question 9

@ETHproductions, good point, will update.

Question 10

@recursive, you're right, I misunderstood the inputs. Now updated, at the loss of many many bytes.

Question 11

Jelly, 12 bytes

ṣ”"m2FỴ=”,§‘

A port of recursive's Stax answer - go give credit!

Try it online!

How?

ṣ”"m2FỴ=”,§‘ - Link: list of characters, V
 ”"          - a double quote character = '"'
ṣ            - split (V) at ('"')
   m2        - modulo slice with two (1st, 3rd, 5th, ... elements of that)
     F       - flatten list of lists to a list
      Ỵ      - split at newlines
        ”,   - comma character = ','
       =     - equal? (vectorises)
          §  - sum each
           ‘ - increment (vectorises)
             - (as a full program implicit print)

Maybe you prefer ṣ”"m2ẎỴċ€”,‘ - Ẏ is tighten and ċ€ counts the commas in each.

Question 12

Python, 63 bytes

import csv
def f(s):return map(len,csv.reader(s.split("\n")))

Returns the output in an iterable map object.
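An ungolfed variant of the same approach, in case it is useful: this sketch feeds the text to `csv.reader` through `io.StringIO` with `newline=""`, which is how the `csv` documentation recommends opening files so that line breaks inside quoted fields are preserved. The function name is only illustrative.

```python
import csv
import io


def count_fields_csv(text):
    # newline="" keeps the original line endings, so the csv module can
    # handle line breaks embedded in quoted fields itself.
    reader = csv.reader(io.StringIO(text, newline=""))
    return [len(row) for row in reader]
```

One caveat: `csv.reader` yields an empty list for a completely blank line, so such a line is reported as 0 fields rather than 1.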

Question 13

Using a lambda function you can get this down to 54 bytes
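Presumably the suggestion is something along these lines (a guess at what the commenter meant, spaced out here for readability; the name `f` is only there so it can be called):

```python
import csv

# Anonymous-function form of the answer above; still returns a lazy map object.
f = lambda s: map(len, csv.reader(s.split("\n")))
```

Wrap the result in `list(...)` to see the counts.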

Question 14

Wolfram Language (Mathematica), 30 bytes

Length/@ImportString[#,"CSV"]&

Try it online!

Question 15

Perl 5.10.0, 53 bytes (previously 55)

$_=shift;s/"(""|[^"])*"//g;s/^.*$/1+$&=~y:,::/gem;say

Try it online!

Explanation:

$_=shift; # first command-line arg
s/"(""|[^"])*"//g; # remove quoted fields
s/^.*$/ # replace each line 
 1+$&=~y:,:: # by the number of commas plus 1
/gem;
say # print
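The same two steps can be sketched in Python for comparison. This is a rough equivalent rather than a line-by-line port, and the function name is made up. The regex is the one from the answer, with a non-capturing group.

```python
import re


def count_fields_regex(text):
    # Remove every quoted field; "" inside a field is consumed as an escaped quote.
    # [^"] also matches newlines, so multi-line fields disappear in one piece.
    stripped = re.sub(r'"(?:""|[^"])*"', "", text)
    # What is left of each line holds only separators and unquoted fields.
    return [line.count(",") + 1 for line in stripped.split("\n")]
```

Because a quoted field containing a line break is removed whole, the two physical lines of that record merge into one before the commas are counted.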

Question 16

Java 10, 101 bytes

s->{for(var p:s.replaceAll("\"[^\"]*\"","x").split("\n"))System.out.println(p.split(",",-1).length);}

Try it online.

Explanation:

s->{                                      // Method with String parameter and no return-type
  for(var p:s.replaceAll("\"[^\"]*\"","x")
                                          // Replace every quoted run with an "x"
            .split("\n"))                 // Then split by new-line and loop over the lines:
    System.out.println(p.split(",",-1)    // Split the line by commas (-1 keeps trailing empty fields)
                        .length);}        // And print the length of this array
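A Python sketch of the same trick, for anyone who wants to poke at it without a Java toolchain (the function name is invented). The simpler pattern `"[^"]*"` is enough here because an escaped quote splits a field like `"b""bb"` into two adjacent quoted runs, `"b"` and `"bb"`, which become `xx` with no comma in between.

```python
import re


def count_fields_placeholder(text):
    # Replace each quoted run (which may span a line break) with a placeholder.
    replaced = re.sub(r'"[^"]*"', "x", text)
    # Python's str.split keeps trailing empty strings, like Java's split(",", -1).
    return [len(line.split(",")) for line in replaced.split("\n")]
```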

Question 17

Jelly, 17 bytes

=”"ÄḂżṣ7Ż¤ṣ0$,”,Ẉ

Try it online!

-1 thanks to Jonathan Allan. duh duh duh...

Question 18

Ruby `-rcsv`, 20 bytes

Reads from STDIN.

p CSV($<).map &:size

Attempt This Online!

Question 19

Uiua, 14 bytes

≡/+¬∵≍□W⬚W°csv

Try it: Uiua pad

Question 20

POSIX shell (ShellShoccar-jpn/Parsrs + usp-engineers-community/Open-usp-tukubai), 19 bytes

Dependencies are ShellShoccar-jpn/Parsrs (for parsrc.sh) and usp-engineers-community/Open-usp-tukubai (for count).

parsrc.sh|count 1 1

Output format

For each line, the following line is output:

<row number> <number of columns>

How it works

parsrc.sh is a CSV parser that converts the text into a list of cells and their values. Open-usp-tukubai's count takes a range of field numbers as parameters, reads space-separated values, and counts the lines that share the same values in that range. Here are some comments:

# raw CSV
parsrc.sh |
# 1: row number 2: column number 3: value where LFs are represented as "\n" while real backslashes as "\\"
count 1 1
# 1: row number 2: number of columns in the row
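To make the two stages concrete, here is a rough Python sketch of the same pipeline shape. It is not how parsrc.sh or count are implemented. It only mimics the intermediate form described in the comments above (one row/column/value triple per cell) and then tallies cells per row. Both function names are hypothetical.

```python
import csv
import io
from collections import Counter


def cells(text):
    """Yield (row number, column number, value) triples, both 1-based."""
    reader = csv.reader(io.StringIO(text, newline=""))
    for r, row in enumerate(reader, start=1):
        for c, value in enumerate(row, start=1):
            yield r, c, value


def columns_per_row(text):
    # Count how many cells share each row number (the role of `count 1 1`).
    tally = Counter(r for r, _c, _v in cells(text))
    return [f"{r} {n}" for r, n in sorted(tally.items())]
```

Each returned string has the `<row number> <number of columns>` shape shown under "Output format".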

Example run on Termux

Because Termux is not completely POSIX-compatible, command -p getconf PATH fails, so I am dropping the boilerplate that sets the PATH environment variable so that the shell calls fully POSIX-compatible utilities first. Also, because I am too lazy to set up Python 2, I am using COMMANDS.SH/count instead.

Example run on Termux but had some modification

Accepted answer (Stax, 12 bytes): posted by recursive, 2018-06-10 01:15:01Z

How does it work? (user9207, Jun 10, 2018 at 10:03)

@Anush: I've added some more information. (recursive, Jun 10, 2018 at 15:27)
