13
\$\begingroup\$

Imagine a text file where each CSV record may have a different number of fields. The task is to write code that outputs the number of fields in each record of the file. You can assume there is no header line in the file, and you may read from a file or standard input, as you choose.

For the definition of each line of the file, you can assume a version of the RFC 4180 CSV rules, which I explain below. Here is a lightly edited version of the relevant part of the spec:

Definition of the CSV Format

  1. Each record is located on a separate line, delimited by a line break (CRLF). For example:

    aaa,bbb,ccc CRLF
    zzz,yyy,xxx CRLF

  2. The last record in the file may or may not have an ending line break. For example:

    aaa,bbb,ccc CRLF
    zzz,yyy,xxx

(Rule 3. does not apply in this challenge)

  4. Within each record, there may be one or more fields, separated by commas. Spaces are considered part of a field and should not be ignored.

  5. Each field may or may not be enclosed in double quotes. If fields are not enclosed with double quotes, then double quotes may not appear inside the fields. For example:

    "aaa","bbb","ccc" CRLF
    zzz,yyy,xxx

  6. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:

    "aaa","b CRLF
    bb","ccc" CRLF
    zzz,yyy,xxx

  7. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

    "aaa","b""bb","ccc"

Example

Input:

,"Hello, World!"
"aaa","b""bb","ccc"
zzz,yyy,
"aaa","b 
bb","ccc","fish",""

Should give the output:

2, 3, 3, 5

You can give the output values in any way you find most convenient.
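
As an ungolfed reference, the rules above can be sketched as a small character-by-character scanner. This is only an illustration, not part of the challenge: the name `count_fields` is made up here, and the sketch assumes the input is already a single string and that line breaks may be either LF or CRLF.

```python
def count_fields(text):
    """Return the number of fields in each record of CSV text."""
    counts = []
    fields = 1          # a record always has at least one field
    in_quotes = False   # True while inside a double-quoted field
    pending = False     # True once the current record has any character
    for ch in text:
        if ch == '"':
            # An escaped quote ("") toggles twice, leaving the state unchanged.
            in_quotes = not in_quotes
            pending = True
        elif ch == "," and not in_quotes:
            fields += 1
            pending = True
        elif ch == "\n" and not in_quotes:
            # An unquoted line break ends the record.
            counts.append(fields)
            fields = 1
            pending = False
        elif ch != "\r":
            pending = True
    if pending:
        # The last record had no ending line break.
        counts.append(fields)
    return counts


sample = ',"Hello, World!"\n"aaa","b""bb","ccc"\nzzz,yyy,\n"aaa","b \nbb","ccc","fish",""'
print(count_fields(sample))  # [2, 3, 3, 5]
```

Because only commas and line breaks outside quotes matter for counting, the scanner never needs to reconstruct the field values themselves.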

Libraries

You can use any library you like.

The Fifth Marshal
asked Jun 9, 2018 at 21:51
\$\endgroup\$
0

12 Answers

5
\$\begingroup\$

Stax, 12 bytes (previously 19)

èJ§3‼}vAà○L>

Run and debug it

Unpacked, ungolfed, and commented, it looks like this.

_'"/    split *all* of standard input by double quote characters
2::     keep only the even numbered elements
|j      split on newlines (implicitly concatenates array of "strings")
m       for each line, execute the rest of the program and output
  ',#^  count the number of commas occurring as substrings, and increment

Run this one
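
For readers who do not know Stax, the same idea can be sketched in Python. This is a rough translation of the commented steps above, not the author's code; the function name `count_fields_by_quote_split` is invented for this example, and dropping one trailing line break is an extra assumption so that a file ending in a newline does not produce a spurious final record.

```python
def count_fields_by_quote_split(text):
    # Splitting on '"' alternates between unquoted and quoted chunks,
    # so the even-indexed pieces are exactly the text outside quotes.
    outside = "".join(text.split('"')[::2])
    # Assumption: ignore a single trailing line break after the last record.
    if outside.endswith("\n"):
        outside = outside[:-1]
    # Each remaining line has one more field than it has commas.
    return [line.count(",") + 1 for line in outside.split("\n")]


sample = ',"Hello, World!"\n"aaa","b""bb","ccc"\nzzz,yyy,\n"aaa","b \nbb","ccc","fish",""'
print(count_fields_by_quote_split(sample))  # [2, 3, 3, 5]
```

An escaped quote `""` inside a quoted field just contributes an empty even-indexed piece, which is why the trick survives rule 7.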

answered Jun 10, 2018 at 1:15
\$\endgroup\$
2
  • 1
    \$\begingroup\$ How does it work? \$\endgroup\$ Commented Jun 10, 2018 at 10:03
  • 1
    \$\begingroup\$ @Anush: I've added some more information. \$\endgroup\$ Commented Jun 10, 2018 at 15:27
4
\$\begingroup\$

R, 40 bytes

(x=count.fields(stdin(),","))[!is.na(x)]

Try it online!

Per the documentation of count.fields, fields with line breaks get a field count of NA for the initial line, so we filter them out.

answered Jun 10, 2018 at 13:34
\$\endgroup\$
3
\$\begingroup\$

JavaScript (ES2018), 59 bytes (previously 42)

s=>s.replace(/".+?"/sg).split`\n`.map(c=>c.split`,`.length)

f=
s=>s.replace(/".+?"/sg).split`\n`.map(c=>c.split`,`.length)
console.log(f(
`,"Hello, World!"
"aaa","b""bb","ccc"
zzz,yyy,
"aaa","b 
bb","ccc","fish",""`))

answered Jun 10, 2018 at 1:40
\$\endgroup\$
4
  • \$\begingroup\$ Technically this is ES2018 due to the s flag on the regex. Not that it matters that much ;-) And nice use of it, btw! \$\endgroup\$ Commented Jun 10, 2018 at 2:07
  • 2
    \$\begingroup\$ This function only appears to work on one record at a time. I think the problem description requires handling an entire file of multiple records. \$\endgroup\$ Commented Jun 10, 2018 at 3:46
  • \$\begingroup\$ @ETHproductions, good point, will update. \$\endgroup\$ Commented Jun 10, 2018 at 11:07
  • \$\begingroup\$ @recursive, you're right, I misunderstood the inputs. Now updated, at the loss of many many bytes. \$\endgroup\$ Commented Jun 10, 2018 at 12:17
3
\$\begingroup\$

Jelly, 12 bytes

ṣ”"m2FỴ=”,§‘

A port of recursive's Stax answer - go give credit!

Try it online!

How?

ṣ”"m2FỴ=”,§‘ - Link: list of characters, V
 ”"          - a double quote character = '"'
ṣ            - split (V) at ('"')
   m2        - modulo slice with two (1st, 3rd, 5th, ... elements of that)
     F       - flatten list of lists to a list
      Ỵ      - split at newlines
        ”,   - comma character = ','
       =     - equal? (vectorises)
          §  - sum each
           ‘ - increment (vectorises)
             - (as a full program implicit print)

Maybe you prefer ṣ”"m2ẎỴċ€”,‘ - Ẏ is tighten and ċ€ counts the commas in each.

answered Jun 10, 2018 at 16:28
\$\endgroup\$
2
\$\begingroup\$

Python, 63 bytes

import csv
def f(s):return map(len,csv.reader(s.split("\n")))

Returns the output in an iterable map object.
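
A quick usage sketch, ungolfed and wrapped in `list` so the counts can be printed. The name `field_counts` is made up for this example. Note that `csv.reader` yields an empty row for an empty line, so an input ending in a line break would report a trailing `0`.

```python
import csv


def field_counts(s):
    # csv.reader accepts any iterable of lines; a quoted field that spans
    # several lines is continued with the next item of the iterable.
    return list(map(len, csv.reader(s.split("\n"))))


sample = ',"Hello, World!"\n"aaa","b""bb","ccc"\nzzz,yyy,\n"aaa","b \nbb","ccc","fish",""'
print(field_counts(sample))  # [2, 3, 3, 5]
```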

answered Jun 9, 2018 at 22:30
\$\endgroup\$
1
  • 2
    \$\begingroup\$ Using a lambda function you can get this down to 54 bytes \$\endgroup\$ Commented Jun 9, 2018 at 22:38
2
\$\begingroup\$

Wolfram Language (Mathematica), 30 bytes

Length/@ImportString[#,"CSV"]&

Try it online!

answered Jun 10, 2018 at 10:26
\$\endgroup\$
2
\$\begingroup\$

Perl 5.10.0, 53 bytes (previously 55)

$_=shift;s/"(""|[^"])*"//g;s/^.*$/1+$&=~y:,::/gem;say

Try it online!

Explanation:

$_=shift;             # first command-line arg
s/"(""|[^"])*"//g;    # remove quoted fields
s/^.*$/               # replace each line
  1+$&=~y:,::         # by the number of commas plus 1
/gem;
say                   # print
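
The same two steps can be sketched in Python with the `re` module. This is an illustrative translation rather than the answer's code: the function name `count_fields_regex` is invented here, and the sketch assumes the input has no line break after the last record.

```python
import re


def count_fields_regex(text):
    # Remove every quoted field; inside the quotes, "" is an escaped quote
    # and [^"] also matches line breaks, so multi-line fields vanish too.
    stripped = re.sub(r'"(?:""|[^"])*"', "", text)
    # What is left contains only unquoted commas: fields = commas + 1.
    return [line.count(",") + 1 for line in stripped.split("\n")]


sample = ',"Hello, World!"\n"aaa","b""bb","ccc"\nzzz,yyy,\n"aaa","b \nbb","ccc","fish",""'
print(count_fields_regex(sample))  # [2, 3, 3, 5]
```
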
answered Jun 9, 2018 at 22:17
\$\endgroup\$
2
\$\begingroup\$

Java 10, 101 bytes

s->{for(var p:s.replaceAll("\"[^\"]*\"","x").split("\n"))System.out.println(p.split(",",-1).length);}

Try it online.

Explanation:

s->{                                        // Method with String parameter and no return-type
  for(var p:s.replaceAll("\"[^\"]*\"","x")  //  Replace all words within quotes with an "x"
            .split("\n"))                   //  Then split by new-line and loop over them:
    System.out.println(p.split(",",-1)      //   Split the item by commas
                        .length);}          //   And print the length of this array
answered Jun 11, 2018 at 7:13
\$\endgroup\$
1
\$\begingroup\$

Jelly, 17 bytes

=""ÄḂżṣ7ݤṣ0,ドル",Ẉ

Try it online!

-1 thanks to Jonathan Allan. duh duh duh...

answered Jun 9, 2018 at 22:12
\$\endgroup\$
0
1
\$\begingroup\$

Ruby -rcsv, 20 bytes

Reads from STDIN.

p CSV($<).map &:size

Attempt This Online!

answered Oct 3, 2024 at 18:58
\$\endgroup\$
1
\$\begingroup\$

Uiua, 14 bytes

≡/+¬∵≍□W⬚W°csv

Try it: Uiua pad

answered Oct 8, 2024 at 3:30
\$\endgroup\$
0
\$\begingroup\$

POSIX shell (ShellShoccar-jpn/Parsrs + usp-engineers-community/Open-usp-tukubai), 19 bytes

Dependencies are

parsrc.sh|count 1 1

Output format

For each line, the following line is output:

<row number> <number of columns>

How it works

parsrc.sh is a CSV parser that converts the text into a list of cells and their values. Open-usp-tukubai's count takes a range of field numbers as parameters and, reading space-separated values, counts the records for each key in that range. Here are some comments:

# raw CSV
parsrc.sh |
# 1: row number 2: column number 3: value where LFs are represented as "\n" while real backslashes as "\\"
count 1 1
# 1: row number 2: number of columns in the row

Example run on Termux

Because Termux is not completely POSIX-compatible, command -p getconf PATH fails, so I am dropping the boilerplate that sets the PATH environment variable so that the shell calls fully POSIX-compatible utilities first. Also, because I am too lazy to set up Python 2, I am using COMMANDS.SH/count instead.

Example run on Termux but had some modification

answered Nov 26, 2022 at 9:39
\$\endgroup\$
