Parse a csv file and create a dictionary of partial results

Question

I have a bunch of .csv files that I need to read and extract data from. Each file has this format:

A row of data I will ignore
State,County,City
WA,king,seattle
WA,pierce,tacoma

The column order is not consistent across files: in csv1 it might be State,County,City, while in csv2 it might be City,County,State. I am only interested in State and County: given a county, I want to find out which state it is in. (I am ignoring the fact that the same county name can exist in more than one state.) The way I am approaching this:

import csv

with open(‘file.csv’) as f:
    data = f.read()
# convert the data to an iterable of lines, skipping the first line
reader = csv.DictReader(data.splitlines(1)[1:])
lines = list(reader)
counties = {k: v for (k, v) in ((line[‘county’], line[‘State’]) for line in lines)}

Is there a better approach to this?

Answer

You're on the right track, using a with block to open the file and csv.DictReader() to parse it.

Your list handling is a bit clumsy, though. To skip a line, use next(f). Avoid building a list of the entire file's data when you can process the file line by line. The dict comprehension also has an unnecessary complication: there is no need to build (key, value) tuples in a generator only to unpack them again.

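import csv

with open('file.csv') as f:
    _ = next(f)  # skip the first line, which holds no useful data
    reader = csv.DictReader(f)  # the next line is read as the header row
    counties = {line['County']: line['State'] for line in reader}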
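The snippet above relies on csv.DictReader keying each row by header name, which is what makes it robust to the varying column order described in the question. A minimal sketch to check that, assuming Python 3; county_to_state is a hypothetical helper name, and io.StringIO stands in for two real files with different column orders:

```python
import csv
import io


def county_to_state(f):
    """Hypothetical helper: build a {county: state} dict from a file-like object.

    The first line is discarded (the row the question says to ignore);
    the second line is the header, so column order does not matter.
    """
    next(f)  # skip the leading row of unwanted data
    reader = csv.DictReader(f)
    return {row['County']: row['State'] for row in reader}


# Two in-memory "files" with the same data but different column orders.
csv1 = io.StringIO(
    "A row of data I will ignore\n"
    "State,County,City\n"
    "WA,king,seattle\n"
    "WA,pierce,tacoma\n"
)
csv2 = io.StringIO(
    "A row of data I will ignore\n"
    "City,County,State\n"
    "seattle,king,WA\n"
    "tacoma,pierce,WA\n"
)

print(county_to_state(csv1))  # {'king': 'WA', 'pierce': 'WA'}
print(county_to_state(csv2))  # {'king': 'WA', 'pierce': 'WA'}
```

Both calls produce the same dictionary because lookups go through the header names, not column positions.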

Your sample file had County as the header, whereas your code looked for line[‘county’]. I assume that the curly quotes are an artifact of copy-pasting, but you should pay attention to the capitalization.
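If header capitalization might also differ between files (the question does not say it does), one option is to normalize the header names before building the dictionary. A sketch, assuming Python 3; county_to_state_any_case is a hypothetical name. It relies on the fieldnames attribute of csv.DictReader, which reads the header row on first access and can then be reassigned:

```python
import csv
import io


def county_to_state_any_case(f):
    """Hypothetical variant: normalize header names so that 'County',
    'county' and ' COUNTY ' all become 'county'."""
    next(f)  # skip the leading row of unwanted data
    reader = csv.DictReader(f)
    # Reading .fieldnames consumes the header row; assigning a new list
    # changes the keys used for every data row that follows.
    reader.fieldnames = [name.strip().lower() for name in reader.fieldnames]
    return {row['county']: row['state'] for row in reader}


sample = io.StringIO(
    "A row of data I will ignore\n"
    "STATE,county,City\n"
    "WA,king,seattle\n"
)
print(county_to_state_any_case(sample))  # {'king': 'WA'}
```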

Comment from the asker

I am actually getting the data from an S3 bucket, but I didn't want to complicate my example. I get the key from the bucket and then call data = key.get_contents_as_string(), so I am not reading from a file; the contents of the key are the text of the csv file. I like the way you eliminated the list and cleaned up the dict comprehension. Is there a way to avoid data.splitlines(1)[1:] when I create the reader, since I already have the data in a string? (I still need to ignore the first row.)
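One possible approach to this follow-up (a sketch, not part of the accepted answer), assuming Python 3 and that data is already a str: io.StringIO wraps a string in a file-like object, so the next()/csv.DictReader pattern from the answer applies unchanged, with no splitlines() or slicing. If your S3 client returns bytes instead of a str, decode them first, for example with data.decode('utf-8') (assuming the file is UTF-8). The helper name county_to_state_from_string is hypothetical.

```python
import csv
import io


def county_to_state_from_string(data):
    """Hypothetical helper: the same logic as for a file, but for csv text
    already held in memory (e.g. the contents of an S3 object)."""
    # newline='' leaves line endings untranslated for the csv parser,
    # mirroring the csv module's advice for opening files.
    f = io.StringIO(data, newline='')
    next(f)  # skip the leading row of unwanted data
    reader = csv.DictReader(f)
    return {row['County']: row['State'] for row in reader}


data = "A row of data I will ignore\nState,County,City\nWA,king,seattle\nWA,pierce,tacoma\n"
print(county_to_state_from_string(data))  # {'king': 'WA', 'pierce': 'WA'}
```

Because a StringIO yields lines just like a file object, nothing else in the answer's code needs to change.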

Accepted answer by 200_success (145k reputation; 22 gold, 190 silver and 478 bronze badges) · 2014-11-26 02:55:55Z