I have a bunch of .csv files that I have to read and extract data from. Each .csv file has this format:
```
A row of data I will ignore
State,County,City
WA,king,seattle
WA,pierce,tacoma
```
The order of the columns is not consistent across files. For example, in csv1 the order can be State,County,City, while in csv2 it can be City,County,State. What I am interested in is the State and County: given a county, I want to find out which state it is in. I am ignoring the fact that the same county name can exist in multiple states. This is how I am approaching it:
```python
import csv

with open(‘file.csv’) as f:
    data = f.read()
# convert the data to an iterable, skipping the first line
reader = csv.DictReader(data.splitlines(1)[1:])
lines = list(reader)
counties = {k: v for (k, v) in ((line[‘county’], line[‘State’]) for line in lines)}
```
Is there a better approach to this?
1 Answer
You're on the right track, using a `with` block to open the file and `csv.DictReader()` to parse it.
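`DictReader` also takes care of the inconsistent column order: it treats the first line it sees as the header and keys every following row by those names, so `row['County']` finds the right field wherever that column sits. A minimal sketch; the sample lines and the helper name `county_state_pairs` are made up for illustration, and the junk first row is assumed to be skipped already:

```python
import csv

# Hypothetical samples: the same data with two different column orders.
order_a = ["State,County,City", "WA,king,seattle", "WA,pierce,tacoma"]
order_b = ["City,County,State", "seattle,king,WA", "tacoma,pierce,WA"]


def county_state_pairs(lines):
    """Return (county, state) pairs, looking columns up by header name."""
    # csv.DictReader accepts any iterable of strings, not only files.
    return [(row['County'], row['State']) for row in csv.DictReader(lines)]


print(county_state_pairs(order_a))  # [('king', 'WA'), ('pierce', 'WA')]
print(county_state_pairs(order_b))  # [('king', 'WA'), ('pierce', 'WA')]
```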
Your list handling is a bit clumsy, though. To skip a line, use `next(f)`. Avoid making a list of the entire file's data if you can process the file line by line. The dict comprehension also has an unnecessary complication: there is no need to build a generator of `(county, state)` tuples only to unpack them again.
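```python
import csv

# newline='' is what the Python 3 csv docs recommend for files passed to csv readers
with open('file.csv', newline='') as f:
    _ = next(f)                 # skip the first (ignored) row
    reader = csv.DictReader(f)  # the next row is used as the header
    counties = { line['County']: line['State'] for line in reader }
```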
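Since you have a bunch of files, the same pattern extends to all of them while still streaming one row at a time. A sketch under stated assumptions: every file starts with exactly one junk row followed by a header containing `County` and `State`; `load_counties` is a hypothetical helper name, and the demo writes two throwaway files only so the example runs on its own:

```python
import csv
import os
import tempfile


def load_counties(paths):
    """Build one county -> state dict from several CSV files.

    Assumes each file has one junk row, then a header row that includes
    'County' and 'State' (in any column order).
    """
    counties = {}
    for path in paths:
        with open(path, newline='') as f:
            next(f)  # skip the junk row
            for row in csv.DictReader(f):  # reads one row at a time
                counties[row['County']] = row['State']
    return counties


# Demo: two temporary files whose columns are in different orders.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        'csv1.csv': "junk\nState,County,City\nWA,king,seattle\n",
        'csv2.csv': "junk\nCity,County,State\ntacoma,pierce,WA\n",
    }
    paths = []
    for name, text in samples.items():
        path = os.path.join(tmp, name)
        with open(path, 'w', newline='') as out:
            out.write(text)
        paths.append(path)
    result = load_counties(paths)

print(result)  # {'king': 'WA', 'pierce': 'WA'}
```

If a county appears in more than one file, the last file read wins, which matches the question's decision to ignore duplicate county names.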
Your sample file had `County` as the header, whereas your code looked for `line[‘county’]`. I assume that the curly quotes are an artifact of copy-pasting, but you should pay attention to the capitalization.
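If header capitalization might also vary between files (an assumption; the sample shows only one spelling), one option is to normalize the header names before reading any rows. Python 3's `DictReader.fieldnames` reads the header on first access and can be reassigned. A minimal sketch; `county_to_state` and the sample data are hypothetical, and the junk first row is assumed to be skipped already:

```python
import csv
import io


def county_to_state(f):
    """Map county -> state, matching header names case-insensitively.

    Assumes the junk first row has already been skipped, so the next
    row read from f is the header.
    """
    reader = csv.DictReader(f)
    # Accessing .fieldnames consumes the header row; replace it with a
    # normalized copy so 'County', 'county' and ' COUNTY ' all match.
    reader.fieldnames = [name.strip().lower() for name in reader.fieldnames]
    return {row['county']: row['state'] for row in reader}


sample = io.StringIO("county,STATE,City\nking,WA,seattle\npierce,WA,tacoma\n")
print(county_to_state(sample))  # {'king': 'WA', 'pierce': 'WA'}
```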
I am really getting the data from an S3 bucket, but I didn't want to make the code more complicated in my example. I get the key from the bucket and then call `data = key.get_contents_as_string()`, so I am not really reading from a file; the contents of the key are the string representation of the CSV file. I like the way you eliminated the list and cleaned up the dict comprehension. Is there a way I can avoid doing `data.splitlines(1)[1:]` when I create the reader, since I already have the data in a string? (I need to ignore the first row.) (Comment by Mark, Nov 26, 2014 at 3:39)
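When the data is already a string, one way to avoid `data.splitlines(1)[1:]` is to wrap it in `io.StringIO`, which behaves like a file object, so the `next(f)` approach from the answer carries over unchanged. A minimal sketch, assuming Python 3; the sample string stands in for the S3 contents, and if `get_contents_as_string()` returns bytes in your setup, decode them first (UTF-8 here is an assumption about your files):

```python
import csv
import io

# Hypothetical stand-in for: data = key.get_contents_as_string()
data = "A row of data I will ignore\nState,County,City\nWA,king,seattle\nWA,pierce,tacoma\n"
if isinstance(data, bytes):
    data = data.decode('utf-8')  # assumed encoding

f = io.StringIO(data)  # file-like view of the string; no list of lines is built
next(f)                # skip the first row, exactly as with a real file
reader = csv.DictReader(f)
counties = {line['County']: line['State'] for line in reader}
print(counties)  # {'king': 'WA', 'pierce': 'WA'}
```

The original `data.splitlines(1)[1:]` also works, because the csv module accepts any iterable of strings; `io.StringIO` simply lets the file-based code be reused as-is.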