I'm wondering how I could build a .csv file with a proper structure. As an example, my data has the form:
(index, latitude, longitude, value)
- 0 - lat=-51.490000 lon=264.313000 value=7.270077
- 1 - lat=-51.490000 lon=264.504000 value=7.231014
- 2 - lat=-51.490000 lon=264.695000 value=21.199764
- 3 - lat=-51.490000 lon=264.886000 value=49.176327
- 4 - lat=-51.490000 lon=265.077000 value=91.160702
- 5 - lat=-51.490000 lon=265.268000 value=147.152889
- 6 - lat=-51.490000 lon=265.459000 value=217.152889
- 7 - lat=-51.490000 lon=265.650000 value=301.160702
- 8 - lat=-51.490000 lon=265.841000 value=399.176327
- 9 - lat=-51.490000 lon=266.032000 value=511.199764
- 10 - lat=-51.490000 lon=266.223000 value=637.231014
- 11 - lat=-51.490000 lon=266.414000 value=777.270077
- 12 - lat=-51.490000 lon=266.605000 value=931.316952
- 13 - lat=-51.490000 lon=266.796000 value=1099.371639
- 14 - lat=-51.490000 lon=266.987000 value=1281.434139
- 15 - lat=-51.490000 lon=267.178000 value=1477.504452
- 16 - lat=-51.490000 lon=267.369000 value=1687.582577
- 17 - lat=-51.490000 lon=267.560000 value=1911.668514
- 18 - lat=-51.490000 lon=267.751000 value=2149.762264
- 19 - lat=-51.490000 lon=267.942000 value=2401.863827
- 20 - lat=-51.490000 lon=268.133000 value=2667.973202
- 21 - lat=-51.490000 lon=268.324000 value=2948.090389
I would like to save this data in a .csv file with the format:
         | longitude |
latitude | value     |
That is, all the values with the same latitude would be on the same line, and all the values with the same longitude would be in the same column. I know how to write a .csv file in Python; I'm wondering how I could perform this transformation properly.
Thank you in advance.
- You will first have to loop over the data to collect all longitudes. Those will be your columns. Then I would probably create a dictionary for each latitude which contains longitude/value pairs. Then you can write a line for each latitude. You should take a look at the csv.DictWriter class. – rje, Sep 16, 2014 at 15:33
- I'd break up the lines with a regex and then use nested dicts to record the values: mydict[latitude][longitude] = value. I'd also make a set of longitudes. The size of this set is the number of columns; make it a list and sort it to get an indexer into the nested list. Sort the latitude keys and off you go. – tdelaney, Sep 16, 2014 at 15:36
- What happens if there are more values per lat/lon pair? What if there are two latitudes or longitudes which are almost the same but not exactly? – Krab, Sep 16, 2014 at 16:17
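The approach suggested in the comments above could be sketched roughly like this. It assumes the lines have already been parsed into (latitude, longitude, value) tuples; the `records` list and the `pivot_to_csv` name are made up for illustration. Note that if a lat/lon pair occurs more than once, the last value simply wins, and nearly equal coordinates end up in separate rows or columns.

```python
import csv
import io

# Hypothetical sample records as (latitude, longitude, value) tuples.
records = [
    (-51.49, 264.313, 7.270077),
    (-51.49, 264.504, 7.231014),
    (-52.49, 264.504, 2948.090389),
]


def pivot_to_csv(records, out):
    """Write one row per latitude and one column per longitude."""
    longitudes = sorted({lon for _, lon, _ in records})
    rows = {}  # latitude -> {column name: value}
    for lat, lon, value in records:
        rows.setdefault(lat, {"latitude": lat})[lon] = value
    # restval fills the cells for which no value was recorded
    writer = csv.DictWriter(out, fieldnames=["latitude"] + longitudes, restval="")
    writer.writeheader()
    for lat in sorted(rows):
        writer.writerow(rows[lat])


buffer = io.StringIO()
pivot_to_csv(records, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open('out.csv', 'w', newline='')` in place of the `StringIO` buffer.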
4 Answers
I wrote a little program for you :) see below.
I'm assuming for now that your data is stored as a list of dicts, but if it is a list of lists the code shouldn't be too hard to fix.
#!/usr/bin/env python
import csv

data = [
    dict(lat=1, lon=1, val=10),
    dict(lat=1, lon=2, val=20),
    dict(lat=2, lon=1, val=30),
    dict(lat=2, lon=2, val=40),
    dict(lat=3, lon=1, val=50),
    dict(lat=3, lon=2, val=60),
]

# get a unique, sorted list of all longitudes
headers = sorted({d['lon'] for d in data})

# make a dict of latitudes
data_as_dict = {}
for item in data:
    # default value: a list of empty strings
    lst = data_as_dict.setdefault(item['lat'], [''] * len(headers))
    # get the longitude for this item
    lon = item['lon']
    # where in the line should it be?
    idx = headers.index(lon)
    # save value in the list
    lst[idx] = item['val']

# in the actual file, we start with an extra header for the latitude
headers.insert(0, 'latitude')

# newline='' is what the csv module expects on Python 3
with open('latitude.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(headers)
    # sorted() is used because dict.keys() has no .sort() on Python 3
    for latitude in sorted(data_as_dict):
        # a line starts with the latitude, followed by list of values
        l = data_as_dict[latitude]
        l.insert(0, latitude)
        writer.writerow(l)
output:
latitude 1 2
1 10 20
2 30 40
3 50 60
Granted, it's not the prettiest code, but I hope you get the idea
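If the data is a list of lists rather than a list of dicts, one possible adaptation is sketched below. The `rows` sample and the helper names `rows_to_dicts` and `build_table` are assumptions for illustration, not part of the program above. The sketch also swaps `headers.index(lon)` for a precomputed dict lookup, which avoids scanning the header list for every item.

```python
# Hypothetical input: each row is [lat, lon, val] instead of a dict.
rows = [
    [1, 1, 10],
    [1, 2, 20],
    [2, 1, 30],
]


def rows_to_dicts(rows):
    """Convert [lat, lon, val] lists into dict records."""
    return [dict(lat=lat, lon=lon, val=val) for lat, lon, val in rows]


def build_table(data):
    """Return (sorted longitudes, {latitude: list of values per longitude})."""
    headers = sorted({d['lon'] for d in data})
    # map each longitude to its column position once, up front
    column_of = {lon: i for i, lon in enumerate(headers)}
    table = {}
    for d in data:
        line = table.setdefault(d['lat'], [''] * len(headers))
        line[column_of[d['lon']]] = d['val']
    return headers, table


headers, table = build_table(rows_to_dicts(rows))
print(headers)
print(table)
```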
I'm assuming you have this data in a text file. Let's use regular expressions to parse the data (though string splitting looks like it could work if your format stays the same).
import re

data = list()
with open('path/to/data/file', 'r') as infile:
    for line in infile:
        matches = re.match(r".*(?<=lat=)(?P<lat>(?:\+|-)?[\d.]+).*(?<=value=)(?P<longvalue>(?:\+|-)?[\d.]+)", line)
        if matches:  # skip blank or malformed lines
            data.append((matches.group('lat'), matches.group('longvalue')))
To unroll that nasty regex:
pat = re.compile(r"""
    .*                  # Match anything any number of times
    (?<=lat=)           # assert that the last 4 characters are "lat="
    (?P<lat>            # begin named capturing group "lat"
        (?:\+|-)?       # allow one or none of either + or -
        [\d.]+          # and one or more digits or decimal points
    )                   # end named capturing group "lat"
    .*                  # Another wildcard
    (?<=value=)         # assert that the last 6 characters are "value="
    (?P<longvalue>      # begin named capturing group "longvalue"
        (?:\+|-)?       # allow one or none of either + or -
        [\d.]+          # and one or more digits or decimal points
    )                   # end named capturing group "longvalue"
    """, re.X)

# and a terser way of writing the code, since we've compiled the pattern above:
with open('path/to/data/file', 'r') as infile:
    data = [(matches.group('lat'), matches.group('longvalue'))
            for line in infile
            for matches in (pat.match(line),)]
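For comparison, the string-splitting route mentioned at the top of this answer might look roughly like the following. It assumes every field is a whitespace-separated `name=value` token, as in the sample data; `parse_line` is a made-up helper name, and unlike the regex above it also returns the longitude.

```python
def parse_line(line):
    """Return (lat, lon, value) as strings, or None if the line does not fit."""
    # keep only the "name=value" tokens, ignoring the "- 0 -" prefix
    fields = dict(
        token.split("=", 1) for token in line.split() if "=" in token
    )
    try:
        return fields["lat"], fields["lon"], fields["value"]
    except KeyError:
        return None


sample = "- 0 - lat=-51.490000 lon=264.313000 value=7.270077"
print(parse_line(sample))
```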
Given your input data, I came up with the following:
from __future__ import print_function


def decode(line):
    line = line.replace('- ', ' ')
    fields = line.split()
    index = fields[0]
    data = dict([_.split('=') for _ in fields[1:]])
    return index, data


def transform(filename):
    transformed = {}
    columns = set()
    with open(filename) as infile:
        for line in infile:
            index, data = decode(line.strip())
            element = transformed.setdefault(data['lat'], {})
            element[data['lon']] = data['value']
            columns.add(data['lon'])
    return columns, transformed


def main(filename):
    columns, transformed = transform(filename)
    columns = sorted(columns)
    print(',', ','.join(columns))
    for lat, data in transformed.items():
        print(lat, ',', ', '.join([data.get(_, 'NULL') for _ in columns]))


if __name__ == '__main__':
    main('so.txt')
In case the data contains more than one latitude, I added one extra line to the example, so my input data (so.txt) contained this:
- 0 - lat=-51.490000 lon=264.313000 value=7.270077
- 1 - lat=-51.490000 lon=264.504000 value=7.231014
- 2 - lat=-51.490000 lon=264.695000 value=21.199764
- 3 - lat=-51.490000 lon=264.886000 value=49.176327
- 4 - lat=-51.490000 lon=265.077000 value=91.160702
- 5 - lat=-51.490000 lon=265.268000 value=147.152889
- 6 - lat=-51.490000 lon=265.459000 value=217.152889
- 7 - lat=-51.490000 lon=265.650000 value=301.160702
- 8 - lat=-51.490000 lon=265.841000 value=399.176327
- 9 - lat=-51.490000 lon=266.032000 value=511.199764
- 10 - lat=-51.490000 lon=266.223000 value=637.231014
- 11 - lat=-51.490000 lon=266.414000 value=777.270077
- 12 - lat=-51.490000 lon=266.605000 value=931.316952
- 13 - lat=-51.490000 lon=266.796000 value=1099.371639
- 14 - lat=-51.490000 lon=266.987000 value=1281.434139
- 15 - lat=-51.490000 lon=267.178000 value=1477.504452
- 16 - lat=-51.490000 lon=267.369000 value=1687.582577
- 17 - lat=-51.490000 lon=267.560000 value=1911.668514
- 18 - lat=-51.490000 lon=267.751000 value=2149.762264
- 19 - lat=-51.490000 lon=267.942000 value=2401.863827
- 20 - lat=-51.490000 lon=268.133000 value=2667.973202
- 21 - lat=-51.490000 lon=268.324000 value=2948.090389
- 22 - lat=-52.490000 lon=268.324000 value=2948.090389
(note the last line)
With that input file, the above program creates the following output:
, 264.313000,264.504000,264.695000,264.886000,265.077000,265.268000,265.459000,265.650000,265.841000,266.032000,266.223000,266.414000,266.605000,266.796000,266.987000,267.178000,267.369000,267.560000,267.751000,267.942000,268.133000,268.324000
-51.490000 , 7.270077, 7.231014, 21.199764, 49.176327, 91.160702, 147.152889, 217.152889, 301.160702, 399.176327, 511.199764, 637.231014, 777.270077, 931.316952, 1099.371639, 1281.434139, 1477.504452, 1687.582577, 1911.668514, 2149.762264, 2401.863827, 2667.973202, 2948.090389
-52.490000 , NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 2948.090389
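If the stray spaces around the commas are a problem, the printing step could be handed to the csv module instead. This is only a sketch: `write_table` is a hypothetical replacement for the printing part of `main`, and the small `columns` / `transformed` sample stands in for what `transform` would return.

```python
import csv
import io


def write_table(columns, transformed, out):
    """Write the pivoted table with csv.writer instead of print()."""
    writer = csv.writer(out)
    columns = sorted(columns)
    # the first header cell is empty, it sits above the latitude column
    writer.writerow([''] + columns)
    for lat, data in transformed.items():
        writer.writerow([lat] + [data.get(lon, 'NULL') for lon in columns])


# Hypothetical stand-in for the result of transform('so.txt')
columns = {'264.313000', '268.324000'}
transformed = {
    '-51.490000': {'264.313000': '7.270077', '268.324000': '2948.090389'},
    '-52.490000': {'268.324000': '2948.090389'},
}

buffer = io.StringIO()
write_table(columns, transformed, buffer)
print(buffer.getvalue())
```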
You can pull lat/lon/value from each line using a regex. You'll want to look up lat and lon later, so use a nested dict of the form d[lat][lon]=value to track it all. Add a set to keep track of the unique longitudes you see, and it's pretty straightforward to generate the csv.
I sorted it in the example, but you may not care about that.
import re
import collections

data = """- 0 - lat=-51.490000 lon=264.313000 value=7.270077
- 1 - lat=-51.490000 lon=264.504000 value=7.231014
- 2 - lat=-51.490000 lon=264.695000 value=21.199764
- 3 - lat=-51.490000 lon=264.886000 value=49.176327
- 4 - lat=-51.490000 lon=265.077000 value=91.160702"""

regex = re.compile(r'- \d+ - lat=([\+\-]?[\d\.]+) lon=([\+\-]?[\d\.]+) value=([\+\-]?[\d\.]+)')

# lat/lon index will hold lats[latitude][longitude] = value
lats = collections.defaultdict(dict)
# longitude columns
lonset = set()

for line in data.split('\n'):
    match = regex.match(line)
    if match:
        lat, lon, val = match.groups()
        lats[lat][lon] = val
        lonset.add(lon)

latkeys = sorted(lats.keys())
lonkeys = sorted(list(lonset))

header = ['latitude'] + lonkeys
print(header)
for lat in latkeys:
    lons = lats[lat]
    row = [lat] + [lons.get(lon, '') for lon in lonkeys]
    print(row)
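One thing to watch with the sorting: the regex yields strings, so `sorted()` orders the keys as text, not as numbers. A small sketch with made-up longitude strings shows the difference, and how `key=float` could be used if numeric order matters:

```python
# Hypothetical longitude keys, kept as strings as in the example above.
lonkeys = ['264.313000', '99.500000', '100.250000']

by_text = sorted(lonkeys)               # compares character by character
by_number = sorted(lonkeys, key=float)  # compares the numeric values

print(by_text)
print(by_number)
```

The keys stay strings either way, so the lookups into the nested dict keep working unchanged.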