I have a csv with "lat,lon" in a single column. I am trying to convert it with ogr2ogr to another format, and cannot figure out how to format the VRT file to accept this format.
I cannot do anything other than change the VRT at the moment due to other limitations. I would love to rewrite the csv, but this is the format that I have to deal with for now.
example for formating:
name, date, value, latlon a, 20130101, 0, "100.8,40円.2" b, 20130101, 1, "101.2,40円.4" c, 20130102, 1, "100.2,41円.2" d, 20130103, 2, "102.0,40円.7" e, 20130104, 3, "100.6,41円.0"
The data is not a set number of characters, so splitting at a character number does not work right.
a, 20130101, 0, "10.844,円-40.2745" b, 20130101, 1, "101.2010,40円.411111" c, 20130102, 1, "-100.27867,1円.24522" d, 20130103, 2, "-1.04577,16円.745637" e, 20130104, 3, "100.617417,円-1.0757"
Accepted answer actually does not require the intermediate CSV, it can be done straight from original CSV. The INSTR is what I managed to get to work.
<OGRVRTDataSource>
<OGRVRTLayer name="result">
<SrcDataSource>test2.csv</SrcDataSource>
<SrcSQL dialect="sqlite">SELECT name, date, value, SUBSTR(latlon,2,INSTR(latlon,",円")-1) AS lat, SUBSTR(latlon,INSTR(latlon,",円")+2,-1) AS lon FROM test2</SrcSQL>
</OGRVRTLayer>
</OGRVRTDataSource>
1 Answer 1
You can solve with two chained VRT files and a bit of OGR SQL.
The first VRT
(e.g. remapped_csv.vrt
) is:
<OGRVRTDataSource>
<OGRVRTLayer name="remapped_csv">
<SrcDataSource>test.csv</SrcDataSource>
<SrcSQL>SELECT *, SUBSTR(latlon,2,5) AS lat, SUBSTR(latlon,9,12) AS lon FROM test</SrcSQL>
</OGRVRTLayer>
</OGRVRTDataSource>
The second VRT
(e.g. result.vrt
) is:
<OGRVRTDataSource>
<OGRVRTLayer name="result">
<SrcDataSource>remapped_csv.vrt</SrcDataSource>
<GeometryType>wkbPoint</GeometryType>
<LayerSRS>WGS84</LayerSRS>
<GeometryField encoding="PointFromColumns" x="lon" y="lat"/>
<SrcSQL>SELECT name, date, value, lat, lon FROM remapped_csv</SrcSQL>
</OGRVRTLayer>
</OGRVRTDataSource>
Here's the final result:
>ogrinfo result.vrt result
INFO: Open of `result.vrt'
using driver `VRT' successful.
Layer name: result
Geometry: Point
Feature Count: 5
Extent: (40.200000, 100.200000) - (41.200000, 102.000000)
Layer SRS WKT:
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
TOWGS84[0,0,0,0,0,0,0],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]],
UNIT["degree",0.0174532925199433,
AUTHORITY["EPSG","9108"]],
AUTHORITY["EPSG","4326"]]
name: String (0.0)
date: String (0.0)
value: String (0.0)
lat: String (0.0)
lon: String (0.0)
OGRFeature(result):1
name (String) = a
date (String) = 20130101
value (String) = 0
lat (String) = 100.8
lon (String) = 40.2
POINT (40.2 100.8)
OGRFeature(result):2
name (String) = b
date (String) = 20130101
value (String) = 1
lat (String) = 101.2
lon (String) = 40.4
POINT (40.4 101.2)
OGRFeature(result):3
name (String) = c
date (String) = 20130102
value (String) = 1
lat (String) = 100.2
lon (String) = 41.2
POINT (41.2 100.2)
OGRFeature(result):4
name (String) = d
date (String) = 20130103
value (String) = 2
lat (String) = 102.0
lon (String) = 40.7
POINT (40.7 102.0)
OGRFeature(result):5
name (String) = e
date (String) = 20130104
value (String) = 3
lat (String) = 100.6
lon (String) = 41.0
POINT (41.0 100.6)
Improved version of remapped_csv.vrt
using the dialect SQLite (SQLite >= 3.7.15, because the instr
function was added to the SQLite core functions since this version):
<OGRVRTDataSource>
<OGRVRTLayer name="remapped_csv">
<SrcDataSource>test2.csv</SrcDataSource>
<SrcSQL dialect="sqlite">SELECT *, SUBSTR(latlon,2,INSTR(latlon,",円")-1) AS lat, SUBSTR(latlon,INSTR(latlon,",円")+2,-1) AS lon FROM test2</SrcSQL>
</OGRVRTLayer>
</OGRVRTDataSource>
Not tested (because I have GDAL built with SQLite 3.7.9 at the moment), but it should work.
This is a working version of remapped_csv.vrt
using the trimming approach:
<OGRVRTDataSource>
<OGRVRTLayer name="remapped_csv">
<SrcDataSource>test2.csv</SrcDataSource>
<SrcSQL dialect="sqlite">SELECT *, rtrim(rtrim(latlon,'"+-.0123456789'),',円') AS lat, ltrim(ltrim(latlon,' "+-.0123456789'),',円') AS lon FROM test2</SrcSQL>
</OGRVRTLayer>
</OGRVRTDataSource>
-
Looks great, but what if I do not know the exact length of the lat/lon? So one goes to 3 decimal places and another goes to 5?eseglem– eseglem2013年12月05日 20:51:50 +00:00Commented Dec 5, 2013 at 20:51
-
Also there could be a negative or anywhere up to three digits before the decimal. "-1.12341234,80円.1234" vs "134.1186,円-6.89467"eseglem– eseglem2013年12月05日 21:12:50 +00:00Commented Dec 5, 2013 at 21:12
-
1After your updates, it seems that the only way to solve your problem is to write a Python script with a robust regex management of latlon string. Consider also this question: gis.stackexchange.com/questions/77897/… Good luck!Antonio Falciano– Antonio Falciano2013年12月05日 21:32:25 +00:00Commented Dec 5, 2013 at 21:32
-
1And what happens if the string doesn't contain valid coordinates? ;) Although, I've just updated the answer considering your updates. It should work in theory, but I don't have the right configuration at the moment in order to test it.Antonio Falciano– Antonio Falciano2013年12月06日 09:54:42 +00:00Commented Dec 6, 2013 at 9:54
-
1I found another solution without using regex in the last update. Hope this helps.Antonio Falciano– Antonio Falciano2013年12月06日 11:01:18 +00:00Commented Dec 6, 2013 at 11:01