I tried to read a GeoJSON file with Pandas, but I got a ValueError message:
'ValueError: Expected object or value'
Here's the approach I used:
import pandas as pd
geojsonPath = r"Z:\dems\address.geojson"
pd_json = pd.io.json.read_json(geojsonPath,lines=True)
pd_json.head()
Here is an extract from the file:
{
"type": "FeatureCollection",
"name": "cameron-addresses-county",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features": [
{ "type": "Feature", "properties": { "X": -78.1422444, "Y": 41.3286117, "hash": "93dd7b7e3ee3e8af", "number": "501", "street": "CASTLE GARDEN RD", "unit": null, "city": null, "district": null, "region": null, "postcode": null, "id": 7579 }, "geometry": { "type": "Point", "coordinates": [ -78.1422444, 41.3286117 ] } },
{ "type": "Feature", "properties": { "X": -78.143584, "Y": 41.3284045, "hash": "853eb0c5f6e70fe3", "number": "64", "street": "BELDIN DR", "unit": null, "city": null, "district": null, "region": null, "postcode": null, "id": 4502 }, "geometry": { "type": "Point", "coordinates": [ -78.143584, 41.3284045 ] } },
{ "type": "Feature", "properties": { "X": -78.1711061, "Y": 41.3282128, "hash": "99a13ba635404d80", "number": "9760", "street": "MIX RUN RD", "unit": null, "city": null, "district": null, "region": null, "postcode": null, "id": 8448 }, "geometry": { "type": "Point", "coordinates": [ -78.1711061, 41.3282128 ] } },
{ "type": "Feature", "properties": { "X": -78.1429278, "Y": 41.3282883, "hash": "70319cf9e435b858", "number": null, "street": null, "unit": null, "city": null, "district": null, "region": null, "postcode": null, "id": null }, "geometry": { "type": "Point", "coordinates": [ -78.1429278, 41.3282883 ] } },
{ "type": "Feature", "properties": { "X": -78.1427173, "Y": 41.3282733, "hash": "759f051e7a587eb2", "number": "465", "street": "CASTLE GARDEN RD", "unit": null, "city": null, "district": null, "region": null, "postcode": null, "id": 6447 }, "geometry": { "type": "Point", "coordinates": [ -78.1427173, 41.3282733 ] } },
{ "type": "Feature", "properties": { "X": -78.1433463, "Y": 41.3282308, "hash": "9fbb571fc16a6cb2", "number": "61", "street": "BELDIN DR", "unit": null, "city": null, "district": null, "region": null, "postcode": null, "id": 4466 }, "geometry": { "type": "Point", "coordinates": [ -78.1433463, 41.3282308 ] } },
{ "type": "Feature", "properties": { "X": -78.1432403, "Y": 41.3282179, "hash": "8f837d813626f1e1", "number": null, "street": null, "unit": null, "city": null, "district": null, "region": null, "postcode": null, "id": null }, "geometry": { "type": "Point", "coordinates": [ -78.1432403, 41.3282179 ] } },
{ "type": "Feature", "properties": { "X": -78.1715165, "Y": 41.3280965, "hash": "5004ba87bd6e668b", "number": "9736", "street": "MIX RUN RD", "unit": null, "city": null, "district": null, "region": null, "postcode": null, "id": 7434 }, "geometry": { "type": "Point", "coordinates": [ -78.1715165, 41.3280965 ] } }
- You have a special library to do that, its name is geopandas. Do you work with anaconda? (Helios, Jun 14, 2022 at 1:07)
- Please post the full error and the full stack trace. (Son of a Beach, Jun 14, 2022 at 2:40)
- Please, do not forget about "What should I do when someone answers my question?" (Taras ♦, Jun 21, 2022 at 5:07)
1 Answer
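There are several things to keep in mind:
- Do not forget to close the GeoJSON with ]} (the extract above ends without it).
- There is no need to call read_json() via pd.io.json.read_json; simply use pd.read_json, even though the function lives under pandas/pandas/io/json/.
- The "ValueError: Expected object or value" error means that the input handed to read_json could not be parsed as JSON: your geojsonPath variable is of the right type (a string), but what pandas ends up reading is not valid for the parser. One likely contributor is lines=True, which makes read_json expect one complete JSON object per line, whereas a FeatureCollection is a single object spread over many lines.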
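To make these points concrete, here is a minimal sketch that uses only the standard-library json module and does not call pandas. The shortened FeatureCollection and the helper parses_as_json are made up for illustration. It shows that a multi-line GeoJSON document is valid as a whole, becomes invalid when the closing ]} is missing, and is not valid when each line is treated as its own JSON document.

```python
import json

# Hypothetical, shortened FeatureCollection laid out over several lines,
# similar in shape to the extract in the question.
geojson_text = """{
"type": "FeatureCollection",
"features": [
{ "type": "Feature", "properties": { "id": 7579 }, "geometry": { "type": "Point", "coordinates": [ -78.1422444, 41.3286117 ] } },
{ "type": "Feature", "properties": { "id": 4502 }, "geometry": { "type": "Point", "coordinates": [ -78.143584, 41.3284045 ] } }
]}"""


def parses_as_json(text):
    """Return True if `text` is one complete, valid JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The complete document parses.
whole_ok = parses_as_json(geojson_text)

# Dropping the closing "]}" leaves an unterminated document.
truncated_ok = parses_as_json(geojson_text.rsplit("]}", 1)[0])

# Treating every line as its own JSON document fails for this layout.
all_lines_ok = all(parses_as_json(line) for line in geojson_text.splitlines())

print(whole_ok, truncated_ok, all_lines_ok)  # True False False
```

If the same checks fail on your real file, fixing the file (or dropping lines=True) is worth trying before anything else.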
So, to get everything working, you can do one of the following:
1. As @SalimRodríguez commented, read your GeoJSON with GeoPandas.
Output data format: GeoDataFrame
import geopandas as gpd

absolute_path_to_file = 'C:/Documents/Python Scripts/address.geojson'
# read_file parses both the properties and the geometry of every feature
addresses = gpd.read_file(absolute_path_to_file)
print(addresses)

           X          Y  ...      id                    geometry
0 -78.142244  41.328612  ...  7579.0  POINT (-78.14224 41.32861)
1 -78.143584  41.328404  ...  4502.0  POINT (-78.14358 41.32840)
2 -78.171106  41.328213  ...  8448.0  POINT (-78.17111 41.32821)
3 -78.142928  41.328288  ...     NaN  POINT (-78.14293 41.32829)
4 -78.142717  41.328273  ...  6447.0  POINT (-78.14272 41.32827)
5 -78.143346  41.328231  ...  4466.0  POINT (-78.14335 41.32823)
6 -78.143240  41.328218  ...     NaN  POINT (-78.14324 41.32822)
7 -78.171516  41.328097  ...  7434.0  POINT (-78.17152 41.32810)
2. If the geometry is not important, you can skip it by parsing your GeoJSON as ordinary JSON.
Output data format: DataFrame
import json
import pandas as pd

absolute_path_to_file = 'C:/Documents/Python Scripts/address.geojson'
with open(absolute_path_to_file) as f:
    data = json.load(f)

# Keep only the attribute dictionary of each feature
raw_data = [feature['properties'] for feature in data['features']]
addresses = pd.DataFrame(raw_data)
print(addresses)

           X          Y              hash  ... region postcode      id
0 -78.142244  41.328612  93dd7b7e3ee3e8af  ...   None     None  7579.0
1 -78.143584  41.328404  853eb0c5f6e70fe3  ...   None     None  4502.0
2 -78.171106  41.328213  99a13ba635404d80  ...   None     None  8448.0
3 -78.142928  41.328288  70319cf9e435b858  ...   None     None     NaN
4 -78.142717  41.328273  759f051e7a587eb2  ...   None     None  6447.0
5 -78.143346  41.328231  9fbb571fc16a6cb2  ...   None     None  4466.0
6 -78.143240  41.328218  8f837d813626f1e1  ...   None     None     NaN
7 -78.171516  41.328097  5004ba87bd6e668b  ...   None     None  7434.0
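The list comprehension above assumes that every feature carries a properties object. The GeoJSON specification (RFC 7946) also allows "properties": null, so a slightly more defensive variant may be useful for other files. The sketch below is only an illustration: the two-feature sample is made up, and it stops at the list of rows, which could then be passed to pd.DataFrame as before.

```python
import json

# Made-up sample: the second feature has "properties": null,
# which RFC 7946 permits.
geojson_text = """{
"type": "FeatureCollection",
"features": [
{ "type": "Feature", "properties": { "number": "501", "id": 7579 }, "geometry": { "type": "Point", "coordinates": [ -78.1422444, 41.3286117 ] } },
{ "type": "Feature", "properties": null, "geometry": { "type": "Point", "coordinates": [ -78.1429278, 41.3282883 ] } }
]}"""

data = json.loads(geojson_text)

# `or {}` swaps a null (None) properties member for an empty dict,
# so every row is a mapping.
rows = [feature.get("properties") or {} for feature in data["features"]]

print(rows)
```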
3. If the geometry still matters, parse your GeoJSON as ordinary JSON in a slightly different manner.
Output data format: DataFrame
import json
import pandas as pd
from shapely.geometry import Point

absolute_path_to_file = 'C:/Documents/Python Scripts/address.geojson'
with open(absolute_path_to_file) as f:
    data = json.load(f)

# Merge each feature's properties with a shapely Point built from its coordinates
raw_data = [feature['properties'] | {'geometry': Point(feature['geometry']['coordinates'])} for feature in data['features']]
addresses = pd.DataFrame(raw_data)
print(addresses)

           X          Y  ...      id                        geometry
0 -78.142244  41.328612  ...  7579.0  POINT (-78.1422444 41.3286117)
1 -78.143584  41.328404  ...  4502.0   POINT (-78.143584 41.3284045)
2 -78.171106  41.328213  ...  8448.0  POINT (-78.1711061 41.3282128)
3 -78.142928  41.328288  ...     NaN  POINT (-78.1429278 41.3282883)
4 -78.142717  41.328273  ...  6447.0  POINT (-78.1427173 41.3282733)
5 -78.143346  41.328231  ...  4466.0  POINT (-78.1433463 41.3282308)
6 -78.143240  41.328218  ...     NaN  POINT (-78.1432403 41.3282179)
7 -78.171516  41.328097  ...  7434.0  POINT (-78.1715165 41.3280965)
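Note that the | operator used to merge the two dictionaries was added in Python 3.9. On older interpreters, dictionary unpacking with {**a, ...} gives the same result. The sketch below shows that variant. To stay dependency-free it stores plain longitude and latitude floats instead of a shapely Point; the column names longitude and latitude and the one-feature sample are assumptions made for illustration.

```python
import json

# Made-up one-feature sample in the same shape as the question's file.
geojson_text = """{
"type": "FeatureCollection",
"features": [
{ "type": "Feature", "properties": { "number": "501", "id": 7579 }, "geometry": { "type": "Point", "coordinates": [ -78.1422444, 41.3286117 ] } }
]}"""

data = json.loads(geojson_text)

rows = []
for feature in data["features"]:
    # GeoJSON positions are ordered longitude first, then latitude.
    lon, lat = feature["geometry"]["coordinates"][:2]
    # {**a, ...} builds a new dict and leaves feature["properties"] untouched.
    rows.append({**feature["properties"], "longitude": lon, "latitude": lat})

print(rows[0])
```

The resulting rows can be passed to pd.DataFrame in the same way as raw_data above.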
If you still need a GeoDataFrame as the final output, you can build one from the resulting DataFrame (this requires import geopandas as gpd).
For option (2):
gdf = gpd.GeoDataFrame(addresses, geometry=gpd.points_from_xy(addresses["X"], addresses["Y"]))
Or for option (3):
gdf = gpd.GeoDataFrame(addresses, geometry=addresses["geometry"])
- This is a high-quality answer. I learned new stuff, thank you. (aldo_tapia, Jun 14, 2022 at 12:36)
- How do I split the 'geometry' column into long and lat? (Edudzi, Sep 10, 2022 at 23:23)