I have loaded a .csv file into a pandas DataFrame and want to create a column named 'geometry' made up of Shapely points built from the lat and lon values in the data. The data is shown below:
The loaded .csv data is made up of lat, lon, timestamp and userid.
I would therefore like to create a Shapely Point for each row from the 'lon' and 'lat' columns, either by zipping the lon and lat columns and building the points in a for-loop over the zipped object, or by using the apply method to call the Shapely Point constructor on each row.
This is what I tried:
import pandas as pd
from shapely.geometry import Point
fp = 'C:/Users/pku/Desktop/data/lat_lon.csv'
data = pd.read_csv(fp)
data.head()
dataframe = pd.DataFrame()
datafram['geometry'] = [Point(xy) for xy in zip(data.lon, data.lat)]
print(data['geometry'].head())
I had a bunch of errors...the last one being.
1 Answer
When you read a CSV file with pandas, the result is already a pandas DataFrame, so why create a new one with dataframe = pd.DataFrame()?
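To see why the code in the question fails, here is a minimal sketch. It assumes a small in-memory DataFrame as a stand-in for the CSV (the two rows are copied from the sample output in this answer) and uses the corrected spelling `dataframe`; with the original spelling `datafram`, Python stops earlier with a NameError.

```python
import pandas as pd
from shapely.geometry import Point

# Stand-in for the CSV contents (assumption: two rows taken from this answer's sample).
data = pd.DataFrame({"lat": [41.389474, 41.383093], "lon": [2.156421, 2.181116]})

# The question creates a second, empty DataFrame and writes the points there...
dataframe = pd.DataFrame()
dataframe["geometry"] = [Point(xy) for xy in zip(data.lon, data.lat)]

# ...so `data` never receives the column, and data['geometry'] would raise a KeyError.
print("geometry" in dataframe.columns)  # True
print("geometry" in data.columns)       # False
```

Assigning the column directly on `data`, as done further down, avoids the problem.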
data = pd.read_csv("lat_long.csv")
list(data.columns)
['lat', 'lon']
type(data)
<class 'pandas.core.frame.DataFrame'>
data.head()
lat lon
0 41.389474 2.156421
1 41.383093 2.181116
2 41.373258 2.159358
3 41.385252 2.168779
4 41.390692 2.148911
Now you can use a list comprehension:
data['geometry'] = [Point(xy) for xy in zip(data.lon, data.lat)]
data.head()
lat lon geometry
0 41.389474 2.156421 POINT (2.156421 41.389474)
1 41.383093 2.181116 POINT (2.181116 41.383093)
2 41.373258 2.159358 POINT (2.159358 41.373258)
3 41.385252 2.168779 POINT (2.168779 41.385252)
4 41.390692 2.148911 POINT (2.148911 41.390692)
Or you can use the apply method:
data['geometry2'] = data.apply(lambda row: Point(row['lon'], row['lat']), axis=1)
data.head()
lat lon geometry geometry2
0 41.389474 2.156421 POINT (2.156421 41.389474) POINT (2.156421 41.389474)
1 41.383093 2.181116 POINT (2.181116 41.383093) POINT (2.181116 41.383093)
2 41.373258 2.159358 POINT (2.159358 41.373258) POINT (2.159358 41.373258)
3 41.385252 2.168779 POINT (2.168779 41.385252) POINT (2.168779 41.385252)
4 41.390692 2.148911 POINT (2.148911 41.390692) POINT (2.148911 41.390692)
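If you want to try both approaches without the CSV file, the following sketch may help. It assumes a hypothetical in-memory CSV string (`csv_text`) holding the first three sample rows, and checks that the two columns contain equal points.

```python
import io

import pandas as pd
from shapely.geometry import Point

# Hypothetical in-memory stand-in for lat_long.csv (first three sample rows).
csv_text = "lat,lon\n41.389474,2.156421\n41.383093,2.181116\n41.373258,2.159358\n"
data = pd.read_csv(io.StringIO(csv_text))

# Option 1: list comprehension over the zipped columns (x = lon, y = lat).
data["geometry"] = [Point(xy) for xy in zip(data.lon, data.lat)]

# Option 2: row-wise apply, which calls the lambda once per row.
data["geometry2"] = data.apply(lambda row: Point(row["lon"], row["lat"]), axis=1)

# Compare the two columns point by point.
same = all(a.equals(b) for a, b in zip(data["geometry"], data["geometry2"]))
print(same)
```

On large frames the list comprehension is usually the faster of the two, since apply with axis=1 builds a Series for every row.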
You can also use GeoPandas (From CSV to GeoDataFrame in two lines)
import geopandas as gpd
data = pd.read_csv("lat_long.csv")
gdf = gpd.GeoDataFrame(data, geometry=gpd.points_from_xy(data.lon, data.lat))
gdf.head()
lat lon geometry
0 41.389474 2.156421 POINT (2.15642 41.38947)
1 41.383093 2.181116 POINT (2.18112 41.38309)
2 41.373258 2.159358 POINT (2.15936 41.37326)
3 41.385252 2.168779 POINT (2.16878 41.38525)
4 41.390692 2.148911 POINT (2.14891 41.39069)
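The GeoDataFrame above has no coordinate reference system attached. If your lon/lat values are WGS84 decimal degrees (an assumption; the question does not say), you could pass `crs` when building it. A minimal sketch with an in-memory stand-in for the CSV:

```python
import geopandas as gpd
import pandas as pd

# Stand-in for the CSV contents (two sample rows from this answer).
data = pd.DataFrame({"lat": [41.389474, 41.383093], "lon": [2.156421, 2.181116]})

# Assumption: coordinates are WGS84 longitude/latitude, hence EPSG:4326.
gdf = gpd.GeoDataFrame(
    data,
    geometry=gpd.points_from_xy(data.lon, data.lat),
    crs="EPSG:4326",
)

print(gdf.crs)
print(gdf.geometry.iloc[0])
```

Having the CRS set matters later if you reproject with to_crs or export the data to a GIS format.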
- Perfect! It worked fine. Happy to be on this platform. Thanks, families. – PKU, Jul 17, 2021 at 10:29
- If this answer is acceptable to you, you must close the question by accepting the answer, please. – gene, Jul 17, 2021 at 17:27
datafram['geometry'] not data['geometry']