Create a column 'geometry' of points with longitude and latitude data given in a pandas DataFrame

Question 1

I have loaded a .csv file into a pandas DataFrame and want to create a column named 'geometry' made up of Shapely points built from the lat and lon values in the data. The data is below:

[Screenshot: the loaded .csv data, with columns lat, lon, timestamp and userid]

I would therefore like to create a Shapely point for each row from the 'lon' and 'lat' columns. I can either zip the lon and lat columns and create the points in a for-loop (looping over the zipped object), or use the apply method to call the Shapely Point constructor on each row.

This is what I tried:

import pandas as pd
from shapely.geometry import Point
fp = 'C:/Users/pku/Desktop/data/lat_lon.csv'
data = pd.read_csv(fp)
data.head()
dataframe = pd.DataFrame()
datafram['geometry'] = [Point(xy) for xy in zip(data.lon, data.lat)] 
print(data['geometry'].head())

I got a bunch of errors; the last one was:

[Screenshot: the end of the error traceback]

Question 2

You set datafram['geometry'] (a misspelling of dataframe, which is a separate, empty DataFrame anyway), not data['geometry'].
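A minimal corrected sketch of the question's attempt, assuming only the lat and lon columns matter. The CSV file is replaced by an in-memory io.StringIO built from the first rows shown in the answer (a stand-in, not the asker's actual file), and the geometry column is assigned to data itself, the same DataFrame that is printed afterwards.

```python
import io

import pandas as pd
from shapely.geometry import Point

# Stand-in for the question's CSV file: sample rows copied from the answer's output.
# With a real file, pass the path to read_csv instead of io.StringIO(csv_text).
csv_text = """lat,lon
41.389474,2.156421
41.383093,2.181116
41.373258,2.159358
"""

# read_csv already returns a DataFrame, so no empty pd.DataFrame() is needed
data = pd.read_csv(io.StringIO(csv_text))

# Assign the new column to the same DataFrame that is printed below.
# Each xy is a (lon, lat) tuple, which Point accepts as (x, y).
data['geometry'] = [Point(xy) for xy in zip(data.lon, data.lat)]
print(data['geometry'].head())
```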

Question 3

When you read a CSV file with pandas, the result is already a pandas DataFrame, so why create a new, empty one with dataframe = pd.DataFrame()?

data = pd.read_csv("lat_long.csv")
list(data.columns)
['lat', 'lon']
type(data)
<class 'pandas.core.frame.DataFrame'>
data.head()
 lat lon
0 41.389474 2.156421
1 41.383093 2.181116
2 41.373258 2.159358
3 41.385252 2.168779
4 41.390692 2.148911

Now you can use a list comprehension:

data['geometry'] = [Point(xy) for xy in zip(data.lon, data.lat)] 
data.head()
 lat lon geometry
0 41.389474 2.156421 POINT (2.156421 41.389474)
1 41.383093 2.181116 POINT (2.181116 41.383093)
2 41.373258 2.159358 POINT (2.159358 41.373258)
3 41.385252 2.168779 POINT (2.168779 41.385252)
4 41.390692 2.148911 POINT (2.148911 41.390692)
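The question also mentions building the points with an explicit for-loop over the zipped columns. A sketch of that variant, which is equivalent to the list comprehension; the two-row DataFrame is a hypothetical stand-in for the CSV. Note that Point takes (x, y), so longitude comes first.

```python
import pandas as pd
from shapely.geometry import Point

# Hypothetical in-memory stand-in for the CSV (values taken from the output above)
data = pd.DataFrame({'lat': [41.389474, 41.383093],
                     'lon': [2.156421, 2.181116]})

points = []
# zip pairs each longitude with its latitude; Point expects (x, y), i.e. (lon, lat)
for lon, lat in zip(data['lon'], data['lat']):
    points.append(Point(lon, lat))

# One Point per row, in the same order as the DataFrame rows
data['geometry'] = points
print(data)
```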

Or you can use the apply method:

data['geometry2'] = data.apply(lambda row:Point(row['lon'],row['lat']), axis=1)
data.head()
 lat lon geometry geometry2
0 41.389474 2.156421 POINT (2.156421 41.389474) POINT (2.156421 41.389474)
1 41.383093 2.181116 POINT (2.181116 41.383093) POINT (2.181116 41.383093)
2 41.373258 2.159358 POINT (2.159358 41.373258) POINT (2.159358 41.373258)
3 41.385252 2.168779 POINT (2.168779 41.385252) POINT (2.168779 41.385252)
4 41.390692 2.148911 POINT (2.148911 41.390692) POINT (2.148911 41.390692)
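With axis=1, apply passes each row to the lambda as a Series, so row['lon'] and row['lat'] are that row's values. A small sketch on hypothetical two-row data that builds both columns and uses Shapely's equals method to check that they hold the same points:

```python
import pandas as pd
from shapely.geometry import Point

# Hypothetical two-row stand-in for the CSV data
data = pd.DataFrame({'lat': [41.389474, 41.383093],
                     'lon': [2.156421, 2.181116]})

# Approach 1: list comprehension over zipped (lon, lat) pairs
data['geometry'] = [Point(xy) for xy in zip(data.lon, data.lat)]

# Approach 2: apply a function to each row (axis=1 means row-wise)
data['geometry2'] = data.apply(lambda row: Point(row['lon'], row['lat']), axis=1)

# equals() tests geometric equality, so every row should match
same = all(a.equals(b) for a, b in zip(data['geometry'], data['geometry2']))
print(same)
```

Both give the same result; on large frames the zip-based version is usually faster, because apply builds a Series object for every row before calling the lambda.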

You can also use GeoPandas (from CSV to GeoDataFrame in two lines):

import geopandas as gpd
data = pd.read_csv("lat_long.csv")
gdf = gpd.GeoDataFrame(data, geometry=gpd.points_from_xy(data.lon, data.lat))
gdf.head()
 lat lon geometry
0 41.389474 2.156421 POINT (2.15642 41.38947)
1 41.383093 2.181116 POINT (2.18112 41.38309)
2 41.373258 2.159358 POINT (2.15936 41.37326)
3 41.385252 2.168779 POINT (2.16878 41.38525)
4 41.390692 2.148911 POINT (2.14891 41.39069)
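If the coordinates are WGS 84 longitude/latitude (an assumption; the question does not say which coordinate system the data uses), the CRS can be recorded when the GeoDataFrame is built by passing crs="EPSG:4326". A sketch with hypothetical two-row data:

```python
import geopandas as gpd
import pandas as pd

# Hypothetical two-row stand-in for the CSV data
data = pd.DataFrame({'lat': [41.389474, 41.383093],
                     'lon': [2.156421, 2.181116]})

# points_from_xy takes x (longitude) first, then y (latitude).
# ASSUMPTION: the coordinates are WGS 84, hence EPSG:4326.
gdf = gpd.GeoDataFrame(
    data,
    geometry=gpd.points_from_xy(data.lon, data.lat),
    crs="EPSG:4326",
)
print(gdf.crs)
```

Setting the CRS up front lets later calls such as to_crs reproject the points without guessing the source system.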

Question 4

Perfect! It worked fine. Happy to be on this platform. Thanks, families.

Question 5

If this answer works for you, please accept it to close the question.

gene · 55.8k reputation · 3 gold, 115 silver and 196 bronze badges · Accepted Answer · 2021-07-17 08:07:46Z
