
I have loaded a .csv file into a pandas DataFrame and want to create a column named 'geometry' holding Shapely points built from the lat and lon values in the data. The loaded data is made up of the columns lat, lon, timestamp and userid.

I therefore want to create a Shapely point for each row from the 'lon' and 'lat' columns, either by zipping the two columns and building the points in a for-loop over the zipped object, or by using the apply method to call the Shapely Point constructor on each row.

This is what I tried:

import pandas as pd
from shapely.geometry import Point
fp = 'C:/Users/pku/Desktop/data/lat_lon.csv'
data = pd.read_csv(fp)
data.head()
dataframe = pd.DataFrame()
datafram['geometry'] = [Point(xy) for xy in zip(data.lon, data.lat)] 
print(data['geometry'].head())

I got a number of errors, the last one being:

[image: the end of the error]

Fezter
asked Jul 17, 2021 at 3:10
  • You set datafram['geometry'] not data['geometry'] Commented Jul 17, 2021 at 6:13

1 Answer


When you read a CSV file with pandas, the result is already a pandas DataFrame, so why create a new one with dataframe = pd.DataFrame()?

data = pd.read_csv("lat_long.csv")
list(data.columns)
['lat', 'lon']
type(data)
<class 'pandas.core.frame.DataFrame'>
data.head()
         lat       lon
0  41.389474  2.156421
1  41.383093  2.181116
2  41.373258  2.159358
3  41.385252  2.168779
4  41.390692  2.148911

Now you can use a list comprehension:

from shapely.geometry import Point
# Point takes (x, y), so pass lon first and lat second
data['geometry'] = [Point(xy) for xy in zip(data.lon, data.lat)]
data.head()
         lat       lon                    geometry
0  41.389474  2.156421  POINT (2.156421 41.389474)
1  41.383093  2.181116  POINT (2.181116 41.383093)
2  41.373258  2.159358  POINT (2.159358 41.373258)
3  41.385252  2.168779  POINT (2.168779 41.385252)
4  41.390692  2.148911  POINT (2.148911 41.390692)
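The question also mentions an explicit for-loop over the zipped columns. A minimal sketch of that variant follows. It assumes a small in-memory DataFrame built from the first rows shown above in place of the CSV file, and the name `points` is only illustrative:

```python
import pandas as pd
from shapely.geometry import Point

# Stand-in for pd.read_csv(...): the first three rows from the output above.
data = pd.DataFrame({
    "lat": [41.389474, 41.383093, 41.373258],
    "lon": [2.156421, 2.181116, 2.159358],
})

# Shapely expects (x, y), i.e. (lon, lat), so zip the columns in that order.
points = []
for lon, lat in zip(data["lon"], data["lat"]):
    points.append(Point(lon, lat))

# Assign to the DataFrame that was loaded, not to a new empty one.
data["geometry"] = points
print(data["geometry"].head())
```

The loop and the list comprehension build the same list; the comprehension is simply the more compact spelling.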

You can use the apply method:

data['geometry2'] = data.apply(lambda row: Point(row['lon'], row['lat']), axis=1)
data.head()
         lat       lon                    geometry                   geometry2
0  41.389474  2.156421  POINT (2.156421 41.389474)  POINT (2.156421 41.389474)
1  41.383093  2.181116  POINT (2.181116 41.383093)  POINT (2.181116 41.383093)
2  41.373258  2.159358  POINT (2.159358 41.373258)  POINT (2.159358 41.373258)
3  41.385252  2.168779  POINT (2.168779 41.385252)  POINT (2.168779 41.385252)
4  41.390692  2.148911  POINT (2.148911 41.390692)  POINT (2.148911 41.390692)
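To confirm that the list comprehension and apply produce the same points, a check along these lines could be used. This is only a sketch: it assumes an in-memory DataFrame in place of the CSV file, and the name `same` is illustrative. `equals` is Shapely's geometric equality test.

```python
import pandas as pd
from shapely.geometry import Point

# Stand-in for pd.read_csv(...): the first three rows from the output above.
data = pd.DataFrame({
    "lat": [41.389474, 41.383093, 41.373258],
    "lon": [2.156421, 2.181116, 2.159358],
})

# Approach 1: list comprehension over the zipped (lon, lat) pairs.
data["geometry"] = [Point(xy) for xy in zip(data["lon"], data["lat"])]

# Approach 2: row-wise apply; axis=1 passes each row to the lambda.
data["geometry2"] = data.apply(lambda row: Point(row["lon"], row["lat"]), axis=1)

# Compare the two columns point by point.
same = all(a.equals(b) for a, b in zip(data["geometry"], data["geometry2"]))
print(same)
```

Row-wise apply calls the lambda once per row through pandas, so on large files the list comprehension is usually the faster of the two.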

You can also use GeoPandas (From CSV to GeoDataFrame in two lines)

import pandas as pd
import geopandas as gpd
data = pd.read_csv("lat_long.csv")
# points_from_xy takes x (lon) first, then y (lat)
gdf = gpd.GeoDataFrame(data, geometry=gpd.points_from_xy(data.lon, data.lat))
gdf.head()
         lat       lon                  geometry
0  41.389474  2.156421  POINT (2.15642 41.38947)
1  41.383093  2.181116  POINT (2.18112 41.38309)
2  41.373258  2.159358  POINT (2.15936 41.37326)
3  41.385252  2.168779  POINT (2.16878 41.38525)
4  41.390692  2.148911  POINT (2.14891 41.39069)
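A GeoDataFrame built this way has no coordinate reference system attached. If the coordinates are WGS 84 longitude/latitude (an assumption, since the question does not say), the `crs` argument can be passed at construction. A sketch, with in-memory data standing in for the CSV file:

```python
import geopandas as gpd
import pandas as pd

# Stand-in for pd.read_csv(...): the first three rows from the output above.
data = pd.DataFrame({
    "lat": [41.389474, 41.383093, 41.373258],
    "lon": [2.156421, 2.181116, 2.159358],
})

# crs="EPSG:4326" assumes WGS 84 lon/lat; change it if the data uses another CRS.
gdf = gpd.GeoDataFrame(
    data,
    geometry=gpd.points_from_xy(data["lon"], data["lat"]),
    crs="EPSG:4326",
)
print(gdf.crs)
print(gdf.geometry.head())
```

With a CRS set, the points can later be reprojected with `gdf.to_crs(...)` or written to a spatial file format without losing their georeferencing.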
Bera
answered Jul 17, 2021 at 8:07
  • Perfect! It worked fine. Happy to be on this platform. Thanks, families. Commented Jul 17, 2021 at 10:29
  • If this answer is acceptable to you, you must close the question by accepting the answer please. Commented Jul 17, 2021 at 17:27
