Reading from Excel, performing geospatial analysis and writing to Excel files

Question 1

I have written a Python snippet that reads lat and long stored in an Excel file. It converts them to a point which is then used to perform multiple geospatial analysis including buffer, intersection, and searching within a shapefile's attribute table (finding count of a specific attribute occurrence and a single row can have multiple attributes in the same column). This code snippet is very slow. I need to traverse around 2000 records and search for around 179 attributes (count) against 10 different radii. It takes around 2.45 minutes to traverse against 1 point for 10 radii and search for the count of only 10 attribute occurrences. Is there any way to speed up this process?

I am attaching the code below.

# importing libraries
from openpyxl import load_workbook
from shapely.geometry import Point
import pandas as pd
import geopandas as gpd
#import matplotlib.pyplot as plt
import csv
#loading input excel
book = load_workbook(r'File path of input file.xlsx')
sheet = book.active
bookOut = r'outputfile.csv'
#outputSheet = bookOut.active
#searching for a specific category and returning its count
def bSearch(lst, cat):
 c = 0
 for index, row in lst[0:len(lst)].iterrows():
 if cat in row['Category_t']:
 c = c+1
 return (c)
 
# creates buffer
def createbuf(p,r):
 bufp = p['geometry'].buffer(distance = r)
 buf = gpd.GeoDataFrame(geometry = bufp, crs = ucs.crs)
 return (buf)
# calculates intersection
def calint(b,u):
 areaoi = gpd.overlay(u, b, how= "intersection") 
 return (areaoi)
# shapefiles Input
ucs = gpd.read_file(r'Boundary.shp')
ucs = ucs.to_crs(epsg=32643)
#Business List
business = gpd.read_file(r'List of Categories.shp')
business = business.to_crs(epsg=32643)
businessArray = [
'Cat1','Cat2','Cat3','Cat4','Cat5','Cat6','Cat7','Cat8','Cat9','Cat10'
]
radiusVals=[0.5, 1, 1.5, 2, 2.5, 3.000, 4.000, 5.000, 7.000, 10.000]
# lat long
latlng = []
i = 0
otp = []
for row in range(143, sheet.max_row+1):
 detStore = []
 ind = 0
 for column in "AB":
 cell_name = "{}{}".format(column, row)
 latlng.append(sheet[cell_name].value)
 detStore.append(sheet[cell_name].value)
 for column in "CD":
 cell_name = "{}{}".format(column, row)
 detStore.append(sheet[cell_name].value)
 # PCS SRID
 p1 = Point((latlng[i],latlng[i+1]))
 df = pd.DataFrame({'a':[latlng[i+1],latlng[i]]})
 po = gpd.GeoDataFrame(geometry = [p1], crs = ucs.crs)
 po['geometry'] = po['geometry'].to_crs(epsg = 32643)
 print(i, "--", latlng[i+1], latlng[i])
 i = i+2
 # calling functions
 for r in range (0, len(radiusVals)):
 outputStore = []
 for d in range (0, len(detStore)):
 outputStore.append(detStore[d])
 outputStore.append(radiusVals[r])
 bufo = createbuf(po,(radiusVals[r]/111))
 aoi = calint(bufo ,ucs)
 busList = calint(bufo, business)
 for b in range (0, len(businessArray)):
 buSearch = bSearch (busList,businessArray[b])
 outputStore.append(buSearch)
 otp.append(outputStore)
print (otp)
fields = ['Location X', 'Location Y', 'Type', 'Name', 'Radius', 'Cat1','Cat2','Cat3','Cat4','Cat5','Cat6','Cat7','Cat8','Cat9','Cat10']
with open(bookOut, 'w', newline='') as f:
 write = csv.writer (f)
 write.writerow(fields)
 write.writerows(otp)

Question 2

Cross-posted as gis.stackexchange.com/q/393860/115.

Question 3

Do you have an input .shp file we can use to test your code with? I'd like to see if there's anything I can do for you, even after all this time, but reviewing Python-Excel combos without having a dataset to test with is adding complexity. And possibly the reason your question hasn't been answered yet.

Question 4

DRY

This list is repeated twice in the code:

'Cat1','Cat2','Cat3','Cat4','Cat5','Cat6','Cat7','Cat8','Cat9','Cat10'

You assigned it to the businessArray variable, which is good, but you could also use it when assigning to the fields variable.

Also, there is repetition in the list itself which can be eliminated with a loop, for example.

Comments

You should remove all commented-out code to reduce clutter. For example:

#import matplotlib.pyplot as plt
#outputSheet = bookOut.active

This comment can be removed since it is obvious from the code that you are importing:

# importing libraries

This comment is vague and is not needed if you give the variable a more meaningful name:

# lat long
latlng = []

For example:

coordinates = []

Note that it is common practice to use plural nouns for array variable names.

Documentation

The PEP 8 style guide recommends adding docstrings for functions. For example, you can convert your comments:

#searching for a specific category and returning its count
def bSearch(lst, cat):

into docstrings:

def bSearch(lst, cat):
 """ searching for a specific category and returning its count ""

Naming

PEP-8 recommends snake_case for function and variable names. bSearch would be b_search. You should also expand on what you mean by "b" because it is not obvious from the context.

In the bSearch function, c would be better as count.

calint might be better as calc_intersect because "int" is vague.

Layout

Move the functions to the top after the import lines. Having them in the middle of the code interrupts the natural flow of the code (from a human readability standpoint).

It is common to omit parentheses with simple return statements:

return (c)

Simpler

There is no need for the 0 start with range:

for r in range (0, len(radiusVals)):

This line can be simplified with an f-string:

cell_name = "{}{}".format(column, row)

The following:

i = i+2

can be simplified using the special assignment operator:

i += 2

The same goes for:

c = c+1

In the calint function, there is no need for the intermediate areaoi variable:

areaoi = gpd.overlay(u, b, how= "intersection") 
return (areaoi)

this is simpler:

return gpd.overlay(u, b, how= "intersection")

There is no need for range and len:

 for d in range (0, len(detStore)):
 outputStore.append(detStore[d])

This is simpler:

 for d_store in detStore:
 outputStore.append(d_store)

toolic toolic 14.5k5 gold badges29 silver badges203 bronze badges · Answer 1 · 2025-03-10 17:31:30Z

DRY

This list is repeated twice in the code:

'Cat1','Cat2','Cat3','Cat4','Cat5','Cat6','Cat7','Cat8','Cat9','Cat10'

You assigned it to the businessArray variable, which is good, but you could also use it when assigning to the fields variable.

Also, there is repetition in the list itself which can be eliminated with a loop, for example.

Comments

You should remove all commented-out code to reduce clutter. For example:

#import matplotlib.pyplot as plt
#outputSheet = bookOut.active

This comment can be removed since it is obvious from the code that you are importing:

# importing libraries

This comment is vague and is not needed if you give the variable a more meaningful name:

# lat long
latlng = []

For example:

coordinates = []

Note that it is common practice to use plural nouns for array variable names.

Documentation

The PEP 8 style guide recommends adding docstrings for functions. For example, you can convert your comments:

#searching for a specific category and returning its count
def bSearch(lst, cat):

into docstrings:

def bSearch(lst, cat):
 """ searching for a specific category and returning its count ""

Naming

PEP-8 recommends snake_case for function and variable names. bSearch would be b_search. You should also expand on what you mean by "b" because it is not obvious from the context.

In the bSearch function, c would be better as count.

calint might be better as calc_intersect because "int" is vague.

Layout

Move the functions to the top after the import lines. Having them in the middle of the code interrupts the natural flow of the code (from a human readability standpoint).

It is common to omit parentheses with simple return statements:

return (c)

Simpler

There is no need for the 0 start with range:

for r in range (0, len(radiusVals)):

This line can be simplified with an f-string:

cell_name = "{}{}".format(column, row)

The following:

i = i+2

can be simplified using the special assignment operator:

i += 2

The same goes for:

c = c+1

In the calint function, there is no need for the intermediate areaoi variable:

areaoi = gpd.overlay(u, b, how= "intersection") 
return (areaoi)

this is simpler:

return gpd.overlay(u, b, how= "intersection")

There is no need for range and len:

 for d in range (0, len(detStore)):
 outputStore.append(detStore[d])

This is simpler:

 for d_store in detStore:
 outputStore.append(d_store)

Stack Exchange Network

Reading from Excel, performing geospatial analysis and writing to Excel files

1 Answer 1

DRY

Comments

Documentation

Naming

Layout

Simpler

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

1 Answer 1

Comments

Documentation

Naming

Layout

Simpler

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related