I have a script that sorts through a very large feature layer (thousands of records) and creates a summary table of the records by type. For each of the last five years it counts records year-to-date (up to the current date in that year, rather than the whole calendar year) and then averages those counts. My output table has the type, the five-year average, and the current year-to-date count. Right now I write this out to a CSV, but I'm trying to get that CSV to overwrite a hosted feature table. From posts on the Esri Community forums it seems like this should be possible, but I keep getting an error. Here is my code:
import arcpy, os, uuid
import pandas as pd
import datetime
from datetime import timedelta
from arcgis.features import GeoAccessor, FeatureLayer
from arcgis.gis import GIS
# variables
url = 'arcgis online url'
username = 'username'
password = 'password'
hostedTableID = 'item ID'
inputfc = r'feature layer from sde on server'
outputCSVFile = r'csv in folder on server'
# get table from AGOL
gis = GIS(url, username, password)
CrimeTrendsTable = gis.content.get(hostedTableID)
CrimeTrendsTableLyr = CrimeTrendsTable.tables[0]
crimeLayer = FeatureLayer(CrimeTrendsTableLyr.url, gis=gis)
# truncate table
crimeLayer.manager.truncate()
# import feature class and create slice of the data set in a new dataframe
df = pd.DataFrame.spatial.from_featureclass(inputfc)
df = df.sort_values(['OFFENSES_YEAR', 'CRIME_STAT_TYPE', 'FROM_DATE'], ascending=[True, True, True], ignore_index=True)
df = df.loc[:, ['OFFENSES_YEAR', 'CRIME_STAT_TYPE', 'FROM_DATE']]
# set variables for today and the previous 5 years
today = datetime.datetime.today()
todate1 = today - timedelta(365)
todate2 = today - timedelta(730)
todate3 = today - timedelta(1095)
todate4 = today - timedelta(1460)
todate5 = today - timedelta(1825)
this_year = datetime.datetime.today().year
one_year_ago = this_year - 1
two_years_ago = this_year - 2
three_years_ago = this_year - 3
four_years_ago = this_year - 4
five_years_ago = this_year - 5
# create new dataframes for current and each previous year to date
df0 = df.loc[(df['OFFENSES_YEAR']==this_year) & (df['FROM_DATE'] < today)]
df1 = df.loc[(df['OFFENSES_YEAR']==one_year_ago) & (df['FROM_DATE'] < todate1)]
df2 = df.loc[(df['OFFENSES_YEAR']==two_years_ago) & (df['FROM_DATE'] < todate2)]
df3 = df.loc[(df['OFFENSES_YEAR']==three_years_ago) & (df['FROM_DATE'] < todate3)]
df4 = df.loc[(df['OFFENSES_YEAR']==four_years_ago) & (df['FROM_DATE'] < todate4)]
df5 = df.loc[(df['OFFENSES_YEAR']==five_years_ago) & (df['FROM_DATE'] < todate5)]
# create tables for each current and previous year grouped by crime type, add together previous year tables and average
table0 = df0.groupby('CRIME_STAT_TYPE')['OFFENSES_YEAR'].count().reset_index()\
.rename(columns={"OFFENSES_YEAR" : this_year})
table1 = df1.groupby('CRIME_STAT_TYPE')['OFFENSES_YEAR'].count()
table2 = df2.groupby('CRIME_STAT_TYPE')['OFFENSES_YEAR'].count()
table3 = df3.groupby('CRIME_STAT_TYPE')['OFFENSES_YEAR'].count()
table4 = df4.groupby('CRIME_STAT_TYPE')['OFFENSES_YEAR'].count()
table5 = df5.groupby('CRIME_STAT_TYPE')['OFFENSES_YEAR'].count()
avgtable = pd.concat([table1, table2, table3, table4, table5]).groupby('CRIME_STAT_TYPE').mean().reset_index()\
.rename(columns={"OFFENSES_YEAR" : "previous5yearAverage"})
trendstable = avgtable.join(table0.set_index('CRIME_STAT_TYPE'), on='CRIME_STAT_TYPE').fillna(0)
trendstable.to_csv(outputCSVFile, sep='\t', encoding='utf-8')
# update hosted table from csv file
csvDF = GeoAccessor.from_table(outputCSVFile)
adds_fs = csvDF.spatial.to_featureset()
adds_dict = adds_fs.to_dict()
adds = adds_dict["features"]
crimeLayer.edit_features(adds=adds)
Here's an example of some data from my csv table:
|    | CRIME_STAT_TYPE | previous5yearAverage | 2024 |
|----|-----------------|----------------------|------|
| 0  | AGGRAVATED ASSAULT | 12.000000 | 9.0 |
| 1  | ALL OTHER OFFENSES | 54.800000 | 46.0 |
| 2  | ANIMAL CRUELTY | 1.666667 | 1.0 |
| 3  | ARSON | 1.000000 | 0.0 |
| 4  | BAD CHECKS | 1.750000 | 1.0 |
| 5  | BURGLARY/BREAKING AND ENTERING | 44.400000 | 50.0 |
| 45 | WIRE FRAUD | 6.000000 | 0.0 |
This is the error I'm getting when I run the adds_fs = csvDF.spatial.to_featureset() line:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
In [60]:
Line 1: adds_fs = csvDF.spatial.to_featureset()
File C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\arcgis\features\geo\_accessor.py, in to_featureset:
Line 3573: return FeatureSet.from_dict(self.__feature_set__)
File C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\arcgis\features\geo\_accessor.py, in __feature_set__:
Line 3297: if self.sr is None:
File C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\arcgis\features\geo\_accessor.py, in sr:
Line 3503: for g in self._data[self.name]
File C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\frame.py, in __getitem__:
Line 3505: indexer = self.columns.get_loc(key)
File C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\indexes\base.py, in get_loc:
Line 3631: raise KeyError(key) from err
KeyError: None
---------------------------------------------------------------------------
I also suspect something is going wrong with how pandas is writing the CSV, because when the file is read back in it doesn't appear to be parsed correctly. Could it have something to do with the delimiter being used, or something else? I'm very new to all of this.
1 Answer
So I couldn't get the CSV to work, but I was able to get it working with an Excel (.xls) file instead. The code is as follows:
# new variables, set up like the ones above
outputXLSFile = r'xlsx in folder on server'
outputTable = r'table in file geodatabase'
# write the summary dataframe to an Excel file instead of a CSV
trendstable.to_excel(outputXLSFile)
# update the hosted table from the xls file: convert it to a geodatabase table,
# drop the pandas index column (it comes through as COL_A), and append to the hosted table
arcpy.ExcelToTable_conversion(outputXLSFile, outputTable, 'Sheet1')
arcpy.DeleteField_management(outputTable, ["COL_A"])
arcpy.management.Append(inputs=[outputTable], target=CrimeTrendsTableLyr.url)
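For what it's worth, the traceback suggests the delimiter wasn't the root cause: to_featureset() needs a geometry column, and a dataframe read back with GeoAccessor.from_table() has none, so the accessor's geometry-column name is None and the column lookup raises KeyError: None. Here is a minimal sketch (my assumption, not tested against your data) that should reproduce the same error on any non-spatial dataframe:
import pandas as pd
from arcgis.features import GeoAccessor  # registers the DataFrame .spatial accessor
# a plain, non-spatial dataframe -- no geometry column has ever been set
df = pd.DataFrame({"CRIME_STAT_TYPE": ["ARSON"], "previous5yearAverage": [1.0]})
# .spatial.to_featureset() looks up the geometry column by name; with no
# geometry that name is None, so pandas raises KeyError: None
fs = df.spatial.to_featureset()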
Comment: Why write out a CSV and then read it back in with the GeoAccessor.from_table() function? It would be more efficient and reliable to work directly from the dataframe that already has the data sitting there. The second question would be: does your data have a spatial component, or is this just a non-spatial table?
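For what it's worth, here is a minimal sketch of that first suggestion: push the summary rows straight from the dataframe to the hosted table with edit_features(), skipping the file round-trip entirely. The attribute field names (CRIME_STAT_TYPE, previous5yearAverage, and currentYearCount) are assumptions and would need to match the hosted table's actual schema:
# build attribute-only features straight from the summary dataframe --
# no geometry is needed for a non-spatial hosted table
adds = [
    {"attributes": {
        "CRIME_STAT_TYPE": row["CRIME_STAT_TYPE"],
        "previous5yearAverage": float(row["previous5yearAverage"]),
        "currentYearCount": int(row[this_year]),  # hypothetical field name
    }}
    for _, row in trendstable.iterrows()
]
crimeLayer.manager.truncate()                  # clear the existing rows
result = crimeLayer.edit_features(adds=adds)   # push the new rows
print(result["addResults"])                    # verify each add succeeded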