I have a script that sorts through a very large feature layer (thousands of records) and creates a summary table of the records by type. For each of the last five years it counts records year-to-date (up to the current date in that year, rather than the whole calendar year) and then averages those counts. My output table has the type, the five-year average, and the current year-to-date count. Right now I write this out to a CSV, but I'm trying to get that CSV to overwrite a hosted feature table. From posts on the Esri Community forums it seems like this should be possible, but I keep getting an error. Here is my code:
import arcpy, os, uuid
import pandas as pd
import datetime
from datetime import timedelta
from arcgis.features import GeoAccessor, FeatureLayer
from arcgis.gis import GIS
# variables
url = 'arcgis online url'
username = 'username'
password = 'password'
hostedTableID = 'item ID'
inputfc = r'feature layer from sde on server'
outputCSVFile = r'csv in folder on server'
# get table from AGOL
gis = GIS(url, username, password)
CrimeTrendsTable = gis.content.get(hostedTableID)
CrimeTrendsTableLyr = CrimeTrendsTable.tables[0]
crimeLayer = FeatureLayer(CrimeTrendsTableLyr.url, gis=gis)
# truncate table
crimeLayer.manager.truncate()
# import feature class and create slice of the data set in a new dataframe
df = pd.DataFrame.spatial.from_featureclass(inputfc)
df = df.sort_values(['OFFENSES_YEAR', 'CRIME_STAT_TYPE', 'FROM_DATE'], ascending=[True, True, True], ignore_index=True)
df = df.loc[:, ['OFFENSES_YEAR', 'CRIME_STAT_TYPE', 'FROM_DATE']]
# set variables for today and the previous 5 years
today = datetime.datetime.today()
todate1 = today - timedelta(365)
todate2 = today - timedelta(730)
todate3 = today - timedelta(1095)
todate4 = today - timedelta(1460)
todate5 = today - timedelta(1825)
this_year = datetime.datetime.today().year
one_year_ago = this_year - 1
two_years_ago = this_year - 2
three_years_ago = this_year - 3
four_years_ago = this_year - 4
five_years_ago = this_year - 5
# create new dataframes for current and each previous year to date
df0 = df.loc[(df['OFFENSES_YEAR']==this_year) & (df['FROM_DATE'] < today)]
df1 = df.loc[(df['OFFENSES_YEAR']==one_year_ago) & (df['FROM_DATE'] < todate1)]
df2 = df.loc[(df['OFFENSES_YEAR']==two_years_ago) & (df['FROM_DATE'] < todate2)]
df3 = df.loc[(df['OFFENSES_YEAR']==three_years_ago) & (df['FROM_DATE'] < todate3)]
df4 = df.loc[(df['OFFENSES_YEAR']==four_years_ago) & (df['FROM_DATE'] < todate4)]
df5 = df.loc[(df['OFFENSES_YEAR']==five_years_ago) & (df['FROM_DATE'] < todate5)]
# create tables for each current and previous year grouped by crime type, add together previous year tables and average
table0 = df0.groupby('CRIME_STAT_TYPE')['OFFENSES_YEAR'].count().reset_index()\
.rename(columns={"OFFENSES_YEAR" : this_year})
table1 = df1.groupby('CRIME_STAT_TYPE')['OFFENSES_YEAR'].count()
table2 = df2.groupby('CRIME_STAT_TYPE')['OFFENSES_YEAR'].count()
table3 = df3.groupby('CRIME_STAT_TYPE')['OFFENSES_YEAR'].count()
table4 = df4.groupby('CRIME_STAT_TYPE')['OFFENSES_YEAR'].count()
table5 = df5.groupby('CRIME_STAT_TYPE')['OFFENSES_YEAR'].count()
avgtable = pd.concat([table1, table2, table3, table4, table5]).groupby('CRIME_STAT_TYPE').mean().reset_index()\
.rename(columns={"OFFENSES_YEAR" : "previous5yearAverage"})
trendstable = avgtable.join(table0.set_index('CRIME_STAT_TYPE'), on='CRIME_STAT_TYPE').fillna(0)
trendstable.to_csv(outputCSVFile, sep='\t', encoding='utf-8')
# update hosted table from csv file
csvDF = GeoAccessor.from_table(outputCSVFile)
adds_fs = csvDF.spatial.to_featureset()
adds_dict = adds_fs.to_dict()
adds = adds_dict["features"]
crimeLayer.edit_features(adds=adds)
Here's an example of some data from my csv table:
|    | CRIME_STAT_TYPE | previous5yearAverage | 2024 |
|----|-----------------|----------------------|------|
| 0  | AGGRAVATED ASSAULT | 12.000000 | 9.0 |
| 1  | ALL OTHER OFFENSES | 54.800000 | 46.0 |
| 2  | ANIMAL CRUELTY | 1.666667 | 1.0 |
| 3  | ARSON | 1.000000 | 0.0 |
| 4  | BAD CHECKS | 1.750000 | 1.0 |
| 5  | BURGLARY/BREAKING AND ENTERING | 44.400000 | 50.0 |
| 45 | WIRE FRAUD | 6.000000 | 0.0 |
This is the error I'm getting when I run the adds_fs = csvDF.spatial.to_featureset() line:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
In [60]:
Line 1: adds_fs = csvDF.spatial.to_featureset()
File C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\arcgis\features\geo\_accessor.py, in to_featureset:
Line 3573: return FeatureSet.from_dict(self.__feature_set__)
File C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\arcgis\features\geo\_accessor.py, in __feature_set__:
Line 3297: if self.sr is None:
File C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\arcgis\features\geo\_accessor.py, in sr:
Line 3503: for g in self._data[self.name]
File C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\frame.py, in __getitem__:
Line 3505: indexer = self.columns.get_loc(key)
File C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\indexes\base.py, in get_loc:
Line 3631: raise KeyError(key) from err
KeyError: None
---------------------------------------------------------------------------
I also suspect something is going wrong with how pandas is writing the CSV, because when the file is read back in it doesn't appear to be parsed correctly. Could it have something to do with the delimiter being used, or something else? I'm very new to all of this.
1 Answer
So I couldn't get the CSV to work, but I was able to get it working with an Excel (.xls) file instead. The code is as follows:
# new variables, set up like the ones above
outputXLSFile = r'xlsx in folder on server'
outputTable = r'table in file geodatabase'
# write the summary dataframe to an Excel file instead of a CSV
trendstable.to_excel(outputXLSFile)
# update the hosted table from the xls file: convert it to a geodatabase table,
# drop the pandas index column (it comes through as COL_A), and append to the hosted table
arcpy.ExcelToTable_conversion(outputXLSFile, outputTable, 'Sheet1')
arcpy.DeleteField_management(outputTable, ["COL_A"])
arcpy.management.Append(inputs=[outputTable], target=CrimeTrendsTableLyr.url)
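For what it's worth, the traceback suggests the delimiter wasn't the root cause: to_featureset() needs a geometry column, and a dataframe read back with GeoAccessor.from_table() has none, so the accessor's geometry-column name is None and the column lookup raises KeyError: None. Here is a minimal sketch (my assumption, not tested against your data) that should reproduce the same error on any non-spatial dataframe:
import pandas as pd
from arcgis.features import GeoAccessor  # registers the DataFrame .spatial accessor
# a plain, non-spatial dataframe -- no geometry column has ever been set
df = pd.DataFrame({"CRIME_STAT_TYPE": ["ARSON"], "previous5yearAverage": [1.0]})
# .spatial.to_featureset() looks up the geometry column by name; with no
# geometry that name is None, so pandas raises KeyError: None
fs = df.spatial.to_featureset()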
Comment: Why write out a CSV and then read it back in with the GeoAccessor.from_table() function? It would be more efficient and reliable to work directly from the dataframe that already has the data sitting there. The second question would be: does your data have a spatial component, or is this just a non-spatial table?
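For what it's worth, here is a minimal sketch of that first suggestion: push the summary rows straight from the dataframe to the hosted table with edit_features(), skipping the file round-trip entirely. The attribute field names (CRIME_STAT_TYPE, previous5yearAverage, and currentYearCount) are assumptions and would need to match the hosted table's actual schema:
# build attribute-only features straight from the summary dataframe --
# no geometry is needed for a non-spatial hosted table
adds = [
    {"attributes": {
        "CRIME_STAT_TYPE": row["CRIME_STAT_TYPE"],
        "previous5yearAverage": float(row["previous5yearAverage"]),
        "currentYearCount": int(row[this_year]),  # hypothetical field name
    }}
    for _, row in trendstable.iterrows()
]
crimeLayer.manager.truncate()                  # clear the existing rows
result = crimeLayer.edit_features(adds=adds)   # push the new rows
print(result["addResults"])                    # verify each add succeeded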