I have a point shapefile that contains duplicate points sharing the same "ID":
ID1 + permit1 + resources associated to permit
ID1 + permit2 + resources associated to permit
ID2 + permit1 + resources associated to permit
etc.
In my output I want one point per ID, with all of the permit and resources information associated with that ID merged into a single row.
I have found some helpful scripts online (credit specifically to Xander Bakker: https://community.esri.com/t5/python-questions/arcpy-search-duplicate-attributes-then-update/td-p/534281) showing how to use search cursors, dictionaries, and update cursors to merge attributes into a single field (permits). However, I am struggling to adapt my script so it also updates additional fields (resources).
I also think I need to create new fields with a larger character limit and populate those instead of the existing fields, so the merged text does not overflow the length of the existing fields.
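A minimal sketch for sizing those new fields, assuming the merged strings are built first (for example in the dictionary) so their lengths are known before the fields are created. Shapefile (dBASE) text fields cannot hold more than 254 characters, so anything longer needs a file geodatabase feature class instead. The function name `required_text_length` and the `padding` parameter are hypothetical; the returned number is what you would pass as `field_length` to `arcpy.management.AddField`.

```python
SHAPEFILE_TEXT_MAX = 254  # dBASE limit for a text field in a shapefile


def required_text_length(values, padding=0):
    """Return the text field length needed to store every string in `values`.

    None values are ignored. Raises ValueError if the result would not fit
    in a shapefile text field.
    """
    longest = max((len(v) for v in values if v is not None), default=0)
    needed = longest + padding
    if needed > SHAPEFILE_TEXT_MAX:
        raise ValueError(
            f"{needed} characters needed; shapefile text fields hold at most "
            f"{SHAPEFILE_TEXT_MAX}, consider a file geodatabase instead"
        )
    return needed


# Merged permit strings keyed by ID (sample values, not real data)
merged = {"ID1": "permit1+ permit2", "ID2": "permit1"}
print(required_text_length(merged.values()))  # 16
```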
See photos below for desired output.
How can I modify and/or simplify my code so that it updates both fields (permit and resources) instead of just one?
Dictionaries are confusing!
Current script (which is only updating the existing permit field):
```python
import os

import arcpy


def main():
    fc_in = intersect_output  # assumed to be defined earlier in the full script
    fc_out = r"D:\DRC\vector\intersect_output_clean.shp"
    fld_id = "ID"
    fld_permit = "Permit"
    sr = arcpy.Describe(fc_in).spatialReference

    # create an empty output point feature class with the input's schema
    fc_ws, fc_name = os.path.split(fc_out)
    arcpy.CreateFeatureclass_management(fc_ws, fc_name, geometry_type="POINT",
                                        template=fc_in, spatial_reference=sr)

    # pass 1: build a dictionary {ID: "permit1+ permit2..."}
    flds = (fld_id, fld_permit)
    dct = {}
    with arcpy.da.SearchCursor(fc_in, flds) as curs_in:
        for row_in in curs_in:
            p_id = row_in[0]
            p_permit = row_in[1]
            if p_id in dct:
                dct[p_id] += "+ {0}".format(p_permit)
            else:
                dct[p_id] = p_permit

    # pass 2: insert only the first feature seen for each ID,
    # with the merged permit string in place of its own permit
    flds = correctFieldList(arcpy.ListFields(fc_out))
    with arcpy.da.InsertCursor(fc_out, flds) as curs_out:
        with arcpy.da.SearchCursor(fc_in, flds) as curs_in:
            for row_in in curs_in:
                lst_row = list(row_in)
                p_id = row_in[flds.index(fld_id)]
                if p_id in dct:
                    # pop() removes the ID, so later duplicates are skipped
                    lst_row[flds.index(fld_permit)] = dct.pop(p_id)
                    curs_out.insertRow(tuple(lst_row))


def correctFieldList(flds):
    """Return 'SHAPE@' plus every field that is not geometry, GUID or ObjectID."""
    flds_use = ['SHAPE@']
    fldtypes_not = ['Geometry', 'Guid', 'OID']
    for fld in flds:
        if fld.type not in fldtypes_not:
            flds_use.append(fld.name)
    return flds_use


if __name__ == '__main__':
    main()
```
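One way to extend the same two-pass idea to several fields, sketched without arcpy so the logic can be checked on its own. Instead of mapping each ID to a single string, the dictionary maps each ID to another dictionary of {field position: [values]}, so adding a field means adding one more name to `merge_fields`. The first pass fills that structure; the second pass emits only the first row seen for each ID, with every merged field replaced by the joined text. In the real script, `rows` would be `list(arcpy.da.SearchCursor(fc_in, flds))` and each returned tuple would go to `curs_out.insertRow(...)`; to write into new, longer fields instead, add them to the output first and list them in `fields`. The function name `merge_by_id`, the `Resources` field name and the `" + "` separator are assumptions.

```python
def merge_by_id(rows, fields, id_field, merge_fields, sep=" + "):
    """Collapse rows that share an ID into one row, joining the merge_fields.

    rows:   list of tuples ordered like `fields`
    fields: field names, e.g. ["SHAPE@XY", "ID", "Permit", "Resources"]
    Returns one tuple per ID (first occurrence kept), ready for insertRow().
    """
    id_idx = fields.index(id_field)
    merge_idx = [fields.index(f) for f in merge_fields]

    # Pass 1: build {ID: {field position: [values, ...]}}
    collected = {}
    for row in rows:
        per_field = collected.setdefault(row[id_idx], {i: [] for i in merge_idx})
        for i in merge_idx:
            if row[i] not in (None, ""):  # skip empties: no dangling separators
                per_field[i].append(str(row[i]))

    # Pass 2: emit the first row of each ID with the merged strings swapped in
    out = []
    for row in rows:
        per_field = collected.pop(row[id_idx], None)
        if per_field is None:  # this ID has already been written
            continue
        new_row = list(row)
        for i in merge_idx:
            new_row[i] = sep.join(per_field[i])
        out.append(tuple(new_row))
    return out


# Sample rows standing in for cursor output (coordinates instead of geometries)
fields = ["SHAPE@XY", "ID", "Permit", "Resources"]
rows = [
    ((1.0, 2.0), "ID1", "permit1", "res A"),
    ((1.0, 2.0), "ID1", "permit2", "res B"),
    ((3.0, 4.0), "ID2", "permit1", ""),
]
for r in merge_by_id(rows, fields, "ID", ["Permit", "Resources"]):
    print(r)
# ((1.0, 2.0), 'ID1', 'permit1 + permit2', 'res A + res B')
# ((3.0, 4.0), 'ID2', 'permit1', '')
```

Joining the two fields separately keeps the order of values, but the pairing between a particular permit and its resources is only implied by position.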
I have to frame-challenge here. The "multiple rows in one row" model violates basic relational principles. It is far better to model this with a one-to-many join, at which point you will be using the software as designed and tested. Attempting this with a shapefile, with all its limitations, can only make an ugly solution tragic. (Vince, Feb 7, 2023 at 19:06)
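To make the one-to-many idea concrete, here is a plain-Python sketch of the normalized layout the comment describes: one record per point (the "one" side) and a separate permit table with one record per permit, linked back to its point by ID (the "many" side). In ArcGIS this would roughly correspond to a point feature class plus a standalone table related on ID (for example through a relate or a geodatabase relationship class); the values below are invented for illustration.

```python
from collections import defaultdict

# Flat rows as they come out of the intersect: (ID, x, y, permit, resources)
flat = [
    ("ID1", 10.0, 20.0, "permit1", "res A"),
    ("ID1", 10.0, 20.0, "permit2", "res B"),
    ("ID2", 30.0, 40.0, "permit1", "res C"),
]

points = {}   # "one" side: a single location per ID
permits = []  # "many" side: one record per permit, keyed back to the point
for p_id, x, y, permit, resources in flat:
    points.setdefault(p_id, (x, y))
    permits.append({"ID": p_id, "Permit": permit, "Resources": resources})

# A one-to-many relate is then a lookup from a point's ID to its records
related = defaultdict(list)
for rec in permits:
    related[rec["ID"]].append(rec)

print(len(points), len(permits))               # 2 3
print([r["Permit"] for r in related["ID1"]])   # ['permit1', 'permit2']
```

No text is concatenated, so there is no field-length limit to worry about, and each permit keeps its own resources.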
Vince, are you politely suggesting that I work with feature classes in a geodatabase instead of shapefiles? Also, running a one-to-many spatial join would give me duplicate points in my output, which is not the end goal. I have tried a one-to-one spatial join and told the tool to join the attributes of the two fields I want to aggregate. However, some of the output rows contain the join delimiter I set (a + sign) with nothing after it, which I assume means the attributes from both tables were not joined successfully? (Katie B., Feb 7, 2023 at 20:05)
1 Answer
Try this: concatenate the 2 fields into a new field, COMBO, and use this field calculator expression to compute a consecutive number per ID:
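The expression itself is not included in this copy of the answer. A common way to number rows consecutively within each ID in the Field Calculator (Python parser) is a code block with a module-level dictionary that counts how many times each ID has been seen. The sketch below is an assumed reconstruction of that idea, not the answer's original expression; the function name `seq_per_id` is hypothetical, and the expression box would contain `seq_per_id(!ID!)`.

```python
# Code block (pre-logic script) for the Field Calculator, Python parser.
# Expression: seq_per_id(!ID!)
counts = {}  # ID -> number of rows seen so far for that ID


def seq_per_id(p_id):
    """Return 1 for the first row of an ID, 2 for the second, and so on."""
    counts[p_id] = counts.get(p_id, 0) + 1
    return counts[p_id]
```

Called on IDs in the order ID1, ID1, ID2, ID1, it returns 1, 2, 1, 3. The numbering follows the order in which the Field Calculator visits the rows.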
Use the Pivot Table tool to get this:
Run Delete Identical on a copy of the original shapefile and join the pivot table to it.
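The pivot-and-join steps can be mimicked in plain Python to see the shape of the result: one row per ID, one column per sequence number, attached to the single point that remains for each ID after Delete Identical (assumed here to be run on the ID field). The `COMBO_1`, `COMBO_2` column names and the sample values are illustrative; the Pivot Table tool names its output fields in its own way.

```python
# (ID, SEQ, COMBO) rows after concatenating permit and resources; invented values
rows = [
    ("ID1", 1, "permit1 res A"),
    ("ID1", 2, "permit2 res B"),
    ("ID2", 1, "permit1 res C"),
]

# Pivot: one row per ID, one column per sequence number
pivot = {}
for p_id, seq, combo in rows:
    pivot.setdefault(p_id, {})[f"COMBO_{seq}"] = combo

# Delete Identical leaves one point per ID; the join attaches its pivot row
points = {"ID1": (10.0, 20.0), "ID2": (30.0, 40.0)}
joined = {p_id: {"xy": xy, **pivot.get(p_id, {})} for p_id, xy in points.items()}

print(joined["ID1"])
# {'xy': (10.0, 20.0), 'COMBO_1': 'permit1 res A', 'COMBO_2': 'permit2 res B'}
```

Each permit lands in its own column, so no single text field has to hold all of them; IDs with fewer permits simply have empty (null) values in the higher-numbered columns.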