I have an enterprise geodatabase (SQL) that contains a table ("linktable") which has a key field (Main_ID) that is also in up to a dozen related feature classes (via a relationship class).
I want to run a script in Arctoolbox that allows the user to specify a query statement on the linktable. This would then gather the Main_IDs of the resulting records and go through the feature classes open in an mxd one by one looking for the related Main_ID field in each of them. I want the related records in each feature class selected. The user can pick the selection method (new, add_to, remove, etc).
There are over 180,000 records in linktable. Potentially tens of thousands in each fc.
I came up with a method 1-2 years ago that involved creating a long SQL statement with IN on each fc. It works up until that list gets way too long.
It was suggested I repost this issue here and try to solve it using more db-centric ideas. I'm just not a full-time developer and my python only goes so far. I would love to find a more efficient solution. I'm posting the current, inefficient code here as well as a link to my most recent post on this problem
import arcpy, os, string
import arcpy.mapping as MAP
#all relevant fcs as well as Link table should be loaded into current mxd
mxd = MAP.MapDocument("CURRENT")
df = MAP.ListDataFrames(mxd)[0]
#Get Link tables from input
linkTable = arcpy.GetParameterAsText(0)
#Make lists of all feature layers in dataframe
layers = MAP.ListLayers(mxd, "", df)
#get sql expression (sql calculator, query of linktable)
sqlExp = arcpy.GetParameterAsText(1)
arcpy.AddMessage(sqlExp)
#set selection type (new, add_to, etc)
selectType = arcpy.GetParameterAsText(2)
#key field contains Main_IDs that potentially relate Link table to fcs
linkKeyField = "Main_ID"
#Main code to select all records in fc with matching Main_ID in Linktable
if len(layers) > 0:
# Set the SearchCursor to look through the selection of the linkTable
sourceIDs = set([row[0] for row in arcpy.da.SearchCursor(linkTable, linkKeyField, sqlExp)])
# Add DBMS-specific field delimiters
fieldDelimited = arcpy.AddFieldDelimiters(arcpy.Describe(linkTable).path, linkKeyField)
# Add single-quotes for string field values
valueList = ["'%s'" % value for value in sourceIDs]
# Format WHERE clause in the form of an IN statement
whereClause = "%s IN(%s)" % (fieldDelimited, ', '.join(map(str, valueList)))
arcpy.AddMessage("SQL Clause: {}".format(whereClause))
for lyr in layers:
if len(arcpy.ListFields(lyr, "Main_ID")) > 0:
# Process: Select Layer By Attribute
arcpy.AddMessage("Querying related records in {0}".format(lyr))
arcpy.SelectLayerByAttribute_management(lyr, selectType, whereClause)
else:
arcpy.AddMessage("No availble layers for selection.")
sys.exit(0)
del sqlExp, mxd, df, linkTable, linkKeyField, layers
2 Answers 2
You're not letting the RDBMS do its job. There's no reason to use any cursors to do what is a standard part of the SQL language: The subquery.
I haven't tested this in any way, just commented out the unnecessary code and added one statement (on two lines):
import arcpy, os, string
import arcpy.mapping as MAP
#all relevant fcs as well as Link table should be loaded into current mxd
mxd = MAP.MapDocument("CURRENT")
df = MAP.ListDataFrames(mxd)[0]
#Get Link tables from input
linkTable = arcpy.GetParameterAsText(0)
#Make lists of all feature layers in dataframe
layers = MAP.ListLayers(mxd, "", df)
#get sql expression (sql calculator, query of linktable)
sqlExp = arcpy.GetParameterAsText(1)
arcpy.AddMessage(sqlExp)
#set selection type (new, add_to, etc)
selectType = arcpy.GetParameterAsText(2)
#key field contains Main_IDs that potentially relate Link table to fcs
linkKeyField = "Main_ID"
#Main code to select all records in fc with matching Main_ID in Linktable
if len(layers) > 0:
## # Set the SearchCursor to look through the selection of the linkTable
## sourceIDs = set([row[0] for row in arcpy.da.SearchCursor(linkTable, linkKeyField, sqlExp)])
## # Add DBMS-specific field delimiters
## fieldDelimited = arcpy.AddFieldDelimiters(arcpy.Describe(linkTable).path, linkKeyField)
## # Add single-quotes for string field values
## valueList = ["'%s'" % value for value in sourceIDs]
## # Format WHERE clause in the form of an IN statement
## whereClause = "%s IN(%s)" % (fieldDelimited, ', '.join(map(str, valueList)))
whereClause = "{:s} IN (SELECT {:s} FROM {:s} WHERE {:s})".format(
linkKeyField,linkKeyField,linkTable,sqlExp)
arcpy.AddMessage("SQL Clause: {}".format(whereClause))
for lyr in layers:
if len(arcpy.ListFields(lyr, "Main_ID")) > 0:
# Process: Select Layer By Attribute
arcpy.AddMessage("Querying related records in {0}".format(lyr))
arcpy.SelectLayerByAttribute_management(lyr, selectType, whereClause)
else:
arcpy.AddMessage("No available layers for selection.")
sys.exit(0)
del sqlExp, mxd, df, linkTable, linkKeyField, layers
You should of course make sure that each feature class has an index on the link column, and that the link table has indexes on likely query columns from the user expression.
Your code should verify that the sqlExp
is not empty (SQL syntax error)
-
Thanks to both @Vince and FelixIP I appreciate the feedback. I hadn't spent anytime working with subqueries so this was new to me but clearly it's a powerful tool. This worked great w/out the need for the complexity of my original code.MattS– MattS2017年10月15日 16:50:47 +00:00Commented Oct 15, 2017 at 16:50
I tested my idea about supplementary field and update cursor.
From this table:
I created 3 feature classes (it does not matter how) with 60,000 records each and ran this script:
import arcpy
fClasses=["ZERO","ONE","TWO"]
linkTable="LINKTABLE"
linkKeyField = "NAME"
qField="ONE"
sourceIDs = set([row[0] for row in arcpy.da.SearchCursor(linkTable, linkKeyField)])
for fc in fClasses:
try: arcpy.AddField_management(fc, qField, "INTEGER")
except:pass
with arcpy.da.UpdateCursor(fc,(linkKeyField,qField)) as cursor:
for link,one in cursor:
if link in sourceIDs:one=1
else:one=0
cursor.updateRow((link,one))
arcpy.SelectLayerByAttribute_management(fc, "NEW_SELECTION","%s =1"%qField)
It took 15.8 seconds to complete on my very old machine at home and result was exactly what I expected, i.e. every single record in all three feature classes were selected in current mxd, e.g.:
-
Yes, it could be done this way, but with a SQL query of
WHERE linkkeyfield IN (SELECT linkkeyfield FROM linktable)
(which is the worst case in terms of data volume, but the OP states there will be a user query on the link table, so it needs a WHERE as well), you wouldn't need the AddField, or the SearchCursor, or the UpdateCursor. You should probably mention that the linkKeyField should have an index.Vince– Vince2017年10月15日 04:23:23 +00:00Commented Oct 15, 2017 at 4:23 -
Such complex queries are well outside my league and I have no experience with SDE.FelixIP– FelixIP2017年10月15日 07:29:42 +00:00Commented Oct 15, 2017 at 7:29
-
Some folks use subqueries too much (when the proper use pattern would be a JOIN), but subqueries have been part of the SQL language forever and should be part of every technologist's toolbox. Even file geodatabase supports subqueries, so it's only shapefiles which are left out.Vince– Vince2017年10月15日 14:01:14 +00:00Commented Oct 15, 2017 at 14:01
-
@FelixIP Really appreciate you taking the time to provide this input. I've no doubt this would be a useful solution to another problem but it seems like the subquery route works great too.MattS– MattS2017年10月15日 16:51:44 +00:00Commented Oct 15, 2017 at 16:51
joincol in (SELECT DISTINCT joincol FROM lookuptab WHERE usersupplied_constraint)
. That's it. No cursors, no ugly set algebra. Bam!