Better approach for selecting related records in multiple featureclasses based on query of related table? (sde/SQL server)

Question 1

I have an enterprise geodatabase (SQL) that contains a table ("linktable") which has a key field (Main_ID) that is also in up to a dozen related feature classes (via a relationship class).

I want to run a script in Arctoolbox that allows the user to specify a query statement on the linktable. This would then gather the Main_IDs of the resulting records and go through the feature classes open in an mxd one by one looking for the related Main_ID field in each of them. I want the related records in each feature class selected. The user can pick the selection method (new, add_to, remove, etc).

There are over 180,000 records in linktable. Potentially tens of thousands in each fc.

I came up with a method 1-2 years ago that involved creating a long SQL statement with IN on each fc. It works up until that list gets way too long.
It was suggested I repost this issue here and try to solve it using more db-centric ideas. I'm just not a full-time developer and my python only goes so far. I would love to find a more efficient solution. I'm posting the current, inefficient code here as well as a link to my most recent post on this problem

import arcpy, os, string
import arcpy.mapping as MAP
#all relevant fcs as well as Link table should be loaded into current mxd
mxd = MAP.MapDocument("CURRENT")
df = MAP.ListDataFrames(mxd)[0]
#Get Link tables from input
linkTable = arcpy.GetParameterAsText(0)
#Make lists of all feature layers in dataframe
layers = MAP.ListLayers(mxd, "", df)
#get sql expression (sql calculator, query of linktable)
sqlExp = arcpy.GetParameterAsText(1)
arcpy.AddMessage(sqlExp)
#set selection type (new, add_to, etc)
selectType = arcpy.GetParameterAsText(2)
#key field contains Main_IDs that potentially relate Link table to fcs
linkKeyField = "Main_ID"
#Main code to select all records in fc with matching Main_ID in Linktable
if len(layers) > 0:
 # Set the SearchCursor to look through the selection of the linkTable
 sourceIDs = set([row[0] for row in arcpy.da.SearchCursor(linkTable, linkKeyField, sqlExp)])
 # Add DBMS-specific field delimiters
 fieldDelimited = arcpy.AddFieldDelimiters(arcpy.Describe(linkTable).path, linkKeyField)
 # Add single-quotes for string field values
 valueList = ["'%s'" % value for value in sourceIDs]
 # Format WHERE clause in the form of an IN statement
 whereClause = "%s IN(%s)" % (fieldDelimited, ', '.join(map(str, valueList)))
 arcpy.AddMessage("SQL Clause: {}".format(whereClause))
 for lyr in layers:
 if len(arcpy.ListFields(lyr, "Main_ID")) > 0:
 # Process: Select Layer By Attribute
 arcpy.AddMessage("Querying related records in {0}".format(lyr))
 arcpy.SelectLayerByAttribute_management(lyr, selectType, whereClause)
else:
 arcpy.AddMessage("No availble layers for selection.")
 sys.exit(0)
del sqlExp, mxd, df, linkTable, linkKeyField, layers

Question 2

Perhaps insert split of big list into smaller lists before selecting them, however I think using da.updatecursor on some field might work even faster. Populate it with 1 or 0, depending on guid being in the set or not.

Question 3

Well, 'splitting' the list was described in the link in my post as a solution. I assumed there'd be a more efficient way, however. Not sure I really follow your other suggestion. Nothing is getting updated.

Question 4

After populating this field, you can use this field=1 query. Da.updateCursor is very fast.

Question 5

If all you need is a selection statement, why aren't you just using a subselect, as in a WHERE clause of joincol in (SELECT DISTINCT joincol FROM lookuptab WHERE usersupplied_constraint). That's it. No cursors, no ugly set algebra. Bam!

Question 6

@Vince I'm not a big developer. I don't really understand your answer. Can you help point out which lines to replace? If that line gets the unique Main_IDs from linktable, I'm not sure then how to put that in a value that goes through each fc to find and select related records? Would love an example of what to replace.

Question 7

You're not letting the RDBMS do its job. There's no reason to use any cursors to do what is a standard part of the SQL language: The subquery.

I haven't tested this in any way, just commented out the unnecessary code and added one statement (on two lines):

import arcpy, os, string
import arcpy.mapping as MAP
#all relevant fcs as well as Link table should be loaded into current mxd
mxd = MAP.MapDocument("CURRENT")
df = MAP.ListDataFrames(mxd)[0]
#Get Link tables from input
linkTable = arcpy.GetParameterAsText(0)
#Make lists of all feature layers in dataframe
layers = MAP.ListLayers(mxd, "", df)
#get sql expression (sql calculator, query of linktable)
sqlExp = arcpy.GetParameterAsText(1)
arcpy.AddMessage(sqlExp)
#set selection type (new, add_to, etc)
selectType = arcpy.GetParameterAsText(2)
#key field contains Main_IDs that potentially relate Link table to fcs
linkKeyField = "Main_ID"
#Main code to select all records in fc with matching Main_ID in Linktable
if len(layers) > 0:
## # Set the SearchCursor to look through the selection of the linkTable
## sourceIDs = set([row[0] for row in arcpy.da.SearchCursor(linkTable, linkKeyField, sqlExp)])
## # Add DBMS-specific field delimiters
## fieldDelimited = arcpy.AddFieldDelimiters(arcpy.Describe(linkTable).path, linkKeyField)
## # Add single-quotes for string field values
## valueList = ["'%s'" % value for value in sourceIDs]
## # Format WHERE clause in the form of an IN statement
## whereClause = "%s IN(%s)" % (fieldDelimited, ', '.join(map(str, valueList)))
 whereClause = "{:s} IN (SELECT {:s} FROM {:s} WHERE {:s})".format(
 linkKeyField,linkKeyField,linkTable,sqlExp)
 arcpy.AddMessage("SQL Clause: {}".format(whereClause))
 for lyr in layers:
 if len(arcpy.ListFields(lyr, "Main_ID")) > 0:
 # Process: Select Layer By Attribute
 arcpy.AddMessage("Querying related records in {0}".format(lyr))
 arcpy.SelectLayerByAttribute_management(lyr, selectType, whereClause)
else:
 arcpy.AddMessage("No available layers for selection.")
 sys.exit(0)
del sqlExp, mxd, df, linkTable, linkKeyField, layers

You should of course make sure that each feature class has an index on the link column, and that the link table has indexes on likely query columns from the user expression.

Your code should verify that the sqlExp is not empty (SQL syntax error)

Question 8

Thanks to both @Vince and FelixIP I appreciate the feedback. I hadn't spent anytime working with subqueries so this was new to me but clearly it's a powerful tool. This worked great w/out the need for the complexity of my original code.

Question 9

I tested my idea about supplementary field and update cursor.

From this table:

enter image description here

I created 3 feature classes (it does not matter how) with 60,000 records each and ran this script:

import arcpy
fClasses=["ZERO","ONE","TWO"]
linkTable="LINKTABLE"
linkKeyField = "NAME"
qField="ONE"
sourceIDs = set([row[0] for row in arcpy.da.SearchCursor(linkTable, linkKeyField)])
for fc in fClasses:
 try: arcpy.AddField_management(fc, qField, "INTEGER")
 except:pass
 with arcpy.da.UpdateCursor(fc,(linkKeyField,qField)) as cursor:
 for link,one in cursor:
 if link in sourceIDs:one=1
 else:one=0
 cursor.updateRow((link,one))
 arcpy.SelectLayerByAttribute_management(fc, "NEW_SELECTION","%s =1"%qField)

It took 15.8 seconds to complete on my very old machine at home and result was exactly what I expected, i.e. every single record in all three feature classes were selected in current mxd, e.g.:

enter image description here

Question 10

Yes, it could be done this way, but with a SQL query of WHERE linkkeyfield IN (SELECT linkkeyfield FROM linktable) (which is the worst case in terms of data volume, but the OP states there will be a user query on the link table, so it needs a WHERE as well), you wouldn't need the AddField, or the SearchCursor, or the UpdateCursor. You should probably mention that the linkKeyField should have an index.

Question 11

Such complex queries are well outside my league and I have no experience with SDE.

Question 12

Some folks use subqueries too much (when the proper use pattern would be a JOIN), but subqueries have been part of the SQL language forever and should be part of every technologist's toolbox. Even file geodatabase supports subqueries, so it's only shapefiles which are left out.

Question 13

@FelixIP Really appreciate you taking the time to provide this input. I've no doubt this would be a useful solution to another problem but it seems like the subquery route works great too.

Vince Vince 20.5k16 gold badges49 silver badges65 bronze badges · Accepted Answer · 2017-10-15 04:40:31Z

You're not letting the RDBMS do its job. There's no reason to use any cursors to do what is a standard part of the SQL language: The subquery.

I haven't tested this in any way, just commented out the unnecessary code and added one statement (on two lines):

import arcpy, os, string
import arcpy.mapping as MAP
#all relevant fcs as well as Link table should be loaded into current mxd
mxd = MAP.MapDocument("CURRENT")
df = MAP.ListDataFrames(mxd)[0]
#Get Link tables from input
linkTable = arcpy.GetParameterAsText(0)
#Make lists of all feature layers in dataframe
layers = MAP.ListLayers(mxd, "", df)
#get sql expression (sql calculator, query of linktable)
sqlExp = arcpy.GetParameterAsText(1)
arcpy.AddMessage(sqlExp)
#set selection type (new, add_to, etc)
selectType = arcpy.GetParameterAsText(2)
#key field contains Main_IDs that potentially relate Link table to fcs
linkKeyField = "Main_ID"
#Main code to select all records in fc with matching Main_ID in Linktable
if len(layers) > 0:
## # Set the SearchCursor to look through the selection of the linkTable
## sourceIDs = set([row[0] for row in arcpy.da.SearchCursor(linkTable, linkKeyField, sqlExp)])
## # Add DBMS-specific field delimiters
## fieldDelimited = arcpy.AddFieldDelimiters(arcpy.Describe(linkTable).path, linkKeyField)
## # Add single-quotes for string field values
## valueList = ["'%s'" % value for value in sourceIDs]
## # Format WHERE clause in the form of an IN statement
## whereClause = "%s IN(%s)" % (fieldDelimited, ', '.join(map(str, valueList)))
 whereClause = "{:s} IN (SELECT {:s} FROM {:s} WHERE {:s})".format(
 linkKeyField,linkKeyField,linkTable,sqlExp)
 arcpy.AddMessage("SQL Clause: {}".format(whereClause))
 for lyr in layers:
 if len(arcpy.ListFields(lyr, "Main_ID")) > 0:
 # Process: Select Layer By Attribute
 arcpy.AddMessage("Querying related records in {0}".format(lyr))
 arcpy.SelectLayerByAttribute_management(lyr, selectType, whereClause)
else:
 arcpy.AddMessage("No available layers for selection.")
 sys.exit(0)
del sqlExp, mxd, df, linkTable, linkKeyField, layers

You should of course make sure that each feature class has an index on the link column, and that the link table has indexes on likely query columns from the user expression.

Your code should verify that the sqlExp is not empty (SQL syntax error)

Thanks to both @Vince and FelixIP I appreciate the feedback. I hadn't spent anytime working with subqueries so this was new to me but clearly it's a powerful tool. This worked great w/out the need for the complexity of my original code.

Stack Exchange Network

Better approach for selecting related records in multiple featureclasses based on query of related table? (sde/SQL server)

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Better approach for selecting related records in multiple featureclasses based on query of related table? (sde/SQL server)

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions