Geodatabase fields can be assigned domains. ArcGIS Desktop uses the domains as lookup tables (a.k.a. validation tables).
When users edit data in the attribute table
or attribute editing window
, the values are restricted to the list of codes in the domains. This is a useful tool for enforcing data integrity at the application level.
However, it is easy for users to accidentally circumvent the domains and enter erroneous values by using the field calculator
, ArcCatalog LOAD tool
, geoprocessing tools
, etc.. This is the main cause of errors in geodatabase tables at my organization.
Is there a way to use domains to restrict values that can be entered (beyond the attribute table) to truly force data integrity?
-
2Are you truly using Oracle? If so you can use the on store trigger to reject an edit at the database level... a lot of work though. You are correct that domain bound columns can be subverted, education is the answer to this - let the operators know that it causes pain to subvert the domain (and threaten to make them fix their mistakes or revoke their edit permissions).Michael Stimson– Michael Stimson2017年03月24日 01:41:25 +00:00Commented Mar 24, 2017 at 1:41
-
1Here's a basic starting point tech-recipes.com/rx/19839/… and docs.oracle.com/database/121/LNPLS/… I have used such in PostgreSQL but know that similar operations exist in Oracle and MS SQL. I found them difficult to setup but I'm not truly a DBA, someone who is trained to administer an enterprise database properly would be much better at them.Michael Stimson– Michael Stimson2017年03月24日 01:46:48 +00:00Commented Mar 24, 2017 at 1:46
-
2@MichaelMiles-Stimson, I should add "threats" to my list of techniques for quality data.Fezter– Fezter2017年03月29日 02:53:12 +00:00Commented Mar 29, 2017 at 2:53
4 Answers 4
Business rules vs data integrity rules
When you model your database, you specify the business logic in two spaces:
a. The data integrity rules. This includes among others having an integer column so users won't be able to enter strings.
b. The business domain rules. This includes having check/unique/foreign key constraints on columns so it won't be possible to store duplicate values or illegal values.
I think it's a valid choice to use geodatabase domains to set up rules for the business domain, however you should keep in mind that those rules are really just helpers as they do not expose any restrictions on the database table itself. On the other hand, you might like your geodatabase playing a central role in defining the business rules for ArcGIS end users.
Possible solution
Here is what has worked for me best over last years. Say you have a feature class Roads
(stored as a database table) which stores information about the roads and it has at least one column called Type
.
- Create a new database table called
RoadType
with two columns,Id
andDescription
. - Populate this table according to your business domain rules (road can be of major, minor, highway, interstate, motorway, ramp type). So, you will have say 10 rows in this table.
- Create a foreign key connecting the
Roads
feature classType
column with theRoadType
tableId
column. This means that it won't be possible to define for a feature as a road type a value that is not in theId
column. - Now you use this
RoadType
table to create a geodatabase domain (can be done either inarcpy
if you are comfortable with scripting or using GP tools).
Advantages of the solution
You keep your business domain rules in a plain database table available for other applications who can work with SQL and they don't have to work with XML (as when one is trying to get information about geodatabase domains using plain SQL). If your database table will be edited by other applications that are not aware of ArcGIS geodatabase domains (maybe you have a non-spatial batch operation loading data into the
Roads
feature class every night), you can be sure that the data integrity will be enforced.You are still using geodatabase domains for ArcGIS users who are editing data in a rich client such as ArcMap and when editing feature's attributes, they will see the available choices for a field that has a domain associated with it. This is indeed helpful.
When you find out that you should add a new road type, it's all about writing a short SQL snippet that will insert new rows into the
RoadType
table and then usingarcpy
to update the geodatabase domain accordingly.This makes maintaining the geodatabase domains very easy - should you move your database data to another database where you don't have Esri enterprise geodatabase enabled, your data integrity rules will still be enforced. Also, if you will load your database tables into a new enterprise geodatabase, you can always create geodatabase domain using the
RoadType
table since your business rules table (RoadType
) is following your feature class.
QA/QC using ArcGIS
One would also like to perform various QA/QC operations on the business data. Non-spatial data integrity can be enforced fairly easily as described above. However, checking for spatial integrity can be a lot more difficult to perform. Even though one could write SQL using Oracle Spatial or ST_Geometry functions, I'd suggest using ArcGIS tools for this.
You can use multiple tools for QA/QC that can do a lot of checks in an automated fashion. It is important to note that you can schedule your Python modules to run at a certain time using Windows Task Scheduler.
Geodatabase topology (you can also use GP tools such as
Validate Topology
andExport Topology Errors
as part of your Python script).Data Reviewer extension provides a lot of checks. Again, you can easily schedule a task as specified here in the Help page.
Custom Python scripts that where you can use the power of
arcpy
to generate reports on the QA/QC done and edit your data as needed.
-
This is a good approach where your domains are not subtype based. But can it be made to work when a field should be using a different domain for each subtype? This is the case for many of the fields in our geodatabase.Son of a Beach– Son of a Beach2017年04月04日 02:25:29 +00:00Commented Apr 4, 2017 at 2:25
-
Sure, why not. You would either create a table for each subtype or would have 3 fields in the
RoadType
table -Id
,Subtype
,RoadType
,Description
. So you could have44, 'City', 'Gravel', 'Desc'
and45, 'Coutryside', 'Gravel', 'Desc'
for subtypesCity
andCoutryside
. Does this make sense?Alex Tereshenkov– Alex Tereshenkov2017年04月04日 05:51:33 +00:00Commented Apr 4, 2017 at 5:51 -
@AlexTereshenkov Do you use foreign keys in versioned geodatabases? If yes, are they a) versioned, registered with base --or-- b) versioned, NOT registered with base? I ask, because of this question: Can primary keys and foreign keys be used in enterprise geodatabases?User1974– User19742017年04月06日 16:01:25 +00:00Commented Apr 6, 2017 at 16:01
-
1@Wilson, the geodabases I take care of are non-versioned. It's a lot of work to have them versioned and this often makes things over complicated. In most cases the versioning is being misused which brings a lot of pain both for dba people and GIS admins.Alex Tereshenkov– Alex Tereshenkov2017年04月06日 18:41:21 +00:00Commented Apr 6, 2017 at 18:41
-
1Sounds very familiar. Can IT provide any valid arguments why editing should happen in a version? There should be a specific reason why some workflow is being implemented. You can refer to this page to review the scenarios and analyze carefully all the pros and cons listed there - desktop.arcgis.com/en/arcmap/10.3/manage-data/geodatabases/version-scenarios.htm. In terms of "trust" to students and new folks, having a check-out-in replicas desktop.arcgis.com/en/arcmap/10.3/manage-data/geodatabases/… might help, but there are definitely other options.Alex Tereshenkov– Alex Tereshenkov2017年04月07日 05:29:17 +00:00Commented Apr 7, 2017 at 5:29
An alternative is to check for erroneous values after they've been entered, rather than prevent them from being entered in the first place.
It's been a while since I've done this. A rough workflow is: For each table, for each field with a domain, check all values against the associated domain.
This could be done in Python, although it would be painfully slow. I prefer to do it in procedural SQL (PL/SQL is lightning fast). I can check the entire database in less time than it takes for ArcPy to warm up (approx. 5 seconds).
Rough steps:
1 . List of all fields and their domains, for each table using SQL (table_name, field_name, domain_name)
2a. Loop through list of fields, check against lookup tables
2b. Or, instead of a loop, create a function to check values against lookup tables
It is not possible for ArcGIS to prevent bad values being entered by means other than manual data entry. This is a serious headache, and should be addressed by Esri!
Work Around in ArcMap:
However, you can use the "Validate" option in the "Editor" menu to check for features with invalid values. You will need to have the Editing toolbar visible for this.
To do this:
- have an edit session running ( Editor -> Start Editing )
- select the features you want to validate
- choose "Validate Features" from the "Editor" menu
The output of this command is rather vague, but it will at least tell you whether the features' attributes all contain valid values or not (according to their assigned domains). If they do not, then it is up to you to figure out which ones are wrong, but at least it's a start.
Unfortunately, the validate features option only works on feature classes, not non-spatial tables, as the name implies.
-
1There is also Esri's Data Reviewer Extension, but as with all their extensions, it is expensive. Setting up automated reviewer runs is not difficult, but getting it to automatically do anything useful with the results is impossible (confirmed with a support call to Esri). You've still got to check its results manually.Son of a Beach– Son of a Beach2017年03月29日 05:11:03 +00:00Commented Mar 29, 2017 at 5:11
While this doesn't let us restrict values to the domains, it does let us query against the domains for errors, and fix them after-the-fact:
In the Select by Attributes
window:
NUMBER_FIELD NOT IN (
SELECT
CODE
FROM
USER1.A_NUMBERS_DOMAIN_VW)
More info in my question here: Select rows where column values don't match the domain
Explore related questions
See similar questions with these tags.