I am trying to query for mailing address records that match physical address records, but the two fields sometimes hold the data in different formats. For example, a property's mailing address may be shown as '123 NW Elm Street', while the physical address is shown as '123 Elm St'. Even a query on whether the address contains the same street name would help, for instance knowing whether 'Elm' appears anywhere in the address. There is a field holding just the street name, with no number, prefix, or suffix.
I am trying to write a definition query that selects nearly matching records using wildcards, something like this:
"StreetAddress" LIKE "%StreetName%"
I have tried numerous variations with no luck, including '%'+"StreetName"+'%', etc.
Is there a "FIELD_A" INCLUDES "FIELD_B" query or a "FIELD_A" CONTAINS "FIELD_B" command?
- Welcome to GIS SE! As a new user, be sure to take the Tour to learn about our focussed Q&A format. What GIS software are you using? – PolyGeo, Dec 14, 2016 at 23:43
- Please also edit the question to specify the data source format, since the functions available are defined by the software in use and the capabilities of the data store. – Vince, Dec 15, 2016 at 0:28
3 Answers
You need a nested query.
StreetAddress IN (SELECT StreetName FROM <whatever your layer is named>)
e.g. StreetAddress IN (SELECT StreetName FROM House_Addresses)
- House_Addresses is the layer name in this example.
Also, if you have not heard of fuzzy tables/relationships, you should look into them; they may provide a solution.
Here's the tool for Excel: https://www.microsoft.com/en-au/download/details.aspx?id=15011
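Note that an IN subquery matches only when the whole StreetAddress value equals some StreetName value. For a field-to-field "contains" test, some data stores let you build the LIKE pattern from the other field by concatenation. The following is a minimal sketch using Python's built-in sqlite3 module, only to illustrate the pattern. The sample rows are made up, and the `||` concatenation operator is SQLite's; whether your data source accepts it (or `+`, or `CONCAT()`) in a definition query is an assumption you would need to check.

```python
import sqlite3

def rows_containing_street_name(rows):
    """Return the addresses whose text contains that row's own street name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE House_Addresses (StreetAddress TEXT, StreetName TEXT)"
    )
    conn.executemany("INSERT INTO House_Addresses VALUES (?, ?)", rows)
    # Build the wildcard pattern from the StreetName field of the same row.
    # '||' is SQLite's concatenation operator; other stores may differ.
    cur = conn.execute(
        "SELECT StreetAddress FROM House_Addresses "
        "WHERE StreetAddress LIKE '%' || StreetName || '%' "
        "ORDER BY rowid"
    )
    result = [r[0] for r in cur.fetchall()]
    conn.close()
    return result

# Hypothetical sample rows: (StreetAddress, StreetName)
sample = [
    ("123 NW Elm Street", "Elm"),
    ("45 Oak Ave", "Pine"),
    ("9 elm st", "Elm"),
]
matches = rows_containing_street_name(sample)
print(matches)
```

In SQLite, LIKE is case-insensitive for ASCII letters by default, which is why '9 elm st' is also returned here; other databases may behave differently.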
Assuming you're using ArcGIS, you can make the comparison in field calculator (Python parser), and select on that:
1 if !StreetName! in !StreetAddress! else 0
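The one-line expression above is case-sensitive and will raise an error on NULL values. A slightly more defensive version could be written as a function, sketched below. The function name `contains_street_name` is hypothetical, and using it from the Field Calculator's code block with an expression such as `contains_street_name(!StreetAddress!, !StreetName!)` assumes the field names from the question.

```python
def contains_street_name(street_address, street_name):
    """Return 1 if street_name occurs in street_address, ignoring case; else 0."""
    # Treat NULL (None) values as empty strings so the test does not raise.
    name = (street_name or "").strip().lower()
    address = (street_address or "").lower()
    # An empty name would be "in" every string, so count it as no match.
    if not name or not address:
        return 0
    return 1 if name in address else 0

print(contains_street_name("123 NW Elm Street", "Elm"))
```

This is still a plain substring test, so 'Elm' would also match an address such as '10 Helmsley Rd'.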
In addition to what @phloem suggests, you can use Python's difflib module with the get_close_matches function or the SequenceMatcher.ratio method. The former gives you the best-matching entries within a set/iterable, say a whole column, while the latter gives you a score by comparing a pair of inputs, i.e., the values in two columns of a row. For example:
difflib.get_close_matches('123 NW Elm Street', ['123 Elm St'],1,0.7)[0]
will give you '123 Elm St', whereas
int(difflib.SequenceMatcher(None, '123 NW Elm Street', '123 Elm St').ratio()*100)
will compare source with target and yield a matching score, 74.
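The two calls can be wrapped as small helpers, for example to score each row's pair of address fields and then select rows above a threshold. This is only a sketch: the helper names `match_score` and `best_match`, the lowercasing step, and the 0.7 default cutoff are assumptions chosen for illustration, not part of difflib.

```python
import difflib

def match_score(source, target):
    """Return a 0-100 similarity score for two strings, ignoring case."""
    a = (source or "").strip().lower()
    b = (target or "").strip().lower()
    # Two empty strings would score 100, so treat missing values as no match.
    if not a or not b:
        return 0
    return int(difflib.SequenceMatcher(None, a, b).ratio() * 100)

def best_match(source, candidates, cutoff=0.7):
    """Return the closest candidate scoring at least cutoff, or None."""
    found = difflib.get_close_matches(source, candidates, n=1, cutoff=cutoff)
    return found[0] if found else None

score = match_score("123 NW Elm Street", "123 Elm St")
closest = best_match("123 NW Elm Street", ["123 Elm St", "77 Oak Ave"])
print(score, closest)
```

A suitable threshold depends on the data, so it is worth inspecting a sample of scored pairs before relying on any single cutoff.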