2

I am trying to query mailing address records that match physical address records, but the two fields sometimes have data in a different format. For example, the property's mailing address may be shown as '123 NW Elm Street', while the physical address may be shown as '123 Elm St'. Even if I can just query whether the address contains the same street name that would help me if I knew if 'Elm' was anywhere in the address. There is a field with just street name, with no number or prefix or suffix.

I am trying to do a definition query to select nearly matching records with wildcards with something like this:

"StreetAddress" LIKE "%StreetName%"

I have tried numerous variations with no luck, including '%'+"STreetName"+'%' etc.

Is there a "FIELD_A" INCLUDES "FIELD_B" query or a "FIELD_A" CONTAINS "FIELD_B" command?

PolyGeo
65.5k29 gold badges115 silver badges349 bronze badges
asked Dec 14, 2016 at 23:41
2
  • Welcome to GIS SE! As a new user be sure to take the Tour to learn about our focussed Q&A format. What GIS software are you using? Commented Dec 14, 2016 at 23:43
  • Please also edit the question to specify the data source format, since the functions available are defined by the software in use and the capabilities of the data store. Commented Dec 15, 2016 at 0:28

3 Answers 3

2

You need a nested query.

StreetAddress IN (SELECT StreetName FROM <Whatever you layer is named>)

e.g. StreetAddress IN (SELECT StreetName FROM House_Addresses) - House_Addresses is the layer name in this example.

Also, if you have not heard of Fuzzy tables/relationships you should look into them. They may provide a solution.

Here's the tool for excell: https://www.microsoft.com/en-au/download/details.aspx?id=15011

Python: https://pypi.python.org/pypi/fuzzywuzzy

Video: https://www.youtube.com/watch?v=3v-qxcjZbyo

answered Dec 15, 2016 at 0:01
1

Assuming you're using ArcGIS, you can make the comparison in field calculator (Python parser), and select on that:

1 if !StreetName! in !StreetAddress! else 0
answered Dec 15, 2016 at 0:38
1

In addition to what @phloem suggests, you can use python's difflib module with get_close_matches or SequenceMatcher.ratio methods. The former gives you the best matching entries within a set/iterable, say whole of a column, while the latter gives you a score by comparing inputted pair, i.e., values in two columns of a row. For example:

difflib.get_close_matches('123 NW Elm Street', ['123 Elm St'],1,0.7)[0] will give you '123 Elm St', whereas

int(difflib.SequenceMatcher(None, '123 NW Elm Street', '123 Elm St').ratio()*100) will compare source with target and yield a matching score, 74.

answered Dec 15, 2016 at 5:08

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.