1

I have two types of addresses:

Unit 5, 123 Fake Street Drive
123 Fake St Dr, Unit 5

How can I use Python to compare the two addresses by the numbers they contain?

For example:

Unit 5, 123 Fake Street Drive -> [5,123]
123 Fake St Dr, Unit 5 -> [123,5]
TRUE
123 Fake Street Drive -> [123]
123 Fake St Dr, Unit 5 -> [123,5]
FALSE
Unit 5, 155 Fake Street Drive -> [5,155]
123 Fake St Dr, Unit 5 -> [123,5]
FALSE

All I have now is:

if bool(set([int(s) for s in address.split() if s.isdigit()]) & set([int(s) for s in address2.split() if s.isdigit()])):

I want to find out if one list of numbers is the same as another list of numbers regardless of the order.

asked Mar 6, 2017 at 15:18
3
  • Just compare the sets with ==; that should work. Commented Mar 6, 2017 at 15:19
  • Just to make sure: if the addresses are "Unit 1, 2 Street" and "2 A completely different street, unit 1", should the output be False? Commented Mar 6, 2017 at 15:23
  • Try "1, 2".split(): you'll get "1,", which is not all digits, so the isdigit() check fails. Commented Mar 6, 2017 at 15:23

2 Answers

3

You just have to build sets of the extracted numbers and compare them with ==. Two sets are equal when they contain the same elements, regardless of order.

Another problem here is that str.split() only splits on whitespace, so a token like 5, keeps its trailing comma. isdigit() then returns False for it, the number is dropped, and your sets aren't equal.
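To see the problem concretely, here is a small sketch using the first address from the question (the variable names are only illustrative):

```python
address = "Unit 5, 123 Fake Street Drive"

# split() only splits on whitespace, so the comma stays attached to "5".
tokens = address.split()
print(tokens)   # ['Unit', '5,', '123', 'Fake', 'Street', 'Drive']

# "5,".isdigit() is False, so the unit number is silently dropped.
numbers = [int(s) for s in tokens if s.isdigit()]
print(numbers)  # [123]
```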

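I suggest using re.findall to extract the numbers, putting them into sets and comparing those. The pattern \d+ matches every run of digits, while \b\d+\b matches only standalone numbers, which avoids picking up digits inside words (like N2P, for instance).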

import re

address = "Unit 5, 123 Fake Street Drive"
address2 = "123 Fake St Dr, Unit 5"
pattern = r"\b\d+\b"  # standalone runs of digits only
print(set(re.findall(pattern, address)) == set(re.findall(pattern, address2)))

This yields True; if I change, add or remove a number in one of the addresses above, I get False.
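The difference between the two patterns can be checked with a quick sketch. The sample string is made up for illustration: it is the question's address with the N2P token appended.

```python
import re

# Hypothetical sample: the question's address plus the "N2P" token.
text = "123 Fake St Dr, Unit 5, N2P"

# Plain \d+ also picks up the digit embedded in "N2P".
all_digit_runs = re.findall(r"\d+", text)
print(all_digit_runs)  # ['123', '5', '2']

# \b\d+\b requires a word boundary on both sides, so the "2" is skipped.
standalone = re.findall(r"\b\d+\b", text)
print(standalone)      # ['123', '5']
```

Note that \b treats letters, digits and underscores as word characters, so a number glued to any of those is skipped as well.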

As pointed out in the comments, the approach above fails if a number is repeated in one string but not in the other: we could get a false positive, since a set discards duplicates.

If that's an issue, replacing set with collections.Counter fixes it:

import collections
collections.Counter(re.findall(pattern, address)) == collections.Counter(re.findall(pattern, address2))

This works too: Counter is a dict subclass, so two Counters compare equal exactly when they hold the same numbers with the same counts.
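Putting the pieces together, one possible way to wrap this up is a small helper. The name same_numbers is made up for this sketch, and it compares the numbers as strings, so "05" and "5" would count as different:

```python
import re
from collections import Counter

NUMBER = re.compile(r"\b\d+\b")  # standalone runs of digits

def same_numbers(a, b):
    """True if both strings contain the same standalone numbers,
    ignoring order but respecting how often each number occurs."""
    return Counter(NUMBER.findall(a)) == Counter(NUMBER.findall(b))

# The three cases from the question:
print(same_numbers("Unit 5, 123 Fake Street Drive", "123 Fake St Dr, Unit 5"))  # True
print(same_numbers("123 Fake Street Drive", "123 Fake St Dr, Unit 5"))          # False
print(same_numbers("Unit 5, 155 Fake Street Drive", "123 Fake St Dr, Unit 5"))  # False
```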

answered Mar 6, 2017 at 15:23

3 Comments

{5} == {5,5} returns True (it should return False here); you'll need a Counter or multiset
You're right! The OP's example doesn't cover that. Good point. Edited with Counter, which fits right in place.
One more question, can I distinguish numbers that are not inside a word? For example, I don't want to capture numbers in postal codes such as "N2P"
0

I recommend that you use a sorted list, not a set. A set cannot distinguish between "Unit 1, 1 street X" and "1 Street Y", but a sorted list can.
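A minimal sketch of the sorted-list idea follows. The helper name sorted_numbers is made up here, and the \b\d+\b pattern is borrowed from the other answer rather than being part of this one:

```python
import re

def sorted_numbers(address):
    """Extract the standalone numbers as ints, in sorted order."""
    return sorted(int(n) for n in re.findall(r"\b\d+\b", address))

a = "Unit 1, 1 street X"
b = "1 Street Y"

print(sorted_numbers(a))  # [1, 1]
print(sorted_numbers(b))  # [1]

# Sorted lists keep duplicates, so the two addresses differ...
print(sorted_numbers(a) == sorted_numbers(b))            # False
# ...whereas sets collapse the repeated 1 and wrongly report a match.
print(set(sorted_numbers(a)) == set(sorted_numbers(b)))  # True
```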

answered Mar 6, 2017 at 15:37
