I have two types of addresses:
Unit 5, 123 Fake Street Drive
123 Fake St Dr, Unit 5
How can I use Python to compare the two addresses by the numbers?
For example:
Unit 5, 123 Fake Street Drive -> [5,123]
123 Fake St Dr, Unit 5 -> [123,5]
TRUE
123 Fake Street Drive -> [123]
123 Fake St Dr, Unit 5 -> [123,5]
FALSE
Unit 5, 155 Fake Street Drive -> [5,155]
123 Fake St Dr, Unit 5 -> [123,5]
FALSE
All I have now is:
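# Current attempt: true if the two addresses share at least one number (set intersection)
if bool(set([int(s) for s in address.split() if s.isdigit()]) & set([int(s) for s in address2.split() if s.isdigit()])):
    print("match")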
I want to find out if one list of numbers is the same as another list of numbers regardless of the order.
2 Answers
You just have to build sets of the extracted numbers and compare them with ==. Set equality ignores order, which is exactly what you need.
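Another problem is that str.split() only splits on whitespace, so punctuation stays attached: "Unit 5, 123" yields the token "5,". That token is not all digits, so isdigit() returns False, the 5 is dropped, and your sets aren't equal.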
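A small sketch of the problem, using the addresses from the question; numbers_by_split is a hypothetical helper that just wraps the comprehension from the original attempt:

```python
# Shows why str.split() + isdigit() misses numbers followed by punctuation.
def numbers_by_split(address):
    # Hypothetical helper: split on whitespace, keep tokens made only of digits
    return [int(s) for s in address.split() if s.isdigit()]

a = "Unit 5, 123 Fake Street Drive"
b = "123 Fake St Dr, Unit 5"

print(a.split())            # ['Unit', '5,', '123', 'Fake', 'Street', 'Drive']
print(numbers_by_split(a))  # [123]  ("5," is not all digits, so 5 is lost)
print(numbers_by_split(b))  # [123, 5]
print(set(numbers_by_split(a)) == set(numbers_by_split(b)))  # False
```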
I suggest using re.findall to extract the numbers, then putting them into sets and comparing. Use the pattern \d+, or \b\d+\b to skip digits embedded in words (for instance the 2 in N2P).
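import re

address = "Unit 5, 123 Fake Street Drive"
address2 = "123 Fake St Dr, Unit 5"
# \b\d+\b matches runs of digits not embedded in a word (the 2 in N2P is skipped)
pattern = r"\b\d+\b"
# Compare the sets of numbers found in each address; order is irrelevant
print(set(re.findall(pattern, address)) == set(re.findall(pattern, address2)))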
This prints True, whereas changing, adding or removing a number in either address makes it print False.
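Wrapped in a function and run against the three cases from the question, the set comparison gives the expected results. The name same_numbers and the N2P address are hypothetical, for illustration only:

```python
import re

# \b\d+\b: standalone runs of digits, not digits inside words like N2P
PATTERN = r"\b\d+\b"

def same_numbers(address, address2):
    """Hypothetical helper: True if both addresses contain the same set of numbers."""
    return set(re.findall(PATTERN, address)) == set(re.findall(PATTERN, address2))

# The three cases from the question
print(same_numbers("Unit 5, 123 Fake Street Drive", "123 Fake St Dr, Unit 5"))  # True
print(same_numbers("123 Fake Street Drive", "123 Fake St Dr, Unit 5"))          # False
print(same_numbers("Unit 5, 155 Fake Street Drive", "123 Fake St Dr, Unit 5"))  # False

# Word boundaries in action: \d+ picks up the 2 in N2P, \b\d+\b does not
print(re.findall(r"\d+", "123 Fake St, N2P"))   # ['123', '2']
print(re.findall(PATTERN, "123 Fake St, N2P"))  # ['123']
```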
As suggested in the comments, the approach above fails when a number is repeated in one string but not in the other: set discards duplicates, so you can get a false positive.
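The false positive can be reproduced with the pair of addresses mentioned in the comments; this is a minimal sketch reusing the same regex:

```python
import re

PATTERN = r"\b\d+\b"

a = "Unit 1, 1 Street X"   # numbers: 1, 1
b = "1 Street Y"           # numbers: 1

print(re.findall(PATTERN, a))  # ['1', '1']
print(re.findall(PATTERN, b))  # ['1']
# set() collapses the duplicate '1', so the two addresses wrongly compare equal
print(set(re.findall(PATTERN, a)) == set(re.findall(PATTERN, b)))  # True
```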
If that's an issue, replacing set with collections.Counter fixes it:
import collections
collections.Counter(re.findall(pattern, address)) == collections.Counter(re.findall(pattern, address2))
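This works because Counter is a dict subclass mapping each number to how many times it occurs, and dictionaries compare equal when they hold the same key/value pairs, regardless of order.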
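A sketch of the multiset-aware check, plus the sorted-list alternative suggested in the comments; both treat repeated numbers as significant. The function names are hypothetical:

```python
import re
from collections import Counter

PATTERN = r"\b\d+\b"

def same_numbers_counted(address, address2):
    """Hypothetical helper: same numbers with the same multiplicities (Counter)."""
    return Counter(re.findall(PATTERN, address)) == Counter(re.findall(PATTERN, address2))

def same_numbers_sorted(address, address2):
    """Hypothetical helper: equivalent check by comparing sorted lists."""
    return sorted(re.findall(PATTERN, address)) == sorted(re.findall(PATTERN, address2))

print(Counter(re.findall(PATTERN, "Unit 1, 1 Street X")))  # Counter({'1': 2})
print(same_numbers_counted("Unit 1, 1 Street X", "1 Street Y"))  # False
print(same_numbers_sorted("Unit 1, 1 Street X", "1 Street Y"))   # False
print(same_numbers_counted("Unit 5, 123 Fake Street Drive", "123 Fake St Dr, Unit 5"))  # True
```

Both versions compare the matched strings, so "05" and "5" count as different numbers; mapping the matches through int first would treat them as equal.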
3 Comments
{5} == {5,5} returns True (it should return False here); you'll need a Counter or a multiset.
Counter fits right in place.
I recommend that you use a sorted list, not a set. A set cannot distinguish between "Unit 1, 1 Street X" and "1 Street Y", but a sorted list can. Compare the sorted lists with ==; that should work.
For Unit 1, 2 Street and 2 A completely different street, unit 1, should the output be False?
If you run "1, 2".split() you'll get "1,", which is not all digits, so isdigit() fails.
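The question raised in the comments can be checked directly. Because only the numbers are compared, these two (illustrative) addresses on different streets come out equal even with Counter; whether that is acceptable depends on your data:

```python
import re
from collections import Counter

PATTERN = r"\b\d+\b"

a = "Unit 1, 2 Street"
b = "2 A completely different street, unit 1"

# Only the numbers are compared; street names are ignored entirely
print(re.findall(PATTERN, a), re.findall(PATTERN, b))  # ['1', '2'] ['2', '1']
print(Counter(re.findall(PATTERN, a)) == Counter(re.findall(PATTERN, b)))  # True
```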