python: comparing two strings


I would like to know if there is a library that will tell me approximately how similar two strings are.

I am not looking for anything specific, but in this case:

a = 'alex is a buff dude'
b = 'a;exx is a buff dud'

we could say that b and a are approximately 90% similar.

Is there a library which can do this?


edited Aug 23, 2010 at 20:40 by viraptor

asked Aug 23, 2010 at 20:29 by Alex Gordon

possible duplicate of Text difference algorithm – tzot, Sep 20, 2010 at 14:09


4 Answers

>>> import difflib
>>> a = 'alex is a buff dude'
>>> b = 'a;exx is a buff dud'
>>> difflib.SequenceMatcher(None, a, b).ratio()
0.8947368421052632
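The difflib documentation defines ratio() as 2.0*M/T, where T is the total number of characters in both strings and M is the number of matched characters. As a rough sketch (the helper name ratio_from_blocks is hypothetical), the same number can be recomputed from get_matching_blocks(), which shows where the 17 matches out of 38 characters come from:

```python
import difflib

def ratio_from_blocks(a: str, b: str) -> float:
    """Recompute SequenceMatcher.ratio() as 2.0 * M / T."""
    matcher = difflib.SequenceMatcher(None, a, b)
    # get_matching_blocks() ends with a dummy (len(a), len(b), 0) block,
    # which adds nothing to the sum of matched characters.
    matches = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    # ratio() returns 1.0 when both sequences are empty
    return 2.0 * matches / total if total else 1.0

a = 'alex is a buff dude'
b = 'a;exx is a buff dud'
print(ratio_from_blocks(a, b))                      # 2 * 17 / 38
print(difflib.SequenceMatcher(None, a, b).ratio())  # same value
```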

edited Aug 23, 2010 at 21:12

answered Aug 23, 2010 at 21:06 by killown

See the Levenshtein distance: http://en.wikipedia.org/wiki/Levenshtein_distance

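There are a few libraries for it on PyPI, but be aware that computing it is expensive, especially for longer strings: the standard algorithm takes time proportional to the product of the two string lengths.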

You may also want to check out Python's difflib: http://docs.python.org/library/difflib.html
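On the cost point: the difflib documentation describes real_quick_ratio() and quick_ratio() as fast upper bounds on ratio(), and difflib.get_close_matches() uses them to discard poor candidates cheaply. A minimal sketch of that filtering idea for comparing one string against many (the function name similar_candidates is hypothetical; the 0.6 default cutoff is an assumption that mirrors get_close_matches' default):

```python
import difflib

def similar_candidates(word, candidates, cutoff=0.6):
    """Return (score, candidate) pairs whose ratio() >= cutoff, best first."""
    matcher = difflib.SequenceMatcher()
    # SequenceMatcher caches information about the second sequence, so set
    # the fixed word once with set_seq2() and vary the first sequence.
    matcher.set_seq2(word)
    results = []
    for cand in candidates:
        matcher.set_seq1(cand)
        # real_quick_ratio() and quick_ratio() are cheap upper bounds on
        # ratio(); if either is below the cutoff, ratio() cannot reach it.
        if matcher.real_quick_ratio() < cutoff or matcher.quick_ratio() < cutoff:
            continue
        score = matcher.ratio()
        if score >= cutoff:
            results.append((score, cand))
    results.sort(reverse=True)
    return results

a = 'alex is a buff dude'
for score, cand in similar_candidates(a, ['a;exx is a buff dud', 'zzzz']):
    print(f'{score:.3f}  {cand}')  # 0.895  a;exx is a buff dud
```

Here 'zzzz' is rejected by real_quick_ratio() alone, so the full ratio() is only computed for the plausible candidate.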

answered Aug 23, 2010 at 20:35 by Radomir Dopieralski

2 Comments

Expensive? difflib is a monster compared to semi-decent Levenshtein implementations. – John Machin, Aug 23, 2010 at 23:26

It wasn't my intention to suggest that difflib is less expensive; it just does a similar, albeit slightly different, thing. – Radomir Dopieralski, Aug 24, 2010 at 08:37

Look up the Levenshtein algorithm for comparing strings. Here's a random implementation found via Google: http://hetland.org/coding/python/levenshtein.py
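The linked file is not reproduced here. As a rough sketch of the same algorithm (not that implementation), this is the standard dynamic-programming Levenshtein distance keeping only two rows, plus one common way to turn it into a similarity score, 1 - distance / max(len(a), len(b)). That normalization and the helper name similarity are assumptions, not something the question prescribes. The nested loops visit every pair of characters, which is where the cost for long strings comes from.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the row as short as possible
    previous = list(range(len(b) + 1))  # distances from '' to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ''
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 for identical strings, lower as
    more edits are needed."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


a = 'alex is a buff dude'
b = 'a;exx is a buff dud'
print(levenshtein(a, b))           # 3: substitute 'l', insert 'x', delete 'e'
print(round(similarity(a, b), 3))  # 16/19, about 0.842
```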

answered Aug 23, 2010 at 20:34 by viraptor

Another way is to use the longest common substring (this is also defined in difflib). Here is an implementation on DaniWeb, together with my LCS implementation.

Here is a simple, length-only version that uses a list as its data structure and returns the similarity as a percentage:

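def longest_common_sequence(a, b):
    """Return the LCS length as a percentage: 200 * LCS / (len(a) + len(b))."""
    n1 = len(a)
    n2 = len(b)
    # previous holds one DP row (one entry per character of b), used as a queue
    previous = [0] * n2
    over = 0
    for ch1 in a:
        left = corner = 0
        for ch2 in b:
            over = previous.pop(0)  # value from the previous row, same column
            if ch1 == ch2:
                this = corner + 1   # extend the diagonal match
            else:
                this = over if over >= left else left
            previous.append(this)
            left, corner = this, over
    return 200.0 * previous.pop() / (n1 + n2)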

Here is my second version, which also returns the common string, using a deque as the data structure (with the example data as a use case):

from collections import deque

a = 'alex is a buff dude'
b = 'a;exx is a buff dud'

def lcs_tuple(a, b):
    """Return (percentage, common subsequence) for strings a and b."""
    n1 = len(a)
    n2 = len(b)
    # each cell is (LCS length, LCS string); the deque holds one DP row
    previous = deque([(0, '')] * n2)
    over = (0, '')
    for i in range(n1):
        left = corner = (0, '')
        for j in range(n2):
            over = previous.popleft()
            if a[i] == b[j]:
                this = corner[0] + 1, corner[1] + a[i]
            else:
                this = max(over, left)
            previous.append(this)
            left, corner = this, over
    return 200.0 * this[0] / (n1 + n2), this[1]

print(lcs_tuple(a, b))
# Output:
# (89.47368421052632, 'aex is a buff dud')

edited Aug 24, 2010 at 7:23

answered Aug 23, 2010 at 21:12 by Tony Veijalainen
