Commit ae9f4ab

Merge pull request avinashkranjan#364 from vaishnavijha/master
script matching and regex matching from urls
2 parents 0feb2af + b3d9c52 commit ae9f4ab


4 files changed (+59, -0 lines)

File renamed without changes.

string_matching_scripts/Capture.PNG

134 KB
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
# String matching can be useful in a variety of situations, for example, joining two tables on an athlete's name when it is spelled or punctuated differently in each table.

# Installing FuzzyWuzzy: pip install fuzzywuzzy[speedup]

# Import
from fuzzywuzzy import fuzz

Str_A = 'FuzzyWuzzy is a lifesaver!'
Str_B = 'fuzzy wuzzy is a LIFE SAVER.'
ratio = fuzz.ratio(Str_A.lower(), Str_B.lower())
print('Similarity score: {}'.format(ratio))
# We used the ratio() function above to calculate an edit-distance (Levenshtein-style) similarity ratio between the two strings (sequences). The score here is 93: once both strings are lowercased, Str_B is 93% similar to Str_A.

# Partial Ratio
# FuzzyWuzzy also has more powerful functions for matching strings in more complex situations. The partial_ratio() function performs substring matching: it takes the shorter string, compares it with substrings of the longer string that have the same length, and keeps the best score.
Str_A = 'Chicago, Illinois'
Str_B = 'Chicago'
ratio = fuzz.partial_ratio(Str_A.lower(), Str_B.lower())
print('Similarity score: {}'.format(ratio))
# Using partial_ratio() above, we get a similarity score of 100. For "Chicago" and "Chicago, Illinois" this is helpful, since both strings refer to the same city. The function is also useful when matching names: if one string is a person's first and middle name and the string you are matching against is that person's first, middle, and last name, partial_ratio() returns 100 because the first and middle names are identical.

# Token Sort Ratio
# FuzzyWuzzy also has token functions that tokenize the strings, convert capitals to lowercase, and remove punctuation. The token_sort_ratio() function sorts the tokens alphabetically, joins them back into a single string, and then calculates fuzz.ratio(). This comes in handy when the strings being compared are spelled the same but their words are in a different order.
Str_A = 'Gunner William Kline'
Str_B = 'Kline, Gunner William'
ratio = fuzz.token_sort_ratio(Str_A, Str_B)
print('Similarity score: {}'.format(ratio))

# Token Set Ratio
# The token_set_ratio() function is similar to token_sort_ratio() above, except that it separates the tokens the two strings have in common from the remaining tokens before calculating fuzz.ratio() on the resulting strings. It is most helpful when the strings differ significantly in length.
Str_A = 'The 3000 meter steeplechase winner, Soufiane El Bakkali'
Str_B = 'Soufiane El Bakkali'
ratio = fuzz.token_set_ratio(Str_A, Str_B)
print('Similarity score: {}'.format(ratio))
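To see roughly what the four scores above measure without installing anything, the scorers can be approximated with the standard library alone. This is a sketch under stated assumptions, not FuzzyWuzzy's code: every function name below is a hypothetical stand-in, the base score uses `difflib.SequenceMatcher`, `partial_ratio` is simplified to a brute-force sliding window, and token preprocessing is reduced to a regex that keeps only `a-z` and `0-9`.

```python
import re
from difflib import SequenceMatcher
from typing import List


def ratio(a: str, b: str) -> int:
    # 2 * (matched characters) / (total characters), scaled to 0-100 and rounded.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def partial_ratio(a: str, b: str) -> int:
    # Brute force: score the shorter string against every same-length window
    # of the longer string and keep the best result.
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    if not short:
        return 0
    width = len(short)
    return max(ratio(short, long_[i:i + width]) for i in range(len(long_) - width + 1))


def _tokens(s: str) -> List[str]:
    # Lowercase, replace each run of characters other than a-z/0-9 with a space, then split.
    return re.sub(r"[^0-9a-z]+", " ", s.lower()).split()


def token_sort_ratio(a: str, b: str) -> int:
    # Sort the tokens alphabetically, rejoin them, then compare with ratio().
    return ratio(" ".join(sorted(_tokens(a))), " ".join(sorted(_tokens(b))))


def token_set_ratio(a: str, b: str) -> int:
    ta, tb = set(_tokens(a)), set(_tokens(b))
    common = " ".join(sorted(ta & tb))
    rest_a = " ".join(sorted(ta - tb))
    rest_b = " ".join(sorted(tb - ta))
    # Glue the shared tokens to each side's leftovers; strip() removes the
    # stray space when one of the parts is empty.
    with_a = f"{common} {rest_a}".strip()
    with_b = f"{common} {rest_b}".strip()
    return max(ratio(common, with_a), ratio(common, with_b), ratio(with_a, with_b))
```

On the script's four inputs these helpers return 93, 100, 100 and 100. On other inputs they can drift from FuzzyWuzzy's own numbers, particularly for `partial_ratio` and for non-ASCII text, so treat them as an illustration of the idea rather than a replacement for the library.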
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
String matching with the help of FuzzyWuzzy

## Short description of package/script

Fuzzy string matching is the process of finding strings that approximately match a given pattern. FuzzyWuzzy uses Levenshtein distance to calculate the differences between sequences.
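The Levenshtein distance mentioned above counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A minimal dynamic-programming sketch follows; the function name `levenshtein` is our own and is not part of fuzzywuzzy or python-Levenshtein. Note that FuzzyWuzzy reports a normalized 0-100 similarity score rather than this raw count.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    # prev[j] holds the distance between the prefix of a processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute (or keep) ch_a
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(levenshtein("Chicago", "chicago"))  # 1 (C -> c)
```

Only two rows of the table are kept at a time, so memory grows with the length of `b` rather than with the product of both lengths.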

## Setup instructions

Install fuzzywuzzy along with python-Levenshtein, which the `speedup` extra pulls in:

pip install fuzzywuzzy[speedup]

## Detailed explanation of script, if needed

Of course, a big problem across most corners of the internet is labeling. One of our most consistently frustrating issues is figuring out whether two ticket listings are for the same real-life event (that is, without enlisting the help of our army of interns). To pick an example completely at random, Bollywood has a show running in India called "Zarkana". When we scour the web for tickets for sale, those tickets are mostly identified by a title, date, time, and venue.
For example, when we search the internet for a given show or movie, we get a number of results with different names that all represent the same show. Our brains can easily tell that they are all the same, but we want to do this programmatically, which is where the FuzzyWuzzy library for Python comes in: it calculates how similar two or more strings are.
There are far too many events (over 60,000) to just throw ourselves at the problem. So we want to do this programmatically, but we also want the results to pass the "human brain" test and make sense to normal users.
In short, we want an efficient string matching script.
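The listing problem described above boils down to "score an incoming listing against every known event title and keep the best one if it clears a cutoff". FuzzyWuzzy's `process.extractOne(query, choices, score_cutoff=...)` exists for this kind of lookup; the stdlib-only sketch below just illustrates the idea. The function names (`normalize`, `score`, `best_match`), the event catalogue and the cutoff of 80 are all hypothetical choices made for this example, and `difflib` stands in for FuzzyWuzzy's scorer.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def normalize(title: str) -> str:
    # Lowercase, replace runs of characters other than a-z/0-9 with a space,
    # and sort the words so that word order does not affect the score.
    return " ".join(sorted(re.sub(r"[^0-9a-z]+", " ", title.lower()).split()))


def score(a: str, b: str) -> int:
    # 0-100 similarity of the normalized titles.
    return round(100 * SequenceMatcher(None, normalize(a), normalize(b)).ratio())


def best_match(listing: str, events: List[str], threshold: int = 80) -> Optional[Tuple[str, int]]:
    """Return (event, score) for the closest known event, or None if no event reaches the threshold."""
    if not events:
        return None
    event = max(events, key=lambda e: score(listing, e))
    best = score(listing, event)
    return (event, best) if best >= threshold else None


if __name__ == "__main__":
    known_events = ["Zarkana", "The Lion King"]  # hypothetical catalogue of events
    for listing in ["ZARKANA!", "Lion King, The", "Hamilton"]:
        print(listing, "->", best_match(listing, known_events))
```

Here "ZARKANA!" maps to `('Zarkana', 100)`, "Lion King, The" maps to `('The Lion King', 100)` because sorting the words removes the reordering, and "Hamilton" returns `None` because it scores well below the cutoff against both titles. Scoring every title for every listing is fine for a demo, but with tens of thousands of events a real pipeline would likely need to narrow the candidates first (for example by date and venue).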

## Output

The output is shown in this file:

string_matching_scripts/Capture.PNG

## Author(s)

Vaishnavi Jha
