I'll use an example scenario for my question. Suppose I have a list of URLs:
url_list=["https:www.example.com/pag31/go","https:www.example.com/pag12/go","https:www.example.com/pag0/go"]
I want to replace the substring between ".com/" and "/go".
For example, the new list should look like:
['https:www.example.com/home/go','https:www.example.com/home/go','https:www.example.com/home/go']
I have tried slicing and replacing based on index but couldn't get the required result for the whole list.
Any help is really appreciated. Thanks in advance.
2 Answers
You can use re.sub() and a list comprehension to apply the replacement to every element of your list.
import re
url_list=["https:www.google.com/pag31/go","https:www.facebook.com/pag12/go","http:www.bing.com/pag0/go"]
pattern = r'(?<=com\/).*(?=\/go)'
result = [re.sub(pattern, 'home', url) for url in url_list]
This matches any URL that has text between com/ and /go and replaces only that text. Because the pattern does not mention the scheme or the host, it works for any .com site, whether the URL starts with http or https.
Output:
['https:www.google.com/home/go', 'https:www.facebook.com/home/go', 'http:www.bing.com/home/go']
Regex Explanation
The pattern r'(?<=com\/).*(?=\/go)' looks for the following:
(?<=com\/): Positive lookbehind that asserts com/ immediately precedes the match
.*: Matches any character (except a newline) zero or more times, as many as possible (greedy)
(?=\/go): Positive lookahead that asserts /go immediately follows the match
This lets us match only the text between the two assertions, without including com/ or /go in the match itself.
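To see what the lookarounds actually match, here is a small sketch that runs re.search on one URL from the list above. The URL with two /go segments is a made-up input to show that .* is greedy, and the non-greedy .*? variant is only an assumption about what you might want in that case, not part of the answer's pattern.

```python
import re

pattern = r'(?<=com/).*(?=/go)'

# Only the text between "com/" and "/go" is part of the match;
# the lookbehind and lookahead themselves consume no characters.
match = re.search(pattern, "https:www.google.com/pag31/go")
matched_text = match.group()

# Hypothetical URL with two "/go" segments: greedy .* runs to the LAST "/go".
tricky = "https:www.google.com/pag31/go/extra/go"
greedy = re.sub(pattern, 'home', tricky)

# Non-greedy .*? stops at the FIRST "/go" instead.
lazy = re.sub(r'(?<=com/).*?(?=/go)', 'home', tricky)

print(matched_text)  # pag31
print(greedy)        # https:www.google.com/home/go
print(lazy)          # https:www.google.com/home/go/extra/go
```

For the URLs in the question both variants give the same result, since each URL contains /go only once.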
You can try using Python's regular expressions.
import re
re_url = "https:www.example.com/.*/go"
url = "https:www.example.com/home/go"
url_list_new = [re.sub(re_url, url, x) for x in url_list]
url_list_new
Output:
['https:www.example.com/home/go',
'https:www.example.com/home/go',
'https:www.example.com/home/go']
2 Comments
This only works for the example the poster has listed. If any URL other than example.com is in the list, or even an http URL, it will not match.
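The limitation described in this comment is easy to reproduce. The sketch below applies the hard-coded pattern to a mixed list (the google and bing URLs are borrowed from the other answer) and then shows one possible generalization using capture groups. The group-based pattern is only an illustration, not something either answer proposed.

```python
import re

url_list = [
    "https:www.example.com/pag31/go",
    "https:www.google.com/pag12/go",
    "http:www.bing.com/pag0/go",
]

# Hard-coded pattern from the answer: only the example.com URL is rewritten,
# the other two come back unchanged because the pattern never matches them.
re_url = "https:www.example.com/.*/go"
url = "https:www.example.com/home/go"
hard_coded = [re.sub(re_url, url, x) for x in url_list]

# Possible generalization: capture ".com/" (group 1) and the trailing "/go"
# (group 2), then put both back around the new segment. Everything before
# ".com/" is outside the match, so the scheme and host are left untouched.
general = [re.sub(r'(\.com/).*(/go)', r'\1home\2', x) for x in url_list]

print(hard_coded)
print(general)
```

With the capture-group version all three URLs end in /home/go, whatever the scheme or host.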