172

Does anyone know of a regular expression I could use to find URLs within a string? I've found a lot of regular expressions on Google for determining if an entire string is a URL but I need to be able to search an entire string for URLs. For example, I would like to be able to find www.google.com and http://yahoo.com in the following string:

Hello www.google.com World http://yahoo.com

I am not looking for specific URLs in the string. I am looking for ALL of the URLs in the string which is why I need a regular expression.

AeroX
3,4732 gold badges28 silver badges40 bronze badges
asked May 17, 2011 at 22:51
2
  • For PHP: preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $string, $match); from stackoverflow.com/q/910912/1066234 Commented Aug 6, 2021 at 5:03
  • Your example misses the case where the protocol is not set, e.g. //www.google.fr Commented Jan 5, 2024 at 10:50

35 Answers

311

This is the one I use

(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])

Works for me, should work for you too.
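For anyone trying this from Python, here is a minimal sketch of running the pattern above over the question's sample string. The names URL_RE and find_urls are illustrative and not part of the answer. Because the pattern contains capturing groups, re.findall would return tuples of groups, so the sketch uses finditer and takes the whole match. The pattern requires a scheme, so a bare www.google.com is not found.

```python
import re

# The answer's pattern, unchanged; URL_RE and find_urls are illustrative names.
URL_RE = re.compile(
    r"(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))"
    r"([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])"
)


def find_urls(text):
    """Return every full match of URL_RE in text, in order of appearance."""
    # group(0) is the whole match; findall() would return group tuples instead.
    return [m.group(0) for m in URL_RE.finditer(text)]


print(find_urls("Hello www.google.com World http://yahoo.com"))
```

Adding www\. as an alternative to the scheme group is one way to pick up scheme-less hosts, at the cost of more false positives.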

Adam
6,23139 gold badges130 silver badges224 bronze badges
answered May 18, 2011 at 8:37
15
  • 12
    Don't forget to escape the forward slashes. Commented Jul 8, 2017 at 6:53
  • 4
    It's 2017, and unicode domain names are all over the place. \w may not match international symbols (depends on regex engine), the range is needed instead: a-zA-Z0-9\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF. Commented Aug 29, 2017 at 13:34
  • 6
    This is fine for general purposes, but there are many cases it doesn't catch. It requires that links be prefixed with a protocol. If you choose to ignore protocols, the endings of email addresses are accepted, as is the case with [email protected]. Commented Sep 7, 2017 at 8:09
  • 8
    Shouldn't [\w_-] be [\w-]? \w already matches _, per the Mozilla docs. Commented Nov 4, 2017 at 7:19
  • 12
    Upvoted, but this answer does not do what the question asks for URLs like www.yahoo.com: """(http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?""".r.findAllIn("www.google.com").toList. It also lacks an explanation. Commented Nov 11, 2017 at 23:58
69

Guess no regex is perfect for this use. I found a pretty solid one here

(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])

Some differences / advantages compared to the other ones posted here:

  • It does not match email addresses
  • It does match localhost:12345
  • It won't detect something like moo.com without http or www

See here for examples
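A rough Python sketch of the behaviour described in the list above, under the assumption that the pattern is meant to run case-insensitively (it only lists A-Z). URL_RE and the sample strings are illustrative, not taken from the linked examples.

```python
import re

# The answer's pattern, unchanged. It only lists A-Z, so it needs the
# case-insensitive flag to match lowercase URLs.
URL_RE = re.compile(
    r"(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)"
    r"(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*"
    r"(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])",
    re.IGNORECASE,
)

samples = [
    "Hello www.google.com World http://yahoo.com",  # both URLs are found
    "mail me at user@example.com",                  # email address: no match
    "see http://localhost:12345.",                  # trailing period is left out
    "moo.com",                                      # no scheme or www: no match
]

for sample in samples:
    # No capturing groups, so findall() returns the full matches.
    print(URL_RE.findall(sample))
```

The last alternative excludes characters such as . and , so sentence punctuation right after a URL is not swallowed.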

user30745941
answered Mar 26, 2015 at 21:08
3
  • 8
    It matches www.e, which is not a valid URL. Commented Dec 20, 2016 at 22:46
  • 3
    The g option isn't valid in all regular expression implementations (e.g. Ruby's built-in implementation). Commented Jan 17, 2020 at 13:23
  • Your regex misses the case where the protocol is not set, e.g. //www.google.fr Commented Jan 5, 2024 at 10:49
48
import re

text = """The link of this question: https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string
Also there are some urls: www.google.com, facebook.com, http://test.com/method?param=wasd, http://test.com/method?param=wasd&params2=kjhdkjshd
The code below catches all urls in text and returns urls in list."""
# Raw string so the backslashes reach the regex engine unchanged
urls = re.findall(r'(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-&?=%.]+', text)
print(urls)

Output:

[
 'https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string', 
 'www.google.com', 
 'facebook.com',
 'http://test.com/method?param=wasd',
 'http://test.com/method?param=wasd&params2=kjhdkjshd'
]
answered Feb 13, 2018 at 14:56
6
  • Kotlin val urlRegex = "(?:(?:https?|ftp):\\/\\/)?[\\w/\\-?=%.]+\\.[\\w/\\-?=%.]+" Commented Feb 27, 2019 at 6:32
  • 2
    Misses & parameters in the url. e.g. http://test.com/method?param=wasd&param2=wasd2 misses param2 Commented May 18, 2019 at 21:38
  • 1
    also lacks support for URLs with # Commented Dec 22, 2020 at 19:53
  • @TrophyGeek I think you just copied the regex from the first comment, and Akshay forgot to include the &. The right version would be: val urlRegex = "(?:(?:https?|ftp):\\/\\/)?[\\w/\\-?=%.]+\\.[\\w/\\-&?=%.]+" Commented Jan 21, 2022 at 17:16
  • 1
    This also thinks hello... is a URL Commented Mar 23, 2022 at 8:12
19

Wrote one up myself:

let regex = /([\w+]+\:\/\/)?([\w\d-]+\.)*[\w-]+[\.\:]\w+([\/\?\=\&\#\.]?[\w-]+)*\/?/gm

It works on ALL of the following domains:

https://www.facebook.com
https://app-1.number123.com
http://facebook.com
ftp://facebook.com
http://localhost:3000
localhost:3000/
unitedkingdomurl.co.uk
this.is.a.url.com/its/still=going?wow
shop.facebook.org
app.number123.com
app1.number123.com
app-1.numbEr123.com
app.dashes-dash.com
www.facebook.com
facebook.com
fb.com/hello_123
fb.com/hel-lo
fb.com/hello/goodbye
fb.com/hello/goodbye?okay
fb.com/hello/goodbye?okay=alright
Hello www.google.com World http://yahoo.com
https://www.google.com.tr/admin/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
https://google.com.tr/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
http://google.com/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
ftp://google.com/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
www.google.com.tr/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
www.google.com/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
drive.google.com/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
https://www.example.pl
http://www.example.com
www.example.pl
example.com
http://blog.example.com
http://www.example.com/product
http://www.example.com/products?id=1&page=2
http://www.example.com#up
http://255.255.255.255
255.255.255.255
shop.facebook.org/derf.html

You can see how it performs here on regex101 and adjust as needed
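As a quick way to check this outside regex101, here is a sketch of a Python port of the JavaScript literal above. The port itself (URL_RE, find_urls) is an assumption of how one might translate it: the g flag becomes finditer, and the m flag is dropped because the pattern has no ^ or $. It finds both URLs from the question, and it also reproduces the false positives that commenters mention, such as 09:00 and a.r.

```python
import re

# Python port of the answer's JavaScript regex (illustrative translation).
URL_RE = re.compile(
    r"([\w+]+\:\/\/)?([\w\d-]+\.)*[\w-]+[\.\:]\w+"
    r"([\/\?\=\&\#\.]?[\w-]+)*\/?"
)


def find_urls(text):
    """Return the full text of every match, in order of appearance."""
    return [m.group(0) for m in URL_RE.finditer(text)]


print(find_urls("Hello www.google.com World http://yahoo.com"))
print(find_urls("Meet at 09:00"))  # a time of day is matched as well
print(find_urls("a.r"))            # so is any word.word pair
```

If those false positives matter, requiring either a scheme or a known TLD is one common way to tighten it.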

answered Jul 21, 2020 at 20:43
6
  • Your regex missed this when I tested it. It only caught part of the URL: shop.facebook.org/derf.html Commented Feb 23, 2021 at 7:50
  • 1
    @DavidRector Thanks! You are absolutely correct. I have updated the regex string and regex101 url based on your feedback. Added a \. at the end of the second last pair of square brackets [ ] Commented Feb 24, 2021 at 2:39
  • 7
    This also matches any string of the form alphanum_char.alphanum_char, for example, a.r, b.4, 7.e, etc. These aren't valid URLs. Commented Jun 17, 2021 at 23:24
  • 2
    Unfortunately this also matches times - 09:00 Commented Jun 21, 2021 at 15:26
  • Your regex misses the case where the protocol is not specified, e.g. //www.leboncoin.fr Commented Jan 5, 2024 at 10:43
12

None of the solutions provided here solved the problems/use-cases I had.

What I have provided here, is the best I have found/made so far. I will update it when I find new edge-cases that it doesn't handle.

\b
 #Word cannot begin with special characters
 (?<![@.,%&#-])
 #Protocols are optional, but take them with us if they are present
 (?<protocol>\w{2,10}:\/\/)?
 #Domains have to be of a length of 1 chars or greater
 ((?:\w|\&\#\d{1,5};)[.-]?)+
 #The domain ending has to be between 2 to 15 characters
 (\.([a-z]{2,15})
 #If no domain ending we want a port, only if a protocol is specified
 |(?(protocol)(?:\:\d{1,6})|(?!)))
\b
#Word cannot end with @ (made to catch emails)
(?![@])
#We accept any number of slugs, given we have a char after the slash
(\/)?
#If we have endings like ?=fds include the ending
(?:([\w\d\?\-=#:%@&.;])+(?:\/(?:([\w\d\?\-=#:%@&;.])+))*)?
#The last char cannot be one of these symbols .,?!,- exclude these
(?<![.,?!-])
answered Dec 20, 2016 at 12:21
2
  • 1
    Is there any way to make this javascript friendly? As named capturing groups are not fully functional there, so the protocol value check does not validate. Commented Feb 7, 2020 at 7:26
  • 2
    @einord, I know this is way late, but you can just remove the named portion of the capturing group and it works fine in JS. /\b(?<![@.,%&#-])(\w{2,10}:\/\/)?((?:\w|&#\d{1,5};)[.-]?)+(\.([a-z]{2,15})|((?::\d{1,6})|(?!)))\b(?![@])(\/)?(?:([\w\d?\-=#:%@&.;])+(?:\/(?:([\w\d?\-=#:%@&;.])+))*)?(?<![.,?!-])/g Commented Oct 24, 2024 at 18:04
9

I think this regex (regular expression) pattern handles precisely what you want:

(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?

Here is an example snippet that extracts URLs:

// The Regular Expression filter
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
// The Text you want to filter for urls
$text = "The text you want https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string to filter goes here.";
// Check if there is a url in the text
// preg_match_all() stores the matches in its third argument
preg_match_all($reg_exUrl, $text, $matches);
var_dump($matches);
user22522657
answered Dec 10, 2017 at 5:43
1
  • Your regex misses the case where the protocol is not set, e.g. //www.google.fr Commented Jan 5, 2024 at 10:48
8

If you have to be strict on selecting links, I would go for:

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»""‘’]))

For more information, read this:

An Improved Liberal, Accurate Regex Pattern for Matching URLs

answered Nov 12, 2017 at 12:29
1
7

None of the above answers match Unicode characters in a URL, for example: http://google.com?query=đức+filan+đã+search

For the solution, this one should work:

(ftp:\/\/|www\.|https?:\/\/){1}[a-zA-Z0-9u00a1-\uffff0-]{2,}\.[a-zA-Z0-9u00a1-\uffff0-]{2,}(\S*)
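Whether Unicode needs explicit ranges depends on the engine. As a hedged illustration in Python 3, where \w is Unicode-aware for str patterns unless re.ASCII is set, the sketch below uses a simplified pattern of my own (PATTERN is not the answer's exact regex) and an example internationalized host chosen for illustration.

```python
import re

# Simplified, illustrative pattern (not the answer's exact one):
# a scheme or "www.", two dot-separated labels, then any non-space tail.
PATTERN = r"(?:https?://|ftp://|www\.)[\w-]{2,}\.[\w-]{2,}\S*"

unicode_re = re.compile(PATTERN)          # \w is Unicode-aware for str patterns
ascii_re = re.compile(PATTERN, re.ASCII)  # \w restricted to [a-zA-Z0-9_]

text = (
    "query: http://google.com?query=đức+filan+đã+search "
    "and host: http://例え.jp/path"
)

print(unicode_re.findall(text))  # both URLs
print(ascii_re.findall(text))    # only the URL with an ASCII host
```

The query string is picked up either way because \S accepts any non-space character; it is the non-ASCII host that needs Unicode-aware matching.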
answered Jun 22, 2016 at 6:33
6
  • 2
    Unicode characters were forbidden as per the RFC 1738 on URLs (faqs.org/rfcs/rfc1738.html). They would have to be percent encoded to be standards compliant - although I think it may have changed more recently - worth reading w3.org/International/articles/idn-and-iri Commented Sep 7, 2016 at 9:41
  • @mrswadge I just cover the cases. We're not sure if all people care about the standard. Thank you for your info. Commented Sep 12, 2016 at 2:54
  • 1
    Only this one worked perfectly for me having urls such as "example.com" "www.exmaple.com" "example.com" "example.co.in" "exmaple.com/?q='me'" Commented Jan 30, 2020 at 7:43
  • Your regex misses the case where the protocol is not set, e.g. //www.google.fr Commented Jan 5, 2024 at 10:48
  • @Adrien Parrochia I don't think that's a valid URL, is it? Commented Jan 8, 2024 at 10:12
6

I found this which covers most sample links, including subdirectory parts.

Regex is:

(?:(?:https?|ftp):\/\/|\b(?:[a-z\d]+\.))(?:(?:[^\s()<>]+|\((?:[^\s()<>]+|(?:\([^\s()<>]+\)))?\))+(?:\((?:[^\s()<>]+|(?:\(?:[^\s()<>]+\)))?\)|[^\s`!()\[\]{};:'".,<>?«»""‘’]))?
Klaus Gütter
12.2k7 gold badges35 silver badges43 bronze badges
answered Jan 8, 2019 at 6:37
1
  • When I tried this, the ends of sentences were marked as a match. In the above sentence, the last word "match" and the period were matched. Commented Feb 23, 2021 at 7:55
6

I used the regular expression below to find the url in a string:

(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?
paisanco
4,3306 gold badges32 silver badges46 bronze badges
answered Jan 19, 2015 at 10:47
2
  • 3
    [a-zA-Z]{2,3} is really poor for matching TLD, see official list: data.iana.org/TLD/tlds-alpha-by-domain.txt Commented Jan 19, 2015 at 11:04
  • Your regex misses the case where the protocol is not set, e.g. //www.google.fr Commented Jan 5, 2024 at 10:48
5

IMPROVED

Detects Urls like these:

Regex:

/^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+$/gm

Please note that working with URLs and domain validation can be complex, and regex alone may not cover all edge cases. For more comprehensive URL validation, it's recommended to use specialized libraries or built-in URL validation functions provided by your programming language or framework.
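As one possible take on the "use a library" advice, here is a minimal Python sketch that splits the text on whitespace and lets urllib.parse.urlparse from the standard library decide. The allowed schemes, the punctuation that gets stripped, and the handling of bare www. hosts are all assumptions made for illustration.

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https", "ftp"}  # assumption: adjust as needed
EDGE_PUNCTUATION = ".,;:!?()[]<>\"'"        # assumption: stripped from token ends


def find_urls(text):
    """Return whitespace-separated tokens that parse as URLs."""
    found = []
    for token in text.split():
        candidate = token.strip(EDGE_PUNCTUATION)
        # Treat a bare "www." host as http, for parsing purposes only.
        if candidate.startswith("www."):
            to_parse = "http://" + candidate
        else:
            to_parse = candidate
        try:
            parsed = urlparse(to_parse)
        except ValueError:
            continue  # urlparse rejects some malformed input
        if parsed.scheme in ALLOWED_SCHEMES and parsed.netloc:
            found.append(candidate)
    return found


print(find_urls("Hello www.google.com World http://yahoo.com"))
print(find_urls("Written in 44 B.C. or so."))
```

This will not find scheme-less hosts like facebook.com, and it assumes URLs contain no spaces; it trades recall for fewer false positives such as "A.D." or "B.C.".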

answered Apr 27, 2019 at 14:25
2
  • This will detect some expression such as "A.D." or "B.C." as urls though. Commented Dec 17, 2023 at 10:20
  • Your regex misses the case where the protocol is not set, e.g. //www.google.fr Commented Jan 5, 2024 at 10:47
4

Short and simple. I have not tested it in JavaScript code yet, but it looks like it will work:

((http|ftp|https):\/\/)?(([\w.-]*)\.([\w]*))

Code on regex101.com

answered Nov 12, 2017 at 12:18
4
  • 1
    I liked your regex because it was exactly what I was looking for: I needed to identify and strip URLs out of some text, not validate. Worked in rails. Commented Aug 16, 2019 at 5:17
  • @Dagmar I am glad to hear that :) Commented Aug 17, 2019 at 2:18
  • 1
    Your regex misses the case where the protocol is not set, e.g. //www.google.fr Commented Jan 5, 2024 at 10:46
  • You are right @AdrienParrochia. When I posted this, I didn't check for it. Maybe I can do it later. Commented Jan 6, 2024 at 22:53
4

The regex provided by @JustinLevene did not have the proper backslash escape sequences. I updated it to be correct and added a condition to match the FTP protocol as well. It will match all URLs with or without protocols, and with or without "www."

Code: ^((http|ftp|https):\/\/)?([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?

Example: https://regex101.com/r/uQ9aL4/65

answered Oct 16, 2018 at 17:24
4

Here is a slightly more optimized regexp:

(?:(?:(https?|ftp|file):\/\/|www\.|ftp\.)|([\w\-_]+(?:\.|\s*\[dot\]\s*[A-Z\-_]+)+))([A-Z\-\.,@?^=%&:\/~\+#]*[A-Z\-\@?^=%&\/~\+#]){2,6}?

Here is a test with data: https://regex101.com/r/sFzzpY/6

answered Mar 12, 2020 at 13:45
2
  • 1
    Your test shows some of your URL's are not being detected fully. This entire string should be marked as a match: stackoverflow.com/questions/60619430/… Commented Feb 23, 2021 at 7:53
  • Your regex misses the case where the protocol is not set, e.g. //www.google.fr Commented Jan 5, 2024 at 10:47
4

It wasn't an easy one, but I managed to compose a short and efficient regex pattern to match URLs. It also captures email addresses. Hope it works for you.

((\bhttp(|s)|ftp|file):\/\/)|\bwww[ ]*\.[ ]*([a-zA-Z0-9%:?#@\/=_-]*)|([a-zA-Z0-9%:.?#@\/=_-]*)[ ]*\.[ ]*(com|eu|org|co|uk|pdf|etc)

This can be tested here regexr.com

answered Nov 30, 2022 at 9:55
1
  • Your regex misses the case where the protocol is not set, e.g. //www.google.fr Commented Jan 5, 2024 at 10:47
3

If you have the URL pattern, you should be able to search for it in your string. Just make sure that the pattern doesn't have ^ and $ marking the beginning and end of the URL string. So if P is the pattern for a URL, look for matches of P.
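To make the point about anchors concrete, here is a small Python sketch. The pattern P below is a deliberately simple stand-in chosen for illustration, not a recommended URL regex.

```python
import re

text = "Hello www.google.com World http://yahoo.com"

# Illustrative pattern P; the same P is used with and without anchors.
P = r"https?://\S+"

anchored = re.compile("^" + P + "$")  # validates that the WHOLE string is a URL
unanchored = re.compile(P)            # finds P anywhere inside the string

print(anchored.findall(text))    # the string as a whole is not a URL
print(unanchored.findall(text))  # the embedded URL is found
print(bool(anchored.match("http://yahoo.com")))
```

The anchored form is still the right tool when validating a single field; the unanchored form is what searching inside free text needs.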

answered May 17, 2011 at 22:54
4
  • This is the regex I found that verifies if an entire string is a URL. I took out the ^ at the beginning and the $ at the end like you said and it still didn't work. What am I doing wrong? ^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*[^\.\,\)\(\s]$ Commented May 17, 2011 at 23:19
  • It might help if you showed what language you're using. Either way, be sure to check http://regexpal.com/; there you can test different expressions against your string until you get it right. Commented May 17, 2011 at 23:37
  • @user758263 - do you really need such a complex regex for the url? Depends on what the possible urls you might actually find. Also see gskinner.com/RegExr for trying out regex. They also have hundreds of samples on the right under the Community tab including ones for urls Commented May 18, 2011 at 0:06
  • I'm trying to look for all possible URLs and I'm using C++. Thanks for the links entonio and manojlds. The gskinner site was especially helpful since it had samples. Commented May 18, 2011 at 15:11
3

A probably too simplistic, but working method might be:

[localhost|http|https|ftp|file]+://[\w\S(\.|:|/)]+

I tested it in Python. As long as the URL has a space before and after it and contains no spaces itself (which I have never seen), it should be fine.

Here is an online ide demonstrating it

However here are some benefits of using it:

  • It recognises file: and localhost as well as ip addresses
  • It will never match a string that lacks one of these prefixes
  • It does not mind unusual characters such as # or - (see url of this post)
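A comment under this answer points out that the square brackets form a character class rather than a list of alternatives. The Python sketch below illustrates the difference with a made-up input; the variable names and the ptth:// typo are only there for demonstration.

```python
import re

# The answer's pattern: the square brackets make a character class, so any
# run of the letters l, o, c, a, h, s, t, p, f, i, e (or "|") is accepted.
char_class = re.compile(r"[localhost|http|https|ftp|file]+://[\w\S(\.|:|/)]+")

# The grouped alternative suggested in the comments.
grouped = re.compile(r"\b(?:https?|localhost|ftp|file)://\S+")

text = "typo ptth://example.com and real http://yahoo.com"

print(char_class.findall(text))  # also accepts the misspelled scheme
print(grouped.findall(text))     # only the real scheme
```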
answered Feb 6, 2018 at 19:52
2
  • Your regex misses the case where the protocol is not set, e.g. //www.google.fr Commented Jan 5, 2024 at 10:47
  • [localhost|http|https|ftp|file]+ is a character class. I guess you wanted to use a group: (?:localhost|http|https|ftp|file) further \S non white space already includes \w,.,:,/ and even (,). You can as well use \b(?:https?|localhost|ftp|file)://\S+, it would match at least what probably was meant to match. Commented Jun 4, 2024 at 18:10
3

I liked Stefan Henze's solution, but it would pick up 34.56. It's too general, and I have unparsed HTML. There are 4 anchors for a URL:

  • www,
  • http:\ (and co),
  • . followed by letters and then /,
  • or letters, a ., and one of these: https://ftp.isc.org/www/survey/reports/current/bynum.txt.

I used lots of info from this thread. Thank you all.

"(((((http|ftp|https|gopher|telnet|file|localhost):\\/\\/)|(www\\.)|(xn--)){1}([\\w_-]+(?:(?:\\.[\\w_-]+)+))([\\w.,@?^=%&:\\/~+#-]*[\\w@?^=%&\\/~+#-])?)|(([\\w_-]{2,200}(?:(?:\\.[\\w_-]+)*))((\\.[\\w_-]+\\/([\\w.,@?^=%&:\\/~+#-]*[\\w@?^=%&\\/~+#-])?)|(\\.((org|com|net|edu|gov|mil|int|arpa|biz|info|unknown|one|ninja|network|host|coop|tech)|(jp|br|it|cn|mx|ar|nl|pl|ru|tr|tw|za|be|uk|eg|es|fi|pt|th|nz|cz|hu|gr|dk|il|sg|uy|lt|ua|ie|ir|ve|kz|ec|rs|sk|py|bg|hk|eu|ee|md|is|my|lv|gt|pk|ni|by|ae|kr|su|vn|cy|am|ke))))))(?!(((ttp|tp|ttps):\\/\\/)|(ww\\.)|(n--)))"

The above solves just about everything except a string like "eurls:www.google.com,facebook.com,http://test.com/", which it returns as a single string. Tbh idk why I added gopher etc. Proof, as R code:

library(stringr)  # provides str_extract_all()
if(T){
 wierdurl<-vector()
 wierdurl[1]<-"https://JP納豆.例.jp/dir1/納豆 "
 wierdurl[2]<-"xn--jp-cd2fp15c.xn--fsq.jp "
 wierdurl[3]<-"http://52.221.161.242/2018/11/23/biofourmis-collab"
 wierdurl[4]<-"https://12000.org/ "
 wierdurl[5]<-" https://vg-1.com/?page_id=1002 "
 wierdurl[6]<-"https://3dnews.ru/822878"
 wierdurl[7]<-"The link of this question: https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string
 Also there are some urls: www.google.com, facebook.com, http://test.com/method?param=wasd
 The code below catches all urls in text and returns urls in list. "
 wierdurl[8]<-"Thelinkofthisquestion:https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string
 Alsotherearesomeurls:www.google.com,facebook.com,http://test.com/method?param=wasd
 Thecodebelowcatchesallurlsintextandreturnsurlsinlist. "
 wierdurl[9]<-"Thelinkofthisquestion:https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-stringAlsotherearesomeurlsZwww.google.com,facebook.com,http://test.com/method?param=wasdThecodebelowcatchesallurlsintextandreturnsurlsinlist."
 wierdurl[10]<-"1facebook.com/1res"
 wierdurl[11]<-"1facebook.com/1res/wat.txt"
 wierdurl[12]<-"www.e "
 wierdurl[13]<-"is this the file.txt i need"
 wierdurl[14]<-"xn--jp-cd2fp15c.xn--fsq.jpinspiredby "
 wierdurl[15]<-"[xn--jp-cd2fp15c.xn--fsq.jp/inspiredby "
 wierdurl[16]<-"xnto--jpto-cd2fp15c.xnto--fsq.jpinspiredby "
 wierdurl[17]<-"fsety--fwdvg-gertu56.ffuoiw--ffwsx.3dinspiredby "
 wierdurl[18]<-"://3dnews.ru/822878 "
 wierdurl[19]<-" http://mywebsite.com/msn.co.uk "
 wierdurl[20]<-" 2.0http://www.abe.hip "
 wierdurl[21]<-"www.abe.hip"
 wierdurl[22]<-"hardware/software/data"
 regexstring<-vector()
 regexstring[2]<-"(http|ftp|https)://([\\w_-]+(?:(?:\\.[\\w_-]+)+))([\\w.,@?^=%&:/~+#-]*[\\w@?^=%&/~+#-])?"
 regexstring[3]<-"/(?:(?:https?|ftp|file):\\/\\/|www\\.|ftp\\.)(?:\\([-A-Z0-9+&@#\\/%=~_|$?!:,.]*\\)|[-A-Z0-9+&@#\\/%=~_|$?!:,.])*(?:\\([-A-Z0-9+&@#\\/%=~_|$?!:,.]*\\)|[A-Z0-9+&@#\\/%=~_|$])/igm"
 regexstring[4]<-"[a-zA-Z0-9\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]?"
 regexstring[5]<-"((http|ftp|https)\\:\\/\\/)?([\\w_-]+(?:(?:\\.[\\w_-]+)+))([\\w.,@?^=%&:/~+#-]*[\\w@?^=%&/~+#-])?"
 regexstring[6]<-"((http|ftp|https):\\/\\/)?([\\w_-]+(?:(?:\\.[\\w_-]+)+))([\\w.,@?^=%&:\\/~+#-]*[\\w@?^=%&\\/~+#-])?"
 regexstring[7]<-"(http|ftp|https)(:\\/\\/)([\\w_-]+(?:(?:\\.[\\w_-]+)+))([\\w.,@?^=%&:/~+#-]*[\\w@?^=%&/~+#-])?"
 regexstring[8]<-"(?:(?:https?|ftp|file):\\/\\/|www\\.|ftp\\.)(?:\\([-A-Z0-9+&@#/%=~_|$?!:,.]*\\)|[-A-Z0-9+&@#/%=~_|$?!:,.])*(?:\\([-A-Z0-9+&@#/%=~_|$?!:,.]*\\)|[A-Z0-9+&@#/%=~_|$])"
 regexstring[10]<-"((http[s]?|ftp):\\/)?\\/?([^:\\/\\s]+)((\\/\\w+)*\\/)([\\w\\-\\.]+[^#?\\s]+)(.*)?(#[\\w\\-]+)?"
 regexstring[12]<-"http[s:/]+[[:alnum:]./]+"
 regexstring[9]<-"http[s:/]+[[:alnum:]./]+" #in DLpages 230
 regexstring[1]<-"[[:alnum:]-]+?[.][:alnum:]+?(?=[/ :])" #in link_graphs 50
 regexstring[13]<-"^(?!mailto:)(?:(?:http|https|ftp)://)(?:\\S+(?::\\S*)?@)?(?:(?:(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[0-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,})))|localhost)(?::\\d{2,5})?(?:(/|\\?|#)[^\\s]*)?$"
 regexstring[14]<-"(((((http|ftp|https):\\/\\/)|(www\\.)|(xn--)){1}([\\w_-]+(?:(?:\\.[\\w_-]+)+))([\\w.,@?^=%&:\\/~+#-]*[\\w@?^=%&\\/~+#-])?)|(([\\w_-]+(?:(?:\\.[\\w_-]+)*))((\\.((org|com|net|edu|gov|mil|int)|(([:alpha:]{2})(?=[, ]))))|([\\/]([\\w.,@?^=%&:\\/~+#-]*[\\w@?^=%&\\/~+#-])?))))(?!(((ttp|tp|ttps):\\/\\/)|(ww\\.)|(n--)))"
 regexstring[15]<-"(((((http|ftp|https|gopher|telnet|file|localhost):\\/\\/)|(www\\.)|(xn--)){1}([\\w_-]+(?:(?:\\.[\\w_-]+)+))([\\w.,@?^=%&:\\/~+#-]*[\\w@?^=%&\\/~+#-])?)|(([\\w_-]{2,200}(?:(?:\\.[\\w_-]+)*))((\\.[\\w_-]+\\/([\\w.,@?^=%&:\\/~+#-]*[\\w@?^=%&\\/~+#-])?)|(\\.((org|com|net|edu|gov|mil|int|arpa|biz|info|unknown|one|ninja|network|host|coop|tech)|(jp|br|it|cn|mx|ar|nl|pl|ru|tr|tw|za|be|uk|eg|es|fi|pt|th|nz|cz|hu|gr|dk|il|sg|uy|lt|ua|ie|ir|ve|kz|ec|rs|sk|py|bg|hk|eu|ee|md|is|my|lv|gt|pk|ni|by|ae|kr|su|vn|cy|am|ke))))))(?!(((ttp|tp|ttps):\\/\\/)|(ww\\.)|(n--)))"
 }
for(i in wierdurl){#c(7,22)
 for(c in regexstring[c(15)]) {
 print(paste(i,which(regexstring==c)))
 print(str_extract_all(i,c))
 }
}
answered Jul 3, 2020 at 17:53
3

I use this Regex:-

((\w+:\/\/\S+)|(\w+[\.:]\w+\S+))[^\s,\.]

It works fine for many URLs, including: http://google.com, https://dev-site.io:8080/home?val=1&count=100, www.regexr.com, localhost:8080/path, ...

Toby Speight
31.6k57 gold badges80 silver badges115 bronze badges
answered Feb 21, 2019 at 13:33
2

I used the C# Uri class, and it works well with IP addresses and localhost:

public static bool CheckURLIsValid(string url)
{
    // Accept only absolute URIs whose scheme is http or https
    Uri returnURL;
    return Uri.TryCreate(url, UriKind.Absolute, out returnURL)
        && (returnURL.Scheme == Uri.UriSchemeHttp || returnURL.Scheme == Uri.UriSchemeHttps);
}
answered Apr 30, 2020 at 11:43
2

This regex works perfectly for me and should work for you too:

(http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?
answered Mar 9, 2021 at 16:31
0
1

This is a slight improvement on/adjustment to (depending on what you need) Rajeev's answer:

([\w\-_]+(?:(?:\.|\s*\[dot\]\s*[A-Z\-_]+)+))([A-Z\-\.,@?^=%&:/~\+#]*[A-Z\-\@?^=%&/~\+#]){2,6}?

See here for an example of what it does and does not match.

I got rid of the check for "http" etc. because I wanted to catch URLs without it. I added slightly to the regex to catch some obfuscated URLs (i.e. where users write [dot] instead of a "."). Finally, I changed "\w" to "A-Z" and adjusted "{2,3}" to reduce false positives like v2.0 and "moo.0dd".

Any improvements on this welcome.

answered Jan 19, 2015 at 10:43
2
  • [a-zA-Z]{2,3} is really poor for matching TLDs, see the official list: data.iana.org/TLD/tlds-alpha-by-domain.txt. Also, your regex matches _.........&&&&&& and I'm not sure that's a valid URL. Commented Jan 19, 2015 at 11:06
  • Thanks for that JE SUIS CHAELIE, any suggestions for improvement (especially for the false positive)? Commented Jan 19, 2015 at 16:31
0

I use the logic of finding the text between two dots (periods).

The regex below works fine with Python:

(?<=\.)[^}]*(?=\.)
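To see what this lookaround pattern actually returns, here is a short Python sketch on two inputs (the variable name between_dots is mine). It yields the text between the first and last dot, not a full URL, and on a sentence with several URLs the greedy [^}]* spans across them.

```python
import re

# The answer's pattern, unchanged: text preceded by a dot and followed by a dot.
between_dots = re.compile(r"(?<=\.)[^}]*(?=\.)")

# A single host: only the middle label comes back.
print(between_dots.findall("www.google.com"))

# The question's sample string: one match stretching from the first dot
# to the last dot, covering the words between the two URLs.
print(between_dots.findall("Hello www.google.com World http://yahoo.com"))
```

So this seems usable only when the input is already known to be a single host name.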
answered Aug 26, 2014 at 18:37
0

Matching a URL in a text should not be so complex

(?:(?:(?:ftp|http)[s]*:\/\/|www\.)[^\.]+\.[^ \n]+)

https://regex101.com/r/wewpP1/2

answered Nov 3, 2016 at 15:11
0
0

I used this

^(https?:\\/\\/([a-zA-Z0-9]+)(\\.[a-zA-Z0-9]+)(\\.[a-zA-Z0-9\\/\\=\\-\\_\\?]+)?)$
Panciz
2,2642 gold badges31 silver badges57 bronze badges
answered Jan 10, 2018 at 16:36
0
(?:vnc|s3|ssh|scp|sftp|ftp|http|https)\:\/\/[\w\.]+(?:\:?\d{0,5})|(?:mailto|)\:[\w\.]+\@[\w\.]+

If you want an explanation of each part, try it on regexr[.]com, where you will get a great explanation of every character.

The pattern is split by "|" (OR) because not all usable URIs have "//"; this is where you can list the schemes you are interested in matching as OR conditions.

answered Dec 17, 2019 at 23:29
0

How about this one?

(http:\/\/|ftp:\/\/|https:\/\/|www\.)([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?

It matches both URLs in the question.

answered Apr 30, 2021 at 4:59
0

This slightly simpler version of GooDeeJAY's answer serves me well (and supports e.g. # and other characters at the expense of increasing 'false positives'):

import re
text = """The link of this question: https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string
Also there are some urls: www.google.com, facebook.com, http://test.com/method?param=wasd, http://test.com/method?param=wasd&params2=kjhdkjshd#changed
The code below catches all urls in text and returns urls in list."""
regex = r"(?i)(https?://|www.|\w+\.)[^\s]+"
urls = [match.group() for match in re.finditer(regex, text)]
print(urls)

and outputs

[
'https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string', 
'www.google.com,', 
'facebook.com,', 
'http://test.com/method?param=wasd,', 
'http://test.com/method?param=wasd&params2=kjhdkjshd#changed'
]
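Since the output above keeps trailing commas, one possible follow-up is to trim punctuation from the end of each match. This is only a sketch: find_urls and the trailing set are my own assumptions, and a URL that legitimately ends in one of those characters would be truncated.

```python
import re

# The answer's pattern, unchanged.
regex = r"(?i)(https?://|www.|\w+\.)[^\s]+"


def find_urls(text, trailing=".,;:!?"):
    """Match with the answer's regex, then trim trailing punctuation.

    The `trailing` set is an assumption made for this sketch.
    """
    return [m.group().rstrip(trailing) for m in re.finditer(regex, text)]


text = "Also: www.google.com, facebook.com, http://test.com/method?param=wasd."
print(find_urls(text))
```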
answered May 12, 2021 at 12:19
0

This expression also finds paths like: /path/text.html

(https?\:\/[^\"\'\n\<\>\;\)\s]*)|(www?\.[^\"\'\n\<\>\;\s]*)|([^\s\&\=\;\,\<\<\>\"\'\(\)]+\/[\w\/])([^\"\'\n\;\s]*)|((?<!\<)[\/]+[\w]+[^\'\"\s\<\>]*)
answered Sep 3, 2021 at 7:09
0
^(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?

This will verify the url link....

answered Aug 3, 2022 at 13:20