29
import re
str="x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"
str2=re.match("[a-zA-Z]*//([a-zA-Z]*)",str)
print str2.group()
current result=> error
expected => wwwqqqzzz

I want to extract the string wwwqqqzzz. How do I do that?

Maybe there are a lot of dots, such as:

"whatever..s#[email protected].:af//wwww.xxx.yn.zsdfsd.asfds.f.ds.fsd.whatever/123.dfiid"

In this case, I basically want the stuff bounded by // and /. How do I achieve that?

One additional question:

import re
str="xxx.yyy.xxx:80"
m = re.search(r"([^:]*)", str)
str2=m.group(0)
print str2
str2=m.group(1)
print str2

It seems that m.group(0) and m.group(1) are the same.

alex
asked Nov 16, 2012 at 20:03
  • Do you want the dots removed from the final string? Commented Nov 16, 2012 at 20:06
  • Yes, I want only the characters [a-zA-Z]* between // and /. There is a bunch of characters before the '//' and more after the '/' at the end. Commented Nov 16, 2012 at 20:08

5 Answers

44

match only matches at the beginning of the string. Use search instead, which scans the string for the first position where the pattern fits. The following pattern would then match your requirements:

m = re.search(r"//([^/]*)", str)
print m.group(1)

Basically, we look for //, then consume as many non-slash characters as possible. Those non-slash characters are captured in group number 1.
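To make the difference between match and search concrete, here is a minimal sketch. It is written for Python 3 (so print is a function, unlike the snippets above), and the variable name url is only an illustrative choice that avoids shadowing the built-in str:

```python
import re

url = "x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"

# re.match anchors at position 0. "[a-zA-Z]*" stops at the digit "8",
# so "//" cannot follow and the result is None.
anchored = re.match(r"[a-zA-Z]*//([a-zA-Z]*)", url)
print(anchored)  # None

# re.search scans forward until the pattern fits somewhere.
found = re.search(r"//([^/]*)", url)
host = found.group(1)
print(host)  # www.qqq.zzz
```

Calling .group() on the None returned by match is what raises the error reported in the question.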

In fact, there is a slightly more advanced technique that does the same, but does not require capturing (which is generally time-consuming). It uses a so-called lookbehind:

m = re.search(r"(?<=//)[^/]*", str)
print m.group()

Lookarounds are not included in the actual match, hence the desired result.
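As a rough side-by-side sketch (Python 3, with url as an assumed variable name), the two patterns differ only in what ends up in the overall match:

```python
import re

url = "x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"

captured = re.search(r"//([^/]*)", url)
lookbehind = re.search(r"(?<=//)[^/]*", url)

# With a capturing group, the full match still includes the leading "//".
print(captured.group(0))    # //www.qqq.zzz
print(captured.group(1))    # www.qqq.zzz

# With the lookbehind, "//" is asserted but not consumed.
print(lookbehind.group(0))  # www.qqq.zzz
```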

This (or any other reasonable regex solution) will not remove the dots in the same step. But this can easily be done in a second step:

m = re.search(r"(?<=//)[^/]*", str)
host = m.group()
cleanedHost = host.replace(".", "")

That does not even require regular expressions.

Of course, if you want to remove everything except for letters and digits (e.g. to turn www.regular-expressions.info into wwwregularexpressionsinfo) then you are better off using the regex version of replace:

cleanedHost = re.sub(r"[^a-zA-Z0-9]+", "", host)
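Putting both steps together, one possible sketch in Python 3 looks like this. The function name extract_host_letters is hypothetical, and returning None when no "//" is present is an assumption about the desired behaviour, not something the question specifies:

```python
import re

def extract_host_letters(text):
    """Return the text between '//' and the next '/', keeping only letters and digits."""
    m = re.search(r"//([^/]*)", text)
    if m is None:
        # No "//" in the input, so there is nothing to extract.
        return None
    return re.sub(r"[^a-zA-Z0-9]+", "", m.group(1))

print(extract_host_letters("x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"))  # wwwqqqzzz
print(extract_host_letters("a//www.regular-expressions.info/x"))  # wwwregularexpressionsinfo
print(extract_host_letters("no double slash here"))  # None
```

Checking for None before calling group avoids the AttributeError that the original match attempt ran into.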
answered Nov 16, 2012 at 20:07

5 Comments

Sorry, I just saw that requirement. Simply run another step: resultstr.replace(r".", ""). I will include that in a second.
"there is a slightly more advanced technique that does the same, but does not require capturing (which is generally time-consuming). It uses a so-called lookbehind" - Do you have anything to back this up? Both my intuition and timeit tell me that lookbehinds are slower than a simple group capture.
What do group(0) and group(1) mean? group(0) seems to give the same result as group(1) in my case. I added this to the question.
@runcode group(0) gives you the complete match. group(1) gives you what was matched by everything inside the first set of parentheses. In your example you wrapped your whole pattern in parentheses. Hence, both calls give the same result.
@lqc, no, I don't have any source at hand. I believe it mostly applies to more complex patterns, where things would be captured multiple times. After all, the engine needs to keep track of what was matched since it entered a capturing group. In any specific case, the lookbehind might be less efficient, I admit.
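The group(0) versus group(1) point can be seen in a short Python 3 sketch. The second pattern, which also captures the port, is an illustrative addition rather than something from the question:

```python
import re

address = "xxx.yyy.xxx:80"

# The parentheses wrap the whole pattern, so group 1 equals the full match.
whole = re.search(r"([^:]*)", address)
print(whole.group(0), whole.group(1))  # xxx.yyy.xxx xxx.yyy.xxx

# Once the pattern extends beyond the parentheses, the two differ.
partial = re.search(r"([^:]*):(\d+)", address)
print(partial.group(0))  # xxx.yyy.xxx:80
print(partial.group(1))  # xxx.yyy.xxx
print(partial.group(2))  # 80
```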
3
print re.sub(r"[.]","",re.search(r"(?<=//).*?(?=/)",str).group(0))

See this demo.

answered Nov 16, 2012 at 20:19

2
# Take everything after "//" up to the next "/" (non-greedy), then strip non-alphanumerics.
output=re.findall(r"(?<=//)\w+.*?(?=/)",str)
final=re.sub(r"[^a-zA-Z0-9]+", "", output[0])
print final
ohmu
answered Aug 14, 2014 at 15:59

0
import re
str_1="x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"
str2=re.match(".*//([a-zA-Z.]*)",str_1)
print(str2.group(1).replace('.',''))
RavinderSingh13
answered May 17, 2021 at 13:06

1 Comment

While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply.
-1
import re
str="x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"
re.findall('//([a-z.]*)', str)
BDL
answered Jan 16, 2017 at 10:58

1 Comment

Although the code might solve the problem, it is not an answer on its own. One should always add an explanation to it.
