My initial string contains a <span>, some contents in between, and a </span></span. I would like to remove that piece (including the span, the contents inside it, and the /span) from my string. What should I do?
Part of the string that needs to be removed:
<span class="_5mfr"><span class="_6qdm" style='height: 16px; width: 16px; font-size: 16px; background-image: url("https://static.xx.fbcdn.net/images/emoji.php/v9/t81/1/16/")
+14 variable strings+</span></span
I would like to remove the whole piece shown above.
3 Answers
import re
txt = 'Iam a good boy <span>some blahblahblah </span</span and my name is john'
print(re.sub(r'<span>.*</span</span ', '', txt))
Prints:
Iam a good boy and my name is john
Edit, for the updated question:
import re
txt = """<span class="_5mfr"><span class="_6qdm" style='height: 16px; width: 16px; font-size: 16px; background-image: url("https://static.xx.fbcdn.net/images/emoji.php/v9/t81/1/16/")+14 variable strings+</span></span"""
print(re.sub(r'<span [^<>]*?</span>?</span', '', txt))
# prints: <span class="_5mfr">
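As the printed result shows, the pattern above leaves the outer opening tag behind. One possible variant is sketched below. It assumes the piece to remove always starts at the first "<span" and ends at two adjacent closing tags whose trailing ">" may be missing, as in the question. The names SPAN_BLOCK and remove_span_block, and the shortened sample string, are illustrative only.

```python
import re

# Hypothetical helper: drops everything from the first "<span" up to the
# first pair of adjacent closing tags. The ">" of each closing tag is
# optional because the string in the question ends in "</span></span".
SPAN_BLOCK = re.compile(r"<span\b.*?</span>?\s*</span>?", re.DOTALL)

def remove_span_block(text):
    """Return text with the nested span block (tags and contents) removed."""
    return SPAN_BLOCK.sub("", text)

# Shortened stand-in for the string in the question (assumed shape).
sample = ("Iam a good boy"
          "<span class=\"_5mfr\"><span class=\"_6qdm\" style='height: 16px'>"
          "some text</span></span"
          " and my name is john")
print(remove_span_block(sample))
```

Because the match is non-greedy, it stops at the first pair of closing tags. Strings with deeper nesting or several separate span blocks may need a real HTML parser instead.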
-
Did this work with the string you updated in your question? I don't think the accepted answer solves for the current string. – Navaneeth Sen, Sep 17, 2021 at 6:26
-
Yeah, this didn't work. I was using an except block in the program and I thought the problem had been solved. – SimpleGuy_, Sep 17, 2021 at 6:36
-
Did you try this one: stackoverflow.com/a/69218568/449378 ? – Navaneeth Sen, Sep 17, 2021 at 7:59
Use BeautifulSoup:
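from bs4 import BeautifulSoup

# `string` is the HTML text from the question
soup = BeautifulSoup(string, 'html.parser')
for x in soup.find_all('span'):
    x.replace_with('')  # replace each span, contents included, with nothing
# get_text() is used because soup.string is None when the document has several children
print(soup.get_text())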
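For readers who would rather avoid a third-party dependency, the same parser-based idea can be sketched with the standard library's html.parser module instead of BeautifulSoup. This is a swapped-in technique, not the answer's own code. It assumes well-formed closing tags, and it returns only the text outside spans, so the markup of any other tags is dropped too. SpanStripper and remove_spans are hypothetical names.

```python
from html.parser import HTMLParser

class SpanStripper(HTMLParser):
    """Collects text that is not inside any <span> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0      # current nesting level of open spans
        self.parts = []     # text chunks found outside spans

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when no span is currently open.
        if self.depth == 0:
            self.parts.append(data)

def remove_spans(html_text):
    """Return the text of html_text with spans and their contents removed."""
    parser = SpanStripper()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)

# Illustrative input with a nested span, similar in shape to the question's.
sample = ('Iam a good boy <span class="a"><span class="b">emoji</span></span>'
          'and my name is john')
print(remove_spans(sample))
```

Tracking the nesting depth is what lets this handle a span inside a span, which is the case that trips up the simpler patterns.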
-
It's not by me, bro. It's somebody else. – SimpleGuy_, Sep 17, 2021 at 6:06
-
@SimpleGuy_ Ah! Yep! Hope it works for you! – U13-Forward, Sep 17, 2021 at 6:07
-
Anyway, I've changed my question a little bit so that my aim is clear to everyone. – SimpleGuy_, Sep 17, 2021 at 6:07
-
@SimpleGuy_ Added working code, this is much more consistent. – U13-Forward, Sep 17, 2021 at 6:17
-
@SimpleGuy_ Great! – U13-Forward, Sep 17, 2021 at 6:19
You can replace everything found by the regex as shown below:
import re
regex = r"(<span.+?>)|(<\/span>)"
test_str = "<span class=\\\"_5mfr\\\"><span class=\\\"_6qdm\\\" style='height: 16px; width: 16px; font-size: 16px; background-image: url(\\\"static.xx.fbcdn.net/images/emoji.php/v9/t81/1/16/...\\\")'>© Dasamoolam Damu (Troll Malayalam)ഹൗ ക്രൂവൽ<span class=\\\"_5mfr\\\"><span class=\\\"_6qdm\\\" style='height: 16px; width: 16px; font-size: 16px; background-image: url(\\\"static.xx.fbcdn.net/images/emoji.php/v9/td7/1/16/...\\\")'></span></span></span></span>"
print(re.sub(regex, '', test_str))
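Note that a substitution which targets only the tags keeps the text between them. The short sketch below makes that behaviour explicit. It uses a slightly different pattern from the one above and assumes that no attribute value contains a ">" character. SPAN_TAG and strip_span_tags are illustrative names.

```python
import re

# Matches any opening or closing span tag, with or without attributes.
SPAN_TAG = re.compile(r"</?span\b[^>]*>")

def strip_span_tags(text):
    """Remove span tags but keep the text between them."""
    return SPAN_TAG.sub("", text)

sample = '<span class="a"><span class="b">hello</span></span> world'
print(strip_span_tags(sample))  # hello world
```

If the contents of the span should disappear as well, a tag-only pattern like this is not enough on its own.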
s = s.replace('<span>','').replace('</span>','')
The span thing was removed, but some part is still there, such as the > in the </span.