Here is the sample text.
sample_text='Extract text before the last word'
Using the string split method, I can extract the substring before 'word':
print(sample_text.split('word',1)[0])
I am extracting sample_text from a PDF document, so the following variations are possible:
sample_text='Extract text before the last w ord'
sample_text='Extract text before the last wo rd'
sample_text='Extract text before the last wor d'
sample_text='Extract text before the last wo r d'
Is there a simple way to take these possibilities into account and get the desired output?
Thanks in advance.
2 Answers
You can use a regular expression that ignores whitespace between the letters. In your example, for the word "word", the regular expression would be:
"w\s*o\s*r\s*d"
Split each line like this:
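import re
sample_text = 'Extract text before the last w ord'
# Raw string so the \s escapes reach the regex engine unchanged
re_ignor_space = r"w\s*o\s*r\s*d"
# Split on the word, tolerating whitespace between its letters
sample_text_splitted = re.split(re_ignor_space, sample_text)
# Drop the piece after the last match and rejoin the rest
desired_string = ''.join(sample_text_splitted[:-1])
print(desired_string)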
If you do not need the last word, drop it with a slice:
desired_string = ''.join(sample_text_splitted[:-1])
Output:
Extract text before the last
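The whitespace-tolerant pattern can also be generated for any target word instead of being typed by hand. This is a minimal sketch, not part of the original answer: the helper names spaced_pattern and text_before are hypothetical, and it assumes the PDF extraction only inserts whitespace (no other stray characters) inside the word.

```python
import re

def spaced_pattern(word):
    # Allow optional whitespace between every pair of adjacent characters.
    return r"\s*".join(re.escape(ch) for ch in word)

def text_before(text, word):
    # Hypothetical helper: return everything before the first match,
    # or the whole text unchanged if the word is not found.
    match = re.search(spaced_pattern(word), text)
    return text[:match.start()] if match else text

for sample in ("Extract text before the last word",
               "Extract text before the last w ord",
               "Extract text before the last wo r d"):
    print(repr(text_before(sample, "word")))
```

Using re.search and slicing at match.start() avoids building a list and rejoining it, and re.escape keeps the helper safe if the target word contains regex metacharacters.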
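You can split by a regex pattern that allows an optional whitespace character between the letters.
import re
sample_text = 'Extract text before the last w ord'
pattern = r'w\s?o\s?r\s?d'
print(re.split(pattern, sample_text))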
Outputs:
['Extract text before the last ', '']
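One caveat with whitespace-tolerant patterns is that they can also match across genuinely separate words. The sketch below uses a made-up sentence to show such a false positive, and how adding \b word boundaries reduces it (it does not eliminate it: standalone fragments such as "w or d" would still match). The names loose and bounded are illustrative only.

```python
import re

loose = re.compile(r"w\s*o\s*r\s*d")
bounded = re.compile(r"\bw\s*o\s*r\s*d\b")

text = "I saw or did nothing"
# The loose pattern matches "w or d" spanning three separate words.
print(bool(loose.search(text)))
# The bounded pattern requires the match to start and end at word edges.
print(bool(bounded.search(text)))

# The bounded pattern still handles the broken word from the PDF text.
split_text = "Extract text before the last wo r d"
print(repr(bounded.split(split_text, maxsplit=1)[0]))
```

Whether the boundaries are appropriate depends on the extracted text; if the target can be glued to neighbouring characters, they may be too strict.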