Extract string before a given substring Python

Asked 5 years, 8 months ago

Viewed 2k times

Here is the sample text.

sample_text='Extract text before the last word'

Using the string split method, I can extract the substring before 'word':

print(sample_text.split('word',1)[0])

I am extracting sample_text from a PDF document, so stray spaces can appear inside the word, giving variants such as:

sample_text='Extract text before the last w ord'
sample_text='Extract text before the last wo rd'
sample_text='Extract text before the last wor d'
sample_text='Extract text before the last wo r d'

Is there a simple way to take these possibilities into account and get the desired output?

Thanks in advance.


asked Apr 21, 2020 at 7:27


Abhishek Kulkarni


2 Answers

You can use a regular expression that tolerates whitespace between the letters. For the word "word" in your example, the pattern would be:

"w\s*o\s*r\s*d"

Split each line with it like this:

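import re

sample_text = 'Extract text before the last w ord'
# Raw string, so the \s escapes reach the regex engine unchanged.
re_ignor_space = r"w\s*o\s*r\s*d"
sample_text_splitted = re.split(re_ignor_space, sample_text)
# Drop the piece after the last match and join what remains.
desired_string = ''.join(sample_text_splitted[:-1])
print(desired_string)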

If you do not need the last word, the [:-1] slice drops the final element of the split result, which is whatever follows the match:

desired_string = ''.join(sample_text_splitted[:-1])

Output:

Extract text before the last
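
The same idea can be generalized so the pattern does not have to be typed by hand for each word. The following is only a sketch and not part of the original answer: the helper name text_before is hypothetical, and it assumes you want the text before the first occurrence of the word, with trailing whitespace removed.

```python
import re

def text_before(text, word):
    """Return the part of text before the first occurrence of word,
    allowing whitespace between the letters of word.
    (Hypothetical helper, for illustration only.)"""
    # Escape each character, then allow optional whitespace between them.
    pattern = r"\s*".join(re.escape(ch) for ch in word)
    # maxsplit=1 stops at the first match; element 0 is the text before it.
    # If there is no match, re.split returns [text], so text comes back whole.
    return re.split(pattern, text, maxsplit=1)[0].rstrip()

print(text_before("Extract text before the last wo r d", "word"))
```

Note that a pattern this loose also matches the letters when they occur inside or across other words, so it is best suited to short, distinctive markers.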


edited Apr 21, 2020 at 8:00

answered Apr 21, 2020 at 7:39


jossefaz


4 Comments

Abhishek Kulkarni Over a year ago

I don't need the last word. The expected output is 'Extract text before the last'.

2020-04-21T07:41:42.883Z

jossefaz Over a year ago

I added another line of code that gives you 'Extract text before the last'. Check it out.

2020-04-21T07:46:58.75Z

Abhishek Kulkarni Over a year ago

Desired string still has 'w ord'

2020-04-21T07:48:46.13Z

jossefaz Over a year ago

Okay, I changed the code and it should do the trick now. Check it out.

2020-04-21T08:01:19.037Z

You can split by regex pattern if you want.


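import re
sample_text = 'Extract text before the last w ord'
# \s? allows one optional whitespace character between letters (\d would match digits).
pattern = r'w\s?o\s?r\s?d'
print(re.split(pattern, sample_text))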

Outputs:

['Extract text before the last ', '']
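
To check a whitespace-tolerant pattern against every variant listed in the question, a small loop helps. This is a sketch under assumptions rather than part of the original answer: the helper name before_match is hypothetical, it uses \s? for at most one whitespace character between letters, and it uses re.search with slicing instead of re.split.

```python
import re

# The unbroken original plus the spaced variants from the question.
variants = [
    "Extract text before the last word",
    "Extract text before the last w ord",
    "Extract text before the last wo rd",
    "Extract text before the last wor d",
    "Extract text before the last wo r d",
]

# At most one whitespace character is allowed between letters.
pattern = re.compile(r"w\s?o\s?r\s?d")

def before_match(text):
    """Return the text before the first match, or the whole text if none.
    (Hypothetical helper, for illustration only.)"""
    m = pattern.search(text)
    return text[:m.start()] if m else text

results = [before_match(v) for v in variants]
for r in results:
    print(repr(r))
```

Each result keeps the space that preceded the word; call .rstrip() on it if that space is unwanted.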


edited Apr 21, 2020 at 8:03

answered Apr 21, 2020 at 7:29


Bogdan Veliscu


2 Comments

Abhishek Kulkarni Over a year ago

It is still printing the entire sample_text.

2020-04-21T07:39:57.313Z

Bogdan Veliscu Over a year ago

Sorry, I had a typo in the pattern, try it now.

2020-04-21T08:04:05Z
