Python - Extract text from string

Question 1

What are the most efficient ways to extract text from a string? Are there some available functions or regex expressions, or some other way?

For example, my string is below and I want to extract the IDs as well as the ScreenNames, separately.

[User(ID=1234567890, ScreenName=RandomNameHere), User(ID=233323490, ScreenName=AnotherRandomName), User(ID=4459284, ScreenName=YetAnotherName)]

Thank you!

Edit: These are the text strings that I want to pull. I want them to be in a list.

Target_IDs = 1234567890, 233323490, 4459284
Target_ScreenNames = RandomNameHere, AnotherRandomName, YetAnotherName

Question 2

Is the text you want to parse the list at the bottom of your post?

Question 3

Use regex: extract each User(ID={matching expression}, ScreenName={matching expression}) first, then do a second extraction on each match to get the values you want.
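A minimal sketch of that two-stage idea, assuming Python 3 and that screen names never contain commas or parentheses. The variable names (text, entries, target_ids, target_screen_names) are made up for illustration, and the second stage here uses split rather than a second regex:

```python
import re

text = ('[User(ID=1234567890, ScreenName=RandomNameHere), '
        'User(ID=233323490, ScreenName=AnotherRandomName), '
        'User(ID=4459284, ScreenName=YetAnotherName)]')

# Stage 1: grab the contents of every User(...) entry.
entries = re.findall(r'User\(([^)]*)\)', text)

# Stage 2: break each entry into key=value fields and pick out the two we need.
target_ids = []
target_screen_names = []
for entry in entries:
    fields = dict(part.strip().split('=', 1) for part in entry.split(','))
    target_ids.append(fields['ID'])
    target_screen_names.append(fields['ScreenName'])

print(target_ids)
print(target_screen_names)
```

Going through a dict per entry keeps each ID next to its screen name, which may help if more fields show up later.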

Question 4

@Jakub, I revised my post. The text I want to parse is now at the bottom of the post. I am specifically looking to parse out 1234567890, 233323490, 4459284 and RandomNameHere, AnotherRandomName, YetAnotherName.

Question 5

@btquanto I'm new to regular expressions as well. Any pointers on which kinds of expressions to use? I'm looking at a regex generator with a cheat sheet, and what I've tried didn't work.

Question 6

import re

# Named s rather than str to avoid shadowing the built-in str type.
s = '[User(ID=1234567890, ScreenName=RandomNameHere), User(ID=233323490, ScreenName=AnotherRandomName), User(ID=4459284, ScreenName=YetAnotherName)]'
# re.findall returns a list of the captured groups, one per match.
print('Target IDs = ' + ','.join(re.findall(r'ID=(\d+)', s)))
print('Target ScreenNames = ' + ','.join(re.findall(r' ScreenName=(\w+)', s)))

Output:
Target IDs = 1234567890,233323490,4459284
Target ScreenNames = RandomNameHere,AnotherRandomName,YetAnotherName
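The edit to the question asks for lists rather than joined strings. re.findall already returns a list, so a minimal sketch (assuming Python 3) could simply keep the results as they are. Converting the IDs to int is an extra assumption and can be dropped if strings are preferred; the variable names are illustrative:

```python
import re

text = ('[User(ID=1234567890, ScreenName=RandomNameHere), '
        'User(ID=233323490, ScreenName=AnotherRandomName), '
        'User(ID=4459284, ScreenName=YetAnotherName)]')

# One list per field; each findall call returns the captured group of every match.
target_ids = [int(m) for m in re.findall(r'ID=(\d+)', text)]
target_screen_names = re.findall(r'ScreenName=(\w+)', text)

print(target_ids)
print(target_screen_names)
```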

Question 7

Wow, that worked perfectly! Thank you! Now to learn what the code is actually doing :)

Question 8

It depends. Assuming that all your text comes in the form of

TagName = TagValue1, TagValue2, ...

You need just two calls to split.

tag, value_string = string.split('=')
values = value_string.split(',')

Remove the excess whitespace (a couple of rstrip()/lstrip() calls, or a single strip() on each piece, will suffice) and you are done. Alternatively, you can use a regex. Regexes are slightly more powerful, but in this case I think it's a matter of personal taste.
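As a rough sketch of the split approach, assuming Python 3 and input that really is in the TagName = TagValue1, TagValue2, ... form. The helper name parse_tag_line is hypothetical, not part of any library:

```python
def parse_tag_line(line):
    """Split 'TagName = v1, v2, ...' into (tag, [v1, v2, ...])."""
    # Split only on the first '=' so values containing '=' stay intact.
    tag, value_string = line.split('=', 1)
    # strip() removes the whitespace left around the tag and each value.
    values = [value.strip() for value in value_string.split(',')]
    return tag.strip(), values


tag, values = parse_tag_line('Target_IDs = 1234567890, 233323490, 4459284')
print(tag, values)
```

This only fits the flat tag/value format; the User(ID=..., ScreenName=...) string from the question would still need a regex or a further split on the parentheses.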

If you want more complex syntax with nonterminals, terminals and all that, you'll need lex/yacc, which will require some background in parsers. A rather interesting thing to play with, but not something you'll want to use for storing program options and such.

Question 9

I'll look into this as well. Thank you.

Question 10

The regex I'd use would be:

(?:ID=|ScreenName=)+(\d+|[\w\d]+)

However, this assumes that the ID is only digits (\d) and that usernames contain only letters, digits, or underscores ([\w\d]; note that \w on its own already covers digits and the underscore).

This regex (when combined with re.findall) would return a list of matches that could be iterated through and sorted in some fashion like so:

import re

s = "[User(ID=1234567890, ScreenName=RandomNameHere), User(ID=233323490, ScreenName=AnotherRandomName), User(ID=4459284, ScreenName=YetAnotherName)]"
pattern = re.compile(r'(?:ID=|ScreenName=)+(\d+|[\w\d]+)')
ids = []
names = []
for p in pattern.findall(s):
    # All-digit matches are treated as IDs; everything else is a screen name.
    if p.isnumeric():
        ids.append(p)
    else:
        names.append(p)
print(ids, names)
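One caveat with sorting matches by isnumeric(): a screen name made up entirely of digits would land in the ID list, and because \d+ is tried before [\w\d]+, a name that starts with digits (say 123abc) would be captured only up to its last leading digit. A possible alternative, sketched here under the assumption that ID always comes directly before ScreenName within each User(...) entry, is to capture both fields by position with named groups (the group names id and name are arbitrary):

```python
import re

text = ('[User(ID=1234567890, ScreenName=RandomNameHere), '
        'User(ID=233323490, ScreenName=AnotherRandomName), '
        'User(ID=4459284, ScreenName=YetAnotherName)]')

# Each match carries both fields, so no guessing from the value's content is needed.
pattern = re.compile(r'ID=(?P<id>\d+),\s*ScreenName=(?P<name>\w+)')

ids = []
names = []
for match in pattern.finditer(text):
    ids.append(match.group('id'))
    names.append(match.group('name'))

print(ids, names)
```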

Question 11

Thanks for the regex. Usernames can have letters and numbers.

Question 12

Updated to allow for that possibility. [\d\w] matches either a letter or a number, and + allows for multiple matches.

Transhuman (3,567 reputation; 1 gold badge, 12 silver badges, 16 bronze badges) · Accepted Answer · 2016-11-07 05:09:41Z
