I have a string from which I only want the numbers enclosed in single quotes.
For instance, given the string "hsa456456 ['1', '2', ...]",
I only want the 1, the 2, and whatever numbers follow.
To do this, I have the following code:
import re
#pattern = re.compile("dog")
#l = re.findall(pattern, "dog dog dog")
#valuepattern=re.compile('\'\d{1,6}\'')
valuepattern = re.compile('\'\d+\'')
li = []
s = "hsa04012 [[['7039', '1956', '1398', '25'], ['7039', '1956', '1399', '25']], [['1839', '1956', '1398', '25'], ['1839', '1956', '1399', '25']], [['1399', '25']], [['1398', '25']], [['727738', '1956', '1398', '25'], ['727738', '1956', '1399', '25']], [['1956', '1398', '25'], ['1956', '1399', '25']], [['1950', '1956', '1398', '25'], ['1950', '1956', '1399', '25']], [['374', '1956', '1398', '25'], ['374', '1956', '1399', '25']], [['2069', '1956', '1398', '25'], ['2069', '1956', '1399', '25']], [['685', '1956', '1398', '25'], ['685', '1956', '1399', '25']]]"
#if match:
# print match.group()
#else:
# print "no match"
l = re.findall(valuepattern, s)
#print l
for item in l:
    li.append(item.strip("'"))
    #print item
for item in li:
    print item
My main interest is in minimizing the number of lists. Right now I use two, l and li: I take each item from l, strip it, and append it to li. Is there a way to do all of this with one list, without needing li and the appending?
3 Answers
New regex
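If you change your regular expression to the following, you won't even need str.strip(): when a pattern contains a capture group, re.findall returns the group's text instead of the whole match.
valuepattern = re.compile(r"'(\d+)'")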
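To make the difference visible, here is a minimal sketch in Python 3 syntax. The string `sample` is a shortened stand-in for the one in the question, and the variable names are only illustrative.

```python
import re

# Shortened stand-in for the string in the question (an assumption for brevity).
sample = "hsa04012 [[['7039', '1956'], ['1399', '25']]]"

# Without a group, findall returns the whole match, quotes included.
with_quotes = re.findall(r"'\d+'", sample)

# With one capture group, findall returns only the group's text.
without_quotes = re.findall(r"'(\d+)'", sample)

print(with_quotes)     # ["'7039'", "'1956'", "'1399'", "'25'"]
print(without_quotes)  # ['7039', '1956', '1399', '25']
```

The digits in "hsa04012" are skipped in both cases because they are not surrounded by single quotes.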
List Comprehension
Alternatively if you don't want to do that, you could do the following. Currently you have:
for item in l:
    li.append(item.strip("'"))
This can be replaced with a list comprehension:
l = [x.strip("'") for x in l]
Final Note
Since you compile your regular expression, you can replace
re.findall(valuepattern, s)
with
valuepattern.findall(s)
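Putting these suggestions together, the whole task might fit in one list. This is only a sketch in Python 3 syntax: `VALUE_PATTERN` and `extract_numbers` are names made up for illustration, and `s` is a shortened version of the string from the question.

```python
import re

# Compile once; the capture group keeps the quotes out of the results.
VALUE_PATTERN = re.compile(r"'(\d+)'")


def extract_numbers(text):
    """Return every quoted run of digits in text as a list of strings."""
    return VALUE_PATTERN.findall(text)


# Shortened version of the question's input (an assumption for brevity).
s = "hsa04012 [[['7039', '1956', '1398', '25'], ['7039', '1956', '1399', '25']]]"

for item in extract_numbers(s):
    print(item)
```

No second list and no stripping step is needed, because the compiled pattern's own findall method already returns the bare digits.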
Well this is not the best, but kind of 'short and works'.
def try_parse(s):
    # Return s as an int, or None if it is not a number.
    try:
        return int(s)
    except ValueError:
        return None
ls = [try_parse(x) for x in your_string.split("'")]
ls = filter(lambda s: s is not None, ls)
Alternatively, given that your input is representative:
ls = eval(your_string[your_string.find("["):]) # now fold into recursive list flattening...
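The flattening step is left open above, so here is one possible sketch of it in Python 3 syntax. It swaps in `ast.literal_eval` for `eval`, since it only accepts literals and will not run arbitrary code. It assumes that everything from the first "[" onward is a valid Python list literal. `flatten` is a hypothetical helper and `your_string` is a shortened sample.

```python
import ast


def flatten(nested):
    """Yield the leaves of an arbitrarily nested list, depth first."""
    for element in nested:
        if isinstance(element, list):
            yield from flatten(element)
        else:
            yield element


# Shortened sample input (an assumption for brevity).
your_string = "hsa04012 [[['7039', '1956'], ['1839', '1956']], [['1399', '25']]]"

# Assumes the text from the first "[" onward is a valid list literal.
nested = ast.literal_eval(your_string[your_string.find("["):])
ls = [int(x) for x in flatten(nested)]
print(ls)  # [7039, 1956, 1839, 1956, 1399, 25]
```

If the string contains no "[" at all, `find` returns -1 and the slice no longer means what is intended, so real input would need a check for that case.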
I guess this works, but it doesn't feel like the best way to do it. It could also have problems if there are numeric values not in quotes, or if there is a stray quote somewhere. (I don't know how likely either of those scenarios are though...) – Matt, Nov 14, 2012 at 20:18
I totally agree, but given the unknown input any method is likely to fail. – avip, Nov 14, 2012 at 20:24
Try to avoid using regex for every problem. Many problems can be solved without it:
map(int, s.split("'")[1::2])
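Why the `[1::2]` slice works may not be obvious, so here is a small sketch in Python 3 syntax (where `map` needs wrapping in `list`). It uses a shortened sample string, and the names `pieces` and `numbers` are illustrative.

```python
# Shortened sample input (an assumption for brevity).
s = "hsa04012 [['7039', '1956'], ['1399', '25']]"

pieces = s.split("'")
# The pieces alternate between text outside the quotes and text inside them,
# so the odd indices hold the quoted values.
numbers = list(map(int, pieces[1::2]))
print(numbers)  # [7039, 1956, 1399, 25]
```

This relies on the quotes being perfectly paired and on every quoted value being a number. A single stray quote shifts the alternation, and `int` then raises a ValueError on the first non-numeric piece.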