1
\$\begingroup\$

I have a string from which I only want to extract the numbers enclosed in single quotes.

For instance, if I have the string "hsa456456 ['1', '2', ...]"

I only want the 1, the 2, and whatever numbers follow.

To do this, I have the following code:

import re
#pattern = re.compile("dog")
#l = re.findall(pattern, "dog dog dog")
#valuepattern=re.compile('\'\d{1,6}\'')
valuepattern = re.compile('\'\d+\'')
li = []
s = "hsa04012 [[['7039', '1956', '1398', '25'], ['7039', '1956', '1399', '25']], [['1839', '1956', '1398', '25'], ['1839', '1956', '1399', '25']], [['1399', '25']], [['1398', '25']], [['727738', '1956', '1398', '25'], ['727738', '1956', '1399', '25']], [['1956', '1398', '25'], ['1956', '1399', '25']], [['1950', '1956', '1398', '25'], ['1950', '1956', '1399', '25']], [['374', '1956', '1398', '25'], ['374', '1956', '1399', '25']], [['2069', '1956', '1398', '25'], ['2069', '1956', '1399', '25']], [['685', '1956', '1398', '25'], ['685', '1956', '1399', '25']]]"
#if match:
# print match.group()
#else:
# print "no match"
l = re.findall(valuepattern, s)
#print l
for item in l:
 li.append(item.strip("'"))
 #print item
for item in li:
 print item

My main interest is minimizing the number of lists. Right now I use two: l and li. I take each item from l, strip the quotes, and append it to li. Is there a way to do this with a single list, without needing li and the appending?

asked Nov 14, 2012 at 19:39
\$\endgroup\$

3 Answers

2
\$\begingroup\$

New regex

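If you change your regular expression to the following, you won't even need str.strip(). When the pattern contains a capturing group, findall returns only the group's contents, so the quotes are left out:

valuepattern = re.compile(r"'(\d+)'")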
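As a small sketch of why the group helps, here are the two patterns side by side. This is written for Python 3, and the shortened sample string is made up for illustration rather than taken from the question.

```python
import re

# Shortened sample input, made up for illustration.
s = "hsa04012 [['7039', '1956'], ['1398', '25']]"

# Without a group, findall returns whole matches, quotes included.
with_quotes = re.findall(r"'\d+'", s)

# With exactly one capturing group, findall returns only the group's text.
without_quotes = re.findall(r"'(\d+)'", s)

print(with_quotes)
print(without_quotes)
```

The second list holds the bare digit strings, so no stripping step is needed afterwards.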

List Comprehension

Alternatively, if you don't want to change the regex, you can keep it and change the loop instead. Currently you have:

for item in l:
 li.append(item.strip("'"))

This can be replaced with a list comprehension:

l = [x.strip("'") for x in l]

Final Note

Since you compile your regular expression, you can replace

re.findall(valuepattern, s)

with

valuepattern.findall(s)
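Put together, the script might shrink to something like the sketch below, which needs only one list. It assumes Python 3 and uses a shortened, made-up input; the name `numbers` is chosen here and does not come from the question.

```python
import re

# Compile once, with a capturing group so the quotes are excluded.
valuepattern = re.compile(r"'(\d+)'")

# Shortened sample input, made up for illustration.
s = "hsa04012 [[['7039', '1956', '1398', '25'], ['1399', '25']]]"

# Calling findall on the compiled pattern gives the same result as
# re.findall(valuepattern, s).
numbers = valuepattern.findall(s)

for number in numbers:
    print(number)
```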
answered Nov 14, 2012 at 20:00
\$\endgroup\$
1
\$\begingroup\$

Well, this is not the best approach, but it is short and it works.

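def try_parse(s):
    # Return s as an int, or None if it is not a valid integer.
    try:
        return int(s)
    except ValueError:
        return None

# your_string stands for the input string (s in the question).
ls = [try_parse(x) for x in your_string.split("'")]
ls = [x for x in ls if x is not None]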

Alternatively, given that your input is representative:

ls = eval(your_string[your_string.find("["):]) # now fold into recursive list flattening...
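A rough sketch of that second idea follows, assuming the text after the first "[" is always a valid Python list literal. It swaps eval for ast.literal_eval, which only accepts literals and so does not run arbitrary code. The helper name `flatten` and the short sample string are made up here, and the code targets Python 3.

```python
import ast


def flatten(nested):
    """Yield the leaves of arbitrarily nested lists, left to right."""
    for item in nested:
        if isinstance(item, list):
            for leaf in flatten(item):
                yield leaf
        else:
            yield item


# Shortened sample input, made up for illustration.
your_string = "hsa04012 [[['7039', '1956'], ['1398', '25']], [['1399', '25']]]"

# Parse everything from the first "[" as a Python literal.
nested = ast.literal_eval(your_string[your_string.find("["):])

# Flatten the nested lists and convert each quoted value to int.
ls = [int(x) for x in flatten(nested)]
print(ls)
```

Unlike the regex and split approaches, this fails loudly with an exception if the bracketed part is malformed, which may or may not be what you want.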
answered Nov 14, 2012 at 20:13
\$\endgroup\$
2
  • \$\begingroup\$ I guess this works, but it doesn't feel like the best way to do it. It could also have problems if there are numeric values not in quotes, or if there is a stray quote somewhere. (I don't know how likely either of those scenarios is, though...) \$\endgroup\$ Commented Nov 14, 2012 at 20:18
  • \$\begingroup\$ I totally agree, but given the unknown input any method is likely to fail. \$\endgroup\$ Commented Nov 14, 2012 at 20:24
1
\$\begingroup\$

Try to avoid using regex for every problem. Many problems can be solved without it:

map(int, s.split("'")[1::2])
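To see why the slice works: splitting on the quote character makes the pieces alternate between text outside quotes and text inside quotes, so the odd-indexed pieces are the quoted values. Here is a sketch in Python 3 (where map returns an iterator, hence the list call), using a short made-up input. It assumes the quotes are balanced and that every quoted value is numeric.

```python
# Shortened sample input, made up for illustration.
s = "hsa04012 [['7039', '1956'], ['25']]"

# Pieces alternate: outside quotes, inside quotes, outside, inside, ...
pieces = s.split("'")

# Every second piece, starting at index 1, was inside a pair of quotes.
quoted = pieces[1::2]

numbers = list(map(int, quoted))
print(numbers)
```

A stray quote would shift the alternation, and a non-numeric quoted value would make int raise ValueError, which is the fragility the earlier comments point out.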
answered Nov 15, 2012 at 6:26
\$\endgroup\$
