Minimize Number of Lists

Question 1

I have a string from which I only want the numbers enclosed in single quotes.

For instance, given the string "hsa456456 ['1', '2', ...]", I only want the 1, the 2, and whatever numbers follow.

To do this, I have the following code:

import re

# Match a run of digits wrapped in single quotes, e.g. '7039'
valuepattern = re.compile(r"'\d+'")
li = []
s = "hsa04012 [[['7039', '1956', '1398', '25'], ['7039', '1956', '1399', '25']], [['1839', '1956', '1398', '25'], ['1839', '1956', '1399', '25']], [['1399', '25']], [['1398', '25']], [['727738', '1956', '1398', '25'], ['727738', '1956', '1399', '25']], [['1956', '1398', '25'], ['1956', '1399', '25']], [['1950', '1956', '1398', '25'], ['1950', '1956', '1399', '25']], [['374', '1956', '1398', '25'], ['374', '1956', '1399', '25']], [['2069', '1956', '1398', '25'], ['2069', '1956', '1399', '25']], [['685', '1956', '1398', '25'], ['685', '1956', '1399', '25']]]"
# First list: every quoted match, quotes included
l = re.findall(valuepattern, s)
# Second list: the same items with the surrounding quotes stripped
for item in l:
    li.append(item.strip("'"))
for item in li:
    print(item)

My area of interest is minimizing the number of lists. Right now I use two, l and li: I take each item from l, strip its quotes, and append it to li. Is there a way to do this with a single list, without needing li and the append step?

Answer 1

New regex

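If you change your regular expression to the following, you won't even need to call str.strip(). With a capture group in the pattern, findall returns only the text inside the group, so the quotes are never included:

valuepattern = re.compile(r"'(\d+)'")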
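As a minimal sketch of the difference the capture group makes, the snippet below runs both patterns over a shortened sample string. The sample is illustrative only and is not the full input from the question.

```python
import re

# Shortened, illustrative sample in the same shape as the question's input.
s = "hsa04012 [[['7039', '1956'], ['1399', '25']]]"

# Without a group, findall returns each whole match, quotes included.
with_quotes = re.findall(r"'\d+'", s)

# With exactly one capture group, findall returns only the group's text.
numbers = re.findall(r"'(\d+)'", s)

print(with_quotes)
print(numbers)
```

Because the second call already yields bare digit strings, a single list is enough and no stripping pass is required.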

List Comprehension

Alternatively, if you would rather keep your original pattern, you could do the following. Currently you have:

for item in l:
    li.append(item.strip("'"))

This can be replaced with a list comprehension, which rebinds l to the stripped values so that li is no longer needed:

l = [x.strip("'") for x in l]

Final Note

Since you have already compiled your regular expression, you can replace

re.findall(valuepattern, s)

with

valuepattern.findall(s)
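To illustrate, here is a small sketch showing that the module-level call and the method on the compiled pattern give the same result. The sample string is made up for the example, and the pattern with a capture group is assumed.

```python
import re

valuepattern = re.compile(r"'(\d+)'")

# Illustrative sample string, not the full input from the question.
s = "hsa456456 ['1', '2', '30']"

# Module-level function, passing the compiled pattern as an argument.
via_module = re.findall(valuepattern, s)

# Method on the compiled pattern itself; same result, less to type.
via_method = valuepattern.findall(s)

for item in via_method:
    print(item)
```

Either way, only one list is built and printed.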

Answer 2

Well, this is not the best approach, but it is kind of 'short and works'.

def try_parse(s):
    # Return s as an int, or None if it is not a valid integer
    try:
        return int(s)
    except ValueError:
        return None

ls = [try_parse(x) for x in your_string.split("'")]
ls = [x for x in ls if x is not None]

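Alternatively, if your input is representative:

import ast  # literal_eval is a safer stand-in for eval: it only accepts Python literals
ls = ast.literal_eval(your_string[your_string.find("["):]) # now fold into recursive list flattening...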
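The "recursive list flattening" step mentioned above could look like the following sketch. It parses the bracketed part of the string with `ast.literal_eval`, which only accepts Python literals and so is safer than a bare `eval`. The `flatten` helper and the shortened sample string are hypothetical, introduced here only for illustration.

```python
import ast


def flatten(nested):
    """Yield the leaf values of an arbitrarily nested list, left to right."""
    for element in nested:
        if isinstance(element, list):
            # Recurse into sub-lists and pass their leaves through.
            for leaf in flatten(element):
                yield leaf
        else:
            yield element


# Shortened, illustrative sample in the same shape as the question's input.
your_string = "hsa04012 [[['7039', '1956'], ['1399', '25']], [['1398', '25']]]"

# Parse everything from the first '[' onward as a Python literal.
nested = ast.literal_eval(your_string[your_string.find("["):])

ls = [int(x) for x in flatten(nested)]
print(ls)
```

This only works if the text after the first bracket really is a valid Python list literal, which is the "input is representative" assumption stated above.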

Comment 1

I guess this works, but it doesn't feel like the best way to do it. It could also have problems if there are numeric values not in quotes, or if there is a stray quote somewhere. (I don't know how likely either of those scenarios is, though.)

Comment 2

I totally agree, but given the unknown input any method is likely to fail.

Answer 3

Try to avoid using regex for every problem; many problems can be solved without it.

list(map(int, s.split("'")[1::2]))
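A short sketch of why the `[1::2]` slice works, and where it stops working: splitting on the quote character puts the quoted values at the odd indices, but only as long as every quote in the string belongs to a pair around a number. Both sample strings are made up for the example.

```python
# Illustrative sample string, not the full input from the question.
s = "hsa456456 ['1', '2', '30']"

# Splitting on the quote alternates between outside and inside the quotes,
# so the quoted values land at indices 1, 3, 5, ...
parts = s.split("'")
numbers = list(map(int, parts[1::2]))
print(numbers)

# A single stray quote shifts the alternation, and int() then receives
# text that is not a number.
broken = "it's ['1', '2']"
try:
    list(map(int, broken.split("'")[1::2]))
    stray_quote_ok = True
except ValueError:
    stray_quote_ok = False
print(stray_quote_ok)
```

So the approach is compact, but it assumes well-formed input with no unpaired quotes.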

Matt · Accepted Answer · 2012-11-14 20:00:59Z

