Search string in a list

Question

data = ['HTTP/1.1 200 OK',
        'CACHE-CONTROL: max-age=1810',
        'DATE: 14 May 2014 12:15:19 GMT',
        'EXT:',
        'LOCATION: http://192.168.94.57:9000/DeviceDescription.xml',
        'SERVER: Windows NT/5.0, UPnP/1.0, pvConnect UPnP SDK/1.0',
        'ST: uuid:7076436f-6e65-1063-8074-78542e239ff5',
        'USN: uuid:7076436f-6e65-1063-8074-78542e239ff5',
        'Content-Length: 0',
        '',
        '']

From the above list, I have to extract the .xml link.

My code:

for element in data:
    if 'LOCATION' in element:
        xmllink = element.split(': ')[1]

It's taking too much time. How can I make this faster?

Actually, I am doing SSDP discovery to find devices on a network. After sending the M-SEARCH command, the devices reply with a datagram packet, which I store in the data variable. From it I have to extract the device's file link for further processing.

When I used direct indexing to extract it, it was fast.
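For context, here is a minimal sketch of the discovery step described above, using only the standard socket module. It assumes the usual SSDP multicast address 239.255.255.250:1900 and an ssdp:all search target. build_msearch and discover are hypothetical names, not the question's code. Splitting a reply on \r\n produces a list shaped like data, and the blank line that ends the headers plausibly explains the two trailing empty strings.

```python
import socket

# Standard SSDP multicast group and port (assumed; see the UPnP Device Architecture).
SSDP_ADDR = ("239.255.255.250", 1900)


def build_msearch(st="ssdp:all", mx=2):
    """Build an M-SEARCH request; st is the search target, mx the max reply delay (s)."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        "HOST: 239.255.255.250:1900",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",
        f"ST: {st}",
        "",  # these two empty items make the request end with a blank line
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def discover(timeout=3.0):
    """Send one M-SEARCH and return each reply as a list of header lines."""
    responses = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
        sock.settimeout(timeout)  # applies to each recvfrom() call separately
        sock.sendto(build_msearch(), SSDP_ADDR)
        try:
            while True:
                datagram, _addr = sock.recvfrom(65507)
                # Splitting on CRLF gives a list shaped like `data` in the question.
                responses.append(datagram.decode("utf-8", errors="replace").split("\r\n"))
        except socket.timeout:
            pass  # no reply for `timeout` seconds: treat discovery as finished
    return responses
```

The loop in discover() only ends when a recvfrom() call times out, so each call waits at least timeout seconds after the last reply. If the timing in the question included a wait like this, that wait may account for the slowness more than the string search does.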

Comment

I cannot understand how a split over a small list like that can take so much time. Have you used some sort of profiling to make sure that this part is the problem?
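One way to answer the profiling question is to time the parsing step on its own with the standard timeit module. This sketch uses a subset of the question's list and a hypothetical find_location helper. If the per-call figure is tiny compared with the delay you observe, the time is probably being spent elsewhere, for example while receiving the datagrams.

```python
import timeit

# A subset of the `data` list from the question (enough to exercise the search).
data = ['HTTP/1.1 200 OK', 'CACHE-CONTROL: max-age=1810', 'EXT:',
        'LOCATION: http://192.168.94.57:9000/DeviceDescription.xml',
        'SERVER: Windows NT/5.0, UPnP/1.0, pvConnect UPnP SDK/1.0',
        'Content-Length: 0', '', '']


def find_location(headers):
    """Return the LOCATION header's value, or None (hypothetical helper)."""
    for header in headers:
        if header.startswith('LOCATION:'):
            return header.split(':', 1)[1].strip()
    return None


RUNS = 100_000
elapsed = timeit.timeit(lambda: find_location(data), number=RUNS)
print(f"{elapsed / RUNS * 1e6:.3f} microseconds per call")
```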

Comment

@Marc-Andre Actually, I am doing SSDP discovery for devices on the network. After sending the M-SEARCH command, devices respond with a datagram packet, which I have stored in "data", and processing it takes a long time. Earlier I used direct indexing to find "LOCATION", and it was fast.

Comment

That information is important for a review! You should edit it into the question.


Answer 1 · 200_success · 2014-05-14 14:50:40Z

You want to test for element.startswith('LOCATION: '). You are doing 'LOCATION' in element, which is not only slower, since it has to check every position of every element, but might also lead to false matches.

Also, element and data are poor names. I suggest header and headers, respectively.

My suggestion:

LOC = 'LOCATION: '
# headers is the question's data list, renamed as suggested above
xmllinks = [header[len(LOC):] for header in headers if header.startswith(LOC)]
if xmllinks:
    xmllink = xmllinks[0]
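To make the false-match point concrete, here is a small sketch with one hypothetical extra header (X-NOTE is invented for illustration) whose value happens to contain the word LOCATION. The question's substring test picks it up, but the startswith version does not.

```python
LOC = 'LOCATION: '

# Hypothetical response: the X-NOTE header is made up to trigger a false match.
headers = [
    'HTTP/1.1 200 OK',
    'LOCATION: http://192.168.94.57:9000/DeviceDescription.xml',
    'X-NOTE: LOCATION may change',
]

# Substring test, as in the question: both headers match, and because the
# loop never breaks, the last match overwrites the correct one.
for header in headers:
    if 'LOCATION' in header:
        substring_result = header.split(': ')[1]

# Prefix test, as suggested: only the real LOCATION header matches.
xmllinks = [header[len(LOC):] for header in headers if header.startswith(LOC)]
prefix_result = xmllinks[0] if xmllinks else None

print(substring_result)  # LOCATION may change
print(prefix_result)     # http://192.168.94.57:9000/DeviceDescription.xml
```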

Comment: Based on the original implementation, is it safe to assume there will be exactly one space after 'LOCATION:'? Searching for just 'LOCATION:' and then using strip() to remove any extra whitespace would probably be safer.
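Here is a sketch of the comment's suggestion. It splits each header once at the first colon with str.partition and strip()s the value, so any amount of whitespace after 'LOCATION:' is accepted. The case-insensitive name comparison goes beyond the comment. It is included on the assumption that some devices may send Location: instead of LOCATION:, since HTTP header names are case-insensitive. location_value is a hypothetical name.

```python
def location_value(headers):
    """Return the LOCATION header's value, or None if there is none.

    Tolerates any whitespace around the colon and any letter case in the name.
    """
    for header in headers:
        name, sep, value = header.partition(':')
        # sep is '' when the line has no colon at all (e.g. the trailing '' lines)
        if sep and name.strip().upper() == 'LOCATION':
            return value.strip()
    return None


print(location_value(['HTTP/1.1 200 OK',
                      'Location:http://192.168.94.57:9000/DeviceDescription.xml']))
```

partition() splits only at the first colon, so the colons inside the URL are left alone.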

Answer 2 · Pranav Raj · 2014-05-14 14:22:55Z

I am not sure what is causing the problem; this piece of code should not take much time. But here are some suggestions to speed up your code:

Instead of a list, create a set: membership lookup in a set takes constant time on average.
The main drawback of a set is that it uses more memory than a list. Another option is to keep the list sorted (if possible) and search it with the bisect module.
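Here is a sketch of both ideas, assuming the header list from the question. A set gives fast exact-membership tests. bisect on a sorted copy can also find the first element with a given prefix, because strings that share a prefix are adjacent in sorted order. find_prefix is a hypothetical helper. Sorting costs O(n log n), so for a single lookup in a list this short, a plain linear scan is simpler and likely no slower.

```python
import bisect

# A subset of the header list from the question.
headers = [
    'HTTP/1.1 200 OK',
    'CACHE-CONTROL: max-age=1810',
    'EXT:',
    'LOCATION: http://192.168.94.57:9000/DeviceDescription.xml',
    'SERVER: Windows NT/5.0, UPnP/1.0, pvConnect UPnP SDK/1.0',
    'Content-Length: 0',
    '',
    '',
]

# Set: O(1) average-time membership, but only for the *whole* string.
header_set = set(headers)
print('EXT:' in header_set)       # True
print('LOCATION:' in header_set)  # False: no element is exactly 'LOCATION:'

# Sorted list + bisect: strings sharing a prefix are contiguous in sorted
# order, and the first of them (if any) sits at bisect_left(items, prefix).
sorted_headers = sorted(headers)


def find_prefix(sorted_items, prefix):
    """Return the first element of sorted_items starting with prefix, or None."""
    i = bisect.bisect_left(sorted_items, prefix)
    if i < len(sorted_items) and sorted_items[i].startswith(prefix):
        return sorted_items[i]
    return None


print(find_prefix(sorted_headers, 'LOCATION:'))
# LOCATION: http://192.168.94.57:9000/DeviceDescription.xml
```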

Now some style suggestions:

for element in data:
    if 'LOCATION' in element:
        xmllink = element.split(': ')[1]

Rewrite it as:

for element in data:
    if 'LOCATION' in element and ':' in element:
        # maxsplit=1: the URL itself contains colons (http:, :9000)
        xmllink = element.split(':', 1)[1].strip()

This ensures that a string such as 'LOCATION something' (with no colon) in the list does not raise an IndexError.

.strip() is a better way to remove the whitespace around the value: it removes both leading and trailing whitespace, including the space left after the colon.
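The LOCATION value is a URL that contains colons of its own (after http and before the port 9000), so the split has to stop at the first colon. Here is a quick check using the header from the question:

```python
header = 'LOCATION: http://192.168.94.57:9000/DeviceDescription.xml'

# Splitting on every colon also cuts the URL at "http:" and ":9000".
print(header.split(':'))
# ['LOCATION', ' http', '//192.168.94.57', '9000/DeviceDescription.xml']

# maxsplit=1 splits only at the first colon, so the URL stays intact;
# strip() then removes the leading space left after the colon.
print(header.split(':', 1)[1].strip())
# http://192.168.94.57:9000/DeviceDescription.xml
```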

Comment: Using a set only works if you want to check the whole element. Checking whether "LOCATION: http://www.example.com/test.xml" is in the set would be fast, but what is needed here is something like a set-wide startswith("LOCATION: "), which doesn't exist.

Answer 3 · tokland · 2014-05-14 14:16:55Z

With such a small input it should be lightning fast. Anyway, shouldn't you break after finding the element? I'd write:

# next() stops the generator at the first matching header
xmllink = next(s.split(":", 1)[1].strip() for s in data if s.startswith("LOCATION:"))
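One caveat with next(): if no header starts with "LOCATION:", it raises StopIteration. Here is a variant that passes None as the default. Note the extra parentheses the generator needs once it is no longer the sole argument. The wrapper name find_location is hypothetical.

```python
def find_location(headers):
    """Return the LOCATION value, or None if no such header exists."""
    # The generator expression needs its own parentheses because it is not
    # the only argument to next().
    return next(
        (h.split(":", 1)[1].strip() for h in headers if h.startswith("LOCATION:")),
        None,
    )


print(find_location(['HTTP/1.1 200 OK',
                     'LOCATION: http://192.168.94.57:9000/DeviceDescription.xml']))
print(find_location(['HTTP/1.1 200 OK']))  # None
```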

Edited 2014-05-14 20:12

Comment: How does this code ensure a speedup? (Pranav Raj, 2014-05-14 14:26:26Z)

Comment: Because it breaks after finding the match. I'm not sure how the OP is having speed problems here, though. (tokland, 2014-05-14 14:26:52Z)
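The early exit can be observed directly. This sketch wraps the test in a hypothetical is_location function that records every header it examines. next() over a generator stops at the first match, but a list comprehension examines every element. With a list this short, the saving is only a few comparisons.

```python
checked = []


def is_location(header):
    checked.append(header)  # record each header that gets examined
    return header.startswith("LOCATION:")


headers = [
    'HTTP/1.1 200 OK',
    'CACHE-CONTROL: max-age=1810',
    'LOCATION: http://192.168.94.57:9000/DeviceDescription.xml',
    'SERVER: Windows NT/5.0, UPnP/1.0, pvConnect UPnP SDK/1.0',
    'Content-Length: 0',
    '',
]

# Lazy: the generator is abandoned as soon as next() gets the first match.
link = next(h for h in headers if is_location(h))
lazy_checks = len(checked)

checked.clear()
# Eager: the list comprehension tests every header before indexing.
link_eager = [h for h in headers if is_location(h)][0]
eager_checks = len(checked)

print(lazy_checks, eager_checks)  # 3 6
```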

Comment: @tokland Read the comments under the question, he's giving a bit more information about the speed problem. (Marc-Andre, 2014-05-14 14:28:09Z)

Comment: While I agree with the break suggestion, I think using regex is totally overkill. (Simon Forsberg, 2014-05-14 16:12:13Z)

Comment: Some refactors. (tokland, 2014-05-14 20:14:46Z)
