how to fast processing one million strings to remove quotes
Daiyue Weng
daiyueweng at gmail.com
Wed Aug 2 13:48:20 EDT 2017
That works superbly! Any idea how to split the task across multiple
processes and concatenate the results from each process back into a list?
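
One possible way to do this is sketched below with the standard library's multiprocessing.Pool. The names clean_line and clean_lines_parallel, the chunksize value and the sample data are assumptions made for illustration, not something from the thread. Pool.map returns the results as a single list in the same order as the input, so no manual concatenation is needed.

```python
from multiprocessing import Pool


def clean_line(line):
    """Drop the outer quotes, then collapse each doubled quote into one."""
    return line[1:-1].replace('""', '"')


def clean_lines_parallel(lines, processes=None, chunksize=10_000):
    """Clean every line using a pool of worker processes.

    chunksize (an assumed starting value) batches the lines so that each
    worker receives many strings per round trip instead of one at a time.
    """
    with Pool(processes=processes) as pool:
        # map() preserves input order and returns one flat list.
        return pool.map(clean_line, lines, chunksize=chunksize)


if __name__ == "__main__":
    # Hypothetical sample data modelled on the example in the thread.
    sample = ['"""str_value1"",""str_value2"",""str_value3"",1,""str_value4"""'] * 1000
    cleaned = clean_lines_parallel(sample, processes=2, chunksize=100)
    print(cleaned[0])
```

For work this cheap per string, the cost of sending the strings to the workers and back may outweigh the gain, so it is worth timing this against the plain single-process loop before relying on it.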
On 2 August 2017 at 18:05, MRAB <python at mrabarnett.plus.com> wrote:
> On 2017-08-02 16:05, Daiyue Weng wrote:
>> Hi, I am trying to remove extra quotes from a large set of strings (a
>> list of strings), so each original string looks like,
>>
>> """str_value1"",""str_value2"",""str_value3"",1,""str_value4"""
>>
>> I would like to remove the start and end quotes and the extra pairs of
>> quotes on each string value, so the result will look like,
>>
>> "str_value1","str_value2","str_value3",1,"str_value4"
>>
>> and then join the strings with a newline.
>>
>> I have tried the following code,
>>
>> for line in str_lines[1:]:
>>     strip_start_end_quotes = line[1:-1]
>>     splited_line_rem_quotes = strip_start_end_quotes.replace('\"\"', '"')
>>     str_lines[str_lines.index(line)] = splited_line_rem_quotes
>>
>> for_pandas_new_headers_str = '\n'.join(splited_lines)
>>
>> but it is really slow (running for ages) if the list contains over 1
>> million string lines. I am thinking about a fast way to do that.
>>
> [snip]
> The problem is the line:
>
>     str_lines[str_lines.index(line)]
>
> It does a linear search through str_lines until it finds a match for the
> line.
>
> To find the 10th line it must search through the first 10 lines.
> To find the 100th line it must search through the first 100 lines.
> To find the 1000th line it must search through the first 1000 lines.
> And so on.
>
> In Big-O notation, the performance is O(n**2).
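
To make the quadratic growth concrete, here is a small sketch that counts the comparisons a linear search has to make when every line is looked up in turn. linear_index and total_comparisons are hypothetical helpers that mimic what list.index has to do; they are not part of the original code, and the generated lines are assumed to be distinct.

```python
def linear_index(items, target):
    """Return (position, comparisons) for a left-to-right search."""
    for comparisons, item in enumerate(items, start=1):
        if item == target:
            return comparisons - 1, comparisons
    raise ValueError("target not in list")


def total_comparisons(n):
    """Total comparisons needed to look up each of n distinct lines once."""
    items = [f"line {i}" for i in range(n)]
    return sum(linear_index(items, item)[1] for item in items)


for n in (10, 100, 1000):
    # The total is n * (n + 1) / 2, which grows with the square of n.
    print(n, total_comparisons(n))
```

Multiplying the number of lines by ten multiplies the work by roughly a hundred, which is why a list of a million lines appears to run forever.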
> The Pythonic way of doing it is to put the results into a new list:
>
> new_str_lines = str_lines[:1]
>
> for line in str_lines[1:]:
>     strip_start_end_quotes = line[1:-1]
>     splited_line_rem_quotes = strip_start_end_quotes.replace('\"\"', '"')
>     new_str_lines.append(splited_line_rem_quotes)
>
> In Big-O notation, the performance is O(n).
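
The same single-pass idea can also be written as a list comprehension. This is only a sketch: the clean_line helper and the sample data (a made-up header line plus the example string from the thread) are assumptions for illustration.

```python
def clean_line(line):
    """Drop the outer quotes, then collapse each doubled quote into one."""
    return line[1:-1].replace('""', '"')


# Hypothetical input: the first line is kept untouched, as in the loop above.
str_lines = [
    "col1,col2,col3,col4,col5",
    '"""str_value1"",""str_value2"",""str_value3"",1,""str_value4"""',
]

# One pass over the data, building a new list instead of searching the old one.
new_str_lines = str_lines[:1] + [clean_line(line) for line in str_lines[1:]]

# Join the cleaned lines with newlines, as in the original question.
result = "\n".join(new_str_lines)
print(result)
```

Each line is visited exactly once and nothing is looked up by value, so the running time stays proportional to the number of lines.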
> --
> https://mail.python.org/mailman/listinfo/python-list
>