I have written a function to selectively extract data from a file. I want to be able to extract data starting from a certain line only, and only a given span of fields within each row.
Would converting this function into a generator reduce the overhead when I need to process large files?
import itertools
import csv

def data_extraction(filename,start_line,lenght,span_start,span_end):
    with open(filename, "r") as myfile:
        file_= csv.reader(myfile, delimiter=' ') #extracts data from .txt as lines
        return (x for x in [filter(lambda a: a != '', row[span_start:span_end]) \
                            for row in itertools.islice(file_, start_line, lenght)])
- Are you looking for general advice or are you just interested in making this a generator? The former is on-topic here, the latter is not. Too specific. Please take a look at the help center. – Mast, Sep 22, 2016 at 16:06
- Well, I am asking about increasing the performance of this working function. And according to the help center, generally applicable questions on code should go to SO, and specific questions on how to improve a piece of code should go here. – Sorade, Sep 22, 2016 at 16:39
- "Do I want feedback about any or all facets of the code?" This means we can complain about any and all facets of your code, even if it doesn't address your specific concern for generators. If you have a problem with that, CR is not the place to be. If you're fine with that, welcome! – Mast, Sep 22, 2016 at 17:12
- I'm very happy with any feedback I can get. Being self-taught, I have a lot to learn, so I'll take any constructive criticism. – Sorade, Sep 22, 2016 at 17:28
1 Answer
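Use parentheses rather than square brackets to get a generator expression; square brackets build the whole list in memory first.
The outer x for x in wrapper was also unnecessary: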
return (filter(lambda a: a != '', row[span_start:span_end])
        for row in itertools.islice(file_, start_line, lenght))
If you use Python 2, you should use itertools.ifilter, because it returns a lazy iterator, while filter returns a list. (In Python 3, filter is already lazy.)
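One caveat with returning a generator expression from inside the with block: the file is closed as soon as the function returns, so the rows can no longer be read by the time the generator is consumed. A minimal sketch of one way around this, assuming Python 3, is to make the function itself a generator with yield, so the file stays open while rows are being read. The parameter name stop_line is my own renaming (itertools.islice takes a stop index, not a length); the rest follows the question's code.

```python
import csv
import itertools


def data_extraction(filename, start_line, stop_line, span_start, span_end):
    """Lazily yield the selected fields of each selected line."""
    # The file stays open for as long as the generator is being consumed.
    with open(filename, "r", newline="") as myfile:
        reader = csv.reader(myfile, delimiter=" ")
        # islice takes (start, stop), not (start, length); stop is exclusive.
        for row in itertools.islice(reader, start_line, stop_line):
            # Slice first, then drop the empty strings left by runs of spaces.
            yield [cell for cell in row[span_start:span_end] if cell != ""]
```

Only one row is held in memory at a time, and the file is closed once the generator is exhausted or explicitly closed.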
The function is pretty clear overall. I suggest spacing your argument list according to PEP 8 conventions (a space after each comma). Also consider an easier-to-remember argument format such as f(file, line_range, inline_range),
where two tuples replace four arguments.
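As a rough illustration of the two-tuple signature, here is a sketch assuming Python 3. The function name extract and the convention that each tuple is (start, stop) with an exclusive stop are my own assumptions, not something the question specifies.

```python
import csv
import itertools


def extract(filename, line_range, inline_range):
    """Yield the non-empty fields in inline_range for each line in line_range.

    Both arguments are (start, stop) tuples; stop is exclusive (assumed).
    """
    first_line, stop_line = line_range
    first_field, stop_field = inline_range
    with open(filename, "r", newline="") as myfile:
        reader = csv.reader(myfile, delimiter=" ")
        for row in itertools.islice(reader, first_line, stop_line):
            # Keep only the non-empty fields of the requested span.
            yield [field for field in row[first_field:stop_field] if field]
```

A call then reads as extract("data.txt", (10, 20), (2, 5)), which keeps the line span and the in-line span visibly grouped.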