Replace one-liner sed/awk with python

Question

I have some files that I want to process, and I know how to do it in sed/awk (for each one):

awk '{if (index(0,ドル"#")!=1) {line++; if (line%3==1) {print 2,ドル3ドル}}}' q.post > q

or

grep -v "#" q.post | awk '{if (NR%3==1) {print 2,ドル3ドル}}'

It's one line, and rather beautiful and clear.

Now, my main program is in Python (2.7). Calling sed/awk from Python is a bit tedious (I get some errors), and I'd rather use a nice Pythonic way to do it.

So far I have:

import glob

pp_files = glob.glob("*gauss.post")
for pp in pp_files:
    ppf = open(pp)
    with open(pp[:pp.rfind(".post")] + "_clean.post", "w") as outfile:
        counter = 0
        temp = []
        for line in ppf.readlines():
            if not line.startswith("#"):
                temp.append(line)
        for line in temp:
            if counter % 3 == 0:
                outfile.write(" ".join(line.split()[1:3]) + '\n')
            counter += 1
    ppf.close()

Meh.

It works, but it's not beautiful. Is there a Pythonic way, preferably a clear one-liner (not 10 nested list comprehensions), to replace awk and sed?

Thanks

Answer

First, you should move open(pp) into your with statement. Always use with when opening files: it guarantees the file is closed, even if an error occurs.
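A minimal sketch of that guarantee, assuming a throwaway temporary file (the file and the ValueError raised are purely illustrative): the file object reports closed as soon as the with block exits, even when it exits because of an exception.

```python
import os
import tempfile

# Create a throwaway file to open (illustrative only).
fd, path = tempfile.mkstemp(suffix=".post")
os.close(fd)

try:
    with open(path) as ppf:
        # Simulate an error while the file is being processed.
        raise ValueError("something went wrong")
except ValueError:
    pass

# The with statement closed the file despite the exception.
print(ppf.closed)  # True

os.remove(path)
```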

But on to your code. You seem to dislike comprehensions, and I don't really see why. Take this part of your code:

for line in ppf.readlines():
    if not line.startswith("#"):
        temp.append(line)

This can instead be:

[line for line in ppf if not line.startswith("#")]
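To check that the comprehension really is a drop-in replacement for the loop, this small sketch runs both on the same input. A plain list of strings stands in for the file object (an assumption for illustration; iterating over an open file yields its lines the same way), and the sample lines are made up.

```python
# Made-up sample lines standing in for the contents of a .post file.
lines = [
    "# header comment\n",
    "1 0.1 0.2\n",
    "# another comment\n",
    "2 0.3 0.4\n",
]

# Original approach: explicit loop with append.
temp = []
for line in lines:
    if not line.startswith("#"):
        temp.append(line)

# Equivalent list comprehension.
kept = [line for line in lines if not line.startswith("#")]

print(kept == temp)  # True
```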

I know which I find easier to read, but if you don't like it, fair dues. After this, I'd slice the list, since you want every third line. Say you have the string abcdefghijk but only want every third character: 'abcdefghijk'[::3] gives adgj. Slicing the filtered list the same way removes the need for counter, and so simplifies your code to:

for pp in pp_files:
    with open(pp) as ppf, open(pp[:pp.rfind(".post")] + "_clean.post", "w") as outfile:
        for line in [line for line in ppf if not line.startswith("#")][::3]:
            outfile.write(" ".join(line.split()[1:3]) + '\n')
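One point worth verifying is that [::3] picks the same lines as the original awk program. awk's line counter starts at 1 and line%3==1 keeps lines 1, 4, 7, ...; Python indices start at 0, so [::3] keeps indices 0, 3, 6, ..., which are the same lines. The sketch below wraps the list version in a small function (clean_lines is a hypothetical name, not from the original) and runs it on made-up data.

```python
def clean_lines(lines):
    """Drop comment lines, keep every third remaining line, return fields 2 and 3.

    Aims to mirror:
    awk '{if (index(0,ドル"#")!=1) {line++; if (line%3==1) {print 2,ドル3ドル}}}'
    """
    kept = [line for line in lines if not line.startswith("#")]
    return [" ".join(line.split()[1:3]) for line in kept[::3]]


# Made-up sample: one comment line followed by seven data lines.
sample = ["# comment\n"] + ["%d a%d b%d c%d\n" % (i, i, i, i) for i in range(1, 8)]

print(clean_lines(sample))  # ['a1 b1', 'a4 b4', 'a7 b7']
```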

But if your file is large, the list-based version reads all of it into a list and then copies a third of it into another list. That's wasteful. If you instead use a generator expression and itertools.islice, you get the same result, but the program uses less memory because lines are processed one at a time:

from itertools import islice

for pp in pp_files:
    with open(pp) as ppf, open(pp[:pp.rfind(".post")] + "_clean.post", "w") as outfile:
        for line in islice((line for line in ppf if not line.startswith("#")), 0, None, 3):
            outfile.write(" ".join(line.split()[1:3]) + '\n')
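The same pipeline can be packaged as a lazy generator. Below is a sketch (iter_clean_lines is a hypothetical name) that combines the generator expression and itertools.islice from the answer and checks it against the list-slicing version on made-up data. Because nothing is materialized, it also works on input too large to hold in memory; here an endless generator stands in for such a file.

```python
from itertools import count, islice


def iter_clean_lines(lines):
    """Lazily yield fields 2 and 3 of every third non-comment line."""
    data = (line for line in lines if not line.startswith("#"))
    for line in islice(data, 0, None, 3):
        yield " ".join(line.split()[1:3])


# Made-up sample: comment lines interleaved with data lines.
sample = ["# comment\n", "1 a1 b1\n", "2 a2 b2\n", "# note\n",
          "3 a3 b3\n", "4 a4 b4\n", "5 a5 b5\n"]

# Eager equivalent: filter into a list, then slice with [::3].
kept = [line for line in sample if not line.startswith("#")]
eager = [" ".join(line.split()[1:3]) for line in kept[::3]]
print(list(iter_clean_lines(sample)) == eager)  # True

# Works on an endless stream too: take only the first three results.
endless = ("%d a%d b%d\n" % (i, i, i) for i in count(1))
print(list(islice(iter_clean_lines(endless), 3)))  # ['a1 b1', 'a4 b4', 'a7 b7']
```

Writing to the output file then stays a plain loop over iter_clean_lines(ppf), so at most one line is held in memory at a time.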

Peilonrayz · Accepted Answer · 2016年11月30日 12:51:01Z
