\$\begingroup\$

I have some files that I want to process, and I know how to do it in sed/awk (for each one):

awk '{if (index($0,"#")!=1) {line++; if (line%3==1) {print $2,$3}}}' q.post > q

or

grep -v "#" q.post | awk '{if (NR%3==1) {print $2,$3}}'

It's one line, and rather beautiful and clear.

Now, my main program is in Python (2.7). Calling sed/awk from Python is a bit tedious (I get some error), and I'd rather do it in a nice Pythonic way.

So far I have:

pp_files = glob.glob("*gauss.post")
for pp in pp_files:
    ppf = open(pp)
    with open(pp[:pp.rfind(".post")] + "_clean.post", "w") as outfile:
        counter = 0
        temp = []
        for line in ppf.readlines():
            if not line.startswith("#"):
                temp.append(line)
        for line in temp:
            if counter % 3 == 0:
                outfile.write(" ".join(line.split()[1:3]) + '\n')
            counter += 1
    ppf.close()

Meh.

It works, but it's not beautiful. Is there a Pythonic way, preferably a clear one-liner (not 10 nested list comprehensions), to replace awk and sed?

Thanks

jacwah
asked Nov 30, 2016 at 12:01
\$\endgroup\$

1 Answer

\$\begingroup\$

First, you should add open(pp) to your with statement. Always use with when calling open: it guarantees the file is closed, even if an error occurs.
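As a small illustration of that point, here is a minimal sketch. The scratch file name and the deliberately raised error are made up for the demonstration; the file is still closed once the with block exits:

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.post")
with open(path, "w") as f:
    f.write("# header\n1 2 3\n")

try:
    with open(path) as ppf:
        first = ppf.readline()
        # Simulate something going wrong in the middle of processing.
        raise ValueError("simulated failure while processing")
except ValueError:
    pass

# The with block closed the file on the way out, despite the exception.
closed_after_error = ppf.closed
print(closed_after_error)  # True
```

With a bare open(pp) and a ppf.close() at the end of the loop body, as in your code, the close call would be skipped if anything in between raised.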

But onto your code. You seem to dislike comprehensions. I don't really get why. Take your code:

for line in ppf.readlines():
    if not line.startswith("#"):
        temp.append(line)

This can instead be:

[line for line in ppf if not line.startswith("#")]

I know which I find easier to read, but if you don't like it, fair enough. Next, since you want every third line, I'd slice the list. A slice takes an optional step: given the string abcdefghijk, 'abcdefghijk'[::3] picks every third character and gives adgj. This removes the need for counter, so your code simplifies to:

for pp in pp_files:
    with open(pp) as ppf, open(pp[:pp.rfind(".post")] + "_clean.post", "w") as outfile:
        for line in [line for line in ppf if not line.startswith("#")][::3]:
            outfile.write(" ".join(line.split()[1:3]) + '\n')
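If you want to convince yourself that the slice really selects the same lines as your counter, a quick check along these lines should do. The sample lines are invented stand-ins for the contents of a .post file:

```python
# Made-up sample lines standing in for the contents of a .post file.
lines = ["# comment\n", "a 1 2\n", "b 3 4\n", "# another\n",
         "c 5 6\n", "d 7 8\n", "e 9 10\n"]

# Drop comment lines first, as both versions do.
kept = [line for line in lines if not line.startswith("#")]

# Counter version, as in the question.
with_counter = []
counter = 0
for line in kept:
    if counter % 3 == 0:
        with_counter.append(line)
    counter += 1

# Slice version: start at index 0, no end, step 3.
with_slice = kept[::3]

print(with_counter == with_slice)  # True
print(with_slice)                  # ['a 1 2\n', 'd 7 8\n']
```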

However, this reads the whole file into a list and then copies a third of it into a second list, which wastes memory if your file is large. If you instead use a generator expression and itertools.islice, you get the same result without building either list, so the program uses less memory.

from itertools import islice

for pp in pp_files:
    with open(pp) as ppf, open(pp[:pp.rfind(".post")] + "_clean.post", "w") as outfile:
        # Lazily skip comment lines, then take every third remaining line.
        for line in islice((line for line in ppf if not line.startswith("#")), 0, None, 3):
            outfile.write(" ".join(line.split()[1:3]) + '\n')
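If you'd like something you can test without touching the disk, one option is to pull the filtering into a small generator that accepts any iterable of lines. This is only a sketch: clean_lines is a hypothetical helper name of mine, and the sample data is invented.

```python
from itertools import islice


def clean_lines(lines):
    """Yield fields 2 and 3 of every third non-comment line.

    `clean_lines` is a hypothetical helper name, not part of the original code.
    """
    # Lazily drop lines that start with "#".
    data = (line for line in lines if not line.startswith("#"))
    # Take the 1st, 4th, 7th, ... of the remaining lines.
    for line in islice(data, 0, None, 3):
        yield " ".join(line.split()[1:3])


# Invented sample input; a real call would pass an open file object instead.
sample = [
    "# header",
    "x 1.0 2.0 extra",
    "y 3.0 4.0 extra",
    "# mid-file comment",
    "z 5.0 6.0 extra",
    "w 7.0 8.0 extra",
]
result = list(clean_lines(sample))
print(result)  # ['1.0 2.0', '7.0 8.0']
```

Inside the loop over pp_files you could then write each item of clean_lines(ppf) to outfile followed by '\n'.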
answered Nov 30, 2016 at 12:51
\$\endgroup\$
