\$\begingroup\$

I have a file of the form:

>SDF123.1 blah blah

ATCTCTGGAAACTCGGTGAAAGAGAGTAT

AGTGATGAGGATGAGTGAG...

>SBF123.1 blah blah

ATCTCTGGAAACTCGGTGAAAGAGAGTAT

AGTGATGAGGATGAGTGAG....

I want to extract each section of this file into its own file (like here).

I wrote the following code, but it runs much more slowly than it did before I added the close command. I had to add close because, without it, awk failed with a "too many open files" error.

Here is the code:

cat C1_animal.fasta | awk -F ' ' '{
 if (substr($0, 1, 1)==">") {filename=(substr($1,2) ".fa")}
 print $0 >> filename; close (filename)
}'

How can I make this code more time efficient? I am new to awk.

mdfst13
asked Sep 3, 2021 at 18:17
\$\endgroup\$

1 Answer

\$\begingroup\$

Close the output file only when necessary, that is, when a header line starts a new output file:

File actg.awk

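BEGIN {
    FS = " "    # the default field separator, stated explicitly
}
/^>/ {
    # A header line starts a new record: close the previous output file, if any.
    if (filename != "") {
        close(filename)
    }
    # Name the output file after the first field, minus its leading ">".
    filename = substr($1, 2) ".fa"
    next    # the header line itself is not written to the output file
}
filename != "" {
    print $0 > filename
}
END {
    close(filename)
}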

With shell command:

awk -f actg.awk C1_animal.fasta

Note: if you are sure that no line precedes the first ">" header line, you can skip the filename != "" test.
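
To try the idea on a small input, here is a minimal shell sketch. It assumes a made-up two-record file named sample.fasta (both the name and the contents are hypothetical) and writes its output into the current directory. It is a variant of actg.awk, not a copy: it leaves out next, so each header line is kept in its output file, as in the question's original code. Whether the header should be kept is an assumption about what is wanted.

```shell
# Hypothetical sample input, made up for this illustration.
cat > sample.fasta <<'EOF'
>SDF123.1 blah blah
ATCTCTGG
AGTGATGA
>SBF123.1 blah blah
ATCTCTGG
EOF

awk '
/^>/ {
    # New record: close the previous output file (if any) once, then switch.
    if (filename != "") close(filename)
    # First field without its leading ">" becomes the file name.
    filename = substr($1, 2) ".fa"
}
# No "next" above, so the header line is written as well.
filename != "" { print > filename }
' sample.fasta

# Show what was produced.
head SDF123.1.fa SBF123.1.fa
```

Each output file is opened once and closed once, so the number of close calls grows with the number of records rather than the number of lines, while only one output file is open at any time.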

answered Sep 4, 2021 at 17:16
\$\endgroup\$
  • Thank you, this code worked nicely and was much faster. Could you explain a little how this code works? I am still trying to learn awk. Commented Sep 6, 2021 at 5:54
