I have a file of the form -
>SDF123.1 blah blah
ATCTCTGGAAACTCGGTGAAAGAGAGTAT
AGTGATGAGGATGAGTGAG...
>SBF123.1 blah blah
ATCTCTGGAAACTCGGTGAAAGAGAGTAT
AGTGATGAGGATGAGTGAG....
And I want to extract the various sections of this file into individual files (like here).
I wrote the following code, but it runs too slowly compared to when I did not have the close command in it. I had to incorporate the close command, since without it I was getting the awk error "too many open files".
Here is the code -
cat C1_animal.fasta | awk -F ' ' '{
if (substr($0, 1, 1)==">") {filename=(substr($1,2) ".fa")}
print $0 >> filename; close (filename)
}'
How can I make this code more time efficient? I am new to awk.
1 Answer
Try to close your filename only when it's necessary:
File actg.awk
BEGIN {
FS=" "
}
/^>/ {
if (filename != "") {
close(filename)
}
filename = substr($1, 2) ".fa"
next
}
filename != "" {
print $0 > filename
}
END {
close (filename)
}
With shell command:
awk -f actg.awk C1_animal.fasta
Note: if you are sure there is no line before the first ">..." line, you can skip the filename != "" test.
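As a rough sketch of the simplified variant from the note, the script below drops the filename != "" pattern on sequence lines and runs on a tiny made-up input. The file name sample.fasta, its sequences, and the scratch directory are assumptions for illustration only; the awk logic is otherwise the same as in actg.awk. It relies on the fact that in awk, > truncates a file only when it is first opened, so closing once per header (instead of after every line) keeps a single file open without losing lines.

```shell
#!/bin/sh
# Scratch directory so the generated .fa files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up two-record input in the same shape as the question's file.
cat > sample.fasta <<'EOF'
>SDF123.1 blah blah
ATCTCTGG
AGTGATGA
>SBF123.1 blah blah
GGTCTCTA
EOF

awk '
# Header line: finish the previous output file, then pick the new name.
/^>/ {
    if (filename != "") close(filename)   # at most one file open at a time
    filename = substr($1, 2) ".fa"        # ">SDF123.1" -> "SDF123.1.fa"
    next                                  # the header itself is not written
}
# Sequence line: ">" truncates the file only when awk first opens it;
# later prints to the same open file are appended.
{ print $0 > filename }
' sample.fasta

ls *.fa
```

If the input may contain lines before the first header, keep the guard from actg.awk: redirecting output to an empty file name is an error in gawk, and other awk implementations may behave differently.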
Thank you, this code worked nicely and was quite a bit faster. Could you explain a little how this code works? I am still trying to learn awk. – user1995, Sep 6, 2021 at 5:54