
I am extracting exon details from a GTF file using Unix command-line tools such as cut, awk, grep and sed.

Input (file.gtf):

chrI ce11_ws245Genes CDS 8378308 8378427 0.000000 - 0 gene_id "T19A6.1a.2"; transcript_id "T19A6.1a.2"; 
chrI ce11_ws245Genes exon 8377602 8378427 0.000000 - . gene_id "T19A6.1a.2"; transcript_id "T19A6.1a.2"; 
chrI ce11_ws245Genes CDS 8379137 8379239 0.000000 - 1 gene_id "T19A6.1a.2"; transcript_id "T19A6.1a.2"; 
chrI ce11_ws245Genes exon 8379137 8379239 0.000000 - . gene_id "T19A6.1a.2"; transcript_id "T19A6.1a.2"; 
chrI ce11_ws245Genes CDS 8379706 8379815 0.000000 - 0 gene_id "T19A6.1a.2"; transcript_id "T19A6.1a.2"; 
chrI ce11_ws245Genes exon 8379706 8379815 0.000000 - . gene_id "T19A6.1a.2"; transcript_id "T19A6.1a.2"; 
chrI ce11_ws245Genes CDS 8380330 8380445 0.000000 - 2 gene_id "T19A6.1a.2"; transcript_id "T19A6.1a.2"; 
chrI ce11_ws245Genes exon 8380330 8380445 0.000000 - . gene_id "T19A6.1a.2"; transcript_id "T19A6.1a.2"; 
chrI ce11_ws245Genes CDS 8388028 8388092 0.000000 - 1 gene_id "T19A6.1a.2"; transcript_id "T19A6.1a.2"; 

Desired output:

chrI 8377602 8378427 - T19A6.1a.2
chrI 8379137 8379239 - T19A6.1a.2
chrI 8379706 8379815 - T19A6.1a.2
chrI 8380330 8380445 - T19A6.1a.2

My successful attempts to solve the problem:

awk '/exon/ {print $1 " " $4 " " $5 " " $7 " " $10;}' file.gtf | awk '{sub(/gene_id/,"",$5)};1' | awk -F'"' '{print $1, $2}'
grep 'exon' file.gtf | cut -f1,4,5,7,9 | cut -d ';' -f1 | awk '{sub(/gene_id/,"",$5)};1' | awk -F'"' '{print $1, $2}'

Steps:

  1. search for lines that contain the word 'exon'
  2. cut out the fields of interest: 1, 4, 5, 7 and 9
  3. in field 9, cut using the delimiter ';'
  4. remove 'gene_id'
  5. remove the double quotes around the gene names
200_success
asked Oct 5, 2016 at 10:13

1 Answer


Your code can be simplified to a single awk command:

awk '/exon/ {gsub("[\";]", "", $10); print $1, $4, $5, $7, $10}' file.gtf

gsub removes every occurrence of " or ; from the 10th field ($10), which strips the quotes and the trailing semicolon from the gene name.

answered May 23, 2018 at 7:15