Skip to main content
Stack Overflow
  1. About
  2. For Teams

Return to Revisions

9 of 10
added 104 characters in body; edited body
pmt0512
  • 79
  • 1
  • 9

Python programming

My assignment ask to make a function call readFasta that 
accepts 
one 
argument:
the
 name 
of 
a
 fasta
 format 
file
(fn) 
containing 
one 
or 
more 
sequences.
The 
function 
should 
read
 the 
file 
and
 return 
a
 dictionary 
where 
the 
keys 
are 
the 
fasta 
headers 
and 
the 
values
 are 
the 
corresponding 
sequences 
from 
file 
fn 
converted 
to 
strings.
 Make 
sure 
that
 you 
don’t 
include 
any 
new 
lines 
or 
other 
white space 
characters 
in 
the
 sequences 
in
 the 
dictionary.

For ex, if afile.fa looks like:

>one
atctac
>two
gggaccttgg
>three
gacattac

then the a.readFasta(f) returns:

[‘one’ : ‘atctac’,
‘two’ : ‘gggaccttgg’,
‘three’: ‘gacattac’]

If have tried to write some codes but as I am totally newbie in programming, it didnt work out very much for me. Can everyone please help me. Thank you so much. Here are my codes:

import gzip
def readFasta(fn):
 if fn.endswith('.gz'):
 fh = gzip.gzipfile(fn)
 else:
 fh = open(fn,'r')
 d = {}
 while 1:
 line = fh.readline()
 if not line:
 fh.close()
 break
 vals = line.rstrip().split('\t')
 number = vals[0]
 sequence = vals[1]
 if d.has_key(number):
 lst = d[number]
 if gene not in lst:
 # this test may not be necessary
 lst.append(sequence)
 else:
 d[number] = [sequence]
 return d

Here is what I got in my afile.txt

one atctac

two gggaccttgg

three gacattac

pmt0512
  • 79
  • 1
  • 9
lang-py

AltStyle によって変換されたページ (->オリジナル) /