Revision 5af901e2-d558-4708-b486-3e28326a3008 - Stack Overflow
My assignment ask to make a function call readFasta that
accepts
one
argument:
the
name
of
a
fasta
format
file
(fn)
containing
one
or
more
sequences.
The
function
should
read
the
file
and
return
a
dictionary
where
the
keys
are
the
fasta
headers
and
the
values
are
the
corresponding
sequences
from
file
fn
converted
to
strings.
Make
sure
that
you
don’t
include
any
new
lines
or
other
white space
characters
in
the
sequences
in
the
dictionary.
For ex, if afile.fa looks like:
>one
atctac
>two
gggaccttgg
>three
gacattac
then the a.readFasta(f) returns:
[‘one’ : ‘atctac’,
‘two’ : ‘gggaccttgg’,
‘three’: ‘gacattac’]
If have tried to write some codes but as I am totally newbie in programming, it didnt work out very much for me. Can everyone please help me. Thank you so much. Here are my codes:
import gzip
def readFasta(fn):
if fn.endswith('.gz'):
fh = gzip.gzipfile(fn)
else:
fh = open(fn,'r')
d = {}
while 1:
line = fh.readline()
if not line:
fh.close()
break
vals = line.rstrip().split('\t')
number = vals[0]
sequence = vals[1]
if d.has_key(number):
lst = d[number]
if gene not in lst:
# this test may not be necessary
lst.append(sequence)
else:
d[number] = [sequence]
return d