I need some help understanding a function that I want to use, but I'm not entirely sure what some parts of it do. I understand that the function creates dictionaries from reads taken from a FASTA file. From what I understand, it is supposed to generate prefix and suffix dictionaries for ultimately extending contigs (overlapping DNA sequences). The code:
def makeSuffixDict(reads, lenSuffix = 20, verbose = True):
    lenKeys = len(reads[0]) - lenSuffix
    dict = {}
    multipleKeys = []
    i = 1
    for read in reads:
        if read[0:lenKeys] in dict:
            multipleKeys.append(read[0:lenKeys])
        else:
            dict[read[0:lenKeys]] = read[lenKeys:]
        if verbose:
            print("\rChecking suffix", i, "of", len(reads), end = "", flush = True)
        i += 1
    for key in set(multipleKeys):
        del(dict[key])
    if verbose:
        print("\nCreated", len(dict), "suffixes with length", lenSuffix, \
              "from", len(reads), "Reads. (", len(reads) - len(dict), \
              "unambigous)")
    return(dict)
Additional Information: reads = readFasta("smallReads.fna", verbose = True)
This is how the function is called:
if __name__ == "__main__":
    reads = readFasta("smallReads.fna", verbose = True)
    suffixDicts = makeSuffixDicts(reads, 10)
The smallReads.fna file contains strings of bases (DNA):
> read 1
TTATGAATATTACGCAATGGACGTCCAAGGTACAGCGTATTTGTACGCTA
> read 2
AACTGCTATCTTTCTTGTCCACTCGAAAATCCATAACGTAGCCCATAACG
> read 3
TCAGTTATCCTATATACTGGATCCCGACTTTAATCGGCGTCGGAATTACT
Here are the parts I don't understand:
lenKeys = len(reads[0]) - lenSuffix
What does the value [0] mean? From what I understand "len" returns the number of elements in a list. Why is "reads" automatically a list? edit: It seems a Fasta-file can be declared as a List. Can anybody confirm that?
if read[0:lenKeys] in dict:
Does this mean "from 0 to 'lenKeys'"? Still confused about the value.
In another function there is a similar line: if read[-lenKeys:] in dict:
What does the "-" do?
def makeSuffixDict(reads, lenSuffix = 20, verbose = True):
Here I don't understand the parameters: How can reads be a parameter? What is lenSuffix = 20 in the context of this function other than a value subtracted from len(reads[0])?
What is verbose? I have read about a "verbose mode" that ignores whitespace, but I have never seen it used as a parameter and later as a variable.
1 Answer
The tone of your question makes me feel like you're confusing things like program features (len, functions, etc) with things that were defined by the original programmer (the type of reads, verbose, etc).
def some_function(these, are, arbitrary, parameters):
    pass
This function defines a bunch of parameters. They don't mean anything at all on their own; the only meaning they have is what I imply through their names. For example, if I write:
def reverse_string(s):
    pass
s is probably a string, right? In your example we have:
def makeSuffixDict(reads, lenSuffix = 20, verbose = True):
    lenKeys = len(reads[0]) - lenSuffix
    ...
From these two lines we can infer a few things:
- the function will probably return a dictionary (from its name)
- lenSuffix is an int, and verbose is a bool (from their default values)
- reads can be indexed (string? list? tuple?)
- the items inside reads have a length (string? list? tuple?)
Since Python is dynamically typed, this is ALL WE CAN KNOW about the function so far. The rest would be explained by its documentation or the way it's called.
That said: let me cover all your questions in order:
- What does the value [0] mean?
some_object[0] grabs the first item in a container: [1, 2, 3][0] == 1 and "Hello, World!"[0] == "H". This is called indexing, and it is governed by the __getitem__ magic method.
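As a small illustration of indexing, here is a sketch that indexes a list and a string, and then a toy class whose __getitem__ method makes the same bracket syntax work. ReadCollection is a made-up name for this example only; nothing like it appears in the question's code.

```python
# Indexing works on any object that implements __getitem__.
numbers = [1, 2, 3]
greeting = "Hello, World!"

first_number = numbers[0]   # first item of a list
first_char = greeting[0]    # first character of a string


class ReadCollection:
    """Hypothetical container, only here to show __getitem__ at work."""

    def __init__(self, reads):
        self._reads = list(reads)

    def __getitem__(self, index):
        # Python calls this whenever you write collection[index]
        return self._reads[index]


collection = ReadCollection(["TTATG", "AACTG"])
first_read = collection[0]  # equivalent to collection.__getitem__(0)
```

So reads[0] in the question only tells you that reads supports this kind of lookup, not which exact type it is.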
- From what I understand "len" returns the number of elements in a list.
len is a built-in function that returns the length of an object. It is governed by the __len__ magic method. len('abc') == 3, and also len([1, 2, 3]) == 3. Note that len(['abc']) == 1, since it measures the length of the list, not of the string inside it.
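Applied to the data in the question, and assuming reads really is a list holding the three sequences from smallReads.fna (the question does not show what readFasta returns), the difference between len(reads) and len(reads[0]) might look like this:

```python
# Assumption: readFasta returns the sequences as a list of strings.
reads = [
    "TTATGAATATTACGCAATGGACGTCCAAGGTACAGCGTATTTGTACGCTA",
    "AACTGCTATCTTTCTTGTCCACTCGAAAATCCATAACGTAGCCCATAACG",
    "TCAGTTATCCTATATACTGGATCCCGACTTTAATCGGCGTCGGAATTACT",
]

number_of_reads = len(reads)        # length of the list: how many reads
length_of_first = len(reads[0])     # length of the first string: how many bases
len_keys = length_of_first - 20     # what lenKeys would be with lenSuffix = 20

# len(x) is just a friendlier spelling of x.__len__()
same_thing = reads.__len__()
```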
- Why is "reads" automatically a list?
reads is a parameter. It is whatever the calling scope passes to it. It does appear that the function expects a list, but that's not a hard and fast rule!
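On the "a FASTA file can be a list" edit: the file itself is just text, and it is the reader function that turns it into a list. The question does not include readFasta, so the following is only a guess at what such a helper might do. parse_fasta is a hypothetical name, and it works on lines of text rather than a file name so that it can run on its own.

```python
def parse_fasta(lines):
    """Hypothetical stand-in for readFasta: collect sequences into a list."""
    reads = []
    current = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record; store the previous one first
            if current:
                reads.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        reads.append("".join(current))
    return reads


sample = """> read 1
TTATG
> read 2
AACTG
"""
reads = parse_fasta(sample.splitlines())
```

With a result like this, reads[0] is the first sequence and len(reads[0]) is its number of bases.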
- (various questions about slicing)
Slicing has the form some_container[start_idx : end_idx [ : step_size]]. It does pretty much what you'd expect: "0123456"[0:3] == "012". Slice indexes are zero-based and lie between the elements, so [0:1] selects the same element as [0], except that a slice returns a sequence of the same type as the original rather than an individual item (so [1, 2, 3][0] == 1 but [1, 2, 3][0:1] == [1]; for strings the two look alike, 'abc'[0] == 'abc'[0:1] == 'a', because a single character is itself a string). If you omit the start or end index, it is treated as the beginning or end of the sequence respectively. I won't go into step size here.
Negative indexes count from the back, so '0123456'[-3:] == '456'. Note that [-0] is not the last value, [-1] is. This contrasts with [0] being the first value.
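A few slices you can try in the interpreter; the variable names are only for illustration:

```python
digits = "0123456"

head = digits[0:3]        # from index 0 up to, but not including, index 3
same_head = digits[:3]    # omitted start means "from the beginning"
tail = digits[-3:]        # omitted end means "to the end"; -3 counts from the back
last = digits[-1]         # the last character
first = digits[-0]        # -0 is just 0, so this is the FIRST character

items = [1, 2, 3]
single = items[0]         # an individual element
sliced = items[0:1]       # a one-element list
```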
- How can reads be a parameter?
Because the function is defined as makeSuffixDict(reads, ...). That's what a parameter is.
- What is lenSuffix = 20 in the context of this function
Looks like it's the length of the expected suffix!
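To see what that means for a single read, here is a sketch of the split the function appears to perform, using the first read from the question and the default lenSuffix of 20. The snake_case names are mine, not the author's.

```python
read = "TTATGAATATTACGCAATGGACGTCCAAGGTACAGCGTATTTGTACGCTA"
len_suffix = 20
len_keys = len(read) - len_suffix

key = read[0:len_keys]      # everything before the suffix; used as the dictionary key
suffix = read[len_keys:]    # the last len_suffix characters; stored as the value

# The two pieces together give back the original read
rebuilt = key + suffix
```

So lenSuffix decides where each read is cut: the front part becomes the lookup key and the back part is what you would append when extending a contig.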
- What is verbose?
verbose has no meaning on its own. It's just another parameter. It looks like the author included the verbose flag so you could get output while the function runs. Notice that the if verbose blocks don't affect the result; they only provide feedback to the user.
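The same pattern in a smaller function may make this clearer. count_bases is a made-up example, not part of the code in the question; the point is only that the flag changes what gets printed, not what gets returned.

```python
def count_bases(reads, verbose=True):
    """Hypothetical example function showing a verbose flag."""
    total = 0
    for i, read in enumerate(reads, start=1):
        total += len(read)
        if verbose:
            # Progress output only; it never changes the result
            print("\rProcessed read", i, "of", len(reads), end="", flush=True)
    if verbose:
        print()  # finish the progress line
    return total


quiet_total = count_bases(["TTATG", "AACTG"], verbose=False)  # prints nothing
loud_total = count_bases(["TTATG", "AACTG"], verbose=True)    # prints progress
```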
4 Comments
reads in reads = readFasta("smallReads.fna", verbose = True) is in your module scope, while the reads inside makeSuffixDict is in the function scope. They're different! Though I would hazard a guess that makeSuffixDict is called with that same reads variable.
reads is a list of strings, and each string is supposed to be the same length, so lenKeys is calculated only once. read[:lenKeys] is everything before the suffix and read[lenKeys:] is the suffix.
The makeSuffixDict function expects that reads is in fact a list (if you don't pass it a list, it won't work). Do you have documentation for this function that specifies its requirements?
read[:lenKeys] means "everything in read up to index number lenKeys". Similarly, read[-lenKeys] is just an index, but a negative one: "lenKeys objects back from the end of read".
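Putting the pieces together, this is a condensed sketch of what makeSuffixDict appears to do, with the progress output left out. make_suffix_dict and the toy reads are my own; the sketch assumes, as the original does, that every read has the same length as the first one.

```python
def make_suffix_dict(reads, len_suffix=20):
    """Sketch of the question's makeSuffixDict without the verbose output."""
    len_keys = len(reads[0]) - len_suffix  # assumes all reads share one length
    suffixes = {}
    ambiguous = set()
    for read in reads:
        key, suffix = read[:len_keys], read[len_keys:]
        if key in suffixes:
            # Same key seen before: the continuation is not unique
            ambiguous.add(key)
        else:
            suffixes[key] = suffix
    for key in ambiguous:
        del suffixes[key]
    return suffixes


toy_reads = ["AAAACC", "AAAAGG", "TTTTCC"]
result = make_suffix_dict(toy_reads, len_suffix=2)
```

Here the key "AAAA" occurs in two reads with different suffixes, so it is dropped, and only the "TTTT" entry remains in result.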