5

I would like to iterate through every possible k-length string (called k-mer) given a list of symbols. For example, if k = 3 and symbols = {A, C, G, T}, then:

AAA
AAC
AAG
...
TTG
TTT

Here's my code to generate the string:

local k = 3
local bases = {'A', 'C', 'T', 'G'}
-- Generate the string (AAA...AAA)
local kmer_gen = {}
for i = 1,k do kmer_gen[i] = "A" end
local kmer = table.concat(kmer_gen)

It works but it surely does not look good. Can this be achieved more elegantly?

Now, I'm not sure how to iterate through the possible k-mers. One solution would be keeping substituting each character, but this doesn't seen efficient. Another way would be decoding from binary (each 2 bits represent a base), but the implementation is confusing and requires bitwise operations. Any other thoughts?

asked Nov 6, 2013 at 8:33

2 Answers 2

6

Here is a solution using an iterator. It's a nice example of coroutines, a technique well worth knowing in Lua. See also http://www.lua.org/pil/9.3.html.

local bases = {'A', 'C', 'T', 'G'}
local function allstrings(n,t,k,s)
 k=k or 1
 s=s or {}
 if k>n then
 coroutine.yield(table.concat(s))
 else
 for i=1,#t do
 s[k]=t[i]
 allstrings(n,t,k+1,s)
 end
 end
end
local function kmer(n,t)
 return coroutine.wrap(allstrings),n,t
end
for w in kmer(3,bases) do
 print(w)
end
answered Nov 6, 2013 at 10:44
Sign up to request clarification or add additional context in comments.

Comments

4

Here is a relatively simple tail-recursive solution I would probably use:

local bases = {'A', 'C', 'T', 'G'}
local function kmers(n, prev)
 prev = prev or {''}
 if n <= 0 then return prev end
 local k,r = 1,{}
 for i=1,#prev do
 for j=1,#bases do
 r[k] = prev[i] .. bases[j]
 k = k+1
 end
 end
 return kmers(n-1, r)
end
_3mers = kmers(3) -- usage example

Note: You could write r[#r+1] instead of managing k manually but doing so is not so complicated and significantly faster in cases like this (the # operator is O(log n)).

answered Nov 6, 2013 at 9:25

3 Comments

Have you measured "significantly faster"?
About 15% on the machine I'm running on for kmers(11) for instance (7.23s / 8.57s). Given the simplicity of the change I think it's worth it. Your solution is even faster (6.05s without printing).
Note: if I replace the last line of your solution by local t = {}; for w in kmer(11,bases) do t[#t+1] = w end to make an actual table it shoots to 11.41s. With this: local k,t = 1,{}; for w in kmer(11,bases) do t[k] = w; k=k+1 end 10.02s. The previous timing was with an empty do/end block. Also, all timings with Lua 5.1.5, not 5.2.

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.