

I'm working on the famous clump finding problem to learn Haskell. Part of the problem involves breaking a nucleotide sequence into overlapping subsequences of length k, called k-mers, as follows:

ACTGCA -> [ACT,CTG,TGC,GCA]

This sliding window generates every k-mer in the sequence, in order of position, for a given k (3 in the example above).
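Because the window advances one position at a time, a sequence of length n with k ≤ n yields exactly one k-mer per starting position 0 through n - k. For ACTGCA with k = 3 the count works out as:

```latex
\#\{k\text{-mers}\} = n - k + 1, \qquad n = 6,\ k = 3:\quad 6 - 3 + 1 = 4 \quad (\texttt{ACT}, \texttt{CTG}, \texttt{TGC}, \texttt{GCA})
```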

The problem is that my very simple algorithm is too slow to run on the E. coli genome. Other algorithms, written in languages that are usually slower than Haskell, can generate the k-mer list in seconds.

Here is what I wrote:

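-- Emit the first k elements, then slide the window one position to the right.
kmerBreak :: Int -> [a] -> [[a]]
kmerBreak k genome@(_ : xs)
  | k <= length genome = take k genome : kmerBreak k xs
kmerBreak _ _ = []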

I would love some feedback on how to improve the performance of this simple algorithm, as well as on my use of Haskell in general.

edited Jan 5, 2023 at 19:34 by Sᴀᴍ Onᴇᴌᴀ

asked Jan 5, 2023 at 19:22 by plaffont


2 Answers


Simple mistake: a Haskell list does not store its length, so length traverses the entire list every time it is called. Since kmerBreak calls length on every suffix of the genome, it runs in time quadratic in the size of the input. A simple fix is to check instead that take k genome actually has length k, which is cheap assuming that k is small. Alternatively, you can keep track of the length of genome as an extra parameter to the recursive function.
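A minimal sketch of both fixes described above, kept as separate functions so they can be compared side by side. The names kmerBreakTake, kmerBreakLen and go are hypothetical (not from the answer), and both assume k is a small positive Int. Each keeps the recursive shape of the question's kmerBreak.

```haskell
-- Fix 1 (hypothetical name kmerBreakTake): test the length of the k-element
-- prefix instead of the whole suffix. Each check costs O(k), so O(n * k) total.
kmerBreakTake :: Int -> [a] -> [[a]]
kmerBreakTake k genome@(_ : xs)
  | length kmer == k = kmer : kmerBreakTake k xs
  where
    kmer = take k genome
-- Reached on the empty list, or when the prefix is shorter than k.
kmerBreakTake _ _ = []

-- Fix 2 (hypothetical names kmerBreakLen / go): compute the length once and
-- pass the remaining length down, decrementing it at each step.
kmerBreakLen :: Int -> [a] -> [[a]]
kmerBreakLen k genome = go (length genome) genome
  where
    -- n is the length of the current suffix s.
    go n s@(_ : rest)
      | k <= n = take k s : go (n - 1) rest
    go _ _ = []
```

For example, kmerBreakTake 3 "ACTGCA" and kmerBreakLen 3 "ACTGCA" both evaluate to ["ACT","CTG","TGC","GCA"]. One trade-off: fix 2 forces the spine of the entire input with its initial length call and keeps the whole list alive while go walks it, whereas fix 1 can consume its input incrementally.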

answered Jan 5, 2023 at 22:10 by Li-yao Xia


As the previous answer suggested, you can check that take k returns a list of length k. One way to do that is:

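import Data.List (tails)
-- Take the first k elements of every suffix; stop at the first one shorter than k.
kmerBreak :: Int -> [a] -> [[a]]
kmerBreak k = takeWhile ((== k) . length) . map (take k) . tails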
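To see why this pipeline is correct, here is a sketch that evaluates each stage separately on the question's example with k = 3. The names suffixes, prefixes and kmers are hypothetical, introduced only for illustration. Because lists are lazy, takeWhile stops pulling from tails at the first short prefix, and each length call inspects at most k elements, so the whole pipeline runs in O(n * k) rather than O(n^2).

```haskell
import Data.List (tails)

-- Stage 1: every suffix of the input, ending with the empty list.
-- ["ACTGCA","CTGCA","TGCA","GCA","CA","A",""]
suffixes :: [String]
suffixes = tails "ACTGCA"

-- Stage 2: the first 3 characters of each suffix (fewer near the end).
-- ["ACT","CTG","TGC","GCA","CA","A",""]
prefixes :: [String]
prefixes = map (take 3) suffixes

-- Stage 3: stop at the first prefix shorter than 3. Every later suffix is
-- shorter still, so no valid k-mer is dropped.
-- ["ACT","CTG","TGC","GCA"]
kmers :: [String]
kmers = takeWhile ((== 3) . length) prefixes
```

Loading this in GHCi and evaluating kmers gives the same four k-mers as kmerBreak 3 "ACTGCA".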

answered Jan 6, 2023 at 10:11 by max taldykin

Slow Bioinformatics algorithm - Clump finding algorithm in Haskell
