Something that's perhaps worth mentioning at this point is that you seem quite keen on transforming code so as to use a functional style (for example in this question this question, where you wanted to express a loop using reduce
). Transforming code into pure functional form is an excellent intellectual exercise, but you have to be aware of two problems that often arise:
Something that's perhaps worth mentioning at this point is that you seem quite keen on transforming code so as to use a functional style (for example in this question, where you wanted to express a loop using reduce
). Transforming code into pure functional form is an excellent intellectual exercise, but you have to be aware of two problems that often arise:
Something that's perhaps worth mentioning at this point is that you seem quite keen on transforming code so as to use a functional style (for example in this question, where you wanted to express a loop using reduce
). Transforming code into pure functional form is an excellent intellectual exercise, but you have to be aware of two problems that often arise:
- 50.1k
- 3
- 130
- 210
First, the output of yourthe function ibwt
doesn't match the example output in yourthe question:
>>> from pprint import pprint
>>> pprint(ibwt('ssyxxiit'))
['iisstxxy',
'xxiiysts',
'stxxsiyi',
'iystixsx',
'xsiyxtis',
'tixssyxi',
'yxtiissx',
'ssyxxiit']
>>> from pprint import pprint
>>> pprint(ibwt('ssyxxiit'))
['iisstxxy',
'xxiiysts',
'stxxsiyi',
'iystixsx',
'xsiyxtis',
'tixssyxi',
'yxtiissx',
'ssyxxiit']
(You You should always prepare your question by copying and pasting input and output from your source files and from an interactive session, rather than typing in what you think it should be. You can easilyIt's easy to introduce or cover up mistakes if you do the latter.)
So it looks as if you are missingthere is a missing transpose
:
>>> pprint(transpose(ibwt('ssyxxiit')))
['ixsixtys',
'ixtysixs',
'sixsixty',
'sixtysix',
'tysixsix',
'xsixtysi',
'xtysixsi',
'ysixsixt']
>>> pprint(transpose(ibwt('ssyxxiit')))
['ixsixtys',
'ixtysixs',
'sixsixty',
'sixtysix',
'tysixsix',
'xsixtysi',
'xtysixsi',
'ysixsixt']
With that bug fixed, let's see how long yourthe function takes on a medium-sized amount of data:
>>> import random, string
>>> from timeit import timeit
>>> s = ''.join(random.choice(string.printable) for _ in xrange(1024))
>>> t = bwt(s)
>>> timeit(lambda:ibwt(t[1]), number=1)
49.83987808227539
>>> import random, string
>>> from timeit import timeit
>>> s = ''.join(random.choice(string.printable) for _ in range(1024))
>>> t = bwt(s)
>>> timeit(lambda:ibwt(t[1]), number=1)
49.83987808227539
Not so good! What can we change? Well, an obvious idea is to cut out all those calls to transpose
. AtThe function ibwt
transposes the moment you transpose your matrix in order to sort it, and then transposetransposes it back again each time round the loop. By keeping the matrix in transposed form throughout, you canit would be possible to avoid having to call transpose
at all:
def ibwt2(s):
n"""Return =the len(inverse Burrows-Wheeler Transform matrix based on the
string s.
>>> ibwt2('SSYXXIIT')[3]
m'SIXTYSIX'
"""
matrix = [''] * nlen(s)
for _ in xrange(n)s:
mmatrix = sorted(s[i]i + m[i]j for i, j in xrangezip(ns, matrix))
return mmatrix
>>> s = ''.join(random.choice(string.printable) for _ in xrange(1024))
>>> t = bwt(s)
>>> timeit(lambda:ibwt2(t[1]), number=1)
0.9414288997650146
>>> s = ''.join(random.choice(string.printable) for _ in range(1024))
>>> t = bwt(s)
>>> timeit(lambda:ibwt2(t[1]), number=1)
0.9414288997650146
Something that's perhaps worth mentioning at this point is that you seem quite keen on transforming your code so as to use a functional style (for example in this question, where you wanted to express a loop using reduce
). Transforming code into pure functional form is an excellent intellectual exercise, but you have to be aware of two problems that often arise:
Pure functional programs can easily end up doing a lot of allocation and copying. For example, it is often tempting to transform your data into a format suitable for passing in to a function (rather than transforming your function so that it operates on the data you have). Thus in your
ibwt
function it is convenient to callsorted
on the data in row-major order, but convenient to append a new instance of the strings
in column-major order. In order to express both of these operations in the most convenient way, you havethe function has to transpose the matrix back and forth, and since youit do thisso functionally, each transposition has to copy the whole matrix.Pursuit of pure functional style often results in tacit or point-free programming tacit or point-free programming where variable names are avoided and data is routed from one function to another using combinators combinators. This can seem very elegant, but variable names have two roles in a program: they don't just move data around, they also act as mnemonics to remind programmers of what the data means. Point-free programs can end up being totally mysterious because you lose track of exactly what is being operated on.
I.......
I.......
S.......
S*......
T.......
X.......
X.......
Y.......
I.......
I.......
S.......
S*......
T.......
X.......
X.......
Y.......
S0...... I5......
S1...... I6......
Y2...... -> S0......
X3...... -> S1......
X4...... -> T7......
I5...... -> X3......
I6...... X4......
T7...... Y2......
S0...... I5......
S1...... I6......
Y2...... -> S0......
X3...... -> S1......
X4...... -> T7......
I5...... -> X3......
I6...... X4......
T7...... Y2......
def ibwt3(k, s):
"""
Return"""Return the `k`thkth row of the inverse Burrows-Wheeler Transform
matrix based on the string `s`s.
>>> ibwt3(3, 'SSYXXIIT')
'SIXTYSIX'
"""
def row(k):
permutation = sorted((t, i) for i, t in enumerate(s))
for _ in range(len(s)):
t, k = permutation[k]
yield t
return ''.join(row(k))
>>> ibwt3(3, 'SSYXXIIT')
'SIXTYSIX'
>>> s = ''.join(random.choice(string.printable) for _ in xrange(1024))
>>> t = bwt(s)
>>> ibwt3(*t) == s
True
>>> ibwt3(3, 'SSYXXIIT')
'SIXTYSIX'
>>> s = ''.join(random.choice(string.printable) for _ in range(1024))
>>> t = bwt(s)
>>> ibwt3(*t) == s
True
>>> timeit(lambda:ibwt3(*t), number=1)
0.0013821125030517578
>>> timeit(lambda:ibwt3(*t), number=1)
0.0013821125030517578
(If you compare my ibwt3
to the sample Python implementation on Wikipedia sample Python implementation on Wikipedia you'll see that mine is not only much simpler, but also copes with arbitrary characters, not just bytes in the range 0–255. So either I've missed something important, or the Wikipedia article could be improved.)
First, the output of your function ibwt
doesn't match the example output in your question:
>>> from pprint import pprint
>>> pprint(ibwt('ssyxxiit'))
['iisstxxy',
'xxiiysts',
'stxxsiyi',
'iystixsx',
'xsiyxtis',
'tixssyxi',
'yxtiissx',
'ssyxxiit']
(You should always prepare your question by copying and pasting input and output from your source files and from an interactive session, rather than typing in what you think it should be. You can easily introduce or cover up mistakes if you do the latter.)
So it looks as if you are missing a transpose
:
>>> pprint(transpose(ibwt('ssyxxiit')))
['ixsixtys',
'ixtysixs',
'sixsixty',
'sixtysix',
'tysixsix',
'xsixtysi',
'xtysixsi',
'ysixsixt']
With that bug fixed, let's see how long your function takes on a medium-sized amount of data:
>>> import random, string
>>> from timeit import timeit
>>> s = ''.join(random.choice(string.printable) for _ in xrange(1024))
>>> t = bwt(s)
>>> timeit(lambda:ibwt(t[1]), number=1)
49.83987808227539
Not so good! What can we change? Well, an obvious idea is to cut out all those calls to transpose
. At the moment you transpose your matrix in order to sort it, and then transpose it back again each time round the loop. By keeping the matrix in transposed form throughout, you can avoid having to call transpose
at all:
def ibwt2(s):
n = len(s)
m = [''] * n
for _ in xrange(n):
m = sorted(s[i] + m[i] for i in xrange(n))
return m
>>> s = ''.join(random.choice(string.printable) for _ in xrange(1024))
>>> t = bwt(s)
>>> timeit(lambda:ibwt2(t[1]), number=1)
0.9414288997650146
Something that's perhaps worth mentioning at this point is that you seem quite keen on transforming your code so as to use a functional style (for example in this question, where you wanted to express a loop using reduce
). Transforming code into pure functional form is an excellent intellectual exercise, but you have to be aware of two problems that often arise:
Pure functional programs can easily end up doing a lot of allocation and copying. For example, it is often tempting to transform your data into a format suitable for passing in to a function (rather than transforming your function so that it operates on the data you have). Thus in your
ibwt
function it is convenient to callsorted
on the data in row-major order, but convenient to append a new instance of the strings
in column-major order. In order to express both of these operations in the most convenient way, you have to transpose the matrix back and forth, and since you do this functionally, each transposition has to copy the whole matrix.Pursuit of pure functional style often results in tacit or point-free programming where variable names are avoided and data is routed from one function to another using combinators. This can seem very elegant, but variable names have two roles in a program: they don't just move data around, they also act as mnemonics to remind programmers of what the data means. Point-free programs can end up being totally mysterious because you lose track of exactly what is being operated on.
I.......
I.......
S.......
S*......
T.......
X.......
X.......
Y.......
S0...... I5......
S1...... I6......
Y2...... -> S0......
X3...... -> S1......
X4...... -> T7......
I5...... -> X3......
I6...... X4......
T7...... Y2......
def ibwt3(k, s):
"""
Return the `k`th row of the inverse Burrows-Wheeler Transform
matrix based on the string `s`.
>>> ibwt3(3, 'SSYXXIIT')
'SIXTYSIX'
"""
def row(k):
permutation = sorted((t, i) for i, t in enumerate(s))
for _ in range(len(s)):
t, k = permutation[k]
yield t
return ''.join(row(k))
>>> ibwt3(3, 'SSYXXIIT')
'SIXTYSIX'
>>> s = ''.join(random.choice(string.printable) for _ in xrange(1024))
>>> t = bwt(s)
>>> ibwt3(*t) == s
True
>>> timeit(lambda:ibwt3(*t), number=1)
0.0013821125030517578
(If you compare my ibwt3
to the sample Python implementation on Wikipedia you'll see that mine is not only much simpler, but also copes with arbitrary characters, not just bytes in the range 0–255. So either I've missed something important, or the Wikipedia article could be improved.)
First, the output of the function ibwt
doesn't match the example output in the question:
>>> from pprint import pprint
>>> pprint(ibwt('ssyxxiit'))
['iisstxxy',
'xxiiysts',
'stxxsiyi',
'iystixsx',
'xsiyxtis',
'tixssyxi',
'yxtiissx',
'ssyxxiit']
You should always prepare your question by copying and pasting input and output from source files and from an interactive session, rather than typing in what you think it should be. It's easy to introduce or cover up mistakes if you do the latter.
So it looks as if there is a missing transpose
:
>>> pprint(transpose(ibwt('ssyxxiit')))
['ixsixtys',
'ixtysixs',
'sixsixty',
'sixtysix',
'tysixsix',
'xsixtysi',
'xtysixsi',
'ysixsixt']
With that bug fixed, let's see how long the function takes on a medium-sized amount of data:
>>> import random, string
>>> from timeit import timeit
>>> s = ''.join(random.choice(string.printable) for _ in range(1024))
>>> t = bwt(s)
>>> timeit(lambda:ibwt(t[1]), number=1)
49.83987808227539
Not so good! What can we change? Well, an obvious idea is to cut out all those calls to transpose
. The function ibwt
transposes the matrix in order to sort it, and then transposes it back again each time round the loop. By keeping the matrix in transposed form throughout, it would be possible to avoid having to call transpose
at all:
def ibwt2(s):
"""Return the inverse Burrows-Wheeler Transform matrix based on the
string s.
>>> ibwt2('SSYXXIIT')[3]
'SIXTYSIX'
"""
matrix = [''] * len(s)
for _ in s:
matrix = sorted(i + j for i, j in zip(s, matrix))
return matrix
>>> s = ''.join(random.choice(string.printable) for _ in range(1024))
>>> t = bwt(s)
>>> timeit(lambda:ibwt2(t[1]), number=1)
0.9414288997650146
Something that's perhaps worth mentioning at this point is that you seem quite keen on transforming code so as to use a functional style (for example in this question, where you wanted to express a loop using reduce
). Transforming code into pure functional form is an excellent intellectual exercise, but you have to be aware of two problems that often arise:
Pure functional programs can easily end up doing a lot of allocation and copying. For example, it is often tempting to transform your data into a format suitable for passing in to a function (rather than transforming your function so that it operates on the data you have). Thus in
ibwt
it is convenient to callsorted
on the data in row-major order, but convenient to append a new instance of the strings
in column-major order. In order to express both of these operations in the most convenient way, the function has to transpose the matrix back and forth, and since it do so functionally, each transposition has to copy the whole matrix.Pursuit of pure functional style often results in tacit or point-free programming where variable names are avoided and data is routed from one function to another using combinators. This can seem very elegant, but variable names have two roles in a program: they don't just move data around, they also act as mnemonics to remind programmers of what the data means. Point-free programs can end up being totally mysterious because you lose track of exactly what is being operated on.
I.......
I.......
S.......
S*......
T.......
X.......
X.......
Y.......
S0...... I5......
S1...... I6......
Y2...... -> S0......
X3...... -> S1......
X4...... -> T7......
I5...... -> X3......
I6...... X4......
T7...... Y2......
def ibwt3(k, s):
"""Return the kth row of the inverse Burrows-Wheeler Transform
matrix based on the string s.
>>> ibwt3(3, 'SSYXXIIT')
'SIXTYSIX'
"""
def row(k):
permutation = sorted((t, i) for i, t in enumerate(s))
for _ in s:
t, k = permutation[k]
yield t
return ''.join(row(k))
>>> ibwt3(3, 'SSYXXIIT')
'SIXTYSIX'
>>> s = ''.join(random.choice(string.printable) for _ in range(1024))
>>> t = bwt(s)
>>> ibwt3(*t) == s
True
>>> timeit(lambda:ibwt3(*t), number=1)
0.0013821125030517578
(If you compare my ibwt3
to the sample Python implementation on Wikipedia you'll see that mine is not only much simpler, but also copes with arbitrary characters, not just bytes in the range 0–255. So either I've missed something important, or the Wikipedia article could be improved.)
def ibwt3(k, s):
"""
Return the `k`th row of the inverse Burrows-Wheeler transformTransform
matrix forbased on the string `s`.
>>> ibwt3(3, 'SSYXXIIT')
'SIXTYSIX'
"""
def row(k):
permutation = sorted((t, i) for i, t in enumerate(s))
for _ in xrangerange(len(s)):
t, k = permutation[k]
yield t
return ''.join(row(k))
def ibwt3(k, s):
"""
Return the `k`th row of the inverse Burrows-Wheeler transform
matrix for the string `s`.
>>> ibwt3(3, 'SSYXXIIT')
'SIXTYSIX'
"""
def row():
permutation = sorted((t, i) for i, t in enumerate(s))
for _ in xrange(len(s)):
t, k = permutation[k]
yield t
return ''.join(row())
def ibwt3(k, s):
"""
Return the `k`th row of the inverse Burrows-Wheeler Transform
matrix based on the string `s`.
>>> ibwt3(3, 'SSYXXIIT')
'SIXTYSIX'
"""
def row(k):
permutation = sorted((t, i) for i, t in enumerate(s))
for _ in range(len(s)):
t, k = permutation[k]
yield t
return ''.join(row(k))
- 50.1k
- 3
- 130
- 210