Run length encoding of vectors in MATLAB: looped version and vectorized version

Question 1

A project I'm working on requires something akin to run length encoding on a vector in matlab. The data is in the form of a numeric vector, and the output is in the form of two vectors, elems and lens that hold the elements and their respective run lengths. For example data = [20 13 12 12 12 18 17 16 16 11] would give elems = [20 13 12 18 17 16 11] and lens = [1 1 3 1 1 2 1]. There is no minimum run length.

My first attempt uses loops:

clear all
data = [2 1 2 3 3 1 2 2 3 3 1 1 1 1 1 2 2 1 1 3 3 3 3 2 1 3 3 2 2 1 1 1 2 2 2 2 3 1 1 1];
n = length(data);
elems = zeros(n, 1);
lens = zeros(n, 1);
len = 1;
elem = data(1);
c = 1;
for k = 2:n
 if data(k) == elem
 len = len + 1;
 else
 elems(c) = elem;
 lens(c) = len;
 c = c + 1;
 elem = data(k);
 len = 1;
 end
end
elems(c) = elem;
lens(c) = len;
lens = lens(lens ~= 0);
elems = elems(lens ~= 0);

It's pretty basic although I couldn't think of a way to efficiently preallocate the elems and lens vectors to something smaller than n. I figured I could always start them small, e.g. n/2 or something, keep track of their size, then reallocate them to something larger, maybe 1.5 times the previous size, if they were running out of room, or something like that.

My second attempt is completely vectorized:

clear all
data = [2 1 2 3 3 1 2 2 3 3 1 1 1 1 1 2 2 1 1 3 3 3 3 2 1 3 3 2 2 1 1 1 2 2 2 2 3 1 1 1];
breaks = [true data(2:end) ~= data(1:end-1)];
run_starts = find([breaks true]);
elems_vec = data(breaks);
lens_vec = diff(run_starts);

Can either of these methods be improved? The algorithms are pretty simple, and Im really just looking for general feedback that can help me learn matlab a bit more.

Question 2

You can replace data(2:end) ~= data(1:end-1) with diff(data). Otherwise looks perfect to me, good job! +1

Question 3

@Divakar can you clarify what you mean? diff(data) is definitely not the same as data(2:end) ~= data(1:end-1) when using my example data (just run both in matlab to see that).

Question 4

Oh you are right, it's not the same. diff is basically difference between consecutive elements and it would work for your problem too - breaks = [true ; diff(data)];

Question 5

@Divakar I know what diff does, but run diff(data) and data(2:end) ~= data(1:end-1) on my example data and you can see clearly that you get different results (obviously, because for example, if you have [2 4], diff will return 2, but mine will return 1). Can you add an example of what you're talking about? If I make your replacement with diff in my second code sample, the code will fail because the elems_vec = data(breaks); will try to index data with a non-logical vector (since diff doesn't return 1's and 0's like my code).

Question 6

Sorry my bad really, I missed a tiny bit of code. So I meant this - breaks = [true diff(data)~=0] and in effect is same as data(2:end) ~= data(1:end-1) and thus would result in identical to your results. I am not sure if this is better with performance, which you can find out though. Rest of the code looked perfect to me.

Question 7

I'm not sure about the performance of my following code however I'm using tabulate() to do the job. As you can see, the loops for filtering elements not to compute the frequency of elements. In other words, you might want to look at tabulate() or hist() to compute the frequent occurrence of elements.

x = [20 13 12 12 12 18 17 16 16 11];
j = 1;
elems = []; % store repeated elements
for i = 1:length(x)
 if isempty(find(elems == x(i)))
 elems(j) = x(i);
 j = j + 1;
 end
end
elems
t = tabulate(x); % get frequency table
for i = 1:length(elems)
 lens(i) = t(elems(i),2); % get how many each element in elems vector occurs
end
lens

The result is

elems =
 20 13 12 18 17 16 11
lens =
 1 1 3 1 1 2 1

Question 8

I'll look into using hist(). I don't want to fork over more money to mathworks just for the statistics toolbox (for tabulate()).

CroCo CroCoCroCo 5743 gold badges6 silver badges14 bronze badges · Accepted Answer · 2014-10-07 21:36:41Z

I'm not sure about the performance of my following code however I'm using tabulate() to do the job. As you can see, the loops for filtering elements not to compute the frequency of elements. In other words, you might want to look at tabulate() or hist() to compute the frequent occurrence of elements.

x = [20 13 12 12 12 18 17 16 16 11];
j = 1;
elems = []; % store repeated elements
for i = 1:length(x)
 if isempty(find(elems == x(i)))
 elems(j) = x(i);
 j = j + 1;
 end
end
elems
t = tabulate(x); % get frequency table
for i = 1:length(elems)
 lens(i) = t(elems(i),2); % get how many each element in elems vector occurs
end
lens

The result is

elems =
 20 13 12 18 17 16 11
lens =
 1 1 3 1 1 2 1

I'll look into using hist(). I don't want to fork over more money to mathworks just for the statistics toolbox (for tabulate()).

Stack Exchange Network

Run length encoding of vectors in MATLAB: looped version and vectorized version

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Run length encoding of vectors in MATLAB: looped version and vectorized version

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions