
I have a matrix whose right-most column contains repeated YYYYMMDD dates in ascending order, for example:

 40 1122 1711 20160326
 169 700 950 20160326
 40 1630 1711 20160326
 182 700 950 20160327
 40 1029 1711 20160327
 169 700 950 20160327
 40 1630 1711 20160327
 122 700 950 20160328
 40 1630 1711 20160328
 169 700 950 20160328
 40 1630 1711 20160328
3049 700 950 20160331
 40 1630 1711 20160331
3049 700 950 20160331
 40 1630 1711 20160331
 169 700 950 20160401
 40 1630 1711 20160401
 169 700 950 20160401
 40 1630 1711 20160401

Within each date, I want to keep only the row that corresponds to the largest element in the leftmost column. So I would like to produce a new matrix:

 169 700 950 20160326
 182 700 950 20160327
 169 700 950 20160328
3049 700 950 20160331
 169 700 950 20160401

The code I have now is:

idx1 = find([1;diff(A(:,4))]);
idx2 = find([diff(A(:,4));1]);
B = zeros(length(idx1),4);
for ii = 1:length(idx1)
 row_number = find(A(idx1(ii):idx2(ii),1) == max(A(idx1(ii):idx2(ii),1)),1);
 B(ii,:) = A(idx1(ii)+row_number-1,:);
end

Are there ways to improve this code? I'm looking for advice on coding conventions, performance improvements, possible vectorization, etc.

syb0rg
asked Jun 30, 2016 at 10:21
Comment (Jul 8, 2016 at 12:38): It looks like there are about 4 rows for each date (except the first one, possibly excluded from your copy). Is this a coincidence, or a usable pattern within the data?

2 Answers


A few notes (I implemented this in Octave, so there may be minor differences from a proper MATLAB implementation, though there shouldn't be any):

  1. First, we only care about unique dates, right? So let's grab indices for their groupings. I'm assuming the original data is stored in the matrix A.

    [X, y, z] = unique(A(:, 4))
    

    z is the only output I use here, so you could discard the others by putting ~ in their place.

  2. Second, let's split our data up based on these groups. The function accumarray() does this nicely, and if we pass it the function handle @max, it even finds the maximum within each group.

    B = accumarray(z, A(:, 1), [], @max)
    

    There is some trick to get each row corresponding to that maximum value based on the z parameter, but after investigating this for a few hours I still haven't figured it out and have given up. You could try using find() to get these rows instead, using z to make sure we are grabbing the right values.
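One way to finish this approach, sketched here rather than taken from the answer: compare each row's first-column value with the maximum of its own group (groupMax(z)), then call accumarray again with @min over the matching row numbers so that only the first maximal row of each date is kept. The names groupMax, rows and firstRow are illustrative, and the sketch assumes the dates are in column 4 of A, as in the question's sample data.

```matlab
% Sample data from the question: [value, col2, col3, YYYYMMDD date]
A = [  40 1122 1711 20160326
      169  700  950 20160326
       40 1630 1711 20160326
      182  700  950 20160327
       40 1029 1711 20160327
      169  700  950 20160327
       40 1630 1711 20160327
      122  700  950 20160328
       40 1630 1711 20160328
      169  700  950 20160328
       40 1630 1711 20160328
     3049  700  950 20160331
       40 1630 1711 20160331
     3049  700  950 20160331
       40 1630 1711 20160331
      169  700  950 20160401
       40 1630 1711 20160401
      169  700  950 20160401
       40 1630 1711 20160401];

% z(k) is the group (date) number of row k; the other outputs are unused
[~, ~, z] = unique(A(:, 4));
z = z(:);                                   % make sure z is a column vector

% Largest first-column value within each date
groupMax = accumarray(z, A(:, 1), [], @max);

% Row numbers whose first-column value equals their own group's maximum
rows = find(A(:, 1) == groupMax(z));

% If the maximum occurs more than once in a date, keep the earliest row
firstRow = accumarray(z(rows), rows, [], @min);

B = A(firstRow, :)
```

Because z comes from unique rather than from diff, this version does not need the rows of each date to be contiguous, and the output is ordered by date.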

answered Jul 8, 2016 at 14:39
Comment (Jul 9, 2016 at 14:53): Nice =) accumarray with @max was really clever, but man I'm having a hard time trying to get the entire row out! I think I'm giving up too =/ It would be easy if it was only the first and last column, but grabbing those in between was hard. I posted another suggestion with sorting first, followed by unique. =)

The non-vectorized approach first:

  1. Your code looks pretty good, but you're performing some of the operations multiple times. This won't matter much for small matrices, but it's a good habit to avoid such overhead.

    For instance, the two vectors idx1 and idx2 could be created like this:

    ind = find([1;diff(A(:,4));1]);
    idx1 = ind(1:end-1);
    idx2 = ind(2:end)-1;
    

    It might look more cumbersome, but it does less work: there is only one call to find and one call to diff instead of two of each.

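  2. In general, it's better to use numel instead of length to count the elements of a vector. numel(X) always returns the number of elements, whereas length(X) returns the size of the largest dimension, which equals the element count only for vectors. Both calls are cheap, so any speed difference between them is negligible next to the rest of the loop.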

  3. You are not using i and j as variables (in MATLAB they denote the imaginary unit). Good!
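Points 1 and 2 applied to the question's loop could look like the sketch below. It also swaps in one change the answer does not mention: the second output of max already returns the index of the first maximum, which replaces the find(... == max(...), 1) line. Like the original code, it assumes the rows of each date are contiguous.

```matlab
% Sample data from the question: [value, col2, col3, YYYYMMDD date]
A = [  40 1122 1711 20160326
      169  700  950 20160326
       40 1630 1711 20160326
      182  700  950 20160327
       40 1029 1711 20160327
      169  700  950 20160327
       40 1630 1711 20160327
      122  700  950 20160328
       40 1630 1711 20160328
      169  700  950 20160328
       40 1630 1711 20160328
     3049  700  950 20160331
       40 1630 1711 20160331
     3049  700  950 20160331
       40 1630 1711 20160331
      169  700  950 20160401
       40 1630 1711 20160401
      169  700  950 20160401
       40 1630 1711 20160401];

% One find/diff pass: start row of each date, plus one past the last row
ind  = find([1; diff(A(:, 4)); 1]);
idx1 = ind(1:end-1);        % first row of each date
idx2 = ind(2:end) - 1;      % last row of each date

B = zeros(numel(idx1), size(A, 2));
for ii = 1:numel(idx1)
    % Second output of max: position of the first maximum within this date
    [~, k] = max(A(idx1(ii):idx2(ii), 1));
    B(ii, :) = A(idx1(ii) + k - 1, :);
end
disp(B)
```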


Vectorization

You can achieve this quite simply using a combination of sortrows and unique.

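There is one important thing you should know first: when calling unique with two outputs, the first output gives the unique values and the second gives indices into the input. Which instance of a repeated element that index points to depends on the version: MATLAB R2013a and later return the first occurrence by default, while older releases returned the last. Passing 'last' makes the choice explicit, and the last instance is what this approach needs.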
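A tiny check on a made-up vector v (hypothetical data, not from the question) shows the difference between the two options; passing 'first' or 'last' explicitly gives the same result whatever the default is.

```matlab
v = [5; 5; 7; 7; 7];
[~, iaFirst] = unique(v, 'first');  % index of the first occurrence of each value
[~, iaLast]  = unique(v, 'last');   % index of the last occurrence of each value
disp([iaFirst(:), iaLast(:)])       % one row per unique value: [first, last]
```

For 5 the indices are 1 and 2; for 7 they are 3 and 5.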

You want the row with the largest value in the first column for each unique date. What you can do then is sort the rows of the matrix based on the values in the first column in ascending order. Then, you use unique on the last column to find the last instance of each date. Now, you can use the sorted matrix and the indices of the unique dates to find your final output matrix:

B = sortrows(A); % Sort the rows by the first column (ties broken by the following columns)
[~, ia] = unique(B(:,4), 'last'); % Index of the last instance of each unique date
B(ia,:) % Use those indices into the sorted matrix to get the final output
ans =
 169 700 950 20160326
 182 700 950 20160327
 169 700 950 20160328
 3049 700 950 20160331
 169 700 950 20160401
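If you would rather not depend on unique's first/last behaviour at all, a variant (a swapped-in technique, not the answer's code) is to sort by date and then by the first column in descending order, using a negative column index in sortrows, and keep the first row of each date. A sketch using the question's sample data:

```matlab
% Sample data from the question: [value, col2, col3, YYYYMMDD date]
A = [  40 1122 1711 20160326
      169  700  950 20160326
       40 1630 1711 20160326
      182  700  950 20160327
       40 1029 1711 20160327
      169  700  950 20160327
       40 1630 1711 20160327
      122  700  950 20160328
       40 1630 1711 20160328
      169  700  950 20160328
       40 1630 1711 20160328
     3049  700  950 20160331
       40 1630 1711 20160331
     3049  700  950 20160331
       40 1630 1711 20160331
      169  700  950 20160401
       40 1630 1711 20160401
      169  700  950 20160401
       40 1630 1711 20160401];

% Sort by date (column 4) ascending, then by column 1 descending
S = sortrows(A, [4 -1]);

% After sorting, the first row of each date holds that date's largest column-1 value
B = S([true; diff(S(:, 4)) ~= 0], :);
disp(B)
```

When two rows of the same date tie on the first column, which one is kept depends on how sortrows orders them; in the sample data the tied rows are identical, so it makes no difference here.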
answered Jul 9, 2016 at 14:43