Extract the largest value for each day from a matrix

Question 1

I have a matrix in which the right-most elements are repeated YYYYMMDD dates in ascending order, for example:

 40 1122 1711 20160326
 169 700 950 20160326
 40 1630 1711 20160326
 182 700 950 20160327
 40 1029 1711 20160327
 169 700 950 20160327
 40 1630 1711 20160327
 122 700 950 20160328
 40 1630 1711 20160328
 169 700 950 20160328
 40 1630 1711 20160328
3049 700 950 20160331
 40 1630 1711 20160331
3049 700 950 20160331
 40 1630 1711 20160331
 169 700 950 20160401
 40 1630 1711 20160401
 169 700 950 20160401
 40 1630 1711 20160401

Within each date, I want to keep only the row that corresponds to the largest element in the leftmost column. So I would like to produce a new matrix:

 169 700 950 20160326
 182 700 950 20160327
 169 700 950 20160328
3049 700 950 20160331
 169 700 950 20160401

The code I have now is:

idx1 = find([1;diff(A(:,4))]);
idx2 = find([diff(A(:,4));1]);
B = zeros(length(idx1),4);
for ii = 1:length(idx1)
 row_number = find(A(idx1(ii):idx2(ii),1) == max(A(idx1(ii):idx2(ii),1)),1);
 B(ii,:) = A(idx1(ii)+row_number-1,:);
end

Are there ways to improve this code? I'm looking for feedback on coding conventions, performance improvements, possible vectorization, etc.

Comment on the question

It looks like there are about 4 rows for each date (except the first one, possibly excluded from your copy). Is this a coincidence, or a usable pattern within the data?

Answer 1 · syb0rg · 2016-07-08 14:39:15Z

A few notes: (I implemented this in Octave, so there may be differences between my code and a proper Matlab implementation, but there shouldn't be).

First, we only care about unique dates, right? So let's grab indices for their groupings. I'm assuming the original data is stored in the matrix A.
```
[X, y, z] = unique(A(:, 4))
```
Now z is the only value I used from this output, so you could ignore the other returns by putting ~ in their place.
Second, let's split our data up based on these groups. We can use the function accumarray() to accomplish this nicely. If we pass in the function handle @max to it, it'll even find the max within these groups.
```
B = accumarray(z, A(:, 1), [], @max)
```
There is some trick to get each row corresponding to that maximum value based on the z parameter, but after investigating this for a few hours I still haven't figured it out and have given up. You could try using find() to get these rows instead, using z to make sure we are grabbing the right values.
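One possible way to recover the full rows, sketched here as an assumption rather than as the trick the answer had in mind, is to let accumarray group the row numbers instead of the values and pick, within each group, the row number whose first-column entry is largest. The names pickRow and rowIdx are illustrative, and only the first seven rows of the question's matrix are used to keep the example short.

```matlab
% First seven rows of the question's matrix (dates 20160326 and 20160327).
A = [ 40 1122 1711 20160326
     169  700  950 20160326
      40 1630 1711 20160326
     182  700  950 20160327
      40 1029 1711 20160327
     169  700  950 20160327
      40 1630 1711 20160327];

% z(k) is the group number (1, 2, ...) of the date in row k.
[~, ~, z] = unique(A(:, 4));

% For a group of row numbers r, return one row number whose
% first-column value equals the group's maximum.
pickRow = @(r) r(find(A(r, 1) == max(A(r, 1)), 1));

% Group the row numbers 1..n by date and apply pickRow to each group.
rowIdx = accumarray(z(:), (1:size(A, 1)).', [], pickRow);

B = A(rowIdx, :);
disp(B)
```

Because pickRow only looks for a row that attains the maximum, it does not matter in which order accumarray hands the row numbers of a group to the function; on ties, any of the tied rows may be returned.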

Nice =) accumarray with @max was really clever, but man I'm having a hard time trying to get the entire row out! I think I'm giving up too =/ It would be easy if it was only the first and last column, but grabbing those in between was hard. I posted another suggestion with sorting first, followed by unique. =)

Answer 2 · Stewie Griffin · 2016-07-09 14:43:05Z

The non-vectorized approach first:

Your code looks pretty good, but you're performing some of the operations multiple times. This won't matter much for small matrices, but it's a good habit to try to avoid such overhead.

For instance, the two vectors idx1 and idx2 could be created like this:
```
ind = find([1;diff(A(:,4));1]);
idx1 = ind(1:end-1);
idx2 = ind(2:end)-1;
```
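It might look more cumbersome, but it needs only one call to find and one call to diff instead of two of each.
In general, it's better to use numel instead of length to find the number of elements in a vector. length returns the size of the largest dimension, so it silently gives a misleading answer if the variable turns out to be a matrix, whereas numel always returns the total number of elements.
You are not using i and j as variables. Good! Both double as the imaginary unit, so reusing them as loop counters can cause confusion.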
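Putting these notes together, the loop could look roughly like the sketch below. It is one possible arrangement, not the only one: it also swaps the find(... == max(...), 1) lookup for the second output of max, and the helper name rows is introduced here for illustration. The data is the first seven rows of the question's matrix.

```matlab
% First seven rows of the question's matrix (dates 20160326 and 20160327).
A = [ 40 1122 1711 20160326
     169  700  950 20160326
      40 1630 1711 20160326
     182  700  950 20160327
      40 1029 1711 20160327
     169  700  950 20160327
      40 1630 1711 20160327];

% One pass over the date column: positions where a new date starts,
% plus a sentinel one past the last row.
ind  = find([1; diff(A(:, 4)); 1]);
idx1 = ind(1:end-1);      % first row of each date
idx2 = ind(2:end) - 1;    % last row of each date

B = zeros(numel(idx1), size(A, 2));
for ii = 1:numel(idx1)
    rows = idx1(ii):idx2(ii);     % row numbers belonging to this date
    [~, k] = max(A(rows, 1));     % position of the (first) maximum in the group
    B(ii, :) = A(rows(k), :);
end
disp(B)
```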

Vectorization

You can achieve this quite simply using a combination of sortrows and unique.

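There is one important thing you should know first: when unique is called with two outputs, the first gives the unique values and the second gives the index of one instance of each value. Which instance you get depends on the version: MATLAB R2013a and later return the first instance by default, whereas earlier releases returned the last. Passing 'last' as an extra argument requests the last instance explicitly, which is what the approach below relies on.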
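To see which instance unique reports, a small check on a made-up vector (v is purely illustrative) can help. Requesting 'first' or 'last' explicitly avoids depending on the default of whichever MATLAB or Octave version is in use.

```matlab
% Illustrative vector with repeated, already sorted values.
v = [10; 10; 20; 20; 20; 30];

[u, iFirst] = unique(v, 'first');   % index of the first occurrence of each value
[~, iLast]  = unique(v, 'last');    % index of the last occurrence of each value

% Columns: unique value, first index, last index.
disp([u(:), iFirst(:), iLast(:)])
```

Here the value 20 occupies positions 3 to 5, so the 'first' index is 3 and the 'last' index is 5.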

You want the row with the largest value in the first column for each unique date. What you can do then is sort the rows of the matrix based on the values in the first column in ascending order. Then, you use unique on the last column to find the last instance of each date. Now, you can use the sorted matrix and the indices of the unique dates to find your final output matrix:

B = sortrows(A); % Sort the rows by the first column (ties are broken by the later columns)
[~, ia] = unique(B(:,4), 'last'); % Find the last instance of each unique date, regardless of the version's default
B(ia,:) % Use those indices in the sorted matrix to get the final output
ans =
 169 700 950 20160326
 182 700 950 20160327
 169 700 950 20160328
 3049 700 950 20160331
 169 700 950 20160401
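A variant that does not depend on which instance unique returns is sketched below. It is an alternative to the code above, not part of the original answer: the rows are sorted by date ascending and by the first column descending, so the first row of each date block is the one to keep. The names S and isFirst are illustrative, and the data is the first seven rows of the question's matrix.

```matlab
% First seven rows of the question's matrix (dates 20160326 and 20160327).
A = [ 40 1122 1711 20160326
     169  700  950 20160326
      40 1630 1711 20160326
     182  700  950 20160327
      40 1029 1711 20160327
     169  700  950 20160327
      40 1630 1711 20160327];

% Sort by column 4 ascending, then by column 1 descending
% (a negative column number means descending order).
S = sortrows(A, [4 -1]);

% A row starts a new date block if it is the first row overall
% or its date differs from the previous row's date.
isFirst = [true; diff(S(:, 4)) ~= 0];

B = S(isFirst, :);
disp(B)
```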
