I have a matrix in which the right-most elements are repeated YYYYMMDD dates in ascending order, for example:
40 1122 1711 20160326
169 700 950 20160326
40 1630 1711 20160326
182 700 950 20160327
40 1029 1711 20160327
169 700 950 20160327
40 1630 1711 20160327
122 700 950 20160328
40 1630 1711 20160328
169 700 950 20160328
40 1630 1711 20160328
3049 700 950 20160331
40 1630 1711 20160331
3049 700 950 20160331
40 1630 1711 20160331
169 700 950 20160401
40 1630 1711 20160401
169 700 950 20160401
40 1630 1711 20160401
Within each date, I want to keep only the row that corresponds to the largest element in the leftmost column. So I would like to produce a new matrix:
169 700 950 20160326
182 700 950 20160327
169 700 950 20160328
3049 700 950 20160331
169 700 950 20160401
The code I have now is:
idx1 = find([1;diff(A(:,4))]);
idx2 = find([diff(A(:,4));1]);
B = zeros(length(idx1),4);
for ii = 1:length(idx1)
    row_number = find(A(idx1(ii):idx2(ii),1) == max(A(idx1(ii):idx2(ii),1)),1);
    B(ii,:) = A(idx1(ii)+row_number-1,:);
end
Are there ways to improve this code? I'm looking for feedback on coding conventions, performance improvements, possible vectorization, etc.
It looks like there are about 4 rows for each date (except the first one, possibly excluded from your copy). Is this a coincidence, or a usable pattern within the data? – syb0rg, Jul 8, 2016 at 12:38
2 Answers
A few notes: (I implemented this in Octave, so there may be differences between my code and a proper Matlab implementation, but there shouldn't be).
First, we only care about the unique dates, right? So let's grab the indices of their groupings. I'm assuming the original data is stored in the matrix A.
[X, y, z] = unique(A(:, 4))
Now z is the only value I used from this output, so you could ignore the other returns by putting ~ in their place.
Second, let's split our data up based on these groups. We can use the function accumarray() to accomplish this nicely. If we pass in the function handle @max to it, it'll even find the max within these groups.
B = accumarray(z, A(:, 1), [], @max)
There is some trick to get each row corresponding to that maximum value based on the z parameter, but after investigating this for a few hours I still haven't figured it out and have given up. You could try using find() to get these rows instead, using z to make sure we are grabbing the right values.
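One possible way to recover the full rows, offered only as a sketch and not as the trick alluded to above: instead of accumulating the values of the first column, accumulate the row numbers, and let the function handle return the row number of the group's maximum. The variable names rows and z, and the 7-row sample (the first two dates from the question), are assumptions made for illustration.

```matlab
% First two dates of the question's example data.
A = [ 40 1122 1711 20160326
     169  700  950 20160326
      40 1630 1711 20160326
     182  700  950 20160327
      40 1029 1711 20160327
     169  700  950 20160327
      40 1630 1711 20160327];

% z(k) is the group number (unique-date index) of row k.
[~, ~, z] = unique(A(:, 4));

% For each group, r holds the row numbers belonging to that group.
% Return the smallest row number whose first-column value equals the
% group maximum, so that ties resolve to the earliest row.
rows = accumarray(z(:), (1:size(A, 1))', [], ...
                  @(r) min(r(A(r, 1) == max(A(r, 1)))));

B = A(rows, :);
```

Taking min over the matching row numbers avoids depending on the order in which accumarray hands the elements of a group to the function.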
Nice =) accumarray with @max was really clever, but man I'm having a hard time trying to get the entire row out! I think I'm giving up too =/ It would be easy if it was only the first and last column, but grabbing those in between was hard. I posted another suggestion with sorting first, followed by unique. =) – Stewie Griffin, Jul 9, 2016 at 14:53
The non-vectorized approach first:
Your code looks pretty good, but you're performing some of the operations multiple times. This won't matter much for small matrices, but it's a good habit to try to avoid such overhead.
For instance, the two vectors idx1 and idx2 could be created like this:
ind = find([1;diff(A(:,4));1]);
idx1 = ind(1:end-1);
idx2 = ind(2:end)-1;
It might look more cumbersome, but it will be a lot faster since there's only one call to find and only one call to diff.
In general, it's better to use numel instead of length to find the number of elements in a vector. Not only is it more robust, it's also a lot faster for large vectors.
You are not using i and j as variables. Good!
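Putting these notes together, the loop might end up looking something like the sketch below. Using the second output of max in place of find(... == max(...), 1) is an extra substitution that is not part of the notes above; both give the position of the first maximum. The 7-row sample is just the first two dates from the question.

```matlab
% First two dates of the question's example data.
A = [ 40 1122 1711 20160326
     169  700  950 20160326
      40 1630 1711 20160326
     182  700  950 20160327
      40 1029 1711 20160327
     169  700  950 20160327
      40 1630 1711 20160327];

% One call to diff and one call to find give both group boundaries.
ind  = find([1; diff(A(:, 4)); 1]);
idx1 = ind(1:end-1);      % first row of each date group
idx2 = ind(2:end) - 1;    % last row of each date group

B = zeros(numel(idx1), size(A, 2));
for ii = 1:numel(idx1)
    % The second output of max is the position of the first maximum
    % within the group, so the group is only sliced once.
    [~, row_number] = max(A(idx1(ii):idx2(ii), 1));
    B(ii, :) = A(idx1(ii) + row_number - 1, :);
end
```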
Vectorization
You can achieve this quite simply using a combination of sortrows and unique.
There is one important thing you should know first: when calling unique with two outputs, the first gives you the unique values, while the second gives you the index of one instance of each value. Which instance you get by default has varied between releases (MATLAB R2013a and later return the first), so pass 'last' to explicitly request the index of the last instance.
You want the row with the largest value in the first column for each unique date. What you can do then is sort the rows of the matrix based on the values in the first column in ascending order. Then, you use unique on the last column to find the last instance of each date. Now, you can use the sorted matrix and the indices of the unique dates to find your final output matrix:
B = sortrows(A); % Sort the rows in ascending order of the first column
[~, ia] = unique(B(:,4), 'last'); % Find the last instance of each unique date
B(ia,:) % Use those indices in the sorted matrix to get the final output
ans =
169 700 950 20160326
182 700 950 20160327
169 700 950 20160328
3049 700 950 20160331
169 700 950 20160401
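If you would rather not depend on which occurrence unique reports, a variation on the same sorting idea (a sketch, not the approach shown above) is to sort by date ascending and by the first column descending, then keep the first row of each date. The variable name keep and the 7-row sample (the first two dates from the question) are assumptions for illustration.

```matlab
% First two dates of the question's example data.
A = [ 40 1122 1711 20160326
     169  700  950 20160326
      40 1630 1711 20160326
     182  700  950 20160327
      40 1029 1711 20160327
     169  700  950 20160327
      40 1630 1711 20160327];

% Sort by column 4 ascending, then by column 1 descending
% (a negative column number requests descending order).
S = sortrows(A, [4 -1]);

% A row starts a new date group if its date differs from the row above.
keep = [true; diff(S(:, 4)) ~= 0];

% The first row of each group now holds the largest first-column value.
B = S(keep, :);
```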