I have written a script which is supposed to call the function at the bottom for each unique combination of values in the parameter vectors. That function is a simulation which I intend to run over all the unique combinations of those parameters. The simulation is run on a cluster, and the sim_ind
for each of these jobs is pulled from the environment.
Currently, my solution is to build a matrix with each row corresponding to a unique set of parameters. Then the sim_ind
value (passed from the environment) is used to determine which unique set of parameters will be used in that iteration by indexing the matrix.
The following is the code that I am using, which works:
% getting sim_ind from env
AI = getenv('PBS_ARRAYID');
sim_ind = str2num(AI);
time_slots = 40000;
% parameter vectors
up_down_ratio_arr = [1];
traffic_load_arr = [ 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10, 100, Inf ];
blocked_modes_ind_arr = [1 2 3];
traffic_model_arr = [1 2];
% total number of unique combinations
uniq = length(up_down_ratio_arr) * length(traffic_load_arr) * length(blocked_modes_ind_arr) * length(traffic_model_arr);
parameters = zeros(uniq, 4);
% propagating the parameter list matrix
counter = 0;
for i = up_down_ratio_arr
for j = traffic_load_arr
for k = blocked_modes_ind_arr
for l = traffic_model_arr
counter = counter + 1;
parameters(counter, 1) = i;
parameters(counter, 2) = j;
parameters(counter, 3) = k;
parameters(counter, 4) = l;
end
end
end
end
% wrapping the sim_ind
ind = mod(sim_ind, uniq);
% mod maps all multiples of the largest index to zero
if ind == 0
ind = uniq;
end
% retrieving a parameter set
up_down_ratio = parameters(ind,1);
traffic_load = parameters(ind,2);
blocked_modes_ind = parameters(ind,3);
traffic_model = parameters(ind,4);
% call the routine
MainCode_Single_Backahul_Cell_Sch(time_slots, up_down_ratio, traffic_load, blocked_modes_ind, traffic_model, ceil(sim_ind/uniq) );
Is there a better way?
I have a version in which I use mod
to map each value of the sim_ind
parameter to a value that exists in the range of each vector. The problem with this solution was that not all the combination were unique. For example, in the following code traffic_load_arr
is of size 10, and traffic_model_arr
is of size 2. Therefore, whenever the 8th element of traffic_load_arr
is selected; i.e. mod(sim_ind, 10) == 8
, then the 2nd element of traffic_model_arr
must also be selected. As a results, the parameter combination of traffic_load == 10
and traffic_model == 2
will never be formed.
I would appreciate for all code to be cross compatible with MATLAB and Octave.
2 Answers 2
For a full factorial experiment, you may want to avoid constructing the whole matrix just to get the factor levels for a single experiment, especially if you have a large number of factors and levels and you are using a language where loops are not very efficient. Here is how you can do it using mod
.
In a single factor experiment, you run the experiment over the range of values of the first factor. This is easy to do using mod
: use mod(sim_ind, nlevels_factor1)
, or
mod(sim_ind-1, nlevels_factor1) + 1
if you want to start at one (since MATLAB indexes arrays from 1).
To extend the experiment to a second factor, you want this new factor to have its first level value for the first nlevels_factor1
runs, then its second level value for the second nlevels_factor1
runs, etc... all the way up till the nlevels_factor1*nlevels_factor2
th run.
You'll use mod
again, but now you want the first argument to mod
to be 1 for the first nlevels_factor1
runs, 2 for the next nlevels_factor1
runs, etc. So instead of passing the sim_index
(or sim_index-1
) to mod, you'll pass ceil(sim_index/nlevels_factor1)
, which has value 1 for the first nlevels_factor1
runs, 2 for the next nlevels_factor1
runs, etc. Your factor level for the second factor will be
mod(ceil(sim_index/nlevels_factor1) - 1, nlevels_factor2) + 1
if you want to start from 1.
You can continue this procedure to extend your experiment to any number of factors. Just make the divisor inside the ceil
equal to the product of the number of levels of all of the previous factors.
Here's what it might look like for your example:
% getting sim_ind from env
AI = getenv('PBS_ARRAYID');
sim_ind = str2num(AI);
time_slots = 40000;
% parameter vectors
up_down_ratio_arr = [1];
traffic_load_arr = [ 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10, 100, Inf ];
blocked_modes_ind_arr = [1 2 3];
traffic_model_arr = [1 2];
size_up_down_ratio = length(up_down_ratio_arr);
size_traffic_load = length(traffic_load_arr);
size_blocked_modes = length(blocked_modes_ind_arr);
size_traffic_model = length(traffic_model_arr);
% total number of unique combinations
uniq = size_up_down_ratio*size_traffic_load*size_blocked_modes*size_traffic_model;
% repeat each value once: 1,2,3,4,5,6,7,8,9,10,1,2,3,4...
i_up_down_ratio = mod(sim_ind - 1,size_up_down_ratio) + 1;
% repeat each value size_so_far times:
% 1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3...
size_so_far = size_up_down_ratio;
i_traffic_load = mod(ceil(sim_ind/size_so_far) - 1, size_traffic_load) + 1;
size_so_far = size_so_far*size_traffic_load;
i_blocked_modes = mod(ceil(sim_ind/size_so_far) - 1, size_blocked_modes) + 1;
size_so_far = size_so_far*size_blocked_modes;
i_traffic_model = mod(ceil(sim_ind/size_so_far) - 1, size_traffic_model) + 1;
up_down_ratio = up_down_ratio_arr(i_up_down_ratio);
traffic_load = traffic_load_arr(i_traffic_load);
blocked_modes_ind = blocked_modes_ind_arr(i_blocked_modes);
traffic_model = traffic_model_arr(i_traffic_model);
% call the routine
MainCode_Single_Backahul_Cell_Sch(time_slots, up_down_ratio, traffic_load, blocked_modes_ind, traffic_model, ceil(sim_ind/uniq) );
-
1\$\begingroup\$ Very nice! =) +1 from me! \$\endgroup\$Stewie Griffin– Stewie Griffin2016年08月05日 07:27:44 +00:00Commented Aug 5, 2016 at 7:27
NOTE: The code in this answer is rather compact, in order to avoid a lot of scrolling. You might want to put in some additional spaces and line breaks.
You know how to do this! Your code looks very nice and clean!
But, this hurts my MATLAB eyes:
for i = up_down_ratio_arr
for j = traffic_load_arr
for k = blocked_modes_ind_arr
for l = traffic_model_arr
counter = counter + 1;
parameters(counter, 1) = i;
parameters(counter, 2) = j;
parameters(counter, 3) = k;
parameters(counter, 4) = l;
end
end
end
end
It's the prime example of something that should be done without loops... One thing that was hurting even more was that I apparently couldn't make a vectorized approach that was faster. The reason why it appeared to be slow was because I tried timing it with tic/toc
.
Tips: Don't use tic/toc
when benchmarking, use the superior timeit
! When benchmarking with timeit
I got results that was much in favor of vectorization. (Phew)
The vectorized approach (creds to Luis Mendo) in a function, so that I can time it:
function parameters = vectorized_approach(up_down_ratio_arr, traffic_load_arr, blocked_modes_ind_arr, traffic_model_arr)
vectors = {up_down_ratio_arr, traffic_load_arr, blocked_modes_ind_arr, traffic_model_arr};
n = numel(vectors); %// number of vectors
parameters = cell(1,n); %// pre-define to generate comma-separated list
[parameters{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
%// comma-separated lists is needed to produce the rows of the result matrix in
%// lexicographical order
parameters = cat(n+1, parameters{:}); %// concat the n n-dim arrays along dimension n+1
parameters = reshape(parameters,[],n);
end
Your loop approach is placed in the function loop_approach
(I won't repeat it here).
up_down_ratio_arr = [1];
traffic_load_arr = [ 0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10, 100, Inf ];
blocked_modes_ind_arr = [1 2 3];
traffic_model_arr = [1 2];
f = @() loop_approach(up_down_ratio_arr, traffic_load_arr, blocked_modes_ind_arr, traffic_model_arr);
g = @() vectorized_approach(up_down_ratio_arr, traffic_load_arr, blocked_modes_ind_arr, traffic_model_arr);
isequal(f(),g())
ans =
1
timeit(f)/timeit(g)
ans =
1.6072
So, for your input variables, the vectorized approach is about 60% faster than the loops. Let's make the up_down_ratio_arr
a bit longer:
up_down_ratio_arr = [1:5];
f = @() loop_approach(up_down_ratio_arr, traffic_load_arr, blocked_modes_ind_arr, traffic_model_arr);
g = @() vectorized_approach(up_down_ratio_arr, traffic_load_arr, blocked_modes_ind_arr, traffic_model_arr);
isequal(f(),g())
ans =
1
timeit(f)/timeit(g)
ans =
7.2192
So, the vectorized approach is quite a bit faster for the data you posted in the question, and a lot faster for larger data sets.
Let's continue:
In general, numel
is better than length
, at least for long vectors. I recommend you get used to using it every time!
This, could be simplified:
if ind == 0
ind = uniq;
end
This does the same job, and avoids the if
: ind(ind == 0) = uniq;
.
I don't think I would have done ceil(sim_ind/uniq)
in the function call. I suggest you calculate this and give it a descriptive name (probably the same as in the function you're calling). Then use that variable as input argument.