I've written the following Octave function, which is designed to be called repeatedly by a Bayesian optimisation routine. It outputs a value J, the Akaike Information Criterion of the logistic regression model being tested, given the data.
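
For reference, the criterion being computed is

\$\$ \mathrm{AIC} = 2k - 2\ln( \hat{L} ) \$\$

where \$k\$ is the number of fitted parameters in the model and \$\hat{L}\$ is the maximised value of the likelihood function.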

function J = rvfl_training_of_cyclic_embedding ( x )

  global sample_features ; global sample_targets ;

  epsilon = 1e-15 ; ## keeps log() away from log( 0 ) = -Inf, which becomes NaN when multiplied by a zero target

  ## check input x
  if ( numel( x ) != 6 )
    error( 'The input vector x must be of length 6.' ) ;
  endif

  ## get the parameters from input x
  hidden_layer_size = floor( x( 1 ) ) ;    ## number of neurons in the hidden layer
  randomisation_type = floor( x( 2 ) ) ;   ## 1 == uniform, 2 == Gaussian
  scale_mode = floor( x( 3 ) ) ;           ## 1 scales the features for all neurons, 2 scales the features for each
                                           ## hidden neuron separately, 3 scales the range of the randomisation
                                           ## for the uniform distribution
  scale = x( 4 ) ;                         ## linearly scale the random features before feeding them into the
                                           ## nonlinear activation function. In this implementation, the threshold
                                           ## which leads to 0.99 of the maximum/minimum value of the activation
                                           ## function is taken as the saturating threshold. scale = 0.9 means all
                                           ## the random features will be linearly scaled into
                                           ## 0.9 * [ lower_saturating_threshold , upper_saturating_threshold ].
  if_output_bias = floor( x( 5 ) + 0.5 ) ; ## use an output bias? 1 == yes, 0 == no
  lambda = x( 6 ) ;                        ## the regularisation coefficient lambda

  rand( 'seed' , 0 ) ;
  randn( 'seed' , 0 ) ;
  sample_targets_temp = sample_targets ;
  [ Nsample , Nfea ] = size( sample_features ) ;

  ######### get type of randomisation from input x #################
  if ( randomisation_type == 1 )           ## uniform randomisation
    if ( scale_mode == 3 )                 ## range scaled for uniform randomisation
      Weight = scale * ( rand( Nfea , hidden_layer_size ) * 2 - 1 ) ; ## scaled uniform random input weights to hidden layer
      Bias = scale * rand( 1 , hidden_layer_size ) ;                  ## scaled random bias weights to hidden layer
    else
      Weight = rand( Nfea , hidden_layer_size ) * 2 - 1 ; ## unscaled uniform random input weights to hidden layer
      Bias = rand( 1 , hidden_layer_size ) ;              ## unscaled random bias weights to hidden layer
    endif
  elseif ( randomisation_type == 2 )       ## Gaussian randomisation
    Weight = randn( Nfea , hidden_layer_size ) ; ## Gaussian random input weights to hidden layer
    Bias = randn( 1 , hidden_layer_size ) ;      ## Gaussian random bias weights to hidden layer
  else
    error( 'Only uniform and Gaussian randomisation are supported.' ) ;
  endif
  ##################################################################

  Bias_train = repmat( Bias , Nsample , 1 ) ;
  H = sample_features * Weight + Bias_train ;
  k_parameters = numel( Weight ) + numel( Bias_train ) ;

  ## activation function
  Saturating_threshold = [ -2.1 , 2.1 ] ;
  Saturating_threshold_activate = [ 0 , 1 ] ;
  if ( scale_mode == 1 )
    ## scale the features for all neurons
    [ H , k , b ] = Scale_feature( H , Saturating_threshold , scale ) ;
  elseif ( scale_mode == 2 )
    ## scale the features for each hidden neuron separately
    [ H , k , b ] = Scale_feature_separately( H , Saturating_threshold , scale ) ;
  endif
  ## the actual activation, a radial basis function
  H = exp( -abs( H ) ) ;

  if ( if_output_bias == 1 )
    ## we will use an output bias
    H = [ H , ones( Nsample , 1 ) ] ;
  endif

  ## the direct link scaling options: concatenate the hidden layer and sample_features
  if ( scale_mode == 1 )
    ## scale the features for all neurons
    sample_features_temp = sample_features .* k + b ;
    H = [ H , sample_features_temp ] ;
  elseif ( scale_mode == 2 )
    ## scale the features for each hidden neuron separately
    [ sample_features_temp , ktr , btr ] = Scale_feature_separately( sample_features , Saturating_threshold_activate , scale ) ;
    H = [ H , sample_features_temp ] ;
  else
    H = [ H , sample_features ] ;
  endif
  H( isnan( H ) ) = 0 ; ## avoids any 'blowups' due to nans in H

  ## regularised least squares for the concatenated hidden layer output
  ## and the original, possibly scaled, input sample_features
  if ( hidden_layer_size < Nsample )
    beta = ( eye( size( H , 2 ) ) / lambda + H' * H ) \ H' * sample_targets_temp ;
  else
    beta = H' * ( ( eye( size( H , 1 ) ) / lambda + H * H' ) \ sample_targets_temp ) ;
  endif
  k_parameters = k_parameters + numel( beta ) ;

  ## get the model's predicted target output
  sample_targets_temp = H * beta ;

  ## the final logistic output
  final_output = 1.0 ./ ( 1.0 .+ exp( -sample_targets_temp ) ) ;
  max_likelihood = sum( log( final_output .+ epsilon ) .* sample_targets + log( 1 .- final_output .+ epsilon ) .* ( 1 .- sample_targets ) ) ;

  ## get the Akaike Information Criterion
  J = 2 * k_parameters - 2 * max_likelihood ;

  rand( 'state' ) ; randn( 'state' ) ; ## reset rng

endfunction
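
For context, the two regularised least-squares branches are intended to compute the same ridge solution via its primal and dual forms, picking whichever linear system is smaller:

\$\$ \beta = \left( H^{\top} H + \tfrac{1}{\lambda} I \right)^{-1} H^{\top} y = H^{\top} \left( H H^{\top} + \tfrac{1}{\lambda} I \right)^{-1} y \$\$

where \$y\$ is the target vector. Note that \$\lambda\$ enters as \$1/\lambda\$ here, so a larger \$\lambda\$ means weaker regularisation.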

Most of the code is an adaptation of a GitHub repository and I'm not too worried about that, but my main concern is the coding of the Akaike criterion towards the end, as this is my own addition.

Comments as to this implementation are most welcome.
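
(The Scale_feature and Scale_feature_separately helpers also come from that repository and are not reproduced here. Purely for context, here is a minimal sketch of what Scale_feature is assumed to do: linearly map its input into scale * [ lower , upper ] and return the slope k and intercept b so the same mapping can be reused on the direct-link features.

function [ Xs , k , b ] = Scale_feature ( X , thresholds , scale )
  ## hypothetical sketch only: linearly map all of X into scale * [ lower , upper ]
  lower = scale * thresholds( 1 ) ;
  upper = scale * thresholds( 2 ) ;
  k = ( upper - lower ) / ( max( X( : ) ) - min( X( : ) ) ) ; ## common slope
  b = lower - k * min( X( : ) ) ;                             ## common intercept
  Xs = X .* k + b ;
endfunction

The real helper may differ in detail.)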

EDIT (in response to a comment)

Some simple test code for the function is:

global sample_features = 2 .* ( rand( 10 , 3 ) .- 0.5 ) ;
global sample_targets = [ 1 ; 1 ; 1 ; 1 ; 0 ; 0 ; 0 ; 0 ; 1 ; 0 ] ;
x = [ 3 , 1 , 3 , 0.9 , 1 , 0.1 ] ; 
J = rvfl_training_of_cyclic_embedding ( x ) ;

although in reality the function is supplied to an optimisation routine thus:

fun = 'rvfl_training_of_cyclic_embedding' ;
[ xmin , fmin ] = bayesoptcont( fun , nDimensions , params , lb , ub ) ;

where nDimensions is the length of x, params is a structure containing options for the Bayesian optimisation routine (not pertinent to this question), and lb and ub are vectors of lower and upper bounds, respectively, for the values contained in x. The bayesoptcont function is part of the BayesOpt library.
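
Purely for illustration (the actual values are problem-dependent, and these ranges are my own assumptions rather than anything from the library), the bounds might look like:

nDimensions = 6 ;
lb = [ 2 , 1 , 1 , 0.1 , 0 , 0.001 ] ;       ## illustrative lower bounds for x( 1 ) to x( 6 )
ub = [ 100 , 2.99 , 3.99 , 1.0 , 1 , 1.0 ] ; ## illustrative upper bounds, with the integer-coded entries
                                             ## set just below the next integer so floor() can reach the top category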

asked Nov 6, 2019 at 17:13
  • Please add some test code to demonstrate how this function is used. You do have an automatic test suite for this complicated piece of code, don't you? – Commented Nov 6, 2019 at 22:06

1 Answer

Why use global variables? Why not pass those values into the function as arguments?

sample_features = 2 .* ( rand( 10 , 3 ) .- 0.5 ) ;
sample_targets = [ 1 ; 1 ; 1 ; 1 ; 0 ; 0 ; 0 ; 0 ; 1 ; 0 ] ;
x = [ 3 , 1 , 3 , 0.9 , 1 , 0.1 ] ; 
J = rvfl_training_of_cyclic_embedding ( x, sample_features, sample_targets ) ;

Global variables make debugging harder and actually slow down the code. You should avoid them where possible. Here they don't do anything special; to me it looks like a lazy solution to a simple problem.
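
With the globals removed, the function definition becomes simply:

function J = rvfl_training_of_cyclic_embedding ( x, sample_features, sample_targets )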

To call your optimization algorithm with this function, simply do

fun = @( x ) rvfl_training_of_cyclic_embedding ( x, sample_features, sample_targets );

The function does

rand( 'seed' , 0 ) ;
randn( 'seed' , 0 ) ;

at the beginning and

rand( 'state' ) ; randn( 'state' ) ; ## reset rng

at the end. This is a really bad practice. rand('seed',0) sets the random number generator to the 'seed' method, which is an old-fashioned RNG with really poor properties. rand('state') returns it to the 'state' method, which is the Mersenne Twister according to the documentation. The only reason to use the old generator is to reproduce old results. Please don't use it!
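
If you do need reproducible draws inside the function, a safer pattern is to save the state of the current generator, seed it, and restore it afterwards, without ever switching generators; something along these lines:

old_rand_state = rand( 'state' ) ;    % save the current generator state
old_randn_state = randn( 'state' ) ;
rand( 'state' , 0 ) ;                 % seed reproducibly, staying on the 'state' generator
randn( 'state' , 0 ) ;
% ... body of the function ...
rand( 'state' , old_rand_state ) ;    % restore the caller's state
randn( 'state' , old_randn_state ) ;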


Style:

I have been using MATLAB since somewhere in the 1990s. I like the Octave project, but I dislike a few of the additions they made to the language. Using these additions corners your code, making it unavailable to us MATLAB users. If instead of endif and endfor you simply write end, and instead of # for comments you use %, then your code can more or less run directly under MATLAB. The endfor addition seems pointless to me; indentation should show what the end applies to. The # comments are also highly unnecessary, IMO.

If you write a comment block at the beginning of your function (either before or after the function line), then that block will be displayed when you type help rvfl_training_of_cyclic_embedding.
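
For example:

function J = rvfl_training_of_cyclic_embedding ( x )
% RVFL_TRAINING_OF_CYCLIC_EMBEDDING Akaike Information Criterion of an RVFL logistic model.
%
%    J = rvfl_training_of_cyclic_embedding( x ) takes a 6-element parameter
%    vector x and returns J, the AIC of the fitted model, for use as an
%    objective function in Bayesian optimisation.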

answered Dec 13, 2019 at 12:59