Transformation to Independent Univariate Sample
Description
The leave-one-out method gives an approximately independent sample from the standard multivariate normal distribution, which in turn yields a sample from the standard univariate normal distribution.
Usage
Multi.to.Uni(x)
Arguments
x
multivariate data matrix
Details
Let \bar{X}_{-k} and S_{-k} be the sample mean and the sample
variance-covariance matrix obtained by using all but the k^{th} data point. Then
S_{-k}^{-1/2} (X_k - \bar{X}_{-k}), k = 1, \dots, n, are approximately
independently distributed as N_p(0, I) when the data are multivariate normal. Thus all n \times p
entries in the data matrix so constructed can be treated as
a univariate sample of size n \times p from N(0, 1).
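The leave-one-out standardization described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code behind Multi.to.Uni: the helper name loo_standardize is hypothetical, and the inverse square root of S_{-k} is taken as the symmetric square root from an eigendecomposition.

```r
# Hypothetical helper (not a package function): leave-one-out standardization.
loo_standardize <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  z <- matrix(NA_real_, n, ncol(x))
  for (k in seq_len(n)) {
    rest <- x[-k, , drop = FALSE]            # all but the k-th data point
    m <- colMeans(rest)                      # \bar{X}_{-k}
    e <- eigen(cov(rest), symmetric = TRUE)  # S_{-k} = V diag(values) V'
    s_inv_half <- e$vectors %*%
      diag(1 / sqrt(e$values), nrow = length(e$values)) %*%
      t(e$vectors)                           # symmetric S_{-k}^{-1/2} (assumed choice)
    z[k, ] <- s_inv_half %*% (x[k, ] - m)
  }
  z
}

set.seed(1)
x <- matrix(rnorm(200 * 3), 200, 3)
z <- loo_standardize(x)
qqnorm(as.vector(z)); abline(0, 1)   # all n * p entries pooled into one sample
```

Pooling the entries with as.vector(z) mirrors the final step of the Details: the n \times p values are inspected as one univariate sample.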
Value
A data frame containing the univariate data and the corresponding index from the multivariate data.
Examples
set.seed(1)
x <- MASS::mvrnorm(100, mu = rep(0, 5), diag(5))
df <- Multi.to.Uni(x)
qqnorm(df$x.new); abline(0, 1)
Graphical plots to assess the multivariate normality assumption.
Description
The cumulant generating function of a normally distributed
random variable has derivatives of order three and higher that are all 0.
Hence, plots of empirical third/fourth order derivatives with large values
or steep slopes indicate non-normality.
Multivariate_CGF_PLot estimates, and provides a confidence region for, the
average (or any linear combination) of the third/fourth derivatives of the empirical
cumulant generating function at the points t = t^*1_p. Plots for
p = 2, 3, \dots, 10 are faster to obtain, as the confidence regions
and other necessary parameters are available in mt3_lst_param.rda and
mt4_lst_param.rda.
Higher dimensions are computationally expensive.
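The vanishing of the higher derivatives can be checked directly. The worked equation below uses \mu and \Sigma for the mean vector and covariance matrix of X; these symbols are notation assumed here for illustration and do not appear elsewhere in this manual.

```latex
K_X(t) = \log \mathbb{E}\exp(t'X) = t'\mu + \tfrac{1}{2}\, t'\Sigma t,
\qquad
\frac{\partial^3 K_X(t)}{\partial t_{j_1} \partial t_{j_2} \partial t_{j_3}} = 0,
\qquad
\frac{\partial^4 K_X(t)}{\partial t_{j_1} \partial t_{j_2} \partial t_{j_3} \partial t_{j_4}} = 0 .
```

Since K_X is a quadratic polynomial in t, every partial derivative of order three or more is identically zero, which is why non-zero empirical third/fourth derivatives point to non-normality.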
Usage
d3hCGF_plot(x, alpha = 0.05)
d4hCGF_plot(x, alpha = 0.05)
Arguments
x
Data matrix of size n \times p
alpha
Significance level (default is .05)
Value
d3hCGF_plot returns the plot based on third derivatives.
d4hCGF_plot returns the plot based on fourth derivatives.
Examples
set.seed(1234)
p <- 3
x <- MASS::mvrnorm(500, rep(0, p), diag(p))
d3hCGF_plot(x)
d4hCGF_plot(x)
Graphical plots to assess the univariate normality assumption of data.
Description
Plots the empirical third/fourth derivatives of the cumulant generating function together with a confidence band. Non-normality is indicated either by curves that cross the band or by curves with a steep slope.
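The curves themselves can be sketched in base R. The helper emp_cgf_derivs below is hypothetical and is not the code used by dhCGF_plot1D; it assumes the data are standardized with the sample mean and sd(), and it computes the third and fourth derivatives of the log empirical MGF as the third and fourth cumulants of the exponentially tilted sample. No confidence band is computed.

```r
# Hypothetical helper (not a package function): third and fourth derivatives
# of the empirical CGF of standardized univariate data at a point t.
emp_cgf_derivs <- function(t, x) {
  z <- (x - mean(x)) / sd(x)   # standardize the sample (assumed convention)
  w <- exp(t * z)
  w <- w / sum(w)              # exponentially tilted weights
  m <- sapply(1:4, function(k) sum(w * z^k))   # tilted raw moments m1..m4
  c(T3 = m[3] - 3 * m[1] * m[2] + 2 * m[1]^3,
    T4 = m[4] - 4 * m[1] * m[3] - 3 * m[2]^2 +
      12 * m[1]^2 * m[2] - 6 * m[1]^4)
}

set.seed(123)
x <- rnorm(100)
tt <- seq(-1, 1, by = 0.05)
curves <- t(sapply(tt, emp_cgf_derivs, x = x))   # one row per value of t
matplot(tt, curves, type = "l", xlab = "t", ylab = "derivative")
```

For normal data both curves should stay close to zero and fairly flat over the range of t.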
Usage
dhCGF_plot1D(x, alpha = 0.05, method)
Arguments
x
Univariate data
alpha
Significance level (default is .05)
method
string, "T3" uses the third derivatives,
and "T4" uses the fourth derivatives.
Value
Plots
References
Ghosh S (1996). “A New Graphical Tool to Detect Non-Normality.” Journal of the Royal Statistical Society: Series B (Methodological), 58(4), 691-702. doi:10.1111/j.2517-6161.1996.tb02108.x.
Examples
set.seed(123)
x <- rnorm(100)
dhCGF_plot1D(x, method = "T3")
dhCGF_plot1D(x, method = "T4")
Graphical plots to assess the univariate normality assumption of data.
Description
The score function of a univariate normal distribution is a straight line, so a non-linear graph of the score function estimator is evidence of non-normality.
Outliers are detected using the 2-sigma bands method.
Usage
cox(x, P = NULL, lambda = 0.5, x.dist = NULL)
score_plot1D(x, P = NULL, lambda = 0.5, x.dist = NULL, ori.index = NULL)
Arguments
x
univariate data.
P
vector of weights.
lambda
smoothing parameter, default is 0.5.
x.dist
the minimum distance between two data points in vector x.
ori.index
original index of vector x; the default NULL
means the index is just the order of x.
Details
To avoid singularity of the coefficient matrices in the spline method, points
closer together than x.dist are merged, and the weight of the
representative point is updated by adding the weights of the
discarded points.
Under the null hypothesis, an unbiased estimator of the score function at a
given data point x_k is
\hat{\psi}(x_k) = \dfrac{n - 4}{n - 2} \dfrac{x_k - \bar{X}_{-k}}{S_{-k}^2},
and if a_{k} is the score estimated by the function cox at
the point x_k, then approximately
a_k \in \hat{\psi}(x_k) \pm 2 \sqrt{\widehat{\text{Var}}(\hat{\psi}(x_k))}.
Hence points outside the 2-sigma bands are flagged as outliers.
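The leave-one-out estimator \hat{\psi}(x_k) in the formula above can be computed directly in base R. The sketch below is illustrative only: loo_score is a hypothetical name, S_{-k}^2 is assumed to be var() of the remaining n - 1 points, and the variance term needed for the 2-sigma band is not reproduced because its formula is not given here.

```r
# Hypothetical helper (not a package function): leave-one-out score estimate
# (n - 4) / (n - 2) * (x_k - mean of the rest) / (variance of the rest).
loo_score <- function(x) {
  n <- length(x)
  vapply(seq_len(n), function(k) {
    rest <- x[-k]   # sample without the k-th point
    (n - 4) / (n - 2) * (x[k] - mean(rest)) / var(rest)
  }, numeric(1))
}

set.seed(1)
x <- sort(rnorm(100, 2, 4))
psi_hat <- loo_score(x)
plot(x, psi_hat, xlab = "x", ylab = "Leave-one-out score")
```

For normal data the points should fall close to a straight line, which is the reference against which the spline estimate from cox is compared.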
Value
cox returns the estimate of the score function:
x: The updated univariate data if merging happens.
a: Score value estimated at x.
P: Updated weight (if merging happens).
slt: Index of merged data point (is NULL if x.dist = NULL).
score_plot1D returns score functions together with
2-sigma bands for outlier detection:
plot: plot of estimate score function and its band.
outlier: index of outliers.
References
Ng PT (1994). “Smoothing Spline Score Estimation.” SIAM Journal on Scientific Computing, 15(5), 1003-1025. doi:10.1137/0915061.
Examples
set.seed(1)
x <- rnorm(100, 2, 4)
re <- cox(sort(x))
plot(re$x, re$a, xlab = "x", ylab = "Estimated Score",
main = "Estimator of score function")
abline(0, 1)
set.seed(1)
x <- rnorm(100, 2, 4)
score_plot1D(sort(x))
Linear combinations of distinct derivatives of empirical cumulant generating function (CGF).
Description
A linear combination of the third/fourth derivatives of the CGF is asymptotically a
univariate Gaussian process with mean 0. These functions compute its covariance between two points
t \in \mathbb{R}^p and s \in \mathbb{R}^p.
Only vectors t and s of the form t = t^*1_p
and s = s^*1_p are considered.
Usage
mt3_covLtLs(l, p, bigt = seq(-1, 1, 0.05)/sqrt(p), sTtTs = NULL, seed = 1)
mt4_covLtLs(l, p, bigt = seq(-1, 1, 0.05)/sqrt(p), sTtTs = NULL, seed = 1)
Arguments
l
vector of linear-combination coefficients, of length equal to the number of distinct
derivatives; see l_dhCGF().
p
dimension of the multivariate random vector from which the data are collected.
bigt
array of values of t^* and s^*.
sTtTs
Covariance matrix of the vector of derivatives;
see covTtTs(). Default is NULL,
in which case the algorithm
calls mt3_covTtTs() or
mt4_covTtTs().
seed
Random seed used for the Monte Carlo estimate of the supremum of the univariate Gaussian process obtained from the linear combination.
Value
sLtLs: covariance matrix of the linear combination of distinct derivatives, which is a zero-mean Gaussian process.
m.supLt: Monte-Carlo estimates of supremum of this Gaussian process.
mt3_covLtLs returns values related to the use of third derivatives.
mt4_covLtLs returns values related to the use of fourth derivatives.
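A Monte Carlo estimate of the supremum of a zero-mean Gaussian process on a grid can be sketched as follows. Everything here is an assumption made for illustration: the covariance K is a toy exponential kernel rather than the sLtLs matrix, the supremum is taken over the absolute value of the process, and the number of simulated paths is arbitrary. The package's own procedure may differ.

```r
set.seed(1)
tt <- seq(-1, 1, by = 0.05)

# Toy covariance on the grid (an assumption, NOT the package's sLtLs).
K <- exp(-abs(outer(tt, tt, "-")))

R <- chol(K)    # upper triangular factor, K = t(R) %*% R
n_sim <- 2000   # number of simulated paths (arbitrary choice)

# Each row is one path of the process on the grid, distributed N(0, K).
paths <- matrix(rnorm(n_sim * length(tt)), n_sim) %*% R

# Average over paths of the largest absolute value on the grid.
m_sup <- mean(apply(abs(paths), 1, max))
m_sup
```

Replacing K with a covariance matrix of the linear combination would give an estimate of the same kind as m.supLt, under the assumptions above.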
Examples
bigt <- seq(-1, 1, .5)
p <- 2
# Third derivatives
lT3 <- l_dhCGF(p)[[1]]
l3 <- rep(1/sqrt(lT3), lT3)
mt3_covLtLs(l = l3, p = p, bigt = bigt/sqrt(p), seed = 1)
# Fourth derivatives
lT4 <- l_dhCGF(p)[[2]]
l4 <- rep(1/sqrt(lT4), lT4)
mt4_covLtLs(l = l4, p = p, bigt = bigt/sqrt(p), seed = 1)
Covariance matrix of derivatives of sample cumulant generating function (CGF).
Description
Stacking the third/fourth derivatives of the sample CGF together
gives a vector which (under the normality assumption on the data) is asymptotically
normally distributed with zero mean and a covariance matrix.
More specifically, covTtTs computes the covariance between any two
points of the form t = t^*1_p and s = s^*1_p.
Usage
mt3_covTtTs(bigt, p = 1, pos.matrix = NULL)
mt4_covTtTs(bigt, p = 1, pos.matrix = NULL)
Arguments
bigt
array containing values of t^*.
p
dimension of the multivariate random vector from which the data are collected.
pos.matrix
matrix containing the positions of the
derivatives. Default is NULL, in which case the function calls
mt3_pos() or mt4_pos().
Details
The number of distinct third derivatives is
l_{T_3} = p + 2 \times \begin{pmatrix} p \\ 2 \end{pmatrix} + \begin{pmatrix} p \\ 3 \end{pmatrix}.
The number of distinct fourth derivatives is
l_{T_4} = p + 3 \times \begin{pmatrix} p \\ 2 \end{pmatrix} + 3 \times \begin{pmatrix} p \\ 3 \end{pmatrix} + \begin{pmatrix} p \\ 4 \end{pmatrix}.
For each pair (t^*, s^*), covTtTs returns a covariance
matrix of size l_{T_3} \times l_{T_3} or l_{T_4} \times l_{T_4}.
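The two counts are easy to tabulate. The short sketch below uses hypothetical helper names n_third and n_fourth (not package functions) and checks the formulas against the equivalent multiset counts choose(p + 2, 3) and choose(p + 3, 4).

```r
# Hypothetical helpers: number of distinct third and fourth derivatives.
n_third  <- function(p) p + 2 * choose(p, 2) + choose(p, 3)
n_fourth <- function(p) p + 3 * choose(p, 2) + 3 * choose(p, 3) + choose(p, 4)

p <- 1:6
data.frame(p, l_T3 = n_third(p), l_T4 = n_fourth(p))

# Same counts as the number of index multisets of size 3 or 4 drawn from 1..p.
stopifnot(n_third(p) == choose(p + 2, 3),
          n_fourth(p) == choose(p + 3, 4))
```

These numbers give the side length of each covariance matrix, so the cost grows quickly with p.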
Value
A two-dimensional upper triangular array whose size equals the
length of bigt. Each element contains a covariance matrix of the
sequences of derivatives between two points t = t^* 1_p and
s = s^*1_p.
mt3_covTtTs returns the result for the third derivatives.
mt4_covTtTs returns the result for the fourth derivatives.
Examples
bigt <- seq(-1, 1, .5)
p <- 2
# Third derivatives
mt3_pos.matrix <- mt3_pos(p)
sTsTt3 <- mt3_covTtTs(bigt = bigt, p = p, pos.matrix = mt3_pos.matrix)
dim(sTsTt3)
sTsTt3[1:5, 1:5]
# Fourth derivatives
mt4_pos.matrix <- mt4_pos(p)
sTsTt4 <- mt4_covTtTs(bigt = bigt, p = p, pos.matrix = mt4_pos.matrix)
dim(sTsTt4)
sTsTt4[1:5, 1:5]
Covariance matrix of derivatives of sample moment generating function (MGF).
Description
Stacking the derivatives up to the third/fourth order of the sample MGF
together gives a vector which (under the normality assumption) is asymptotically
multivariate normally distributed
with zero mean and a covariance matrix.
covZtZs calculates the covariance between any two points
t and s in \mathbb{R}^p.
Usage
mt3_covZtZs(t, s, pos.matrix = NULL)
mt4_covZtZs(t, s, pos.matrix = NULL)
Arguments
t, s
vectors of length p.
pos.matrix
matrix containing the positions of the derivatives.
Default is NULL, in which case the function calls
mt3_pos() or mt4_pos().
Value
mt3_covZtZs returns the covariance matrix relating to the use
of third derivatives.
mt4_covZtZs returns the covariance matrix relating to the use of
fourth derivatives. This also contains the information on the
third derivatives given by mt3_covZtZs.
Examples
set.seed(1)
p <- 3
x <- MASS::mvrnorm(100, rep(0, p), diag(p))
t <- rep(0.2, p)
s <- rep(-.3, p)
# Using third derivatives
pos.matrix3 <- mt3_pos(p)
sZtZs3 <- mt3_covZtZs(t, s, pos.matrix = pos.matrix3)
dim(sZtZs3)
sZtZs3[1:5, 1:5]
# Using fourth derivatives
sZtZs4 <- mt4_covZtZs(t, s)
dim(sZtZs4)
sZtZs4[1:5, 1:5]
Calculation of derivatives of empirical cumulant generating function (CGF).
Description
Get the third/fourth derivatives of the sample CGF at a given point.
Usage
d3hCGF(myt, x)
d4hCGF(myt, x)
l_dhCGF(p)
dhCGF1D(t, x)
Arguments
myt, t
numeric vector of length p.
x
data matrix.
p
Dimension.
Details
The estimator of the standardized cumulant generating function is
\log\hat{M}_X(t) = \log \left(\dfrac{1}{n}
\sum_{i = 1}^n \exp(t'S^{-1/2}(X_i - \bar{X})) \right)
and its
k^{th}
order derivatives are defined as
T_k(t) = \dfrac{\partial^k}{
\partial t_{j_1} \partial t_{j_2} \dots \partial t_{j_k}} \log(\hat{M}_X(t)), t \in \mathbb{R}^p,
where t_{j_1}, t_{j_2}, \dots, t_{j_k} are the corresponding components
of the vector t \in \mathbb{R}^p.
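The estimator \log\hat{M}_X(t) can be evaluated in a few lines of base R. This is a sketch under assumptions, not the package code: emp_cgf is a hypothetical name, S is taken to be cov() (divisor n - 1), and S^{-1/2} is the symmetric inverse square root obtained from an eigendecomposition.

```r
# Hypothetical helper (not a package function): standardized empirical CGF.
emp_cgf <- function(tvec, x) {
  x <- as.matrix(x)
  xc <- sweep(x, 2, colMeans(x))         # rows: X_i - X_bar
  e <- eigen(cov(x), symmetric = TRUE)
  s_inv_half <- e$vectors %*%
    diag(1 / sqrt(e$values), nrow = ncol(x)) %*%
    t(e$vectors)                         # symmetric S^{-1/2} (assumed choice)
  z <- xc %*% s_inv_half                 # rows: S^{-1/2} (X_i - X_bar)
  log(mean(exp(drop(z %*% tvec))))
}

set.seed(1)
p <- 3
x <- matrix(rnorm(100 * p), 100, p)
emp_cgf(rep(0.2, p), x)   # for normal data, close to t't / 2
```

The derivatives T_k(t) are the partial derivatives of this function with respect to the components of t.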
Value
d3hCGF returns the sequence of third derivatives of the
empirical CGF, ordered by the indices j_1 \leq j_2 \leq j_3 \leq p.
d4hCGF returns the sequence of fourth derivatives of the empirical
CGF, ordered by the indices j_1 \leq j_2 \leq j_3 \leq j_4 \leq p.
l_dhCGF returns the numbers of distinct third and
fourth derivatives.
dhCGF1D returns the third/fourth derivatives of the univariate
empirical CGF, which are d3hCGF and d4hCGF when p = 1.
Examples
p <- 3
# Number of distinct derivatives
l_dhCGF(p)
set.seed(1)
x <- MASS::mvrnorm(100, rep(0, p), diag(p))
myt <- rep(.2, p)
d3hCGF(myt = myt, x = x)
d4hCGF(myt = myt, x = x)
# Univariate data
set.seed(1)
x <- rnorm(100)
t <- .3
dhCGF1D(t, x)
Moment generating functions (MGF) of standard normal distribution.
Description
Get the polynomial term in the expression of the derivatives of the moment
generating function of N_p(0, I_p) with
respect to given components and their exponents, up to eighth order.
Usage
dMGF(tab, t, coef = TRUE)
Arguments
tab
a data frame whose first column contains the indices of the components
of a multivariate random vector \bold{X}, and whose second column contains the
order of the derivative with respect to each of those components.
t
vector in \mathbb{R}^p.
coef
logical; TRUE or FALSE, to
obtain respectively only the polynomial term or the whole expression, i.e. the
polynomial term multiplied by the exponential term \exp(.5 t't).
Details
For a standard multivariate normal random vector Y \sim N_p(0, I_p),
\mathbb{E}\left(Y_1^{k_1} \dots Y_p^{k_p} \exp(t'Y)\right) =
\dfrac{\partial^{k_1 + \dots + k_p}}{
\partial t_1^{k_1} \dots \partial t_p^{k_p}} \exp(t't/2) =
\mu^{(k_1)} (t_1) \dots \mu^{(k_p)}(t_p) \exp(t't/2).
For example,
\mathbb{E}Y_2^4 \exp(t'Y) = \dfrac{\partial^4}{\partial t_2^4} \exp(t't/2)
= \mu^{(4)}(t_2) \exp(t't/2).
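The polynomial factor \mu^{(k)}(t) can be generated with a simple recurrence. The sketch below is not the implementation of dMGF; mu_poly is a hypothetical name, and the recurrence \mu^{(k+1)}(t) = t \mu^{(k)}(t) + k \mu^{(k-1)}(t), with \mu^{(0)} = 1 and \mu^{(1)} = t, follows from differentiating \mu^{(k)}(t) \exp(t^2/2) once more.

```r
# Hypothetical helper (not a package function): polynomial factor of the
# k-th derivative of exp(t^2 / 2), evaluated at t.
mu_poly <- function(k, t) {
  if (k == 0) return(1)
  prev <- 1   # mu^(0)
  cur <- t    # mu^(1)
  for (j in seq_len(k - 1)) {
    nxt <- t * cur + j * prev   # mu^(j + 1) = t * mu^(j) + j * mu^(j - 1)
    prev <- cur
    cur <- nxt
  }
  cur
}

t2 <- 0.2
mu_poly(4, t2)   # t^4 + 6 t^2 + 3 at t = 0.2, i.e. 3.2416

# Whole expression for the example above, with t = rep(0.2, 7):
tvec <- rep(0.2, 7)
mu_poly(4, tvec[2]) * exp(sum(tvec^2) / 2)
```

The fourth-order polynomial t^4 + 6 t^2 + 3 is the factor \mu^{(4)}(t_2) in the example above.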
Value
Value of derivatives.
Examples
# Calculation of the example above
t <- rep(.2, 7)
tab <- data.frame(j = 2, exponent = 4)
dMGF(tab, t = t)
dMGF(tab, t = t, coef = FALSE)
Get parameters for plots of derivatives of the multivariate CGF to assess the normality assumption.
Description
Obtain necessary parameters to build a graphical test using the third/fourth derivatives of cumulant generating function.
Usage
mt3_get_param(p, bigt = seq(-1, 1, by = 0.05)/sqrt(p), l = NULL)
mt4_get_param(p, bigt = seq(-1, 1, by = 0.05)/sqrt(p), l = NULL)
Arguments
p
Dimension.
bigt
Array containing value of t^*.
l
Coefficients of the linear combination of the vector of distinct third/fourth derivatives; the default is their average.
Value
p: Dimension.
lT: Number of distinct third/fourth order derivatives.
sTtTs: Two-dimensional array, each element of which contains the covariance matrix of the vector of derivatives, computed by mt3_covTtTs() or mt4_covTtTs().
l.sTtTs: Covariance matrix of the linear combination of distinct derivatives, computed by mt3_covLtLs() or mt4_covLtLs().
m.supLT: The Monte Carlo estimate of the expected value of the supremum of the Gaussian process; see covLtLs().
mt3_get_param returns necessary parameters for the 2D plot
relying on third derivatives.
mt4_get_param returns necessary parameters for the 2D plot
relying on fourth derivatives.
See Also
covZtZs() ,
covLtLs() , covTtTs()
Examples
p <- 2
mt3 <- mt3_get_param(p, bigt = seq(-1, 1, .5)/sqrt(p))
names(mt3)
mt4 <- mt4_get_param(p, bigt = seq(-1, 1, .5)/sqrt(p))
names(mt4)
Best Linear Transformations
Description
Uses a gradient descent algorithm to find the maximum of the squared sample skewness, of the sample kurtosis, or of their average, over all univariate linear transformations of the multivariate data.
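The optimization problem can be illustrated with a stand-in optimizer. The sketch below deliberately swaps in stats::optim with method "BFGS" in place of the package's projection algorithm, and it treats only the squared-skewness criterion. The objective skew2 is a hypothetical helper that uses the 1/(n - 1) skewness convention, and the skewed first column is artificial example data.

```r
set.seed(1)
x <- matrix(rnorm(200 * 2), 200, 2)
x[, 1] <- x[, 1]^2   # artificial example data: make one coordinate skewed

# Hypothetical objective: squared sample skewness of the projection x %*% l.
skew2 <- function(l) {
  y <- drop(x %*% l)
  z <- (y - mean(y)) / sd(y)
  (sum(z^3) / (length(y) - 1))^2
}

l0 <- rep(1, ncol(x))   # same starting point as the default l0
# fnscale = -1 turns optim's minimization into maximization.
fit <- optim(l0, skew2, method = "BFGS", control = list(fnscale = -1))

fit$value                       # largest squared skewness found
fit$par / sqrt(sum(fit$par^2))  # direction, rescaled to unit length
```

The objective does not change when l is rescaled, so only the direction of the solution matters, which is why the result is reported at unit length.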
Usage
linear_transform(
x,
l0 = rep(1, ncol(x)),
method = "both",
epsilon = 1e-10,
iter = 5000,
stepsize = 0.001
)
Arguments
x
multivariate data matrix.
l0
starting point for projection algorithm,
default is rep(1, ncol(x)).
method
character string,
one of c("skewness", "kurtosis", "both").
epsilon
bound on the error of the optimal solution, default is 1e-10.
iter
number of iterations of the projection algorithm,
default is 5000.
stepsize
gradient descent stepsize, default is .001.
Value
max_result: The maximum value after linear transformation.
x_uni: Univariate data after transformation.
vector_k: Vector of the "best" linear transformation.
error: Error of projection algorithm.
iteration: Number of iterations.
Examples
set.seed(1)
x <- MASS::mvrnorm(100, mu = rep(0, 2), diag(2))
linear_transform(x, method = "skewness")$max_result
linear_transform(x, method = "kurtosis")$max_result
linear_transform(x, method = "both")$max_result
From derivatives of MGF to derivatives of CGF.
Description
A Taylor expansion implies that the vector of derivatives of
\log(\hat{M}_X(t)) can be approximated
by a linear combination of the vector of derivatives of \hat{M}_X(t).
matrix_A returns the coefficients of the corresponding
linear combinations.
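For intuition, the univariate third-order case can be written out. The notation below (g, Z, A(t)) is introduced only for this illustration, and identifying A(t) with the matrix returned by mt3_matrix_A is an assumption rather than something stated in this manual.

```latex
% p = 1, third order: Z(t) = (M, M', M'', M''')' and T_3(t) = g(\hat{Z}(t))
g(Z) = \frac{M'''}{M} - \frac{3 M' M''}{M^2} + \frac{2 (M')^3}{M^3},
\qquad
g(\hat{Z}) \approx g(Z) + A(t)\, (\hat{Z} - Z),
\qquad
A(t) = \frac{\partial g}{\partial Z'} \bigg|_{Z = Z(t)} .
```

Here g expresses the third derivative of \log M through M and its first three derivatives, and the first-order expansion makes g(\hat{Z}) approximately linear in \hat{Z}.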
Usage
mt3_matrix_A(t)
mt4_matrix_A(t)
Arguments
t
vector in \mathbb{R}^p
Value
mt3_matrix_A returns coefficient matrix relating to the use
of third derivatives.
mt4_matrix_A returns coefficient matrix relating to the
use of fourth derivatives.
Examples
p <- 3
t <- rep(.2, p)
A3 <- mt3_matrix_A(t)
dim(A3)
A3[1:5, 1:5]
A4 <- mt4_matrix_A(t)
dim(A4)
A4[1:5, 1:5]
Derivatives of empirical moment generating function (MGF).
Description
Given dimension p, returns a data frame containing the positions of
all derivatives of the
estimator of the moment generating function \hat{M}_X(t),
up to third/fourth order.
Usage
mt3_rev_pos(j1, j2, j3, p)
mt3_pos(p)
mt4_pos(p)
Arguments
j1
Index of the first variable
j2
Index of the second variable, should be at least j1
j3
Index of the third variable, should be at least j2
p
Dimension
Details
The estimator of the multivariate moment generating function is
\hat{M}_X(t) = \dfrac{1}{n} \sum_{i = 1}^n \exp(t'X_i).
The chain containing all derivatives up to the third order is
Z = \bigg(\hat{M}, \hat{M}^{001}, \dots, \hat{M}^{00p},
\hat{M}^{011}, \hat{M}^{012}, \dots, \hat{M}^{0pp}, \hat{M}^{111},
\hat{M}^{112}, \dots, \hat{M}^{ppp}\bigg)'
where
\hat{M} = \hat{M}^{000}(t)= \hat{M}_X(t),
\hat{M}^{j_1j_2j_3}(t) =
\dfrac{\partial^k}{\partial t_{j_1} \partial t_{j_2} \partial t_{j_3}} \hat{M}(t),
k is the number of indices among j_1, j_2, j_3 different from 0, and differentiation is taken only with respect to the components with a non-zero index.
Similar notation applies when fourth derivatives are used.
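The ordering of the chain can be reproduced by listing all index triples with 0 \leq j_1 \leq j_2 \leq j_3 \leq p in lexicographic order. The sketch below is an independent enumeration in base R; the positions it produces are 1-based positions in this listing and are not guaranteed to match the values returned by mt3_pos or mt3_rev_pos.

```r
p <- 3

# All triples (j1, j2, j3) in 0..p, with j1 varying slowest and j3 fastest.
idx <- expand.grid(j3 = 0:p, j2 = 0:p, j1 = 0:p)[, c("j1", "j2", "j3")]

# Keep the non-decreasing triples; index 0 means "no differentiation".
idx <- idx[idx$j1 <= idx$j2 & idx$j2 <= idx$j3, ]
rownames(idx) <- NULL

nrow(idx)    # number of entries in the chain, choose(p + 3, 3)
head(idx)    # starts with (0,0,0), (0,0,1), ..., as in the chain Z

# Position of the derivative with indices (1, 2, 2) in this enumeration.
which(idx$j1 == 1 & idx$j2 == 2 & idx$j3 == 2)
```

The first row corresponds to \hat{M} itself, the rows with two zeros to first derivatives, and so on up to \hat{M}^{ppp}.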
Value
mt3_rev_pos returns the position of the given derivative
in the chain of all derivatives up to third order.
mt3_pos returns an array containing all positions with respect
to the indices j_1, j_2, j_3.
mt4_pos returns an array containing all positions with respect to
the indices j_1, j_2, j_3, j_4.
Examples
mt3_rev_pos(1, 2, 2, p = 3)
p <- 3
mt3_pos(p)
mt4_pos(p)
Sample skewness and Sample Kurtosis.
Description
Sample skewness and Sample Kurtosis.
Usage
kurtosis(x)
skewness(x)
Arguments
x
univariate data sample
Details
Sample kurtosis is
\hat{\kappa}_4 =
\dfrac{1}{n-1} \sum_{i = 1}^n \left(\dfrac{X_i - \bar{X}}{S}\right)^4.
Sample skewness is
\hat{\kappa}_3 =
\dfrac{1}{n-1} \sum_{i = 1}^n \left(\dfrac{X_i - \bar{X}}{S}\right)^3.
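The two formulas translate directly into base R. The functions below are a sketch with hypothetical names, chosen so as not to mask the package's kurtosis and skewness, and they assume that S is the sample standard deviation as computed by sd().

```r
# Hypothetical re-implementations of the formulas above (S assumed to be sd()).
sample_skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  sum(z^3) / (length(x) - 1)
}
sample_kurtosis <- function(x) {
  z <- (x - mean(x)) / sd(x)
  sum(z^4) / (length(x) - 1)
}

set.seed(123)
y <- rnorm(100)
sample_skewness(y)   # near 0 for normal data
sample_kurtosis(y)   # near 3 for normal data
```

Note that the kurtosis defined this way is not centred at zero: its reference value for normal data is about 3.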
Value
kurtosis returns sample kurtosis.
skewness returns sample skewness.
Examples
set.seed(123)
y <- rnorm(100)
kurtosis(y)
set.seed(123)
x <- rnorm(100)
skewness(x)