RcppDynProg
Description
Rcpp dynamic programming solutions for partitioning and machine learning problems. Includes out of sample fitting applications. Also supplies additional custom coders for the vtreat package. Please see https://github.com/WinVector/RcppDynProg for details.
Author(s)
John Mount
See Also
Useful links:
Report bugs at https://github.com/WinVector/RcppDynProg/issues
Build all partitions into intervals.
Description
Build all partitions into intervals.
Usage
all_partitions(n, kmax = n)
Arguments
n
integer, sequence lenght to choose from.
kmax
int, maximum number of segments in solution.
Value
list of all partitions.
Examples
all_partitions(4, 2)
const_cost
Description
Calculate out of sample total square error cost of using mean of points to estimate other points in interval. Zero indexed.
Usage
const_cost(y, w, min_seg, i, j)
Arguments
y
NumericVector, values to group in order.
w
NumericVector, weights.
min_seg
positive integer, minimum segment size (>=1).
i
integer, first index (inclusive).
j
integer, j>=i last index (inclusive);
Value
scalar, const cost of [i,...,j] interval (inclusive).
Examples
const_cost(c(1, 1, 2, 2), c(1, 1, 1, 1), 1, 0, 3)
const_cost_logistic
Description
Calculate logistic cost of using mean of points to estimate other points in interval. Zero indexed.
Usage
const_cost_logistic(y, w, min_seg, i, j)
Arguments
y
NumericVector, 0/1 values to group in order (should be in interval [0,1]).
w
NumericVector, weights (should be positive).
min_seg
positive integer, minimum segment size (>=1).
i
integer, first index (inclusive).
j
integer, j>=i last index (inclusive);
Value
scalar, const cost of [i,...,j] interval (inclusive).
Examples
const_cost_logistic(c(0.1, 0.1, 0.2, 0.2), c(1, 1, 1, 1), 1, 0, 3)
const_costs
Description
Built matrix of total out of sample interval square error costs for held-out means. One indexed.
Usage
const_costs(y, w, min_seg, indices)
Arguments
y
NumericVector, values to group in order.
w
NumericVector, weights.
min_seg
positive integer, minimum segment size (>=1).
indices
IntegerVector, order list of indices to pair.
Value
xcosts NumericMatix, for j>=i xcosts(i,j) is the cost of partition element [i,...,j] (inclusive).
Examples
const_costs(c(1, 1, 2, 2), c(1, 1, 1, 1), 1, 1:4)
const_costs_logistic
Description
Built matrix of interval logistic costs for held-out means. One indexed.
Usage
const_costs_logistic(y, w, min_seg, indices)
Arguments
y
NumericVector, 0/1 values to group in order (should be in interval [0,1]).
w
NumericVector, weights (should be positive).
min_seg
positive integer, minimum segment size (>=1).
indices
IntegerVector, order list of indices to pair.
Value
xcosts NumericMatix, for j>=i xcosts(i,j) is the cost of partition element [i,...,j] (inclusive).
Examples
const_costs_logistic(c(0.1, 0.1, 0.2, 0.2), c(1, 1, 1, 1), 1, 1:4)
lin_cost
Description
Calculate cost of using linear model fit on points to estimate other points in the interval. Zero indexed.
Usage
lin_cost(x, y, w, min_seg, i, j)
Arguments
x
NumericVector, x-coords of values to group.
y
NumericVector, values to group in order.
w
NumericVector, weights.
min_seg
positive integer, minimum segment size (>=1).
i
integer, first index (inclusive).
j
integer, j>=i last index (inclusive);
Value
scalar, linear cost of [i,...,j] interval (inclusive).
Examples
lin_cost(c(1, 2, 3, 4), c(1, 2, 2, 1), c(1, 1, 1, 1), 1, 0, 3)
lin_cost_logistic logistic deviance pricing
Description
Calculate deviance cost of using logistic model fit on points to estimate other points in the interval. Fits are evaluated in-sample. Zero indexed.
Usage
lin_cost_logistic(x, y, w, min_seg, i, j)
Arguments
x
NumericVector, x-coords of values to group.
y
NumericVector, values to group in order (should be in interval [0,1]).
w
NumericVector, weights (positive).
min_seg
positive integer, minimum segment size (>=1).
i
integer, first index (inclusive).
j
integer, j>=i last index (inclusive);
Value
scalar, linear cost of [i,...,j] interval (inclusive).
Examples
lin_cost_logistic(c(1, 2, 3, 4, 5, 6, 7), c(0, 0, 1, 0, 1, 1, 0), c(1, 1, 1, 1, 1, 1, 1), 3, 0, 6)
lin_costs
Description
Built matrix of interval costs for held-out linear models. One indexed.
Usage
lin_costs(x, y, w, min_seg, indices)
Arguments
x
NumericVector, x-coords of values to group.
y
NumericVector, values to group in order.
w
NumericVector, weights.
min_seg
positive integer, minimum segment size (>=1).
indices
IntegerVector, ordered list of indices to pair.
Value
xcosts NumericMatix, for j>=i xcosts(i,j) is the cost of partition element [i,...,j] (inclusive).
Examples
lin_costs(c(1, 2, 3, 4), c(1, 2, 2, 1), c(1, 1, 1, 1), 1, 1:4)
lin_costs_logistic deviance costs.
Description
Built matrix of interval deviance costs for held-out logistic models. Fits are evaluated in-sample. One indexed.
Usage
lin_costs_logistic(x, y, w, min_seg, indices)
Arguments
x
NumericVector, x-coords of values to group.
y
NumericVector, values to group in order (should be in interval [0,1]).
w
NumericVector, weights (should be positive).
min_seg
positive integer, minimum segment size (>=1).
indices
IntegerVector, ordered list of indices to pair.
Value
xcosts NumericMatix, for j>=i xcosts(i,j) is the cost of partition element [i,...,j] (inclusive).
Examples
lin_costs_logistic(c(1, 2, 3, 4, 5, 6, 7), c(0, 0, 1, 0, 1, 1, 0), c(1, 1, 1, 1, 1, 1, 1), 3, 1:7)
In sample logistic predictions (in link space).
Description
logistic regression predictions. Zero indexed.
Usage
logistic_fits(x, y, w, i, j)
Arguments
x
NumericVector, expanatory variable.
y
NumericVector, 0/1 values to fit.
w
NumericVector, weights (required, positive).
i
integer, first index (inclusive).
j
integer, last index (inclusive).
Value
vector of predictions for interval.
Examples
set.seed(5)
d <- data.frame(x = rnorm(10))
d$y <- d$x + rnorm(nrow(d))>0
weights <- runif(nrow(d))
m <- glm(y~x, data = d, family = binomial, weights = weights)
d$pred1 <- predict(m, newdata = d, type = "link")
d$pred2 <- logistic_fits(d$x, d$y, weights, 0, nrow(d)-1)
d <- d[order(d$x), , drop = FALSE]
print(d)
logistic_fit
Description
Calculate y ~ sigmoid(a + b x) using iteratively re-weighted least squares. Zero indexed.
Usage
logistic_solve1(x, y, w, initial_link, i, j, skip)
Arguments
x
NumericVector, expanatory variable.
y
NumericVector, 0/1 values to fit.
w
NumericVector, weights (required, positive).
initial_link
initial link estimates (required, all zeroes is a good start).
i
integer, first index (inclusive).
j
integer, last index (inclusive).
skip
integer, index to skip (-1 to not skip).
Value
vector of a and b.
Examples
set.seed(5)
d <- data.frame(
x = rnorm(10),
y = sample(c(0,1), 10, replace = TRUE)
)
weights <- runif(nrow(d))
m <- glm(y~x, data = d, family = binomial, weights = weights)
coef(m)
logistic_solve1(d$x, d$y, weights, rep(0.0, nrow(d)), 0, nrow(d)-1, -1)
Piecewise constant fit.
Description
vtreat custom coder based on RcppDynProg::solve_for_partition().
Usage
piecewise_constant(varName, x, y, w = NULL)
Arguments
varName
character, name of variable to work on.
x
numeric, input values.
y
numeric, values to estimate.
w
numeric, weights.
Examples
piecewise_constant("x", 1:8, c(-1, -1, -1, -1, 1, 1, 1, 1))
Piecewise constant fit coder factory.
Description
Build a piecewise constant fit coder with some parameters bound in.
Usage
piecewise_constant_coder(
penalty = 1,
min_n_to_chunk = 1000,
min_seg = 10,
max_k = 1000
)
Arguments
penalty
per-segment cost penalty.
min_n_to_chunk
minimum n to subdivied problem.
min_seg
positive integer, minimum segment size.
max_k
maximum segments to divide into.
Value
a vtreat coder
Examples
coder <- piecewise_constant_coder(min_seg = 1)
coder("x", 1:8, c(-1, -1, -1, -1, 1, 1, 1, 1))
Piecewise linear fit.
Description
vtreat custom coder based on RcppDynProg::solve_for_partition().
Usage
piecewise_linear(varName, x, y, w = NULL)
Arguments
varName
character, name of variable to work on.
x
numeric, input values.
y
numeric, values to estimate.
w
numeric, weights.
Examples
piecewise_linear("x", 1:8, c(1, 2, 3, 4, 4, 3, 2, 1))
Piecewise linear fit coder factory.
Description
Build a piecewise linear fit coder with some parameters bound in.
Usage
piecewise_linear_coder(
penalty = 1,
min_n_to_chunk = 1000,
min_seg = 10,
max_k = 1000
)
Arguments
penalty
per-segment cost penalty.
min_n_to_chunk
minimum n to subdivied problem.
min_seg
positive integer, minimum segment size.
max_k
maximum segments to divide into.
Value
a vtreat coder
Examples
coder <- piecewise_linear_coder(min_seg = 1)
coder("x", 1:8, c(1, 2, 3, 4, 4, 3, 2, 1))
compute the price of a partition solution (and check is valid).
Description
compute the price of a partition solution (and check is valid).
Usage
score_solution(x, solution)
Arguments
x
NumericMatix, for j>=i x(i,j) is the cost of partition element [i,...,j] (inclusive).
solution
vector of indices
Value
price
Examples
x <- matrix(c(1,1,5,1,1,0,5,0,1), nrow=3)
s <- c(1, 2, 4)
score_solution(x, s)
Solve for a piecewise linear partiton.
Description
Solve for a good set of right-exclusive x-cuts such that the
overall graph of y~x is well-approximated by a piecewise linear
function. Solution is a ready for use with
with base::findInterval() and stats::approx()
(demonstrated in the examples).
Usage
solve_for_partition(
x,
y,
...,
w = NULL,
penalty = 0,
min_n_to_chunk = 1000,
min_seg = 1,
max_k = length(x)
)
Arguments
x
numeric, input variable (no NAs).
y
numeric, result variable (no NAs, same length as x).
...
not used, force later arguments by name.
w
numeric, weights (no NAs, positive, same length as x).
penalty
per-segment cost penalty.
min_n_to_chunk
minimum n to subdivied problem.
min_seg
positive integer, minimum segment size.
max_k
maximum segments to divide into.
Value
a data frame appropriate for stats::approx().
Examples
# example data
d <- data.frame(
x = 1:8,
y = c(1, 2, 3, 4, 4, 3, 2, 1))
# solve for break points
soln <- solve_for_partition(d$x, d$y)
# show solution
print(soln)
# label each point
d$group <- base::findInterval(
d$x,
soln$x[soln$what=='left'])
# apply piecewise approximation
d$estimate <- stats::approx(
soln$x,
soln$pred,
xout = d$x,
method = 'linear',
rule = 2)$y
# show result
print(d)
Solve for a piecewise constant partiton.
Description
Solve for a good set of right-exclusive x-cuts such that the
overall graph of y~x is well-approximated by a piecewise linear
function. Solution is a ready for use with
with base::findInterval() and stats::approx()
(demonstrated in the examples).
Usage
solve_for_partitionc(
x,
y,
...,
w = NULL,
penalty = 0,
min_n_to_chunk = 1000,
min_seg = 1,
max_k = length(x)
)
Arguments
x
numeric, input variable (no NAs).
y
numeric, result variable (no NAs, same length as x).
...
not used, force later arguments by name.
w
numeric, weights (no NAs, positive, same length as x).
penalty
per-segment cost penalty.
min_n_to_chunk
minimum n to subdivied problem.
min_seg
positive integer, minimum segment size.
max_k
maximum segments to divide into.
Value
a data frame appropriate for stats::approx().
Examples
# example data
d <- data.frame(
x = 1:8,
y = c(-1, -1, -1, -1, 1, 1, 1, 1))
# solve for break points
soln <- solve_for_partitionc(d$x, d$y)
# show solution
print(soln)
# label each point
d$group <- base::findInterval(
d$x,
soln$x[soln$what=='left'])
# apply piecewise approximation
d$estimate <- stats::approx(
soln$x,
soln$pred,
xout = d$x,
method = 'constant',
rule = 2)$y
# show result
print(d)
solve_interval_partition interval partition problem.
Description
Solve a for a minimal cost partition of the integers [1,...,nrow(x)] problem where for j>=i x(i,j). is the cost of choosing the partition element [i,...,j]. Returned solution is an ordered vector v of length k<=kmax where: v[1]==1, v[k]==nrow(x)+1, and the partition is of the form [v[i], v[i+1]) (intervals open on the right).
Usage
solve_interval_partition(x, kmax)
Arguments
x
square NumericMatix, for j>=i x(i,j) is the cost of partition element [i,...,j] (inclusive).
kmax
int, maximum number of segments in solution.
Value
dynamic program solution.
Examples
costs <- matrix(c(1.5, NA ,NA ,1 ,0 , NA, 5, -1, 1), nrow = 3)
solve_interval_partition(costs, nrow(costs))
solve_interval_partition (R version)
Description
Solve a for a minimal cost partition of the integers [1,...,nrow(x)] problem where for j>=i x(i,j). is the cost of choosing the partition element [i,...,j]. Returned solution is an ordered vector v of length k where: v[1]==1, v[k]==nrow(x)+1, and the partition is of the form [v[i], v[i+1]) (intervals open on the right).
Usage
solve_interval_partition_R(x, kmax)
Arguments
x
NumericMatix, for j>=i x(i,j) is the cost of partition element [i,...,j] (inclusive).
kmax
int, maximum number of steps in solution.
Value
dynamic program solution.
Examples
x <- matrix(c(1,1,5,1,1,0,5,0,1), nrow=3)
k <- 3
solve_interval_partition_R(x, k)
solve_interval_partition(x, k)
solve_interval_partition interval partition problem with a bound on number of steps.
Description
Solve a for a minimal cost partition of the integers [1,...,nrow(x)] problem where for j>=i x(i,j). is the cost of choosing the partition element [i,...,j]. Returned solution is an ordered vector v of length k<=kmax where: v[1]==1, v[k]==nrow(x)+1, and the partition is of the form [v[i], v[i+1]) (intervals open on the right).
Usage
solve_interval_partition_k(x, kmax)
Arguments
x
square NumericMatix, for j>=i x(i,j) is the cost of partition element [i,...,j] (inclusive).
kmax
int, maximum number of segments in solution.
Value
dynamic program solution.
Examples
costs <- matrix(c(1.5, NA ,NA ,1 ,0 , NA, 5, -1, 1), nrow = 3)
solve_interval_partition(costs, nrow(costs))
solve_interval_partition interval partition problem, no boun on the number of steps.
Description
Not working yet.
Usage
solve_interval_partition_no_k(x)
Arguments
x
square NumericMatix, for j>=i x(i,j) is the cost of partition element [i,...,j] (inclusive).
Details
Solve a for a minimal cost partition of the integers [1,...,nrow(x)] problem where for j>=i x(i,j). is the cost of choosing the partition element [i,...,j]. Returned solution is an ordered vector v of length k where: v[1]==1, v[k]==nrow(x)+1, and the partition is of the form [v[i], v[i+1]) (intervals open on the right).
Value
dynamic program solution.
Examples
costs <- matrix(c(1.5, NA ,NA ,1 ,0 , NA, 5, -1, 1), nrow = 3)
solve_interval_partition(costs, nrow(costs))
Summarize data (for debugging).
Description
Summarize data (for debugging).
Usage
summarize_input(x, y, w, i, j, skip)
Arguments
x
NumericVector, expanatory variable.
y
NumericVector, 0/1 values to fit.
w
NumericVector, weights (required, positive).
i
integer, first index (inclusive).
j
integer, last index (inclusive).
skip
integer, index to skip (-1 to not skip).
Value
summary list
Examples
costs <- matrix(c(1.5, NA ,NA ,1 ,0 , NA, 5, -1, 1), nrow = 3)
solve_interval_partition(costs, nrow(costs))
xlin_fits
Description
Calculate out of sample linear fit predictions using regularization. Zero indexed.
Usage
xlin_fits(x, y, w, i, j)
Arguments
x
NumericVector, explanatory variable (length>=2).
y
NumericVector, values fit.
w
NumericVector, weights (positive).
i
integer, first index (inclusive).
j
integer, j>=i+2 last index (inclusive);
Value
vector of predictions.
Examples
xlin_fits(c(1, 2, 3, 4), c(1, 2, 2, 1), c(1, 1, 1, 1), 0, 3)
xlin_fits_R
Description
Calculate out of sample linear fit predictions.
Usage
xlin_fits_R(x, y, w)
Arguments
x
NumericVector, x-coords of values to group (length>=2).
y
NumericVector, values to group in order.
w
NumericVector, weights (positive).
Value
vector of predictions.
Examples
xlin_fits_R(c(1, 2, 3, 4), c(1, 2, 2, 1), c(1, 1, 1, 1))
xlin_fits_R
Description
Calculate out of sample linear fit predictions.
Usage
xlin_fits_V(x, y, w)
Arguments
x
NumericVector, x-coords of values to group (length>=2).
y
NumericVector, values to group in order.
w
NumericVector, weights (positive).
Value
vector of predictions.
Examples
xlin_fits_V(c(1, 2, 3, 4), c(1, 2, 2, 1), c(1, 1, 1, 1))
xlin_fits_R
Description
Calculate out of sample linear fit predictions.
Usage
xlin_fits_lm(x, y, w)
Arguments
x
NumericVector, x-coords of values to group (length>=2).
y
NumericVector, values to group in order.
w
NumericVector, weights (positive).
Value
vector of predictions.
Examples
xlin_fits_lm(c(1, 2, 3, 4), c(1, 2, 2, 1), c(1, 1, 1, 1))
xlin_pfits
Description
Calculate out of sample linear fit predictions using pseudo-inverse. Please see: https://win-vector.com/2019/01/08/a-beautiful-2-by-2-matrix-identity/. Zero indexed.
Usage
xlin_pfits(x, y, w, i, j)
Arguments
x
NumericVector, explanatory variable (length>=2).
y
NumericVector, values to fit.
w
NumericVector, weights (positive).
i
integer, first index (inclusive).
j
integer, j>=i+2 last index (inclusive);
Value
vector of predictions.
Examples
xlin_pfits(c(1, 2, 3, 4), c(1, 2, 2, 1), c(1, 1, 1, 1), 0, 3)
Out of sample logistic predictions (in link space).
Description
1-hold out logistic regression predections. Zero indexed.
Usage
xlogistic_fits(x, y, w, i, j)
Arguments
x
NumericVector, expanatory variable.
y
NumericVector, 0/1 values to fit.
w
NumericVector, weights (required, positive).
i
integer, first index (inclusive).
j
integer, last index (inclusive).
Value
vector of predictions for interval.
Examples
set.seed(5)
d <- data.frame(x = rnorm(10))
d$y <- d$x + rnorm(nrow(d))>0
weights <- runif(nrow(d))
m <- glm(y~x, data = d, family = binomial, weights = weights)
d$pred1 <- predict(m, newdata = d, type = "link")
d$pred2 <- xlogistic_fits(d$x, d$y, weights, 0, nrow(d)-1)
d <- d[order(d$x), , drop = FALSE]
print(d)