DataFusionGDM: Getting Started

DataFusionGDM Team

Overview

DataFusionGDM provides tools to simulate genetic distance matrices (GDM), compare/alignment of distance spaces via MDS and Procrustes, and evaluate imputation under structured missingness (BESMI).

Installation

 # Install from GitHub
 if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
remotes::install_github("jiashuaiz/DataFusion-GDM")

Simulation and visualization

 library(DataFusionGDM)
res <- run_genetic_scenario("default", n_pops = 30, seed = 2025)
 #> 
 #> Raw distances before transformation :
 #> Dimensions: 30 30 
 #> Is symmetric: TRUE 
 #> Range of distances: 0 30.34014 
 #> Mean distance: 15.88792 
 #> Diagonal values - min: 0 max: 0 
 #> Quantiles: 0%: 0 25%: 7.938344 50%: 16.54102 75%: 23.1751 100%: 30.34014 
 #> 
 #> Final genetic distances after transformation :
 #> Dimensions: 30 30 
 #> Is symmetric: TRUE 
 #> Range of distances: 0 1 
 #> Mean distance: 0.7895569 
 #> Diagonal values - min: 0 max: 0 
 #> Quantiles: 0%: 0 25%: 0.7268217 50%: 0.8404666 75%: 0.9433305 100%: 1
 # Display MDS plot (heatmap requires ComplexHeatmap, not shown here)
res$plots$mds()

MDS + Procrustes

 # create two related matrices from the simulated GDM
G <- res$results$distance_matrix
A <- G + matrix(rnorm(length(G), 0, 0.02), nrow = nrow(G)); diag(A) <- 0
B <- G + matrix(rnorm(length(G), 0.03, 0.02), nrow = nrow(G)); diag(B) <- 0
 
mds <- perform_mds(A, B)
Yt <- apply_procrustes(mds$X, mds$Y, mds$Y)
B_cal <- coords_to_distances(Yt)
 mean((A - B)^2)
 #> [1] 0.001681216
 mean((A - B_cal)^2)
 #> [1] 0.01596116

BESMI (single dataset)

 # Prepare a masked dataset in-memory
mask <- matrix(FALSE, nrow = nrow(G), ncol = ncol(G))
sel <- seq_len(min(5, nrow(G)))
mask[sel, sel] <- TRUE
M_input <- G; M_input[mask] <- NA
 
 # Impute
impt <- besmi_iterative_imputation(M_input, M_mask = mask, M_real = G, max_iterations = 3)
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Number of logged events: 10
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Number of logged events: 15
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
 #> fold
 #> Warning: Number of logged events: 20
 str(impt$metrics)
 #> 'data.frame': 3 obs. of 9 variables:
 #> $ k : logi NA NA NA
 #> $ bs : logi NA NA NA
 #> $ iteration : num 1 2 3
 #> $ imputation_dis : num 0.316 0.333 0.317
 #> $ propagation_dis: num 0.0668 0.0619 0.0598
 #> $ runtime : num NA NA NA
 #> $ improvement_pct: num NA NA NA
 #> $ converged : logi FALSE FALSE FALSE
 #> $ averaged : logi FALSE TRUE TRUE

Reproducible pipelines

See inst/examples for fuller pipelines that write results to disk in a project context.

AltStyle によって変換されたページ (->オリジナル) /