Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs
A value or the magrittr placeholder.
rhs
A function call using the magrittr semantics.
Value
The result of calling rhs(lhs).
callRelatedness
Description
A function that takes PMR observations, and (given a prior distribution for degrees of relatedness) returns the posterior probabilities of all pairs of individuals being (a) the same individual/twins, (b) first-degree related, (c) second-degree related or (d) "unrelated" (third-degree or higher). The highest posterior probability degree of relatedness is also returned as a hard classification. Options include setting the background relatedness (or using the sample median), a minimum number of overlapping SNPs if one uses the sample median for background relatedness, and a minimum number of overlapping SNPs for including pairs in the analysis.
Usage
callRelatedness(
pmr_tibble,
class_prior = rep(0.25, 4),
average_relatedness = NULL,
median_co = 500,
filter_n = 1
)
Arguments
pmr_tibble
a tibble that is the output of the processEigenstrat function.
class_prior
the prior probabilities for same/twin, 1st-degree, 2nd-degree, unrelated, respectively.
average_relatedness
a single numeric value, or a vector of numeric values, to use as the average background relatedness. If NULL, the sample median is used.
median_co
the minimum number of overlapping SNPs a pair must have to be included in the median calculation. Only used if average_relatedness is left NULL. Default is 500.
filter_n
the minimum number of overlapping SNPs a pair must have to be kept in the analysis; pairs below this threshold are removed entirely. If NULL, the default of 1 is used.
Value
results_tibble: A tibble containing 13 columns:
row: The row number
pair: the pair of individuals that are compared.
relationship: the highest posterior probability estimate of the degree of relatedness.
pmr: the pairwise mismatch rate (mismatch/nsnps).
sd: the estimated standard deviation of the pmr.
mismatch: the number of sites which did not match for each pair.
nsnps: the number of overlapping snps that were compared for each pair.
ave_rel: the value for the background relatedness used for normalisation.
Same_Twins: the posterior probability associated with a same individual/twins classification.
First_Degree: the posterior probability associated with a first-degree classification.
Second_Degree: the posterior probability associated with a second-degree classification.
Unrelated: the posterior probability associated with an unrelated classification.
BF: A strength of confidence in the Bayes Factor associated with the highest posterior probability classification compared to the 2nd highest. (No longer included)
Examples
callRelatedness(counts_example,
class_prior=rep(0.25,4),
average_relatedness=NULL,
median_co=5e2,filter_n=1
)
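The classification described above can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes a binomial model for the mismatch count, and assumes the expected PMR of the four classes is the background relatedness scaled by 0.5, 0.75, 0.875 and 1. The helper name posterior_sketch and the input values are hypothetical.

```r
# Hypothetical helper (not part of the package): posterior probabilities for
# one pair under an ASSUMED binomial model for the mismatch count.
posterior_sketch <- function(mismatch, nsnps, ave_rel,
                             class_prior = rep(0.25, 4)) {
  # ASSUMPTION: expected PMR per class is the background relatedness scaled
  # by 0.5 (same/twins), 0.75 (1st), 0.875 (2nd) and 1 (unrelated).
  p_k <- ave_rel * c(0.5, 0.75, 0.875, 1)
  # Log-likelihood of the observed mismatch count under each class.
  log_lik <- dbinom(mismatch, size = nsnps, prob = p_k, log = TRUE)
  # Add the log prior, then normalise on a numerically stable scale.
  log_post <- log_lik + log(class_prior)
  post <- exp(log_post - max(log_post))
  post <- post / sum(post)
  names(post) <- c("Same_Twins", "First_Degree", "Second_Degree", "Unrelated")
  # The hard classification is the class with the highest posterior.
  list(posterior = post, relationship = names(post)[which.max(post)])
}

# Illustrative values only: 180 mismatches over 1000 overlapping SNPs.
res <- posterior_sketch(mismatch = 180, nsnps = 1000, ave_rel = 0.24)
print(round(res$posterior, 4))
print(res$relationship)
```

Working on the log scale before normalising avoids underflow when nsnps is large and some classes are very unlikely.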
counts_example
Description
An example of the tibble produced by processEigenstrat().
Usage
counts_example
Format
counts_example
A data frame with 15 rows and 4 columns:
- pair
the pair of individuals that are compared
- nsnps
the number of overlapping snps that were compared for each pair.
- mismatch
the number of sites which did not match for each pair.
- pmr
the pairwise mismatch rate (mismatch/nsnps).
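As an illustration of how these columns relate, the sketch below recomputes pmr from mismatch and nsnps on made-up counts, and then takes a sample median as a background relatedness value. The pair names and counts are invented, and treating the cutoff as "at least median_co overlapping SNPs" is an assumption rather than the documented behaviour.

```r
# Made-up counts in the same four-column layout as counts_example.
counts <- data.frame(
  pair     = c("Ind1 - Ind2", "Ind1 - Ind3", "Ind2 - Ind3", "Ind1 - Ind4"),
  nsnps    = c(1000, 800, 40, 1200),
  mismatch = c(180, 192, 4, 300),
  stringsAsFactors = FALSE
)

# pmr is the pairwise mismatch rate: mismatch / nsnps.
counts$pmr <- counts$mismatch / counts$nsnps

# ASSUMPTION: only pairs with at least median_co overlapping SNPs
# contribute to the sample median used as background relatedness.
median_co <- 500
ave_rel <- median(counts$pmr[counts$nsnps >= median_co])

print(counts)
print(ave_rel)
```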
get column
Description
Extracts a single column of numbers from a geno file.
Usage
get_column_new(genofile, col = 1)
Arguments
genofile
genofile
col
column to return
Value
column of numbers
plotLOAF
Description
Plots all observed PMR values, sorted by increasing value, with the maximum posterior probability classification of each pair represented by colour and shape. Options include a cutoff for the minimum number of overlapping SNPs, the maximum number of pairs to plot and the x-axis font size.
Usage
plotLOAF(in_tibble, nsnps_cutoff = NULL, N = NULL, fntsize = 7, verbose = TRUE)
Arguments
in_tibble
a tibble that is the output of the callRelatedness() function.
nsnps_cutoff
the minimum number of overlapping SNPs a pair must have to be plotted; pairs below this threshold are removed from the plot. If NULL, the default of 500 is used.
N
the number of (sorted by increasing PMR) pairs to plot. Avoids plotting all pairs (many of which are unrelated).
fntsize
the fontsize for the x-axis names.
verbose
if TRUE, then information about the plotting process is sent to the console
Value
a ggplot object
Examples
relatedness_example
plotLOAF(relatedness_example)
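The selection of pairs described by nsnps_cutoff and N can be sketched in base R as follows. The data are invented, and the exact comparison used for the cutoff (here "at least nsnps_cutoff") is an assumption; the sketch covers only the filtering and ordering, not the plotting itself.

```r
# Invented classification results with only the columns needed here.
calls <- data.frame(
  pair  = c("Ind1 - Ind2", "Ind1 - Ind3", "Ind2 - Ind3", "Ind1 - Ind4"),
  pmr   = c(0.24, 0.12, 0.21, 0.18),
  nsnps = c(2000, 1500, 300, 900),
  stringsAsFactors = FALSE
)

nsnps_cutoff <- 500  # minimum overlapping SNPs for a pair to be shown
N <- 2               # number of lowest-PMR pairs to show

# ASSUMPTION: pairs with at least nsnps_cutoff overlapping SNPs are kept.
kept <- calls[calls$nsnps >= nsnps_cutoff, ]
# Sort by increasing PMR, then keep the first N pairs.
kept <- kept[order(kept$pmr), ]
to_plot <- head(kept, N)

print(to_plot)
```

Because closely related pairs have the lowest PMR values, limiting the plot to the first N sorted pairs keeps the related pairs while dropping most unrelated ones.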
plotSLICE
Description
A function for plotting the diagnostic information when classifying a specific pair of individuals (defined by the row number or pair name). Output includes the PDFs for each degree of relatedness (given the number of overlapping SNPs) in panel A, and the normalised posterior probabilities for each possible degree of relatedness in panel B.
Usage
plotSLICE(
in_tibble,
row,
title = NULL,
class_prior = rep(1/4, 4),
showPlot = TRUE,
which_plot = 0,
labels = NULL
)
Arguments
in_tibble
a tibble that is the output of the callRelatedness() function.
row
either the row number or pair name for which the posterior distribution is to be plotted.
title
an optional title for the plot. If NULL, the pair from the user-defined row is used.
class_prior
the prior probabilities for same/twin, 1st-degree, 2nd-degree, unrelated, respectively.
showPlot
If TRUE, the plot is displayed. If FALSE, the plot is only returned as an object.
which_plot
if 1, returns just the plot of the posterior distributions; if 2, returns just the plot of the normalised posterior values. Any other value returns both plots.
labels
a length two character vector of labels for plots. Default is no labels.
Value
a two-panel diagnostic ggplot object
Examples
plotSLICE(relatedness_example, row = 1)
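To give a feel for what the per-degree distributions in panel A represent, the sketch below tabulates one distribution per class over all possible mismatch counts. It is an illustration under stated assumptions, not the package's code: a binomial model is assumed, as are the scaling factors 0.5, 0.75, 0.875 and 1 for the expected PMR of each class. The values of nsnps and ave_rel are invented.

```r
nsnps   <- 1000   # invented number of overlapping SNPs for the pair
ave_rel <- 0.24   # invented background relatedness

# ASSUMPTION: expected PMR per class is ave_rel times these factors.
p_k <- ave_rel * c(Same_Twins = 0.5, First_Degree = 0.75,
                   Second_Degree = 0.875, Unrelated = 1)

# One column per class, one row per possible mismatch count.
x   <- 0:nsnps
pmf <- sapply(p_k, function(p) dbinom(x, size = nsnps, prob = p))

# The PMR at which each class distribution peaks.
modes <- x[apply(pmf, 2, which.max)] / nsnps
print(modes)

# Uncomment to draw the four curves on the PMR scale:
# matplot(x / nsnps, pmf, type = "l", lty = 1, xlab = "PMR", ylab = "Probability")
```

With fewer overlapping SNPs the curves widen and overlap more, which is why the distributions are conditional on nsnps.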
process Eigenstrat data - alternative version
Description
A function that takes paths to an eigenstrat trio (ind, snp and geno file) and returns the pairwise mismatch rate for all pairs on a thinned set of SNPs. Options include choosing thinning parameter, subsetting by population names, and filtering out SNPs for which deamination is possible.
Usage
processEigenstrat(
indfile,
genofile,
snpfile,
filter_length = NULL,
pop_pattern = NULL,
filter_deam = FALSE,
outfile = NULL,
chromosomes = NULL,
verbose = TRUE
)
Arguments
indfile
path to eigenstrat ind file
genofile
path to eigenstrat geno file.
snpfile
path to eigenstrat snp file.
filter_length
the minimum distance between sites to be compared (to reduce the effect of LD).
pop_pattern
a character vector of population names used to filter the ind file if only some populations are to be compared.
filter_deam
a logical (TRUE/FALSE) indicating whether C->T and G->A sites should be ignored.
outfile
(OPTIONAL) a path and filename to which the output of the function is saved as a TSV. If NULL, no backup is saved and a tibble is returned.
chromosomes
the chromosome to filter the data on.
verbose
controls printing of messages to console
Value
out_tibble: A tibble containing four columns: pair, nsnps, mismatch and pmr.
Examples
# Use internal files to the package as an example
indfile <- system.file("extdata", "example.ind.txt", package = "BREADR")
genofile <- system.file("extdata", "example.geno.txt", package = "BREADR")
snpfile <- system.file("extdata", "example.snp.txt", package = "BREADR")
processEigenstrat(
indfile, genofile, snpfile,
filter_length=1e5,
pop_pattern=NULL,
filter_deam=FALSE
)
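The thinning controlled by filter_length can be pictured with the sketch below. It is a hedged illustration only: it assumes a greedy pass over sorted positions on a single chromosome, keeping a site when it lies at least filter_length from the last kept site. The function name thin_sites_sketch and the positions are hypothetical, and the package may thin differently.

```r
# Hypothetical helper (not part of the package): greedy thinning of sorted
# positions on ONE chromosome so that kept sites are at least
# filter_length apart.
thin_sites_sketch <- function(pos, filter_length) {
  keep <- logical(length(pos))
  last_kept <- -Inf
  for (i in seq_along(pos)) {
    # Keep the site only if it is far enough from the last kept site.
    if (pos[i] - last_kept >= filter_length) {
      keep[i] <- TRUE
      last_kept <- pos[i]
    }
  }
  keep
}

# Invented positions, already sorted.
pos  <- c(100, 50000, 120000, 150000, 260000)
kept <- pos[thin_sites_sketch(pos, filter_length = 1e5)]
print(kept)
```

Larger values of filter_length reduce the effect of LD but leave fewer sites per pair, so the PMR estimate becomes noisier.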
process Eigenstrat data
Description
A function that takes paths to an eigenstrat trio (ind, snp and geno file) and returns the pairwise mismatch rate for all pairs on a thinned set of SNPs. Options include choosing thinning parameter, subsetting by population names, and filtering out SNPs for which deamination is possible.
Usage
processEigenstrat_old(
indfile,
genofile,
snpfile,
filter_length = NULL,
pop_pattern = NULL,
filter_deam = FALSE,
outfile = NULL,
chromosomes = NULL,
verbose = TRUE
)
Arguments
indfile
path to eigenstrat ind file
genofile
path to eigenstrat geno file.
snpfile
path to eigenstrat snp file.
filter_length
the minimum distance between sites to be compared (to reduce the effect of LD).
pop_pattern
a character vector of population names used to filter the ind file if only some populations are to be compared.
filter_deam
a logical (TRUE/FALSE) indicating whether C->T and G->A sites should be ignored.
outfile
(OPTIONAL) a path and filename to which the output of the function is saved as a TSV. If NULL, no backup is saved and a tibble is returned.
chromosomes
the chromosome to filter the data on.
verbose
controls printing of messages to console
Value
out_tibble: A tibble containing four columns: pair, nsnps, mismatch and pmr.
Examples
# Use internal files to the package as an example
indfile <- system.file("extdata", "example.ind.txt", package = "BREADR")
genofile <- system.file("extdata", "example.geno.txt", package = "BREADR")
snpfile <- system.file("extdata", "example.snp.txt", package = "BREADR")
processEigenstrat_old(
indfile, genofile, snpfile,
filter_length=1e5,
pop_pattern=NULL,
filter_deam=FALSE
)
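The pairwise counting behind the returned columns can be sketched on a tiny in-memory genotype matrix. This is an illustration rather than the package's code: it assumes pseudo-haploid genotypes coded 0 or 2 with 9 marking missing data (the eigenstrat convention for missing), one row per SNP and one column per individual. The individual names, the " - " separator in pair names and the genotype values are invented.

```r
# Invented genotypes: rows are SNPs, columns are individuals; 9 = missing.
geno <- rbind(
  c(0, 0, 2),
  c(2, 2, 2),
  c(0, 2, 9),
  c(2, 9, 0),
  c(0, 0, 0)
)
colnames(geno) <- c("Ind1", "Ind2", "Ind3")

# All unordered pairs of individuals.
pairs <- combn(colnames(geno), 2)

counts <- do.call(rbind, lapply(seq_len(ncol(pairs)), function(j) {
  a <- geno[, pairs[1, j]]
  b <- geno[, pairs[2, j]]
  # A site overlaps only if neither individual is missing.
  ok <- a != 9 & b != 9
  data.frame(
    pair     = paste(pairs[1, j], pairs[2, j], sep = " - "),
    nsnps    = sum(ok),
    mismatch = sum(a[ok] != b[ok]),
    stringsAsFactors = FALSE
  )
}))

# Pairwise mismatch rate.
counts$pmr <- counts$mismatch / counts$nsnps
print(counts)
```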
read_ind
Description
Reads an IND text file into a tibble.
Usage
read_ind(filename)
Arguments
filename
an IND text file.
Value
tibble with column headings: ind (CHR), sex (CHR), pop (CHR)
Examples
ind_snpfile <- system.file("extdata", "example.ind.txt", package = "BREADR")
read_ind(ind_snpfile)
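For readers without the package at hand, the shape of the result can be approximated in base R. The sketch assumes each line of an IND file holds three whitespace-separated fields (individual, sex, population); the sample lines are invented, and a data frame is returned rather than a tibble.

```r
# Invented IND-style lines: individual, sex, population.
ind_lines <- c("  Ind1 M PopA", "Ind2   F PopA", "Ind3 U PopB")

# colClasses = "character" stops R from reading "F" or "T" as logical.
ind <- read.table(
  text = ind_lines,
  col.names = c("ind", "sex", "pop"),
  colClasses = "character",
  stringsAsFactors = FALSE
)
print(ind)
```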
read_snp
Description
Reads a SNP text file into a tibble.
Usage
read_snp(filename)
Arguments
filename
a SNP text file.
Value
tibble with column headings: snp (CHR), chr (DBL), pos (DBL), site (DBL), anc (CHR), and der (CHR).
Examples
std_snpfile <- system.file("extdata", "example.snp.txt", package = "BREADR")
broken_snpfile <- system.file("extdata", "broken.snp.txt", package = "BREADR")
read_snp(std_snpfile)
read_snp(broken_snpfile)
relatedness_example
Description
An example of the tibble produced by callRelatedness().
Usage
relatedness_example
Format
relatedness_example
A data frame with 15 rows and 13 columns:
- row
The row number
- pair
the pair of individuals that are compared.
- relationship
the highest posterior probability estimate of the degree of relatedness.
- pmr
the pairwise mismatch rate (mismatch/nsnps).
- sd
the estimated standard deviation of the pmr.
- mismatch
the number of sites which did not match for each pair.
- nsnps
the number of overlapping snps that were compared for each pair.
- ave_rel
the value for the background relatedness used for normalisation.
- Same_Twins
the posterior probability associated with a same individual/twins classification.
- First_Degree
the posterior probability associated with a first-degree classification.
- Second_Degree
the posterior probability associated with a second-degree classification.
- Unrelated
the posterior probability associated with an unrelated classification.
- BF
A strength of confidence in the Bayes Factor associated with the highest posterior probability classification compared to the 2nd highest.
saveSLICES
Description
Plots all pairwise diagnostic plots (in a tibble as output by callRelatedness), as produced by plotSLICE, to a folder. Options include the width and height of the output files, and the units in which these dimensions are measured.
Usage
saveSLICES(
in_tibble,
outFolder = NULL,
width = 297,
height = 210,
units = "mm",
verbose = TRUE
)
Arguments
in_tibble
a tibble that is the output of the callRelatedness() function.
outFolder
the folder into which all diagnostic plots will be saved
width
the width of the output PDFs.
height
the height of the output PDFs.
units
the units for the height and width of the output PDFs.
verbose
Controls the printing of progress to console.
Value
nothing
Examples
saveSLICES(relatedness_example[1:3, ], outFolder = tempdir())
sim_geno
Description
Simulates a geno file in eigenstrat format.
Usage
sim_geno(n_ind, n_snp, filename)
Arguments
n_ind
number of individuals
n_snp
number of SNPs
filename
filename of export
Value
NULL. A file is written to filename as a side effect.
Examples
## Not run:
sim_geno(10, 5, "geno.txt")
## End(Not run)
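A rough base R stand-in for this kind of simulation is shown below. It is a sketch under assumptions, not the package's generator: genotypes are drawn uniformly from 0, 2 and 9 (9 for missing), and the file holds one line per SNP with one character per individual. The function name sim_geno_sketch is hypothetical.

```r
# Hypothetical stand-in (not the package function): write a random
# geno-style file with one line per SNP, one character per individual.
sim_geno_sketch <- function(n_ind, n_snp, filename) {
  # ASSUMPTION: genotypes are drawn uniformly from 0, 2 and 9 (missing).
  geno <- matrix(
    sample(c(0, 2, 9), n_ind * n_snp, replace = TRUE),
    nrow = n_snp
  )
  writeLines(apply(geno, 1, paste, collapse = ""), filename)
  invisible(NULL)
}

set.seed(1)
tmp <- tempfile(fileext = ".txt")
sim_geno_sketch(n_ind = 10, n_snp = 5, filename = tmp)
sim_lines <- readLines(tmp)
print(sim_lines)
```

Writing to a temporary file keeps the example from leaving files in the working directory.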
split line
Description
Takes a line from a SNP file and splits it into its parts.
Usage
split_line(x)
Arguments
x
line from SNP file
Value
tibble with 6 columns.
Examples
split_line("1_14.570829090394763 1 0.000000 14 A X")
split_line("rs3094315 1 0.0 752566 G A")
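For a well-formed line, the split can be approximated in base R as below. The sketch assumes six whitespace-separated fields, named after the read_snp column headings (snp, chr, pos, site, anc, der); it does not attempt whatever repair the package applies to malformed lines, and the helper name split_snp_line_sketch is hypothetical.

```r
# Hypothetical helper (not part of the package): split one well-formed
# SNP line into six named fields.
split_snp_line_sketch <- function(x) {
  parts <- strsplit(trimws(x), "\\s+")[[1]]
  # ASSUMPTION: exactly six whitespace-separated fields per line.
  stopifnot(length(parts) == 6)
  data.frame(
    snp  = parts[1],
    chr  = as.numeric(parts[2]),
    pos  = as.numeric(parts[3]),
    site = as.numeric(parts[4]),
    anc  = parts[5],
    der  = parts[6],
    stringsAsFactors = FALSE
  )
}

snp_row <- split_snp_line_sketch("rs3094315 1 0.0 752566 G A")
print(snp_row)
```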
test_degree
Description
Test if a degree of relatedness is consistent with an observed PMR
Usage
test_degree(in_tibble, row, degree, verbose = TRUE)
Arguments
in_tibble
a tibble that is the output of the callRelatedness() function.
row
either the row number or pair name of the pair to be tested.
degree
the degree of relatedness to be tested.
verbose
a logical (boolean) for whether all test output should be printed to screen.
Value
the associated p-value for the test
Examples
test_degree(relatedness_example, 1, 1)
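One way to picture such a test is an exact binomial test of the observed mismatch count against the PMR expected for the chosen degree. This is an assumption made for illustration, not a description of the package's test: the expected-PMR factors (0.5, 0.75, 0.875, 1), the degree labels 0 to 3, the helper name test_degree_sketch and the input values are all hypothetical.

```r
# Hypothetical helper (not part of the package): exact binomial test of an
# observed mismatch count against the PMR expected for one degree.
test_degree_sketch <- function(mismatch, nsnps, ave_rel, degree) {
  # ASSUMPTION: degrees 0-3 map to same/twins, 1st, 2nd and unrelated,
  # with expected PMR equal to ave_rel times these factors.
  factors <- c("0" = 0.5, "1" = 0.75, "2" = 0.875, "3" = 1)
  p_expected <- ave_rel * factors[[as.character(degree)]]
  binom.test(x = mismatch, n = nsnps, p = p_expected)$p.value
}

# Invented pair: 180 mismatches over 1000 overlapping SNPs.
p_first <- test_degree_sketch(180, 1000, ave_rel = 0.24, degree = 1)
p_same  <- test_degree_sketch(180, 1000, ave_rel = 0.24, degree = 0)
print(c(first_degree = p_first, same_twins = p_same))
```

A large p-value means the observed PMR is consistent with the tested degree; a very small one means that degree can be rejected for the pair.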