Project Euclid
VOL. 4 · NO. 2 | June 2010
Frontmatter
Ann. Appl. Stat. 4 (2), (June 2010)
No abstract available
Special Section on Network Modeling
Stephen E. Fienberg
Ann. Appl. Stat. 4 (2), 533-534, (June 2010) DOI: 10.1214/10-AOAS365
No abstract available
Eric P. Xing, Wenjie Fu, Le Song
Ann. Appl. Stat. 4 (2), 535-566, (June 2010) DOI: 10.1214/09-AOAS311
KEYWORDS: Dynamic networks, Network tomography, mixed membership stochastic blockmodels, state-space models, Time-varying networks, mixed membership model, Graphical model, variational inference, Bayesian inference, Social network, gene regulation network

In a dynamic social or biological environment, the interactions between the actors can undergo large and systematic changes. In this paper we propose a model-based approach to analyze what we will refer to as the dynamic tomography of such time-evolving networks. Our approach offers an intuitive but powerful tool to infer the semantic underpinnings of each actor, such as its social roles or biological functions, underlying the observed network topologies. Our model builds on earlier work on a mixed membership stochastic blockmodel for static networks, and on state-space models for tracking object trajectories. It overcomes a major limitation of many current network inference techniques, which assume that each actor plays a unique and invariant role that accounts for all its interactions with other actors; instead, our method models the role of each actor as a time-evolving mixed membership vector that allows actors to behave differently over time and to carry out different roles/functions when interacting with different peers, which is closer to reality. We present an efficient algorithm for approximate inference and learning using our model, and we apply it to analyze a social network between monks (Sampson's network), a dynamic email communication network between Enron employees, and a rewiring gene interaction network of the fruit fly collected over its full life cycle. In all cases, our model reveals interesting patterns of the dynamic roles of the actors.

Tom A. B. Snijders, Johan Koskinen, Michael Schweinberger
Ann. Appl. Stat. 4 (2), 567-588, (June 2010) DOI: 10.1214/09-AOAS313
KEYWORDS: Graphs, longitudinal data, method of moments, stochastic approximation, Robbins–Monro algorithm

A model for network panel data is discussed, based on the assumption that the observed data are discrete observations of a continuous-time Markov process on the space of all directed graphs on a given node set, in which changes in tie variables are independent conditional on the current graph. The model for tie changes is parametric and designed for applications to social network analysis, where the network dynamics can be interpreted as being generated by choices made by the social actors represented by the nodes of the graph. An algorithm for calculating the Maximum Likelihood estimator is presented, based on data augmentation and stochastic approximation. An application to an evolving friendship network is given and a small simulation study is presented which suggests that for small data sets the Maximum Likelihood estimator is more efficient than the earlier proposed Method of Moments estimator.
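The stochastic-approximation step at the heart of the estimation algorithm can be illustrated with a generic Robbins–Monro iteration. The target function and noise below are illustrative stand-ins, not the network model's likelihood equations:

```python
import random

def robbins_monro(noisy_f, theta0, steps=5000, seed=1):
    """Robbins-Monro stochastic approximation: step against noisy
    evaluations of f with gains a_n = 1/n to solve f(theta) = 0.
    The paper couples a scheme of this kind with data augmentation to
    compute maximum likelihood estimates; this setup is only a toy."""
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, steps + 1):
        theta -= (1.0 / n) * noisy_f(theta, rng)
    return theta

# Noisy evaluations of f(theta) = theta - 3: iterates converge toward 3.
est = robbins_monro(lambda t, rng: (t - 3) + rng.gauss(0, 0.5), theta0=0.0)
print(round(est, 2))
```

The decreasing 1/n gains average out the simulation noise while still correcting the remaining bias, which is why the same skeleton supports likelihood-based estimation when f itself can only be simulated.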

Ya Xu, Justin S. Dyer, Art B. Owen
Ann. Appl. Stat. 4 (2), 589-614, (June 2010) DOI: 10.1214/09-AOAS293
KEYWORDS: graph Laplacian, kriging, PageRank, Random walk

In semi-supervised learning on graphs, response variables observed at one node are used to estimate missing values at other nodes. The methods exploit correlations between nearby nodes in the graph. In this paper we prove that many such proposals are equivalent to kriging predictors based on a fixed covariance matrix driven by the link structure of the graph. We then propose a data-driven estimator of the correlation structure that exploits patterns among the observed response values. By incorporating even a small fraction of observed covariation into the predictions, we are able to obtain much improved prediction on two graph data sets.
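The equivalence the authors establish can be illustrated with the simplest Laplacian-based predictor, which fills in missing node values by solving a harmonic system driven by the link structure. This is a generic sketch of that family of methods, not the paper's data-driven covariance estimator:

```python
import numpy as np

def laplacian_predict(A, y, observed):
    """Harmonic (Laplacian-smoothing) prediction: with L = D - A, the
    unobserved values solve L_uu f_u = -L_uo y_o. Predictors of this type
    are what the paper shows to be kriging with a fixed, link-driven
    covariance; the authors' own estimator is data-driven."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    mask = np.zeros(n, dtype=bool)
    mask[np.asarray(observed)] = True
    f = np.empty(n)
    f[mask] = y
    f[~mask] = np.linalg.solve(L[~mask][:, ~mask], -L[~mask][:, mask] @ y)
    return f

# Path graph 0-1-2: observing the endpoints interpolates the middle node.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
print(laplacian_predict(A, np.array([0.0, 1.0]), [0, 2]))  # middle = 0.5
```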

Ricardo Silva, Katherine Heller, Zoubin Ghahramani, Edoardo M. Airoldi
Ann. Appl. Stat. 4 (2), 615-644, (June 2010) DOI: 10.1214/09-AOAS321
KEYWORDS: network analysis, Bayesian inference, variational approximation, ranking, information retrieval, data integration, Saccharomyces cerevisiae

Analogical reasoning depends fundamentally on the ability to learn and generalize about relations between objects. We develop an approach to relational learning which, given a set of pairs of objects S = {A(1) : B(1), A(2) : B(2), ..., A(N) : B(N)}, measures how well other pairs A : B fit in with the set S. Our work addresses the following question: is the relation between objects A and B analogous to those relations found in S? Such questions are particularly relevant in information retrieval, where an investigator might want to search for analogous pairs of objects that match the query set of interest. There are many ways in which objects can be related, making the task of measuring analogies very challenging. Our approach combines a similarity measure on function spaces with Bayesian analysis to produce a ranking. It requires data containing features of the objects of interest and a link matrix specifying which relationships exist; no further attributes of such relationships are necessary. We illustrate the potential of our method on text analysis and information networks. An application on discovering functional interactions between pairs of proteins is discussed in detail, where we show that our approach can work in practice even if a small set of protein pairs is provided.

Nicholas A. Heard, David J. Weston, Kiriaki Platanioti, David J. Hand
Ann. Appl. Stat. 4 (2), 645-662, (June 2010) DOI: 10.1214/10-AOAS329
KEYWORDS: Dynamic networks, Bayesian inference, counting processes, hurdle models

Learning the network structure of a large graph is computationally demanding, and dynamically monitoring the network over time for any changes in structure threatens to be more challenging still.

This paper presents a two-stage method for anomaly detection in dynamic graphs: the first stage uses simple, conjugate Bayesian models for discrete time counting processes to track the pairwise links of all nodes in the graph to assess normality of behavior; the second stage applies standard network inference tools on a greatly reduced subset of potentially anomalous nodes. The utility of the method is demonstrated on simulated and real data sets.
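The first-stage screening idea, conjugate Bayesian models for counting processes, can be sketched with a Gamma–Poisson model on a single link. The counts and prior below are hypothetical, and the paper's discrete-time models differ in detail:

```python
import math

def neg_binom_logpmf(k, a, b):
    """Posterior predictive of a Poisson count under a Gamma(a, b) prior
    on the rate: a Negative Binomial. Scoring how surprising a new count
    on a link is given its history is the spirit of the conjugate first
    stage; this is a sketch, not the paper's exact model."""
    return (math.lgamma(a + k) - math.lgamma(a) - math.lgamma(k + 1)
            + a * math.log(b / (b + 1)) + k * math.log(1 / (b + 1)))

# History: 20 messages over 10 windows on this link -> Gamma(20, 10)
# posterior for the rate. A burst of 12 messages in one window is far
# less probable than a typical count of 2, so it gets flagged.
a, b = 20.0, 10.0
print(neg_binom_logpmf(2, a, b) > neg_binom_logpmf(12, a, b))  # True
```

Links whose new counts fall in the far tail of this predictive would be passed to the second, more expensive inference stage.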

Gareth M. James, Chiara Sabatti, Nengfeng Zhou, Ji Zhu
Ann. Appl. Stat. 4 (2), 663-686, (June 2010) DOI: 10.1214/10-AOAS350
KEYWORDS: Transcription regulation networks, L_1 penalty, E. coli, sparse network

In many organisms the expression levels of each gene are controlled by the activation levels of known "Transcription Factors" (TF). A problem of considerable interest is that of estimating the "Transcription Regulation Networks" (TRN) relating the TFs and genes. While the expression levels of genes can be observed, the activation levels of the corresponding TFs are usually unknown, greatly increasing the difficulty of the problem. Based on previous experimental work, it is often the case that partial information about the TRN is available. For example, certain TFs may be known to regulate a given gene or in other cases a connection may be predicted with a certain probability. In general, the biology of the problem indicates there will be very few connections between TFs and genes. Several methods have been proposed for estimating TRNs. However, they all suffer from problems such as unrealistic assumptions about prior knowledge of the network structure or computational limitations. We propose a new approach that can directly utilize prior information about the network structure in conjunction with observed gene expression data to estimate the TRN. Our approach uses L1 penalties on the network to ensure a sparse structure. This has the advantage of being computationally efficient as well as making many fewer assumptions about the network structure. We use our methodology to construct the TRN for E. coli and show that the estimate is biologically sensible and compares favorably with previous estimates.

Hugo Zanghi, Franck Picard, Vincent Miele, Christophe Ambroise
Ann. Appl. Stat. 4 (2), 687-714, (June 2010) DOI: 10.1214/10-AOAS359
KEYWORDS: Graph clustering, EM Algorithms, online strategies, web graph structure analysis

In this paper we adapt online estimation strategies to perform model-based clustering on large networks. Our work focuses on two algorithms, the first based on the SAEM algorithm and the second on variational methods. These two strategies are compared with existing approaches on simulated and real data. We use the method to decipher the connection structure of the political websphere during the 2008 US political campaign. We show that our online EM-based algorithms offer a good trade-off between precision and speed when estimating parameters for mixture distributions in the context of random graphs.

Mahendra Mariadassou, Stéphane Robin, Corinne Vacher
Ann. Appl. Stat. 4 (2), 715-742, (June 2010) DOI: 10.1214/10-AOAS361
KEYWORDS: Ecological networks, host–parasite interactions, latent structure, mixture model, random graph, valued graph, variational method

As more and more network-structured data sets become available, the statistical analysis of valued graphs has become commonplace. Looking for a latent structure is one of many strategies used to better understand the behavior of a network. Several methods already exist for the binary case.

We present a model-based strategy to uncover groups of nodes in valued graphs. This framework can be used for a wide range of parametric random graph models and allows covariates to be included. Variational tools allow us to achieve approximate maximum likelihood estimation of the parameters of these models. We provide a simulation study showing that our estimation method performs well over a broad range of situations. We apply this method to analyze host–parasite interaction networks in forest ecosystems.

Articles
Anat Sakov, Ilan Golani, Dina Lipkind, Yoav Benjamini
Ann. Appl. Stat. 4 (2), 743-763, (June 2010) DOI: 10.1214/09-AOAS304
KEYWORDS: robustness, LOWESS, path data, behavior genetics, Outliers, regression quantile, running median, boundary estimation, center estimation

In recent years, a growing need has arisen in different fields for the development of computational systems for automated analysis of large amounts of data (high-throughput). Dealing with nonstandard noise structure and outliers, that could have been detected and corrected in manual analysis, must now be built into the system with the aid of robust methods. We discuss such problems and present insights and solutions in the context of behavior genetics, where data consists of a time series of locations of a mouse in a circular arena. In order to estimate the location, velocity and acceleration of the mouse, and identify stops, we use a nonstandard mix of robust and resistant methods: LOWESS and repeated running median. In addition, we argue that protection against small deviations from experimental protocols can be handled automatically using statistical methods. In our case, it is of biological interest to measure a rodent’s distance from the arena’s wall, but this measure is corrupted if the arena is not a perfect circle, as required in the protocol. The problem is addressed by estimating robustly the actual boundary of the arena and its center using a nonparametric regression quantile of the behavioral data, with the aid of a fast algorithm developed for that purpose.
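The repeated running median mentioned above can be sketched in a few lines. The window size and data here are illustrative, and the paper combines this smoother with LOWESS in ways this toy does not show:

```python
def running_median(x, window=3):
    """One pass of a running median with an odd window; endpoints are
    left unchanged. A resistant smoother: a single outlier cannot move
    the median of its window."""
    half = window // 2
    out = list(x)
    for i in range(half, len(x) - half):
        out[i] = sorted(x[i - half:i + half + 1])[half]
    return out

def repeated_running_median(x, window=3):
    """Re-apply the running median until the sequence stops changing,
    the 'repeated running median' named in the abstract."""
    prev, cur = None, list(x)
    while cur != prev:
        prev, cur = cur, running_median(cur, window)
    return cur

# A single spike (an outlier in the path data) is removed by the smoother.
print(repeated_running_median([1, 1, 9, 1, 1, 2, 2]))  # [1, 1, 1, 1, 1, 2, 2]
```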

Genevera I. Allen, Robert Tibshirani
Ann. Appl. Stat. 4 (2), 764-790, (June 2010) DOI: 10.1214/09-AOAS314
KEYWORDS: Matrix-variate normal, Covariance estimation, imputation, EM algorithm, transposable data

Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data matrix is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so-called transposable regularized covariance models allow for maximum likelihood estimation of the mean and nonsingular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.

Michael R. Huber, Rodney X. Sturdivant
Ann. Appl. Stat. 4 (2), 791-804, (June 2010) DOI: 10.1214/09-AOAS301
KEYWORDS: exponential distribution, Poisson process, memoryless property, rare baseball events, Goodness-of-fit test

How often can we expect a Major League Baseball team to score at least 20 runs in a single game? Considered a rare event in baseball, the outcome of scoring at least 20 runs in a game has occurred 224 times during regular season games since 1901 in the American and National Leagues. Each outcome is modeled as a Poisson process; the time of occurrence of one of these events does not affect the next future occurrence. Using various distributions, probabilities of events are generated, goodness-of-fit tests are conducted, and predictions of future events are offered. The statistical package R is employed for analysis.
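The memoryless Poisson-process view of these rare events supports quick back-of-the-envelope calculations like the following. The 108-season horizon is an assumption made for illustration, not a figure from the paper:

```python
import math

# Illustrative sketch, not the paper's fitted model: treat 20-run games as
# a homogeneous Poisson process. The abstract reports 224 such games since
# 1901; dividing by an assumed ~108 seasons gives a per-season rate.
events, seasons = 224, 108
lam = events / seasons            # mean number of 20-run games per season
p_none = math.exp(-lam)           # Poisson: P(no such game next season)
p_at_least_one = 1 - p_none
print(round(lam, 2), round(p_at_least_one, 3))
```

Under this crude rate, a season with at least one 20-run game is more likely than not, which matches the intuition that "rare" here means rare per game, not per season.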

Dipankar Bandyopadhyay, Debajyoti Sinha, Stuart Lipsitz, Elizabeth Letourneau
Ann. Appl. Stat. 4 (2), 805-829, (June 2010) DOI: 10.1214/09-AOAS295
KEYWORDS: Bridge density, Change-point, Dirichlet prior, Markov chain Monte Carlo

Existing state-wide databases on prosecutors’ decisions about juvenile offenders are important, yet often unexplored, resources for understanding changes in patterns of judicial decisions over time. We investigate the extent and nature of change in judicial behavior toward juveniles following the enactment of a new set of mandatory registration policies between 1992 and 1996 by analyzing data on prosecutors’ decisions to move forward for youths repeatedly charged with sexual violence in South Carolina. To analyze these longitudinal binary data, we use a random effects logistic regression model incorporating an unknown change-point year. For convenient physical interpretation, our models allow a proportional odds interpretation of the effects of the explanatory variables and the change-point year, with and without conditioning on the youth-specific random effects. As a consequence, the effects of the unknown change-point year and other factors can be interpreted as changes in both within-youth and population-averaged odds of moving forward. Using a Bayesian paradigm, we consider various prior opinions about the unknown year of the change in the pattern of prosecutors’ decisions. Based on the available data, we draw posterior conclusions about whether a change-point occurred between 1992 and 1996 (inclusive), evaluate the degree of confidence about the year of the change-point, estimate the magnitude of the effects of the change-point and other factors, and investigate other provocative questions about patterns of prosecutors’ decisions over time.

John Hughes, John Fricks, William Hancock
Ann. Appl. Stat. 4 (2), 830-848, (June 2010) DOI: 10.1214/09-AOAS299
KEYWORDS: Maximum likelihood methods, Poisson random field, fluorescence microscopy, particle tracking, organelle, Molecular motor, nanotechnology

We introduce a procedure to automatically count and locate the fluorescent particles in a microscopy image. Our procedure employs an approximate likelihood estimator derived from a Poisson random field model for photon emission. Estimates of standard errors are generated for each image along with the parameter estimates, and the number of particles in the image is determined using an information criterion and likelihood ratio tests. Realistic simulations show that our procedure is robust and that it leads to accurate estimates, both of parameters and of standard errors. This approach improves on previous ad hoc least squares procedures by giving a more explicit stochastic model for certain fluorescence images and by employing a consistent framework for analysis.

Carrie A. Hosman, Ben B. Hansen, Paul W. Holland
Ann. Appl. Stat. 4 (2), 849-870, (June 2010) DOI: 10.1214/09-AOAS315
KEYWORDS: Causal inference, hidden bias, observational study

Omitted variable bias can affect treatment effect estimates obtained from observational data due to the lack of random assignment to treatment groups. Sensitivity analyses adjust these estimates to quantify the impact of potential omitted variables. This paper presents methods of sensitivity analysis to adjust interval estimates of treatment effect—both the point estimate and standard error—obtained using multiple linear regression. Central to our approach is what we term benchmarking, the use of data to establish reference points for speculation about omitted confounders. The method adapts to treatment effects that may differ by subgroup, to scenarios involving omission of multiple variables, and to combinations of covariance adjustment with propensity score stratification. We illustrate it using data from an influential study of health outcomes of patients admitted to critical care.

Audrey Qiuyan Fu, Diane P. Genereux, Reinhard Stöger, Charles D. Laird, Matthew Stephens
Ann. Appl. Stat. 4 (2), 871-892, (June 2010) DOI: 10.1214/09-AOAS297
KEYWORDS: Bayesian inference, DNA methylation, transmission fidelity, epigenetics, hairpin-bisulfite PCR, hierarchical models, Markov chain Monte Carlo (MCMC), measurement error, multi-site models, stationarity

We develop Bayesian inference methods for a recently-emerging type of epigenetic data to study the transmission fidelity of DNA methylation patterns over cell divisions. The data consist of parent-daughter double-stranded DNA methylation patterns with each pattern coming from a single cell and represented as an unordered pair of binary strings. The data are technically difficult and time-consuming to collect, putting a premium on an efficient inference method. Our aim is to estimate rates for the maintenance and de novo methylation events that gave rise to the observed patterns, while accounting for measurement error. We model data at multiple sites jointly, thus using whole-strand information, and considerably reduce confounding between parameters. We also adopt a hierarchical structure that allows for variation in rates across sites without an explosion in the effective number of parameters. Our context-specific priors capture the expected stationarity, or near-stationarity, of the stochastic process that generated the data analyzed here. This expected stationarity is shown to greatly increase the precision of the estimation. Applying our model to a data set collected at the human FMR1 locus, we find that measurement errors, generally ignored in similar studies, occur at a nontrivial rate (inappropriate bisulfite conversion error: 1.6% with 80% CI: 0.9–2.3%). Accounting for these errors has a substantial impact on estimates of key biological parameters. The estimated average failure of maintenance rate and daughter de novo rate decline from 0.04 to 0.024 and from 0.14 to 0.07, respectively, when errors are accounted for. Our results also provide evidence that de novo events may occur on both parent and daughter strands: the median parent and daughter de novo rates are 0.08 (80% CI: 0.04–0.13) and 0.07 (80% CI: 0.04–0.11), respectively.

Woncheol Jang, Ji Meng Loh
Ann. Appl. Stat. 4 (2), 893-915, (June 2010) DOI: 10.1214/09-AOAS307
KEYWORDS: Bandwidth selection, grouped data, kernel density estimator, line transect sampling, smoothed bootstrap

Line transect sampling is a method used to estimate wildlife populations, with the resulting data often grouped in intervals. Estimating the density from grouped data can be challenging. In this paper we propose a kernel density estimator of wildlife population density for such grouped data. Our method uses a combined cross-validation and smoothed bootstrap approach to select the optimal bandwidth for grouped data. Our simulation study shows that with the smoothing parameter selected with this method, the estimated density from grouped data matches the true density more closely than with other approaches. Using smoothed bootstrap, we also construct bias-adjusted confidence intervals for the value of the density at the boundary. We apply the proposed method to two grouped data sets, one from a wooden stake study where the true density is known, and the other from a survey of kangaroos in Australia.
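A bare-bones version of a kernel density estimate built from grouped data places each count at its bin midpoint. The fixed bandwidth and the counts below are illustrative; the paper instead selects the bandwidth by cross-validation combined with a smoothed bootstrap:

```python
import math

def grouped_kde(edges, counts, h, x):
    """Gaussian kernel density estimate from grouped (binned) data: every
    observation in a bin is placed at the bin midpoint. A fixed bandwidth
    h is used here for simplicity; choosing h well is the hard part that
    the paper's cross-validation / smoothed-bootstrap method addresses."""
    n = sum(counts)
    mids = [(a + b) / 2 for a, b in zip(edges[:-1], edges[1:])]
    dens = 0.0
    for m, c in zip(mids, counts):
        dens += c * math.exp(-0.5 * ((x - m) / h) ** 2) / (h * math.sqrt(2 * math.pi))
    return dens / n

# Counts of detections in distance bands 0-1, 1-2, 2-3 (hypothetical data).
print(round(grouped_kde([0, 1, 2, 3], [10, 6, 2], h=0.5, x=1.0), 3))
```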

Kristin P. Lennox, David B. Dahl, Marina Vannucci, Ryan Day, Jerry W. Tsai
Ann. Appl. Stat. 4 (2), 916-942, (June 2010) DOI: 10.1214/09-AOAS296
KEYWORDS: Bayesian nonparametrics, Density estimation, dihedral angles, Protein structure prediction, torsion angles, von Mises distribution

By providing new insights into the distribution of a protein’s torsion angles, recent statistical models for this data have pointed the way to more efficient methods for protein structure prediction. Most current approaches have concentrated on bivariate models at a single sequence position. There is, however, considerable value in simultaneously modeling angle pairs at multiple sequence positions in a protein. One area of application for such models is in structure prediction for the highly variable loop and turn regions. Such modeling is difficult due to the fact that the number of known protein structures available to estimate these torsion angle distributions is typically small. Furthermore, the data is "sparse" in that not all proteins have angle pairs at each sequence position. We propose a new semiparametric model for the joint distributions of angle pairs at multiple sequence positions. Our model accommodates sparse data by leveraging known information about the behavior of protein secondary structure. We demonstrate our technique by predicting the torsion angles in a loop from the globin fold family. Our results show that a template-based approach can now be successfully extended to modeling the notoriously difficult loop and turn regions.

Kimberly F. Sellers, Galit Shmueli
Ann. Appl. Stat. 4 (2), 943-961, (June 2010) DOI: 10.1214/09-AOAS306
KEYWORDS: Conway–Maxwell-Poisson (COM-Poisson) distribution, dispersion, generalized linear models (GLM), generalized Poisson

Poisson regression is a popular tool for modeling count data and is applied in a vast array of applications from the social to the physical sciences and beyond. Real data, however, are often over- or under-dispersed and, thus, not conducive to Poisson regression. We propose a regression model based on the Conway–Maxwell-Poisson (COM-Poisson) distribution to address this problem. The COM-Poisson regression generalizes the well-known Poisson and logistic regression models, and is suitable for fitting count data with a wide range of dispersion levels. With a GLM approach that takes advantage of exponential family properties, we discuss model estimation, inference, diagnostics, and interpretation, and present a test for determining the need for a COM-Poisson regression over a standard Poisson regression. We compare the COM-Poisson to several alternatives and illustrate its advantages and usefulness using three data sets with varying dispersion.
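The COM-Poisson distribution underlying the regression can be evaluated directly from its definition. This sketch computes the pmf with a truncated, log-space normalizing series; it is not the paper's GLM fitting code:

```python
import math

def com_poisson_pmf(y, lam, nu, terms=100):
    """COM-Poisson pmf P(Y = y) = lam**y / (y!)**nu / Z(lam, nu), with the
    normalizer Z = sum_j lam**j / (j!)**nu evaluated in log space (series
    truncated at `terms`). nu = 1 recovers the Poisson; nu < 1 gives
    over-dispersion and nu > 1 under-dispersion."""
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(terms)]
    m = max(log_terms)
    log_z = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)

# With nu = 1 the pmf coincides with the ordinary Poisson(2.5) pmf.
lam = 2.5
print(abs(com_poisson_pmf(3, lam, 1.0)
          - math.exp(-lam) * lam ** 3 / math.factorial(3)) < 1e-9)  # True
```

The single extra parameter ν is what lets one regression family cover the whole dispersion range the abstract describes.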

Qunhua Li, Michael J. MacCoss, Matthew Stephens
Ann. Appl. Stat. 4 (2), 962-987, (June 2010) DOI: 10.1214/09-AOAS316
KEYWORDS: mixture model, nested structure, EM algorithm, protein identification, peptide identification, mass spectrometry, proteomics

Mass spectrometry provides a high-throughput way to identify proteins in biological samples. In a typical experiment, proteins in a sample are first broken into their constituent peptides. The resulting mixture of peptides is then subjected to mass spectrometry, which generates thousands of spectra, each characteristic of its generating peptide. Here we consider the problem of inferring, from these spectra, which proteins and peptides are present in the sample. We develop a statistical approach to the problem, based on a nested mixture model. In contrast to commonly used two-stage approaches, this model provides a one-stage solution that simultaneously identifies which proteins are present, and which peptides are correctly identified. In this way our model incorporates the evidence feedback between proteins and their constituent peptides. Using simulated data and a yeast data set, we compare and contrast our method with existing widely used approaches (PeptideProphet/ProteinProphet) and with a recently published new approach, HSM. For peptide identification, our single-stage approach yields consistently more accurate results. For protein identification the methods have similar accuracy in most settings, although we exhibit some scenarios in which the existing methods perform poorly.

Xiaodan Fan, Saumyadipta Pyne, Jun S. Liu
Ann. Appl. Stat. 4 (2), 988-1013, (June 2010) DOI: 10.1214/09-AOAS300
KEYWORDS: cell cycle, periodically expressed gene, microarray time series, Meta-analysis, fission yeast, Schizosaccharomyces pombe, Markov chain Monte Carlo

The effort to identify genes with periodic expression during the cell cycle from genome-wide microarray time series data has been ongoing for a decade. However, the lack of rigorous modeling of periodic expression as well as the lack of a comprehensive model for integrating information across genes and experiments has impaired the effort for the accurate identification of periodically expressed genes. To address the problem, we introduce a Bayesian model to integrate multiple independent microarray data sets from three recent genome-wide cell cycle studies on fission yeast. A hierarchical model was used for data integration. In order to facilitate an efficient Monte Carlo sampling from the joint posterior distribution, we develop a novel Metropolis–Hastings group move. A surprising finding from our integrated analysis is that more than 40% of the genes in fission yeast are significantly periodically expressed, far exceeding the 10–15% reported in the current literature. This calls for a reconsideration of the periodically expressed gene detection problem.
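The accept/reject skeleton that any Metropolis–Hastings move, including the paper's group move, plugs into looks like this. The random-walk proposal and standard-normal target are purely illustrative:

```python
import math
import random

def metropolis_hastings(logpost, x0, steps=20000, scale=1.0, seed=0):
    """Plain random-walk Metropolis-Hastings on a scalar parameter.
    A specialized proposal (such as the paper's group move, which
    updates a block of variables jointly) would replace the Gaussian
    step; the accept/reject rule stays the same."""
    rng = random.Random(seed)
    x, lp = x0, logpost(x0)
    samples = []
    for _ in range(steps):
        prop = x + rng.gauss(0, scale)
        lp_prop = logpost(prop)
        if math.log(rng.random()) < lp_prop - lp:   # accept w.p. min(1, ratio)
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Sampling a standard normal: the sample mean should settle near 0.
draws = metropolis_hastings(lambda t: -0.5 * t * t, x0=3.0)
print(round(sum(draws) / len(draws), 2))
```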

Wei Sun, Fred A. Wright
Ann. Appl. Stat. 4 (2), 1014-1033, (June 2010) DOI: 10.1214/09-AOAS298
KEYWORDS: Permutation p-value, gene expression quantitative trait loci (eQTL), effective number of independent tests

Permutation p-values have been widely used to assess the significance of linkage or association in genetic studies. However, the application in large-scale studies is hindered by a heavy computational burden. We propose a geometric interpretation of permutation p-values, and based on this geometric interpretation, we develop an efficient permutation p-value estimation method in the context of regression with binary predictors. An application to a study of gene expression quantitative trait loci (eQTL) shows that our method provides reliable estimates of permutation p-values while requiring less than 5% of the computational time compared with direct permutations. In fact, our method takes a constant time to estimate permutation p-values, no matter how small the p-value. Our method enables a study of the relationship between nominal p-values and permutation p-values in a wide range, and provides a geometric perspective on the effective number of independent tests.
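For context, the direct Monte Carlo permutation scheme whose cost the paper's geometric estimator is designed to avoid looks like this. The two samples are hypothetical:

```python
import random

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for a difference in group means:
    repeatedly shuffle the pooled data and count how often the permuted
    statistic reaches the observed one. This brute-force loop is the
    computational burden the abstract refers to."""
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        if abs(sum(px) / len(px) - sum(py) / len(py)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one correction avoids p = 0

p = permutation_pvalue([5.1, 4.8, 5.3, 5.0], [3.9, 4.1, 4.0, 4.2])
print(p < 0.05)  # clearly separated groups give a small p-value
```

The cost scales with n_perm, and estimating very small p-values requires enormous n_perm, which is exactly the regime where a constant-time estimator pays off.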

Maria L. Rizzo, Gábor J. Székely
Ann. Appl. Stat. 4 (2), 1034-1055, (June 2010) DOI: 10.1214/09-AOAS245
KEYWORDS: Distance components, DISCO, multisample problem, test equal distributions, multivariate, nonparametric MANOVA extension

In classical analysis of variance, dispersion is measured by considering squared distances of sample elements from the sample mean. We consider a measure of dispersion for univariate or multivariate response based on all pairwise distances between sample elements, and derive an analogous distance components (DISCO) decomposition for powers of distance in (0, 2]. The ANOVA F statistic is obtained when the index (exponent) is 2. For each index in (0, 2), this decomposition determines a nonparametric test for the multi-sample hypothesis of equal distributions that is statistically consistent against general alternatives.
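The index-2 endpoint of the decomposition can be checked numerically: the average of all pairwise squared distances within a sample equals twice its (population) variance, which is why α = 2 recovers the classical ANOVA quantities. The function below is a one-sample illustration, not the full multi-sample DISCO decomposition:

```python
def pairwise_dispersion(x, alpha=1.0):
    """Dispersion of a univariate sample as the average of all pairwise
    distances raised to the power alpha (the index of the DISCO
    decomposition). For alpha = 2 this equals twice the population
    variance; for alpha in (0, 2) it yields the energy-style measures
    behind the consistent equal-distribution tests."""
    n = len(x)
    return sum(abs(a - b) ** alpha for a in x for b in x) / (n * n)

x = [1.0, 2.0, 3.0, 4.0]
mean = sum(x) / len(x)
var = sum((v - mean) ** 2 for v in x) / len(x)
print(abs(pairwise_dispersion(x, alpha=2.0) - 2 * var) < 1e-12)  # True
```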

Martin Slawski, Wolfgang zu Castell, Gerhard Tutz
Ann. Appl. Stat. 4 (2), 1056-1080, (June 2010) DOI: 10.1214/09-AOAS302
KEYWORDS: generalized linear model, regularization, Sparsity, p ≫ n, Lasso, Elastic net, Random fields, Model selection, signal regression

In generalized linear regression problems with an abundant number of features, lasso-type regularization, which imposes an l1-constraint on the regression coefficients, has become a widely established technique. Deficiencies of the lasso in certain scenarios, notably strongly correlated design, were unmasked when Zou and Hastie [J. Roy. Statist. Soc. Ser. B 67 (2005) 301–320] introduced the elastic net. In this paper we propose to extend the elastic net by admitting general nonnegative quadratic constraints as a second form of regularization. The generalized ridge-type constraint will typically make use of the known association structure of features, for example, by using temporal or spatial closeness.

We study properties of the resulting "structured elastic net" regression estimation procedure, including basic asymptotics and the issue of model selection consistency. In this vein, we provide an analog to the so-called "irrepresentable condition" which holds for the lasso. Moreover, we outline algorithmic solutions for the structured elastic net within the generalized linear model family. The rationale and the performance of our approach is illustrated by means of simulated and real world data, with a focus on signal regression.

Stergios B. Fotopoulos, Venkata K. Jandhyala, Elena Khapalova
Ann. Appl. Stat. 4 (2), 1081-1104, (June 2010) DOI: 10.1214/09-AOAS294
KEYWORDS: Ladder epochs, likelihood ratio, maximum likelihood estimate, random walk with negative drift

We derive exact computable expressions for the asymptotic distribution of the change-point mle when a change in the mean occurred at an unknown point of a sequence of time-ordered independent Gaussian random variables. The derivation, which assumes that nuisance parameters such as the amount of change and variance are known, is based on ladder heights of Gaussian random walks hitting the half-line. We then show that the exact distribution easily extends to the distribution of the change-point mle when a change occurs in the mean vector of a multivariate Gaussian process. We perform simulations to examine the accuracy of the derived distribution when nuisance parameters have to be estimated as well as robustness of the derived distribution to deviations from Gaussianity. Through simulations, we also compare it with the well-known conditional distribution of the mle, which may be interpreted as a Bayesian solution to the change-point problem. Finally, we apply the derived methodology to monthly averages of water discharges of the Nacetinsky creek, Germany.
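The maximum likelihood change-point for a shift in the mean of a Gaussian sequence with known variance can be found by scanning all split points and minimizing the within-segment sum of squares. The sequence below is hypothetical, and this sketch computes only the point estimate, not the asymptotic distribution derived in the paper:

```python
def changepoint_mle(x):
    """MLE of the change-point in the mean of an independent Gaussian
    sequence with known, common variance: the split index k minimizing
    the total within-segment sum of squares (equivalently, maximizing
    the Gaussian likelihood). Returns the length of the first segment."""
    best_ss, best_k = float("inf"), None
    for k in range(1, len(x)):
        left, right = x[:k], x[k:]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        ss = (sum((v - ml) ** 2 for v in left)
              + sum((v - mr) ** 2 for v in right))
        if ss < best_ss:
            best_ss, best_k = ss, k
    return best_k

# Mean shifts from about 0 to about 4 after the fifth observation.
print(changepoint_mle([0.1, -0.2, 0.0, 0.3, -0.1, 4.2, 3.9, 4.1, 4.0]))  # 5
```

The paper's question is then how this estimator fluctuates around the true change-point, which is where the ladder-height machinery for Gaussian random walks enters.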

Jan van den Brakel, Joeri Roels
Ann. Appl. Stat. 4 (2), 1105-1138, (June 2010) DOI: 10.1214/09-AOAS305
KEYWORDS: intervention analysis, response bias, structural time series models, survey sampling

An important quality aspect of official statistics produced by national statistical institutes is comparability over time. To maintain uninterrupted time series, surveys conducted by national statistical institutes are often kept unchanged as long as possible. To improve the quality or efficiency of a survey process, however, it remains inevitable to adjust methods or redesign this process from time to time. Adjustments in the survey process generally affect survey characteristics such as response bias and therefore have a systematic effect on the parameter estimates of a sample survey. Therefore, it is important that the effects of a survey redesign on the estimated series are explained and quantified. In this paper a structural time series model is applied to estimate discontinuities in series of the Dutch survey on social participation and environmental consciousness due to a redesign of the underlying survey process.
