Making Logistic Regression A Core Data Mining Tool With TR-IRLS
(2005)
This short paper is the easiest, fastest way
to learn about Truncated Regularized Iteratively Re-weighted Least
Squares (TR-IRLS), my algorithm for fast, parameter-free logistic
regression. TR-IRLS can also be used for any generalized linear
model. This
High-Dimensional Probabilistic Classification for Drug Discovery
(2004)
Discriminative probabilistic classifiers have been used successfully on
large life-sciences datasets, but high dimensionalities have prohibited
the use of nonparametric class probability estimation. This paper
explores a method (SLAMDUNK) which addresses
Alias Detection in Link Data Sets
(2004)
An active learning approach to deciding whether two names correspond to
the same entity, combining string similarity information and context
similarity.
Tractable Learning of Large Bayes Net Structures from Sparse Data
(2004)
In this paper we propose an algorithm for learning a Bayes Net
structure from sparse data (e.g., power-law distributed) with over
100,000 variables. We also report running time and accuracy when
applied to several very large datasets.
Empirical Bayes Screening for Link Analysis
(2003)
An algorithm for discovering the top N strange co-occurrences of size
2, 3, 4, etc. It uses ideas from frequent sets, but stratifies them
according to a statistically justified hierarchical Bayes model, using
empirical Bayes to find the parameters.
Tractable Group Detection on Large Link Data Sets
(2003)
We present the k-groups algorithm, an improvement on the GDA algorithm
that offers significant computational advantages. The k-groups
algorithm allows tractable group detection on large data sets.
Stochastic Link and Group Detection
(2002)
This paper introduces the GDA algorithm. We use noisy link data
(n-tuples of entities) to learn underlying groupings of entities.
Locally Weighted Learning
(1997)
Survey of the use of kernel functions in kernel regression, locally
weighted regression and related function approximators.
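
For readers unfamiliar with the two estimators named in the Locally Weighted Learning entry, here is a minimal one-dimensional sketch. It is an illustration only and is not taken from the survey: the Gaussian kernel, the `bandwidth` parameter, and the function names `kernel_regression` and `locally_weighted_linear` are all assumptions chosen for the example.

```python
import math


def gaussian_kernel(distance, bandwidth):
    """Weight that decays smoothly as a training point gets farther from the query."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def kernel_regression(xs, ys, query, bandwidth=1.0):
    """Kernel regression (Nadaraya-Watson): a kernel-weighted average of the outputs."""
    weights = [gaussian_kernel(abs(x - query), bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; try a larger bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


def locally_weighted_linear(xs, ys, query, bandwidth=1.0):
    """Locally weighted regression: fit a weighted least-squares line around the
    query point and evaluate that line at the query."""
    weights = [gaussian_kernel(abs(x - query), bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; try a larger bandwidth")
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    if sxx == 0.0:
        # Degenerate case (e.g. a single distinct x): fall back to the weighted mean.
        return mean_y
    slope = sxy / sxx
    return mean_y + slope * (query - mean_x)
```

Both functions use the same kernel weights; the difference is that the first averages the outputs directly, while the second fits a local line, which removes the bias the plain average shows near the edges of the data.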