Zoltán Gyöngyi, Hector Garcia-Molina.
Web Spam Taxonomy.
First International Workshop on Adversarial Information Retrieval on the Web
(at the 14th International World Wide Web Conference), Chiba, Japan, 2005.
Zoltán Gyöngyi, Hector Garcia-Molina and Jan Pedersen.
Combating Web Spam with TrustRank.
30th International Conference on Very Large Data Bases (VLDB),
Toronto, Canada, 2004.
Zoltán Gyöngyi, Pavel Berkhin, Hector Garcia-Molina, Jan Pedersen.
Link Spam Detection Based on Mass Estimation.
32nd International Conference on Very Large Data Bases (VLDB), Seoul, Korea, 2006.
S. Dumais, M. Banko, E. Brill, J. Lin and A. Ng (2002).
Web question answering: Is more always better? In Proceedings of SIGIR'02, Aug 2002,
pp. 291-298.
E. Cohen. Size-Estimation Framework with Applications to
Transitive Closure and Reachability. Journal of Computer
and System Sciences 55 (1997), pp. 441-453.
M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J. Ullman,
"Computing Iceberg Queries Efficiently,"
1998 VLDB.
H. Toivonen, "Sampling Large Databases for Association Rules,"
VLDB 1996, pp. 134-145.
J. S. Park, M.-S. Chen, and P. S. Yu, "An Effective Hash-Based Algorithm
for Mining Association Rules,"
1995 SIGMOD, pp. 175-186.
R. Agrawal, T. Imielinski, A. Swami: "Mining Associations between Sets of Items
in Massive Databases", Proc. of the ACM
SIGMOD Int'l Conference on Management of Data,
Washington D.C., May 1993, 207-216.
R. Agrawal, R. Srikant: "Fast Algorithms for Mining Association Rules",
Proc. of the 20th Int'l Conference on Very Large
Databases, Santiago, Chile, Sept. 1994.
Stream Mining
M. Datar, A. Gionis, P. Indyk, and R. Motwani,
"Maintaining Stream Statistics Over Sliding Windows,"
SIAM J. Computing, 31 (2002): 1794-1813.
N. Alon, Y. Matias, and M. Szegedy,
"The Space Complexity of Approximating Frequency Moments,"
28th STOC, pp. 20-29, 1996.
P. Flajolet and G. N. Martin,
"Probabilistic Counting for Database Applications,"
JCSS 31:2 (Sept., 1985), pp. 182-209. Also 24th FOCS,
pp. 76-82, 1983.
J. Vitter,
"Random Sampling with a Reservoir,"
ACM Trans. on Mathematical Software 11:1 (1985), pp. 37-57.
Babcock et al.,
"Models and Issues in Data Streams,"
21st PODS (2002).
Clustering
B. Babcock, M. Datar, R. Motwani, and L. O'Callaghan,
"Maintaining Variance and k-Medians Over Data Stream Windows,"
2003 PODS.
P. Bradley, U. Fayyad, and C. Reina, "Scaling Clustering Algorithms to
Large Databases," 1998 KDD.
S. Guha, R. Rastogi, and K. Shim, "CURE: An Efficient Clustering
Algorithm for Large Databases," SIGMOD 1998.