This is the errata sheet for the first edition. It is no longer being maintained.
For errata in the second edition, please see the errata sheet for that edition.
Page numbers refer to the pages in the book's hardcopy edition, not the downloads.
We shall endeavor to keep the downloads up-to-date.
Section
Location
Problem
Reported By
Date Reported
1.1.5
p. 4, l. 13
"orignal" should be "original".
Ed Knorr
3/5/12
1.4
p. 16, 3 lines above Sect. 1.5
Delete "what".
Rok Sosic
3/18/13
1.5
p. 16, l. -10
"many" should be "many instances".
Rok Sosic
3/18/13
2.2.2
p. 23, l. 1, 2
"Grouping and aggregation" would be better referred to as "grouping by key".
Rok Sosic
3/18/13
2.3.2
p. 27, Fig. 2.4 caption
"fives" should be "five".
Ed Knorr
3/5/12
2.3.10
p. 35, l. 13
Right parenthesis missing at the end.
Anastasios Gounaris
7/10/13
2.3.11
p. 36, l. 12
"However, it" should be "However, in".
Waleed Hameid
5/1/12
2.4.1
p. 37, l. -7
"ts" should be "its".
Aris Anagnostopoulos and Rok Sosic
3/2/13
2.4.2
p. 39, l. 20
R should be P twice in the displayed expression.
Aris Anagnostopoulos
3/2/13
2.5.1
p. 43, l. 14
"gigabit" should be "one gigabit per second".
Ed Knorr
3/5/12
2.5.1
p. 44, l. 15
"of" should be "on".
Anastasios Gounaris
7/10/13
2.5.2
p. 44, l. -12
"not use" should be "not use it".
Rok Sosic
3/18/13
2.5.3
p. 46, Fig. 2.8
"h(T.C)=1" should be "g(T.C)=1".
Waleed Hameid
4/29/12
2.5.3
p. 48, l. -2 of the box
Delete "inversely".
Rok Sosic
3/18/13
3.4.2
p. 69, l. 9
The "threshold" should be defined as the value of s for which the probability of being a candidate reaches 1/2.
Rok Sosic
3/18/13
3.4.2
p. 69, l. -4
0.328 should be 0.672.
Robert West
5/2/12
3.6.3
p. 81, l. -11
"0.2 and 0.6" should be "0.8 and 0.4".
Amitabh Chaudhary
4/6/14
3.6.3
p. 82, l. 6 below figure
The figures given there are actually for a use of Example 3.19 followed by Example 3.18. If we use Example 3.18 first, and then Example 3.19, we get (0.2, 0.8, 0.9991285, 0.0000004).
Zhou Jingbo
7/11/13
3.7.2
p. 85, l. 1
d2/180 should be (180-d2)/180.
Nicholas Zhao
1/21/13
3.7.4
p. 86, l. -3
Remove d from d cos θ.
Wang Bin
6/7/12
3.9.6
p. 101, l. 16, 17
"prefix" should be "suffix" in both lines.
Weng Zhen-Bin
11/7/13
3.9.6
p. 102, l. 13
Right parenthesis needed after 9+j.
Weng Zhen-Bin
11/7/13
4.2.1
p. 113, l. -15
Capitalize "for".
Rok Sosic
3/18/13
4.2.2
p. 113, l. -3, -2
"URL's" should be "IP addresses".
Rok Sosic
3/18/13
4.4.1
p. 119, l. 7, 8, and 10
"URL" should be "IP address".
Aris Anagnostopoulos and Rok Sosic
3/2/13
4.5.1
p. 122, l. 21
Delete one "the".
Rok Sosic
3/18/13
4.5.3
p. 124
All the occurrences of X.value in this section should be 2*X.value - 1.
Ge Qi, Greg Lee
11/4/14
4.5.6
p. 126, l. -2
"Exercise 4.7" should be "Exercise 4.5.3".
Wang Bin
6/7/12
4.5.6
p. 127, l. 3
"induction on n" should be "induction on m".
Wang Bin
6/7/12
4.6.2
p. 128, bottom
There are actually six rules needed. The sixth is that every position with a 1 is in some bucket.
Aris Anagnostopoulos
3/2/13
4.6.3
p. 129, l. -14
"log n" should be "log N".
Wang Bin
6/7/12
4.6.4
p. 129, l. -4
"highest" should be "earliest" or "lowest".
Robert West
5/2/12
4.6.6
p. 131, l. -3
The condition must also be relaxed for the buckets of size 1. There may be any number between 1 and r of these too.
Aris Anagnostopoulos
3/2/13
5.4.2
p. 165, l. 11-12
These lines are better expressed as: "of the fraction of the Web, m/n, that is in the spam farm."
Aris Anagnostopoulos
3/2/13
5.4.5
p. 166, l. -15
"measure" should be "measure for each page".
Rok Sosic
3/18/13
5.5.2
p. 171, l. -1
A should be L.
Wang Bin
6/7/12
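The corrected figures in the Section 3.6.3 entry (p. 82) can be checked numerically. The sketch below is not taken from the book: it assumes that Example 3.18 is a 4-way AND followed by a 4-way OR, that Example 3.19 is a 4-way OR followed by a 4-way AND, and that the starting family is (0.2, 0.8, 0.8, 0.2)-sensitive. The function names are hypothetical.

```python
# Assumption: Example 3.18 = 4-way AND then 4-way OR,
#             Example 3.19 = 4-way OR then 4-way AND.
# Function names are illustrative only.

def and_then_or(p, r=4, b=4):
    """Probability after an r-way AND followed by a b-way OR."""
    return 1 - (1 - p ** r) ** b


def or_then_and(p, b=4, r=4):
    """Probability after a b-way OR followed by an r-way AND."""
    return (1 - (1 - p) ** b) ** r


def ex318_then_ex319(p):
    """Apply the assumed Example 3.18 construction first, then Example 3.19."""
    return or_then_and(and_then_or(p))


def ex319_then_ex318(p):
    """Apply the assumed Example 3.19 construction first, then Example 3.18."""
    return and_then_or(or_then_and(p))


# Assumed starting probabilities p1 = 0.8 and p2 = 0.2.
p1 = ex318_then_ex319(0.8)
p2 = ex318_then_ex319(0.2)
print(f"3.18 then 3.19: p1 = {p1:.7f}, p2 = {p2:.7f}")

q1 = ex319_then_ex318(0.8)
q2 = ex319_then_ex318(0.2)
print(f"3.19 then 3.18: p1 = {q1:.7f}, p2 = {q2:.7f}")
```

Under these assumptions the first ordering should agree with the corrected values 0.9991285 and 0.0000004 to the precision shown in the erratum, while the second ordering gives a different pair of probabilities, which is the distinction the erratum draws.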
SECTION
PAGE/LINE
WHAT
WHO
WHEN
6.1.1
p. 178, l. 5-6
In Fig. 6.2, the entry for {cat, a} should be 2, 3, 7 and the entry for {and, a} should be 2, 7. As a result, 4-5 lines below the figure, there should be 5 frequent pairs, including {cat, a}. Further, in the paragraph below that, we should discover that {dog, cat, a} is a frequent triple.
Robin Bennett
3/19/13
6.1.2
p. 179, l. 3 of box
Insert "of" between "pairs" and "items".
Ed Knorr
3/5/12
6.1.2
p. 179, l. -2 of box
One too many left quotes.
Ed Knorr
3/5/12
6.1.3
p. 180, l. -14, -13
The confidence is actually 3/5, since "and" also appears in basket (7).
Ed Knorr
3/5/12
6.1.3
p. 181, l. 1
"That its" should be "That is".
Robert West
5/2/12
6.1.3
p. 181, l. 8
"bear" should be "beer".
Ed Knorr
3/5/12
6.2.5
p. 187, l. -4
"exceed" should be "are equal to or greater than".
Anastasios Gounaris
7/10/13
6.2.5
p. 188, l. -14
The units are bytes for both space counts.
Aris Anagnostopoulos
3/2/13
6.2.5
p. 188, l. -3, -2
"frequent pairs" should be "pairs of frequent items".
Anastasios Gounaris
7/10/13
6.3
p. 193, 195, 197
In each figure, "strucrure" should be "structure".
Ed Knorr
3/5/12
6.3.1
p. 194, l. -7, -16
"frequent pairs" should be "pairs of frequent items".
Robert West
5/2/12
6.3.2
p. 194, l. -9
"multistage" should be capitalized.
Ed Knorr
3/5/12
6.3.2
p. 196, l. 3 of box
"hash" should be "hashes".
Rok Sosic
3/18/13
6.4.2
p. 201, l. -20
"though" should be "through".
Anastasios Gounaris
7/10/13
6.4.5
p. 203, l. 22
"finite" should be "nonzero".
Aris Anagnostopoulos
3/2/13
6.5.2
p. 207, l. 21
"for that element" should be "for that item".
Wang Bin
6/7/12
6.5.3
p. 208, l. -15
Replace "this standard" by "this algorithm".
Anastasios Gounaris
7/10/13
7.1
p. 213, l. -13
"agglomerative" should be "point-assignment".
Aris Anagnostopoulos
3/2/13
7.1.3
p. 216, formula at bottom of page
The sum in the numerator should start at i = 1, not 0.
Anastasios Gounaris
7/10/13
7.1.4
p. 217, Exercise 7.1.2, l. 2
"Eclidean" should be "Euclidean".
Angad Singh
3/5/12
7.3.2
p. 228, l. 5
Period should be a comma in (12,3).
Anastasios Gounaris
7/10/13
7.3.4
p. 231, l. 7-8 of box
The text should read: "we can add the squares of the components of the vector to SUMSQ to get the new SUMSQ."
Anastasios Gounaris
7/10/13
7.5.2
p. 238, l. -2
"close as" should be "as close as".
Aris Anagnostopoulos
3/2/13
7.5.4
p. 241, l. 9
Add "the clustroid of" after "and then to".
Anastasios Gounaris
7/10/13
7.6.4
p. 245, l. -10
"centroids" should be "clustroids".
Anastasios Gounaris
7/10/13
7.7
p. 250, l. 18
"DBMO" should be "BDMO".
Wang Bin
6/7/12
8.4.7
p. 268, l. 21
The exponent on e should be -fi (minus sign is missing).
Zack Taylor
8/14/12
9.1.1
p. 279, l. 1
"those" should be "all".
Rok Sosic
3/18/13
9.1.2
p. 280, l. 7 below the box
"all the many" should be "each of the".
Rok Sosic
3/18/13
9.2.2
p. 282, l. -12
"item" should be "items".
Rok Sosic
3/18/13
9.2.2
p. 282, l. -10
"yields" should be "yield".
Rok Sosic
3/18/13
9.2.4
p. 285, l. 12
"feature" should be "features".
Rok Sosic
3/18/13
9.3.1
p. 292, l. 11
.386 should be .380.
Oscar Wu
8/2/12
9.3.1
p. 292, l. 14-15
The conclusion is backward. In fact, a higher (positive) cosine means a smaller angle and therefore greater similarity. In this case, cosine distance suggests A is more similar to B than to C.
Oscar Wu
8/2/12
9.4.2
p. 300, l. 3
Right parenthesis missing before = sign.
Dennis Sidharta
3/5/12
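The two Section 9.3.1 entries (p. 292) can be illustrated with a short calculation. The rating vectors below are an assumption: they are meant to match the utility matrix used in that section, but they do not appear on this errata sheet and should be checked against the book. Blank ratings are treated as 0, and the helper name is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Assumed ratings (blank = 0); columns: HP1, HP2, HP3, TW, SW1, SW2, SW3.
A = [4, 0, 0, 5, 1, 0, 0]
B = [5, 5, 4, 0, 0, 0, 0]
C = [0, 0, 0, 2, 4, 5, 0]

cos_ab = cosine(A, B)
cos_ac = cosine(A, C)

# A larger cosine corresponds to a smaller angle, hence greater similarity.
angle_ab = math.degrees(math.acos(cos_ab))
angle_ac = math.degrees(math.acos(cos_ac))

print(f"cos(A, B) = {cos_ab:.3f}, angle = {angle_ab:.1f} degrees")
print(f"cos(A, C) = {cos_ac:.3f}, angle = {angle_ac:.1f} degrees")
```

With these assumed vectors, the cosine for A and B is larger than the cosine for A and C, so the angle between A and B is smaller, which is consistent with the corrected conclusion that A is more similar to B than to C.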