Question
3. Suppose that for a particular data set, we perform hierarchical clustering using single and complete linkage. We obtain two dendrograms.
(a) At a certain point on the single linkage dendrogram, the clusters {1,2,3} and {4,5} fuse.
On the complete linkage dendrogram, the clusters {1,2,3} and {4,5} also fuse at a certain
point. Which fusion will occur higher on the tree, or will they fuse at the same height, or
is there not enough information to tell?
(b) At a certain point on the single linkage dendrogram, the clusters {5} and {6} fuse. On the
complete linkage dendrogram, the clusters {5} and {6} also fuse at a certain point. Which
fusion will occur higher on the tree, or will they fuse at the same height, or is there not
enough information to tell?
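The exercise itself supplies no distances, which is the crux of part (a). Still, a small sketch with made-up pairwise distances (the distance values and helper names below are illustrative assumptions, not part of the exercise) shows how the two criteria compute fusion heights: single linkage fuses at the minimum inter-cluster distance, complete linkage at the maximum, so for multi-point clusters the complete-linkage fusion can only be at the same height or higher, while for two singletons both criteria reduce to the same point-to-point distance.

```python
from itertools import product

# Hypothetical pairwise distances between points 1..6 (illustrative only).
d = {
    (1, 2): 1.0, (1, 3): 2.0, (2, 3): 1.5,
    (1, 4): 5.0, (1, 5): 6.0, (2, 4): 5.5, (2, 5): 6.5,
    (3, 4): 4.0, (3, 5): 7.0, (4, 5): 1.2,
    (5, 6): 3.0,
}

def dist(a, b):
    # symmetric lookup into the upper-triangular distance dictionary
    return d[(a, b)] if (a, b) in d else d[(b, a)]

def linkage(c1, c2, agg):
    """Inter-cluster distance: agg=min -> single linkage, agg=max -> complete."""
    return agg(dist(a, b) for a, b in product(c1, c2))

c1, c2 = {1, 2, 3}, {4, 5}
single = linkage(c1, c2, min)    # fusion height under single linkage
complete = linkage(c1, c2, max)  # fusion height under complete linkage
print(single, complete)          # 4.0 7.0 -- complete >= single always holds

# For two singletons, both criteria reduce to the same distance.
print(linkage({5}, {6}, min) == linkage({5}, {6}, max))  # True
```

This illustrates why (a) cannot be answered without knowing the individual distances (only the inequality complete ≥ single is guaranteed), while in (b) the two trees must fuse {5} and {6} at the same height.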
Similar questions
- The regulation of electric and gas utilities is an important public policy question affecting consumers' choice and cost of energy provider. To inform deliberation on public policy, data on eight numerical variables have been collected for a group of energy companies. To summarize the data, hierarchical clustering has been executed using Euclidean distance as the similarity measure and Ward's method as the clustering method. Based on the following dendrogram, what is the most appropriate number of clusters to organize these utility companies?
- Consider a database organized in the form of a rooted tree. Suppose that we insert a dummy vertex between each pair of vertices. Show that, if we follow the tree protocol on the new tree, we get better concurrency than if we follow the tree protocol on the original tree.
- Consider the following data entries: 21, 24, 4, 10, 30, 34, 12, 13, 16. Build up, step by step, an order-2 B+ tree index for the above data entries. Also, clearly state what makes that B+ tree order 2.
- You have hypothetically obtained access to a database listing the locations (in x, y coordinates) and names of all nearby restaurants. Only the following kind of query is required: determining whether a given location is suitable for an eating establishment. Explain why R-tree indexing would be better suited than a B-tree for this workload.
- Below is a dataset:

  Data Point | F1  | F2
  P1         | 1   | 2
  P2         | 1.5 | 1.5
  P3         | 1.5 |
  P4         | 1.5 |
  P5         | 4.5 |
  P6         | 2.5 | 7.0
  P7         | 3   | 6.5

  Apply agglomerative hierarchical clustering algorithms on the above dataset. Use single link and complete link to calculate the proximity. Show the dendrogram. Choose a cutoff point and report the generated clusters based on your cutoff point.
- Discuss the CAP theorem and its relevance in the context of distributed database architectures.
- Consider the following query Q over R(A,B,C) and S(D,E,F):

  SELECT A, F
  FROM R, S
  WHERE C = D AND A < 1000 AND E = 'e'

  Assume there is a clustering index on E of S, an index on D of S, an index on C of R, and a clustering index on A of R. Assume the indices have equal height. Assume R and S have roughly equal sizes (in both number of blocks and number of records), and assume that σE='e'(S) is twice the size of σA<1000(R).
  6.a Draw the canonical query tree for the query Q.
  6.b Transform the canonical query tree for Q into a final query tree that is efficient to execute.
  6.c Describe the best query evaluation plan (with minimal cost) for Q, for the information given.
  6.d Draw another left-deep query tree whose cost is higher than the one given in 6.c.
  6.e Give one example instance of relational-algebra rewriting, for each rewriting rule that was used in the trees of 6.a and 6.b.
- How should Prim's algorithm be applied to the Minimum Spanning Tree problem? Is one data structure inherently better than the others? Why? Explain.
- Following is the table of pairwise Euclidean distances between 5 data points:

      a  b  c  d  e
  a   0
  b   9  0
  c   3  7  0
  d   6  5  9  0
  e  11 10  2  8  0

  Assuming we are using single linkage (minimum distance) for hierarchical clustering, the order in which the clusters will be formed is:
  Group of answer choices
  a. Points b and d are merged first to form cluster {b, d}. Then point a is incorporated into cluster {b, d}. Then points c and e are merged into {c, e}. Finally, clusters {b, d, a} and {c, e} are merged.
  b. Points c and e are merged first to form cluster {c, e}. Then point b is incorporated into cluster {c, e}. Then points a and d are merged into {a, d}. Finally, clusters {c, e, b} and {a, d} are merged.
  c. Points c and e are merged first to form cluster {c, e}. Then point a is incorporated into cluster {c, e}. Then points b and d are merged into {b, d}. Finally, clusters {c, e, a} and {b, d} are merged.
  d. Points b and d are merged first to form...
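The merge order in the multiple-choice question above can be checked mechanically. Below is a minimal single-linkage sketch over the given distance table — a naive O(n³) loop for clarity, not an optimized HAC implementation.

```python
# Pairwise distances from the question (upper triangle; symmetric lookup below).
D = {
    ('a', 'b'): 9, ('a', 'c'): 3, ('a', 'd'): 6, ('a', 'e'): 11,
    ('b', 'c'): 7, ('b', 'd'): 5, ('b', 'e'): 10,
    ('c', 'd'): 9, ('c', 'e'): 2, ('d', 'e'): 8,
}

def d(x, y):
    return D[(x, y)] if (x, y) in D else D[(y, x)]

def single_link(c1, c2):
    # single linkage: minimum distance over all cross-cluster pairs
    return min(d(x, y) for x in c1 for y in c2)

clusters = [frozenset(p) for p in 'abcde']
merges = []
while len(clusters) > 1:
    # pick the pair of clusters with the smallest single-link distance
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
    )
    merged = clusters[i] | clusters[j]
    merges.append(sorted(merged))
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

for m in merges:
    print(m)
```

Running this merges {c, e} first (distance 2), then absorbs a (distance 3), then forms {b, d} (distance 5), and finally joins the two clusters — i.e. answer choice c.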
- Suppose that the data mining task is to cluster points (with (x, y) representing location coordinates) into three clusters, where the points are A1(2,10), A2(8,4), A3(7,5), A4(1,2), A5(2,5), A6(5,8), A7(6,4), A8(4,9). The distance function is Euclidean distance, and k-means (with k = 3) is used for clustering. Suppose initially we assign A1, A4, and A6 as the centroids for the three clusters. Show the three cluster centroids as well as the set of points in each cluster after each round, until convergence.
- Given a clustering task, how can you evaluate the performance on the test set, and how would we know if the clusters are correct? Explain any three possible solutions.
- Assume that we use cosine similarity as the similarity measure. In hierarchical agglomerative clustering (HAC), we need to define a good way to measure the similarity of two clusters. One usual way is to use the group-average similarity between documents in the two clusters. Formally, for two clusters C_i and C_j, let C = C_i ∪ C_j and n = |C|; we define

  sim(C_i, C_j) = 1 / (n(n-1)) · Σ_{x,y ∈ C, x ≠ y} s(x, y)

  where s(x, y) is the cosine similarity between x and y. Given a list of clusters C_1, C_2, ..., C_m, assume that their pairwise similarities are saved in a two-dimensional array of size m × m. Given three clusters C_i, C_j, and C_k, show that there is a way to compute sim(C_i ∪ C_j, C_k) in constant time. Note that we ignore the dimensionality in the time complexity.
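The k-means exercise above can be traced round by round with a short plain-Python sketch of Lloyd's algorithm. The dictionary layout and the stopping test (centroids unchanged) are implementation choices, not given in the question.

```python
import math

# Points and initial centroids (A1, A4, A6) from the k-means question above.
points = {
    'A1': (2, 10), 'A2': (8, 4), 'A3': (7, 5), 'A4': (1, 2),
    'A5': (2, 5), 'A6': (5, 8), 'A7': (6, 4), 'A8': (4, 9),
}

centroids = [points['A1'], points['A4'], points['A6']]
while True:
    # assignment step: each point goes to its nearest centroid (Euclidean)
    clusters = [[] for _ in centroids]
    for name, p in points.items():
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(name)
    # update step: recompute each centroid as the mean of its cluster
    new_centroids = [
        tuple(sum(points[n][dim] for n in c) / len(c) for dim in (0, 1))
        for c in clusters
    ]
    if new_centroids == centroids:  # convergence: centroids unchanged
        break
    centroids = new_centroids

print(clusters)
print(centroids)
```

With these seeds the run converges to clusters {A1, A6, A8}, {A4, A5}, {A2, A3, A7} with centroids (11/3, 9), (1.5, 3.5), (7, 13/3); printing `clusters` and `new_centroids` inside the loop shows each intermediate round as the question asks.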