CS345, Autumn 2006: Data Mining.



Course Information

Final Exam: The final exam will be on Wednesday, Dec. 13, from 12:15pm to 3:15pm. Location: 380-380X (in the basement of the math corner). The final is open book and open notes, but laptops are prohibited. Bring a calculator. Here is last year's final.

The Final for 2006.

Instructors: Anand Rajaraman (anand @ kosmix dt com), Jeffrey D. Ullman (ullman @ gmail dt com).

TA: Jeff Klingner

Email Address for Questions: cs345a-aut0607-staff @ lists dt stanford dt edu (this is the best way to reach all three of us simultaneously).

Meeting: MW 3:15 - 4:30PM; Room: 200-030 (In the history corner, the part of the quad closest to Hoover tower.)

Office Hours: Instructors will be available after classes that they teach. Jeff Ullman is in 433 Gates and Anand in 413 Gates. Jeff Klingner's office hours: Tuesdays 10am-noon & Thursdays 3pm-5pm, Gates 396, or by appointment.

Prerequisites: CS145 or equivalent.

Materials: There is no textbook, but students will use the Gradiance automated homework system, for which a nominal fee will be charged. Notes and/or slides will be posted on-line. You can see earlier versions of the notes and slides covering Data Mining. Not all of these topics will be covered this year.

Requirements: There will be periodic homeworks (some on-line, using the Gradiance system), a final exam, and a project on web-mining, using the Stanford WebBase. The homework will count just enough to encourage you to do it, about 20%. The project and final will account for the bulk of the credit, in roughly equal proportions.

Newsgroup: There is a class newsgroup: su.class.cs345a on nntp.stanford.edu. You can use the newsgroup to share datasets, form study groups, or find project partners. The course staff will not read the newsgroup regularly, and we won't use it for any official announcements. To get in touch with us, use cs345a-aut0607-staff @ lists dt stanford dt edu.


Handouts

Date | Topic | PowerPoint Slides | PDF Document
9/25 | Introductory Remarks | PPT | PDF
9/25 | Introduction to Web Mining | PPT | PDF
9/27 | Association Rules 1 | PPT | PDF
10/2 | Association Rules 2 | PPT | PDF
10/4 | Page Rank | PPT | PDF
10/9 | Topic-Specific Page Rank | PPT | PDF
10/11 | HITS and Spam | PPT | PDF
10/16 | Near-Neighbors and Minhashing | PPT | PDF
10/18 | Locality-Sensitive Hashing | PPT | PDF
10/23 | Clustering - Part 1 | PPT | PDF
10/25 | Recommendation Systems | PPT | PDF
10/30 | Clustering - Part 2 | PPT | PDF
11/01 | Structured Data Extraction | PPT | PDF
11/06 | Virtual Databases | PPT | PDF
11/06 | Compact Skeletons | PPT | PDF
11/13 | Online Algorithms, Search Advertising | PPT | PDF
11/15 | Stream Mining 1 | PPT | PDF
11/27 | Stream Mining 2 | PPT | PDF
11/27 | Stream Mining 3 | PPT | PDF
11/29 | Stream Mining 4 | PPT | PDF

Assignments

Some of the homework will be on the Gradiance system. You should go there to open your account and enter the class code that will be announced in class. You can try the work as many times as you like, and we hope everyone will eventually get 100%. The secret is that each of the questions involves a "long-answer" problem, which you should work out in full. The Gradiance system gives you a random selection of right and wrong answers each time you open it, and thus samples your knowledge of the full problem. While there are ways to game the system, we group several questions at a time, so it is hard to get 100% without actually working the problems. Also notice that you have to wait 10 minutes between openings, so brute-force random guessing will not work.

Solutions appear after the problem set is due. However, you must submit at least once; your most recent submission is then shown with the solutions embedded.

Assignment | Due Date
Association Rules #1 | Tuesday, Oct. 10 (11:59PM)
Association Rules #2 | Wednesday, Oct. 11 (11:59PM)
Page Rank | Monday, Oct. 16 (11:59PM)
Minhashing, LSH | Wednesday, Oct. 30 (11:59PM)
HITS, TSPR, Spam | Monday, Oct. 30 (11:59PM)
Distance Measures | Monday, Nov. 6 (11:59PM)
Recommendation Systems | Wednesday, Nov. 8 (11:59PM)
Clustering | Monday, Nov. 13 (11:59PM)
Stream Mining | Wednesday, Dec. 6 (11:59PM)

Project

CS345A Project specification:

Presentation Schedule

Date | Time | Presenter(s) | Project Title
12/4 | 3:15-4:00 | Greg Linden | Guest Lecture: Amazon's Recommendation Engine
12/4 | 4:00-4:10 | Abhita Chugh and Ravi Tiruvury | Detecting Web Spam with CombinedRank
12/4 | 4:10-4:20 | Rahul Thathoo and Zahid Khan | Towards Implementing Better Movie Recommendation Systems
12/4 | 4:20-4:30 | Brian Tran and Minho Kim | Topic Specific Recommendation
12/4 | 4:30-4:40 | David Reiss | Identifying terms with similar meanings across corpora
12/4 | 4:40-4:50 | NielFred Picciotto | Finding Interesting Videos Early via Trend-Setting Viewers
12/4 | 4:50-5:00 | Sean Kandel | Web Data Extraction Using Tag Trees
12/4 | 5:00-5:10 | Priyank Chodisetti | A shot at Netflix Challenge - Hybrid Recommendation System
12/6 | 3:15-3:25 | Hayato Akatsuka | Weather Mining
12/6 | 3:25-3:35 | Alex Giladi | Using LSH for motion estimation
12/6 | 3:35-3:45 | Joseph Bonneau | Sports Performance and Salary
12/6 | 3:45-3:55 | Negin Nejati | Web Mining for Extracting Relations
12/6 | 3:55-4:05 | Vincenzo Di Nicola and Jyotika Prasad | 42: A Web Based Question Answering System
12/6 | 4:05-4:15 | Manjunath Rajashekhar | Frequent Itemsets Mining in Distributed Wireless Sensor Networks
12/6 | 4:15-4:25 | Hao Liu | Clustering Based News Event Detection and Tracking
12/6 | 4:25-4:35 | Jack Cheng | Improvements on Netflix Recommendation System Using Data-mining Algorithms
12/6 | 4:35-4:45 | Arpit Aggarwal and Omkar Mate | Recommendation System for Portfolio Management
12/6 | 4:45-4:55 | Romain Colle | Near-duplicates detection: Comparison of the two algorithms seen in class
12/6 | 4:55-5:05 | Alan Sheinberg and Greg Nelson | Netflix Challenge: Combined Collaborative Filtering
12/6 | 5:05-5:15 | Fred Wulff | Course Helper: A Course Recommendation System

Course Outline

Here is a tentative schedule of topics:

Date | Topic | Lecturer
09/25 | Introduction | JDU, AR
09/27 | Association Rules | JDU
10/02 | Association Rules | JDU
10/04 | Link Analysis | AR
10/09 | Link Analysis | AR
10/11 | Spam Detection | AR
10/16 | Minhashing, Shingles | JDU
10/18 | LSH | JDU
10/23 | Clustering | JDU
10/25 | Recommendation Systems | AR
10/30 | Clustering | JDU
11/01 | Extracting Structured Data from the Web | AR
11/06 | Extracting Structured Data from the Web | AR
11/08 | Data Visualization | JK
11/13 | Advertising on the web | AR
11/15 | Stream Mining | JDU
11/27 | Stream Mining | JDU
11/29 | Stream Mining | JDU
12/04 | Project Reports | students
12/06 | Project Reports | students
12/13 | Final Exam, 12:15pm - 3:15pm

Resources and Readings
