Machine Learning, 10-701 and 15-781, 2003
Tom M. Mitchell & Andrew W. Moore
School of Computer Science, Carnegie Mellon University
Fall 2003
It is hard to imagine anything more fascinating than
automated systems that improve their own
performance. The study of learning from data is
commercially and scientifically important. This
course is designed to give a graduate-level student a
thorough grounding in the methodologies,
technologies, mathematics and algorithms currently
needed by people who do research in learning and data
mining or who may need to apply learning or data
mining techniques to a target problem.
The topics of the course draw from classical
statistics, from machine learning, from data mining,
from Bayesian statistics and from statistical
algorithmics.
Students entering the class with a pre-existing
working knowledge of probability, statistics and
algorithms will be at an advantage, but the class has
been designed so that anyone with a strong numerate
background can catch up and fully participate.
Class lectures: Tuesdays & Thursdays 10:30am-11:50am, Wean Hall 7500,
starting on Thursday, September 4th, 2003
Review sessions: Thursdays 5:00pm-6:15pm, Newell Simon Hall 1305,
starting on Thursday, September 11th, 2003 (see Review sessions below)
Instructors:
Teaching Assistants:
- Ning Hu, Wean Hall 3711, x8-1557,
Office hours: Fridays 3:00pm-5:00pm
- Jiayong Zhang, NSH 4108, x8-8461,
Office hours: Tuesdays 2:00pm-4:00pm
- Rong Zhang, Wean Hall 5302,
Office hours: Wednesdays 2:00pm-4:00pm
Textbook:
Course Website (this page):
Grading:
- Final grades will be based on midterm (25%), homework (5 assignments and
2 miniprojects, 35%), and final exam (40%)
Policy on late homework:
- Homework is worth full credit at the beginning of class on the due date.
- It is worth half credit for the next 48 hours.
- It is worth zero credit after that.
- You must turn in at least 6 of the 7 homeworks (5 assignments and 2 miniprojects),
even if for zero credit, in order to pass the course.
- Free exemption: We will ignore your lowest homework grade for the semester.
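The grading weights and late policy above can be sketched in a few lines of Python. This is only an illustration, not the instructors' actual grade calculation. It assumes every homework is scored on the same 0-100 scale and weighted equally, and that "the next 48 hours" includes the 48-hour boundary itself; neither is stated in the policy. All function names here are hypothetical.

```python
from datetime import datetime, timedelta

# Weights taken from the grading policy: midterm 25%, homework 35%, final 40%.
WEIGHTS = {"midterm": 0.25, "homework": 0.35, "final": 0.40}


def late_multiplier(due, submitted):
    """Full credit up to the due time, half credit for the next 48 hours, zero after."""
    if submitted <= due:
        return 1.0
    if submitted <= due + timedelta(hours=48):  # assumption: boundary counts as half credit
        return 0.5
    return 0.0


def homework_average(scores):
    """Average homework score after dropping the single lowest one.

    Assumption: equal weighting on a common 0-100 scale.
    """
    kept = sorted(scores)[1:]  # the "free exemption" ignores the lowest grade
    return sum(kept) / len(kept)


def final_grade(midterm, homework_scores, final_exam):
    """Weighted course grade on a 0-100 scale."""
    return (WEIGHTS["midterm"] * midterm
            + WEIGHTS["homework"] * homework_average(homework_scores)
            + WEIGHTS["final"] * final_exam)


def meets_submission_rule(num_submitted, minimum=6):
    """At least 6 of the 7 homeworks must be turned in to pass."""
    return num_submitted >= minimum


if __name__ == "__main__":
    due = datetime(2003, 10, 2, 10, 30)  # HW1 due time from the list below
    print(late_multiplier(due, due + timedelta(hours=24)))
    print(final_grade(80, [100, 90, 80, 70, 60, 50, 0], 90))
```

With the sample scores in the usage lines, the zero is dropped, the homework average is 75, and the weighted grade comes out at about 82.25.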
Policy on collaboration:
- You may wish to discuss the homework with other students. If you like, you
may form groups of two students and turn in one homework solution with
up to two names on it. (Of course collaboration on exams is cheating
and grounds for immediate failure and worse!)
Homework assignments
- HW1: Decision Trees. Out Sep 18th, due 10:30am Thursday October 2nd
(duration 2 weeks).
- HW2: Neural Nets and Cross-validation. Out Oct 2nd, due 10:30am Oct 14th
(duration 12 days).
- HW3: Miniproject on Text Classification or Face Recognition. Out Oct 14th,
due 10:30am Oct 28th (duration 2 weeks).
- HW4: VC-dimension, SVM, and/or KNN. Out Oct 28th, due 10:30am Nov 4th
(duration 1 week).
- HW5: Bayes Nets. Out Nov 4th, due Nov 11th (duration 1 week).
- HW6: Miniproject (may be on any ML topic and data set). Out Nov 11th, due
10:30am Nov 25th (duration 2 weeks).
- HW7: GMM, K-means, HMM, and MDP. Out Nov 25th, due 10:30am Dec 4th
(duration 9 days).
Lecture schedule (and online slides if available)
Module 1
Instructor: Andrew Moore
Dates:
- Sep. 4
- Sep. 9
- Sep. 11
- Sep. 16
- Sep. 18
- Sep. 23
Topics (covered during the period Sep. 4 to Sep. 23):
Decision Trees, Probabilistic Methods, Bayes Classifiers,
Gaussians, Maximum Likelihood Estimation, Gaussian
Bayes Classifiers, Regression
Materials:
- Decision Trees (reading: Machine Learning, Chapt. 3)
- Probabilistic Data Mining and Density Estimation
- Maximum Likelihood Estimation
- Gaussian Bayes Classifiers
- Regression
Progress:
Module 1 finished.
Module 2
Instructor: Tom Mitchell
Dates:
- Sep. 25
- Sep. 30
- Oct. 2
- Oct. 7
- Oct. 9
- Oct. 14 - Midterm
- Oct. 16
- Oct. 21
Topics:
Bayesian text classification, Neural nets, Cross-validation, PAC Learning,
VC-dimension, Minimum Description Length principle, Structural Risk Minimization
Materials:
- Bayesian text classification (reading: Machine Learning, Chapt. 6.9, 6.10)
- Neural networks (reading: Machine Learning, Chapt. 4)
- Computational Learning Theory (reading: Machine Learning, Chapt. 7)
- Overfitting, Cross Validation, MDL, Structural Risk Minimization
Progress:
Module 2 finished.
Module 3
Instructor: Andrew Moore
Dates:
- Oct. 23
- Oct. 28
- Oct. 30
- Nov. 4
- Nov. 6 - No lecture
- Nov. 11
- Nov. 13
- Nov. 18 - No lecture
- Nov. 20
- Nov. 25
- Dec. 2
- Dec. 4
Topics:
KNN, Bayesian Networks: Semantics, Inference and Learning, Mixture
Models, K-Means, Hierarchical clustering, HMMs and MDPs
Materials:
- Instance-based Learning
- Bayes Net
- Gaussian Mixture Models
- K-means and Hierarchical Clustering
- Hidden Markov Models
- Markov Decision Processes
- Reinforcement Learning
Progress:
Module 3 finished.
Review sessions
Date | Time | Place | Instructor | Topic
Sep. 8 Mon | 6:30pm ~ 7:45pm | WeH 7500 | Andrew Moore | (not listed)
Sep. 11 Thu | 5:00pm ~ 6:15pm | NSH 1305 | Andrew Moore | (not listed)
Sep. 18 Thu | 4:30pm ~ 5:30pm | NSH 1305 | Andrew Moore | Recent Lectures Review
Sep. 25 Thu | 5:00pm ~ 6:15pm | NSH 1305 | Rong Zhang | Homework 1 Help Session
Oct. 2 Thu | 5:00pm ~ 6:15pm | NSH 1305 | Jiayong Zhang | Homework 2 Help Session
Oct. 9 Thu | 5:00pm ~ 6:15pm | NSH 1305 | Andrew Moore | Midterm Review
Oct. 23 Thu | 5:00pm ~ 6:15pm | NSH 1305 | Andrew Moore | Review VC-Dim, SVM and Memory-based Learning
Oct. 30 Thu | 5:00pm ~ 6:15pm | NSH 1305 | Rong Zhang | Homework 4 Help Session
Nov. 6 Thu | 5:00pm ~ 6:15pm | NSH 1305 | Jiayong Zhang | Homework 5 Help Session
Nov. 20 Thu | 5:00pm ~ 6:15pm | NSH 1305 | Andrew Moore | Review GMM and K-means
Dec. 4 Thu | 2:00pm ~ 3:00pm | NSH 3305 | Andrew Moore | Extra Review Session
Dec. 7 Sun | 8:00pm ~ 9:00pm | NSH 3305 | Andrew Moore | Final Review
Note:
- Subsequent review sessions will all be Thursdays 5:00pm-6:15pm in NSH 1305,
starting Thu Sep 11th.
Exam Schedule
- Midterm: Tuesday October 14th, 10:30am-11:50am, WeH 7500
(in class).
- Final: Monday December 8th, 8:30am-11:30am WeH 7500
Additional Resources
Here are some example questions for studying for the final. Note that these
are exams from earlier years and contain some topics that will not appear in
this year's final. Some topics that will appear this year are not covered in
the following examples.
Note to people outside CMU
Feel free to use the slides and materials available
online here. Please email the instructors with any
corrections or improvements. Additional slides and
software are available at the Machine
Learning textbook homepage and at Andrew
Moore's tutorials page.