-
Notifications
You must be signed in to change notification settings - Fork 76
Pedagogy: add 15_optimization/038 — optimization on a histogram surrogate (classification example) #425
Description
Summary
Build 15_optimization/038_optimization_on_histogram_surrogate.ipynb as a companion to the existing 030_Classification_Optimization.ipynb. The new notebook teaches a general numerical-methods pattern: when iterative parameter optimization is expensive over
The pattern was used in Ch. 6 of:
Lee, K. (2004). Longitudinal Driver Model and Collision Warning and Avoidance Algorithms Based on Human Driving Databases. Ph.D. dissertation, Mechanical Engineering, University of Michigan.
There, two 5D histograms reduced 3.4 M data points (~400 MB) to ~1.9 MB while approximating the confusion matrix used by fmincon with 1000 random initial conditions. The same pattern transplants cleanly to a 2D classification example for teaching.
Pedagogical arc
- Motivate the problem: iterative optimization (multi-start, hyperparameter sweeps) makes the per-point cost-function scan the bottleneck
- Build per-class histograms from a 2D training set (two overlapping Gaussians, ~10ドル^6$ points) — visualize side-by-side heatmaps analogous to the dissertation's Fig. 6.2 "Threatening" / "Safe" panels
- Surrogate confusion matrix:
$\widehat{TP}, \widehat{FP}, \widehat{TN}, \widehat{FN}$ as weighted sums over cell centroids (the algebra of Eqs. 38–39 in Ch. 6, restated for classification) - Train a logistic regression model two ways — on raw data vs on the surrogate — and overlay the decision boundaries
- Bias-vs-speed tradeoff via an
ipywidgetsslider on grid resolution$n \times n$ , with a live Pareto plot - Verify-on-truth step: re-evaluate the surrogate-trained model point-by-point on the full dataset; tabulate the confusion matrix both ways
- Train / evaluation split for overfitting check (mirrors Ch. 6 §6.1's split-half analysis)
Scope decisions
- Classifier: logistic regression (clean weighted log-loss on the surrogate) — likely. SVM is more visual but heavier.
- Grid: uniform for the first notebook. Log-spaced (per dimension) and quantile-spaced are mentioned as alternatives but not implemented in 038.
- Cost function: choose one of (a) the geometric mean of TP and precision, mirroring Ch. 6 Eq. 40, or (b) sklearn-style weighted log-loss. Signpost the other in the closing.
Curricular hooks
- Bridges
20_probability/histograms (per-class joint densities) and15_optimization/(optimization on a surrogate) - Direct follow-on to
030_Classification_Optimization.ipynb - Train-on-surrogate, verify-on-truth pattern is a general engineering-numerical-methods lesson reusable elsewhere
Out of scope (parked as future threads)
- Adaptive
$C$ -groups (k-means coreset) — extends the rectangular grid to data-adaptive cells, the move that escapes the curse of dimensionality. Would live as a sibling notebook (~039) once 038 lands. - Soft-assignment / Gaussian-mixture surrogate
- Boundary-aware adaptive refinement (cluster misclassified points after a first pass)
- Sufficient-statistics-per-cell extension (Bradley, Fayyad, Reina 1998), the high-D successor to histograms