@@ -190,3 +190,33 @@ Missing data
Using mean imputation;
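Mean imputation, as noted above, can be sketched in a few lines of NumPy (scikit-learn's `SimpleImputer` with `strategy="mean"` does the same thing); the toy matrix here is illustrative, not from the notes:

```python
import numpy as np

# Toy feature matrix with missing values (NaN) -- an illustrative assumption.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])

# Mean imputation: replace each NaN with its column's mean,
# computed over the observed (non-NaN) entries only.
col_means = np.nanmean(X, axis=0)
filled = np.where(np.isnan(X), col_means, X)
print(filled)
```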
-------------------------------------------------------------------------------------------------------------------------------
+ 5. Testing the Model's Accuracy
+ -------------------------------
+ * Statistics are only data; we decide what counts as good or bad;
+ * Performance Improvement Options
+ -------------------------------
+ a. Adjust the current algorithm;
+ b. Get more data or improve data quality;
+ c. Improve training;
+ d. Switch algorithms;
+
+ * Random Forest
+ ----------------
+ -> Ensemble algorithm;
+ -> Fits multiple trees on subsets of the data;
+ -> Averages tree results to improve performance and control overfitting;
+
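The random-forest bullets above can be sketched with scikit-learn's `RandomForestClassifier`; the synthetic dataset is an assumption standing in for whatever data the notes use:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data (an assumption, for illustration only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each tree is fit on a bootstrap subset of the rows (and a random subset
# of features at each split); predictions are averaged across the trees,
# which is what tempers the overfitting of any single deep tree.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)
print(forest.score(X_te, y_te))
```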
+ - Train with training data: y = x1 + w2 * x2^3 + w3 * x3^8
+ - complex decision boundary;
+ - good fit on training data;
+ - poor fit on test data;
+ - overfitting;
+
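The overfitting pattern described above (high-order terms fit the training data closely but generalize poorly) can be reproduced with a polynomial fit in NumPy; the sine curve and noise level here are illustrative assumptions, not the notes' model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple underlying curve (illustrative assumption).
x_train = np.linspace(0, 1, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, x_train.size)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

def errors(degree):
    # Least-squares polynomial fit of the given degree.
    coefs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_err, test_err

# A high-degree polynomial drives training error down (a degree-12 model
# can do everything a degree-3 model can, and more), which is exactly the
# "good fit of training data, poor fit of test data" pattern above.
for degree in (3, 12):
    print(degree, errors(degree))
```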
+ Fixing Overfitting
+ ------------------
+ * Regularization hyperparameter:
+   y = x1 + w2 * x2^3 + w3 * x3^8 - f(W)/lambda
+ - Cross-validation;
+ - Bias-variance trade-off;
+ - Sacrifice some fit on the training data for better overall performance;