Disadvantages:
==========
* Prone to overfitting, especially with deep trees.
* Sensitive to small variations in data.
Random Forest
==========
* Random Forest is an ensemble technique that combines multiple decision trees.
* It mitigates overfitting by averaging the results of many trees, which individually may have high variance.

Building a Random Forest:
==========
BTA (pronounced "beta"): BootStrapSampling; TreeConstruction; Aggregation
* Bootstrap Sampling: Randomly select subsets of the training data with replacement to create multiple datasets.
* Tree Construction: For each subset, build a decision tree using a random selection of features at each split.
* Aggregation: During prediction, aggregate the results from all trees (e.g., majority vote for classification or average for regression).

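The BTA steps can be sketched from scratch in plain Python. All function names here are hypothetical, and depth-1 "stumps" stand in for full decision trees to keep the example short; the bootstrap/construct/aggregate structure is the point, not the tree itself.

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    """Bootstrap Sampling: draw len(X) rows with replacement."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def train_stump(X, y, n_feats, rng):
    """Tree Construction, simplified to a depth-1 tree: try thresholds on a
    random subset of features, keep the split with the best accuracy."""
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            acc = (left.count(l_lab) + right.count(r_lab)) / len(y)
            if best is None or acc > best[0]:
                best = (acc, f, t, l_lab, r_lab)
    if best is None:  # degenerate sample: fall back to the majority label
        maj = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), maj, maj)
    return best[1:]  # (feature, threshold, left_label, right_label)

def stump_predict(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

def random_forest(X, y, n_trees=15, n_feats=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        Xb, yb = bootstrap_sample(X, y, rng)
        forest.append(train_stump(Xb, yb, n_feats, rng))
    return forest

def forest_predict(forest, row):
    """Aggregation: majority vote over all trees."""
    votes = [stump_predict(s, row) for s in forest]
    return Counter(votes).most_common(1)[0][0]
```

Because each tree sees a different bootstrap sample (and a random feature subset), individual trees disagree, and the majority vote smooths out their variance.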
Advantages:
==========
* Reduces overfitting compared to individual decision trees.
* Handles large datasets with high dimensionality well.
* Provides feature importance scores.

Disadvantages:
==========
* More complex and less interpretable than single decision trees.
* Requires more computational resources.

Bagging or (B)ootstrap (Agg)regating
====================================
* This is an ensemble technique aimed at improving the accuracy and stability of ML models.
* It works by combining multiple models trained on different subsets of the training data.

How Bagging Works:
===============
* Multiple Samples: Generate multiple bootstrap samples from the original dataset.
* Model Training: Train a separate model (e.g., a decision tree) on each bootstrap sample.
* Final Prediction: Aggregate predictions from all models (e.g., majority voting for classification).

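The three steps above work with any base learner, which is what distinguishes bagging from Random Forest (where the base learner is specifically a feature-subsampled tree). A minimal sketch, with hypothetical names and a 1-nearest-neighbour model standing in for the usual decision tree:

```python
import random
from collections import Counter

def bag(train_fn, X, y, n_models=7, seed=1):
    """Multiple Samples + Model Training: one model per bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(train_fn([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagged_predict(models, row):
    """Final Prediction: majority vote across all models."""
    return Counter(m(row) for m in models).most_common(1)[0][0]

def train_1nn(X, y):
    """Base learner: 1-nearest-neighbour, returned as a closure.
    Any train function returning a callable model would fit here."""
    def model(row):
        dists = [(sum((a - b) ** 2 for a, b in zip(x, row)), label)
                 for x, label in zip(X, y)]
        return min(dists)[1]
    return model
```

Swapping `train_1nn` for a decision-tree trainer (and adding per-split feature sampling) turns this generic bagging loop into a Random Forest.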
Advantages:
==========
* Reduces variance and helps prevent overfitting.
* Improves model robustness against noise in data.

Disadvantages:
==========
* May not significantly improve performance if base learners are not diverse.