Commit 2f9aa21

Update GeneralMLPrep.md
1 parent c6515d8 commit 2f9aa21

DataScience/GeneralMLPrep.md

Lines changed: 42 additions & 0 deletions
@@ -58,3 +58,45 @@ Disadvantages:
* Prone to overfitting, especially with deep trees.
* Sensitive to small variations in data.

Random Forest
==========
* Random Forest is an ensemble technique that combines multiple decision trees.
* It mitigates overfitting by averaging the results of many trees, which individually may have high variance.

Building a Random Forest:
==========
BTA (pronounced "beta"): Bootstrap Sampling; Tree Construction; Aggregation (see the sketch after this list)
* Bootstrap Sampling: Randomly select subsets of the training data with replacement to create multiple datasets.
* Tree Construction: For each subset, build a decision tree using a random selection of features at each split.
* Aggregation: During prediction, aggregate the results from all trees (e.g., majority vote for classification or average for regression).

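To make the BTA steps concrete, here is a minimal from-scratch sketch. The toy dataset, the 25 trees, and `max_features="sqrt"` are illustrative assumptions, not part of the original notes; in practice you would normally reach for `sklearn.ensemble.RandomForestClassifier`, which performs these same steps internally.

```python
# Minimal sketch of the BTA steps: Bootstrap Sampling, Tree Construction, Aggregation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)  # toy data (assumption)
rng = np.random.default_rng(0)
trees = []

for _ in range(25):  # number of trees is an arbitrary illustrative choice
    # Bootstrap Sampling: draw a same-sized sample of the training data with replacement.
    idx = rng.integers(0, len(X), size=len(X))

    # Tree Construction: each tree considers a random subset of features at every split.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(tree.fit(X[idx], y[idx]))

# Aggregation: majority vote across all trees (use the average instead for regression).
all_preds = np.stack([t.predict(X) for t in trees])  # shape: (n_trees, n_samples)
majority = np.apply_along_axis(lambda votes: np.bincount(votes).argmax(), 0, all_preds)
print("ensemble training accuracy:", (majority == y).mean())
```
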
Advantages:
==========
* Reduces overfitting compared to individual decision trees.
* Handles large datasets with higher dimensionality well.
* Provides feature importance scores (see the short example below).

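As a small, hedged illustration of the feature-importance point (the iris toy dataset and hyperparameters are assumptions made for the example):

```python
# Feature importance scores are available after fitting a random forest.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(data.data, data.target)
for name, score in zip(data.feature_names, clf.feature_importances_):
    print(f"{name}: {score:.3f}")
```
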
Disadvantages:
==========
* More complex and less interpretable than single decision trees.
* Requires more computational resources.

Bagging or (B)ootstrap (Agg)regating
====================================
* This is an ensemble technique aimed at improving the accuracy and stability of ML models.
* It is done by combining multiple models trained on different subsets of the training data.

How Bagging Works:
===============
* Multiple Samples: Generate multiple bootstrap samples from the original dataset.
* Model Training: Train a separate model (e.g., decision tree) on each bootstrap sample.
* Final Prediction: Aggregate predictions from all models (e.g., majority voting for classification), as shown in the sketch after this list.

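A minimal sketch of these three steps using scikit-learn's `BaggingClassifier`; the decision-tree base learner, the 50 estimators, and the synthetic dataset are illustrative assumptions:

```python
# Bagging: many base models trained on bootstrap samples, combined by majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)  # toy data (assumption)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Multiple Samples + Model Training: each tree is fit on its own bootstrap sample.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
bag.fit(X_tr, y_tr)

# Final Prediction: predict()/score() aggregate the 50 trees by majority vote.
print("test accuracy:", bag.score(X_te, y_te))
```

Random Forest is essentially bagging of decision trees plus the extra per-split feature randomization described in the previous section.
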
Advantages:
==========
* Reduces variance and helps prevent overfitting.
* Improves model robustness against noise in data.

Disadvantages:
=================
* May not significantly improve performance if base learners are not diverse.
