Commit a61e1bc

Update GeneralMLPrep.md

1 parent 06dbd0e commit a61e1bc
1 file changed (+27, -0)

DataScience/GeneralMLPrep.md

Lines changed: 27 additions & 0 deletions
@@ -31,3 +31,30 @@ RNNs are commonly used in:
* Natural Language Processing: Tasks such as language modeling, text generation, and sentiment analysis.
* Speech Recognition: Processing audio signals to convert speech into text.
* Time Series Prediction: Forecasting stock prices or weather conditions based on historical data (a minimal recurrence sketch follows below).
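To make the recurrence concrete, here is a minimal NumPy sketch of a vanilla RNN step; the shapes, weights, and data below are illustrative assumptions, not from the notes:

```python
import numpy as np

# Vanilla RNN cell: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b).
# All names and sizes here are toy choices for illustration.
rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 8, 5

W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

xs = rng.normal(size=(seq_len, input_dim))  # a toy input sequence
h = np.zeros(hidden_dim)                    # initial hidden state

for x_t in xs:
    # The same weights are reused at every time step; the hidden state
    # carries information from earlier steps forward through the sequence.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b)

print(h.shape)  # (8,): the final hidden state summarizing the sequence
```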

Decision Tree
==========
* A decision tree is a supervised ML algorithm used for classification and regression tasks.
* It models decisions and their possible consequences in the form of a tree-like structure.
* Each branch represents a `decision rule` and each internal node represents a `feature`; the leaf (terminal) node of a branch is the `outcome`. A minimal fitting sketch follows below.
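As a quick illustration, here is a minimal scikit-learn sketch; the dataset and hyperparameters are illustrative choices, not from the notes:

```python
# Fit a small decision tree on the iris dataset and print its rules.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="entropy" makes splits use information gain, matching the
# DEINR steps described in the next section.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))  # held-out accuracy
print(export_text(clf))           # the learned decision rules as text
```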

Building a Decision Tree:
==========
DEINR (pronounced "Diner"): Data; Entropy; InformationGain; NodeSelection; RecursiveSplitting
* Data Input: Start with the entire dataset.
* Entropy Calculation: Calculate the entropy of the target variable and predictor attributes to measure impurity.
* Information Gain: Determine the information gain for each attribute to identify which feature best splits the data.
* Node Selection: Choose the attribute with the highest information gain as the root node.
* Recursive Splitting: Repeat this process recursively for each branch until all branches are finalized or a stopping criterion is met (e.g., maximum depth or minimum samples per leaf). A worked entropy/information-gain sketch follows this list.
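A small NumPy sketch of the entropy and information-gain computations these steps rely on; the toy weather-style data is made up for illustration:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array: -sum(p * log2(p))."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """Parent entropy minus the weighted entropy of the children after splitting."""
    total = len(labels)
    weighted = sum(
        (np.sum(feature == v) / total) * entropy(labels[feature == v])
        for v in np.unique(feature)
    )
    return entropy(labels) - weighted

# Toy data: does splitting on 'outlook' separate play (1) from no-play (0)?
outlook = np.array(["sunny", "sunny", "rain", "rain", "overcast", "overcast"])
play = np.array([0, 0, 1, 1, 1, 1])
print(information_gain(outlook, play))  # ~0.918: this split removes all impurity
```

The attribute with the highest information gain across all candidates becomes the split at the current node, and the same computation is repeated on each resulting subset.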

Advantages:
==========
* Easy to interpret and visualize.
* Requires little data preprocessing (no need for normalization).
* Can handle both numerical and categorical data.

Disadvantages:
============
* Prone to overfitting, especially with deep trees (a depth-limiting sketch follows below).
* Sensitive to small variations in data.
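Overfitting from deep trees is typically curbed with constraints such as `max_depth` or `min_samples_leaf`; a minimal sketch, assuming the scikit-learn API and an illustrative dataset:

```python
# Compare an unrestricted tree with a depth-limited one.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):  # None: grow until leaves are pure; 3: pruned depth
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    print(depth, clf.score(X_train, y_train), clf.score(X_test, y_test))
# The unrestricted tree usually scores ~1.0 on train but lower on test;
# the shallow tree narrows that train/test gap.
```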
