Commit 8abf811

authored

Update Introduction to Machine Learning

1 parent a78f324 commit 8abf811

File tree

1 file changed

+85

-2

lines changed

`‎Introduction to Machine Learning`

Lines changed: 85 additions & 2 deletions

Original file line number	Diff line number	Diff line change
`@@ -30,6 +30,8 @@ An orchestrated and repeatable pattern which systematically transforms and proce`
`30`	`30`	`4. Training the model;`
`31`	`31`	`5. Testing the model;`
`32`	`32`
	`33`	`+------------------------------------------------------------------------------------------------------------------------`
	`34`	`+`
`33`	`35`	`1. Asking the Right Question`
`34`	`36`	`-----------------------------`
`35`	`37`	`a. Define scope (including data sources);`
`@@ -45,7 +47,9 @@ d. Define how solution is created;`
`45`	`47`	`- Use the Machine Learning Workflow to process and transform Pima Indian data to create a prediction model. This model`
`46`	`48`	`must predict which people are likely to develop diabetes with 70% or greater accuracy.`
`47`	`49`
`48`		`- 2. Preparing data`
	`50`	`+---------------------------------------------------------------------------------------------------------------------------`
	`51`	`+`
	`52`	`+2. Preparing data`
`49`	`53`	`---------------------`
`50`	`54`	`a. Tidy Data`
`51`	`55`	`- Tidy datasets are easy to manipulate, model and visualize, and have a specific structure:`
`@@ -64,6 +68,9 @@ Data Rule #2:`
`64`	`68`	`* Columns to eliminate - Not used, no values, duplicates;`
`65`	`69`	`* Correlated columns - Same information in a different format; add little value and can confuse the algorithm;`
`66`	`70`	`* Molding Data - Adjusting data types, creating columns, if required;`
	`71`	`+* Dealing with missing data -`
	`72`	`+ - Ignore it - Algorithms may fail;`
	`73`	`+ - Impute it - update to "reasonable" values - Most frequent, Mean, Median, Expert reasonable value;`
`67`	`74`
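The imputation options listed above (most frequent, mean, median, expert value) can be sketched in a few lines. This is only an illustration using the Python standard library: the helper name `impute`, its `strategy` and `fill_value` parameters, and the sample readings are all assumptions made for the example, not part of the notes or of the Pima dataset.

```python
from statistics import mean, median, mode


def impute(values, strategy="median", fill_value=None):
    """Return a copy of `values` with None entries replaced (hypothetical helper)."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "most_frequent":
        fill = mode(observed)
    elif strategy == "expert":
        # Caller supplies a value judged reasonable by a domain expert.
        if fill_value is None:
            raise ValueError("the 'expert' strategy needs a fill_value")
        fill = fill_value
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Made-up readings with two missing entries (illustration only).
readings = [70, None, 60, 70, None, 100]
mean_filled = impute(readings, "mean")
median_filled = impute(readings, "median")
expert_filled = impute(readings, "expert", fill_value=65)
```

Which strategy is appropriate depends on the column; the mean is pulled by outliers such as the 100 above, while the median and most frequent value are not.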
`68`	`75`	`Data Rule #3:`
`69`	`76`	`----------------`
`@@ -73,7 +80,83 @@ Data Rule #4:`
`73`	`80`	`--------------`
`74`	`81`	`Track how you manipulate data;`
`75`	`82`
`76`		`-3.`
	`83`	`+-------------------------------------------------------------------------------------------------------------------------`
	`84`	`+`
	`85`	`+3. Selecting the algorithm:`
	`86`	`+------------------------------`
	`87`	`+Role of the Algorithm`
	`88`	`+ - fit the training set and predict on the real data;`
	`89`	`+ - (fit()) training data -> Algorithm -> model;`
	`90`	`+ - (predict()) real data -> Model -> result;`
	`91`	`+`
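The two roles described above, `fit()` turning training data into a model and `predict()` turning real data into a result, can be shown with a deliberately tiny classifier. This is a sketch under assumptions: `NearestMeanClassifier` is a hypothetical toy written for this note (it is not one of the candidate algorithms), and the rows and labels are made up.

```python
from collections import defaultdict


class NearestMeanClassifier:
    """Toy stand-in for an algorithm: learns one mean vector per class."""

    def fit(self, rows, labels):
        # training data -> algorithm -> model
        grouped = defaultdict(list)
        for row, label in zip(rows, labels):
            grouped[label].append(row)
        # The "model" is simply the per-class column means.
        self.means_ = {
            label: [sum(col) / len(col) for col in zip(*members)]
            for label, members in grouped.items()
        }
        return self

    def predict(self, rows):
        # real data -> model -> result
        def sq_dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        return [
            min(self.means_, key=lambda label: sq_dist(row, self.means_[label]))
            for row in rows
        ]


# Made-up training rows and labels (illustration only).
train_rows = [[1.0, 1.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
train_labels = [False, False, True, True]

model = NearestMeanClassifier().fit(train_rows, train_labels)
predictions = model.predict([[1.5, 0.5], [8.5, 9.5]])
```

The method names mirror the `fit()` / `predict()` pair in the notes; everything the model knows after `fit()` is stored in `means_`, which is what `predict()` consults.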
	`92`	`+ Over 50 algorithms`
	`93`	`+ - algorithm selection`
	`94`	`+ *. Compare factors;`
	`95`	`+ *. Difference of opinions about which factors are important;`
	`96`	`+ *. Develop your own factors;`
	`97`	`+`
	`98`	`+Algorithm Decision Factors`
	`99`	`+--------------------------`
	`100`	`+i. Learning Type`
	`101`	`+ii. Result`
	`102`	`+iii. Complexity`
	`103`	`+iv. Basic vs Enhanced`
	`104`	`+`
	`105`	`+i. Learning Type:`
	`106`	`+"Use the Machine Learning Workflow to process and transform Pima Indian data to create a "prediction model". This model must`
	`107`	`+predict which people are likely to develop diabetes with 70% or greater accuracy."`
	`108`	`+`
	`109`	`+-> Prediction Model => Supervised machine learning;`
	`110`	`+Over 28 algorithms`
	`111`	`+`
	`112`	`+ii. Result`
	`113`	`+a. Regression - continuous values;`
	`114`	`+b. Classification - discrete values;`
	`115`	`+`
	`116`	`+"Use the Machine Learning Workflow to process and transform Pima Indian data to create a prediction model. This model must`
	`117`	`+"predict which people are likely to develop diabetes" with 70% or greater accuracy."`
	`118`	`+`
	`119`	`+- Diabetes`
	`120`	`+- Binary (True/False)`
	`121`	`+- Algorithm must support classification - Binary classification;`
	`122`	`+** Over 20 algorithms;`
	`123`	`+`
	`124`	`+iii. Complexity`
	`125`	`+- Keep it simple;`
	`126`	`+- Eliminate ensemble algorithms - container algorithms that combine multiple child algorithms to boost performance; can be difficult to debug;`
	`127`	`+** Over 14 algorithms;`
	`128`	`+`
	`129`	`+iv. Basic vs Enhanced`
	`130`	`+- Enhanced - variation of basic, performance improvements, additional functionality, more complex;`
	`131`	`+- Basic - simpler, easier to understand;`
	`132`	`+`
	`133`	`+Candidate Algorithms`
	`134`	`+--------------------`
	`135`	`+a. Naive Bayes;`
	`136`	`+b. Logistic Regression;`
	`137`	`+c. Decision Tree;`
	`138`	`+`
	`139`	`+a. Naive Bayes - Based on likelihood and probability; every feature has same weight; requires smaller amount of data;`
	`140`	`+b. Logistic Regression - Binary result; relationships between features are weighted;`
	`141`	`+c. Decision Tree - Binary tree, node contains decision, requires enough data to determine nodes and splits;`
	`142`	`+`
	`143`	`+Selected algorithm - Naive Bayes`
	`144`	`+-------------------------------`
	`145`	`+Simple - easy to understand;`
	`146`	`+Fast - up to 100X faster;`
	`147`	`+Stable to data changes;`
	`148`	`+`
	`149`	`+Overview`
	`150`	`+----------`
	`151`	`+Lots of algorithms available`
	`152`	`+`
	`153`	`+Selected based on`
	`154`	`+- Learning = Supervised`
	`155`	`+- Result = Binary classification`
	`156`	`+- Non-ensemble`
	`157`	`+- Basic`
	`158`	`+`
	`159`	`+`
`77`	`160`
`78`	`161`
`79`	`162`
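To make the selected algorithm concrete, here is a minimal sketch of a Naive Bayes binary classifier checked against the 70% accuracy target from the problem statement. Several things are assumptions: the notes do not say which Naive Bayes variant is meant, so a Gaussian one is used here; the class `GaussianNaiveBayes`, the helper `accuracy`, the two feature columns, and every number in the toy data are invented for illustration and are not Pima data.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: features are treated as independent per class."""

    def fit(self, rows, labels):
        grouped = defaultdict(list)
        for row, label in zip(rows, labels):
            grouped[label].append(row)
        total = len(rows)
        self.priors_ = {c: len(m) / total for c, m in grouped.items()}
        self.stats_ = {}
        for c, members in grouped.items():
            per_feature = []
            for col in zip(*members):
                mu = sum(col) / len(col)
                var = sum((x - mu) ** 2 for x in col) / len(col)
                # Small floor keeps the variance strictly positive.
                per_feature.append((mu, var + 1e-9))
            self.stats_[c] = per_feature
        return self

    def _log_posterior(self, row, c):
        # log prior + sum of per-feature Gaussian log likelihoods
        score = math.log(self.priors_[c])
        for x, (mu, var) in zip(row, self.stats_[c]):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        return score

    def predict(self, rows):
        return [
            max(self.priors_, key=lambda c: self._log_posterior(row, c))
            for row in rows
        ]


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Made-up two-feature rows (illustration only, not Pima data).
train_rows = [[85, 22], [90, 25], [95, 24], [100, 27],
              [150, 33], [160, 35], [170, 31], [155, 36]]
train_labels = [False, False, False, False, True, True, True, True]
test_rows = [[88, 23], [165, 34], [98, 26], [152, 32]]
test_labels = [False, True, False, True]

model = GaussianNaiveBayes().fit(train_rows, train_labels)
predictions = model.predict(test_rows)
meets_target = accuracy(predictions, test_labels) >= 0.70
```

The toy classes are far apart, so this says nothing about how the algorithm would score on real data; it only shows where the 70% check fits after training and testing.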

0 commit comments
