Commit a78f324

Introduction to Machine Learning
1 parent 73274f1 commit a78f324

1 file changed

+79
-0
lines changed

‎Introduction to Machine Learning

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
Machine Learning
------------------
Building a model from example inputs to make data-driven predictions, rather than following strictly static program instructions.
Applications:

1. Is an email spam?
2. How can cars drive themselves?
3. What will people buy?

Machine Learning
-----------------
2 categories
a. Supervised;
- Value prediction;
- Needs training data containing the value being predicted; the trained model predicts that value for new data;
b. Unsupervised;
- Identifies clusters of like data;
- Data does not contain cluster membership, but the model provides access to the data by cluster;
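
A minimal sketch of the two categories, using only the Python standard library. The points, the labels "low"/"high", and the helper names are made up for illustration; nearest-neighbour and k-means are just convenient stand-ins, not the only algorithms in each category.

```python
# Supervised: training rows carry the value being predicted (the label).
labelled = [((1.0, 1.2), "low"), ((0.8, 1.0), "low"),
            ((5.0, 5.5), "high"), ((5.2, 4.9), "high")]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict_nearest(train, point):
    """Return the label of the training point closest to `point`."""
    _, label = min(train, key=lambda row: sq_dist(row[0], point))
    return label

# Unsupervised: rows carry no label; the model groups them itself.
def k_means(points, centroids, rounds=10):
    """Plain k-means; returns the cluster index assigned to each point."""
    centroids = list(centroids)  # work on a copy
    assignment = []
    for _ in range(rounds):
        assignment = [min(range(len(centroids)),
                          key=lambda k: sq_dist(p, centroids[k]))
                      for p in points]
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return assignment

unlabelled = [features for features, _ in labelled]
prediction = predict_nearest(labelled, (4.8, 5.1))
clusters = k_means(unlabelled, centroids=[unlabelled[0], unlabelled[2]])
print(prediction, clusters)
```

The supervised call needs the labels to answer; the unsupervised call only ever sees the points and returns a cluster index per point.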

url -> https://www.continuum.io/downloads


Machine Learning Workflow:
--------------------------
An orchestrated and repeatable pattern which systematically transforms and processes information to create prediction solutions.

1. Asking the right question;
2. Preparing data;
3. Selecting the algorithm;
4. Training the model;
5. Testing the model;
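
The five steps can be sketched end to end on a toy problem. Everything here is an assumption for illustration: the rows are generated, not real data, and the "algorithm" is a simple one-feature threshold rule chosen only because it fits in a few lines.

```python
import random

# Step 1 (question): can a single measurement predict a yes/no outcome?
# Hypothetical toy rows: (measurement, outcome). Not real data.
rows = [(x, x >= 50) for x in range(0, 100, 5)]

# Step 2: prepare data - shuffle reproducibly, then hold out 30% for testing.
rng = random.Random(42)
rng.shuffle(rows)
split = int(len(rows) * 0.7)
train_rows, test_rows = rows[:split], rows[split:]

# Step 3: select the algorithm - here, a one-feature threshold rule.
def train_threshold(data):
    """Step 4: training picks the cutoff with the fewest training errors."""
    candidates = sorted(x for x, _ in data)
    return min(candidates,
               key=lambda t: sum((x >= t) != y for x, y in data))

def evaluate(threshold, data):
    """Step 5: testing measures accuracy on rows not used for training."""
    return sum((x >= threshold) == y for x, y in data) / len(data)

threshold = train_threshold(train_rows)
test_accuracy = evaluate(threshold, test_rows)
print(f"threshold={threshold}, test accuracy={test_accuracy:.2f}")
```

Keeping the test rows out of training is the point of step 5: accuracy on the training rows alone says little about how the model handles new data.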

1. Asking the Right Question
-----------------------------
a. Define scope (including data sources);
- Using Pima Indian Diabetes data, predict which people will develop diabetes.

b. Define target performance;
- Using Pima Indian Diabetes data, predict with 70% or greater accuracy which people will develop diabetes.

c. Define context for usage;
- Using Pima Indian Diabetes data, predict with 70% or greater accuracy which people are likely to develop diabetes.

d. Define how solution is created;
- Use the Machine Learning Workflow to process and transform Pima Indian data to create a prediction model. This model
must predict which people are likely to develop diabetes with 70% or greater accuracy.
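
The 70% target can be turned into a check that a model either passes or fails. A minimal sketch; the outcome lists below are made up and are not drawn from the Pima Indian Diabetes data, and the function names are hypothetical.

```python
TARGET_ACCURACY = 0.70  # target from the problem statement above

def accuracy(actual, predicted):
    """Fraction of predictions that match the actual outcomes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def meets_target(actual, predicted, target=TARGET_ACCURACY):
    """True when accuracy reaches the stated target."""
    return accuracy(actual, predicted) >= target

# Made-up outcomes for ten people (1 = developed diabetes, 0 = did not).
actual    = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 0, 1, 1, 0, 1, 0]
print(accuracy(actual, predicted), meets_target(actual, predicted))
```

Eight of the ten made-up predictions match, so this example clears the 70% bar.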

2. Preparing data
---------------------
a. Tidy Data
- Tidy datasets are easy to manipulate, model, and visualize, and have a specific structure:
* each variable is a column;
* each observation is a row;
* each type of observational unit is a table;
** 50 - 80% of an ML project is spent getting, cleaning, and organizing data;
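
A small sketch of what "tidy" means in practice: a wide table with one column per visit is reshaped so that each row is one observation. The patients, column names, and values are hypothetical.

```python
# Hypothetical "wide" table: one row per patient, one column per visit.
wide = [
    {"patient": "A", "glucose_visit1": 148, "glucose_visit2": 150},
    {"patient": "B", "glucose_visit1": 85, "glucose_visit2": 89},
]

def to_tidy(rows, id_key, prefix):
    """Turn columns named `<prefix><n>` into one row per observation."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(prefix):
                tidy.append({
                    id_key: row[id_key],
                    "visit": int(key[len(prefix):]),  # variable: visit number
                    "glucose": value,                 # variable: measurement
                })
    return tidy

tidy = to_tidy(wide, id_key="patient", prefix="glucose_visit")
for record in tidy:
    print(record)
```

After reshaping, "visit" and "glucose" are each a column and every patient-visit pair is its own row, which matches the structure listed above.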

Data Rule #1:
---------------
- The closer the data is to what you are predicting, the better;

Data Rule #2:
--------------
- Data will never be in the format you need;
* Columns to eliminate - not used, no values, duplicates;
* Correlated columns - same information in a different format; they add little value and can confuse the algorithm;
* Molding data - adjusting data types and creating columns, if required;
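
The column checks in Data Rule #2 can be sketched as follows. The column names and values are invented, the 0.99 correlation cutoff is an arbitrary assumption, and the code assumes that non-empty columns are fully numeric with no missing values.

```python
from math import sqrt

# Hypothetical columns, stored as name -> list of values.
columns = {
    "age": [50, 31, 32, 21],
    "age_copy": [50, 31, 32, 21],                # exact duplicate
    "notes": [None, None, None, None],           # no values
    "weight_kg": [80.0, 60.0, 70.0, 55.0],
    "weight_lb": [176.4, 132.3, 154.3, 121.3],   # same information, other unit
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)

def clean(cols, threshold=0.99):
    """Drop empty, duplicate, and near-perfectly correlated columns."""
    kept = {}
    for name, values in cols.items():
        if all(v is None for v in values):
            continue  # no values
        redundant = any(
            values == other or abs(pearson(values, other)) >= threshold
            for other in kept.values()
        )
        if not redundant:
            kept[name] = values
    return kept

print(list(clean(columns)))
```

Here the kilogram and pound columns carry the same information in a different format, so only the first one seen is kept.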

Data Rule #3:
----------------
- Accurately predicting rare events is difficult;
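
One reason rare events are hard: a model can score high accuracy while never finding a single rare case. A sketch with made-up labels (2 positive cases in 100):

```python
# Made-up labels: 2 positive cases out of 100 (a rare event).
actual = [1] * 2 + [0] * 98

# A "model" that always predicts the common class.
always_negative = [0] * len(actual)

def accuracy(actual, predicted):
    """Fraction of predictions that match the actual outcomes."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def recall(actual, predicted):
    """Share of real positive cases that the model found."""
    positives = sum(actual)
    found = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    return found / positives if positives else 0.0

print(accuracy(actual, always_negative))  # 0.98
print(recall(actual, always_negative))    # 0.0
```

Accuracy alone looks excellent here, yet the model is useless for the rare event, so a measure such as recall is worth checking alongside it.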

Data Rule #4:
--------------
- Track how you manipulate the data;
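
One possible way to track manipulations is to record each step as it is applied. This is only a sketch: the class name `TrackedData`, the "bmi" field, and the values are hypothetical, and a notebook or version-controlled script would serve the same purpose.

```python
class TrackedData:
    """Holds rows plus a log of every transformation applied to them."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.history = []

    def apply(self, description, func):
        """Apply `func` to the rows and record what was done."""
        before = len(self.rows)
        self.rows = func(self.rows)
        self.history.append(
            f"{description}: {before} -> {len(self.rows)} rows")
        return self

# Hypothetical rows; None marks a missing measurement.
data = TrackedData([{"bmi": 33.6}, {"bmi": None}, {"bmi": 26.6}])
data.apply("drop rows with missing bmi",
           lambda rows: [r for r in rows if r["bmi"] is not None])
data.apply("round bmi to whole numbers",
           lambda rows: [{**r, "bmi": round(r["bmi"])} for r in rows])

for entry in data.history:
    print(entry)
```

The history list gives a replayable record of what changed and in what order, which is what makes a data preparation step repeatable.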

3.
