Commit 03099a3: Add files via upload (1 parent: f438156)
1 file changed: 119 additions, 0 deletions

#!/usr/bin/env python
# coding: utf-8

# ### Principal component analysis (PCA)
#
# * PCA is used to decompose a multivariate dataset into a set of successive orthogonal components that explain a maximum amount of the variance. In scikit-learn, PCA is implemented as a transformer object that learns `n` components in its `fit` method and can then be applied to new data to project it onto those components (see the sketch in the next cell).
#
# PCA centers, but does not scale, the input data for each feature before applying the SVD. The optional parameter `whiten=True` makes it possible to project the data onto the singular space while scaling each component to unit variance. This is often useful when downstream models make strong assumptions about the isotropy of the signal: this is the case, for example, for support vector machines with the RBF kernel and for the k-means clustering algorithm.
#
# Below, the iris dataset, which comprises 4 features, is projected onto the 2 dimensions that explain the most variance:
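

# In[ ]:


# A minimal sketch of the two points above (not part of the original notebook):
# PCA learns its components from the rows it is fit on and can then project
# *new* rows with `transform`; `whiten=True` additionally rescales each projected
# component to (approximately) unit variance. The train/test split is purely
# illustrative.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

X_demo, _ = load_iris(return_X_y=True)
X_train, X_test = train_test_split(X_demo, test_size=0.2, random_state=0)

pca_demo = PCA(n_components=2).fit(X_train)   # learn components on the training rows
X_test_2d = pca_demo.transform(X_test)        # project unseen rows onto them
print(X_test_2d.shape)                        # (30, 2)

pca_white = PCA(n_components=2, whiten=True).fit(X_train)
print(pca_white.transform(X_train).std(axis=0))  # each column is ~1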


# In[2]:


# install the required packages (no-ops if they are already present)
get_ipython().system('pip install numpy')
get_ipython().system('pip install pandas')
get_ipython().system('pip install matplotlib')
get_ipython().system('pip install scikit-learn')


# In[3]:


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


# In[5]:


# load the iris dataset as a scikit-learn Bunch (data, target, feature_names, ...)
iris = datasets.load_iris()


# In[6]:


iris.keys()


# In[7]:


X = iris.data    # feature matrix, shape (150, 4)
y = iris.target  # integer class labels 0, 1, 2


# In[8]:


target_names = iris.target_names  # ['setosa', 'versicolor', 'virginica']


# In[9]:


X.shape  # 150 rows and 4 columns


# In[10]:


pca = PCA(n_components=2)  # keep the 2 components that explain the most variance


# In[11]:


X_r = pca.fit_transform(X)  # fit the PCA on X and project it: 150 rows, 2 columns
X_r.shape


# In[23]:


# LDA, unlike PCA, is supervised: it uses the labels y to find at most
# n_classes - 1 = 2 directions that best separate the classes.
lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit(X, y).transform(X)


# In[14]:


# Percentage of variance explained by each of the selected components
print('explained variance ratio (first two components): %s'
      % str(pca.explained_variance_ratio_))
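

# In[ ]:


# A follow-up sketch (not in the original notebook): fitting PCA with all
# components and taking the cumulative sum of explained_variance_ratio_ is a
# common way to choose n_components.
cumulative = np.cumsum(PCA().fit(X).explained_variance_ratio_)
print(cumulative)  # the first two components already explain ~97.8% of the variance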


# In[24]:


# scatter the PCA projection, one color per iris class
plt.figure()
colors = ['navy', 'turquoise', 'darkorange']
lw = 2

for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(X_r[y == i, 0], X_r[y == i, 1], color=color, alpha=.8, lw=lw,
                label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('PCA of IRIS dataset')

# scatter the LDA projection for comparison
plt.figure()
for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(X_r2[y == i, 0], X_r2[y == i, 1], alpha=.8, color=color,
                label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('LDA of IRIS dataset')

plt.show()


# In[ ]:
