Commit 03099a3: Add files via upload (1 parent: f438156)
1 file changed: 119 additions, 0 deletions

#!/usr/bin/env python
# coding: utf-8

# ### Principal component analysis (PCA)
#
# * PCA is used to decompose a multivariate dataset into a set of successive orthogonal components that explain a maximum amount of the variance. In scikit-learn, PCA is implemented as a transformer object that learns `n` components in its `fit` method and can then be applied to new data to project it onto those components (see the sketch in the next cell).
#
# PCA centers, but does not scale, the input data for each feature before applying the SVD. The optional parameter `whiten=True` makes it possible to project the data onto the singular space while scaling each component to unit variance. This is often useful when downstream models make strong assumptions about the isotropy of the signal: this is the case, for example, for support vector machines with the RBF kernel and for the k-means clustering algorithm.
#
# Below, the iris dataset, which comprises 4 features, is projected onto the 2 dimensions that explain the most variance:
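

# In[ ]:


# A minimal sketch of the two points above (not part of the original notebook):
# PCA learns its components from the rows it is fit on and can then project
# *new* rows with `transform`; `whiten=True` additionally rescales each projected
# component to (approximately) unit variance. The train/test split is purely
# illustrative.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

X_demo, _ = load_iris(return_X_y=True)
X_train, X_test = train_test_split(X_demo, test_size=0.2, random_state=0)

pca_demo = PCA(n_components=2).fit(X_train)   # learn components on the training rows
X_test_2d = pca_demo.transform(X_test)        # project unseen rows onto them
print(X_test_2d.shape)                        # (30, 2)

pca_white = PCA(n_components=2, whiten=True).fit(X_train)
print(pca_white.transform(X_train).std(axis=0))  # each column is ~1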


# In[2]:


# install the required packages (no-ops if they are already present)
get_ipython().system('pip install numpy')
get_ipython().system('pip install pandas')
get_ipython().system('pip install matplotlib')
get_ipython().system('pip install scikit-learn')


# In[3]:


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


# In[5]:


# load the iris dataset as a scikit-learn Bunch (data, target, feature_names, ...)
iris = datasets.load_iris()


# In[6]:


iris.keys()


# In[7]:


X = iris.data    # feature matrix, shape (150, 4)
y = iris.target  # integer class labels 0, 1, 2


# In[8]:


target_names = iris.target_names  # ['setosa', 'versicolor', 'virginica']


# In[9]:


X.shape  # 150 rows and 4 columns


# In[10]:


pca = PCA(n_components=2)  # keep the 2 components that explain the most variance


# In[11]:


X_r = pca.fit_transform(X)  # fit the PCA on X and project it: 150 rows, 2 columns
X_r.shape


# In[23]:


# LDA, unlike PCA, is supervised: it uses the labels y to find at most
# n_classes - 1 = 2 directions that best separate the classes.
lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit(X, y).transform(X)


# In[14]:


# Percentage of variance explained by each of the selected components
print('explained variance ratio (first two components): %s'
      % str(pca.explained_variance_ratio_))
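

# In[ ]:


# A follow-up sketch (not in the original notebook): fitting PCA with all
# components and taking the cumulative sum of explained_variance_ratio_ is a
# common way to choose n_components.
cumulative = np.cumsum(PCA().fit(X).explained_variance_ratio_)
print(cumulative)  # the first two components already explain ~97.8% of the variance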


# In[24]:


# scatter the PCA projection, one color per iris class
plt.figure()
colors = ['navy', 'turquoise', 'darkorange']
lw = 2

for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(X_r[y == i, 0], X_r[y == i, 1], color=color, alpha=.8, lw=lw,
                label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('PCA of IRIS dataset')

# scatter the LDA projection for comparison
plt.figure()
for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(X_r2[y == i, 0], X_r2[y == i, 1], alpha=.8, color=color,
                label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('LDA of IRIS dataset')

plt.show()


# In[ ]:
