Commit 8826ad3

feat: Implement Principal Component Analysis (PCA) (#12596)
- Added PCA implementation with dataset standardization.
- Used Singular Value Decomposition (SVD) for computing principal components.
- Fixed import sorting to comply with PEP 8 (Ruff I001).
- Ensured type hints and docstrings for better readability.
- Added doctests to validate correctness.
- Passed all Ruff checks and automated tests.
1 parent f528ce3 commit 8826ad3
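The commit message credits SVD for the principal components; in the file added below, apply_pca delegates this to scikit-learn's PCA, which performs the SVD internally after StandardScaler has standardized the data. For readers who want to see the underlying computation spelled out, here is a minimal NumPy-only sketch (the helper name pca_via_svd is hypothetical and not part of the commit; it assumes the input is already standardized, and component signs may differ from scikit-learn's output):

import numpy as np


def pca_via_svd(data: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
    """Project standardized data onto its leading principal components via SVD."""
    # Center the columns (a no-op if the data is already standardized).
    centered = data - data.mean(axis=0)

    # Thin SVD: the rows of vt are the principal axes, s the singular values.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)

    # Coordinates of each sample in the space spanned by the top axes.
    transformed = centered @ vt[:n_components].T

    # Fraction of total variance captured by each retained component.
    explained_variance_ratio = (s**2 / np.sum(s**2))[:n_components]
    return transformed, explained_variance_ratio

On the standardized Iris features this reproduces the shapes and variance ratios reported by apply_pca below, up to possible sign flips of the components.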

File tree: 2 files changed, +87 -0 lines


DIRECTORY.md

Lines changed: 2 additions & 0 deletions
@@ -395,6 +395,7 @@
  * [Minimum Tickets Cost](dynamic_programming/minimum_tickets_cost.py)
  * [Optimal Binary Search Tree](dynamic_programming/optimal_binary_search_tree.py)
  * [Palindrome Partitioning](dynamic_programming/palindrome_partitioning.py)
+ * [Range Sum Query](dynamic_programming/range_sum_query.py)
  * [Regex Match](dynamic_programming/regex_match.py)
  * [Rod Cutting](dynamic_programming/rod_cutting.py)
  * [Smith Waterman](dynamic_programming/smith_waterman.py)

@@ -608,6 +609,7 @@
  * [Mfcc](machine_learning/mfcc.py)
  * [Multilayer Perceptron Classifier](machine_learning/multilayer_perceptron_classifier.py)
  * [Polynomial Regression](machine_learning/polynomial_regression.py)
+ * [Principle Component Analysis](machine_learning/principle_component_analysis.py)
  * [Scoring Functions](machine_learning/scoring_functions.py)
  * [Self Organizing Map](machine_learning/self_organizing_map.py)
  * [Sequential Minimum Optimization](machine_learning/sequential_minimum_optimization.py)
machine_learning/principle_component_analysis.py

Lines changed: 85 additions & 0 deletions

@@ -0,0 +1,85 @@
"""
Principal Component Analysis (PCA) is a dimensionality reduction technique
used in machine learning. It transforms high-dimensional data into a lower-dimensional
representation while retaining as much variance as possible.

This implementation follows best practices, including:
- Standardizing the dataset.
- Computing principal components using Singular Value Decomposition (SVD).
- Returning transformed data and explained variance ratio.
"""

import doctest

import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def collect_dataset() -> tuple[np.ndarray, np.ndarray]:
    """
    Collects the dataset (Iris dataset) and returns feature matrix and target values.

    :return: Tuple containing feature matrix (X) and target labels (y)

    Example:
    >>> X, y = collect_dataset()
    >>> X.shape
    (150, 4)
    >>> y.shape
    (150,)
    """
    data = load_iris()
    return np.array(data.data), np.array(data.target)


def apply_pca(data_x: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
    """
    Applies Principal Component Analysis (PCA) to reduce dimensionality.

    :param data_x: Original dataset (features)
    :param n_components: Number of principal components to retain
    :return: Tuple containing transformed dataset and explained variance ratio

    Example:
    >>> X, _ = collect_dataset()
    >>> transformed_X, variance = apply_pca(X, 2)
    >>> transformed_X.shape
    (150, 2)
    >>> len(variance) == 2
    True
    """
    # Standardizing the dataset
    scaler = StandardScaler()
    data_x_scaled = scaler.fit_transform(data_x)

    # Applying PCA
    pca = PCA(n_components=n_components)
    principal_components = pca.fit_transform(data_x_scaled)

    return principal_components, pca.explained_variance_ratio_


def main() -> None:
    """
    Driver function to execute PCA and display results.
    """
    data_x, data_y = collect_dataset()

    # Number of principal components to retain
    n_components = 2

    # Apply PCA
    transformed_data, variance_ratio = apply_pca(data_x, n_components)

    print("Transformed Dataset (First 5 rows):")
    print(transformed_data[:5])

    print("\nExplained Variance Ratio:")
    print(variance_ratio)


if __name__ == "__main__":
    doctest.testmod()
    main()
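A quick way to exercise the new module from the repository root is sketched below; the import assumes the repository root is on the import path, and the printed variance numbers are approximate values for the standard scikit-learn Iris dataset:

# Run the doctests and the demo driver directly:
#   python machine_learning/principle_component_analysis.py

# Or call the functions from a session started at the repository root:
from machine_learning.principle_component_analysis import apply_pca, collect_dataset

features, _ = collect_dataset()
transformed, variance_ratio = apply_pca(features, n_components=2)

print(transformed.shape)        # (150, 2)
print(variance_ratio.round(2))  # roughly [0.73 0.23] for the standardized Iris features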
