Commit 8826ad3

feat: Implement Principal Component Analysis (PCA) (#12596)
- Added PCA implementation with dataset standardization.
- Used Singular Value Decomposition (SVD) for computing principal components.
- Fixed import sorting to comply with PEP 8 (Ruff I001).
- Ensured type hints and docstrings for better readability.
- Added doctests to validate correctness.
- Passed all Ruff checks and automated tests.
1 parent f528ce3 commit 8826ad3
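The commit message credits SVD for the principal components; in the file added below, apply_pca delegates this to scikit-learn's PCA, which performs the SVD internally after StandardScaler has standardized the data. For readers who want to see the underlying computation spelled out, here is a minimal NumPy-only sketch (the helper name pca_via_svd is hypothetical and not part of the commit; it assumes the input is already standardized, and component signs may differ from scikit-learn's output):

import numpy as np


def pca_via_svd(data: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
    """Project standardized data onto its leading principal components via SVD."""
    # Center the columns (a no-op if the data is already standardized).
    centered = data - data.mean(axis=0)

    # Thin SVD: the rows of vt are the principal axes, s the singular values.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)

    # Coordinates of each sample in the space spanned by the top axes.
    transformed = centered @ vt[:n_components].T

    # Fraction of total variance captured by each retained component.
    explained_variance_ratio = (s**2 / np.sum(s**2))[:n_components]
    return transformed, explained_variance_ratio

On the standardized Iris features this reproduces the shapes and variance ratios reported by apply_pca below, up to possible sign flips of the components.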

File tree: 2 files changed, +87 -0 lines


DIRECTORY.md

Lines changed: 2 additions & 0 deletions
@@ -395,6 +395,7 @@
  * [Minimum Tickets Cost](dynamic_programming/minimum_tickets_cost.py)
  * [Optimal Binary Search Tree](dynamic_programming/optimal_binary_search_tree.py)
  * [Palindrome Partitioning](dynamic_programming/palindrome_partitioning.py)
+ * [Range Sum Query](dynamic_programming/range_sum_query.py)
  * [Regex Match](dynamic_programming/regex_match.py)
  * [Rod Cutting](dynamic_programming/rod_cutting.py)
  * [Smith Waterman](dynamic_programming/smith_waterman.py)

@@ -608,6 +609,7 @@
  * [Mfcc](machine_learning/mfcc.py)
  * [Multilayer Perceptron Classifier](machine_learning/multilayer_perceptron_classifier.py)
  * [Polynomial Regression](machine_learning/polynomial_regression.py)
+ * [Principle Component Analysis](machine_learning/principle_component_analysis.py)
  * [Scoring Functions](machine_learning/scoring_functions.py)
  * [Self Organizing Map](machine_learning/self_organizing_map.py)
  * [Sequential Minimum Optimization](machine_learning/sequential_minimum_optimization.py)
machine_learning/principle_component_analysis.py

Lines changed: 85 additions & 0 deletions

@@ -0,0 +1,85 @@
"""
Principal Component Analysis (PCA) is a dimensionality reduction technique
used in machine learning. It transforms high-dimensional data into a lower-dimensional
representation while retaining as much variance as possible.

This implementation follows best practices, including:
- Standardizing the dataset.
- Computing principal components using Singular Value Decomposition (SVD).
- Returning transformed data and explained variance ratio.
"""

import doctest

import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def collect_dataset() -> tuple[np.ndarray, np.ndarray]:
    """
    Collects the dataset (Iris dataset) and returns feature matrix and target values.

    :return: Tuple containing feature matrix (X) and target labels (y)

    Example:
    >>> X, y = collect_dataset()
    >>> X.shape
    (150, 4)
    >>> y.shape
    (150,)
    """
    data = load_iris()
    return np.array(data.data), np.array(data.target)


def apply_pca(data_x: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
    """
    Applies Principal Component Analysis (PCA) to reduce dimensionality.

    :param data_x: Original dataset (features)
    :param n_components: Number of principal components to retain
    :return: Tuple containing transformed dataset and explained variance ratio

    Example:
    >>> X, _ = collect_dataset()
    >>> transformed_X, variance = apply_pca(X, 2)
    >>> transformed_X.shape
    (150, 2)
    >>> len(variance) == 2
    True
    """
    # Standardizing the dataset
    scaler = StandardScaler()
    data_x_scaled = scaler.fit_transform(data_x)

    # Applying PCA
    pca = PCA(n_components=n_components)
    principal_components = pca.fit_transform(data_x_scaled)

    return principal_components, pca.explained_variance_ratio_


def main() -> None:
    """
    Driver function to execute PCA and display results.
    """
    data_x, data_y = collect_dataset()

    # Number of principal components to retain
    n_components = 2

    # Apply PCA
    transformed_data, variance_ratio = apply_pca(data_x, n_components)

    print("Transformed Dataset (First 5 rows):")
    print(transformed_data[:5])

    print("\nExplained Variance Ratio:")
    print(variance_ratio)


if __name__ == "__main__":
    doctest.testmod()
    main()
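A quick way to exercise the new module from the repository root is sketched below; the import assumes the repository root is on the import path, and the printed variance numbers are approximate values for the standard scikit-learn Iris dataset:

# Run the doctests and the demo driver directly:
#   python machine_learning/principle_component_analysis.py

# Or call the functions from a session started at the repository root:
from machine_learning.principle_component_analysis import apply_pca, collect_dataset

features, _ = collect_dataset()
transformed, variance_ratio = apply_pca(features, n_components=2)

print(transformed.shape)        # (150, 2)
print(variance_ratio.round(2))  # roughly [0.73 0.23] for the standardized Iris features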
