This repo contains an evolving open source book, "Efficient Python for Data Science, Machine Learning, and Software Engineering". It aims to cover topics ranging from advanced Python programming and advanced usage of Python libraries such as Numpy, Pandas, Scipy, Scikit, Keras, Tensorflow, Spark, and Hadoop, to case studies of data science and machine learning applications, and big data architecture design and engineering in the Python framework.
Leading contributors: wyardt (https://github.com/wyardt, mainly spiritual support, haha), yangyutu (https://github.com/yangyutu)
The tentative chapters are:
- Pythonic Advanced Programming
- Efficient Numpy
- Efficient Pandas: Series
- Efficient Pandas: DataFrame
- Efficient Linear Algebra via Numpy and Scipy
- Mathematical Optimization via Scipy
- Data Visualization
- Linear Models For Regression
- Linear Models For Classification
- Tree Methods
- Support Vector Machines
- Ensemble Learning Methods
- Unsupervised Learning
- Deep Learning via Keras and Tensorflow
- Using Apache Spark
- Modern Big Data Processing with Hadoop
- Databases