A Udacity Data Scientist Nanodegree Project
No libraries beyond the Anaconda distribution of Python are needed to run the code here. The code should run with no issues on any Python 3.* version.
For this project, I was interested in using the FIFA 18 Complete Player Dataset to better understand:
Question 1: Which nations have the most soccer players in FIFA 18? List the top 20 nations.
Question 2: What is the age distribution of the FIFA 18 players?
Question 3: Which 10 clubs have the highest total player market value, and which have the highest average player wage?
Question 4: Choose the best squad
Question 5: How do Age, Overall, Potential, Position, Club, Nationality, and Special correlate with Value and Wage?
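As a rough illustration of the kind of computation behind Questions 1 and 3, here is a minimal sketch that counts players per nation and sums market value per club. It uses only the Python standard library and a few made-up rows so that it runs without the dataset. The column names (`Nationality`, `Club`, `Value`), the sample players, and the money format (`€95.5M`, `€565K`) are assumptions for illustration, and the helper names `parse_money`, `top_nations`, and `total_value_by_club` are hypothetical rather than taken from the notebook.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows. The column names and the "€95.5M" style of
# money string are assumptions, not confirmed by this README.
players = [
    {"Name": "Player A", "Nationality": "Spain", "Club": "Club X", "Value": "€95.5M"},
    {"Name": "Player B", "Nationality": "Spain", "Club": "Club Y", "Value": "€10M"},
    {"Name": "Player C", "Nationality": "Brazil", "Club": "Club X", "Value": "€20.5M"},
]


def parse_money(text):
    """Convert a string such as '€95.5M' or '€565K' to a float in euros."""
    multipliers = {"M": 1_000_000, "K": 1_000}
    cleaned = text.strip().lstrip("€")
    if not cleaned:
        return 0.0
    if cleaned[-1] in multipliers:
        return float(cleaned[:-1]) * multipliers[cleaned[-1]]
    return float(cleaned)


def top_nations(rows, n=20):
    """Return the n most common nationalities as (nation, count) pairs."""
    return Counter(row["Nationality"] for row in rows).most_common(n)


def total_value_by_club(rows, n=10):
    """Return the n clubs with the highest summed player value."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Club"]] += parse_money(row["Value"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


print(top_nations(players))
print(total_value_by_club(players))
```

With the real data, the same grouping would more naturally be done with a pandas `groupby`; the standard-library version is used here only to keep the sketch self-contained.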
There is one notebook available here to showcase work related to the above questions. The notebook is exploratory, searching through the data pertaining to each of the questions. Markdown cells and comments are used to walk through the thought process for individual steps.
PlayerAttributeData.csv - This file contains Player performance attributes (Overall, Potential, Aggression, Agility etc.) indexed by player id.
PlayerPersonalData.csv - This file contains basic Player personal attributes (Nationality, Club, Photo, Age, Wage, Value etc.).
PlayerPlayingPositionData.csv - This file contains Player preferred position and ratings at all positions.
CompleteDataset.csv - This file is the complete dataset and contains all the information in the three files above.
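To show how the separate files relate to the complete dataset, here is a minimal sketch that joins two small tables on a shared player id. The inline CSV text stands in for the real files. The `ID` column name, the sample values, and the assumption that every file carries the same id column are illustrative guesses, and `read_by_id` and `join_on_id` are hypothetical helpers.

```python
import csv
import io

# Tiny inline stand-ins for two of the CSV files. The "ID" column name and
# the sample values are assumptions for illustration only.
personal_csv = "ID,Name,Age\n1,Player A,30\n2,Player B,24\n"
attribute_csv = "ID,Overall,Potential\n1,90,90\n2,78,85\n"


def read_by_id(text, key="ID"):
    """Parse CSV text into a dict that maps each player id to its row."""
    return {row[key]: dict(row) for row in csv.DictReader(io.StringIO(text))}


def join_on_id(*tables):
    """Merge rows that share an id across all tables (an inner join)."""
    shared_ids = set(tables[0]).intersection(*tables[1:])
    merged = []
    for player_id in sorted(shared_ids):
        combined_row = {}
        for table in tables:
            combined_row.update(table[player_id])
        merged.append(combined_row)
    return merged


combined = join_on_id(read_by_id(personal_csv), read_by_id(attribute_csv))
for row in combined:
    print(row)
```

Because CompleteDataset.csv already contains all of this information, loading that single file avoids the join entirely.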
The main findings of the code can be found at the post available here.
Credit goes to Aman Shrivastava and EA Sports for the data. You can find the licensing for the data and other descriptive information at the Kaggle link available here. Otherwise, feel free to use the code here as you would like!