|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "The three practice problems in the following sections will help you prepare for the project. They all use the same mini version of the chicago.csv dataset that you will use for the project." |
| 8 | + ] |
| 9 | + }, |
| 10 | + { |
| 11 | + "cell_type": "markdown", |
| 12 | + "metadata": {}, |
| 13 | + "source": [ |
| 14 | + "#### Practice Problem #1: Compute the Most Popular Start Hour\n", |
| 15 | + "\n", |
| 16 | + "Use pandas to load chicago.csv into a dataframe, and find the most frequent hour when people start traveling. There isn't an hour column in this dataset, but you can create one by extracting the hour from the \"Start Time\" column. To do this, convert \"Start Time\" to the datetime datatype using the pandas to_datetime() function, then extract properties such as the hour through the column's .dt accessor.\n", |
| 17 | + "\n", |
| 18 | + "Hint: Another way to describe the most common value in a column is the mode." |
| 19 | + ] |
| 20 | + }, |
| 21 | + { |
| 22 | + "cell_type": "code", |
| 23 | + "execution_count": null, |
| 24 | + "metadata": {}, |
| 25 | + "outputs": [], |
| 26 | + "source": [ |
| 27 | + "import pandas as pd\n", |
| 28 | + "\n", |
| 29 | + "filename = 'chicago.csv'\n", |
| 30 | + "\n", |
| 31 | + "## load data file into a dataframe\n", |
| 32 | + "df = pd.read_csv(filename)\n", |
| 33 | + "\n", |
| 34 | + "## convert the Start Time column to datetime\n", |
| 35 | + "df['Start Time'] = pd.to_datetime(df['Start Time'])\n", |
| 36 | + "\n", |
| 37 | + "## extract hour from the Start Time column to create an hour column\n", |
| 38 | + "df['hour'] = df['Start Time'].dt.hour\n", |
| 39 | + "\n", |
| 40 | + "## find the most popular hour\n", |
| 41 | + "popular_hour = df['hour'].mode()[0]\n", |
| 42 | + " \n", |
| 43 | + "print('Most Popular Start Hour:', popular_hour)" |
| 44 | + ] |
| 45 | + }, |
| 46 | + { |
| 47 | + "cell_type": "markdown", |
| 48 | + "metadata": {}, |
| 49 | + "source": [ |
| 50 | + "#### Practice Problem #2: Display a Breakdown of User Types\n", |
| 51 | + "\n", |
| 52 | + "There are different types of users specified in the \"User Type\" column. Find how many there are of each type and store the counts in a pandas Series in the user_types variable.\n", |
| 53 | + "\n", |
| 54 | + "Hint: What pandas function returns a Series with the counts of each unique value in a column?" |
| 55 | + ] |
| 56 | + }, |
| 57 | + { |
| 58 | + "cell_type": "code", |
| 59 | + "execution_count": null, |
| 60 | + "metadata": {}, |
| 61 | + "outputs": [], |
| 62 | + "source": [ |
| 63 | + "import pandas as pd\n", |
| 64 | + "\n", |
| 65 | + "filename = 'chicago.csv'\n", |
| 66 | + "\n", |
| 67 | + "## load data file into a dataframe\n", |
| 68 | + "df = pd.read_csv(filename)\n", |
| 69 | + "\n", |
| 70 | + "## print value counts for each user type\n", |
| 71 | + "user_types = df['User Type'].value_counts()\n", |
| 72 | + "\n", |
| 73 | + "print(user_types)" |
| 74 | + ] |
| 75 | + }, |
| 76 | + { |
| 77 | + "cell_type": "markdown", |
| 78 | + "metadata": {}, |
| 79 | + "source": [ |
| 80 | + "#### Practice Problem #3: Load and Filter the Dataset\n", |
| 81 | + "\n", |
| 82 | + "This is a bit of a bigger task, which involves choosing a dataset to load and filtering it based on a specified month and day. In the quiz below, you'll implement the load_data() function, which you can use directly in your project. There are four steps:\n", |
| 83 | + "\n", |
| 84 | + " 1- Load the dataset for the specified city. Index the global CITY_DATA dictionary object to get the corresponding filename for the given city name.\n", |
| 85 | + " 2- Create month and day_of_week columns. Convert the \"Start Time\" column to datetime and extract the month number and weekday name into separate columns using the pandas .dt accessor.\n", |
| 86 | + " 3- Filter by month. Since the month parameter is given as the name of the month, you'll need to first convert this to the corresponding month number. Then, select rows of the dataframe that have the specified month and reassign this as the new dataframe.\n", |
| 87 | + " 4- Filter by day of week. Select rows of the dataframe that have the specified day of week and reassign this as the new dataframe. (Note: Capitalize the day parameter with the title() method to match the title case used in the day_of_week column!)" |
| 88 | + ] |
| 89 | + }, |
| 90 | + { |
| 91 | + "cell_type": "code", |
| 92 | + "execution_count": null, |
| 93 | + "metadata": {}, |
| 94 | + "outputs": [], |
| 95 | + "source": [ |
| 96 | + "import pandas as pd\n", |
| 97 | + "\n", |
| 98 | + "CITY_DATA = { 'chicago': 'chicago.csv',\n", |
| 99 | + " 'new york city': 'new_york_city.csv',\n", |
| 100 | + " 'washington': 'washington.csv' }\n", |
| 101 | + "\n", |
| 102 | + "def load_data(city, month, day):\n", |
| 103 | + " \"\"\"\n", |
| 104 | + " Loads data for the specified city and filters by month and day if applicable.\n", |
| 105 | + "\n", |
| 106 | + " Args:\n", |
| 107 | + " (str) city - name of the city to analyze\n", |
| 108 | + " (str) month - name of the month to filter by, or \"all\" to apply no month filter\n", |
| 109 | + " (str) day - name of the day of week to filter by, or \"all\" to apply no day filter\n", |
| 110 | + " Returns:\n", |
| 111 | + " df - Pandas DataFrame containing city data filtered by month and day\n", |
| 112 | + " \"\"\"\n", |
| 113 | + " \n", |
| 114 | + " # load data file into a dataframe\n", |
| 115 | + " df = pd.read_csv(CITY_DATA[city])\n", |
| 116 | + "\n", |
| 117 | + " # convert the Start Time column to datetime\n", |
| 118 | + " df['Start Time'] = pd.to_datetime(df['Start Time'])\n", |
| 119 | + "\n", |
| 120 | + " # extract month and day of week from Start Time to create new columns\n", |
| 121 | + " df['month'] = df['Start Time'].dt.month\n", |
| 122 | + " df['day_of_week'] = df['Start Time'].dt.day_name()\n", |
| 123 | + "\n", |
| 124 | + " # filter by month if applicable\n", |
| 125 | + " if month != 'all':\n", |
| 126 | + " # use the index of the months list to get the corresponding int\n", |
| 127 | + " months = ['january', 'february', 'march', 'april', 'may', 'june']\n", |
| 128 | + " month = months.index(month) + 1\n", |
| 129 | + " \n", |
| 130 | + " # filter by month to create the new dataframe\n", |
| 131 | + " df = df[df['month'] == month]\n", |
| 132 | + "\n", |
| 133 | + " # filter by day of week if applicable\n", |
| 134 | + " if day != 'all':\n", |
| 135 | + " # filter by day of week to create the new dataframe\n", |
| 136 | + " df = df[df['day_of_week'] == day.title()]\n", |
| 137 | + " \n", |
| 138 | + " return df\n", |
| 139 | + " \n", |
| 140 | + "df = load_data('chicago', 'march', 'friday')" |
| 141 | + ] |
| 142 | + } |
| 143 | + ], |
| 144 | + "metadata": { |
| 145 | + "kernelspec": { |
| 146 | + "display_name": "Python 3", |
| 147 | + "language": "python", |
| 148 | + "name": "python3" |
| 149 | + }, |
| 150 | + "language_info": { |
| 151 | + "codemirror_mode": { |
| 152 | + "name": "ipython", |
| 153 | + "version": 3 |
| 154 | + }, |
| 155 | + "file_extension": ".py", |
| 156 | + "mimetype": "text/x-python", |
| 157 | + "name": "python", |
| 158 | + "nbconvert_exporter": "python", |
| 159 | + "pygments_lexer": "ipython3", |
| 160 | + "version": "3.11.0" |
| 161 | + }, |
| 162 | + "orig_nbformat": 4 |
| 163 | + }, |
| 164 | + "nbformat": 4, |
| 165 | + "nbformat_minor": 2 |
| 166 | +} |