Please write the answer as source code.
Assignment 4
In this assignment you will use the dataset released by the Department of Transportation. This dataset lists flights that occurred in 2015, along with other information such as delays and flight time.
In this assignment, you will demonstrate good practices for manipulating data with Python's most popular libraries to accomplish the following:
cleaning data with pandas
making specific changes with numpy
handling date-related values with datetime
Note: please consider only the flights departing from BOS, JFK, SFO and LAX.
Each question is equally weighted for the total grade.
import os
import pandas as pd
import pandas.api.types as ptypes
import numpy as np
import datetime as dt
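# os.path.join avoids backslash escapes such as '\a' and '\f' hidden in 'assets\airlines.csv' and 'assets\flights.csv'
airlines_df = pd.read_csv(os.path.join('assets', 'airlines.csv'))
airports_df = pd.read_csv(os.path.join('assets', 'airports.csv'))
flights_df_raw = pd.read_csv(os.path.join('assets', 'flights.csv'), low_memory=False)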
Question 1: Data Preprocessing
For this question, perform the following:
remove rows with missing values
keep flights departing from airports (ORIGIN_AIRPORT) that we want to look at (BOS, JFK, SFO and LAX)
filter out the flights that have more than 1 day delay (DEPARTURE_DELAY)
convert FLIGHT_NUMBER column type to string
SCHEDULED_DEPARTURE is coded as a float in HHMM form: the last two digits give the minutes and the digits before them give the hour (e.g. 1430.0 is 14:30 and 5.0 is 00:05). Convert this column to datetime format by combining the existing columns DAY, MONTH, YEAR and SCHEDULED_DEPARTURE
add an IS_DELAYED column that treats any flight with a departure delay (DEPARTURE_DELAY) above 15 minutes as delayed and every other flight as not delayed
remove YEAR, MONTH, DAY columns
def data_preprocess(flights_df):
    # YOUR CODE HERE
    raise NotImplementedError()
    return df
flights_df = data_preprocess(flights_df_raw.copy())
assert len(flights_df) == 535744, "Q1: There should be 535744 observations in the flights dataframe"
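A minimal sketch of the Question 1 steps, assuming pandas and the column names listed above. The `dropna` subset is an assumption: in the commonly distributed 2015 flights.csv, columns such as CANCELLATION_REASON are empty for most rows, so a bare `dropna()` would likely remove nearly every flight. `AIRPORTS` is a hypothetical helper constant, IS_DELAYED is stored as a boolean (also an assumption), and this has not been checked to reproduce exactly 535744 rows.

```python
import pandas as pd

AIRPORTS = ["BOS", "JFK", "SFO", "LAX"]  # hypothetical helper constant

def data_preprocess(flights_df):
    # Assumption: only the columns used below must be non-null.
    cols = ["YEAR", "MONTH", "DAY", "ORIGIN_AIRPORT", "FLIGHT_NUMBER",
            "SCHEDULED_DEPARTURE", "DEPARTURE_DELAY"]
    df = flights_df.dropna(subset=cols)
    # Keep only the four origin airports of interest.
    df = df[df["ORIGIN_AIRPORT"].isin(AIRPORTS)]
    # Drop flights delayed by more than one day (1440 minutes).
    df = df[df["DEPARTURE_DELAY"] <= 24 * 60].copy()
    df["FLIGHT_NUMBER"] = df["FLIGHT_NUMBER"].astype(str)
    # SCHEDULED_DEPARTURE is HHMM, e.g. 1430.0 -> 14:30, 5.0 -> 00:05.
    hhmm = df["SCHEDULED_DEPARTURE"].astype(int)
    dates = pd.to_datetime(pd.DataFrame(
        {"year": df["YEAR"], "month": df["MONTH"], "day": df["DAY"]}))
    # Adding timedeltas (rather than building hour/minute fields) also copes with 2400.
    df["SCHEDULED_DEPARTURE"] = (dates
                                 + pd.to_timedelta(hhmm // 100, unit="h")
                                 + pd.to_timedelta(hhmm % 100, unit="m"))
    # Delayed means strictly more than 15 minutes late.
    df["IS_DELAYED"] = df["DEPARTURE_DELAY"] > 15
    return df.drop(columns=["YEAR", "MONTH", "DAY"])
```

Adding hour and minute offsets to the date keeps the conversion vectorized instead of formatting strings row by row.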
Question 2
NOTE: The columns used to merge the two dataframes are flights_df['ORIGIN_AIRPORT'] and airports_df['IATA_CODE']. There is no NUM_FLIGHTS column in either dataframe, so it has to be created.
PLEASE MAKE SURE the returned dataframe has shape (4, 1). My NUM_FLIGHTS counts currently do not equal the expected 105276.
Merge flights_df dataframe with airports_df dataframe and return the number of departing flights (NUM_FLIGHTS) per airport (IATA_CODE) across the year.
def flights_per_airport(flights_df, airports_df):
    # YOUR CODE HERE
    raise NotImplementedError()
    return df
num_flights_df=flights_per_airport(flights_df_raw.copy(), airports_df.copy())
assert num_flights_df.shape==(4,1), "Shape of DataFrame should be (4,1)"
assert num_flights_df.columns[0]=='NUM_FLIGHTS', "DataFrame should have a column which is called NUM_FLIGHTS"
assert num_flights_df.loc["BOS", "NUM_FLIGHTS"] == 105276, "The NUM_FLIGHTS for BOS is wrong"
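A possible sketch for Question 2, assuming pandas. The test passes `flights_df_raw`, so this version filters to the four airports itself; the assignment does not say whether 105276 is a raw or post-preprocessing count, so running `data_preprocess` first may be needed (an assumption worth checking against the BOS assertion). `AIRPORTS` is a hypothetical helper constant.

```python
import pandas as pd

AIRPORTS = ["BOS", "JFK", "SFO", "LAX"]  # hypothetical helper constant

def flights_per_airport(flights_df, airports_df):
    # Inner merge: each flight row gains the details of its origin airport.
    merged = flights_df.merge(airports_df, left_on="ORIGIN_AIRPORT",
                              right_on="IATA_CODE", how="inner")
    merged = merged[merged["IATA_CODE"].isin(AIRPORTS)]
    # One row per airport, indexed by IATA_CODE, single column NUM_FLIGHTS -> shape (4, 1).
    return merged.groupby("IATA_CODE").size().to_frame("NUM_FLIGHTS")
```

Indexing the result by IATA_CODE is what makes `num_flights_df.loc["BOS", "NUM_FLIGHTS"]` work.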
Question 3
For this question, find the names of the top three airlines that have a high number of flights and the lowest percentage of delays compared with other airlines. The result should be a dataframe with three columns: AIRLINE_NAME, NUM_FLIGHTS and PERC_DELAY.
NOTE: There are no columns named AIRLINE_NAME, PERC_DELAY or NUM_FLIGHTS, so you have to create them.
Hint:
percentage of delay for each airline is obtained using groupby and apply methods
merge flights_df with airlines_df to get the names of top three airlines
def top_three_airlines(flights_df, airlines_df):
    # YOUR CODE HERE
    raise NotImplementedError()
    return df
top_three_airlines_df = top_three_airlines(flights_df_raw.copy(), airlines_df.copy())
assert sorted(list(top_three_airlines_df.columns)) == sorted(['NUM_FLIGHTS', 'PERC_DELAY', 'AIRLINE_NAME']), "Dataframe doesn't have required columns"
assert top_three_airlines_df.loc[0, 'AIRLINE_NAME'] == 'United Air Lines Inc.', "Top airline name doesn't match"
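One reading of Question 3, sketched under several assumptions: flights_df holds the airline code in an AIRLINE column, airlines_df has IATA_CODE and AIRLINE (name) columns, "delayed" means more than 15 minutes as in Question 1, and airlines are ranked by NUM_FLIGHTS (descending) with PERC_DELAY (ascending) as the tie-breaker. The assignment does not define the ranking precisely, so treat that rule as a guess. `AIRPORTS` is a hypothetical helper constant.

```python
import pandas as pd

AIRPORTS = ["BOS", "JFK", "SFO", "LAX"]  # hypothetical helper constant

def top_three_airlines(flights_df, airlines_df):
    df = flights_df[flights_df["ORIGIN_AIRPORT"].isin(AIRPORTS)]
    df = df.dropna(subset=["DEPARTURE_DELAY"])
    grouped = df.groupby("AIRLINE")["DEPARTURE_DELAY"]
    stats = pd.DataFrame({
        "NUM_FLIGHTS": grouped.size(),
        # groupby + apply, as the hint suggests: share of flights delayed > 15 min.
        "PERC_DELAY": grouped.apply(lambda s: (s > 15).mean()),
    }).reset_index()
    # Assumed airlines_df layout: IATA_CODE (code) and AIRLINE (full name).
    names = airlines_df.rename(columns={"IATA_CODE": "AIRLINE",
                                        "AIRLINE": "AIRLINE_NAME"})
    merged = stats.merge(names, on="AIRLINE", how="left")
    # Assumed ranking rule: most flights first, lower delay share breaks ties.
    ranked = merged.sort_values(["NUM_FLIGHTS", "PERC_DELAY"],
                                ascending=[False, True])
    cols = ["AIRLINE_NAME", "NUM_FLIGHTS", "PERC_DELAY"]
    return ranked.head(3)[cols].reset_index(drop=True)
```

`reset_index(drop=True)` gives the 0-based index that `top_three_airlines_df.loc[0, 'AIRLINE_NAME']` expects.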
Question 4
For this question, obtain the monthly percentage of delays for each ORIGIN_AIRPORT.
Example Result:
MONTH BOS JFK LAX SFO
0 January 0.1902 0.2257 0.1738 0.xxxx
1 February 0.3248 0.xxxx 0.xxxx 0.xxxx
2 March 0.1984 0.xxxx 0.xxxx 0.xxxx
3 April 0.xxxx 0.xxxx 0.xxxx 0.xxxx
def monthly_airport_delays(flights_df):
    # YOUR CODE HERE
    raise NotImplementedError()
    return df
monthly_airport_delays_df = monthly_airport_delays(flights_df_raw.copy())
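A sketch of Question 4 using `pivot_table`, assuming the raw MONTH column holds month numbers 1 to 12 and that "delayed" again means more than 15 minutes. `calendar.month_name` converts the numbers to names to match the example layout; `AIRPORTS` is a hypothetical helper constant. Values are left unrounded; the four decimals in the example look like display formatting.

```python
import calendar
import pandas as pd

AIRPORTS = ["BOS", "JFK", "SFO", "LAX"]  # hypothetical helper constant

def monthly_airport_delays(flights_df):
    df = flights_df[flights_df["ORIGIN_AIRPORT"].isin(AIRPORTS)]
    df = df.dropna(subset=["DEPARTURE_DELAY"]).copy()
    # 1.0 for a delayed flight, 0.0 otherwise, so the mean is the delay fraction.
    df["IS_DELAYED"] = (df["DEPARTURE_DELAY"] > 15).astype(float)
    table = df.pivot_table(index="MONTH", columns="ORIGIN_AIRPORT",
                           values="IS_DELAYED", aggfunc="mean")
    # Replace month numbers with names, e.g. 1 -> "January".
    table.index = [calendar.month_name[int(m)] for m in table.index]
    table.index.name = "MONTH"
    table.columns.name = None
    # Columns come out as MONTH, BOS, JFK, LAX, SFO (airports sorted alphabetically).
    return table.reset_index()
```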
I would like to attach the CSV files, but I can't.
I need help with this assignment.