Completed by Di Wu, Samuel Chao, Bry Power
Final project for CSCI E-107 / Bio 260 Introduction to Data Science: Airbnb New User Bookings
- finalproject.Rmd
- finalproject.html
- SAS code.sas contains the data cleaning and manipulation done in SAS
- data.zip contains the original data files from Kaggle
- ModelDst.csv contains the cleaned data used for modeling
Here is the motivation and project objective:
As we’ve entered the digital era, personalized marketing has become increasingly important to the profitability of any customer-facing company. Accurately predicting what customers might need is key to achieving higher margins, and this is where data science can help. In this project, our main task is to predict the country in which a new user will make their first booking.
Another reason we chose this project is that we want to apply what we’ve learned in class to real, popular, and interesting data, and AirBnB’s data has exactly those qualities. The company has successfully leveraged data science to turn theory into real profits, and we want to be part of that by exploring its data. It’s exciting to work with data from a company such as AirBnB that is at the forefront of data science. In this project, we will work not only with numerical data but also with time and categorical data, which lets us learn how to handle more types of data.
There is also a social component to a service like AirBnB that is very relevant to young millennials like ourselves, who have grown up with social networks and crowdsourcing. We are now more connected than ever before.
The objective of this project is to use predictive statistics to identify the destination countries of first-time users. There are 12 possible outcomes: United States, France, Canada, United Kingdom, Spain, Italy, Portugal, Netherlands, Germany, Australia, NDF (no destination found), and other. This makes it a multi-class classification problem. The inferential part of the project will probably be to identify which features most affect users’ decisions. As we explore further, we may form hypotheses, such as whether gender or country of origin influences a user’s decision. We may also be interested in learning how time features affect booking behavior. Before we dive into the data, it is difficult to tell what kinds of inferences we can make, but this intuition is a start.
From the perspective of AirBnB’s data science team, understanding booking behavior and its key drivers helps the marketing department better position its strategy, personalize the content of marketing material, and increase revenue. For us, this is exciting because we will gain insights from exploratory data analysis, figure out how to present them, and discover and apply many modeling techniques in a real project.
For a detailed data description, please see here