Airbnb is an online marketplace where people rent out their own places to travelers who need a roof over their heads. Like every other tech startup that grew into a giant, it has its origin in Silicon Valley. Well, almost. In 2005, approximately 5,000 km east of Silicon Valley in Providence, Rhode Island, the garage story of Brian Chesky and Joe Gebbia began.
Now, more than 15 years later, their marketplace covers more than 100,000 cities and 220 countries and regions around the globe. The broad variety of properties to rent (single rooms, houses, yachts, and even castles) makes this marketplace special. In this article, we will analyze which features drive the price, and we will build a model to predict prices in a historic city: Boston, Massachusetts.
For this analysis, I used a data set from Kaggle that covers the following Airbnb activity:
- Listings, including full descriptions and average review score
- Reviews, including unique id for each reviewer and detailed comments
- Calendar, including listing id and the price and availability for that day
This data covers more than 3,500 reviews and listings from September 2016 to September 2017 and describes homestay activity in Boston, MA.
The world is standing still right now, but once this pandemic is over, people will start traveling around the globe again and accommodations will be in demand. Hosts make money by renting out single rooms or a whole property, or they build an attractive business by investing in more properties and turning them into Airbnbs.
Do you also plan to earn some extra cash? Want to know what to take into consideration when you start hosting on Airbnb? Which features matter to guests, so that you as a host can set a reasonable price and attract more of them? And will we be able to select the right estimators to predict the price? Let’s see what can make YOU a good host.
Exploratory Data Analysis (EDA)
Let’s dive into our data and visualize what the exploration has to say. Since we want to see where most of Boston's listings are located and how they are priced, a map of Boston gives a good first overview.
As expected, the majority of the accommodations are listed around the city center, which also has the highest price range. North Station, Cambridgeport, Brighton, and East Boston seem to be very nice places for a sleepover, while the listings in the suburbs farther away from the city center are priced lower.
So if you want to compete in a densely listed area, it’s more than important to stand out with a fair price. But what makes a fair price, and which of Boston's neighborhoods are highly competitive on price?
It is not a surprise that the top four neighborhoods are located between the Financial District and Chinatown. Thanks to sightseeing and opportunities to go out, these locations are a hot spot for business people and tourists who are willing to pay a higher price for a night. Hosts there ask between $200 and $250 per night on average, while hosts in the suburbs ask between $50 and $100. But if you are a tourist on a budget and would still like to spend a night in the marvelous downtown of Boston, you can do that.
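A ranking like this can be produced with a simple group-by. The following is only a minimal sketch, not the original analysis code: the rows are made-up sample values, and the column names `neighbourhood` and `price` are assumptions about how the listings table is laid out.

```python
import pandas as pd

# Hypothetical sample rows; the real listings file has thousands of entries.
# Column names are assumptions made for this illustration.
listings = pd.DataFrame({
    "neighbourhood": ["Downtown", "Downtown", "Chinatown",
                      "Chinatown", "Roslindale", "Roslindale"],
    "price": [250.0, 210.0, 230.0, 240.0, 60.0, 90.0],
})

# Average nightly price per neighborhood, most expensive first
avg_price = (
    listings.groupby("neighbourhood")["price"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_price)
```

Sorting the averages makes it easy to read off the most competitive neighborhoods at the top and the cheaper suburbs at the bottom.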
As seen in Fig. 2, prices in Downtown range from $11 to $400, and thanks to shared and private rooms you can experience this historic city at midnight for an affordable price. On average, you can stay in such rooms for $50 to $75 per night. Let’s dig into some further observations.
It is nice to see that in Boston you can spend the same money on either an apartment downtown or a villa outside the center, and so get to know both sides of the city’s vibe. And if you ever wanted to spend a night on the Atlantic with a view of downtown at midnight, feel free to book a boat for a memorable overnight stay.
How well can we predict the price?
To get a precise model, we need to prepare the data carefully. We thin out the data set by dropping columns and rows with missing entries. Since not every feature has an impact on the price, we also have to filter for the features that do.
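The cleaning step might look roughly like the sketch below. It is an illustration under stated assumptions, not the code used for the results in this article: the helper `clean_listings`, the 50% threshold, the sample values, and the assumption that prices are stored as strings such as "$1,200.00" are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical raw rows; column names and values are assumptions.
raw = pd.DataFrame({
    "price": ["$150.00", "$1,200.00", "$75.00", None],
    "bedrooms": [1.0, 4.0, np.nan, 2.0],
    "square_feet": [np.nan, np.nan, np.nan, 800.0],  # mostly empty column
})

def clean_listings(df, max_missing_share=0.5):
    """Drop sparse columns, then drop rows that still have gaps."""
    # Keep only columns where at most `max_missing_share` of values are missing
    keep = df.columns[df.isna().mean() <= max_missing_share]
    out = df[keep].dropna().copy()
    # Turn price strings like "$1,200.00" into floats
    out["price"] = (
        out["price"].str.replace(r"[$,]", "", regex=True).astype(float)
    )
    return out.reset_index(drop=True)

cleaned = clean_listings(raw)
print(cleaned)
```

Dropping sparse columns first keeps more rows than dropping rows right away, because a single mostly-empty column would otherwise wipe out almost the whole table.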
A heatmap helps to show which numerical features correlate positively or negatively with the price.
The darker a cell is, the stronger the correlation between the price and that feature. We can see that the cleaning fee has a strong positive correlation with the price. This makes sense, since you would expect a high cleaning fee for larger and more exquisite accommodations, like condos or villas. Our model will include a number of numerical features as well as categorical features, like the neighborhood.
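The numbers behind such a heatmap are plain pairwise correlations. Here is a minimal sketch with made-up values; the column names are assumptions, and the real data set has many more numerical features.

```python
import pandas as pd

# Hypothetical sample values; column names are assumptions.
sample = pd.DataFrame({
    "price": [60, 90, 150, 220, 400],
    "cleaning_fee": [10, 20, 40, 60, 120],
    "accommodates": [1, 2, 3, 4, 8],
    "number_of_reviews": [120, 80, 40, 30, 5],
})

# Pearson correlation of every numerical feature with the price
corr_with_price = (
    sample.corr()["price"]
    .drop("price")                # the price correlates perfectly with itself
    .sort_values(ascending=False)
)
print(corr_with_price)
```

The full matrix from `sample.corr()` is what a plotting function such as seaborn's `heatmap` would take as input to draw the colored grid.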
For the price prediction, we define a success metric to compare how well our ML models perform. Here we look specifically at the Root Mean Square Error (RMSE). The goal is to build and train a model that achieves an RMSE of 15% or less.
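For reference, the standard definition of the RMSE is written out below, where y_i is the observed price of listing i, ŷ_i is the predicted price, and n is the number of listings. The RMSE itself is measured in the same unit as the price. Expressing it as a percentage requires a normalization; dividing by the mean observed price, as in the second formula, is one common choice and is an assumption here, not necessarily the one behind the numbers in this article.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
\mathrm{RMSE}_{\%} = 100 \cdot \frac{\mathrm{RMSE}}{\bar{y}},
\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```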
Model 1: Linear Regression, RMSE = 36.36%
Model 2: Ridge Regression, RMSE = 36.35%
Model 3: Lasso Regression, RMSE = 38.53%
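A comparison of these three model types can be set up in a few lines with scikit-learn. The sketch below runs on synthetic data, so its numbers will not match the results above. The features, the `alpha` values, the 70/30 split, and the choice to divide the RMSE by the mean test price to get a percentage are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned listings (assumption, not real data)
rng = np.random.default_rng(42)
n = 300
accommodates = rng.integers(1, 9, size=n)
cleaning_fee = 10 * accommodates + rng.normal(0, 5, size=n)
X = np.column_stack([accommodates, cleaning_fee])
y = 30 + 25 * accommodates + 0.5 * cleaning_fee + rng.normal(0, 20, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Lasso Regression": Lasso(alpha=1.0),
}

relative_rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))
    # Express the error relative to the mean test price (assumed normalization)
    relative_rmse[name] = rmse / y_test.mean()
    print(f"{name}: RMSE = {relative_rmse[name]:.2%}")
```

Fitting all models on the same split keeps the comparison fair, since every model sees exactly the same training and test listings.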
We are able to predict the price with our features, but unfortunately the error is well above our 15% target. Since so many factors can affect the price and not all of them are captured in the data, we can treat this as a reasonable first result. Keep in mind that an RMSE of about 36% describes the typical size of the prediction error; it does not mean that the model is right a fixed share of the time.
In this article, we took a look at how to predict the price of an Airbnb listing in Boston, using the Boston Airbnb Open Data set on Kaggle.
- As a host, you definitely need to check first in which area you want to rent out. You will not be able to compete if you charge high prices in the suburbs, and charging low prices in expensive areas like downtown is not cost-efficient. The ultimate goal is still to make a profit, not to host as many accommodations as possible.
- After splitting Boston into neighborhoods to see how the price is distributed in each area, we focused on the listings themselves. We identified 3 different room types, and we clearly observe a price gap between hosted rooms and entire homes/apartments (Fig. 3). Do you already have an apartment in Chinatown? Why not share one room and earn some extra money? Do you own a boat in the beautiful harbor of Boston? You can host it for $250 a night and give people a tremendous experience.
- Finally, we built an ML model to predict the price from various features. Our result wasn’t as accurate as desired. We trained 3 different models with almost identical outcomes. Ridge and Linear Regression had the lowest error at approx. 36%, which is well above the 15% target.
This little observation is based on personal exploration and is not the result of a formal study. To see more about this analysis, see the link to my GitHub available here.