Fairsterdam
An introductory video about our product, Fairsterdam, accompanies this report.
Amsterdam is one of the earliest and largest markets for Airbnb. Although tourism and short-term holiday rentals helped Amsterdam renovate and revitalize its historic district, unregulated expansion has brought serious problems to local communities. By pushing out local businesses, driving up both short-term and long-term rental prices, and bringing too many strangers and impolite tourists into residential neighborhoods, Airbnb is affecting the city, and especially its communities, in a negative way. With concerns about Airbnb rising, some cities see the current public health situation as an opportunity to mark a turning point in Airbnb’s expansion.
We are developing an exploratory machine learning model to predict the annual revenue of a new Airbnb listing. The dependent variable, annual revenue, is calculated by aggregating price and occupied time over the whole year.
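The revenue definition above can be sketched on a toy monthly panel. This is only an illustration under assumptions: the listings and values are made up, and the column names month_price and monthly_occupancy are borrowed from the data preparation code later in this report.

```r
# Toy monthly panel for two hypothetical listings (all values are made up).
panel <- data.frame(
  listing_id        = rep(c("A", "B"), each = 3),
  month             = rep(1:3, times = 2),
  month_price       = c(100, 110, 120, 80, 80, 90),  # mean nightly price
  monthly_occupancy = c(10, 12, 20, 5, 0, 15),       # booked nights
  stringsAsFactors  = FALSE
)

# Revenue per month = mean nightly price x booked nights
panel$monthly_revenue <- panel$month_price * panel$monthly_occupancy

# Sum the months to get one revenue figure per listing
annual_revenue <- aggregate(monthly_revenue ~ listing_id, data = panel, FUN = sum)
names(annual_revenue)[2] <- "annual_revenue"
print(annual_revenue)
```

With only three months per listing the totals are not annual figures; the same aggregation applies unchanged to a full twelve-month panel.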
Our broader goal is to develop an integrated system that provides not only the predicted tax revenue but also feedback and opinions about the new listing from community members. Our predictions of tax revenue and monthly occupancy will be sent to community members to inform them about the activities and amenities that the economic gains can support. Fully informed about the pros and cons, they will report their opinions back to the government. Ideally, their opinions will be taken into consideration in the decision-making process. Since we don’t have actual survey data from the community, this report focuses only on the algorithm for predicting annual revenue.
This report consists of two parts. The first is an exploratory analysis of the features that influence Airbnb revenue, which is directly related to price and occupancy. The second is a machine learning model that uses the correlated features to predict annual revenue; this model is tested and validated in several ways.
Because Airbnb is a large business that affects cities globally, this research method and algorithm can be used to explore Airbnb in other cities and regions. Airbnb markets in different cities rely heavily on tourism and have a lot in common. But they are also shaped by local culture, regulations, and spatial and business traditions, so the analysis will need to be adapted to each context.
In the real estate market, to price a new lease and predict its vacancy rate, the operator or consultant searches for comparable properties, which are generally located near the new project and have similar features and target markets. But this kind of market analysis is usually limited to a small data set. Also, to weigh how different features affect comparability, they use common parameters across different properties. These parameters are often set as a range, with the exact value chosen subjectively. Since we have a large data set, we can estimate the coefficients, and thus the prediction, more precisely.
Our model is still based on the hedonic model, but more features are included through data exploration. Most published Airbnb prediction studies online predict only price and include a limited set of features. (Click here to see an example) Since we are predicting revenue, which is the actual benefit, our model takes both price and occupancy into consideration. Though our prediction has larger errors than predictions of price alone (since occupancy is more volatile and less closely tied to physical features), we believe it is still valuable. We also include more independent variables. Apart from basic features, we include 1) public amenities and tourist attractions, 2) a spatial lag, calculated as the average revenue of the nearest five listings, and 3) complementary features extracted from names and descriptions.
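The spatial lag mentioned above (the average revenue of the five nearest listings) can be sketched as follows. This is a minimal base R illustration, not the report's implementation: the helper spatial_lag and the toy coordinates and revenues are hypothetical, and a full distance matrix is only practical for small data (a k-nearest-neighbor search such as the one in FNN would be the natural choice at the scale of this data set).

```r
# Toy listings: projected x/y coordinates in meters and annual revenue (made up).
listings_toy <- data.frame(
  id      = 1:7,
  x       = c(0, 10, 20, 30, 40, 50, 1000),
  y       = c(0,  0,  0,  0,  0,  0,    0),
  revenue = c(10, 20, 30, 40, 50, 60, 700)
)

# Hypothetical helper: mean of `values` over the k nearest other points
spatial_lag <- function(coords, values, k = 5) {
  d <- as.matrix(dist(coords))  # pairwise Euclidean distances
  diag(d) <- Inf                # a listing is not its own neighbor
  sapply(seq_len(nrow(d)), function(i) {
    nearest <- order(d[i, ])[seq_len(k)]
    mean(values[nearest])
  })
}

listings_toy$lag_revenue <- spatial_lag(listings_toy[, c("x", "y")],
                                        listings_toy$revenue, k = 5)
print(listings_toy)
```

Note that the listing's own revenue is excluded from its lag, which matters when the lag is later used to predict that same revenue.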
Our model is far from perfect, but as a public algorithm based on public data, we believe it will help the public understand more about the benefits and costs of Airbnb.
To predict annual revenue, our model is based on the three aspects listed below. The hedonic component covers the internal and external features of the apartments: their physical condition and their spatial relationship to amenities and POIs. The time factor accounts for the seasonality of tourism. Text is used to see how hosts advertise their properties and which descriptions might be adding value.
Physical Characteristics: basic features such as room number, room type, amenities
Spatial Processing: Neighborhood Effect, Spatial Lag
Spatial Features: Distance to transit, supermarkets, tourist attractions, city center
Seasonality: monthly differences of price and occupancy
Names: Key features advertised
Descriptions: Features not formally listed that are adding value to the listing
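To make the hedonic idea concrete, here is a minimal sketch of a linear regression of revenue on a physical feature, a spatial feature, and a seasonal indicator. Everything in it is an assumption for illustration: the data are simulated, and the variable names (bedrooms, dist_metro, peak) and coefficients are invented rather than taken from the model in this report.

```r
set.seed(1)
n <- 200

# Simulated listings (not real Airbnb data)
toy <- data.frame(
  bedrooms   = sample(1:4, n, replace = TRUE),   # physical characteristic
  dist_metro = runif(n, 50, 2000),               # spatial feature, in meters
  peak       = rep(c(0, 1), length.out = n)      # 1 = peak-season month
)

# Revenue generated from known coefficients plus noise
toy$revenue <- 500 + 300 * toy$bedrooms - 0.2 * toy$dist_metro +
  150 * toy$peak + rnorm(n, sd = 50)

# Hedonic regression: each coefficient prices one characteristic
fit <- lm(revenue ~ bedrooms + dist_metro + peak, data = toy)
print(round(coef(fit), 3))
```

Because the data are simulated from known coefficients, the fitted values should land close to 300 per bedroom and a small negative slope on distance; with real listings the coefficients are what the model has to discover.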
Our data sources include:
Airbnb Data of Amsterdam
This data set provides information about Airbnb listings, including price, physical features of the apartment, and location. It also has a calendar data frame, which includes up-to-date data on price and occupancy status.
Amsterdam Open Data (Maps)
This is open data provided by the Amsterdam government, with features such as neighborhood boundaries, the UNESCO zone, and public amenities.
OpenStreetMap Data downloaded with package “osmdata”
We use point data from OpenStreetMap on convenience stores, shopping malls, supermarkets, and other amenities that tourists care about.
Tourist Attractions and POIs
This is a public data set provided by Tourpedia. We used data of POIs and tourist attractions in Amsterdam.
Since the data is quite large, we suggest downloading all of it in advance. All the data can be downloaded HERE.
library(tidyverse)
library(sf)
library(RSocrata)
library(viridis)
library(spatstat)
library(ggplot2)
library(raster)
library(spdep)
library(FNN)
library(mapview)
library(grid)
library(gridExtra)
library(knitr)
library(stringr)
library(kableExtra)
library(tidycensus)
library(lubridate)
library(stargazer)
library(scales)
library(RColorBrewer)
library(ggthemes)
library(readr)
library(ggcorrplot)
library(caret)
library(sjPlot)
library(sjmisc)
library(sjlabelled)
library(osmdata)
#text mining
library(tm)
library(wordcloud2)
library(SnowballC)
options(scipen=999)
palette5 <- c("#E46B45","#BD665C","#966174","#6E5C8B","#4757A2")
palette4 <- c("#E46B45","#BD665C","#6E5C8B","#4757A2")
palette2 <- c("#e46b45","#4757a2")
qBr <- function(df, variable, rnd) {
if (missing(rnd)) {
as.character(quantile(round(df[[variable]],0),
c(.01,.2,.4,.6,.8), na.rm=T))
} else if (rnd == FALSE) {
# probs and na.rm are arguments of quantile(), not of as.character()
as.character(formatC(quantile(df[[variable]],
c(.01,.2,.4,.6,.8), na.rm=T), digits = 3))
}
}
qBr2 <- function(df, variable, rnd) {
if (missing(rnd)) {
as.character(round(quantile(round(df[[variable]],0),
c(.01,.2,.4,.6,.8), na.rm=T)))
} else if (rnd == FALSE) {
# round() needs numeric input, so round the quantiles before converting to text
as.character(round(quantile(df[[variable]],
c(.01,.2,.4,.6,.8), na.rm=T), 0))
}
}
q5 <- function(variable) {as.factor(ntile(variable, 5))}
plotTheme <- function(base_size = 12) {
theme(
text = element_text( color = "black"),
plot.title = element_text(size = 14,colour = "black"),
plot.subtitle = element_text(face="italic"),
plot.caption = element_text(hjust=0),
axis.ticks = element_blank(),
panel.background = element_blank(),
panel.grid.major = element_line("grey80", size = 0.1),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "black", fill=NA, size=2),
strip.background = element_rect(fill = "grey80", color = "white"),
strip.text = element_text(size=12),
axis.title = element_text(size=12),
axis.text = element_text(size=10),
plot.background = element_blank(),
legend.background = element_blank(),
legend.title = element_text(colour = "black", face = "italic"),
legend.text = element_text(colour = "black", face = "italic"),
strip.text.x = element_text(size = 14)
)
}
mapTheme <- theme(plot.title =element_text(size=12),
plot.subtitle = element_text(size=8),
plot.caption = element_text(size = 8),
axis.line=element_blank(),
axis.text.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank(),
panel.background=element_blank(),
panel.border = element_rect(colour = "black", fill=NA, size=2),
panel.grid.major=element_line(colour = 'grey92'),
panel.grid.minor=element_blank(),
legend.direction = "vertical",
legend.position = "right",
plot.margin = margin(1, 1, 1, 1, 'cm'),
legend.key.height = unit(1, "cm"), legend.key.width = unit(0.2, "cm"))
nn_function <- function(measureFrom,measureTo,k) {
measureFrom_Matrix <- as.matrix(measureFrom)
measureTo_Matrix <- as.matrix(measureTo)
nn <-
# use the matrices built above; rows of nn.dist correspond to measureFrom
get.knnx(measureTo_Matrix, measureFrom_Matrix, k)$nn.dist
output <-
as.data.frame(nn) %>%
rownames_to_column(var = "thisPoint") %>%
gather(points, point_distance, V1:ncol(.)) %>%
arrange(as.numeric(thisPoint)) %>%
group_by(thisPoint) %>%
summarize(pointDistance = mean(point_distance)) %>%
arrange(as.numeric(thisPoint)) %>%
dplyr::select(-thisPoint) %>%
pull()
return(output)
}
rquery.wordcloud <- function(x, type=c("text", "url", "file"),
lang="english", excludeWords=NULL,
textStemming=FALSE, colorPalette="Dark2",
min.freq=3, max.words=200)
{
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
if(type[1]=="file") text <- readLines(x)
else if(type[1]=="url") text <- html_to_text(x)
else if(type[1]=="text") text <- x
# Load the text as a corpus
docs <- Corpus(VectorSource(text))
# Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
# Remove numbers
docs <- tm_map(docs, removeNumbers)
# Remove stopwords for the language
docs <- tm_map(docs, removeWords, stopwords(lang))
# Remove punctuations
docs <- tm_map(docs, removePunctuation)
# Eliminate extra white spaces
docs <- tm_map(docs, stripWhitespace)
# Remove your own stopwords
if(!is.null(excludeWords))
docs <- tm_map(docs, removeWords, excludeWords)
# Text stemming
if(textStemming) docs <- tm_map(docs, stemDocument)
# Create term-document matrix
tdm <- TermDocumentMatrix(docs)
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
# check the color palette name
if(!colorPalette %in% rownames(brewer.pal.info)) colors = colorPalette
else colors = brewer.pal(8, colorPalette)
# Plot the word cloud
set.seed(1234)
wordcloud(d$word,d$freq, min.freq=min.freq, max.words=max.words,
random.order=FALSE, rot.per=0.35,
use.r.layout=FALSE, colors=colors)
invisible(list(tdm=tdm, freqTable = d))
}
#++++++++++++++++++++++
# Helper function
#++++++++++++++++++++++
# Download and parse webpage
html_to_text<-function(url){
library(RCurl)
library(XML)
# download html
html.doc <- getURL(url)
#convert to plain text
doc = htmlParse(html.doc, asText=TRUE)
# "//text()" returns all text outside of HTML tags.
# We also don’t want text such as style and script codes
text <- xpathSApply(doc, "//text()[not(ancestor::script)][not(ancestor::style)][not(ancestor::noscript)][not(ancestor::form)]", xmlValue)
# Format text vector into one character string
return(paste(text, collapse = " "))
}
setwd("D:/MUSA 508/Final Project")
#setwd("D:/Rdata/Final_Airbnb/Data")
listings <- st_read("listings.csv")
## Reading layer `listings' from data source `D:\MUSA 508\Final Project\listings.csv' using driver `CSV'
details <- st_read("listings_details.csv")
## Reading layer `listings_details' from data source `D:\MUSA 508\Final Project\listings_details.csv' using driver `CSV'
calendar <- read.csv("calendar.csv")
#large scale neighborhood
neighborhood <- st_read('neighbourhoods.geojson')
## Reading layer `neighbourhoods' from data source `D:\MUSA 508\Final Project\neighbourhoods.geojson' using driver `GeoJSON'
## Simple feature collection with 22 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XYZ
## bbox: xmin: 4.754837 ymin: 52.27817 xmax: 5.079162 ymax: 52.43068
## z_range: zmin: 42.88058 zmax: 43.14972
## geographic CRS: WGS 84
#small scale neighborhood (used to define community?)
neighbor2 <- st_read('neighbor2.json') %>%
st_transform(st_crs(neighborhood))
## Reading layer `neighbor2' from data source `D:\MUSA 508\Final Project\neighbor2.json' using driver `GeoJSON'
## Simple feature collection with 481 features and 5 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 4.728773 ymin: 52.27816 xmax: 5.079169 ymax: 52.43105
## geographic CRS: WGS 84
developing_area <- st_read('developing_area.json') %>%
st_transform(st_crs(neighborhood))
## Reading layer `developing_area' from data source `D:\MUSA 508\Final Project\developing_area.json' using driver `GeoJSON'
## Simple feature collection with 19 features and 5 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 4.776982 ymin: 52.28436 xmax: 4.993388 ymax: 52.42163
## geographic CRS: WGS 84
crowdsensor <- st_read('crowdsensor.json') %>%
st_transform(st_crs(neighborhood))
## Reading layer `crowdsensor' from data source `D:\MUSA 508\Final Project\crowdsensor.json' using driver `GeoJSON'
## Simple feature collection with 107 features and 6 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 4.855762 ymin: 52.31185 xmax: 4.972232 ymax: 52.39214
## geographic CRS: WGS 84
metro <- st_read('tram_metro_stops.json') %>%
st_transform(st_crs(neighborhood))
## Reading layer `tram_metro_stops' from data source `D:\MUSA 508\Final Project\tram_metro_stops.json' using driver `GeoJSON'
## Simple feature collection with 224 features and 6 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 4.77478 ymin: 52.29561 xmax: 5.004306 ymax: 52.40187
## geographic CRS: WGS 84
buildingyear <- st_read('buildingyearblock.json') %>%
st_transform(st_crs(neighborhood))
## Reading layer `buildingyearblock' from data source `D:\MUSA 508\Final Project\buildingyearblock.json' using driver `GeoJSON'
## Simple feature collection with 22072 features and 1 field
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 4.734726 ymin: 52.27854 xmax: 5.062233 ymax: 52.43045
## geographic CRS: WGS 84
zipcode6 <- st_read('zipcode6.json') %>%
st_transform(st_crs(neighborhood))
## Reading layer `zipcode6' from data source `D:\MUSA 508\Final Project\zipcode6.json' using driver `GeoJSON'
## Simple feature collection with 18280 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 4.735438 ymin: 52.27845 xmax: 5.062233 ymax: 52.43045
## geographic CRS: WGS 84
zipcode4 <- st_read('zipcode4.json') %>%
st_transform(st_crs(neighborhood))
## Reading layer `zipcode4' from data source `D:\MUSA 508\Final Project\zipcode4.json' using driver `GeoJSON'
## Simple feature collection with 81 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 4.728773 ymin: 52.27816 xmax: 5.079169 ymax: 52.43105
## geographic CRS: WGS 84
details.sf <- st_as_sf(details,coords = c('longitude','latitude'),crs = 4326) %>%
st_transform(st_crs(neighborhood))
details.sf$price <- parse_number(details.sf$price)
details.sf$weekly_price <- parse_number(details.sf$weekly_price)
details.sf$monthly_price <- parse_number(details.sf$monthly_price)
details.sf$cleaning_fee <- parse_number(details.sf$cleaning_fee)
details.sf$extra_people <- parse_number(details.sf$extra_people)
details.sf$security_deposit <- parse_number(details.sf$security_deposit)
details.sf$beds <- as.numeric(details.sf$beds)
details.sf$minimum_nights <- as.numeric(details.sf$minimum_nights)
details.sf$maximum_nights <- as.numeric(details.sf$maximum_nights)
details.sf$number_of_reviews <- as.numeric(details.sf$number_of_reviews)
details.sf$review_scores_rating <- as.numeric(details.sf$review_scores_rating)
details.sf$review_scores_accuracy <- as.numeric(details.sf$review_scores_accuracy)
details.sf$review_scores_cleanliness<- as.numeric(details.sf$review_scores_cleanliness)
details.sf$review_scores_value <- as.numeric(details.sf$review_scores_value)
details.sf$reviews_per_month <- as.numeric(details.sf$reviews_per_month)
details.sf.raw <- details.sf
# Available Calendar
available_calendar <- calendar %>%
filter(available =="t")
available_calendar$listing_id <- as.character(available_calendar$listing_id)
# Price change
available_calendar <- available_calendar%>%
mutate(price2 = gsub("^.","",price))%>%
mutate(price2 = gsub(",","",price2))
available_calendar$price3 <- as.numeric(available_calendar$price2)
# Sum per month
available_calendar2 <- available_calendar%>%
mutate(date2 = ymd(date))%>%
mutate(month = month(date2))%>%
group_by(listing_id, month) %>%
summarize(month_price = mean(price3))
length(unique(calendar$listing_id))
## [1] 20030
length(unique(calendar$listing_id))*12
## [1] 240360
study.panel <-
expand.grid(listing_id = unique(calendar$listing_id),
month=unique(available_calendar2$month))
study.panel$listing_id <- as.character(study.panel$listing_id)
listing_panel <- study.panel %>%
left_join(available_calendar2) %>%
mutate(each_month_price = 0)
o <- order(listing_panel[,"listing_id"],listing_panel[,"month"])
listing_panel <- listing_panel[o,]
xx <- 0
for(i in 2:nrow(listing_panel)){
if(!is.na(listing_panel[i,3])){
xx <- listing_panel[i,3]
listing_panel[i,4]=listing_panel[i,3]
}
if(is.na(listing_panel[i,3])& listing_panel[i-1,1]==listing_panel[i,1]){
listing_panel[i,4] <- xx
}
if(is.na(listing_panel[i,3])& listing_panel[i-1,1]!=listing_panel[i,1]){
xx <- 0
listing_panel[i,4] <- xx
}
}
d <- order(listing_panel[,"listing_id"],-listing_panel[,"month"])
listing_panel <- listing_panel[d,]
for(i in 2:nrow(listing_panel)){
if(listing_panel[i,4]==0){
listing_panel[i,4]=listing_panel[i-1,4]
}
}
o <- order(listing_panel[,"listing_id"],listing_panel[,"month"])
listing_panel <- listing_panel[o,]
listing_0price <- listing_panel %>%
filter(each_month_price == 0)
no_price <- unique(listing_0price$listing_id)
no_price
## [1] "10002942" "10003068" "10003546" "10003576" "10003943" "10004383"
## [7] "10004452" "10004732" "10004773" "10004838" "10004961" "10005851"
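The two loops above carry the last known monthly price forward within each listing and then fill the remaining leading gaps backward. The same idea can be sketched in a vectorized way with tidyr::fill(). This is an alternative under assumptions, not the code used in this report: it keeps NA as the missing marker instead of 0, and the toy panel is made up.

```r
library(dplyr)
library(tidyr)

# Toy panel: NA means the listing had no available (priced) day that month.
panel <- data.frame(
  listing_id  = rep(c("A", "B", "C"), each = 4),
  month       = rep(1:4, times = 3),
  month_price = c(NA, 100, NA, 120,   # A: gaps at the start and in the middle
                  80, NA, NA, NA,     # B: only the first month is priced
                  NA, NA, NA, NA),    # C: never priced
  stringsAsFactors = FALSE
)

filled <- panel %>%
  arrange(listing_id, month) %>%
  group_by(listing_id) %>%
  # carry the last known price forward, then fill leading gaps backward,
  # never borrowing a price from a different listing
  fill(month_price, .direction = "downup") %>%
  ungroup()

# Listings that still have no price after filling
no_price_toy <- filled %>%
  filter(is.na(month_price)) %>%
  distinct(listing_id) %>%
  pull(listing_id)

print(filled)
print(no_price_toy)
```

Grouping by listing_id before filling is the design point: it guarantees a gap is never filled from a neighboring listing's row, which a row-by-row loop has to check explicitly.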
Drop listings that have no price
listing_panel <- listing_panel %>%
filter(!listing_id %in% no_price)
index <- function(x, flag = '0') {
digit <- floor(log10(length(x))) + 1
paste(flag, formatC(x, width = digit, flag = '0'), sep = '')
}
occupancy <- calendar%>%
mutate(date2 = ymd(date))%>%
mutate(month = month(date2),
count = ifelse(available == "f", 1, 0),
listing_id = as.character(listing_id))%>%
filter(!listing_id %in% no_price) %>%
group_by(listing_id, month) %>%
summarize(monthly_occupancy = sum(count))
monthly_occupancy <- occupancy %>%
mutate(month = as.character(month))%>%
mutate(month = ifelse(month != 10 & month != 11 & month != 12, gsub("^","0",month), month)) %>%
group_by(month) %>%
summarise(mean_monthly_occupancy = mean(monthly_occupancy))
ggplot(monthly_occupancy,
aes(x=month, y=mean_monthly_occupancy, group =1)) +
geom_line(size=1, color = "#e46b45")+
plotTheme()
The figure above shows that occupancy peaks during summer, especially in July. Occupancy in February looks much lower than in other months, but that is probably because February has fewer days. Overall, occupancy is somewhat higher in summer and somewhat lower in winter.
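One way to check the February explanation is to divide booked nights by the number of days in each month, which turns a count into a rate. The sketch below uses made-up occupancy values and assumes a non-leap year; only the column names follow the monthly_occupancy table above.

```r
# Hypothetical mean booked nights per listing for three months (made-up values).
monthly_occupancy_toy <- data.frame(
  month = c("01", "02", "07"),
  mean_monthly_occupancy = c(24.8, 22.4, 27.9),
  stringsAsFactors = FALSE
)

# Days per month, assuming a non-leap year
days_in_month <- c("01" = 31, "02" = 28, "07" = 31)

# Occupancy rate = booked nights / nights in the month
monthly_occupancy_toy$occupancy_rate <- unname(
  monthly_occupancy_toy$mean_monthly_occupancy /
    days_in_month[monthly_occupancy_toy$month]
)
print(monthly_occupancy_toy)
```

In this toy example January and February have different counts but the same rate, which is the pattern the February explanation predicts.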
Month_price <- calendar%>%
mutate(date2 = ymd(date))%>%
mutate(month = month(date2),
listing_id = as.character(listing_id),
price = parse_number(price))%>%
mutate(month = as.character(month))%>%
mutate(month = ifelse(month != 10 & month != 11 & month != 12, gsub("^","0",month), month)) %>%
drop_na(price) %>%
group_by(month) %>%
summarize(mean_monthly_price = mean(price))
ggplot(Month_price,
aes(x=month, y=mean_monthly_price, group =1)) +
geom_line(size=1, color = "#e46b45")+
plotTheme()
The figure above shows that price peaks in April, while the average price in February is lower than in other months. Overall, prices fluctuate through the year along a trend similar to occupancy, which suggests that the two variables might be correlated.
Plot the listings’ prices as points on the map
details.sf <-
details.sf %>%
mutate(priceBed=price/beds) %>%
filter(priceBed <=1000)
ggplot()+
geom_sf(data = neighborhood,fill='grey90',color = 'white')+
geom_sf(data = details.sf, aes(colour=q5(priceBed)),size=.5)+
scale_color_manual(values = palette5,
labels = qBr(details.sf,'priceBed'))+
labs(title = "Price per bed",
subtitle = 'Amsterdam Airbnb, price on 2018-12-6')+
mapTheme
The figure above shows that Airbnb listings cluster in the center of the city. Higher-priced listings also cluster in the center, while lower-priced ones are dispersed on the outskirts.
Plot the number of Airbnb listings by neighborhood (neighbor2)
listings.sf<- listings %>%
st_as_sf(coords = c( "longitude","latitude"), crs = 4326, agr = "constant")
neighbor2 <- neighbor2 %>%
st_transform(st_crs(listings.sf))
listing.sf.neighbor2 <- st_intersection(listings.sf,neighbor2) %>%
dplyr::select(id, Buurt,Buurt_code) %>%
mutate(count=1) %>%
st_drop_geometry()
listings.sf <- left_join(listings.sf, listing.sf.neighbor2,by='id')
neighbo2.count <- listings.sf %>%
group_by(Buurt) %>%
summarise(airbnb.number = sum(count)) %>%
dplyr::select(Buurt, airbnb.number) %>%
st_drop_geometry()
neighbor2 <- left_join(neighbor2,neighbo2.count,by="Buurt")
neighbor2 %>% ggplot() +
geom_sf(aes(fill = airbnb.number), color = 'white') +
scale_fill_gradient(low = '#f3c226', high = palette5[5],
name = "Airbnb Number") +
labs(title = "Airbnb Number by Neighborhood") +
mapTheme
ggplot()+
geom_sf(data = neighbor2, aes(fill=q5(airbnb.number)),color='transparent')+
scale_fill_manual(values = palette5,
labels = qBr(neighbor2,'airbnb.number'))+
labs(title = "Airbnb Number by neighborhood")+
mapTheme
The figure above also shows that Airbnb listings cluster in the center of the city.
Basic features such as beds, bedrooms, and bathrooms are the features most relevant to price and revenue. Here is a summary of these features:
table.basic <- details.sf %>%
st_drop_geometry() %>%
dplyr::select(price ,beds, bedrooms, bathrooms, accommodates)
stargazer(as.data.frame(table.basic),
type = "text",
title ="Table 1. Summary of Basic Features",
single.row = TRUE,
out.header = TRUE)
##
## Table 1. Summary of Basic Features
## =============================================================
## Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
## -------------------------------------------------------------
## price 20,011 150.260 102.648 0 96 175 3,900
## beds 20,011 1.850 1.390 1 1 2 32
## -------------------------------------------------------------
Beyond the common amenities, some add more value to the property. As shown in the plot below, properties with pools, fireplaces, parking, and kitchens generally have higher prices. We also counted the number of amenities listed by the host; we’ll later use a correlation matrix to test whether it influences price.
#pool
details.sf <- details.sf %>%
mutate(pool = ifelse(str_detect(amenities, "Pool"), "Pool", "No Pool"))
#Paid parking off premises
details.sf <- details.sf %>%
mutate(parking = ifelse(str_detect(amenities, "Paid parking off premises"),
"Parking", "No Parking"))
#Indoor fireplace
details.sf <- details.sf %>%
mutate(fireplace = ifelse(str_detect(amenities, "Indoor fireplace"),
"Fireplace", "No Fireplace"))
#Waterfront
details.sf <- details.sf %>%
mutate(waterfront = ifelse(str_detect(amenities, "Waterfront"),
"waterfront", "No waterfront"))
#Kitchen
details.sf <- details.sf %>%
mutate(kitchen = ifelse(str_detect(amenities, "Kitchen"),
"kitchen", "No kitchen"))
#Air conditioning
details.sf <- details.sf %>%
mutate(AC = ifelse(str_detect(amenities, "Air conditioning"),
"AC", "No AC"))
amenitie_vars <- c('pool','parking','fireplace','waterfront','kitchen','AC')
plotList <- list()
for (i in amenitie_vars){
plotList[[i]] <-
details.sf %>%st_drop_geometry() %>%
dplyr::select(price, all_of(i)) %>%
filter(price<500) %>%
gather(Variable, value, -price) %>%
ggplot(aes(value, price, fill=value)) +
#geom_bar(position = "dodge", stat = "summary", fun.y = "mean") +
geom_boxplot()+
scale_fill_manual(values = palette2) +
labs(x="value", y="price",
title = i) +
theme(legend.position = "none")+
plotTheme()
}
do.call(grid.arrange,c(plotList, ncol = 3, top = "Amenities' influence on Price"))
Number of amenities listed
library(stringr)
details.sf <- details.sf %>%
mutate(amenities.number = str_count(amenities,",")+1)
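The amenity dummies and the amenity count above both rely on simple string matching. The sketch below shows the same two operations on a few made-up strings, assuming the amenities column uses a braces-and-commas format such as {TV,Wifi,Kitchen}. The guard for an empty list is an addition for illustration: counting commas and adding one would otherwise report one amenity for a listing that has none.

```r
library(stringr)

# Made-up amenity strings in the assumed "{a,b,c}" format
amenities_toy <- c('{TV,Wifi,Kitchen}',
                   '{Pool,"Indoor fireplace"}',
                   '{Wifi}',
                   '{}')

# Number of amenities = commas + 1, except for an empty list
amenities_number <- ifelse(amenities_toy == "{}", 0,
                           str_count(amenities_toy, ",") + 1)

# Dummy in the same style as the kitchen variable above
kitchen <- ifelse(str_detect(amenities_toy, "Kitchen"), "kitchen", "No kitchen")

print(data.frame(amenities_toy, amenities_number, kitchen))
```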
Because nearby listings share a location and similar spatial patterns, the prediction model has to account for a spatial effect. We used two geographies to capture the neighborhood effect and tested which has the larger influence. The code below joins each listing to its four-digit postcode area (zipcode4, 81 polygons in the data loaded above) and to its small-scale neighborhood (neighbor2, 481 polygons).
details.sf.neighbor <- st_intersection(details.sf,zipcode4) %>%
dplyr::select(id, Postcode4) %>%
st_drop_geometry()
details.sf <- left_join(details.sf,details.sf.neighbor,by='id')
neighbor2 <- neighbor2 %>%
st_transform(st_crs(listings.sf))
listing.sf.neighbor2 <- st_intersection(listings.sf,neighbor2) %>%
dplyr::select(id, Buurt,Buurt_code) %>%
mutate(count=1) %>% st_drop_geometry()
detail.sf <- left_join(details.sf,listing.sf.neighbor2, by = "id")
Distances to public transportation (metro stations) and public amenities (parks, beaches, supermarkets …) are calculated in this part. Some of them are converted from numeric to categorical features.
listings$longitude = as.numeric(listings$longitude)
listings$latitude = as.numeric(listings$latitude)
listings.sf<- listings %>%
st_as_sf(coords = c( "longitude","latitude"), crs = 4326, agr = "constant")
st_crs(metro) <- st_crs(listings.sf)
st_c <- st_coordinates
details.sf.c <- st_centroid(details.sf)%>%
st_transform('ESRI:102013')
metro.c <- st_centroid(metro)%>%
st_transform('ESRI:102013')
details.sf <- details.sf%>%
mutate(dist.metro =nn_function(st_coordinates(details.sf.c),st_coordinates(metro.c),1))
metro <- metro%>%
st_transform('ESRI:102013')
listings.sf <- listings.sf%>%
st_transform('ESRI:102013')
listings <- listings%>%
mutate(distance_to_metro =nn_function(st_c(listings.sf), st_c(metro),1))
ggplot()+
geom_sf(data = neighbor2, fill = "grey32", color= "grey40")+
geom_point(data = listings,
aes(x= longitude, y = latitude, color = distance_to_metro),
fill = "transparent", size = 0.86, alpha = 0.6) +
scale_colour_viridis(direction = -1,
discrete = FALSE,
option = "plasma")+
geom_sf(data = metro, fill="red")+
ylim(min(listings$latitude), max(listings$latitude))+
xlim(min(listings$longitude), max(listings$longitude))+
labs(title="Point Map - Distance to Metro",
caption = "Figure xxx")+
mapTheme
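The distance features in this part come from a nearest-neighbor search between two sets of coordinates. The core of that calculation can be sketched with FNN::get.knnx() on made-up coordinates. The points below are hypothetical and assumed to be in the same projected coordinate system with units of meters.

```r
library(FNN)

# Two hypothetical listings and three hypothetical stops (x, y in meters)
from <- matrix(c( 0, 0,
                 10, 0), ncol = 2, byrow = TRUE)
to   <- matrix(c(  3,   4,
                  10,   5,
                 100, 100), ncol = 2, byrow = TRUE)

# One row per listing, one column per neighbor, nearest first
nn_dist <- get.knnx(data = to, query = from, k = 2)$nn.dist

dist_nearest <- nn_dist[, 1]      # distance to the single nearest stop (k = 1)
dist_mean_k2 <- rowMeans(nn_dist) # mean distance to the two nearest stops
print(dist_nearest)
print(dist_mean_k2)
```

Averaging over k neighbors, as nn_function does, gives a measure of local density of amenities; with k = 1 it reduces to plain distance to the closest one.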
Load data
# a polygon
unesco <-
st_read('UNESCO/UnescoWerelderfgoed_region.shp') %>%
st_transform(st_crs(neighborhood))
## Reading layer `UnescoWerelderfgoed_region' from data source `D:\MUSA 508\Final Project\UNESCO\UnescoWerelderfgoed_region.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2 features and 6 fields
## geometry type: POLYGON
## dimension: XYZ
## bbox: xmin: 120080.1 ymin: 485673.3 xmax: 123661.3 ymax: 489070.2
## z_range: zmin: 0 zmax: 0
## projected CRS: Amersfoort / RD New
parks <- st_read('parks.json') %>%
st_transform(st_crs(neighborhood))
## Reading layer `parks' from data source `D:\MUSA 508\Final Project\parks.json' using driver `GeoJSON'
## Simple feature collection with 122 features and 4 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 4.755128 ymin: 52.27963 xmax: 5.019766 ymax: 52.43052
## geographic CRS: WGS 84
attraction <- st_read('amsterdam-attraction.csv') %>%
filter(lat != 'attraction')
## Reading layer `amsterdam-attraction' from data source `D:\MUSA 508\Final Project\amsterdam-attraction.csv' using driver `CSV'
attraction <- st_as_sf(attraction,coords = c('lng','lat'),crs = 4326) %>%
st_transform(st_crs(neighborhood))
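# project to ESRI:102013 so the coordinates match the projected listing centroids
parks <- attraction %>%
filter(subCategory == 'Park') %>%
st_transform('ESRI:102013')
museum <- attraction %>%
filter(subCategory == 'Museum') %>%
st_transform('ESRI:102013')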
#supermarkets
supermarkets <- getbb('Amsterdam') %>%
opq() %>%
add_osm_feature('shop','supermarket') %>%
osmdata_sf()
supermarkets <- supermarkets$osm_points %>%
dplyr::select(geometry)%>%
st_transform('ESRI:102013')%>%
dplyr::select(geometry) %>%
mutate(Legend = "supermarkets")
convenienceshop <- getbb('Amsterdam') %>%
opq() %>%
add_osm_feature('shop','convenience') %>%
osmdata_sf()
convenienceshop <- convenienceshop$osm_points %>%
dplyr::select(geometry)%>%
st_transform('ESRI:102013')%>%
dplyr::select(geometry) %>%
mutate(Legend = "convenienceshop")
mall <- getbb('Amsterdam') %>%
opq() %>%
add_osm_feature('shop','mall') %>%
osmdata_sf()
mall<- mall$osm_points %>%
dplyr::select(geometry)%>%
st_transform('ESRI:102013') %>%
dplyr::select(geometry) %>%
mutate(Legend = "mall")
attraction <- st_as_sf(attraction,coords = c('lng','lat'),crs = 4326) %>%
st_transform('ESRI:102013')
plaza <- attraction %>%
filter(subCategory == 'Plaza')
beach <- attraction %>%
filter(subCategory == 'Beach')
nightclub <- attraction %>%
filter(subCategory == 'Nightclub')
Calculate distance to amenities and attractions
details.sf <- details.sf%>%
mutate(dist.mall =nn_function(st_coordinates(details.sf.c),st_coordinates(mall),1))
details.sf <- details.sf%>%
mutate(dist.supermarkets =nn_function(st_coordinates(details.sf.c),
st_coordinates(supermarkets),1))
details.sf <- details.sf%>%
mutate(dist.convenienceshop =nn_function(st_coordinates(details.sf.c),
st_coordinates(convenienceshop),1))
details.sf <- details.sf%>%
mutate(dist.museum =nn_function(st_coordinates(details.sf.c),
st_coordinates(museum),1))
details.sf <- details.sf%>%
mutate(dist.plaza =nn_function(st_coordinates(details.sf.c),
st_coordinates(plaza),1))
details.sf <- details.sf%>%
mutate(dist.nightclub =nn_function(st_coordinates(details.sf.c),
st_coordinates(nightclub),1))
details.sf <- details.sf%>%
mutate(dist.beach =nn_function(st_coordinates(details.sf.c),
st_coordinates(beach),1))
details.sf <- details.sf%>%
mutate(dist.parks =nn_function(st_coordinates(details.sf.c),
st_coordinates(parks),1))
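This is a summary of the distance features, in the units of the projected coordinate system (meters). The dist.museum row in the printed output (about 2,500,000) is implausible next to the other rows; values of that size arise when one layer is in WGS 84 degrees and the other in ESRI:102013 meters, so that row should not be read as meters to the nearest museum.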
table.distance <- details.sf %>%
st_drop_geometry() %>%
dplyr::select(c("dist.museum","dist.plaza","dist.nightclub","dist.beach",
"dist.metro","dist.supermarkets"))
stargazer(as.data.frame(table.distance),
type = "text",
title ="Table 2. Summary of all Distance Features",
single.row = TRUE,
out.header = TRUE)
##
## Table 2. Summary of all Distance Features
## ========================================================================================================
## Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
## --------------------------------------------------------------------------------------------------------
## dist.museum 20,011 2,499,865.000 1,896.420 2,490,188.000 2,498,696.000 2,501,137.000 2,506,428.000
## dist.plaza 20,011 558.536 799.228 0.453 218.848 579.526 7,412.628
## dist.nightclub 20,011 1,004.675 759.914 0.622 530.634 1,305.400 7,560.146
## dist.beach 20,011 1,978.854 747.704 3.899 1,494.927 2,475.238 6,862.358
## dist.metro 20,011 302.557 375.550 1.175 141.607 315.447 6,472.553
## dist.supermarkets 20,011 255.136 210.254 1.963 129.179 314.397 5,427.940
## --------------------------------------------------------------------------------------------------------
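Since nn_function works on raw coordinates, it cannot notice when its two inputs are in different coordinate systems. A small guard before each distance calculation can catch that. The sketch below is an illustration with two hypothetical points in Amsterdam; the helper check_same_crs is not part of this report or of sf.

```r
library(sf)

# Two hypothetical points in Amsterdam, given as longitude/latitude (WGS 84)
listing <- st_sfc(st_point(c(4.8932, 52.3731)), crs = 4326)
museum  <- st_sfc(st_point(c(4.8852, 52.3600)), crs = 4326)

# Project only the listing, as happens when one layer misses st_transform()
listing_proj <- st_transform(listing, "ESRI:102013")

# Hypothetical guard: stop instead of mixing degrees with meters
check_same_crs <- function(a, b) {
  if (!(st_crs(a) == st_crs(b))) {
    stop("CRS mismatch: transform both layers to the same CRS first")
  }
  invisible(TRUE)
}

mismatch <- tryCatch(check_same_crs(listing_proj, museum),
                     error = function(e) "mismatch")
print(mismatch)

# After projecting the second layer, the raw-coordinate distance is in meters
museum_proj <- st_transform(museum, st_crs(listing_proj))
check_same_crs(listing_proj, museum_proj)
d <- sqrt(sum((st_coordinates(listing_proj) - st_coordinates(museum_proj))^2))
print(d)
```

The two points are roughly a kilometer and a half apart, so a result in the millions would be an immediate sign that one layer was left unprojected.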
# within UNESCO buffer or not
unesco_buffer <- st_union(unesco)
unesco_buffer <- st_as_sf(unesco_buffer)
details.sf.unesco <- st_intersection(details.sf,unesco_buffer) %>%
dplyr::select(id) %>%
mutate(Unesco = 'within') %>%
st_drop_geometry()
details.sf <- left_join(details.sf,details.sf.unesco,by='id')
details.sf$Unesco <- tidyr::replace_na(details.sf$Unesco,'outside')
We used two ways to identify Airbnb hotspots in Amsterdam: one counts the Airbnb listings in each fishnet cell; the other sets a threshold on Local Moran’s I. This feature partly overlaps with “distance to city center”, but might give a more nuanced view of the clustering of Airbnbs.
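To show what Local Moran’s I measures, here is a hand-computed version on a toy strip of five cells. It is a sketch under assumptions, not the spdep implementation: the counts are made up, neighbors are simply the adjacent cells, weights are row-standardized (as with style "W"), and no p-values are computed, so it illustrates the statistic but not the significance threshold.

```r
# Made-up listing counts in five cells arranged in a row
x <- c(10, 9, 1, 0, 0)
n <- length(x)

# Row-standardized weights: each cell's neighbors are the adjacent cells
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  nb <- intersect(c(i - 1, i + 1), seq_len(n))
  W[i, nb] <- 1 / length(nb)
}

z  <- x - mean(x)       # deviation of each cell from the mean count
m2 <- sum(z^2) / n      # second moment used to scale the statistic
Ii <- (z / m2) * as.vector(W %*% z)  # local Moran's I per cell

print(round(Ii, 4))
```

Cells 1 and 2 (high counts next to high counts) and cells 4 and 5 (low next to low) get positive values, while cell 3, which sits between the two groups, gets a negative one. A positive value alone therefore does not mark a hotspot; it has to be combined with a high count, which is why the report also uses a count threshold.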
#create fishnet
amsterdam.boundary <- st_union(neighborhood) %>% st_transform('ESRI:102013')
fishnet <-
st_make_grid(amsterdam.boundary, cellsize = 300) %>%
st_sf() %>%
mutate(uniqueID = rownames(.))
fishnet.count <- st_intersection(listings.sf,fishnet) %>%
mutate(count = 1) %>%
st_drop_geometry() %>%
dplyr::select(uniqueID, count) %>%
group_by(uniqueID) %>%
summarise(countairbnb = sum(count))
airbnb_net <-
dplyr::select(listings.sf) %>%
mutate(countairbnb = 1) %>%
aggregate(., fishnet, sum)
airbnb_net <- airbnb_net %>%
mutate(countairbnb = tidyr::replace_na(countairbnb,0),
uniqueID = row.names(.))
ggplot() +
geom_sf(data = airbnb_net, aes(fill = countairbnb), color = NA) +
scale_fill_viridis() +
labs(title = "Count of airbnb for the fishnet") +
mapTheme
#Queen-contiguity spatial weights for the fishnet
final_net <- airbnb_net
final_net.nb <- poly2nb(as_Spatial(airbnb_net), queen=TRUE)
final_net.weights <- nb2listw(final_net.nb, style="W", zero.policy=TRUE)
#Local Moran's I of the Airbnb counts
final_net.localMorans <-
cbind(
as.data.frame(localmoran(final_net$countairbnb, final_net.weights)),
as.data.frame(final_net)) %>%
st_sf() %>%
dplyr::select(airbnb_Count = countairbnb,
Local_Morans_I = Ii,
P_Value = `Pr(z > 0)`) %>%
mutate(Significant_Hotspots = ifelse(P_Value <= 0.0001, 1, 0)) %>%
gather(Variable, Value, -geometry)
vars <- unique(final_net.localMorans$Variable)
varList <- list()
for(i in vars){
varList[[i]] <-
ggplot() +
geom_sf(data = filter(final_net.localMorans, Variable == i),
aes(fill = Value), colour=NA) +
scale_fill_viridis(name="") +
labs(title=i) +
mapTheme + theme(legend.position="right")}
do.call(grid.arrange,c(varList, ncol = 2, top = "Local Morans I statistics, Amsterdam Airbnb"))
#hotspots by count
hotspot.count <- airbnb_net %>%
filter(countairbnb>150)
#hotspot by moran's I
hotspot.moran <- final_net %>%
mutate(isSig =
ifelse(localmoran(final_net$countairbnb,
final_net.weights)[,5] <= 0.000001, 1, 0)) %>%
filter(isSig == 1)
#distance to hotspots
details.sf <- details.sf %>%
mutate(dist.hotspot.count = nn_function(st_coordinates(details.sf),
st_coordinates(st_centroid(hotspot.count)), 1),
dist.hotspot.moran = nn_function(st_coordinates(details.sf),
st_coordinates(st_centroid(hotspot.moran)), 1))
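The count-based hotspot feature can be illustrated on toy data without any spatial packages. This is only a sketch under stated assumptions: the coordinates are made up, the grid is built by integer division rather than st_make_grid, and the threshold of 3 stands in for the 150 used above.

```r
# Toy listing coordinates in metres (hypothetical values, not from the dataset)
listings <- data.frame(
  x = c(10, 50, 120, 310, 320, 330, 340, 900),
  y = c(10, 40,  90, 310, 320, 330, 340, 900)
)

cellsize <- 300  # same cell size as the fishnet above

# Assign every listing to a grid cell by integer division of its coordinates
listings$cell_x <- floor(listings$x / cellsize)
listings$cell_y <- floor(listings$y / cellsize)

# Count listings per cell
cell_counts <- aggregate(count ~ cell_x + cell_y,
                         data = transform(listings, count = 1),
                         FUN = sum)

# Flag cells whose count exceeds a threshold (illustrative value)
threshold <- 3
hotspots <- cell_counts[cell_counts$count > threshold, ]

# Centroids of the hotspot cells
hotspots$cx <- (hotspots$cell_x + 0.5) * cellsize
hotspots$cy <- (hotspots$cell_y + 0.5) * cellsize

# Distance from each listing to the nearest hotspot centroid
dist_to_hotspot <- sapply(seq_len(nrow(listings)), function(i) {
  min(sqrt((listings$x[i] - hotspots$cx)^2 + (listings$y[i] - hotspots$cy)^2))
})
```

With a single nearest neighbour, the distance feature reduces to the distance to the closest hotspot centroid, which is what the last step computes.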
Some features, such as decoration and property area, are not given directly in the dataset. By analysing the descriptions and names, we therefore use words like “spacious” and “luxurious” to partly represent housing quality.
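As a minimal sketch of this keyword approach, the snippet below flags listing names case-insensitively with base R's grepl(). The helper has_keyword() and the example names are hypothetical and not part of the original analysis. Unlike a case-sensitive match, it also matches mixed-case spellings.

```r
# Hypothetical listing names, for illustration only
name <- c("Bright LOFT near Centre", "Cosy room", "LUMINOUS and Spacious flat")

# Helper (an assumption, not from the original code): TRUE if any keyword
# occurs in the text, ignoring case
has_keyword <- function(text, keywords) {
  pattern <- paste(keywords, collapse = "|")
  grepl(pattern, text, ignore.case = TRUE)
}

name.bright   <- ifelse(has_keyword(name, c("bright", "luminous")),
                        "bright", "not bright")
name.spacious <- ifelse(has_keyword(name, c("spacious", "large")),
                        "spacious", "not spacious")
```

Listing each keyword once and ignoring case avoids having to spell out every capitalisation by hand, which is where typos tend to slip in.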
details.sf$name <- as.character(details.sf$name)
# city center
details.sf <- details.sf %>%
mutate(name.center = ifelse(str_detect(name, "center")|
str_detect(name, "centre")|
str_detect(name, "central")|
str_detect(name, "jordan")|
str_detect(name, "Center")|
str_detect(name, "Centre")|
str_detect(name, "Central")|
str_detect(name, "Jordan")|
str_detect(name, "CENTER")|
str_detect(name, "CENTRE")|
str_detect(name, "CENTRAL")|
str_detect(name, "JORDAN"),
"Center", "No Center"))
#bright
details.sf <- details.sf %>%
mutate(name.bright = ifelse(str_detect(name, "bright")|
str_detect(name, "luminous")|
str_detect(name, "Bright")|
str_detect(name, "Luminous")|
str_detect(name, "BRIGHT")|
str_detect(name, "LUMINOUS"),
"bright", "not bright"))
#spacious
details.sf <- details.sf %>%
mutate(name.spacious = ifelse(str_detect(name, "spacious")|
str_detect(name, "large")|
str_detect(name, "Spacious")|
str_detect(name, "Large")|
str_detect(name, "SPACIOUS")|
str_detect(name, "LARGE"),
"spacious", "not spacious"))
#luxurious
details.sf <- details.sf %>%
mutate(name.luxury = ifelse(str_detect(name, "luxury")|
str_detect(name, "luxurious")|
str_detect(name, "Luxury")|
str_detect(name, "Luxurious")|
str_detect(name, "LUXURY")|
str_detect(name, "LUXURIOUS"),
"luxury", "not luxury"))
Plot monthly prices and numeric features
Each line represents the linear fit between a numeric feature and the average price in one month. The slope differs slightly from month to month, but the differences are small. A closer look at the data shows that prices do change between months, but only within a relatively small range of around 10 euros.
listing_panel2 <- listing_panel %>%
dplyr::rename(id = listing_id) %>%
dplyr::select(-month_price)
listing_panel2 <-
left_join(listing_panel2, st_drop_geometry(details.sf), by = "id") %>%
dplyr::select(id, month, each_month_price, bathrooms, bedrooms, beds, review_scores_rating, reviews_per_month, dist.metro) %>%
gather(-id,-month, -each_month_price, key = "variable", value = "value") %>%
mutate(value=as.numeric(value))
ggplot()+
geom_point(data=listing_panel2 %>%
filter(month ==1 & each_month_price <= 2500),aes(x = value, y = each_month_price), color = "#6f1b17", alpha = 0.26)+
geom_smooth(data=listing_panel2 %>%
filter(month ==1 & each_month_price <= 2500),aes(x = value, y = each_month_price), method = "lm", se= FALSE, color = "#ea483d")+
geom_point(data=listing_panel2 %>%
filter(month ==2& each_month_price <= 2500),aes(x = value, y = each_month_price), color = "#610f26", alpha = 0.26)+
geom_smooth(data=listing_panel2 %>%
filter(month ==2& each_month_price <= 2500),aes(x = value, y = each_month_price), method = "lm", se= FALSE, color = "#df2866")+
geom_point(data=listing_panel2 %>%
filter(month ==3& each_month_price <= 2500),aes(x = value, y = each_month_price), color = "#380f42", alpha = 0.26)+
geom_smooth(data=listing_panel2 %>%
filter(month ==3& each_month_price <= 2500),aes(x = value, y = each_month_price), method = "lm", se= FALSE, color = "#972aaf")+
geom_point(data=listing_panel2 %>%
filter(month ==4& each_month_price <= 2500),aes(x = value, y = each_month_price), color = "#261646", alpha = 0.26)+
geom_smooth(data=listing_panel2 %>%
filter(month ==4& each_month_price <= 2500),aes(x = value, y = each_month_price), method = "lm", se= FALSE, color = "#663bb6")+
geom_point(data=listing_panel2 %>%
filter(month ==5& each_month_price <= 2500),aes(x = value, y = each_month_price), color = "#191e46", alpha = 0.26)+
geom_smooth(data=listing_panel2 %>%
filter(month ==5& each_month_price <= 2500), aes(x = value, y = each_month_price), method = "lm", se= FALSE, color = "#4551b4")+
geom_point(data=listing_panel2 %>%
filter(month ==6& each_month_price <= 2500),aes(x = value, y = each_month_price), color = "#19387d", alpha = 0.26)+
geom_smooth(data=listing_panel2 %>%
filter(month ==6& each_month_price <= 2500),aes(x = value, y = each_month_price), method = "lm", se= FALSE, color = "#4295f2")+
geom_point(data=listing_panel2 %>%
filter(month ==7& each_month_price <= 2500),aes(x = value, y = each_month_price), color = "#173f7f", alpha = 0.26)+
geom_smooth(data=listing_panel2 %>%
filter(month ==7& each_month_price <= 2500),aes(x = value, y = each_month_price), method = "lm", se= FALSE, color = "#3ea8f3")+
geom_point(data=listing_panel2 %>%
filter(month ==8& each_month_price <= 2500),aes(x = value, y = each_month_price), color = "#184957", alpha = 0.26)+
geom_smooth(data=listing_panel2 %>%
filter(month ==8& each_month_price <= 2500),aes(x = value, y = each_month_price), method = "lm", se= FALSE, color = "#41bbd3")+
geom_point(data=listing_panel2 %>%
filter(month ==9& each_month_price <= 2500),aes(x = value, y = each_month_price), color = "#123832", alpha = 0.26)+
geom_smooth(data=listing_panel2 %>%
filter(month ==9& each_month_price <= 2500),aes(x = value, y = each_month_price), method = "lm", se= FALSE, color = "#309587")+
geom_point(data=listing_panel2 %>%
filter(month ==10& each_month_price <= 2500),aes(x = value, y = each_month_price), color = "#22421e", alpha = 0.26)+
geom_smooth(data=listing_panel2 %>%
filter(month ==10& each_month_price <= 2500),aes(x = value, y = each_month_price), method = "lm", se= FALSE, color = "#5aae51")+
geom_point(data=listing_panel2 %>%
filter(month ==11& each_month_price <= 2500),aes(x = value, y = each_month_price), color = "#364c1d", alpha = 0.26)+
geom_smooth(data=listing_panel2 %>%
filter(month ==11& each_month_price <= 2500),aes(x = value, y = each_month_price), method = "lm", se= FALSE, color = "#90c24c")+
geom_point(data=listing_panel2 %>%
filter(month ==12& each_month_price <= 2500),aes(x = value, y = each_month_price), color = "#535f18", alpha = 0.26)+
geom_smooth(data=listing_panel2 %>%
filter(month ==12& each_month_price <= 2500),aes(x = value, y = each_month_price), method = "lm", se= FALSE, color = "#cddc3f")+
facet_wrap(~variable, scales = "free")+
labs(title="Price as a function of numeric variable",
y="Mean Price each month",
caption = "Scatterplots of price and numeric variable")+
plotTheme()
We plot the relationship between the average price in each month and the numeric features. The figure above shows that none of these features has a clearly linear relationship with price, although price is related to some of them, such as the number of bedrooms, the distance to the metro and the number of reviews per month. Each scatter plot contains 12 lines, one linear fit per month. For each numeric feature the lines vary little, which means the average price in each month has a similar relationship with these features.
Plot monthly occupancy and numeric features
Each line represents the linear fit between a numeric feature and the average occupancy in one month. Occupancy is much more influenced by tourism seasonality, so the slope varies widely between months.
occupancy2 <- occupancy %>%
dplyr::rename(id = listing_id)
occupancy2 <-
left_join(occupancy2, st_drop_geometry(details.sf), by = "id") %>%
dplyr::select(id, month, monthly_occupancy, bathrooms, bedrooms, beds, review_scores_rating, reviews_per_month, dist.metro) %>%
gather(-id,-month, -monthly_occupancy, key = "variable", value = "value")%>%
mutate(value=as.numeric(value))
ggplot()+
geom_point(data=occupancy2 %>%
filter(month ==1),aes(x = value, y = monthly_occupancy), color = "#6f1b17", alpha = 0.26)+
geom_smooth(data=occupancy2 %>%
filter(month ==1),aes(x = value, y = monthly_occupancy), method = "lm", se= FALSE, color = "#ea483d")+
geom_point(data=occupancy2 %>%
filter(month ==2),aes(x = value, y = monthly_occupancy), color = "#610f26", alpha = 0.26)+
geom_smooth(data=occupancy2 %>%
filter(month ==2),aes(x = value, y = monthly_occupancy), method = "lm", se= FALSE, color = "#df2866")+
geom_point(data=occupancy2 %>%
filter(month ==3),aes(x = value, y = monthly_occupancy), color = "#380f42", alpha = 0.26)+
geom_smooth(data=occupancy2 %>%
filter(month ==3),aes(x = value, y = monthly_occupancy), method = "lm", se= FALSE, color = "#972aaf")+
geom_point(data=occupancy2 %>%
filter(month ==4),aes(x = value, y = monthly_occupancy), color = "#261646", alpha = 0.26)+
geom_smooth(data=occupancy2 %>%
filter(month ==4),aes(x = value, y = monthly_occupancy), method = "lm", se= FALSE, color = "#663bb6")+
geom_point(data=occupancy2 %>%
filter(month ==5),aes(x = value, y = monthly_occupancy), color = "#191e46", alpha = 0.26)+
geom_smooth(data=occupancy2 %>%
filter(month ==5),aes(x = value, y = monthly_occupancy), method = "lm", se= FALSE, color = "#4551b4")+
geom_point(data=occupancy2 %>%
filter(month ==6),aes(x = value, y = monthly_occupancy), color = "#19387d", alpha = 0.26)+
geom_smooth(data=occupancy2 %>%
filter(month ==6),aes(x = value, y = monthly_occupancy), method = "lm", se= FALSE, color = "#4295f2")+
geom_point(data=occupancy2 %>%
filter(month ==7),aes(x = value, y = monthly_occupancy), color = "#173f7f", alpha = 0.26)+
geom_smooth(data=occupancy2 %>%
filter(month ==7),aes(x = value, y = monthly_occupancy), method = "lm", se= FALSE, color = "#3ea8f3")+
geom_point(data=occupancy2 %>%
filter(month ==8),aes(x = value, y = monthly_occupancy), color = "#184957", alpha = 0.26)+
geom_smooth(data=occupancy2 %>%
filter(month ==8),aes(x = value, y = monthly_occupancy), method = "lm", se= FALSE, color = "#41bbd3")+
geom_point(data=occupancy2 %>%
filter(month ==9),aes(x = value, y = monthly_occupancy), color = "#123832", alpha = 0.26)+
geom_smooth(data=occupancy2 %>%
filter(month ==9),aes(x = value, y = monthly_occupancy), method = "lm", se= FALSE, color = "#309587")+
geom_point(data=occupancy2 %>%
filter(month ==10),aes(x = value, y = monthly_occupancy), color = "#22421e", alpha = 0.26)+
geom_smooth(data=occupancy2 %>%
filter(month ==10),aes(x = value, y = monthly_occupancy), method = "lm", se= FALSE, color = "#5aae51")+
geom_point(data=occupancy2 %>%
filter(month ==11),aes(x = value, y = monthly_occupancy), color = "#364c1d", alpha = 0.26)+
geom_smooth(data=occupancy2 %>%
filter(month ==11),aes(x = value, y = monthly_occupancy), method = "lm", se= FALSE, color = "#90c24c")+
geom_point(data=occupancy2 %>%
filter(month ==12),aes(x = value, y = monthly_occupancy), color = "#535f18", alpha = 0.26)+
geom_smooth(data=occupancy2 %>%
filter(month ==12),aes(x = value, y = monthly_occupancy), method = "lm", se= FALSE, color = "#cddc3f")+
ylim(0,31)+
facet_wrap(~variable, scales = "free")+
labs(title="Occupancy as a function of numeric variable",
y="Mean Occupancy each month",
caption = "Figure 18. Scatterplots of occupancy and numeric variable")+
plotTheme()
We plot the relationship between the average occupancy in each month and the numeric features. The figure above shows that none of these features has a linear relationship with occupancy, and occupancy is barely related to them at all. Each scatter plot contains 12 lines, one linear fit per month. For each numeric feature the lines vary widely, which means the average occupancy in each month has a different relationship with these features. This indicates that we should fit the model separately for each month.
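One way to put a number on how much the monthly fits differ is to estimate one slope per month and look at their spread. The sketch below does this on simulated data; the panel, the choice of bedrooms as the feature, and the built-in month effect are all assumptions made for illustration, not results from the Amsterdam data.

```r
set.seed(1)

# Simulated panel: 50 listings observed in 12 months (hypothetical data)
panel <- expand.grid(id = 1:50, month = 1:12)
panel$bedrooms <- rep(sample(1:4, 50, replace = TRUE), times = 12)

# The true slope grows with the month, so the monthly fits differ by design
panel$monthly_occupancy <- 5 + panel$month * 0.5 * panel$bedrooms +
  rnorm(nrow(panel), sd = 1)

# Fit one simple linear model per month and keep the slope on bedrooms
slopes <- sapply(split(panel, panel$month), function(d) {
  coef(lm(monthly_occupancy ~ bedrooms, data = d))[["bedrooms"]]
})

# Spread of the slopes across months
slope_range <- diff(range(slopes))
```

A large slope_range relative to the slopes themselves would support fitting month-specific models, while a small one would suggest a single pooled model is enough.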
The correlation plot of the numeric features shows that basic features such as the number of beds and the number of bedrooms are the main determinants of price. Distances to public amenities are not as important as we expected.
We first calculate the monthly revenue of each listing as its average price in that month multiplied by its monthly occupancy, and then sum the monthly revenues to get the annual revenue.
occupancy3 <- occupancy %>%
dplyr::rename(id = listing_id)
occupancy3 <-
left_join(occupancy3, st_drop_geometry(details.sf), by = "id")
listing_panel <- listing_panel%>%
dplyr::rename(id = listing_id)
revenue_panel <- merge(occupancy3,listing_panel[c("id","month","each_month_price")],by=c("id","month")) %>%
mutate(revenue = each_month_price*monthly_occupancy)
annualrevenue <- revenue_panel %>%
group_by(id) %>%
summarise(annual_revenue = sum(revenue))
annualrevenue<- left_join(details.sf, annualrevenue,by="id")%>%
filter(!id %in% no_price)%>%
mutate(bathrooms = as.numeric(bathrooms),
bedrooms = as.numeric(bedrooms))
numericVars <-
select_if(st_drop_geometry(annualrevenue), is.numeric) %>% na.omit() %>%
dplyr::select(annual_revenue,price,beds, bedrooms, bathrooms,
minimum_nights,dist.museum,dist.supermarkets,
dist.metro,dist.plaza, dist.nightclub,
dist.beach, dist.parks,
amenities.number)
ggcorrplot(
round(cor(numericVars), 1),
p.mat = cor_pmat(numericVars),
colors = c("#4757a2", "white", "#E46B45"),
type="lower",
insig = "blank") +
labs(title = "Correlation across numeric variables")
Overall, there is no multicollinearity in our regression except for the correlation between distance to museums and distance to parks.
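A correlation matrix can also be screened programmatically for strongly correlated pairs. The following is a rough sketch on simulated data: the variables are generated so that dist.parks tracks dist.museum, and the cutoff of 0.8 is an illustrative choice, not a value taken from this report.

```r
set.seed(42)
n <- 200

# Simulated predictors (hypothetical): dist.parks is built to follow dist.museum
dist.museum <- runif(n, 0, 5000)
dist.parks  <- dist.museum + rnorm(n, sd = 200)
dist.metro  <- runif(n, 0, 2000)
beds        <- sample(1:6, n, replace = TRUE)

vars <- data.frame(dist.museum, dist.parks, dist.metro, beds)
corr <- cor(vars)

# Keep each pair once (upper triangle) and list those with |r| above the cutoff
cutoff <- 0.8  # illustrative threshold
idx <- which(abs(corr) > cutoff & upper.tri(corr), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(corr)[idx[, "row"]],
  var2 = colnames(corr)[idx[, "col"]],
  r    = corr[idx]
)
```

When a pair is flagged this way, a common option is to keep only one of the two variables in the regression.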
Modeling approach
Our goal is to predict the annual revenue of a new listing in Amsterdam. We consider two approaches. The first is to predict monthly price and occupancy separately and combine them into annual revenue, which includes the following steps:
The second is to predict annual revenue directly, which includes the following steps:
The features we use are mainly the hosts’ inputs and the houses’ exposure to amenities, attractions, etc. A new listing has no previous price, so we cannot add a time lag. Instead, we take spatial effects into consideration, adding neighborhood effects and a spatial lag as features to improve both accuracy and generalizability.
We show both approaches in our report and compare their performances in prediction.
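To make the first approach concrete, the sketch below shows how monthly price and occupancy combine into annual revenue, using the same price times occupancy aggregation that builds the observed annual revenue in this report. The numbers are made up, only three months are shown for brevity, and it assumes monthly_occupancy counts occupied nights.

```r
# Toy monthly panel for two listings (hypothetical numbers)
revenue_panel <- data.frame(
  id                = rep(c(1, 2), each = 3),
  month             = rep(1:3, times = 2),
  each_month_price  = c(100, 110, 120, 80, 80, 90),
  monthly_occupancy = c(10, 12, 15, 20, 18, 25)
)

# Monthly revenue = average price in the month x occupancy in the month
revenue_panel$revenue <- revenue_panel$each_month_price *
  revenue_panel$monthly_occupancy

# Annual revenue = sum of the monthly revenues of each listing
annual <- aggregate(revenue ~ id, data = revenue_panel, FUN = sum)
names(annual)[2] <- "annual_revenue"
```

In the first approach the price and occupancy columns would hold model predictions; in the second approach annual_revenue itself is the regression target.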
To take spatial effects into consideration, we calculate the mean price of the 5 nearest listings for each listing and name it lag price.
revenue_panel <- merge(revenue_panel, details[c("id","longitude","latitude")], by=c("id"))
revenue_panel.sf <-
st_as_sf(revenue_panel,coords = c('longitude','latitude'),crs = 4326) %>%
st_transform(st_crs(neighborhood))
#Calculate for each month-------------------------------------------------------
#January
Jan_price <- revenue_panel.sf %>%
filter(month == 1)
coords <- st_coordinates(Jan_price)
neighborList <- knn2nb(knearneigh(coords, 5))
spatialWeights <- nb2listw(neighborList, style="W")
Jan_price$lagPrice <- lag.listw(spatialWeights, Jan_price$each_month_price)
#February
Feb_price <- revenue_panel.sf %>%
filter(month == 2)
coords <- st_coordinates(Feb_price)
neighborList <- knn2nb(knearneigh(coords, 5))
spatialWeights <- nb2listw(neighborList, style="W")
Feb_price$lagPrice <- lag.listw(spatialWeights, Feb_price$each_month_price)
#March
Mar_price <- revenue_panel.sf %>%
filter(month == 3)
coords <- st_coordinates(Mar_price)
neighborList <- knn2nb(knearneigh(coords, 5))
spatialWeights <- nb2listw(neighborList, style="W")
Mar_price$lagPrice <- lag.listw(spatialWeights, Mar_price$each_month_price)
#April
Apr_price <- revenue_panel.sf %>%
filter(month == 4)
coords <- st_coordinates(Apr_price)
neighborList <- knn2nb(knearneigh(coords, 5))
spatialWeights <- nb2listw(neighborList, style="W")
Apr_price$lagPrice <- lag.listw(spatialWeights, Apr_price$each_month_price)
#May
May_price <- revenue_panel.sf %>%
filter(month == 5)
coords <- st_coordinates(May_price)
neighborList <- knn2nb(knearneigh(coords, 5))
spatialWeights <- nb2listw(neighborList, style="W")
May_price$lagPrice <- lag.listw(spatialWeights, May_price$each_month_price)
#June
Jun_price <- revenue_panel.sf %>%
filter(month == 6)
coords <- st_coordinates(Jun_price)
neighborList <- knn2nb(knearneigh(coords, 5))
spatialWeights <- nb2listw(neighborList, style="W")
Jun_price$lagPrice <- lag.listw(spatialWeights, Jun_price$each_month_price)
#July
Jul_price <- revenue_panel.sf %>%
filter(month == 7)
coords <- st_coordinates(Jul_price)
neighborList <- knn2nb(knearneigh(coords, 5))
spatialWeights <- nb2listw(neighborList, style="W")
Jul_price$lagPrice <- lag.listw(spatialWeights, Jul_price$each_month_price)
#August
Aug_price <- revenue_panel.sf %>%
filter(month == 8)
coords <- st_coordinates(Aug_price)
neighborList <- knn2nb(knearneigh(coords, 5))
spatialWeights <- nb2listw(neighborList, style="W")
Aug_price$lagPrice <- lag.listw(spatialWeights, Aug_price$each_month_price)
#September
Sep_price <- revenue_panel.sf %>%
filter(month == 9)
coords <- st_coordinates(Sep_price)
neighborList <- knn2nb(knearneigh(coords, 5))
spatialWeights <- nb2listw(neighborList, style="W")
Sep_price$lagPrice <- lag.listw(spatialWeights, Sep_price$each_month_price)
#October
Oct_price <- revenue_panel.sf %>%
filter(month == 10)
coords <- st_coordinates(Oct_price)
neighborList <- knn2nb(knearneigh(coords, 5))
spatialWeights <- nb2listw(neighborList, style="W")
Oct_price$lagPrice <- lag.listw(spatialWeights, Oct_price$each_month_price)
#November
Nov_price <- revenue_panel.sf %>%
filter(month == 11)
coords <- st_coordinates(Nov_price)
neighborList <- knn2nb(knearneigh(coords, 5))
spatialWeights <- nb2listw(neighborList, style="W")
Nov_price$lagPrice <- lag.listw(spatialWeights, Nov_price$each_month_price)
#December
Dec_price <- revenue_panel.sf %>%
filter(month == 12)
coords <- st_coordinates(Dec_price)
neighborList <- knn2nb(knearneigh(coords, 5))
spatialWeights <- nb2listw(neighborList, style="W")
Dec_price$lagPrice <- lag.listw(spatialWeights, Dec_price$each_month_price)
#----------------------------------------------------------------------
price_panel_lag <- rbind(Jan_price, Feb_price, Mar_price, Apr_price, May_price, Jun_price, Jul_price, Aug_price, Sep_price, Oct_price, Nov_price, Dec_price)
ggplot(price_panel_lag )+
geom_point(aes(x = lagPrice, y = each_month_price), alpha = 0.26)+
geom_smooth(aes(x = lagPrice, y =each_month_price), method = "lm", se= FALSE, color = "orange")+
labs(title="Price as a function of lagPrice",
caption = "Figure xx. Scatterplots of Price and lagPrice")+
plotTheme()
The figure above shows that price is correlated with lag price, but not strongly. Some high-priced listings are surrounded by listings with much lower prices, and some low-priced listings by much higher ones. For these listings, lag price might be a misleading predictor. Apart from these cases, most listings’ prices are similar to those of their neighbors.
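The lag price itself is simple to reproduce by hand: with row-standardised weights, a k-nearest-neighbour spatial lag is just the mean price of the k nearest listings. The sketch below shows this in base R on made-up coordinates and prices, with k = 2 instead of the 5 neighbours used in this report. It is an illustration of the idea, not the spdep implementation.

```r
# Toy coordinates (metres) and prices; hypothetical values
coords <- cbind(x = c(0, 10, 20, 30, 1000, 1010),
                y = c(0,  0,  0,  0, 1000, 1000))
price  <- c(100, 120, 140, 160, 300, 320)
k <- 2  # the report uses the 5 nearest listings

# Pairwise Euclidean distances; a listing is not its own neighbour
d <- as.matrix(dist(coords))
diag(d) <- Inf

# For each listing, average the prices of its k nearest neighbours
lagPrice <- apply(d, 1, function(row) mean(price[order(row)[1:k]]))
```

The two expensive listings in the far corner have only one expensive neighbour each, so their second neighbour is a cheap listing far away and their lag price falls well below their own price. This is the kind of case where lag price can mislead.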
set.seed(1234)
month.var <- c(1:12)
Price.monthList <- list()
ams.train <- list()
ams.test <- list()
ams.test.prediction <- list()
ams.test.table <- list()
price_panel_lag <- merge(price_panel_lag,listing.sf.neighbor2[c("id", "Buurt")], by = "id")
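# Stratified 60/40 split built once on the January rows.
# The same row indices are reused for every month in the loop below,
# which assumes each month holds the same listings in the same order.
Jan_price <- st_drop_geometry(price_panel_lag)%>%
filter(month == 1)
inTrain <- createDataPartition(
y = paste(Jan_price$pool,Jan_price$Buurt,Jan_price$property_type,
Jan_price$host_is_superhost),
p = .60, list = FALSE)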
for (i in month.var){
Price.monthList[[i]] <-
st_drop_geometry(price_panel_lag) %>%
filter(month == i)
ams.train[[i]] <- Price.monthList[[i]][inTrain,]
ams.test[[i]] <- Price.monthList[[i]][-inTrain,]
reg.price <- lm(each_month_price ~ .,
data = ams.train[[i]] %>%
dplyr::select(each_month_price, beds, bedrooms, bathrooms, accommodates,
pool, parking, kitchen, AC, fireplace,
Buurt,host_is_superhost,
room_type,property_type,bed_type,
minimum_nights,dist.museum,dist.supermarkets,
Unesco,dist.metro,dist.plaza, dist.nightclub,
dist.beach, dist.parks,
name.bright, name.spacious,name.luxury,
amenities.number,lagPrice))
ams.test.prediction[[i]] <-
ams.test[[i]] %>%
mutate(price.Predict = predict(reg.price, ams.test[[i]]),
price.AbsError = abs(each_month_price - price.Predict))
if(i ==1){
ams.test.table.all <- ams.test.prediction[[i]]
}else{
ams.test.table[[i]] <- ams.test.prediction[[i]]
ams.test.table.all <- rbind(ams.test.table.all,ams.test.table[[i]])
}
}
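# reg.price is overwritten in every loop iteration, so this table reports the last (December) model only
stargazer(reg.price, type = "text" ,single.row = FALSE, digits = 3,no.space = FALSE)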
##
## =======================================================================
## Dependent variable:
## ---------------------------
## each_month_price
## -----------------------------------------------------------------------
## beds -4.282*
## (2.290)
##
## bedrooms0 -7.799
## (97.931)
##
## bedrooms1 -3.058
## (97.729)
##
## bedrooms10 -361.143*
## (214.845)
##
## bedrooms11 -465.513**
## (204.740)
##
## bedrooms12 -979.836***
## (179.288)
##
## bedrooms2 24.116
## (97.804)
##
## bedrooms3 38.124
## (97.982)
##
## bedrooms4 82.060
## (98.509)
##
## bedrooms5 76.715
## (101.549)
##
## bedrooms6 15.157
## (113.316)
##
## bedrooms7 504.975***
## (144.207)
##
## bedrooms8 -514.822***
## (152.477)
##
## bedrooms9 -344.359*
## (203.917)
##
## bathrooms0.0 -76.697
## (70.481)
##
## bathrooms0.5 -88.476
## (60.055)
##
## bathrooms1.0 -59.145
## (55.886)
##
## bathrooms1.5 -49.244
## (56.059)
##
## bathrooms10.0 8.168
## (182.690)
##
## bathrooms100.5 -116.248
## (176.849)
##
## bathrooms15.0 -137.624
## (174.450)
##
## bathrooms2.0 -34.299
## (56.360)
##
## bathrooms2.5 -21.244
## (57.929)
##
## bathrooms3.0 28.429
## (61.656)
##
## bathrooms3.5 83.352
## (70.371)
##
## bathrooms4.0 315.551***
## (91.468)
##
## bathrooms4.5 -8.929
## (180.469)
##
## bathrooms5.0 82.262
## (195.918)
##
## bathrooms5.5 1,481.201***
## (178.642)
##
## bathrooms7.0 -107.164
## (174.378)
##
## bathrooms8.0 -285.334**
## (135.676)
##
## accommodates10 209.175***
## (53.593)
##
## accommodates11 418.217***
## (122.176)
##
## accommodates12 196.275***
## (48.337)
##
## accommodates14 300.694***
## (94.437)
##
## accommodates16 783.509***
## (67.788)
##
## accommodates17
##
##
## accommodates2 7.773
## (10.631)
##
## accommodates3 8.752
## (11.937)
##
## accommodates4 35.130***
## (11.786)
##
## accommodates5 33.398**
## (16.250)
##
## accommodates6 94.857***
## (16.171)
##
## accommodates7 79.946**
## (31.622)
##
## accommodates8 107.201***
## (26.719)
##
## accommodates9 25.491
## (105.262)
##
## poolPool 1.631
## (19.416)
##
## parkingParking 2.136
## (3.894)
##
## kitchenNo kitchen -13.440**
## (6.013)
##
## ACNo AC -14.540**
## (7.282)
##
## fireplaceNo Fireplace -18.099***
## (6.495)
##
## BuurtAalsmeerwegbuurt West 1.403
## (28.709)
##
## BuurtAlexanderplein e.o. -47.035
## (70.319)
##
## BuurtAmstel III deel A/B Noord -4.541
## (191.532)
##
## BuurtAmstelglorie -1.654
## (68.502)
##
## BuurtAmstelkwartier Noord -32.537
## (45.589)
##
## BuurtAmstelkwartier West -72.527
## (83.942)
##
## BuurtAmstelkwartier Zuid -61.646
## (170.755)
##
## BuurtAmstelpark -61.556
## (173.435)
##
## BuurtAmstelveldbuurt 46.734
## (49.255)
##
## BuurtAmsterdamse Bos -62.809
## (67.388)
##
## BuurtAmsterdamse Poort -8.093
## (95.469)
##
## BuurtAndreasterrein -43.474
## (61.624)
##
## BuurtAnjeliersbuurt Noord -27.695
## (57.915)
##
## BuurtAnjeliersbuurt Zuid 2.653
## (55.467)
##
## BuurtArchitectenbuurt -70.523
## (48.788)
##
## BuurtBalboaplein e.o. -52.967
## (39.152)
##
## BuurtBanne Noordoost -43.423
## (88.954)
##
## BuurtBanne Noordwest -18.979
## (93.061)
##
## BuurtBanne Zuidoost -63.663
## (73.191)
##
## BuurtBanne Zuidwest -55.083
## (79.141)
##
## BuurtBanpleinbuurt -5.583
## (55.285)
##
## BuurtBedrijvencentrum Osdorp -87.002
## (167.991)
##
## BuurtBedrijvencentrum Westerkwartier -108.398
## (82.790)
##
## BuurtBedrijvengebied Cruquiusweg -104.845
## (88.915)
##
## BuurtBedrijvengebied Veelaan -69.415
## (167.233)
##
## BuurtBedrijvengebied Zeeburgerkade -70.630
## (99.810)
##
## BuurtBedrijvenpark Lutkemeer -113.389
## (171.356)
##
## BuurtBedrijventerrein Hamerstraat -95.012
## (72.679)
##
## BuurtBedrijventerrein Landlust -62.578
## (58.932)
##
## BuurtBedrijventerrein Schinkel -48.076
## (44.330)
##
## BuurtBeethovenbuurt -54.068
## (56.290)
##
## BuurtBegijnhofbuurt 30.236
## (60.071)
##
## BuurtBelgiëplein e.o. -81.041
## (100.056)
##
## BuurtBellamybuurt Noord -23.182
## (36.986)
##
## BuurtBellamybuurt Zuid -33.517
## (34.305)
##
## BuurtBertelmanpleinbuurt 4.467
## (52.811)
##
## BuurtBetondorp -68.415
## (58.033)
##
## BuurtBG-terrein e.o. 25.034
## (55.346)
##
## BuurtBijlmermuseum Noord -38.851
## (120.521)
##
## BuurtBijlmermuseum Zuid -105.384
## (112.102)
##
## BuurtBijlmerpark Oost 193.697
## (143.177)
##
## BuurtBlauwe Zand -88.720
## (57.982)
##
## BuurtBloemenbuurt Noord -16.768
## (63.634)
##
## BuurtBloemenbuurt Zuid -58.400
## (61.171)
##
## BuurtBloemgrachtbuurt -25.065
## (52.890)
##
## BuurtBorgerbuurt -32.807
## (34.797)
##
## BuurtBorneo -60.796
## (40.743)
##
## BuurtBosleeuw -75.038
## (47.460)
##
## BuurtBretten Oost -29.310
## (174.486)
##
## BuurtBuiksloterbreek -42.325
## (90.105)
##
## BuurtBuiksloterdijk West -93.816
## (93.937)
##
## BuurtBuiksloterham -51.745
## (84.852)
##
## BuurtBuikslotermeer Noord -76.067
## (82.133)
##
## BuurtBuikslotermeerplein -111.532
## (75.971)
##
## BuurtBuitenveldert Midden Zuid 9.429
## (51.298)
##
## BuurtBuitenveldert Oost Midden -74.912
## (57.612)
##
## BuurtBuitenveldert West Midden 46.903
## (82.367)
##
## BuurtBuitenveldert Zuidoost -35.489
## (57.465)
##
## BuurtBuitenveldert Zuidwest -58.298
## (50.562)
##
## BuurtBurgemeester Tellegenbuurt Oost -2.106
## (37.861)
##
## BuurtBurgemeester Tellegenbuurt West -51.520
## (40.586)
##
## BuurtBurgwallen Oost 81.624
## (53.609)
##
## BuurtBuurt 10 -93.015
## (86.815)
##
## BuurtBuurt 2 -25.312
## (69.845)
##
## BuurtBuurt 3 -91.835
## (56.971)
##
## BuurtBuurt 4 Oost -102.163
## (73.884)
##
## BuurtBuurt 5 Noord -68.043
## (95.191)
##
## BuurtBuurt 5 Zuid -52.555
## (74.122)
##
## BuurtBuurt 6 -116.742
## (106.756)
##
## BuurtBuurt 7 -86.289
## (96.046)
##
## BuurtBuurt 8 -81.229
## (77.113)
##
## BuurtBuurt 9 -91.370
## (127.863)
##
## BuurtBuyskade e.o. -69.324
## (45.796)
##
## BuurtCalandlaan/Lelylaan -68.502
## (72.815)
##
## BuurtCentrumeiland -45.043
## (179.191)
##
## BuurtCircus/Kermisbuurt -55.716
## (130.018)
##
## BuurtCoenhaven/Mercuriushaven -69.130
## (177.180)
##
## BuurtColumbusplein e.o. -21.904
## (35.906)
##
## BuurtConcertgebouwbuurt -14.515
## (39.816)
##
## BuurtCornelis Douwesterrein -65.692
## (139.038)
##
## BuurtCornelis Schuytbuurt 17.667
## (34.936)
##
## BuurtCornelis Troostbuurt -28.753
## (33.590)
##
## BuurtCremerbuurt Oost -41.347
## (31.053)
##
## BuurtCremerbuurt West -52.076*
## (28.008)
##
## BuurtCzaar Peterbuurt -33.857
## (41.687)
##
## BuurtD-buurt 35.162
## (95.612)
##
## BuurtDa Costabuurt Noord -32.094
## (34.073)
##
## BuurtDa Costabuurt Zuid -46.831
## (34.455)
##
## BuurtDapperbuurt Noord -51.418*
## (30.973)
##
## BuurtDapperbuurt Zuid -55.817*
## (31.853)
##
## BuurtDe Aker Oost -41.294
## (39.692)
##
## BuurtDe Aker West -86.352
## (58.010)
##
## BuurtDe Bongerd -37.062
## (76.232)
##
## BuurtDe Eenhoorn -51.685
## (53.186)
##
## BuurtDe Kleine Wereld -106.586
## (87.989)
##
## BuurtDe Klenckebuurt -16.429
## (102.912)
##
## BuurtDe Omval -65.182
## (69.224)
##
## BuurtDe Punt -71.932
## (54.260)
##
## BuurtDe Wester Quartier -37.469
## (41.018)
##
## BuurtDe Wetbuurt -34.231
## (52.329)
##
## BuurtDe Wittenbuurt Noord -39.834
## (52.573)
##
## BuurtDe Wittenbuurt Zuid -81.458
## (64.427)
##
## BuurtDelflandpleinbuurt Oost -30.609
## (70.793)
##
## BuurtDelflandpleinbuurt West -101.567**
## (41.809)
##
## BuurtDen Texbuurt -41.966
## (51.452)
##
## BuurtDiamantbuurt -6.915
## (36.169)
##
## BuurtDiepenbrockbuurt -74.545
## (88.826)
##
## BuurtDon Bosco -77.977**
## (38.000)
##
## BuurtDorp Driemond 50.279
## (138.425)
##
## BuurtDorp Sloten -54.051
## (58.499)
##
## BuurtDriehoekbuurt -18.166
## (58.730)
##
## BuurtDuivelseiland 6.851
## (40.525)
##
## BuurtDurgerdam -63.386
## (58.889)
##
## BuurtE-buurt -38.370
## (85.445)
##
## BuurtEcowijk -28.865
## (60.793)
##
## BuurtEendrachtspark -6.526
## (173.795)
##
## BuurtElandsgrachtbuurt -1.804
## (50.298)
##
## BuurtElzenhagen Noord -64.797
## (72.228)
##
## BuurtElzenhagen Zuid -41.386
## (173.990)
##
## BuurtEmanuel van Meterenbuurt -52.158
## (51.541)
##
## BuurtEntrepot-Noordwest -35.972
## (51.927)
##
## BuurtErasmusparkbuurt Oost 45.693
## (48.365)
##
## BuurtErasmusparkbuurt West -56.729
## (42.806)
##
## BuurtF-buurt -0.335
## (86.423)
##
## BuurtFannius Scholtenbuurt -61.424
## (48.823)
##
## BuurtFelix Meritisbuurt 31.263
## (51.913)
##
## BuurtFilips van Almondekwartier -43.688
## (43.453)
##
## BuurtFlevopark -59.166
## (88.587)
##
## BuurtFrankendael 64.865
## (68.162)
##
## BuurtFrans Halsbuurt -13.930
## (31.976)
##
## BuurtFrederik Hendrikbuurt Noord -39.408
## (41.701)
##
## BuurtFrederik Hendrikbuurt Zuidoost -58.708
## (38.281)
##
## BuurtFrederik Hendrikbuurt Zuidwest -40.922
## (45.182)
##
## BuurtFrederikspleinbuurt 55.140
## (50.340)
##
## BuurtG-buurt Noord -31.792
## (106.742)
##
## BuurtG-buurt Oost 1.524
## (82.777)
##
## BuurtG-buurt West -6.552
## (82.610)
##
## BuurtGaasperdam Noord 19.986
## (127.772)
##
## BuurtGaasperdam Zuid 19.012
## (142.559)
##
## BuurtGaasperpark 16.169
## (153.992)
##
## BuurtGaasperplas 44.538
## (147.871)
##
## BuurtGein Noordoost 46.823
## (125.475)
##
## BuurtGein Noordwest -29.806
## (135.780)
##
## BuurtGein Zuidwest 17.800
## (170.730)
##
## BuurtGein Zuioost 0.174
## (143.856)
##
## BuurtGelderlandpleinbuurt -119.643**
## (47.359)
##
## BuurtGerard Doubuurt -38.861
## (31.747)
##
## BuurtGeuzenhofbuurt -78.577*
## (40.630)
##
## BuurtGibraltarbuurt -45.596
## (51.100)
##
## BuurtGouden Bocht -39.145
## (62.008)
##
## BuurtGroenmarktkadebuurt -45.046
## (58.139)
##
## BuurtGrunder/Koningshoef 15.032
## (91.839)
##
## BuurtHaarlemmerbuurt Oost 168.960***
## (59.813)
##
## BuurtHaarlemmerbuurt West -48.769
## (60.210)
##
## BuurtHakfort/Huigenbos -11.225
## (124.649)
##
## BuurtHarmoniehofbuurt -7.877
## (65.501)
##
## BuurtHaveneiland Noord 30.279
## (80.735)
##
## BuurtHaveneiland Noordoost -51.718
## (63.587)
##
## BuurtHaveneiland Noordwest -45.440
## (61.558)
##
## BuurtHaveneiland Oost -29.884
## (72.248)
##
## BuurtHaveneiland Zuidwest/Rieteiland West -71.716
## (62.505)
##
## BuurtHelmersbuurt Oost -44.176
## (31.605)
##
## BuurtHemelrijk -14.633
## (59.138)
##
## BuurtHemonybuurt -52.585*
## (29.626)
##
## BuurtHercules Seghersbuurt -19.023
## (34.863)
##
## BuurtHet Funen -64.261
## (49.554)
##
## BuurtHiltonbuurt -115.666
## (86.677)
##
## BuurtHolendrecht Oost -9.168
## (126.487)
##
## BuurtHolendrecht West 11.873
## (196.464)
##
## BuurtHolysloot 3.331
## (105.911)
##
## BuurtHondecoeterbuurt -18.193
## (41.013)
##
## BuurtHoptille 3.359
## (118.063)
##
## BuurtHouthavens Oost -46.999
## (67.308)
##
## BuurtHouthavens West -70.725
## (70.874)
##
## BuurtIJplein e.o. -52.800
## (49.927)
##
## BuurtIJsbaanpad e.o. -30.657
## (52.292)
##
## BuurtIJselbuurt Oost -39.663
## (38.682)
##
## BuurtIJselbuurt West -56.446
## (42.978)
##
## BuurtJacob Geelbuurt -99.672
## (88.165)
##
## BuurtJacques Veldmanbuurt -50.879
## (38.370)
##
## BuurtJan Maijenbuurt -40.905
## (41.809)
##
## BuurtJava-eiland -137.086**
## (53.839)
##
## BuurtJohan Jongkindbuurt -39.710
## (88.024)
##
## BuurtJohannnes Vermeerbuurt -4.018
## (38.854)
##
## BuurtJohn Franklinbuurt -61.730
## (43.456)
##
## BuurtJulianapark -51.801
## (88.393)
##
## BuurtK-buurt Midden 142.691
## (142.696)
##
## BuurtK-buurt Zuidoost -39.373
## (107.828)
##
## BuurtK-buurt Zuidwest -77.321
## (186.749)
##
## BuurtKadijken -12.447
## (53.456)
##
## BuurtKadoelen -18.828
## (86.597)
##
## BuurtKalverdriehoek -29.309
## (57.460)
##
## BuurtKantershof -17.788
## (100.954)
##
## BuurtKattenburg -83.419
## (58.374)
##
## BuurtKazernebuurt -49.238
## (65.612)
##
## BuurtKelbergen -33.115
## (128.107)
##
## BuurtKNSM-eiland -88.446*
## (45.565)
##
## BuurtKolenkitbuurt Noord -81.234
## (65.059)
##
## BuurtKolenkitbuurt Zuid -85.913
## (52.995)
##
## BuurtKoningin Wilhelminaplein -87.767*
## (46.222)
##
## BuurtKop Zeedijk 20.548
## (58.900)
##
## BuurtKop Zuidas -94.032
## (73.532)
##
## BuurtKortenaerkwartier -46.382
## (39.752)
##
## BuurtKortvoort 24.587
## (122.324)
##
## BuurtKromme Mijdrechtbuurt -59.686
## (42.478)
##
## BuurtL-buurt -0.033
## (103.151)
##
## BuurtLaan van Spartaan -60.158
## (54.038)
##
## BuurtLandelijk gebied Driemond 134.882
## (139.309)
##
## BuurtLandlust Noord -44.239
## (50.416)
##
## BuurtLandlust Zuid -36.924
## (43.108)
##
## BuurtLangestraat e.o. 26.216
## (57.315)
##
## BuurtLastage -21.981
## (56.319)
##
## BuurtLegmeerpleinbuurt 153.194***
## (35.608)
##
## BuurtLeidsebuurt Noordoost 1.609
## (49.329)
##
## BuurtLeidsebuurt Noordwest -9.192
## (56.405)
##
## BuurtLeidsebuurt Zuidoost -14.323
## (55.658)
##
## BuurtLeidsebuurt Zuidwest -46.921
## (60.553)
##
## BuurtLeidsegracht Noord -7.108
## (55.951)
##
## BuurtLeidsegracht Zuid -1.360
## (55.093)
##
## BuurtLeliegracht e.o. -13.215
## (55.021)
##
## BuurtLinnaeusparkbuurt -43.262
## (38.265)
##
## BuurtLizzy Ansinghbuurt -47.156
## (36.159)
##
## BuurtLoenermark -60.058
## (85.109)
##
## BuurtLootsbuurt -29.535
## (35.518)
##
## BuurtLouis Crispijnbuurt -76.475
## (64.369)
##
## BuurtLucas/Andreasziekenhuis e.o. -19.468
## (83.863)
##
## BuurtMarathonbuurt Oost -34.897
## (41.280)
##
## BuurtMarathonbuurt West -71.994**
## (34.793)
##
## BuurtMarcanti -60.840
## (54.523)
##
## BuurtMarine-Etablissement -80.774
## (62.136)
##
## BuurtMarjoleinterrein -16.077
## (113.388)
##
## BuurtMarkengouw Midden -58.649
## (71.584)
##
## BuurtMarkengouw Noord 49.874
## (128.345)
##
## BuurtMarkengouw Zuid -105.698
## (172.290)
##
## BuurtMarkthallen -92.080
## (78.675)
##
## BuurtMarnixbuurt Midden -31.627
## (64.673)
##
## BuurtMarnixbuurt Noord -28.677
## (62.160)
##
## BuurtMarnixbuurt Zuid -11.870
## (60.115)
##
## BuurtMedisch Centrum Slotervaart -75.195
## (167.229)
##
## BuurtMeer en Oever -80.474
## (69.438)
##
## BuurtMercatorpark -46.181
## (76.533)
##
## BuurtMiddelveldsche Akerpolder -73.965
## (79.071)
##
## BuurtMiddenmeer Noord -73.742*
## (40.023)
##
## BuurtMiddenmeer Zuid -65.575*
## (35.463)
##
## BuurtMinervabuurt Midden 35.249
## (47.702)
##
## BuurtMinervabuurt Noord 63.274
## (48.730)
##
## BuurtMinervabuurt Zuid -87.568*
## (47.479)
##
## BuurtMolenwijk -30.699
## (146.658)
##
## BuurtMuseumplein -54.325
## (72.465)
##
## BuurtNDSM terrein 256.810***
## (92.226)
##
## BuurtNes e.o. -26.179
## (57.310)
##
## BuurtNieuw Sloten Noordoost -58.742
## (79.612)
##
## BuurtNieuw Sloten Noordwest 28.054
## (52.534)
##
## BuurtNieuw Sloten Zuidoost -116.953
## (80.523)
##
## BuurtNieuw Sloten Zuidwest -72.400
## (64.069)
##
## BuurtNieuwe Diep/Diemerpark -56.207
## (81.982)
##
## BuurtNieuwe Kerk e.o. -38.704
## (56.052)
##
## BuurtNieuwe Meer -57.716
## (171.302)
##
## BuurtNieuwe Oosterbegraafplaats -92.110
## (120.022)
##
## BuurtNieuwendammerdijk Oost -69.899
## (69.200)
##
## BuurtNieuwendammerdijk Zuid -63.782
## (86.055)
##
## BuurtNieuwendammmerdijk West -62.343
## (56.569)
##
## BuurtNieuwendijk Noord -16.111
## (63.280)
##
## BuurtNieuwmarkt 92.070*
## (55.065)
##
## BuurtNintemanterrein -88.337
## (128.232)
##
## BuurtNoorder IJplas 83.883
## (290.377)
##
## BuurtNoorderstrook Oost -42.108
## (172.973)
##
## BuurtNoorderstrook West -16.123
## (134.515)
##
## BuurtNoordoever Sloterplas -56.713
## (56.437)
##
## BuurtNoordoostkwadrant Indische buurt -81.905**
## (33.229)
##
## BuurtNoordwestkwadrant Indische buurt Noord -53.307*
## (29.894)
##
## BuurtNoordwestkwadrant Indische buurt Zuid -65.367**
## (30.596)
##
## BuurtOlympisch Stadion e.o. -93.596*
## (55.320)
##
## BuurtOokmeer -59.453
## (92.291)
##
## BuurtOostelijke Handelskade -49.669
## (66.172)
##
## BuurtOostenburg -27.127
## (41.295)
##
## BuurtOosterdokseiland 45.423
## (77.058)
##
## BuurtOosterpark -16.783
## (50.540)
##
## BuurtOosterparkbuurt Noordwest -37.238
## (30.520)
##
## BuurtOosterparkbuurt Zuidoost -42.275
## (31.240)
##
## BuurtOosterparkbuurt Zuidwest -35.832
## (35.109)
##
## BuurtOostoever Sloterplas -63.692
## (54.085)
##
## BuurtOostpoort -69.129*
## (38.291)
##
## BuurtOostzanerdijk -77.077
## (108.759)
##
## BuurtOrteliusbuurt Midden -78.849*
## (41.479)
##
## BuurtOrteliusbuurt Noord -64.742
## (45.804)
##
## BuurtOrteliusbuurt Zuid -49.792
## (38.899)
##
## BuurtOsdorp Midden Noord -56.360
## (80.606)
##
## BuurtOsdorp Midden Zuid -34.301
## (73.680)
##
## BuurtOsdorp Zuidoost -61.906
## (50.321)
##
## BuurtOsdorper Binnenpolder -109.020
## (92.168)
##
## BuurtOsdorper Bovenpolder -109.910
## (108.697)
##
## BuurtOsdorpplein e.o. -68.254
## (68.662)
##
## BuurtOude Kerk e.o. 21.745
## (56.376)
##
## BuurtOveramstel -25.204
## (177.010)
##
## BuurtOverbraker Binnenpolder -78.804
## (99.630)
##
## BuurtOverhoeks -96.091
## (77.307)
##
## BuurtOvertoomse Veld Noord -84.592*
## (46.227)
##
## BuurtOvertoomse Veld Zuid -79.196*
## (46.167)
##
## BuurtP.C. Hooftbuurt 33.493
## (45.551)
##
## BuurtPapaverweg e.o. -1.720
## (63.747)
##
## BuurtParamariboplein e.o. -62.525**
## (31.157)
##
## BuurtPark de Meer -67.786
## (61.566)
##
## BuurtPark Haagseweg -126.367
## (121.935)
##
## BuurtParooldriehoek -42.666
## (50.023)
##
## BuurtPasseerdersgrachtbuurt 37.926
## (57.063)
##
## BuurtPieter van der Doesbuurt -61.746
## (42.345)
##
## BuurtPlan van Gool -51.741
## (67.691)
##
## BuurtPlanciusbuurt Noord 199.429**
## (79.144)
##
## BuurtPlanciusbuurt Zuid -90.205
## (131.440)
##
## BuurtPlantage -9.430
## (51.378)
##
## BuurtPostjeskade e.o. -59.282*
## (32.743)
##
## BuurtPrinses Irenebuurt -104.080*
## (53.677)
##
## BuurtRAI -60.302
## (77.332)
##
## BuurtRansdorp 13.891
## (89.593)
##
## BuurtRapenburg -32.319
## (54.965)
##
## BuurtRechte H-buurt 25.609
## (107.869)
##
## BuurtReguliersbuurt -2.914
## (67.726)
##
## BuurtReigersbos Midden 50.416
## (138.296)
##
## BuurtReigersbos Noord 38.320
## (129.213)
##
## BuurtReigersbos Zuid 5.167
## (152.092)
##
## BuurtRembrandtpark Noord -47.794
## (52.395)
##
## BuurtRembrandtpark Zuid -27.469
## (43.999)
##
## BuurtRembrandtpleinbuurt -6.850
## (55.021)
##
## BuurtRI Oost terrein -35.220
## (48.334)
##
## BuurtRieteiland Oost -45.044
## (103.480)
##
## BuurtRietlanden -28.240
## (47.668)
##
## BuurtRijnbuurt Midden -58.765
## (44.235)
##
## BuurtRijnbuurt Oost -24.277
## (41.770)
##
## BuurtRijnbuurt West -20.722
## (58.514)
##
## BuurtRobert Scottbuurt Oost -68.410
## (47.662)
##
## BuurtRobert Scottbuurt West -53.322
## (46.671)
##
## BuurtRode Kruisbuurt -113.011
## (107.268)
##
## BuurtSarphatiparkbuurt -18.431
## (29.159)
##
## BuurtSarphatistrook -21.387
## (47.515)
##
## BuurtScheepvaarthuisbuurt -49.939
## (55.242)
##
## BuurtScheldebuurt Midden -66.610*
## (40.432)
##
## BuurtScheldebuurt Oost -52.420
## (42.676)
##
## BuurtScheldebuurt West -56.412
## (39.494)
##
## BuurtSchellingwoude Oost -74.026
## (52.598)
##
## BuurtSchellingwoude West -33.126
## (77.754)
##
## BuurtSchinkelbuurt Noord -60.626**
## (29.268)
##
## BuurtSchinkelbuurt Zuid -47.684
## (38.602)
##
## BuurtSchipluidenbuurt -80.845
## (87.593)
##
## BuurtScience Park Noord -85.771
## (52.252)
##
## BuurtScience Park Zuid -31.913
## (121.237)
##
## BuurtSlotermeer Zuid -26.295
## (61.936)
##
## BuurtSloterpark -109.692
## (80.759)
##
## BuurtSloterweg e.o. -29.744
## (76.438)
##
## BuurtSpaarndammerbuurt Midden -48.619
## (66.511)
##
## BuurtSpaarndammerbuurt Noordoost -72.359
## (59.274)
##
## BuurtSpaarndammerbuurt Noordwest -65.514
## (71.583)
##
## BuurtSpaarndammerbuurt Zuidoost -46.163
## (60.380)
##
## BuurtSpaarndammerbuurt Zuidwest -61.116
## (57.782)
##
## BuurtSpiegelbuurt -27.023
## (51.038)
##
## BuurtSporenburg -55.554
## (41.916)
##
## BuurtSportpark Middenmeer Noord -32.591
## (99.842)
##
## BuurtSportpark Middenmeer Zuid -131.982
## (99.289)
##
## BuurtSportpark Voorland 85.830
## (167.968)
##
## BuurtSpuistraat Noord 0.904
## (56.516)
##
## BuurtSpuistraat Zuid 32.642
## (57.454)
##
## BuurtStaalmanbuurt -59.548
## (40.667)
##
## BuurtStaatsliedenbuurt Noordoost -68.186
## (54.555)
##
## BuurtStationsplein e.o. -46.282
## (173.062)
##
## BuurtSteigereiland Noord -49.491
## (55.328)
##
## BuurtSteigereiland Zuid -34.645
## (47.806)
##
## BuurtSurinamepleinbuurt -78.896**
## (36.776)
##
## BuurtSwammerdambuurt -39.870
## (31.832)
##
## BuurtTeleport -118.323
## (130.491)
##
## BuurtTerrasdorp -31.525
## (77.992)
##
## BuurtTransvaalbuurt Oost -36.171
## (31.724)
##
## BuurtTransvaalbuurt West -66.769*
## (34.941)
##
## BuurtTrompbuurt -53.360
## (39.859)
##
## BuurtTuindorp Amstelstation -15.232
## (71.493)
##
## BuurtTuindorp Frankendael -109.289**
## (53.993)
##
## BuurtTuindorp Nieuwendam Oost -58.542
## (57.391)
##
## BuurtTuindorp Nieuwendam West -53.332
## (69.638)
##
## BuurtTuindorp Oostzaan Oost -37.804
## (87.327)
##
## BuurtTuindorp Oostzaan West -48.146
## (128.818)
##
## BuurtTwiske Oost 111.260
## (147.206)
##
## BuurtTwiske West -12.236
## (105.979)
##
## BuurtUilenburg -29.799
## (55.766)
##
## BuurtUtrechtsebuurt Zuid 21.866
## (51.339)
##
## BuurtValeriusbuurt Oost 6.817
## (44.294)
##
## BuurtValeriusbuurt West -49.855
## (37.695)
##
## BuurtValkenburg -36.317
## (57.688)
##
## BuurtVan Brakelkwartier -39.536
## (51.601)
##
## BuurtVan der Helstpleinbuurt -44.095
## (30.487)
##
## BuurtVan der Kunbuurt -20.581
## (75.096)
##
## BuurtVan der Pekbuurt -50.042
## (53.066)
##
## BuurtVan Loonbuurt -7.171
## (50.327)
##
## BuurtVan Tuyllbuurt -44.009
## (33.527)
##
## BuurtVelserpolder West 28.348
## (80.615)
##
## BuurtVeluwebuurt -60.794
## (69.439)
##
## BuurtVenserpolder Oost -28.936
## (73.363)
##
## BuurtVliegenbos -6.294
## (59.410)
##
## BuurtVogelbuurt Noord -71.456
## (61.340)
##
## BuurtVogelbuurt Zuid -18.066
## (47.588)
##
## BuurtVogeltjeswei 51.135
## (139.185)
##
## BuurtVondelpark Oost -26.121
## (85.869)
##
## BuurtVondelpark West -55.508
## (55.020)
##
## BuurtVondelparkbuurt Midden -24.871
## (37.544)
##
## BuurtVondelparkbuurt Oost -35.075
## (36.961)
##
## BuurtVondelparkbuurt West -26.193
## (31.084)
##
## BuurtVU-kwartier -52.747
## (88.617)
##
## BuurtWalvisbuurt -13.805
## (110.385)
##
## BuurtWaterloopleinbuurt -20.338
## (59.915)
##
## BuurtWeesperbuurt -16.532
## (46.578)
##
## BuurtWeespertrekvaart -72.137
## (65.568)
##
## BuurtWeesperzijde Midden/Zuid -49.829
## (34.464)
##
## BuurtWerengouw Midden -50.120
## (61.254)
##
## BuurtWerengouw Noord -36.424
## (126.446)
##
## BuurtWerengouw Zuid -75.029
## (69.349)
##
## BuurtWestelijke eilanden -9.536
## (61.139)
##
## BuurtWesterdokseiland -73.945
## (54.242)
##
## BuurtWestergasfabriek -42.459
## (70.428)
##
## BuurtWesterstaatsman -48.431
## (47.938)
##
## BuurtWestlandgrachtbuurt -70.155**
## (30.578)
##
## BuurtWeteringbuurt -3.612
## (47.019)
##
## BuurtWG-terrein -23.361
## (32.934)
##
## BuurtWielingenbuurt -61.859
## (45.737)
##
## BuurtWildeman -47.979
## (62.072)
##
## BuurtWillemsparkbuurt Noord -13.429
## (38.347)
##
## BuurtWillibrordusbuurt -26.745
## (32.094)
##
## BuurtWittenburg -32.771
## (45.123)
##
## BuurtWoon- en Groengebied Sloterdijk -67.715
## (90.874)
##
## BuurtZaagpoortbuurt -49.088
## (61.330)
##
## BuurtZamenhofstraat e.o. -66.876
## (170.451)
##
## BuurtZeeburgerdijk Oost -72.896
## (121.748)
##
## BuurtZeeburgereiland Noordoost -3.775
## (102.027)
##
## BuurtZeeburgereiland Noordwest -27.687
## (89.028)
##
## BuurtZeeburgereiland Zuidoost -36.626
## (169.115)
##
## BuurtZeeburgereiland Zuidwest -69.398
## (68.988)
##
## BuurtZeeheldenbuurt -74.999
## (55.385)
##
## BuurtZorgvlied -94.071
## (104.717)
##
## BuurtZuidas Noord -83.150
## (88.421)
##
## BuurtZuidas Zuid -26.104
## (61.132)
##
## BuurtZuiderhof -75.118
## (168.428)
##
## BuurtZuiderkerkbuurt 2.304
## (54.491)
##
## BuurtZuidoostkwadrant Indische buurt -70.646*
## (37.445)
##
## BuurtZuidwestkwadrant Indische buurt -54.647
## (37.625)
##
## BuurtZuidwestkwadrant Osdorp Noord -84.459
## (68.262)
##
## BuurtZuidwestkwadrant Osdorp Zuid -72.124*
## (41.973)
##
## BuurtZunderdorp -33.482
## (82.884)
##
## host_is_superhostf 31.685
## (84.154)
##
## host_is_superhostt 23.815
## (84.235)
##
## room_typePrivate room -32.607***
## (4.617)
##
## room_typeShared room -13.293
## (25.084)
##
## property_typeApartment -14.855
## (25.359)
##
## property_typeBarn -102.173
## (88.710)
##
## property_typeBed and breakfast -15.156
## (26.986)
##
## property_typeBoat 19.512
## (27.354)
##
## property_typeBoutique hotel 2.490
## (43.863)
##
## property_typeBungalow -6.151
## (67.896)
##
## property_typeCabin -6.376
## (52.587)
##
## property_typeCamper/RV -69.812
## (130.124)
##
## property_typeCampsite -47.158
## (126.653)
##
## property_typeCasa particular (Cuba) -17.525
## (78.901)
##
## property_typeCastle 30.529
## (168.702)
##
## property_typeChalet -33.467
## (100.904)
##
## property_typeCondominium -13.566
## (27.189)
##
## property_typeCottage -30.833
## (57.238)
##
## property_typeEarth house -72.598
## (167.475)
##
## property_typeGuest suite -26.590
## (29.070)
##
## property_typeGuesthouse -23.336
## (37.865)
##
## property_typeHostel -19.507
## (92.605)
##
## property_typeHotel 238.210***
## (73.362)
##
## property_typeHouse -19.031
## (26.041)
##
## property_typeHouseboat -0.206
## (28.167)
##
## property_typeLighthouse 416.346**
## (178.880)
##
## property_typeLoft 13.029
## (26.832)
##
## property_typeNature lodge 14.917
## (179.227)
##
## property_typeOther -18.139
## (35.428)
##
## property_typeServiced apartment 23.628
## (33.517)
##
## property_typeTent -12.215
## (187.455)
##
## property_typeTiny house 9.720
## (79.513)
##
## property_typeTownhouse -26.472
## (26.465)
##
## property_typeVilla -10.312
## (41.976)
##
## bed_typeCouch -30.815
## (122.179)
##
## bed_typeFuton -6.669
## (81.656)
##
## bed_typePull-out Sofa 0.985
## (77.276)
##
## bed_typeReal Bed 16.282
## (75.233)
##
## minimum_nights 0.304
## (0.200)
##
## dist.museum 0.002
## (0.011)
##
## dist.supermarkets 0.002
## (0.015)
##
## Unescowithin -10.741
## (35.152)
##
## dist.metro -0.021
## (0.015)
##
## dist.plaza -0.017
## (0.014)
##
## dist.nightclub 0.009
## (0.013)
##
## dist.beach -0.008
## (0.012)
##
## dist.parks
##
##
## name.brightnot bright -3.885
## (6.404)
##
## name.spaciousspacious 7.942*
## (4.403)
##
## name.luxurynot luxury -34.059***
## (7.140)
##
## amenities.number 0.469**
## (0.188)
##
## lagPrice -0.035*
## (0.018)
##
## Constant -4,033.317
## (28,094.420)
##
## -----------------------------------------------------------------------
## Observations 13,232
## R2 0.153
## Adjusted R2 0.116
## Residual Std. Error 164.612 (df = 12687)
## F Statistic 4.207*** (df = 544; 12687)
## =======================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The R-squared of this model is 0.153 (adjusted R-squared 0.116), well below 0.5, which means that the regression explains only about 15% of the variance in listing prices in Amsterdam. In terms of accuracy, the algorithm doesn't perform well enough.
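As a reminder of what the R-squared figure measures, here is a minimal sketch that computes it by hand. The prices and predictions are hypothetical toy values, not taken from the Amsterdam data, and `r_squared` is a helper name introduced only for this illustration.

```r
# Hypothetical toy data: observed prices and model predictions (not from the report).
observed  <- c(100, 150, 200, 250, 300)
predicted <- c(180, 190, 200, 210, 220)

# R-squared = 1 - (residual sum of squares / total sum of squares),
# i.e. the share of the variance in the observed values that the predictions account for.
r_squared <- function(obs, pred) {
  ss_res <- sum((obs - pred)^2)        # variation left unexplained by the predictions
  ss_tot <- sum((obs - mean(obs))^2)   # total variation around the mean
  1 - ss_res / ss_tot
}

r2 <- r_squared(observed, predicted)
print(r2)
```

In this toy case the predictions account for 36% of the variance in the observed prices. Read the same way, the value of 0.153 reported above leaves roughly 85% of the price variance unexplained.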
# Absolute error (AE) and absolute percentage error (APE) of the predicted price on the test set
ams.test.price.table <- ams.test.table.all %>%
  dplyr::select(id, month, price.Predict, each_month_price, Buurt) %>%
  mutate(AE = abs(each_month_price - price.Predict),
         APE = abs(each_month_price - price.Predict) / each_month_price)
# Histogram of all APEs
ggplot(ams.test.price.table, aes(x = APE)) +
  labs(title = "APE Distribution", caption = "Figure XX. A histogram of APE") +
  geom_histogram() +
  plotTheme()
# Same histogram restricted to APE < 1.5, so that the bulk of the distribution is visible
ggplot(ams.test.price.table %>%
         filter(APE < 1.5), aes(x = APE)) +
  labs(title = "APE Distribution", caption = "Figure XX. A histogram of APE") +
  geom_histogram() +
  plotTheme()
The absolute percentage errors (APEs) of price on the test set have a positively skewed distribution. Most APEs are close to 0.15, and fewer than 7% of them are higher than 1.5. APEs higher than 10 might be caused by outliers, whose prices are usually extremely high or low.
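To illustrate how outlying prices can produce very large APEs, the sketch below applies the same AE and APE definitions used above to a few hypothetical listings. The `toy` data frame and its values are invented for illustration and are not drawn from the test set.

```r
# Hypothetical toy prices (not from the report's data).
toy <- data.frame(
  actual    = c(20, 100, 150, 400),
  predicted = c(120, 115, 130, 250)
)

# Same definitions as the AE and APE columns computed above.
toy$AE  <- abs(toy$actual - toy$predicted)
toy$APE <- toy$AE / toy$actual

# Share of observations whose APE exceeds the 1.5 cutoff used for the second histogram.
share_above <- mean(toy$APE > 1.5)

print(toy)
print(share_above)
```

In this toy example the listing priced at 20 has an APE of 5 even though its absolute error (100) is smaller than that of the listing priced at 400 (absolute error 150, APE 0.375). Because APE divides by the actual price, very low actual prices are especially likely to end up in the long right tail.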
# Mean absolute error (MAE) and mean absolute percentage error (MAPE) by month
ams.test.table.all %>%
  drop_na(price.AbsError) %>%
  group_by(month) %>%
  summarise(MAE = mean(price.AbsError),
            MAPE = mean(price.AbsError / each_month_price)) %>%
  kable() %>% kable_styling()
month | MAE | MAPE |
---|---|---|
1 | 70.38180 | 0.4851576 |
2 | 69.99850 | 0.4803573 |
3 | 70.05682 | 0.4800525 |
4 | 71.33608 | 0.4851716 |
5 | 70.89487 | 0.4826700 |
6 | 71.01547 | 0.4829015 |
7 | 71.01024 | 0.4825626 |
8 | 70.91579 | 0.4822727 |
9 | 71.07073 | 0.4827872 |
10 | 70.72211 | 0.4811256 |
11 | 70.54772 | 0.4802203 |
12 | 70.60160 | 0.4873070 |
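For reference, the two error measures in the table above can be written out as follows. This assumes that `price.AbsError` is the absolute difference between `each_month_price` and `price.Predict`, which is not shown in this section; here $y_{i,m}$ stands for the observed price of listing $i$ in month $m$, $\hat{y}_{i,m}$ for its predicted price, and $n_m$ for the number of test listings in month $m$ with a non-missing error.

```latex
\mathrm{MAE}_m  = \frac{1}{n_m} \sum_{i=1}^{n_m} \left| y_{i,m} - \hat{y}_{i,m} \right|,
\qquad
\mathrm{MAPE}_m = \frac{1}{n_m} \sum_{i=1}^{n_m} \frac{\left| y_{i,m} - \hat{y}_{i,m} \right|}{y_{i,m}}
```

MAE is expressed in the same unit as the price, while MAPE is a unitless ratio, so a MAPE of 0.48 corresponds to an average error of about 48% of the observed price.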
# Plot the monthly MAPE of the price prediction on the test set
ams.test.table.all %>%
  drop_na(price.AbsError) %>%
  group_by(month) %>%
  summarise(MAE = mean(price.AbsError),
            MAPE = mean(price.AbsError / each_month_price)) %>%
  ggplot(aes(month, MAPE)) +
  geom_line(size = 1.1, colour = "#4757a2") +
  labs(title = "MAPE by Month",
       subtitle = "Amsterdam Airbnb price by month prediction",
       x = "Month", y = "MAPE") +
  plotTheme()
The figure above shows how the prediction error varies over time. The differences between months are small: the monthly MAPE ranges only from about 0.480 to 0.487, with the highest error in December and the lowest in March (November is nearly identical). The prediction is not accurate enough, however, as every monthly MAPE is above 48%.
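The monthly pattern can be checked directly from the table. The sketch below copies the twelve MAPE values from the monthly table above into a vector and looks up the months with the lowest and highest error; the variable names are introduced only for this illustration.

```r
# Monthly MAPE values copied from the table above (months 1 to 12).
mape <- c(0.4851576, 0.4803573, 0.4800525, 0.4851716, 0.4826700, 0.4829015,
          0.4825626, 0.4822727, 0.4827872, 0.4811256, 0.4802203, 0.4873070)

best_month  <- which.min(mape)        # month with the lowest MAPE
worst_month <- which.max(mape)        # month with the highest MAPE
spread      <- max(mape) - min(mape)  # gap between the worst and the best month

print(c(best = best_month, worst = worst_month))
print(spread)
```

The lowest value falls in month 3 and the highest in month 12, and the gap between them is under one percentage point, so the month-to-month differences are small compared with the overall error level of about 48%.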
# Mean MAPE and MAE of the price prediction by neighborhood (Buurt)
ams.test.prediction[[4]] %>%
  group_by(Buurt) %>%
  summarize(mean.MAPE = mean(price.AbsError / each_month_price, na.rm = TRUE),
            mean.MAE = mean(price.AbsError, na.rm = TRUE)) %>%
  kable() %>% kable_styling()
Buurt | mean.MAPE | mean.MAE |
---|---|---|
Aalsmeerwegbuurt Oost | 0.8482999 | 79.579513 |
Aalsmeerwegbuurt West | 0.5600389 | 73.681524 |
Alexanderplein e.o. | 0.0799265 | 12.218879 |
Amstelglorie | 0.2849965 | 71.249123 |
Amstelkwartier Noord | 0.4716919 | 49.099742 |
Amstelveldbuurt | 0.7942906 | 91.008419 |
Amsterdamse Bos | 0.1334867 | 28.376262 |
Amsterdamse Poort | 0.6354682 | 78.706971 |
Andreasterrein | 0.7963976 | 66.762241 |
Anjeliersbuurt Noord | 0.4586842 | 76.110797 |
Anjeliersbuurt Zuid | 0.4579480 | 57.785337 |
Architectenbuurt | 0.2833719 | 45.718874 |
Balboaplein e.o. | 0.5926541 | 51.735064 |
Banne Noordoost | 0.7953534 | 43.707293 |
Banne Noordwest | 0.2068969 | 35.066160 |
Banne Zuidoost | 0.4469958 | 36.459125 |
Banne Zuidwest | 0.1587063 | 24.094442 |
Banpleinbuurt | 0.7215072 | 84.288270 |
Bedrijventerrein Hamerstraat | 0.2743106 | 37.340776 |
Bedrijventerrein Landlust | 0.2595329 | 42.056001 |
Bedrijventerrein Schinkel | 0.1204660 | 18.124364 |
Beethovenbuurt | 0.7347758 | 101.773593 |
Begijnhofbuurt | 0.8888703 | 97.085111 |
Belgiëplein e.o. | 0.2195546 | 28.040285 |
Bellamybuurt Noord | 0.5088203 | 64.597126 |
Bellamybuurt Zuid | 0.3558700 | 48.095118 |
Bertelmanpleinbuurt | 0.7582130 | 99.690054 |
Betondorp | 0.2302631 | 46.859817 |
BG-terrein e.o. | 0.4931558 | 78.759158 |
Bijlmermuseum Noord | 0.1038183 | 16.971173 |
Bijlmermuseum Zuid | 1.4721792 | 88.330751 |
Blauwe Zand | 0.6842318 | 88.731578 |
Bloemenbuurt Noord | 0.4583512 | 54.902015 |
Bloemenbuurt Zuid | 0.3833271 | 63.303796 |
Bloemgrachtbuurt | 0.3603517 | 60.133303 |
Borgerbuurt | 0.4520868 | 72.693456 |
Borneo | 0.5697911 | 77.888762 |
Bosleeuw | 0.4316624 | 58.594925 |
Buiksloterbreek | 0.5908330 | 90.940848 |
Buikslotermeer Noord | 0.5250506 | 38.523545 |
Buikslotermeerplein | 0.1971705 | 17.444029 |
Buitenveldert Midden Zuid | 0.7014925 | 83.373883 |
Buitenveldert Oost Midden | 0.3316740 | 61.735455 |
Buitenveldert Zuidoost | 0.5863596 | 54.418563 |
Buitenveldert Zuidwest | 0.6431069 | 113.784307 |
Burgemeester Tellegenbuurt Oost | 0.8312302 | 88.744462 |
Burgemeester Tellegenbuurt West | 0.5175902 | 65.932065 |
Burgwallen Oost | 0.6324443 | 124.664085 |
Buurt 2 | 0.4738365 | 52.628819 |
Buurt 3 | 0.3915345 | 48.559641 |
Buurt 4 Oost | 0.8100930 | 104.535326 |
Buurt 5 Noord | 2.0632156 | 72.571100 |
Buurt 5 Zuid | 1.1169185 | 93.209349 |
Buurt 6 | 0.2406804 | 71.722746 |
Buurt 7 | 0.5235892 | 62.307113 |
Buurt 8 | 0.3669519 | 60.828894 |
Buurt 9 | 0.1850751 | 36.829945 |
Buyskade e.o. | 0.4318008 | 61.477978 |
Calandlaan/Lelylaan | 0.9917287 | 59.920757 |
Columbusplein e.o. | 0.6254155 | 63.934976 |
Concertgebouwbuurt | 0.4735893 | 67.967516 |
Cornelis Schuytbuurt | 0.5790923 | 113.141056 |
Cornelis Troostbuurt | 0.3969562 | 59.774625 |
Cremerbuurt Oost | 0.3302053 | 59.702585 |
Cremerbuurt West | 0.4878790 | 57.049963 |
Czaar Peterbuurt | 0.3504581 | 46.833566 |
D-buurt | 0.6769604 | 270.784176 |
Da Costabuurt Noord | 0.5006239 | 83.813219 |
Da Costabuurt Zuid | 0.4270985 | 50.522190 |
Dapperbuurt Noord | 0.2907810 | 50.664405 |
Dapperbuurt Zuid | 0.5540819 | 81.721536 |
De Aker Oost | 0.2619664 | 40.816218 |
De Aker West | 0.4847173 | 51.996913 |
De Bongerd | 0.6171991 | 60.954868 |
De Eenhoorn | 0.5492506 | 66.388288 |
De Kleine Wereld | 0.3985720 | 85.664065 |
De Klenckebuurt | 0.9342866 | 70.071495 |
De Omval | 0.3145614 | 114.391229 |
De Punt | 0.2417326 | 26.973728 |
De Wester Quartier | 0.4425673 | 79.237986 |
De Wetbuurt | 0.4491094 | 55.644755 |
De Wittenbuurt Noord | 0.4543091 | 84.574308 |
De Wittenbuurt Zuid | 0.3591843 | 87.693271 |
Delflandpleinbuurt Oost | 0.4538331 | 36.697761 |
Delflandpleinbuurt West | 0.2130858 | 36.328263 |
Den Texbuurt | 0.5963689 | 112.729259 |
Diamantbuurt | 0.7306635 | 73.548507 |
Diepenbrockbuurt | 0.3696382 | 65.094564 |
Don Bosco | 0.4472811 | 52.971588 |
Dorp Sloten | 1.0370500 | 97.714523 |
Driehoekbuurt | 0.4568119 | 51.140963 |
Duivelseiland | 0.4856961 | 79.323892 |
Durgerdam | 0.6055265 | 47.145776 |
E-buurt | 0.3245683 | 118.352711 |
Ecowijk | 0.4415434 | 54.110907 |
Elandsgrachtbuurt | 0.4887473 | 86.326062 |
Elzenhagen Noord | 0.7784932 | 61.339132 |
Emanuel van Meterenbuurt | 0.7411112 | 70.028660 |
Entrepot-Noordwest | 0.2624291 | 45.352113 |
Erasmusparkbuurt Oost | 1.2002243 | 124.544878 |
Erasmusparkbuurt West | 0.4805604 | 68.561989 |
F-buurt | 0.3331667 | 93.866256 |
Fannius Scholtenbuurt | 0.3749251 | 44.453507 |
Felix Meritisbuurt | 0.3778189 | 63.279907 |
Filips van Almondekwartier | 0.2922553 | 56.382785 |
Frankendael | 0.8106249 | 113.097300 |
Frans Halsbuurt | 0.4691876 | 61.681336 |
Frederik Hendrikbuurt Noord | 0.4587901 | 66.475724 |
Frederik Hendrikbuurt Zuidoost | 0.3511195 | 76.532073 |
Frederik Hendrikbuurt Zuidwest | 0.3379898 | 63.883093 |
Frederikspleinbuurt | 1.1527539 | 116.575727 |
G-buurt Noord | 0.6484979 | 70.794349 |
G-buurt Oost | 1.2639613 | 151.702362 |
G-buurt West | 0.9167907 | 90.964192 |
Gaasperdam Noord | 0.3872064 | 56.636388 |
Gaasperpark | 0.1515962 | 13.997378 |
Gaasperplas | 0.1340792 | 26.876783 |
Gein Noordoost | 1.1302352 | 75.598301 |
Gein Noordwest | 0.2536248 | 22.018684 |
Gein Zuioost | 0.3455533 | 18.385441 |
Gelderlandpleinbuurt | 0.4478324 | 100.145434 |
Gerard Doubuurt | 0.4537865 | 65.958462 |
Geuzenhofbuurt | 0.4222017 | 61.644382 |
Gibraltarbuurt | 0.4317405 | 58.729588 |
Gouden Bocht | 0.2361294 | 52.235650 |
Groenmarktkadebuurt | 0.5715233 | 52.246039 |
Grunder/Koningshoef | 0.5257551 | 262.877527 |
Haarlemmerbuurt Oost | 0.8266043 | 154.160345 |
Haarlemmerbuurt West | 0.4043421 | 69.869784 |
Hakfort/Huigenbos | 0.2441623 | 32.620247 |
Harmoniehofbuurt | 0.2353300 | 43.145657 |
Haveneiland Noord | 1.5863567 | 124.485870 |
Haveneiland Noordoost | 0.3128995 | 48.176464 |
Haveneiland Noordwest | 0.4092326 | 70.127878 |
Haveneiland Oost | 0.2933539 | 69.587172 |
Haveneiland Zuidwest/Rieteiland West | 0.7364491 | 106.982657 |
Helmersbuurt Oost | 0.4618901 | 79.713689 |
Hemelrijk | 0.4317231 | 96.111283 |
Hemonybuurt | 0.4423449 | 139.205114 |
Hercules Seghersbuurt | 0.7195793 | 77.263532 |
Het Funen | 0.2962107 | 41.026052 |
Holendrecht Oost | 0.9216839 | 174.437212 |
Hondecoeterbuurt | 0.6042666 | 69.750940 |
Hoptille | 0.7155618 | 241.240931 |
Houthavens Oost | 0.2478251 | 28.643737 |
Houthavens West | 0.5580632 | 131.384067 |
IJplein e.o. | 0.7746175 | 64.807206 |
IJsbaanpad e.o. | 0.2944515 | 58.405035 |
IJselbuurt Oost | 0.4040523 | 59.208971 |
IJselbuurt West | 0.2250278 | 48.056648 |
Jacob Geelbuurt | 0.2435287 | 60.882181 |
Jacques Veldmanbuurt | 0.6149482 | 80.942563 |
Jan Maijenbuurt | 0.5445055 | 67.156018 |
Java-eiland | 0.6497427 | 155.364872 |
Johan Jongkindbuurt | 0.4574941 | 49.439861 |
Johannnes Vermeerbuurt | 0.5919263 | 67.143663 |
John Franklinbuurt | 0.3455264 | 61.693945 |
Julianapark | 0.7111052 | 52.277001 |
K-buurt Zuidoost | 0.1404654 | 12.314133 |
Kadijken | 0.6919844 | 78.682644 |
Kadoelen | 0.5597032 | 68.070761 |
Kalverdriehoek | 0.4857547 | 92.341819 |
Kantershof | 0.2474689 | 20.330889 |
Kattenburg | 0.2829312 | 46.169090 |
Kazernebuurt | 0.3308971 | 111.184260 |
KNSM-eiland | 0.3957661 | 60.799917 |
Kolenkitbuurt Noord | 0.3379981 | 77.143240 |
Kolenkitbuurt Zuid | 0.5293842 | 64.627318 |
Koningin Wilhelminaplein | 0.3467333 | 77.314123 |
Kop Zeedijk | 0.7679214 | 84.459160 |
Kop Zuidas | 0.5471934 | 180.726880 |
Kortenaerkwartier | 0.4395164 | 85.457909 |
Kortvoort | 0.3171244 | 31.078190 |
Kromme Mijdrechtbuurt | 0.4014451 | 55.302712 |
L-buurt | 0.2862584 | 37.588169 |
Laan van Spartaan | 0.4326749 | 55.661925 |
Landlust Noord | 0.4031663 | 50.468995 |
Landlust Zuid | 0.4596964 | 55.699365 |
Langestraat e.o. | 0.5717155 | 98.583127 |
Lastage | 0.3207628 | 93.175090 |
Legmeerpleinbuurt | 1.1525055 | 205.971666 |
Leidsebuurt Noordoost | 0.5124568 | 86.971748 |
Leidsebuurt Noordwest | 0.6009920 | 54.444048 |
Leidsebuurt Zuidoost | 0.4765612 | 83.620457 |
Leidsebuurt Zuidwest | 0.3178246 | 72.987227 |
Leidsegracht Noord | 0.4067210 | 72.323268 |
Leidsegracht Zuid | 0.4505997 | 49.494991 |
Leliegracht e.o. | 0.4090148 | 61.579954 |
Linnaeusparkbuurt | 0.6756746 | 63.628615 |
Lizzy Ansinghbuurt | 0.4101128 | 62.669200 |
Loenermark | 0.0204618 | 3.620532 |
Lootsbuurt | 0.4705289 | 314.834424 |
Louis Crispijnbuurt | 0.4821637 | 103.663073 |
Lucas/Andreasziekenhuis e.o. | 0.3439832 | 117.529657 |
Marathonbuurt Oost | 0.5796580 | 75.706013 |
Marathonbuurt West | 0.4207985 | 52.653848 |
Marcanti | 0.3545225 | 46.444569 |
Marine-Etablissement | 0.3776606 | 126.622811 |
Markengouw Midden | 0.4168370 | 150.972435 |
Markthallen | 0.3960551 | 53.846668 |
Marnixbuurt Midden | 0.3352077 | 82.333830 |
Marnixbuurt Noord | 0.3809387 | 53.775860 |
Marnixbuurt Zuid | 0.8561556 | 136.417082 |
Meer en Oever | 0.1338822 | 10.752959 |
Mercatorpark | 0.2872377 | 36.743469 |
Middelveldsche Akerpolder | 0.4887032 | 25.773783 |
Middenmeer Noord | 0.3671617 | 83.669745 |
Middenmeer Zuid | 0.5353730 | 60.452547 |
Minervabuurt Midden | 0.8913877 | 115.931465 |
Minervabuurt Noord | 0.7229124 | 85.152927 |
Minervabuurt Zuid | 0.7734265 | 140.297673 |
Molenwijk | 0.1747834 | 29.713184 |
Museumplein | 0.0183529 | 2.165643 |
NDSM terrein | 2.2758186 | 552.898138 |
Nes e.o. | 0.4708715 | 60.214050 |
Nieuw Sloten Noordoost | 0.3521263 | 59.569457 |
Nieuw Sloten Noordwest | 1.1000836 | 93.537092 |
Nieuw Sloten Zuidoost | 0.3597568 | 33.285712 |
Nieuw Sloten Zuidwest | 0.6293183 | 56.009331 |
Nieuwe Kerk e.o. | 0.3604292 | 58.210499 |
Nieuwendammerdijk Oost | 0.2469759 | 45.954117 |
Nieuwendammmerdijk West | 0.3336743 | 62.211233 |
Nieuwendijk Noord | 0.4296475 | 55.746268 |
Nieuwmarkt | 0.8617167 | 127.071381 |
Noordoever Sloterplas | 0.3170490 | 52.520363 |
Noordoostkwadrant Indische buurt | 0.3781155 | 46.717299 |
Noordwestkwadrant Indische buurt Noord | 0.4542709 | 51.977908 |
Noordwestkwadrant Indische buurt Zuid | 0.4431618 | 82.774067 |
Olympisch Stadion e.o. | 0.1692985 | 20.432150 |
Ookmeer | 1.3271658 | 53.086633 |
Oostelijke Handelskade | 0.2350411 | 33.784755 |
Oostenburg | 0.4914992 | 70.935988 |
Oosterpark | 0.5544603 | 50.825482 |
Oosterparkbuurt Noordwest | 0.4262980 | 46.724698 |
Oosterparkbuurt Zuidoost | 0.5386380 | 63.069634 |
Oosterparkbuurt Zuidwest | 0.5681674 | 66.860011 |
Oostoever Sloterplas | 0.6189193 | 80.683528 |
Oostpoort | 0.2530412 | 36.154703 |
Oostzanerdijk | 0.5860497 | 58.575392 |
Orteliusbuurt Midden | 0.3374337 | 65.416746 |
Orteliusbuurt Noord | 0.3548012 | 55.990240 |
Orteliusbuurt Zuid | 0.4936860 | 67.559934 |
Osdorp Midden Noord | 0.3853096 | 45.668149 |
Osdorp Midden Zuid | 1.2103963 | 105.292117 |
Osdorp Zuidoost | 0.5783300 | 43.569331 |
Osdorper Binnenpolder | 0.0101763 | 1.690962 |
Osdorpplein e.o. | 0.1758429 | 22.824592 |
Oude Kerk e.o. | 0.3575976 | 70.234766 |
Overhoeks | 0.7858372 | 196.459302 |
Overtoomse Veld Noord | 0.6379286 | 111.874307 |
Overtoomse Veld Zuid | 0.2386703 | 27.142563 |
P.C. Hooftbuurt | 0.4664210 | 106.138925 |
Papaverweg e.o. | 0.5339062 | 127.451147 |
Paramariboplein e.o. | 0.4478085 | 62.949053 |
Park de Meer | 0.1239049 | 28.022190 |
Parooldriehoek | 0.3859117 | 34.765705 |
Passeerdersgrachtbuurt | 0.8228697 | 109.032316 |
Pieter van der Doesbuurt | 0.3782911 | 117.747709 |
Plan van Gool | 0.4913810 | 58.311358 |
Planciusbuurt Noord | 2.8902092 | 289.020922 |
Plantage | 0.6343966 | 80.655213 |
Postjeskade e.o. | 0.4419432 | 53.149338 |
Prinses Irenebuurt | 0.7170671 | 93.014561 |
RAI | 0.3241991 | 43.841165 |
Ransdorp | 0.6508467 | 65.735515 |
Rapenburg | 0.2795397 | 41.633865 |
Rechte H-buurt | 0.9461994 | 47.309972 |
Reguliersbuurt | 0.5647622 | 114.614390 |
Reigersbos Midden | 0.6220575 | 79.687605 |
Reigersbos Noord | 0.2494638 | 29.436723 |
Rembrandtpark Noord | 0.2279013 | 20.119756 |
Rembrandtpark Zuid | 0.3756689 | 58.815867 |
Rembrandtpleinbuurt | 0.5770631 | 83.680500 |
RI Oost terrein | 0.3942268 | 76.361052 |
Rietlanden | 1.1021497 | 84.794203 |
Rijnbuurt Midden | 0.3535228 | 46.671466 |
Rijnbuurt Oost | 0.4668837 | 53.972267 |
Rijnbuurt West | 0.5145902 | 73.108918 |
Robert Scottbuurt Oost | 0.3875369 | 47.877897 |
Robert Scottbuurt West | 0.4228762 | 49.869398 |
Rode Kruisbuurt | 0.5042654 | 130.604741 |
Sarphatiparkbuurt | 0.5582229 | 81.440191 |
Sarphatistrook | 0.5018880 | 79.881132 |
Scheepvaarthuisbuurt | 0.3099189 | 75.401826 |
Scheldebuurt Midden | 0.6285853 | 57.485704 |
Scheldebuurt Oost | 0.4499047 | 85.805308 |
Scheldebuurt West | 0.4864030 | 62.244387 |
Schellingwoude Oost | 0.4425946 | 59.332013 |
Schellingwoude West | 0.4969251 | 55.962510 |
Schinkelbuurt Noord | 0.3246526 | 51.525083 |
Schinkelbuurt Zuid | 0.3732498 | 82.881073 |
Science Park Noord | 0.6192178 | 112.995634 |
Science Park Zuid | 0.4399238 | 84.025446 |
Slotermeer Zuid | 1.2621779 | 82.001443 |
Sloterweg e.o. | 0.6498752 | 227.456317 |
Spaarndammerbuurt Midden | 0.3160943 | 33.339270 |
Spaarndammerbuurt Noordoost | 0.3279943 | 37.079530 |
Spaarndammerbuurt Noordwest | 1.0297800 | 115.245887 |
Spaarndammerbuurt Zuidoost | 0.3871032 | 79.184414 |
Spaarndammerbuurt Zuidwest | 0.3195916 | 49.860548 |
Spiegelbuurt | 0.2792729 | 55.409908 |
Sporenburg | 0.4414433 | 53.534428 |
Spuistraat Noord | 0.4640802 | 61.553048 |
Spuistraat Zuid | 0.7098804 | 131.061108 |
Staalmanbuurt | 0.4505089 | 90.206901 |
Staatsliedenbuurt Noordoost | 0.7669482 | 67.225401 |
Steigereiland Noord | 0.2776520 | 40.480090 |
Steigereiland Zuid | 0.6908766 | 61.601743 |
Surinamepleinbuurt | 0.7000518 | 88.760202 |
Swammerdambuurt | 0.2685603 | 37.035588 |
Terrasdorp | 0.5857807 | 99.179897 |
Transvaalbuurt Oost | 0.5223970 | 63.690756 |
Transvaalbuurt West | 0.3684852 | 47.644309 |
Trompbuurt | 0.3605183 | 56.770372 |
Tuindorp Frankendael | 0.7056819 | 41.040293 |
Tuindorp Nieuwendam Oost | 0.3927176 | 86.976683 |
Tuindorp Nieuwendam West | 0.3615555 | 55.181241 |
Tuindorp Oostzaan Oost | 0.4014273 | 80.419388 |
Tuindorp Oostzaan West | 1.0461332 | 146.458653 |
Twiske West | 0.5448704 | 58.659654 |
Uilenburg | 0.4159883 | 114.104819 |
Utrechtsebuurt Zuid | 0.4662460 | 79.835607 |
Valeriusbuurt Oost | 0.5591964 | 111.103936 |
Valeriusbuurt West | 0.6877767 | 87.974913 |
Valkenburg | 0.3669002 | 59.565454 |
Van Brakelkwartier | 0.3390716 | 46.385607 |
Van der Helstpleinbuurt | 0.5036071 | 79.075316 |
Van der Pekbuurt | 0.3776711 | 68.033958 |
Van Loonbuurt | 0.6861531 | 91.968889 |
Van Tuyllbuurt | 0.6991448 | 71.507267 |
Velserpolder West | 0.8826984 | 85.985805 |
Veluwebuurt | 0.8286163 | 43.745202 |
Venserpolder Oost | 0.6134227 | 179.403041 |
Vliegenbos | 0.5818375 | 182.981681 |
Vogelbuurt Noord | 0.3149512 | 61.686386 |
Vogelbuurt Zuid | 0.5934294 | 96.220170 |
Vondelpark West | 0.3750202 | 73.658552 |
Vondelparkbuurt Midden | 0.4511767 | 78.040057 |
Vondelparkbuurt Oost | 0.3997721 | 60.698904 |
Vondelparkbuurt West | 0.4052725 | 62.194751 |
VU-kwartier | 0.2805235 | 40.440631 |
Walvisbuurt | 0.2959902 | 17.759410 |
Waterloopleinbuurt | 0.3216620 | 48.344516 |
Weesperbuurt | 0.4317617 | 73.956656 |
Weespertrekvaart | 0.3828840 | 114.087231 |
Weesperzijde Midden/Zuid | 0.4406814 | 63.144185 |
Werengouw Midden | 0.3985734 | 45.940976 |
Werengouw Zuid | 1.1130987 | 100.951415 |
Westelijke eilanden | 0.6613661 | 86.072107 |
Westerdokseiland | 0.3847410 | 74.119086 |
Westergasfabriek | 0.4198981 | 62.299820 |
Westerstaatsman | 0.4444843 | 55.513248 |
Westlandgrachtbuurt | 0.3919547 | 52.376319 |
Weteringbuurt | 0.5481243 | 91.275984 |
WG-terrein | 0.5633897 | 84.782358 |
Wielingenbuurt | 0.3816397 | 62.559760 |
Wildeman | 2.3677661 | 148.591874 |
Willemsparkbuurt Noord | 0.4455301 | 96.628472 |
Willibrordusbuurt | 0.3994362 | 52.051667 |
Wittenburg | 0.4384509 | 47.696502 |
Woon- en Groengebied Sloterdijk | 0.6242589 | 59.292091 |
Zaagpoortbuurt | 0.4231808 | 85.207364 |
Zeeburgereiland Zuidwest | 0.3232985 | 29.116638 |
Zeeheldenbuurt | 0.3617490 | 69.435192 |
Zuidas Zuid | 1.2330960 | 123.472047 |
Zuiderkerkbuurt | 0.5601882 | 121.844265 |
Zuidoostkwadrant Indische buurt | 0.2654702 | 51.187997 |
Zuidwestkwadrant Indische buurt | 0.4932241 | 51.022590 |
Zuidwestkwadrant Osdorp Noord | 0.2924177 | 58.483548 |
Zuidwestkwadrant Osdorp Zuid | 0.4055739 | 44.095377 |
Zunderdorp | 0.2000008 | 26.373806 |
ams.test.prediction[[4]]%>%
group_by(Buurt) %>%
summarize(mean.MAPE = mean(price.AbsError/each_month_price, na.rm = T),
mean.MAE = mean(price.AbsError, na.rm = T)) %>%
ungroup() %>%
left_join(neighbor2,by = "Buurt") %>%
st_sf() %>%
ggplot() +
geom_sf(aes(fill = mean.MAPE),colour = 'transparent') +
scale_fill_gradient(low = palette5[1], high = palette5[5],
name = "MAPE") +
labs(title = "Mean test set MAPE by Buurt",
subtitle = "April, 2018") +
mapTheme
The figure and table above show that our model generalizes reasonably well across space, but its accuracy is weaker than its generalizability. This is probably because we include many spatial features but miss some key factors, such as time effects, due to a lack of data.
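As a minimal sketch of how per-neighborhood error metrics of this kind are formed, the toy example below uses invented numbers and base R's `aggregate()` rather than the dplyr pipeline of the report. The data frame `toy` and its columns `observed` and `predicted` are hypothetical names used only for illustration.

```r
# Toy test-set predictions; all values are invented for illustration.
toy <- data.frame(
  Buurt     = c("A", "A", "B", "B"),
  observed  = c(100, 200, 50, 150),
  predicted = c(110, 180, 60, 120)
)

# Absolute error and absolute percentage error per listing
toy$abs_error <- abs(toy$observed - toy$predicted)
toy$ape       <- toy$abs_error / toy$observed

# Average both metrics within each Buurt
by_buurt <- aggregate(cbind(abs_error, ape) ~ Buurt, data = toy, FUN = mean)
names(by_buurt) <- c("Buurt", "mean.MAE", "mean.MAPE")
print(by_buurt)
```

In this toy case the two neighborhoods have similar mean absolute errors (15 and 20) but mean MAPEs of 0.1 and 0.2, because the same absolute error weighs more on a cheaper listing.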
occupancy3 <- merge(occupancy3, listing.sf.neighbor2[c("id", "Buurt")], by = "id")
set.seed(5164)
month.var <- c(1:12)
Occupancy.monthList <- list()
ams.train <- list()
ams.test <- list()
ams.test.prediction <- list()
ams.test.table <- list()
Jan_occupancy <- st_drop_geometry(price_panel_lag)%>%
filter(month == 1)%>%
mutate(bathrooms = as.numeric(bathrooms),
bedrooms = as.numeric(bedrooms))
inTrain <- createDataPartition(
y = paste(Jan_occupancy$pool,Jan_occupancy$Buurt,Jan_occupancy$property_type,
Jan_occupancy$host_is_superhost),
p = .60, list = FALSE)
for (i in month.var){
Occupancy.monthList[[i]] <-
st_drop_geometry(price_panel_lag) %>%
mutate(bathrooms = as.numeric(bathrooms),
bedrooms = as.numeric(bedrooms) )%>%
filter(month == i)
ams.train[[i]] <- Occupancy.monthList[[i]][inTrain,]
ams.test[[i]] <- Occupancy.monthList[[i]][-inTrain,]
reg.occupancy <- lm(monthly_occupancy ~ .,
data = ams.train[[i]] %>%
dplyr::select(monthly_occupancy, beds, bedrooms, bathrooms, accommodates,
pool, parking, kitchen, AC, fireplace,
Buurt,host_is_superhost,
room_type,property_type,bed_type,
minimum_nights,dist.museum,dist.supermarkets,
Unesco,dist.metro,dist.plaza, dist.nightclub,
dist.beach, dist.parks,
name.bright, name.spacious,name.luxury,
amenities.number))
ams.test.prediction[[i]] <-
ams.test[[i]] %>%
mutate(occupancy.Predict = predict(reg.occupancy, ams.test[[i]]),
occupancy.AbsError = abs(monthly_occupancy - occupancy.Predict))
if(i ==1){
ams.test.table.all <- ams.test.prediction[[i]]
}else{
ams.test.table[[i]] <- ams.test.prediction[[i]]
ams.test.table.all <- rbind(ams.test.table.all,ams.test.table[[i]])
}
}
ams.test.occupancy.table <- ams.test.table.all %>%
dplyr::select(id, month, occupancy.Predict, monthly_occupancy, Buurt) %>%
mutate(AE = abs(monthly_occupancy-occupancy.Predict),
APE = abs(monthly_occupancy-occupancy.Predict)/monthly_occupancy)
ggplot(ams.test.occupancy.table, aes(x=APE)) +
labs(title = "APE Distribution",caption = "Figure XX. A histogram of APE") +
geom_histogram()+
plotTheme()
ggplot(ams.test.occupancy.table %>%
filter(APE<1.5),
aes(x=APE)) +
labs(title = "APE Distribution",caption = "Figure XX. A histogram of APE") +
geom_histogram()+
plotTheme()
The absolute percentage errors (APEs) of occupancy for the test set have a positively skewed distribution. Most APEs are close to 0.15, and fewer than 6% of the APEs are higher than 1.5. The APEs higher than 10 might be caused by outliers whose occupancy is usually low (e.g. 0 per month).
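To illustrate why low-occupancy listings can produce very large APEs, here is a small sketch with invented numbers. The vectors `observed` and `predicted` are hypothetical and are not taken from the Amsterdam data.

```r
# Invented monthly occupancy values (nights) and predictions
observed  <- c(20, 10, 0, 0, 2)
predicted <- c(23,  9, 4, 0, 30)

ape <- abs(observed - predicted) / observed
# A zero denominator is not an error in R: 4 / 0 gives Inf and 0 / 0 gives NaN
print(ape)

# Share of listings with APE above 1.5, ignoring undefined (NaN) values
share_above <- mean(ape > 1.5, na.rm = TRUE)
print(share_above)

# Keep only finite APEs before plotting or averaging
ape_finite <- ape[is.finite(ape)]
```

Under these assumptions, a listing observed at 2 nights and predicted at 30 already has an APE of 14, and listings with zero observed occupancy have no finite APE at all, so it may be worth deciding explicitly how such rows are handled before summarizing.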
ams.test.revenue.table <-
merge(ams.test.occupancy.table[c("id","month","occupancy.Predict","monthly_occupancy","Buurt")],
ams.test.price.table[c("id","month","price.Predict","each_month_price")],by=c("id","month")) %>%
mutate(revenue = monthly_occupancy * each_month_price,
predictRevenue = occupancy.Predict*price.Predict) %>%
group_by(id, Buurt)%>%
summarise(annualRevenue = sum(revenue),
predictAnnualRev = sum(predictRevenue))%>%
mutate(AE= abs(annualRevenue-predictAnnualRev),
APE = abs(annualRevenue-predictAnnualRev)/annualRevenue)
ggplot(ams.test.revenue.table,
aes(x=APE)) +
labs(title = "APE Distribution",caption = "Figure XX. A histogram of APE") +
geom_histogram()+
plotTheme()
ggplot(ams.test.revenue.table %>%
filter(APE<1.5),
aes(x=APE)) +
labs(title = "APE Distribution",caption = "Figure XX. A histogram of APE") +
geom_histogram()+
plotTheme()
Most annual revenues predicted by our first approach have an APE close to 0.15, which is a sign of accurate prediction. However, some predicted revenues still have an APE higher than 1, which may cause problems in our use case.
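The first approach multiplies monthly occupancy by monthly price and sums the products per listing. The sketch below reproduces that arithmetic on a three-month toy panel with invented values, using base R's `aggregate()` instead of `group_by()` and `summarise()`. The data frame `monthly` is hypothetical; the column names mirror those used in the report.

```r
# Invented three-month panel for two listings
monthly <- data.frame(
  id                = rep(c(1, 2), each = 3),
  month             = rep(1:3, times = 2),
  monthly_occupancy = c(10, 12, 8, 0, 5, 20),
  each_month_price  = c(100, 100, 120, 80, 80, 90),
  occupancy.Predict = c(9, 11, 10, 2, 6, 18),
  price.Predict     = c(110, 100, 110, 85, 80, 95)
)

# Observed and predicted revenue per listing-month
monthly$revenue        <- monthly$monthly_occupancy * monthly$each_month_price
monthly$predictRevenue <- monthly$occupancy.Predict * monthly$price.Predict

# Sum over months to get one row per listing, then compute the APE
annual <- aggregate(cbind(revenue, predictRevenue) ~ id, data = monthly, FUN = sum)
annual$APE <- abs(annual$revenue - annual$predictRevenue) / annual$revenue
print(annual)
```

In this toy panel, listing 1 is under-predicted in two months and over-predicted in the third, so the monthly errors partly cancel in the sum. This is one possible reason an aggregated APE can look better than the monthly errors behind it.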
ams.test.revenue.table%>%
filter(APE<1.5)%>%
group_by(Buurt) %>%
summarize(mean.APE = mean(APE, na.rm = T)) %>%
ungroup() %>%
left_join(neighbor2,by = "Buurt") %>%
st_sf() %>%
ggplot() +
geom_sf(aes(fill = mean.APE),colour = 'transparent') +
scale_fill_gradient(low = palette5[1], high = palette5[5],
name = "MAPE") +
labs(title = "Mean test set MAPE by Buurt",
subtitle = "2019") +
mapTheme
High prediction MAPE is concentrated on the outskirts of Amsterdam. Far from the city center, listings on the outskirts are seldom occupied by renters, since population density is usually lower there. The following analysis also supports our speculation, indicating that the listings with high MAPEs are mainly vacant throughout the year and thus have little revenue.
revenue_panel <- merge(revenue_panel, listing.sf.neighbor2[c("id", "Buurt")], by = "id")
revenue_panel%>%
group_by(Buurt) %>%
summarize(occupancy = mean(monthly_occupancy, na.rm = T)) %>%
ungroup() %>%
left_join(neighbor2,by = "Buurt") %>%
st_sf() %>%
ggplot() +
geom_sf(aes(fill = occupancy),colour = 'transparent') +
scale_fill_gradient(low = palette5[1], high = palette5[5],
name = "occupancy") +
labs(title = "Occupancy by Buurt",
subtitle = "2019") +
mapTheme
Those areas with low occupancy are almost the same as those with high MAPE.
annualrevenue <- revenue_panel %>%
group_by(id) %>%
summarise(annual_revenue = sum(revenue))
annualrevenue<- left_join(details.sf, annualrevenue,by="id")%>%
filter(!id %in% no_price)
annualrevenue<- annualrevenue %>%
drop_na(annual_revenue)
coords <- st_coordinates(annualrevenue)
neighborList <- knn2nb(knearneigh(coords, 5))
spatialWeights <- nb2listw(neighborList, style="W")
annualrevenue$lagRevenue <- lag.listw(spatialWeights, annualrevenue$annual_revenue)
ggplot(annualrevenue)+
geom_point(aes(x = lagRevenue, y = annual_revenue), alpha = 0.26)+
geom_smooth(aes(x = lagRevenue, y =annual_revenue), method = "lm", se= FALSE, color = "orange")+
labs(title="Revenue as a function of lagRevenue",
caption = "Figure xx. Scatterplots of revenue and lagRevenue")+
plotTheme()
The figure above shows that annual revenue is correlated with lag annual revenue, but the correlation is not strong. Some listings with high annual revenues are surrounded by houses with much lower annual revenue. For these listings, lag annual revenue might be a misleading predictor. If we set these listings aside, most listings’ annual revenues are similar to those of nearby listings.
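To make the spatial lag concrete, the following sketch computes a row-standardized k-nearest-neighbor lag by hand in base R, which is the quantity that `lag.listw()` returns for weights built with `style = "W"`. The coordinates, revenues, and the choice `k = 2` are invented for illustration; the report itself uses 5 neighbors.

```r
# Hypothetical coordinates and annual revenues for six listings
coords  <- cbind(x = c(0, 1, 0, 1, 10, 11),
                 y = c(0, 0, 1, 1, 10, 10))
revenue <- c(100, 120, 110, 500, 40, 60)
k <- 2

# Pairwise Euclidean distances; a listing is not its own neighbor
d <- as.matrix(dist(coords))
diag(d) <- Inf

# Spatial lag: mean revenue of the k nearest listings
lag_revenue <- sapply(seq_len(nrow(d)), function(i) {
  nearest <- order(d[i, ])[seq_len(k)]
  mean(revenue[nearest])
})

print(data.frame(revenue, lag_revenue))
```

The fourth toy listing earns 500 while its two nearest neighbors average 115, which is the kind of case where the lag would understate a listing's own revenue.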
annualrevenue <- merge(annualrevenue,listing.sf.neighbor2[c("id", "Buurt")], by = "id")
#Split training and test set
set.seed(31497)
inTrain <- caret::createDataPartition(
y = st_drop_geometry(annualrevenue)$annual_revenue,
p = .6, list = FALSE)
annualrevenue.training <- st_drop_geometry(annualrevenue)[inTrain,]
annualrevenue.test <- st_drop_geometry(annualrevenue)[-inTrain,]
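# NOTE: this fits on the full data set (training and test rows alike), so the test-set error below may be optimistic
reg.annualrevenue <- lm(annual_revenue ~ ., data = st_drop_geometry(annualrevenue) %>%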
dplyr::select(annual_revenue,beds, bedrooms, bathrooms, accommodates,
pool, parking, kitchen, AC, fireplace,
Buurt,host_is_superhost,
room_type,property_type,bed_type,
minimum_nights,dist.museum,dist.supermarkets,
Unesco,dist.metro,dist.plaza, dist.nightclub,
dist.beach, dist.parks,
name.bright, name.spacious,name.luxury,
amenities.number,lagRevenue)
)
annualrev_predict_test <- annualrevenue.test %>%
mutate(Prediction = predict(reg.annualrevenue, newdata = annualrevenue.test)) %>%
mutate(Prediction = ifelse(Prediction > 0, Prediction, mean(annualrevenue.training$annual_revenue)))%>%
filter(annual_revenue!=0)%>%
drop_na(Prediction)%>%
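# NOTE: APE here is divided by the prediction, whereas the earlier sections divide by the observed value
mutate(AE = abs(Prediction-annual_revenue),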
APE = AE/Prediction)
test_result <- data.frame(MAE = c(mean(annualrev_predict_test$AE, na.rm=T)),
MAPE = c(scales::percent(mean(annualrev_predict_test$APE, na.rm=T))))
test_result %>%
kable(caption = "Figure 8. Mean absolute error and MAPE for a single test set")%>%
kable_styling("striped", full_width = F)
MAE | MAPE |
---|---|
25161.32 | 49% |
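For comparison, here is a minimal sketch of a hold-out evaluation in which the model is fit on the training rows only and the APE is taken relative to the observed value. It runs on simulated data with a single hypothetical predictor, so its numbers say nothing about the Amsterdam results. It is one possible variant, not the procedure that produced the table above.

```r
set.seed(1)

# Simulated listings: revenue rises with capacity, plus noise (all invented)
n   <- 200
toy <- data.frame(accommodates = sample(1:6, n, replace = TRUE))
toy$annual_revenue <- 5000 + 4000 * toy$accommodates + rnorm(n, sd = 2000)

# 60/40 split by row index
in_train <- sample(seq_len(n), size = round(0.6 * n))
train <- toy[in_train, ]
test  <- toy[-in_train, ]

# Fit on the training rows only, then predict the held-out rows
fit <- lm(annual_revenue ~ accommodates, data = train)
test$Prediction <- predict(fit, newdata = test)

# Errors relative to the observed revenue
test$AE  <- abs(test$Prediction - test$annual_revenue)
test$APE <- test$AE / test$annual_revenue

test_result <- c(MAE = mean(test$AE), MAPE = mean(test$APE))
print(test_result)
```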
stargazer(reg.annualrevenue,
type = "text",
title ="Regression Output",
single.row = TRUE,
out.header = TRUE)
##
## Regression Output
## =======================================================================
## Dependent variable:
## ---------------------------
## annual_revenue
## -----------------------------------------------------------------------
## beds -2,671.288*** (641.807)
## bedrooms0 7,342.858 (21,428.930)
## bedrooms1 9,963.485 (21,362.120)
## bedrooms10 72,715.150* (38,608.570)
## bedrooms11 49,066.300 (63,071.530)
## bedrooms12 15,007.930 (50,473.470)
## bedrooms2 16,064.770 (21,396.530)
## bedrooms3 20,739.420 (21,467.850)
## bedrooms4 38,146.470* (21,657.710)
## bedrooms5 36,992.390 (22,902.180)
## bedrooms6 57,290.620** (26,905.420)
## bedrooms7 15,018.830 (39,087.540)
## bedrooms8 20,379.390 (36,037.210)
## bedrooms9 66,234.090 (62,880.770)
## bathrooms0.0 3,294.723 (22,069.200)
## bathrooms0.5 6,052.833 (19,028.410)
## bathrooms1.0 10,489.170 (17,916.220)
## bathrooms1.5 14,352.940 (17,955.390)
## bathrooms10.0 37,882.310 (60,834.130)
## bathrooms100.5 17,725.990 (59,498.610)
## bathrooms15.0 -7,965.049 (58,857.190)
## bathrooms2.0 17,233.890 (18,032.780)
## bathrooms2.5 15,986.710 (18,425.190)
## bathrooms3.0 16,752.270 (19,127.040)
## bathrooms3.5 48,464.670** (21,739.600)
## bathrooms4.0 14,764.660 (26,643.960)
## bathrooms4.5 70,751.860 (60,349.580)
## bathrooms5.0 -148.185 (59,469.590)
## bathrooms5.5 63,896.570 (59,879.500)
## bathrooms7.0 -13,590.590 (58,842.490)
## bathrooms8.0 21,592.720 (44,920.160)
## accommodates10 39,520.710** (16,124.190)
## accommodates11 -12,999.070 (33,845.480)
## accommodates12 1,466.921 (15,123.930)
## accommodates14 46,613.710 (29,998.690)
## accommodates16 106,032.300*** (18,462.190)
## accommodates17 100,911.400 (84,571.250)
## accommodates2 2,820.207 (2,952.888)
## accommodates3 3,662.260 (3,310.069)
## accommodates4 8,073.109** (3,271.736)
## accommodates5 13,417.040*** (4,527.325)
## accommodates6 17,891.620*** (4,547.433)
## accommodates7 9,468.115 (8,760.905)
## accommodates8 24,053.570*** (7,336.879)
## accommodates9 1,914.108 (26,627.430)
## poolPool -2,777.525 (6,509.817)
## parkingParking -2,874.765*** (1,070.198)
## kitchenNo kitchen -8,183.811*** (1,703.067)
## ACNo AC -261.767 (2,032.951)
## fireplaceNo Fireplace -1,195.410 (1,859.437)
## BuurtAalsmeerwegbuurt West -10,523.100 (7,804.907)
## BuurtAlexanderplein e.o. -24,882.970 (19,954.410)
## BuurtAmstel III deel A/B Noord -430.237 (62,233.020)
## BuurtAmstelglorie 11,343.320 (20,827.930)
## BuurtAmstelkwartier Noord -10,172.430 (12,588.370)
## BuurtAmstelkwartier West -14,576.170 (27,371.590)
## BuurtAmstelkwartier Zuid 1,601.040 (57,331.830)
## BuurtAmstelpark -5,149.907 (57,959.950)
## BuurtAmstelveldbuurt 1,681.415 (13,232.570)
## BuurtAmsterdamse Bos -77,124.290*** (19,345.770)
## BuurtAmsterdamse Poort -3,501.498 (26,931.050)
## BuurtAndreasterrein -23,813.090 (17,137.610)
## BuurtAnjeliersbuurt Noord -4,074.862 (15,723.160)
## BuurtAnjeliersbuurt Zuid -3,114.607 (15,017.990)
## BuurtArchitectenbuurt -17,115.000 (13,777.920)
## BuurtBalboaplein e.o. -11,599.170 (10,767.450)
## BuurtBanne Noordoost -7,766.115 (24,985.290)
## BuurtBanne Noordwest 2,616.713 (26,395.470)
## BuurtBanne Zuidoost -6,753.600 (20,643.230)
## BuurtBanne Zuidwest -19,005.510 (22,106.820)
## BuurtBanpleinbuurt -25,224.880 (15,873.240)
## BuurtBedrijvencentrum Osdorp -23,106.150 (56,697.970)
## BuurtBedrijvencentrum Westerkwartier -21,517.960 (26,471.590)
## BuurtBedrijvengebied Cruquiusweg -8,081.281 (29,446.040)
## BuurtBedrijvengebied Veelaan -29,944.570 (56,501.610)
## BuurtBedrijvengebied Zeeburgerkade -50,772.130 (33,349.320)
## BuurtBedrijvenpark Lutkemeer -43,575.480 (57,533.970)
## BuurtBedrijventerrein Hamerstraat -15,853.680 (21,425.930)
## BuurtBedrijventerrein Landlust -8,897.556 (16,318.010)
## BuurtBedrijventerrein Schinkel -20,785.030* (12,552.680)
## BuurtBeethovenbuurt -22,603.440 (16,221.330)
## BuurtBegijnhofbuurt 225.913 (16,650.810)
## BuurtBelgiëplein e.o. -22,397.800 (26,444.780)
## BuurtBellamybuurt Noord -909.526 (10,198.140)
## BuurtBellamybuurt Zuid -2,249.562 (9,440.483)
## BuurtBertelmanpleinbuurt 1,137.986 (14,916.220)
## BuurtBetondorp -11,089.760 (16,450.140)
## BuurtBG-terrein e.o. 9,704.575 (15,087.640)
## BuurtBijlmermuseum Noord -10,763.380 (32,220.670)
## BuurtBijlmermuseum Zuid -43,173.190 (32,909.930)
## BuurtBijlmerpark Oost 66,042.360 (45,895.390)
## BuurtBlauwe Zand -12,311.680 (16,232.650)
## BuurtBloemenbuurt Noord 3,125.184 (17,826.210)
## BuurtBloemenbuurt Zuid -7,305.736 (17,238.430)
## BuurtBloemgrachtbuurt -4,484.730 (14,294.550)
## BuurtBorgerbuurt -924.264 (9,552.943)
## BuurtBorneo -15,317.980 (11,267.590)
## BuurtBosleeuw -14,456.500 (13,134.320)
## BuurtBretten Oost -831.983 (58,193.890)
## BuurtBuiksloterbreek -2,970.403 (26,008.490)
## BuurtBuiksloterdijk West -23,531.560 (29,879.120)
## BuurtBuiksloterham -13,796.630 (27,002.440)
## BuurtBuikslotermeer Noord -8,379.752 (24,018.040)
## BuurtBuikslotermeerplein -29,687.010 (21,334.120)
## BuurtBuitenveldert Midden Zuid -21,609.620 (14,184.760)
## BuurtBuitenveldert Oost Midden -17,388.410 (15,745.550)
## BuurtBuitenveldert West Midden 15,215.720 (27,008.260)
## BuurtBuitenveldert Zuidoost -12,245.850 (15,909.460)
## BuurtBuitenveldert Zuidwest -10,677.840 (14,122.150)
## BuurtBurgemeester Tellegenbuurt Oost -14,126.250 (10,373.570)
## BuurtBurgemeester Tellegenbuurt West -11,761.750 (11,131.620)
## BuurtBurgwallen Oost -7,124.753 (14,545.990)
## BuurtBuurt 10 -49,272.390* (27,028.510)
## BuurtBuurt 2 -3,208.065 (19,471.730)
## BuurtBuurt 3 -19,373.720 (15,908.220)
## BuurtBuurt 4 Oost -24,511.840 (20,953.600)
## BuurtBuurt 5 Noord -20,752.410 (26,412.340)
## BuurtBuurt 5 Zuid -32,229.330 (21,084.430)
## BuurtBuurt 6 -27,135.340 (31,046.490)
## BuurtBuurt 7 -34,281.930 (28,429.700)
## BuurtBuurt 8 -16,156.210 (21,958.550)
## BuurtBuurt 9 -16,964.860 (35,541.790)
## BuurtBuyskade e.o. -12,279.100 (12,600.530)
## BuurtCalandlaan/Lelylaan -33,808.420* (20,197.160)
## BuurtCentrumeiland -25,793.050 (59,284.920)
## BuurtCircus/Kermisbuurt -13,058.690 (40,757.170)
## BuurtCoenhaven/Mercuriushaven -34,416.310 (58,802.010)
## BuurtColumbusplein e.o. -2,952.314 (9,857.044)
## BuurtConcertgebouwbuurt -9,094.406 (11,075.090)
## BuurtCornelis Douwesterrein -20,026.060 (44,903.360)
## BuurtCornelis Schuytbuurt 3,470.345 (9,438.897)
## BuurtCornelis Troostbuurt -6,256.948 (9,171.211)
## BuurtCremerbuurt Oost -3,295.712 (8,494.607)
## BuurtCremerbuurt West -8,629.172 (7,667.895)
## BuurtCzaar Peterbuurt -7,497.579 (11,548.520)
## BuurtD-buurt 26,579.750 (28,482.170)
## BuurtDa Costabuurt Noord -7,799.582 (9,380.354)
## BuurtDa Costabuurt Zuid -6,725.658 (9,521.668)
## BuurtDapperbuurt Noord -10,995.240 (8,415.776)
## BuurtDapperbuurt Zuid -10,841.580 (8,662.347)
## BuurtDe Aker Oost -21,635.680* (11,364.590)
## BuurtDe Aker West -24,160.990 (17,594.350)
## BuurtDe Bongerd -13,595.790 (21,343.190)
## BuurtDe Eenhoorn -12,343.710 (14,937.750)
## BuurtDe Kleine Wereld -14,651.340 (25,346.550)
## BuurtDe Klenckebuurt -36,650.800 (29,979.850)
## BuurtDe Omval -3,912.905 (19,788.380)
## BuurtDe Punt -27,513.780* (15,488.890)
## BuurtDe Wester Quartier 1,400.198 (11,347.670)
## BuurtDe Wetbuurt -10,233.970 (14,688.540)
## BuurtDe Wittenbuurt Noord 832.239 (14,557.540)
## BuurtDe Wittenbuurt Zuid -10,504.270 (17,842.630)
## BuurtDelflandpleinbuurt Oost -9,969.239 (20,665.990)
## BuurtDelflandpleinbuurt West -25,056.080** (11,622.700)
## BuurtDen Texbuurt -10,499.230 (13,911.720)
## BuurtDiamantbuurt 71.128 (9,925.603)
## BuurtDiepenbrockbuurt -32,189.750 (26,567.750)
## BuurtDon Bosco -13,654.840 (10,456.700)
## BuurtDorp Driemond -2,926.733 (41,426.780)
## BuurtDorp Sloten -21,748.010 (17,181.300)
## BuurtDriehoekbuurt -12,461.040 (15,919.780)
## BuurtDuivelseiland -7,903.559 (11,109.180)
## BuurtDurgerdam -11,529.800 (17,536.860)
## BuurtE-buurt -19,935.350 (24,760.610)
## BuurtEcowijk -4,896.178 (17,397.800)
## BuurtEendrachtspark 4,198.607 (58,029.200)
## BuurtElandsgrachtbuurt -1,788.972 (13,551.110)
## BuurtElzenhagen Noord -5,226.040 (20,332.710)
## BuurtElzenhagen Zuid -68,376.370 (58,114.570)
## BuurtEmanuel van Meterenbuurt -8,969.874 (14,855.890)
## BuurtEntrepot-Noordwest -10,606.900 (14,329.540)
## BuurtErasmusparkbuurt Oost 14,338.910 (13,401.500)
## BuurtErasmusparkbuurt West -7,730.713 (11,829.840)
## BuurtF-buurt -6,800.002 (25,126.420)
## BuurtFannius Scholtenbuurt -13,728.050 (13,472.890)
## BuurtFelix Meritisbuurt 1,739.221 (14,049.860)
## BuurtFilips van Almondekwartier -1,907.804 (11,974.300)
## BuurtFlevopark -14,972.820 (29,404.030)
## BuurtFrankendael 31,401.110 (20,127.750)
## BuurtFrans Halsbuurt 1,259.693 (8,723.803)
## BuurtFrederik Hendrikbuurt Noord -6,999.611 (11,506.140)
## BuurtFrederik Hendrikbuurt Zuidoost -4,542.340 (10,580.100)
## BuurtFrederik Hendrikbuurt Zuidwest -1,375.316 (12,607.780)
## BuurtFrederikspleinbuurt 9,691.447 (13,715.510)
## BuurtG-buurt Noord -15,376.980 (31,338.850)
## BuurtG-buurt Oost -13,540.540 (23,414.710)
## BuurtG-buurt West -5,975.510 (23,326.650)
## BuurtGaasperdam Noord -2,979.015 (36,375.950)
## BuurtGaasperdam Zuid -1,464.797 (42,337.610)
## BuurtGaasperpark -20,787.690 (42,906.160)
## BuurtGaasperplas -20,788.890 (42,439.010)
## BuurtGein Noordoost -6,547.009 (35,386.800)
## BuurtGein Noordwest -21,895.860 (38,348.200)
## BuurtGein Zuidwest -1,459.335 (52,781.990)
## BuurtGein Zuioost -10,123.240 (40,767.200)
## BuurtGelderlandpleinbuurt -23,520.350* (12,925.620)
## BuurtGerard Doubuurt -10,741.760 (8,664.883)
## BuurtGeuzenhofbuurt -14,851.050 (11,191.250)
## BuurtGibraltarbuurt -9,107.259 (14,107.580)
## BuurtGouden Bocht -3,183.889 (17,371.140)
## BuurtGroenmarktkadebuurt -10,587.100 (15,947.560)
## BuurtGrunder/Koningshoef -5,688.314 (27,054.450)
## BuurtHaarlemmerbuurt Oost 42,647.680*** (16,267.920)
## BuurtHaarlemmerbuurt West -9,449.227 (16,404.040)
## BuurtHakfort/Huigenbos -2,052.775 (34,819.390)
## BuurtHarmoniehofbuurt -4,092.068 (19,493.450)
## BuurtHaveneiland Noord -621.714 (22,195.310)
## BuurtHaveneiland Noordoost -21,969.150 (17,468.030)
## BuurtHaveneiland Noordwest -19,417.160 (17,406.510)
## BuurtHaveneiland Oost -17,617.610 (19,991.670)
## BuurtHaveneiland Zuidwest/Rieteiland West -20,231.860 (17,218.040)
## BuurtHelmersbuurt Oost -7,942.993 (8,654.582)
## BuurtHemelrijk 390.203 (16,217.710)
## BuurtHemonybuurt 7.559 (8,072.055)
## BuurtHercules Seghersbuurt -5,391.654 (9,526.823)
## BuurtHet Funen -14,477.950 (13,663.260)
## BuurtHiltonbuurt -28,619.390 (28,919.610)
## BuurtHolendrecht Oost 190.590 (36,637.270)
## BuurtHolendrecht West 1,783.520 (63,424.860)
## BuurtHolysloot -9,220.498 (32,644.120)
## BuurtHondecoeterbuurt -5,980.578 (11,199.100)
## BuurtHoptille 24,553.480 (32,807.060)
## BuurtHouthavens Oost -17,230.820 (19,512.820)
## BuurtHouthavens West -19,223.150 (19,644.550)
## BuurtIJplein e.o. -15,082.070 (13,835.130)
## BuurtIJsbaanpad e.o. -3,403.390 (15,019.200)
## BuurtIJselbuurt Oost -15,018.140 (10,658.060)
## BuurtIJselbuurt West -10,117.890 (11,786.230)
## BuurtJacob Geelbuurt -11,395.160 (26,524.250)
## BuurtJacques Veldmanbuurt -7,889.456 (10,611.590)
## BuurtJan Maijenbuurt -5,242.083 (11,529.740)
## BuurtJava-eiland -20,424.690 (15,088.780)
## BuurtJohan Jongkindbuurt -6,640.001 (26,438.160)
## BuurtJohannnes Vermeerbuurt -7,188.490 (10,547.800)
## BuurtJohn Franklinbuurt -5,113.546 (11,945.290)
## BuurtJulianapark -18,605.740 (24,432.080)
## BuurtK-buurt Midden 44,518.170 (45,769.050)
## BuurtK-buurt Zuidoost -21,640.130 (31,659.530)
## BuurtK-buurt Zuidwest -29,823.320 (61,104.970)
## BuurtKadijken -8,771.264 (14,519.660)
## BuurtKadoelen -2,071.137 (24,521.440)
## BuurtKalverdriehoek -14,518.300 (15,813.240)
## BuurtKantershof -16,542.300 (28,929.380)
## BuurtKattenburg -16,012.720 (16,375.980)
## BuurtKazernebuurt -4,025.522 (18,444.650)
## BuurtKelbergen -18,170.610 (40,227.490)
## BuurtKNSM-eiland -20,531.010 (12,703.520)
## BuurtKolenkitbuurt Noord -2,184.970 (17,600.200)
## BuurtKolenkitbuurt Zuid -16,275.300 (14,718.360)
## BuurtKoningin Wilhelminaplein -8,858.745 (12,762.340)
## BuurtKop Zeedijk 1,422.760 (15,989.520)
## BuurtKop Zuidas -20,550.920 (21,501.400)
## BuurtKortenaerkwartier -3,236.467 (10,942.110)
## BuurtKortvoort -879.992 (35,219.440)
## BuurtKromme Mijdrechtbuurt -11,023.870 (11,643.800)
## BuurtL-buurt -14,341.890 (28,805.100)
## BuurtLaan van Spartaan -17,454.220 (15,192.190)
## BuurtLandelijk gebied Driemond 27,276.040 (43,210.990)
## BuurtLandlust Noord -6,630.044 (13,904.980)
## BuurtLandlust Zuid -4,415.774 (11,877.330)
## BuurtLangestraat e.o. -3,282.268 (15,604.520)
## BuurtLastage 2,393.746 (15,419.450)
## BuurtLegmeerpleinbuurt 50,891.130*** (9,804.688)
## BuurtLeidsebuurt Noordoost -9,783.578 (13,290.460)
## BuurtLeidsebuurt Noordwest -19,168.780 (15,379.050)
## BuurtLeidsebuurt Zuidoost -8,371.650 (15,052.400)
## BuurtLeidsebuurt Zuidwest -18,956.850 (16,507.210)
## BuurtLeidsegracht Noord -1,574.186 (15,331.890)
## BuurtLeidsegracht Zuid -11,818.750 (15,154.090)
## BuurtLeliegracht e.o. -7,836.597 (14,949.540)
## BuurtLinnaeusparkbuurt -12,169.130 (10,568.980)
## BuurtLizzy Ansinghbuurt -11,882.950 (9,845.180)
## BuurtLoenermark -10,132.680 (24,582.830)
## BuurtLootsbuurt 35,308.470*** (9,759.124)
## BuurtLouis Crispijnbuurt -5,319.617 (17,783.320)
## BuurtLucas/Andreasziekenhuis e.o. 4,427.061 (23,867.170)
## BuurtMarathonbuurt Oost -6,973.200 (11,224.020)
## BuurtMarathonbuurt West -21,822.040** (9,374.770)
## BuurtMarcanti -11,195.690 (14,924.330)
## BuurtMarine-Etablissement -3,004.244 (17,214.840)
## BuurtMarjoleinterrein -159.116 (36,598.560)
## BuurtMarkengouw Midden -17,941.570 (20,297.970)
## BuurtMarkengouw Noord 18,851.390 (42,326.270)
## BuurtMarkengouw Zuid -69,229.180 (57,705.610)
## BuurtMarkthallen 2,254.054 (21,793.010)
## BuurtMarnixbuurt Midden 637.679 (17,663.240)
## BuurtMarnixbuurt Noord -3,767.462 (16,981.730)
## BuurtMarnixbuurt Zuid -17,059.970 (16,652.990)
## BuurtMedisch Centrum Slotervaart -18,261.710 (56,504.860)
## BuurtMeer en Oever -17,143.250 (18,775.970)
## BuurtMercatorpark -2,671.739 (22,215.420)
## BuurtMiddelveldsche Akerpolder -23,204.510 (24,277.450)
## BuurtMiddenmeer Noord -9,214.355 (10,949.920)
## BuurtMiddenmeer Zuid -20,081.270** (9,703.122)
## BuurtMinervabuurt Midden -3,504.232 (13,435.620)
## BuurtMinervabuurt Noord 3,887.702 (13,652.830)
## BuurtMinervabuurt Zuid -24,881.880* (12,821.910)
## BuurtMolenwijk -11,028.040 (41,181.700)
## BuurtMuseumplein -9,497.573 (22,389.170)
## BuurtNDSM terrein -7,426.501 (24,835.070)
## BuurtNes e.o. 2,951.705 (15,575.480)
## BuurtNieuw Sloten Noordoost -713.447 (22,717.700)
## BuurtNieuw Sloten Noordwest -9,167.478 (15,097.410)
## BuurtNieuw Sloten Zuidoost -22,787.170 (22,779.590)
## BuurtNieuw Sloten Zuidwest -13,631.110 (19,864.300)
## BuurtNieuwe Diep/Diemerpark -27,212.900 (26,515.340)
## BuurtNieuwe Kerk e.o. -3,790.079 (15,213.340)
## BuurtNieuwe Meer -20,139.540 (57,423.030)
## BuurtNieuwe Oosterbegraafplaats -44,411.240 (40,348.720)
## BuurtNieuwendammerdijk Oost -20,275.170 (20,133.280)
## BuurtNieuwendammerdijk Zuid -7,611.817 (27,907.100)
## BuurtNieuwendammmerdijk West -11,472.800 (15,983.480)
## BuurtNieuwendijk Noord 1,370.748 (17,438.240)
## BuurtNieuwmarkt 15,452.400 (15,049.820)
## BuurtNintemanterrein -9,596.249 (42,317.120)
## BuurtNoorder IJplas -118,894.100 (79,393.810)
## BuurtNoorderstrook Oost -30,367.050 (57,900.040)
## BuurtNoorderstrook West 17,748.550 (43,869.240)
## BuurtNoordoever Sloterplas -14,164.410 (15,565.260)
## BuurtNoordoostkwadrant Indische buurt -20,710.460** (8,977.755)
## BuurtNoordwestkwadrant Indische buurt Noord -13,200.650 (8,108.074)
## BuurtNoordwestkwadrant Indische buurt Zuid -8,457.326 (8,290.103)
## BuurtOlympisch Stadion e.o. -25,243.850 (15,884.360)
## BuurtOokmeer -31,512.250 (27,406.510)
## BuurtOostelijke Handelskade -20,738.130 (18,561.340)
## BuurtOostenburg -91.319 (11,388.220)
## BuurtOosterdokseiland 29,577.270 (24,009.020)
## BuurtOosterpark -26.412 (14,205.260)
## BuurtOosterparkbuurt Noordwest -5,003.169 (8,302.038)
## BuurtOosterparkbuurt Zuidoost -11,494.110 (8,488.890)
## BuurtOosterparkbuurt Zuidwest -2,799.048 (9,523.040)
## BuurtOostoever Sloterplas -12,305.400 (15,657.010)
## BuurtOostpoort -11,959.610 (10,549.310)
## BuurtOostzanerdijk -41,049.390 (30,594.570)
## BuurtOrteliusbuurt Midden -10,060.020 (11,376.470)
## BuurtOrteliusbuurt Noord -9,962.109 (12,475.650)
## BuurtOrteliusbuurt Zuid -4,687.338 (10,694.550)
## BuurtOsdorp Midden Noord -18,347.060 (21,721.090)
## BuurtOsdorp Midden Zuid -3,147.268 (21,444.700)
## BuurtOsdorp Zuidoost -21,676.350 (14,056.950)
## BuurtOsdorper Binnenpolder -45,039.420 (28,189.230)
## BuurtOsdorper Bovenpolder -29,794.400 (35,507.190)
## BuurtOsdorpplein e.o. -23,090.350 (18,664.760)
## BuurtOude Kerk e.o. 18,407.620 (15,306.030)
## BuurtOveramstel -25,971.760 (58,985.110)
## BuurtOverbraker Binnenpolder -29,042.730 (32,035.400)
## BuurtOverhoeks -14,629.690 (23,469.490)
## BuurtOvertoomse Veld Noord -11,258.580 (12,678.850)
## BuurtOvertoomse Veld Zuid -20,831.400 (12,668.760)
## BuurtP.C. Hooftbuurt -13,185.290 (12,812.790)
## BuurtPapaverweg e.o. -5,062.001 (17,738.150)
## BuurtParamariboplein e.o. -8,729.570 (8,514.198)
## BuurtPark de Meer -18,276.300 (17,931.650)
## BuurtPark Haagseweg -21,272.260 (40,875.390)
## BuurtParooldriehoek -14,954.730 (14,114.080)
## BuurtPasseerdersgrachtbuurt -7,261.779 (15,515.660)
## BuurtPieter van der Doesbuurt -7,542.986 (11,684.560)
## BuurtPlan van Gool -4,951.264 (18,981.630)
## BuurtPlanciusbuurt Noord -36,874.130 (23,965.410)
## BuurtPlanciusbuurt Zuid -22,306.930 (42,883.670)
## BuurtPlantage -8,667.484 (13,907.580)
## BuurtPostjeskade e.o. -8,606.942 (8,908.611)
## BuurtPrinses Irenebuurt -21,149.750 (14,979.390)
## BuurtRAI -17,120.780 (22,332.270)
## BuurtRansdorp -22,263.000 (27,201.400)
## BuurtRapenburg -15,534.420 (14,863.240)
## BuurtRechte H-buurt 4,435.680 (31,724.250)
## BuurtReguliersbuurt -12,688.720 (19,003.780)
## BuurtReigersbos Midden 8,493.168 (38,656.990)
## BuurtReigersbos Noord 7,511.086 (37,193.690)
## BuurtReigersbos Zuid -5,288.695 (45,431.380)
## BuurtRembrandtpark Noord -16,221.700 (14,608.340)
## BuurtRembrandtpark Zuid -6,137.339 (12,196.060)
## BuurtRembrandtpleinbuurt 5,741.425 (15,103.520)
## BuurtRI Oost terrein -1,699.868 (13,403.220)
## BuurtRieteiland Oost -34,610.600 (31,954.070)
## BuurtRietlanden -15,122.680 (13,246.880)
## BuurtRijnbuurt Midden -17,828.720 (12,202.000)
## BuurtRijnbuurt Oost -5,093.010 (11,445.190)
## BuurtRijnbuurt West -11,762.060 (16,077.490)
## BuurtRobert Scottbuurt Oost -9,702.014 (13,125.380)
## BuurtRobert Scottbuurt West -7,398.977 (12,843.880)
## BuurtRode Kruisbuurt -12,267.370 (31,189.830)
## BuurtSarphatiparkbuurt -1,488.914 (7,958.067)
## BuurtSarphatistrook -6,433.254 (12,729.460)
## BuurtScheepvaarthuisbuurt -8,545.844 (15,097.800)
## BuurtScheldebuurt Midden -17,424.510 (11,095.550)
## BuurtScheldebuurt Oost -4,894.653 (11,718.780)
## BuurtScheldebuurt West -20,817.630* (10,846.200)
## BuurtSchellingwoude Oost -22,986.360 (15,071.530)
## BuurtSchellingwoude West -21,875.230 (22,580.170)
## BuurtSchinkelbuurt Noord -12,570.720 (7,932.291)
## BuurtSchinkelbuurt Zuid -8,773.009 (10,551.000)
## BuurtSchipluidenbuurt -25,085.440 (29,160.650)
## BuurtScience Park Noord -14,025.700 (14,253.000)
## BuurtScience Park Zuid -8,987.078 (33,534.720)
## BuurtSlotermeer Zuid -8,959.439 (17,293.280)
## BuurtSloterpark -35,670.930 (25,954.150)
## BuurtSloterweg e.o. 15,353.880 (22,989.150)
## BuurtSpaarndammerbuurt Midden -16,651.530 (18,538.160)
## BuurtSpaarndammerbuurt Noordoost -17,295.380 (16,359.230)
## BuurtSpaarndammerbuurt Noordwest -22,250.110 (19,695.480)
## BuurtSpaarndammerbuurt Zuidoost -7,744.012 (16,580.520)
## BuurtSpaarndammerbuurt Zuidwest -12,021.750 (15,891.960)
## BuurtSpiegelbuurt -3,333.827 (13,711.910)
## BuurtSporenburg -21,145.350* (11,679.690)
## BuurtSportpark Middenmeer Noord -4,332.089 (33,349.470)
## BuurtSportpark Middenmeer Zuid -35,627.330 (33,230.850)
## BuurtSportpark Voorland 47,930.460 (56,671.490)
## BuurtSpuistraat Noord -3,873.748 (15,311.280)
## BuurtSpuistraat Zuid 13,044.390 (15,767.000)
## BuurtStaalmanbuurt -8,041.261 (11,277.820)
## BuurtStaatsliedenbuurt Noordoost -23,021.170 (15,112.890)
## BuurtStationsplein e.o. -45,898.790 (57,772.730)
## BuurtSteigereiland Noord -17,613.590 (15,449.510)
## BuurtSteigereiland Zuid -13,763.690 (13,162.780)
## BuurtSurinamepleinbuurt -14,333.230 (10,153.200)
## BuurtSwammerdambuurt -2,817.918 (8,710.319)
## BuurtTeleport -33,036.530 (42,831.450)
## BuurtTerrasdorp -6,277.521 (21,802.840)
## BuurtTransvaalbuurt Oost -2,033.633 (8,653.721)
## BuurtTransvaalbuurt West -12,214.380 (9,507.166)
## BuurtTrompbuurt -6,089.512 (10,929.240)
## BuurtTuindorp Amstelstation 8,716.206 (23,252.370)
## BuurtTuindorp Frankendael -30,133.200** (15,319.610)
## BuurtTuindorp Nieuwendam Oost -4,455.441 (15,999.460)
## BuurtTuindorp Nieuwendam West -4,193.887 (20,137.340)
## BuurtTuindorp Oostzaan Oost -4,388.822 (24,322.970)
## BuurtTuindorp Oostzaan West -28,878.280 (37,226.030)
## BuurtTwiske Oost 37,675.010 (46,980.680)
## BuurtTwiske West -17,394.130 (29,949.710)
## BuurtUilenburg 1,994.829 (15,172.440)
## BuurtUtrechtsebuurt Zuid -1,389.929 (14,016.330)
## BuurtValeriusbuurt Oost 323.068 (12,438.950)
## BuurtValeriusbuurt West -10,380.730 (10,338.540)
## BuurtValkenburg -9,811.787 (15,946.620)
## BuurtVan Brakelkwartier -9,692.878 (14,038.960)
## BuurtVan der Helstpleinbuurt -14,635.590* (8,301.621)
## BuurtVan der Kunbuurt 10,595.920 (24,628.600)
## BuurtVan der Pekbuurt -12,857.410 (14,785.060)
## BuurtVan Loonbuurt -16,207.540 (13,633.780)
## BuurtVan Tuyllbuurt -16,105.840* (9,069.247)
## BuurtVelserpolder West 14,467.650 (22,333.130)
## BuurtVeluwebuurt -24,144.810 (19,833.220)
## BuurtVenserpolder Oost 11,672.310 (20,431.000)
## BuurtVliegenbos 11,228.420 (16,842.880)
## BuurtVogelbuurt Noord -13,637.700 (17,361.950)
## BuurtVogelbuurt Zuid 3,918.417 (13,183.060)
## BuurtVogeltjeswei 20,321.210 (44,905.370)
## BuurtVondelpark Oost -12,525.380 (28,746.920)
## BuurtVondelpark West -7,704.670 (16,260.600)
## BuurtVondelparkbuurt Midden 3,184.597 (10,487.350)
## BuurtVondelparkbuurt Oost -7,768.348 (10,188.400)
## BuurtVondelparkbuurt West -3,428.592 (8,504.418)
## BuurtVU-kwartier -8,255.928 (26,606.350)
## BuurtWalvisbuurt -10,282.530 (32,162.300)
## BuurtWaterloopleinbuurt -995.185 (17,082.000)
## BuurtWeesperbuurt -10,143.520 (12,464.880)
## BuurtWeespertrekvaart -3,613.675 (18,122.740)
## BuurtWeesperzijde Midden/Zuid -11,510.980 (9,416.461)
## BuurtWerengouw Midden -15,261.060 (17,188.640)
## BuurtWerengouw Noord 44.493 (41,870.010)
## BuurtWerengouw Zuid -24,620.540 (19,997.120)
## BuurtWestelijke eilanden -7,478.560 (16,531.650)
## BuurtWesterdokseiland -13,114.810 (15,025.260)
## BuurtWestergasfabriek -1,686.973 (19,624.720)
## BuurtWesterstaatsman -6,785.331 (13,240.610)
## BuurtWestlandgrachtbuurt -14,308.750* (8,310.584)
## BuurtWeteringbuurt -6,079.378 (12,600.200)
## BuurtWG-terrein 2,112.995 (9,052.888)
## BuurtWielingenbuurt -17,555.950 (12,514.190)
## BuurtWildeman -26,263.310 (17,658.100)
## BuurtWillemsparkbuurt Noord 2,716.918 (10,455.200)
## BuurtWillibrordusbuurt -2,460.136 (8,715.516)
## BuurtWittenburg -10,007.930 (12,385.250)
## BuurtWoon- en Groengebied Sloterdijk -18,757.580 (25,720.690)
## BuurtZaagpoortbuurt -8,867.990 (16,795.760)
## BuurtZamenhofstraat e.o. -18,760.870 (57,239.360)
## BuurtZeeburgerdijk Oost -10,445.620 (40,747.330)
## BuurtZeeburgereiland Noordoost 14,361.180 (33,868.850)
## BuurtZeeburgereiland Noordwest -9,004.920 (29,499.000)
## BuurtZeeburgereiland Zuidoost 10,913.910 (56,943.330)
## BuurtZeeburgereiland Zuidwest -19,883.710 (20,474.900)
## BuurtZeeheldenbuurt -13,671.810 (15,397.250)
## BuurtZorgvlied -32,237.310 (34,529.890)
## BuurtZuidas Noord -15,548.150 (29,352.630)
## BuurtZuidas Zuid -5,044.906 (17,472.730)
## BuurtZuiderhof -23,179.270 (56,833.920)
## BuurtZuiderkerkbuurt -3,648.583 (14,898.720)
## BuurtZuidoostkwadrant Indische buurt -14,939.670 (10,191.320)
## BuurtZuidwestkwadrant Indische buurt -10,862.380 (10,233.330)
## BuurtZuidwestkwadrant Osdorp Noord -19,933.180 (20,170.920)
## BuurtZuidwestkwadrant Osdorp Zuid -18,923.320 (11,841.530)
## BuurtZunderdorp -6,302.578 (23,623.350)
## host_is_superhostf 15,894.890 (28,426.430)
## host_is_superhostt 12,785.280 (28,446.140)
## room_typePrivate room -11,802.700*** (1,265.647)
## room_typeShared room -4,815.139 (7,260.233)
## property_typeApartment 43,082.970*** (7,248.714)
## property_typeBarn 28,276.010 (29,530.720)
## property_typeBed and breakfast 33,624.280*** (7,791.096)
## property_typeBoat 50,957.830*** (7,908.296)
## property_typeBoutique hotel 32,061.260** (13,169.310)
## property_typeBungalow 29,639.260 (21,991.320)
## property_typeCabin 35,468.430** (17,122.620)
## property_typeCamper/RV 24,537.630 (43,112.450)
## property_typeCampsite 42,959.310 (42,505.690)
## property_typeCasa particular (Cuba) 48,560.900* (26,281.760)
## property_typeCastle 54,056.800 (56,916.680)
## property_typeChalet 24,922.210 (33,757.540)
## property_typeCondominium 44,622.510*** (7,902.195)
## property_typeCottage 32,902.680* (18,750.250)
## property_typeEarth house 19,470.900 (56,588.910)
## property_typeGuest suite 37,283.640*** (8,643.844)
## property_typeGuesthouse 45,351.830*** (11,713.960)
## property_typeHostel 35,166.730 (30,552.600)
## property_typeHotel 19,776.940 (24,296.300)
## property_typeHouse 42,527.720*** (7,454.560)
## property_typeHouseboat 46,622.500*** (8,233.094)
## property_typeLighthouse 9,220.159 (58,873.520)
## property_typeLoft 43,902.860*** (7,771.848)
## property_typeNature lodge 22,368.520 (60,601.140)
## property_typeOther 38,507.460*** (10,896.820)
## property_typeServiced apartment 18,652.390* (10,220.100)
## property_typeTent 59,151.250 (59,422.320)
## property_typeTiny house 50,278.150* (26,448.660)
## property_typeTownhouse 45,168.620*** (7,616.014)
## property_typeVilla 41,410.970*** (12,796.160)
## bed_typeCouch 2,500.072 (31,781.630)
## bed_typeFuton -852.284 (16,890.650)
## bed_typePull-out Sofa 109.502 (15,404.320)
## bed_typeReal Bed 3,544.437 (14,644.340)
## minimum_nights -98.868*** (32.544)
## dist.museum -0.187 (3.144)
## dist.supermarkets -3.132 (4.352)
## Unescowithin -1,981.182 (9,288.370)
## dist.metro -1.434 (4.236)
## dist.plaza -3.963 (3.786)
## dist.nightclub 4.178 (3.563)
## dist.beach -4.586 (3.441)
## dist.parks
## name.brightnot bright -1,759.521 (1,726.801)
## name.spaciousspacious 4,269.843*** (1,213.611)
## name.luxurynot luxury -3,080.513 (1,989.613)
## amenities.number -40.087 (52.228)
## lagRevenue -0.078*** (0.016)
## Constant 465,051.500 (7,856,334.000)
## -----------------------------------------------------------------------
## Observations 19,980
## R2 0.072
## Adjusted R2 0.046
## Residual Std. Error 55,927.570 (df = 19434)
## F Statistic 2.776*** (df = 545; 19434)
## =======================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
ggplot(annualrev_predict_test,
aes(x=APE)) +
labs(title = "APE Distribution",caption = "Figure XX. A histogram of APE") +
geom_histogram()+
plotTheme()
ggplot(annualrev_predict_test %>%
filter(APE<1.5),
aes(x=APE)) +
labs(title = "APE Distribution",caption = "Figure XX. A histogram of APE") +
geom_histogram()+
plotTheme()
annualrev_predict_test%>%
filter(APE<1.5)%>%
group_by(Buurt) %>%
summarize(mean.APE = mean(APE, na.rm = T)) %>%
ungroup() %>%
left_join(neighbor2,by = "Buurt") %>%
st_sf() %>%
ggplot() +
geom_sf(aes(fill = mean.APE),colour = 'transparent') +
scale_fill_gradient(low = palette5[1], high = palette5[5],
name = "MAPE") +
labs(title = "Mean test set MAPE by Buurt",
subtitle = "2019") +
mapTheme
Compared to approach 1, this approach generalizes better across space. Fewer neighborhoods have a high MAPE, and the high-MAPE areas that remain are scattered along the outskirts of the city.
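For reference, the APE and MAPE values mapped above follow the usual definitions. The sketch below shows the arithmetic on made-up numbers; the toy data frame and its `observed` and `predicted` columns are assumptions for illustration, not the report's actual objects.

```r
# Toy data: three hypothetical listings in two hypothetical neighborhoods
toy <- data.frame(
  Buurt     = c("A", "A", "B"),
  observed  = c(10000, 20000, 5000),
  predicted = c(12000, 15000, 5500)
)

# Absolute percentage error per listing: |predicted - observed| / observed
# (undefined when observed revenue is zero, so such rows need separate handling)
toy$APE <- abs(toy$predicted - toy$observed) / toy$observed

# Mean APE (MAPE) per neighborhood
mape_by_buurt <- aggregate(APE ~ Buurt, data = toy, FUN = mean)
print(mape_by_buurt)
```

Because the error is divided by the observed value, listings with very low observed revenue can produce very large APEs, which is one reason to inspect the APE distribution before mapping its mean.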
### 5.3 Cross Validation (for annual revenue prediction only)
We use k-fold cross-validation and compare each model against a baseline regression to see how much the added features improve it.
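To make the two reported metrics concrete, here is a minimal base-R sketch of k-fold cross-validation that scores a baseline and a fuller model on the same folds. The simulated data, the column names, and the choice of k = 5 are assumptions for illustration only; the report itself uses `caret::train` with 20 folds.

```r
set.seed(856)

# Simulated data standing in for the listing table (all values are made up)
n <- 200
toy <- data.frame(
  bedrooms = sample(1:4, n, replace = TRUE),
  dist     = runif(n, 0, 5000)
)
toy$revenue <- 8000 * toy$bedrooms - 2 * toy$dist + rnorm(n, sd = 5000)

# Assign each row to one of k folds at random
k <- 5
fold_id <- sample(rep(1:k, length.out = n))

# Fit on k-1 folds, predict the held-out fold, return its mean absolute error
fold_mae <- function(formula) {
  sapply(1:k, function(i) {
    fit  <- lm(formula, data = toy[fold_id != i, ])
    pred <- predict(fit, newdata = toy[fold_id == i, ])
    mean(abs(pred - toy$revenue[fold_id == i]))
  })
}

mae_base <- fold_mae(revenue ~ bedrooms)
mae_full <- fold_mae(revenue ~ bedrooms + dist)

# MAE is the mean across folds; MAESD is the standard deviation across folds
cv_summary <- data.frame(
  model = c("baseline", "full"),
  MAE   = c(mean(mae_base), mean(mae_full)),
  MAESD = c(sd(mae_base), sd(mae_full))
)
print(cv_summary)
```

Scoring both models on identical folds makes the per-fold MAEs directly comparable, so a difference in mean MAE can be judged against the fold-to-fold spread.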
#### 5.3.1 Normal CV
# Calculate annual revenue and join it back to details.sf
annualrevenue.raw <- revenue_panel %>%
group_by(id) %>%
summarise(annual_revenue = sum(revenue)) %>%
dplyr::select(id, annual_revenue)
annualrevenue <- left_join(details.sf,annualrevenue.raw,by = "id")
annualrevenue <- merge(annualrevenue, listing.sf.neighbor2[c("id", "Buurt")], by = "id")
annualrevenue$Buurt <- tidyr::replace_na(annualrevenue$Buurt, "NA")
# use caret package cross-validation method
fitControl <- trainControl(method = "cv",
number = 20,
# savePredictions differs from book
savePredictions = TRUE)
set.seed(856)
# for k-folds CV
#Run Regression using K fold CV
# annual revenue
reg.cv.revenue <-
train(annual_revenue ~ ., data = st_drop_geometry(annualrevenue) %>%
dplyr::select(annual_revenue,beds, bedrooms, bathrooms, accommodates,
pool, parking, kitchen, AC, fireplace,
Buurt,host_is_superhost,
room_type,property_type,bed_type,
minimum_nights,dist.museum,dist.supermarkets,
Unesco,dist.metro,dist.plaza, dist.nightclub,
dist.beach, dist.parks,
name.bright, name.spacious,name.luxury,
amenities.number
)%>%
na.omit(),
method = "lm",
trControl = fitControl,
na.action = na.pass)
revenue.cv.MAE <- reg.cv.revenue$results$MAE
revenue.cv.MAESD <- reg.cv.revenue$results$MAESD
revenue.cvtable <- matrix(ncol = 2, c(revenue.cv.MAE, revenue.cv.MAESD), byrow = F)
rownames(revenue.cvtable) <- "Value"
colnames(revenue.cvtable) <- c("MAE", "MAESD")
revenue.cvtable %>%
kable(caption = "Table of MAE & MAESD for k-fold cross-validation (annual revenue)") %>%
kable_styling("striped", full_width = F)
| | MAE | MAESD |
|---|---|---|
| Value | 25666.97 | 1565.045 |
reg.cv.revenue.base <-
train(annual_revenue ~ ., data = st_drop_geometry(annualrevenue) %>%
dplyr::select(annual_revenue,beds, bedrooms, bathrooms, accommodates)%>%
na.omit(),
method = "lm",
trControl = fitControl,
na.action = na.pass)
revenue.cv.MAE <- reg.cv.revenue.base$results$MAE
revenue.cv.MAESD <- reg.cv.revenue.base$results$MAESD
revenue.cvtable <- matrix(ncol = 2, c(revenue.cv.MAE, revenue.cv.MAESD), byrow = F)
rownames(revenue.cvtable) <- "Value"
colnames(revenue.cvtable) <- c("MAE", "MAESD")
revenue.cvtable %>%
kable(caption = "Table of MAE & MAESD for k-fold cross-validation (annual revenue. base)") %>%
kable_styling("striped", full_width = F)
| | MAE | MAESD |
|---|---|---|
| Value | 26091.55 | 1132.515 |
Adding the new features lowers the cross-validated MAE for annual revenue (from 26,091.55 to 25,666.97), but it also raises the MAESD (from 1,132.52 to 1,565.05), so the error varies more from fold to fold.
# price
reg.cv.price <-
train(price ~ ., data = st_drop_geometry(annualrevenue) %>%
dplyr::select(price,beds, bedrooms, bathrooms, accommodates,
pool, parking, kitchen, AC, fireplace,
Buurt,host_is_superhost,
room_type,property_type,bed_type,
minimum_nights,dist.museum,dist.supermarkets,
Unesco,dist.metro,dist.plaza, dist.nightclub,
dist.beach, dist.parks,
name.bright, name.spacious,name.luxury,
amenities.number
)%>%
na.omit(),
method = "lm",
trControl = fitControl,
na.action = na.pass)
price.cv.MAE <- reg.cv.price$results$MAE
price.cv.MAESD <- reg.cv.price$results$MAESD
price.cvtable <- matrix(ncol = 2, c(price.cv.MAE, price.cv.MAESD), byrow = F)
rownames(price.cvtable) <- "Value"
colnames(price.cvtable) <- c("MAE", "MAESD")
price.cvtable %>%
kable(caption = "Table of MAE & MAESD for k-fold cross-validation (price)") %>%
kable_styling("striped", full_width = F)
| | MAE | MAESD |
|---|---|---|
| Value | 43.45604 | 2.066383 |
reg.cv.price.base <-
train(price ~ ., data = st_drop_geometry(annualrevenue) %>%
dplyr::select(price,beds, bedrooms, bathrooms, accommodates),
method = "lm",
trControl = fitControl,
na.action = na.pass)
price.cv.MAE <- reg.cv.price.base$results$MAE
price.cv.MAESD <- reg.cv.price.base$results$MAESD
price.cvtable <- matrix(ncol = 2, c(price.cv.MAE, price.cv.MAESD), byrow = F)
rownames(price.cvtable) <- "Value"
colnames(price.cvtable) <- c("MAE", "MAESD")
price.cvtable %>%
kable(caption = "Table of MAE & MAESD for k-fold cross-validation (price.base)") %>%
kable_styling("striped", full_width = F)
| | MAE | MAESD |
|---|---|---|
| Value | 48.94492 | 1.541473 |
Adding the new features lowers the MAE when predicting prices (from 48.94 to 43.46), although the MAESD rises from 1.54 to 2.07.
reg.cv.revenue.resample <- reg.cv.revenue$resample
reg.cv.revenue.base.resample <- reg.cv.revenue.base$resample
reg.cv.price.resample <- reg.cv.price$resample
reg.cv.price.base.resample <- reg.cv.price.base$resample
var_list <- list()
var_list[[1]] <- ggplot(reg.cv.revenue.resample, aes(x=MAE)) + geom_histogram(color = "grey30", fill = "#4757A2", bins = 50) +
ylim(0,4)+xlim(22000,30000)+
labs(title="Histogram of Mean Absolute Error Across 20 Folds, Revenue") +
plotTheme()
var_list[[2]] <- ggplot(reg.cv.revenue.base.resample, aes(x=MAE)) + geom_histogram(color = "grey30", fill = "#4757A2", bins = 50) +
ylim(0,4)+xlim(22000,30000)+
labs(title="Histogram of Mean Absolute Error Across 20 Folds, Revenue Baseline") +
plotTheme()
var_list[[3]] <- ggplot(reg.cv.price.resample, aes(x=MAE)) + geom_histogram(color = "grey30", fill = "#4757A2", bins = 50) +
ylim(0,4)+xlim(37,56)+
labs(title="Histogram of Mean Absolute Error Across 20 Folds, Price") +
plotTheme()
var_list[[4]] <- ggplot(reg.cv.price.base.resample, aes(x=MAE)) + geom_histogram(color = "grey30", fill = "#4757A2", bins = 50) +
ylim(0,4)+xlim(37,56)+
labs(title="Histogram of Mean Absolute Error Across 20 Folds, Price Baseline") +
plotTheme()
do.call(grid.arrange,c(var_list, ncol = 2, top = "Histogram of MAEs"))
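The histograms above support these conclusions. The regressions with the new features perform better than the baselines, as their MAE distributions shift to the left, toward lower values. Consistent with the cross-validation tables, the gain is large for price and small relative to the fold-to-fold spread for annual revenue.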
The goal of the algorithm is to predict the direct economic income that a new Airbnb listing can bring back to the community. We also want to inform residents about how the occupancy rate changes over time, so that they know when visitors will be staying in the neighborhood. Generally speaking, our algorithm predicts revenue with an error of around 38%, which we consider acceptable. The errors occur mainly on the fringe of Amsterdam, where Airbnb density is lower and outliers concentrate. We do not predict the occupancy rate very well, however, probably because more subjective data such as ratings and comments are not included.
Overall, our algorithm is not yet accurate enough, and our predictions of price are better than those of occupancy and revenue. This is mainly because occupancy is hard to predict without previous data (a time lag). A common way to predict occupancy is to forecast next week's or next month's occupancy from the current period's occupancy and to repeat this for a year. That approach seems better than ours because occupancy is strongly related to time, not only space. However, it does not fit our use case, because we have to predict the annual revenue of a new listing: we can neither obtain its previous occupancy nor predict it month by month. To improve our algorithm while ensuring that it still works in our use case, we suggest trying the following approaches:
There are quite a few outliers in the data set. Some listings have extremely high prices in certain months with zero occupancy. We excluded some of these outliers but not all of them, and the remaining ones account for some extremely large errors. To further improve our model, we will try to find what these outliers have in common and remove them.
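One possible way to look for such outliers is sketched below: flag listing-months whose price lies above the conventional 1.5 × IQR upper fence while occupancy is zero. The toy panel, its column names, and the choice of the IQR rule are all assumptions for illustration; this is not the exclusion rule used in the report.

```r
# Toy listing-month panel (all values are made up)
panel <- data.frame(
  id        = c(1, 1, 2, 2, 3, 3, 4, 4),
  price     = c(120, 130, 90, 95, 150, 9000, 110, 105),
  occupancy = c(20, 18, 25, 22, 15, 0, 10, 12)
)

# Upper fence of the conventional 1.5 * IQR rule for price
q <- quantile(panel$price, c(0.25, 0.75), names = FALSE)
upper_fence <- q[2] + 1.5 * (q[2] - q[1])

# Flag listing-months that are both unusually expensive and never booked
panel$outlier <- panel$price > upper_fence & panel$occupancy == 0

# Inspect the flagged rows before deciding whether to drop them
print(panel[panel$outlier, ])
```

Inspecting the flagged rows first, rather than dropping them automatically, makes it easier to see what they have in common, for example whether hosts use a very high price to block a calendar.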
As we mentioned before, occupancy is more volatile than price and depends less on physical features. Price itself may also influence occupancy. Because we could not find crime and population data at a smaller geography, we did not test generalizability across different socio-economic contexts, which are also likely to influence price and occupancy.
During our research, we found that the relationship between some features and the dependent variable is not linear. We tried logarithmic and reciprocal transformations of the variables but did not make much progress. In the Airbnb prediction cases we reviewed, some other methods perform better than OLS, such as XGBoost and Random Forest. Using these methods might also improve our predictions.
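As a rough illustration of how such a comparison could be set up, the sketch below fits OLS and a random forest on the same simulated data and compares their hold-out MAE. It assumes the `ranger` package is installed (it is not used elsewhere in this report), and the data, column names, and tuning values are made up. Whether a random forest actually helps on the Airbnb data remains to be tested.

```r
library(ranger)  # assumed to be installed; not part of the original analysis

set.seed(856)

# Simulated data with a deliberately non-linear distance effect (made up)
n <- 500
toy <- data.frame(
  bedrooms = sample(1:4, n, replace = TRUE),
  dist     = runif(n, 0, 5000)
)
toy$revenue <- 8000 * toy$bedrooms + 20000 * exp(-toy$dist / 1000) +
  rnorm(n, sd = 3000)

# Simple hold-out split: 400 rows for training, 100 for testing
train_rows <- sample(n, 400)
train_set  <- toy[train_rows, ]
test_set   <- toy[-train_rows, ]

# OLS and random forest fitted on the same predictors
fit_ols <- lm(revenue ~ bedrooms + dist, data = train_set)
fit_rf  <- ranger(revenue ~ bedrooms + dist, data = train_set, num.trees = 300)

# Mean absolute error on the held-out rows
mae <- function(pred) mean(abs(pred - test_set$revenue))
mae_ols <- mae(predict(fit_ols, newdata = test_set))
mae_rf  <- mae(predict(fit_rf, data = test_set)$predictions)

print(c(OLS = mae_ols, RandomForest = mae_rf))
```

The same comparison could be run inside the cross-validation setup above by reusing the folds for both models, so that the MAE and MAESD remain comparable with the OLS results.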
amsterdam attractions | http://tour-pedia.org/about/datasets.html