“Hotel-Pick-Up” Model in Forecasting Ticket Demand

Data Set

airline ticket booking data

  • training set : 90 flights over four weeks before the denaturing date.

  • validation set : 7 flights over four weeks before the denaturing date.

  • Data size: Small

  • Tuples(rows): Training – 5185 ; Validation – 203

  • Observations(records): 16570

  • Raw data format: csv

Programming Tool

  • Python (packages: Pandas, Numpy)

Graphing Tool

  • R (packages: ggplot2)


Will the "Hotel-Pick-Up" model improve the accuracy in forecasting airline booking demand?


The "Hotel-Pick-Up" model will improve the accuracy of forecasting the final demand by 33.4% against the "Naïve" model. The final MASE number is 0.6667693735856192.


Forecasted final ticket demand for a flight four weeks before the departure date

airline ticket demand


Forecast Model : Python (Functional Programming)

Functions’ Structure


import pandas as pd
import numpy as np

def airlineForecast(trainingDataFileName,validationDataFileName):
    # Function_1: [inpt2df]: convert the raw data to dataframe
    # (1)convert the csv file to pandas dataframe
    def inpt2df(fileName):
        df = pd.read_csv(fileName)
        return df

    # Function_2: [cal_day2go_weekday]: calculate the prior day and weekday
    # (1) convert the departure date and booking date to datatime 
    # (2) calculate the weekday of departure day, attach the result to the new column of "weekday"
    # (3) calculate the booking day prior to the departure day, attach the result to the new columns of "prior_day"
    def cal_day2go_weekday(df, col_departure_date = 'departure_date', col_booking_date = 'booking_date'):
        df[col_departure_date] = pd.to_datetime(df[col_departure_date])
        df[col_booking_date] = pd.to_datetime(df[col_booking_date])
        df['day_of_week'] = df[col_departure_date].dt.weekday_name
        df['prior_day'] = df[col_departure_date] - df[col_booking_date]
        df['prior_day'] = df['prior_day'].dt.days
        return df

    # Function_3: [final_ticket_sold]: calculate the final ticket sold for each departure flight
    # (1) subset the dataframe for the final booking number in the departure day 
    # (2) remain the departure day and final booking number
    # (3) rename the "cum_bookings" to the "final_demands"
    def final_ticket_sold(df, col_departure_date = 'departure_date', col_booking_date = 'booking_date'):
        df = df[df["prior_day"] == 0]
        df = df.drop([col_booking_date, "day_of_week", "prior_day"], axis = 1)
        df = df.rename(columns = {"cum_bookings":"final_demands"})
        return df

    # Funtion_4: [rm_departure_day]: remove the records of departure day
    # (1) remove the booking record departure day from the dataframe

    def rm_departure_day(df):
        df = df[df["prior_day"] != 0]
        return df

    # Funtion_5: [addi_model]: prceed the additive method in each record
    # (1) add a new column of "addi_param" with the results of final_demands minus the cum_bookings

    def addi_model(df):
        df["addi_param"] = df["final_demands"] - df["cum_bookings"]
        return df

    # Funtion_6: [multi_model]: prceed the multilplictive method in each record
    # (1) add a new column of "multi_param" with the results of final_demands minus the cum_bookings

    def multi_model(df):
        df["multi_param"] = df["cum_bookings"]/df["final_demands"]
        return df

    # Funtion_7: [final_model]: group the dataframe by "prior_day" and "day_of_week" calculate the final parameters for additive and multiplicative method
    # (1) drop all unnecessary columns
    # (2) group the dataframe by "prior_day" and "day_of_week", average the "addi_param" and "multi_param" by group
    # (3) return the dataframe only contain the necessary columns

    def final_model(df):
        df = df.drop(["departure_date", "booking_date" ,"cum_bookings", "final_demands"], axis = 1)
        df = df.groupby(["day_of_week" , "prior_day"]).mean()
        df = df.reset_index()
        return df

    # Funtion_8: [forcast_model]: feed in the validation dataframe and build model, add two colums of forcast based on the two methods

    def forecast_model(df, model):
         df = pd.merge(df, model, how = "left", left_on = ["day_of_week", "prior_day"], right_on = ["day_of_week", "prior_day"])
         df["addi_forecast"] = df["cum_bookings"] + df["addi_param"]
         df["multi_forecast"] = df["cum_bookings"]/df["multi_param"]
         return df
    # Funtion_9: [forecast_model_plus]: based on the week day of departure day, the forcase will use different method
    def forecast_model_plus(df, lyst4addi = ["Monday", "Tuesday", "Wednesday", "Thursday"], lyst4multi = ["Friday", "Saturday", "Sunday"]):
        df_addi = df[df["day_of_week"].isin(lyst4addi)]
        df_addi = df_addi.drop(["multi_forecast", "multi_param", "addi_param", "prior_day", "day_of_week", "naive_forecast", "final_demand", "cum_bookings", "booking_date", "departure_date"], axis=1)
        df_addi = df_addi.rename(columns = {"addi_forecast":"complex_forecast"})
        df_multi = df[df["day_of_week"].isin(lyst4multi)]
        df_multi = df_multi.drop(["addi_forecast", "multi_param", "addi_param", "prior_day", "day_of_week", "naive_forecast", "final_demand", "cum_bookings", "booking_date", "departure_date"], axis=1)
        df_multi = df_multi.rename(columns = {"multi_forecast":"complex_forecast"})
        new_df = df_addi.append(df_multi)
        df = df.merge(new_df, how = "left", left_index = True, right_index = True)
        return df

    # Function_10: [mase] feed in the validation model and indicate which result will use in the maze calculation
    def mase(df, forcastModel):
        benchmark = pd.Series(abs(df["final_demand"]-df["naive_forecast"]))
        current_model = pd.Series(abs(df["final_demand"]-df[forcastModel]))
        mase_number = current_model.sum()/benchmark.sum()
        return mase_number
    train_df = inpt2df(trainingDataFileName)
    valid_df = inpt2df(validationDataFileName)
    train_df = cal_day2go_weekday(train_df)
    valid_df = cal_day2go_weekday(valid_df)    
    final_demands = final_ticket_sold(train_df)
    train_df = train_df.merge(final_demands, how="left", left_on= "departure_date", right_on="departure_date")
    train_df = rm_departure_day(train_df)
    train_df = addi_model(train_df)
    train_df = multi_model(train_df)
    final_model = final_model(train_df)
    forecast_1 = forecast_model(valid_df,final_model)
    forecast_1 = forecast_model_plus(forecast_1)
    forecast_output = forecast_1[["departure_date","booking_date","complex_forecast"]]
    mase = mase(forecast_1,"complex_forecast") #complex_forecast, multi_forecast, addi_forecast
    # print("MASE:" + str(mase))
    # print(forecast_output)
    final_list = ["MASE: " + str(mase), forecast_output]
    return final_list

Diagram : R (ggplot2)

df <- read_csv("C:/Users/shuai/Desktop/forecast - Copy.csv")
df <- gather(df, "method" , "forecast" , -booking_date)

ggplot(data = df, aes(x = booking_date , y = forecast , colour = method , group = method)) + 
  geom_line() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Booking Date") + 
  ylab("Final Demand Forecast")


Fill in your details below or click an icon to log in:

WordPress.com 徽标

You are commenting using your WordPress.com account. Log Out /  更改 )

Google photo

You are commenting using your Google account. Log Out /  更改 )

Twitter picture

You are commenting using your Twitter account. Log Out /  更改 )

Facebook photo

You are commenting using your Facebook account. Log Out /  更改 )

Connecting to %s