A while back I left a comment on Tammy Powlas's great document: Forecasting Using Time Series Analysis – SAP Predictive Expert Analytics.

I thought I would use one spare hour to “drink my own champagne” and give it a try on the same example with Automated Analytics.

I collected the data from the exact same source, the Bureau of Transportation Statistics (http://www.rita.dot.gov/bts/acts), and downloaded US domestic passenger data by year/month from January 1996 to June 2015.

My goals:

  • see if I can get a reliable forecasting model.
  • see if I can predict the volume of passenger traffic up to December 2017.

I open SAP Predictive Analytics and jump to the Modeler/Time series section.

1st Step.png

First I load my very simple .csv file and specify the date format.

2nd Step.PNG

I need to tell the software that my date column gives the order of the data set (Order = 1 for the Month variable).

I can take a quick look at the data: the number of passenger enplanements by month.
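Outside the tool, the same two ideas (parse the date with an explicit format, then order the rows by it) can be sketched in a few lines of Python. This is only an illustration: the column names `Month` and `Passengers`, the `%Y-%m` date format, and the three sample rows are assumptions made up for the example, not the layout or content of the real file.

```python
import csv
import io
from datetime import datetime

# Made-up placeholder rows in an assumed layout (not real traffic figures).
SAMPLE = """Month,Passengers
1996-03,300
1996-01,100
1996-02,200
"""

def load_series(handle, date_format="%Y-%m"):
    """Read (month, passengers) rows and return them sorted by month."""
    rows = []
    for record in csv.DictReader(handle):
        month = datetime.strptime(record["Month"], date_format)
        rows.append((month, int(record["Passengers"])))
    # The date column defines the order of the data set.
    rows.sort(key=lambda row: row[0])
    return rows

series = load_series(io.StringIO(SAMPLE))
for month, passengers in series:
    print(month.strftime("%Y-%m"), passengers)
```

Stating the date format explicitly avoids ambiguous parses such as day/month versus month/day.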

3rd Step.PNG

I set my two variables:

  • Month variable is the Time
  • my Target variable is Passengers Enplanements.

4th Step.PNG

I will generate my model and ask for 30 forecasts to get the predictions until December 2017.
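The figure of 30 comes from counting the months between the last observation (June 2015) and the target (December 2017). A quick check in Python, where `months_ahead` is a helper name introduced only for this illustration:

```python
from datetime import date

def months_ahead(last_observed: date, target: date) -> int:
    """Number of monthly steps from the last observed month to the target month."""
    return (target.year - last_observed.year) * 12 + (target.month - last_observed.month)

horizon = months_ahead(date(2015, 6, 1), date(2017, 12, 1))
print(horizon)  # 30
```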

5th Step.PNG

The performance of the model is very good.

98% of the signal is explained and only 2% cannot be explained; we will see why later.

The model has a polynomial trend and cycles based on the month of the year.

As Tammy explained, there is more traffic in the summer months due to the summer vacations.
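To make the idea of "trend plus monthly cycle" concrete, here is a minimal sketch in plain Python. It is not the algorithm used by Automated Analytics: it assumes a straight-line trend (rather than a polynomial one) fitted by ordinary least squares, and it takes the cycle as the average leftover per calendar month. The function names are hypothetical.

```python
from statistics import mean

def month_of(index, first_month=1):
    """Calendar month (1-12) of the observation at a given position."""
    return (first_month - 1 + index) % 12 + 1

def fit_trend_and_cycle(values, first_month=1):
    """Fit a linear trend, then average the residuals per calendar month."""
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    residuals = {}
    for x, y in zip(xs, values):
        residuals.setdefault(month_of(x, first_month), []).append(
            y - (intercept + slope * x))
    cycle = {month: mean(r) for month, r in residuals.items()}
    return intercept, slope, cycle

def forecast(model, n_observed, steps, first_month=1):
    """Extend the trend and add back the monthly cycle for future steps."""
    intercept, slope, cycle = model
    return [intercept + slope * x + cycle.get(month_of(x, first_month), 0.0)
            for x in range(n_observed, n_observed + steps)]
```

With a monthly series starting in January 1996, a call such as `forecast(fit_trend_and_cycle(values), len(values), 30)` would return 30 values, one per future month.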

Model Performance.PNG

I can see the signal in the View Signal Components window. Real figures appear in green. The ones that are forecasted by my model appear in blue.

We see that we have an outlier here (red rectangle): the model does not really understand what happened in September 2001 (while we do understand and remember).

The rest of the signal is pretty well modeled, although we can visually spot some differences in the years 2004 to 2009, with real traffic being higher than the modeled values. Is this due to higher pre-crisis airline traffic?
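A point like September 2001 can also be flagged numerically by comparing actual and modeled values. The sketch below uses the modified z-score based on the median absolute deviation, a common general-purpose rule; it is an assumption for illustration and not how Automated Analytics detects outliers. The function name and the threshold of 3.5 are choices made for this example.

```python
from statistics import median

def flag_outliers(actual, modeled, threshold=3.5):
    """Return positions whose residual has a modified z-score above the threshold."""
    residuals = [a - m for a, m in zip(actual, modeled)]
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    if mad == 0:
        return []  # no spread in the residuals, nothing to compare against
    return [i for i, r in enumerate(residuals)
            if 0.6745 * abs(r - center) / mad > threshold]
```

Because it relies on medians, a single extreme month does not distort the yardstick used to judge the other months.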

7th Step.PNG

The chart also shows the uncertainty around the 2015 to 2017 forecasts.

8th Step.PNG

I personally prefer exporting to Excel and seeing the forecasted values there. The projected passenger traffic for December 2017 is 55.26 million; the upper bound of the forecast interval is 57.94 million and the lower bound is 52.58 million. July 2017 might be the peak, with close to 63 million passengers.

We could certainly further improve the accuracy and reliability of the forecast by taking into account more variables beyond the date alone. This is a core capability of the time series algorithm in Automated Analytics.
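As a quick sanity check on the December 2017 figures, the distance from the forecast to each bound and its size relative to the forecast can be computed directly. The variable names are only for illustration:

```python
# December 2017 figures from the export, in millions of passengers.
point, lower, upper = 55.26, 52.58, 57.94

above = upper - point   # distance to the upper bound
below = point - lower   # distance to the lower bound
relative = above / point

print(f"+{above:.2f} / -{below:.2f} million")  # +2.68 / -2.68 million
print(f"{relative:.1%} of the forecast")       # 4.8% of the forecast
```

The interval is symmetric around the forecast, at roughly plus or minus 2.68 million passengers.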

9th Step.PNG

Thanks for reading the document, I hope you enjoyed it!

Credits are due to Tammy Powlas for the original document idea.

I will be happy to share the data set with anyone interested in this example. You can also download it directly from the link at the beginning of this article.



      1. Erik MARCADE

        Chicken! You could compare the results of the two techniques 🙂 .

        I think that this post should be complemented by another one that includes extra predictor variables that can be handled automatically by Automated Analytics (as you rightly pointed out), such as the average gas price, why not airline traffic as you suggested, or any other US indicator we can find (number of cars sold, etc.).

        1. Antoine CHABERT Post author

          Thanks for the comment Erik!

          I’ll release an upgraded or complementary version when time allows; here I literally used just one spare hour 😉 .

          Economic or energy-related factors in particular sound like good leads to follow for adding extra predictor variables and further improving the accuracy of the forecasts.

