Former Member

Data Geek Challenge: Analyzing flight delays with SAP Lumira

Update: after several comments and questions, I have created a revised version.

Please find it here and enjoy:

Data Geek Challenge: Analyzing flight delays with SAP Lumira (Revised)


According to the United States Bureau of Transportation Statistics, 643 million passengers travel on more than 9 million flights every year in the US alone. A significant portion of these passengers are impacted by delays and cancellations. For this SAP Data Geek challenge, we will be analyzing flight data from 2003 to 2013 (more than 172,000 records) with SAP Lumira.

Data Sources:

 

Importing data source

First, let’s import the data file into SAP Lumira / Visual Intelligence.

/wp-content/uploads/2013/10/screen_01_302277.png

Now, we need to handle the time dimension.

/wp-content/uploads/2013/10/screen_02_302278.png

/wp-content/uploads/2013/10/screen_03_302391.png

Then, we will convert some columns from Attributes to Measures and rename them.

/wp-content/uploads/2013/10/screen_04b_302399.png

Once this is done for all measures, we need to group them for easier analysis. This is done via a formula.

/wp-content/uploads/2013/10/screen_05_302400.png

Now we can start visualizing the data. First, we want to know whether the numbers of flights, delays and cancellations have changed over time.

/wp-content/uploads/2013/10/screen_06_302401.png

The percentage of delayed or canceled flights is hard to analyze this way. Let’s create a new formula to calculate these ratios.

/wp-content/uploads/2013/10/screen_07_302407.png

/wp-content/uploads/2013/10/screen_08_302408.png
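For readers who want to check these ratios outside of Lumira, here is a minimal Python sketch of the same idea: divide the delayed and canceled counts by the total number of flights for each period. The field names (`flights`, `delayed`, `cancelled`) and the sample numbers are assumptions made up for illustration; they are not taken from the dataset or from the Lumira formula shown in the screenshots.

```python
# Hypothetical yearly totals; field names and numbers are illustrative only.
yearly_totals = [
    {"year": 2003, "flights": 1000, "delayed": 180, "cancelled": 20},
    {"year": 2004, "flights": 1200, "delayed": 240, "cancelled": 12},
]


def add_ratios(rows):
    """Return copies of the rows with delayed/cancelled percentages added."""
    result = []
    for row in rows:
        flights = row["flights"]
        # Guard against division by zero for periods with no flights.
        pct_delayed = 100.0 * row["delayed"] / flights if flights else 0.0
        pct_cancelled = 100.0 * row["cancelled"] / flights if flights else 0.0
        result.append(
            {**row, "pct_delayed": pct_delayed, "pct_cancelled": pct_cancelled}
        )
    return result


for row in add_ratios(yearly_totals):
    print(row["year"], round(row["pct_delayed"], 1), round(row["pct_cancelled"], 1))
```

Working with percentages rather than raw counts makes years with different traffic volumes comparable.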

The result is clearer, but the trend is still hard to read. Let’s add a running average.

/wp-content/uploads/2013/10/screen_09b_302409.png

/wp-content/uploads/2013/10/screen_10_302410.png
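To illustrate what a running average does to a noisy series, here is a small Python sketch using a trailing window. This is only one common definition; the exact calculation Lumira applies may differ, and the window size and sample values below are assumptions chosen for the example.

```python
def running_average(values, window=3):
    """Trailing moving average.

    Each output point averages the current value and up to window - 1
    previous values, so the first points use a shorter window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running_sum -= values[i - window]
        count = min(i + 1, window)
        averages.append(running_sum / count)
    return averages


# Hypothetical monthly delay percentages, for illustration only.
monthly_pct_delayed = [18.0, 25.0, 17.0, 22.0, 27.0, 19.0]
print(running_average(monthly_pct_delayed, window=3))
```

A larger window smooths more aggressively, at the cost of reacting more slowly to real changes in the trend.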

We can now see that, on average, flight delays have increased over the last 10 years. With this information, let’s dig deeper and analyze which type of delay is the biggest source of inconvenience for passengers.

/wp-content/uploads/2013/10/screen_11_302411.png

From the above chart, we can easily see that air carriers were the main source of flight delays in the US from 2003 to 2013.

If we add the Carrier Name dimension to the chart and filter the data by only the Carrier Delay, we get the following chart.

/wp-content/uploads/2013/10/screen_12_302412.png

Notice that American Airlines and American Eagle are both among the 5 worst carriers. Since they share the same hubs, their difficulties may be related to geography. However, the current dataset is missing exact locations such as latitude and longitude. To fill this gap, let’s merge it with a list of all airport codes and their GPS coordinates, and then create a geographical hierarchy.

/wp-content/uploads/2013/10/screen_13_302413.png

/wp-content/uploads/2013/10/screen_14_302414.png

/wp-content/uploads/2013/10/screen_15b_302415.png

/wp-content/uploads/2013/10/screen_16_302416.png
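Conceptually, the merge above is a left join on the airport code: every delay record is kept, and coordinates are attached when the code is found in the lookup list. Here is a minimal Python sketch of that idea. The field names, the delay values and the unknown code `XXX` are hypothetical, and the coordinates are approximate values used only for illustration.

```python
# Hypothetical delay records keyed by airport code (illustrative values).
delays = [
    {"airport": "DFW", "carrier_delay_min": 1200},
    {"airport": "ORD", "carrier_delay_min": 900},
    {"airport": "XXX", "carrier_delay_min": 5},  # code missing from the lookup
]

# Hypothetical lookup table: airport code -> approximate GPS coordinates.
airports = {
    "DFW": {"lat": 32.90, "lon": -97.04},
    "ORD": {"lat": 41.97, "lon": -87.91},
}


def merge_locations(delay_rows, airport_lookup):
    """Left join: keep every delay row, adding lat/lon when the code is known."""
    merged = []
    for row in delay_rows:
        location = airport_lookup.get(row["airport"], {})
        merged.append(
            {**row, "lat": location.get("lat"), "lon": location.get("lon")}
        )
    return merged


for row in merge_locations(delays, airports):
    print(row)
```

Rows whose code has no match keep `None` coordinates, which makes it easy to spot airports that would otherwise silently drop off the map.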

Let’s now use this new information and visualize flight delays by geography, filtered by American Airlines and American Eagle.

/wp-content/uploads/2013/10/screen_17_302417.png

However, when removing the filter and showing all airlines, it seems like the delays are more related to large airports like Chicago, Dallas or Atlanta.

/wp-content/uploads/2013/10/screen_18_302418.png

This analysis doesn’t provide enough information, but it suggests looking at the geographical distribution of weather-related delays.

/wp-content/uploads/2013/10/screen_19_302419.png

Here again, the data at hand doesn’t give us enough insight, but when weather comes to mind, seasonality is expected. Let’s redraw this chart using timelines.

/wp-content/uploads/2013/10/screen_20_302420.png

Weather delays are indeed seasonal. Surprisingly, however, they are not limited to snow and winter storms: May, June and July are also heavily affected, probably due to thunderstorms and/or hurricane season.
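One simple way to check for this kind of seasonality outside of Lumira is to average the weather delay per calendar month across all years. The Python sketch below does just that; the field names (`month`, `weather_delay`) and the sample records are assumptions for illustration and do not come from the dataset.

```python
from collections import defaultdict


def average_by_month(rows, measure):
    """Average a measure per calendar month (1-12), across all years."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["month"]] += row[measure]
        counts[row["month"]] += 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}


# Hypothetical records: one row per year and month (illustrative values).
records = [
    {"year": 2011, "month": 1, "weather_delay": 400},
    {"year": 2012, "month": 1, "weather_delay": 600},
    {"year": 2011, "month": 6, "weather_delay": 700},
    {"year": 2012, "month": 6, "weather_delay": 500},
    {"year": 2011, "month": 10, "weather_delay": 150},
]

for month, avg in average_by_month(records, "weather_delay").items():
    print(month, avg)
```

If the monthly averages differ clearly from one another and the pattern repeats year after year, the measure is seasonal.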

To wrap up, I want to estimate the chances of my American Airlines flight to SAP TechEd in Las Vegas in October 2013 being delayed or canceled. Since the dataset ends in June 2013, I will need to use Predictive Analytics to estimate this.

/wp-content/uploads/2013/10/screen_21b_302421.png

/wp-content/uploads/2013/10/screen_22_302422.png

/wp-content/uploads/2013/10/screen_23b_302424.png

SAP Lumira’s forecast feature calculates that on a flight with American Airlines to Las Vegas in October, the probability of getting delayed is 0.27%. This is of course a pretty rough estimate. With more time and a better understanding of the dataset, we could improve this calculation by merging it with other data, such as weather records and forecasts or schedules of events that could impact travel (think Christmas, Thanksgiving, large conferences, etc.). Also, note that the data at hand indicates the arrival airport but lacks the departure airport, which probably has an impact on delays and cancellations.
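As a sanity check on any forecast of this kind, a very simple baseline is the historical delay rate for the same calendar month. The Python sketch below is not Lumira’s forecasting algorithm, which this post does not describe; it is a naive seasonal estimate, and the field names and sample records are assumptions for illustration.

```python
def seasonal_naive_delay_rate(rows, target_month):
    """Share of delayed flights over all past occurrences of one calendar month.

    Returns None when there is no traffic recorded for that month.
    """
    flights = sum(r["flights"] for r in rows if r["month"] == target_month)
    delayed = sum(r["delayed"] for r in rows if r["month"] == target_month)
    if flights == 0:
        return None
    return delayed / flights


# Hypothetical history for one carrier and one airport (illustrative values).
history = [
    {"year": 2010, "month": 10, "flights": 100, "delayed": 20},
    {"year": 2011, "month": 10, "flights": 300, "delayed": 60},
    {"year": 2012, "month": 6, "flights": 250, "delayed": 90},
]

rate = seasonal_naive_delay_rate(history, target_month=10)
print(f"Estimated October delay rate: {rate:.1%}")
```

The result is a fraction between 0 and 1, so it needs to be multiplied by 100 to be read as a percentage. Comparing a forecast against such a baseline is a quick way to spot a result that is off by an order of magnitude.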

In conclusion, this short first hands-on test of SAP Lumira shows a very effective solution for analyzing, visualizing and forecasting large datasets. Go ahead and try it for yourself!


      5 Comments
      vijaykumar ijeri

      Very patiently done! Nice work 🙂

      Former Member

      Appreciate the effort you have put into it 🙂

      ~ Krishna.

      Former Member
      Blog Post Author

      Update: after several comments and questions, I have created a revised version.

      Please find it here and enjoy:

      Data Geek Challenge: Analyzing flight delays with SAP Lumira (Revised)

      Former Member

      This is very good post for understanding SAP Lumira at first sight. This can be useful in many industries for taking on the go decisions and much worthy 🙂 . Nice job Julien. Appreciate.

      Former Member
      Blog Post Author

      Thanks for your kind words. Don't forget to vote!