Lumira Data Access Extension – Box Office Statistics
Less than a month ago, I started using SAP Lumira for data visualization. I was impressed, both with the ease with which I could make beautiful and interesting visualizations from data and with the variety of sources Lumira could draw that data from. Recently, however, I’ve been looking into the support Lumira offers for those data formats that it can’t connect to directly.
For those sources, Lumira lets you create and use Data Access Extensions (DAE). A DAE is simply a console application that takes input parameters, reads the data from the source in question, formats it as character-separated values (CSV), and prints it to the console. The language and environment used to develop the DAE don’t matter: as long as the end result is an application that Lumira can execute and that returns the data in the expected format, you can use whatever is most comfortable for you.
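As a rough illustration of that contract, here is a minimal sketch of a console application that prints rows as CSV. The class name MinimalDae, the readRows() helper, and the placeholder rows are all made up for this example, and the sketch ignores its arguments. The exact parameters Lumira passes, and any extra output it expects around the data, are described in the SAP Developer Guide and are not reproduced here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MinimalDae {

    // Quote a field only when it contains a comma, a quote, or a line break,
    // doubling any embedded quotes (the usual CSV convention).
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Join one row of fields into a single CSV line.
    static String toCsvLine(List<String> fields) {
        return fields.stream().map(MinimalDae::escape).collect(Collectors.joining(","));
    }

    // Hypothetical stand-in for reading the real data source.
    // These rows are placeholders, not real box office figures.
    static List<List<String>> readRows() {
        return Arrays.asList(
                Arrays.asList("Title", "Weekend Gross"),
                Arrays.asList("Example Movie, Part 2", "1000"));
    }

    public static void main(String[] args) {
        // args would carry whatever parameters the caller supplies; unused in this sketch.
        for (List<String> row : readRows()) {
            System.out.println(toCsvLine(row));
        }
    }
}
```

Quoting only the fields that need it keeps the output readable while still handling titles that contain commas.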
To learn about DAE, I first followed the basic example in the official SAP Developer Guide, implementing an extension as a Java application that reads data from an XML file. Lumira can already accept XML, but this example provided a good, simple introduction. I also read through Trevor Dubinsky’s excellent blog post, where he introduces DAE in an easily understandable way. I recommend both of these resources if you’re just starting out with these extensions.
After learning about DAE and implementing the sample XML case, I wanted to target something closer to a real use case, pulling data from an online source that wouldn’t be available to Lumira without the benefit of DAE. I chose to interpret box office data from the table found at http://boxofficemojo.com/weekend/chart/, since I thought it would be interesting to visualize and I liked the fact that the data would change every week, rather than staying static. Since the data was stored as an HTML table, my extension would need to get the HTML document from the site and parse the data from there.
I chose to implement my DAE in Java using Eclipse as an IDE, so I looked for a library to parse HTML in Java. I found JSoup, and after importing its library and reading the documentation, it was relatively simple to connect to the site, load the HTML into a Document object, and write a program to parse out the data and write it to the console as CSV.
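The steps above can be sketched roughly as follows. This is not my actual extension code, just a minimal illustration that assumes the jsoup library is on the classpath. The class name TableToCsv, the SAMPLE markup, and its contents are invented for the example; the real page’s structure may differ and would likely need a more specific selector than "tr".

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TableToCsv {

    // Made-up sample markup so the sketch can run without network access.
    static final String SAMPLE =
            "<table>"
            + "<tr><th>Title</th><th>Weekend Gross</th></tr>"
            + "<tr><td>Example Movie</td><td>$1,000</td></tr>"
            + "</table>";

    // Collect the text of every header and data cell, row by row.
    // Selecting all "tr" elements is an assumption; nested tables would need more care.
    static List<List<String>> extractRows(Document doc) {
        List<List<String>> rows = new ArrayList<>();
        for (Element tr : doc.select("tr")) {
            List<String> cells = new ArrayList<>();
            for (Element cell : tr.select("th, td")) {
                cells.add(cell.text());
            }
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    // Quote every field and double embedded quotes, so values such as "$1,000" stay intact.
    static String toCsvLine(List<String> cells) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(cells.get(i).replace("\"", "\"\"")).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // With a URL argument, fetch the live page; otherwise parse the sample markup.
        Document doc = args.length > 0
                ? Jsoup.connect(args[0]).get()
                : Jsoup.parse(SAMPLE);
        for (List<String> row : extractRows(doc)) {
            System.out.println(toCsvLine(row));
        }
    }
}
```

Run without arguments, it parses the built-in sample; passing a URL as the first argument fetches that page instead.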
At this point, my code had the input and output required to act as a DAE, but it did not yet meet the requirement of being an executable application. Eclipse natively supports exporting a Java project as a runnable .jar, but the documentation I’ve read specifies that a Lumira DAE must be a .exe. As a workaround, a friend pointed me to the utility JSmooth, which creates native Windows launchers (in the form of .exe files) that in turn run .jar files. By wrapping my exported .jar in such a launcher and placing both files in Lumira’s extension folder, I was able to get the extension working. (This is a bit inconvenient, because of both the extra step and the extra file dependency. If anyone knows a better way to implement DAE in Java, please leave a comment or send me a message and I’ll update the guide.)
Now that I had a working DAE in Lumira, pulling live data from the HTML on the website, I composed a simple data story. I focused on two charts. The first is a pie chart breaking down the weekend box office gross by title and week number, since it’s worth showing, for example, when a movie is grossing little because it has already been in theatres successfully for 20 weeks. The second is a bubble chart showing total gross in comparison to budget, with bubble size indicating current weekend gross (and thus correlating with the earning power the movie has left before its run ends). Sadly, the budget data is missing for a fairly large number of entries. I also added a filter panel so that users can compare movies that have been in theatres for similar amounts of time.
Thank you all for reading, and I hope this has sparked your interest in this exciting feature of SAP Lumira.