I blogged last year about SAP’s increasing activity in sporting statistics, so I was pleased to see that SAP was also working with the NBA to produce its new statistics page. The site just went live, and I wanted more technical details than were included in the press releases. As always, I tried different angles and approaches to get the information I wanted.
The NBA site provides “fans with interactive access to unlimited amounts of official NBA statistics and analysis”.
Here are some of the more interesting aspects of the site:
The database contains every box score of every game played since the league’s inception in 1946. It graphically displays every player’s shooting tendencies. It allows fans to analyze and compare lineup combinations. And for the first time, the N.B.A.’s site includes advanced metrics — like true shooting percentage, usage rate and defensive efficiency — that have been available on other sites for years. [SOURCE]
The engine, powered by SAP’s HANA platform, can process 4.5 quadrillion combinations of data, said Ken DeGennaro, the league’s vice president for information technology [SOURCE]
The main reason SAP Hana was selected was that the NBA wanted to support fast, flexible querying. With the entire stat dataset held in memory on Hana, fans will be able to split, filter and query data as they see fit, Gliedman said. It’s not a huge trove of data, at less than a terabyte, but conventional OLAP cubes would have confined analysis to a limited set of predefined queries.
“You can select any date range, any point within a game, and you can do things like come up with your own definition of ‘clutch shooters,'” he said. “Is that two minutes before the end of the game or five minutes? You decide.” [SOURCE]
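To make the "define your own clutch shooter" idea concrete, here is a minimal Python sketch. It is not how the NBA site or HANA implements the query; the `Shot` record, the `clutch_shooting` function and the sample rows are all invented for illustration. It only shows that once every shot carries its game clock, the "clutch" window becomes a parameter rather than a predefined report.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """Hypothetical shot record; field names are assumptions, not the NBA schema."""
    player: str
    period: int                  # 1-4 for regulation, 5+ for overtime
    seconds_left_in_period: int  # game clock at the time of the shot
    made: bool


def clutch_shooting(shots, window_seconds, final_period=4):
    """Return {player: (made, attempted)} for shots taken in the last
    `window_seconds` of the final period (or of any overtime period)."""
    totals = {}
    for shot in shots:
        too_early = shot.period < final_period
        outside_window = shot.seconds_left_in_period > window_seconds
        if too_early or outside_window:
            continue
        made, attempted = totals.get(shot.player, (0, 0))
        totals[shot.player] = (made + int(shot.made), attempted + 1)
    return totals


# Made-up sample data, purely for illustration.
SAMPLE_SHOTS = [
    Shot("Player A", 4, 290, True),
    Shot("Player A", 4, 100, False),
    Shot("Player B", 4, 45, True),
    Shot("Player B", 2, 30, True),
]

two_minute = clutch_shooting(SAMPLE_SHOTS, window_seconds=120)
five_minute = clutch_shooting(SAMPLE_SHOTS, window_seconds=300)
print(two_minute)
print(five_minute)
```

Changing `window_seconds` from 120 to 300 changes Player A's clutch record, which is exactly the kind of "you decide" question the quote describes.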
Blocks in basketball
For me, different methods to discover additional pieces of information are similar to taking shots in basketball. Some approaches are more successful than others. Sometimes you are too far away from the hoop. Other shots are blocked by opponents. In this blog, I want to include my less successful attempts as well – the shots that were blocked.
What is a block?
In basketball, a block (short for blocked shot), not to be confused with blocking, occurs when a defensive player legally deflects a field goal attempt from an offensive player. The defender must not touch the offensive player’s hands or otherwise a foul is called. In order to be legal, the block must occur while the shot is traveling upward or at its apex. [SOURCE]
Here is a picture of a typical block.
Block #1: Web Site analysis
As I mentioned above, SAP has worked with various sports leagues on statistical apps. To understand such apps, I usually look under the covers and look at the HTML page sources. Web-based apps are usually preferable for such exercises since forensic research is easier in this environment. Sometimes I get lucky and discover something interesting.
I thought the NBA might be using the web-server-related features of HANA XS as its foundation. An analysis of the HTTP headers, however, suggested that ASP.NET and IIS 7.5 are being used server-side.
HANA can be used as a data source from ASP.NET, so I assume that this is the architecture.
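For readers who want to repeat this kind of header check, here is a small Python sketch. The function name `describe_server_stack` is my own, and the sample headers are illustrative values, not a capture from the NBA site. `Server`, `X-Powered-By` and `X-AspNet-Version` are the response headers that IIS and ASP.NET commonly send unless an administrator removes them.

```python
def describe_server_stack(headers):
    """Return human-readable clues about the server stack found in HTTP
    response headers. Header names are compared case-insensitively."""
    normalized = {name.lower(): value for name, value in headers.items()}
    clues = []

    server = normalized.get("server", "")
    if "iis" in server.lower():
        clues.append(f"web server: {server}")

    powered_by = normalized.get("x-powered-by", "")
    if "asp.net" in powered_by.lower():
        clues.append(f"framework: {powered_by}")

    if "x-aspnet-version" in normalized:
        clues.append(f"ASP.NET version: {normalized['x-aspnet-version']}")

    return clues


# Illustrative headers only; not recorded from the real site.
SAMPLE_HEADERS = {
    "Server": "Microsoft-IIS/7.5",
    "X-Powered-By": "ASP.NET",
    "Content-Type": "text/html; charset=utf-8",
}

clues = describe_server_stack(SAMPLE_HEADERS)
for clue in clues:
    print(clue)
```

Headers like these only describe the web tier. They say nothing about which database sits behind it, which is why the HANA connection remains an assumption.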
Block #2: Data Streams
I decided to take a look at the data traffic traveling from server to client. Maybe there were some interesting clues there.
A quick look at the returned data showed no OData or other typical HANA elements.
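As a rough sketch of what "looking for OData" can mean in practice, the Python snippet below scans a JSON response body for keys that OData services typically emit: `__metadata` in OData v2 JSON, `odata.metadata` in v3 and `@odata.context` in v4. The function name and both sample payloads are invented for illustration; neither payload is taken from the NBA site.

```python
import json

# Keys that commonly appear in OData JSON responses (v2, v3 and v4).
ODATA_MARKERS = ("__metadata", "odata.metadata", "@odata.context")


def find_odata_markers(body_text):
    """Parse a JSON body and return a sorted list of OData marker keys
    found anywhere in the document."""
    payload = json.loads(body_text)
    found = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in ODATA_MARKERS:
                    found.add(key)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(payload)
    return sorted(found)


# Hypothetical payloads: one plain JSON result, one OData v2 style result.
PLAIN_JSON = '{"columns": ["PLAYER", "PTS"], "rows": [["Player A", 20]]}'
ODATA_V2_JSON = '{"d": {"results": [{"__metadata": {"type": "Demo.Player"}, "PTS": 20}]}}'

print(find_odata_markers(PLAIN_JSON))
print(find_odata_markers(ODATA_V2_JSON))
```

An empty result only means the payload does not look like OData. It does not rule out HANA further back in the stack.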
What was interesting is that the query to get the data demonstrates the amount of flexibility provided to end-users as a result of HANA. I’m waiting for the first mash-up that uses this API to create an NBA-related gambling site.
I then realized that this analysis on the client / browser was naive. Due to security concerns (“provide no details on internal technical architectures”, etc.), the developer would probably never expose any information about the internal use of HANA.
Note: After realizing that I would probably find no concrete evidence to help me understand the use of HANA in the app, I decided to return to the more productive realm of assumptions based on analysis of press reports / articles.
The NBA data in the HANA database
Once I accepted it as a given that the HANA database was being used on the site, I wanted to explore other interesting aspects of the HANA usage. For example, I was curious as to how the sporting data was being analyzed in the database.
I found an intriguing clue:
So how does this all work? The stats database is built on top of the official NBA play-by-play data that is tracked courtside during every game — this includes every point, rebound, assist, steal, block, turnover, missed shot, foul and substitution, the point in the game when each one these events occurred and which players were on the floor when it happened. [SOURCE]
Here is what NBA play-by-play data looks like:
This is the data that is available on the NBA stats site. The actual data used to populate the database might be even richer in details.
This description of using play-by-play data rang a bell: it reminded me of a recent SCN blog about a similar app for the NFL called “SAP NFL Media Center Analytics”.
The NFL provided data in the form of playbooks that we used as a basis to create a play-by-play dataset. The playbooks are pretty much a verbal description of each game, which we analyzed and parsed into a fully defined, detailed dataset.
For the showcase historical data has been used. The original concept included real-time data from the Super Bowl. But since the application was mainly used in the media center in the week leading up to the game, this feature was not implemented. [SOURCE]
Thus, it appeared that both apps used what I might call non-traditional data sources.
How might this data populate the HANA database? A recent blog on HANA text analysis gives a possible explanation:
But from my perspective, one of the coolest new features in SPS05 is Text Analysis. The main goal of this new feature is to extract meaningful information from texts. In other words, companies can now process big volumes of data sources and extract meaningful information without having to read every single sentence. Now, what is meaningful information? The extraction process will identify the “who”, “what”, “where”, “when” and “how much” (among other things) from unstructured data and this way will enable you to enrich your structure data.
Now, I don’t know if this text analysis technology is really used on the NBA site but the visible bread crumbs lead us in this direction.
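To illustrate the general idea of turning a verbal play description into a structured record, here is a small Python sketch. It uses a plain regular expression as a stand-in; it is not HANA Text Analysis, and the line format, field names and sample plays are all invented for this example.

```python
import re

# Assumed (hypothetical) line format:
#   "<mm:ss> <player> makes|misses <2|3>-pt <shot type> (<player> assists)"
PLAY_PATTERN = re.compile(
    r"^(?P<clock>\d{1,2}:\d{2})\s+"
    r"(?P<player>.+?)\s+"
    r"(?P<result>makes|misses)\s+"
    r"(?P<points>[23])-pt\s+(?P<shot_type>.+?)"
    r"(?:\s+\((?P<assist>.+?) assists\))?$"
)


def parse_play(line):
    """Turn one play description into a dict, or return None if the line
    does not match the assumed format."""
    match = PLAY_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    minutes, seconds = fields.pop("clock").split(":")
    fields["seconds_left"] = int(minutes) * 60 + int(seconds)
    # A missed shot scores nothing, whatever its nominal value.
    fields["points"] = int(fields["points"]) if fields["result"] == "makes" else 0
    return fields


made = parse_play("10:42 Player A makes 3-pt jump shot (Player B assists)")
missed = parse_play("0:07 Player C misses 2-pt layup")
print(made)
print(missed)
```

Each parsed line yields the "who", "what" and "when" as separate fields, which is the kind of enrichment the quoted blog describes, only on a toy scale.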
The shots I took looking for concrete evidence about HANA usage on the NBA site were all blocked. The shot I took that was based on an unhealthy assumption is still up in the air – I don’t know if it will be a swish or whether I have missed the backboard completely. The most important thing to remember is that a blocked shot doesn’t mean the end of the game – it means you need a new strategy to get past your opponent. Although some might consider a detailed analysis of press articles a waste of time, it is often only when you combine the puzzle pieces from various sources that a picture appears.