The Big Data stories behind India’s Blackout
The recent blackout in India offers a multitude of angles for commentary. My colleague Phillip Vaughan took it as an opportunity both to highlight a global issue, the shifting imbalance between supply and demand in which demand outstrips the ability of the infrastructure to meet peak needs, and to advocate for accelerated improvements in energy efficiency. I want to explore the energy efficiency part a bit more and offer two Big Data storylines for discussion.
Energy efficiency typically implies that a consumer finds a way to use less energy to accomplish substantially the same outcome as before. Replacing old hardware such as an air conditioning unit or a refrigerator with modern equipment is a good example of “same outcome, less input”. The challenge for utilities is to identify those customers with inefficient equipment and/or those whose behavior suggests inefficiency. Other demand-side management programs such as Demand Response and Time-of-Use pricing schemes provide monetary incentives for customers to shift load away from peak demand periods. The baseline for these programs is the metered consumption of a customer. What about the consumption that goes unmetered? What about those end-users who are not actual “customers” of a utility?
The blackout in India has quickly been linked to the theft of energy, which according to a 2009 World Bank report titled “Reducing Technical and Non-Technical Losses in the Power Sector” accounts for more than 30% of the generated electricity. According to the same report, high energy losses are a problem shared by many emerging markets. But what about markets in North America or Europe? There, too, energy theft is acknowledged as a serious financial issue, resulting in an estimated $6 billion annual loss. The implied loss as a percentage of revenue is 1-3%, which puts the North American utility industry on par with the theft (“shrink”) incurred by North American retailers.
At SAP, we are working with Choice Technologies to provide the market with a leading-edge revenue assurance solution aimed at curbing consumer-level energy theft. This is the first Big Data storyline! Detecting energy theft with high confidence requires applying logic to a combination of meter data and customer master data. Including additional data points such as weather information and socio-demographic information improves the data correlation options. The number of fraud features and their combinations often results in hundreds of individual patterns that need to be executed sequentially against the data set. Even in a non-smart-meter world, the data volume can be substantial. In a recent internal benchmark, a pattern with 5 fraud features, executed for 4 million customers with 5 years of data, took 1h 19m on a traditional RDBMS. Now consider mutating just this one pattern: if the analyst wanted to change the parameters by, say, 5% for 2 of the 5 features, the result would be a total of 400 pattern variants. Using the same benchmark, executing all of them would require 22 straight days of processing in an RDBMS. Executing the same patterns in SAP HANA took 4.24 seconds for the single pattern and 28 minutes for all 400 variants. The processing speed of an in-memory database such as SAP HANA enables iterative queries by business analysts, which will lead to faster optimization of existing patterns and the development of new ones, which in turn will lead to more effective revenue assurance.
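The totals quoted above follow from multiplying the single-pattern run time by the number of variants. Here is a minimal sketch of that arithmetic, assuming the 400 variants run strictly one after another at the benchmarked single-pattern time; the function and variable names are illustrative and not part of any SAP product.

```python
from datetime import timedelta

# Figures taken from the benchmark described in the text.
VARIANTS = 400
RDBMS_SINGLE = timedelta(hours=1, minutes=19)
HANA_SINGLE = timedelta(seconds=4.24)


def total_runtime(per_pattern: timedelta, variants: int = VARIANTS) -> timedelta:
    """Wall-clock time if every variant runs sequentially (an assumption)."""
    return per_pattern * variants


rdbms_total = total_runtime(RDBMS_SINGLE)
hana_total = total_runtime(HANA_SINGLE)

print(f"RDBMS, all variants: {rdbms_total.total_seconds() / 86400:.1f} days")
print(f"HANA, all variants:  {hana_total.total_seconds() / 60:.1f} minutes")
print(f"Speed-up per pattern: {RDBMS_SINGLE / HANA_SINGLE:.0f}x")
```

Under these assumptions the RDBMS total comes to just under 22 days and the SAP HANA total to roughly 28 minutes, matching the figures above.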
The second Big Data storyline also relates to energy loss, but focuses on the distribution grid rather than on end-consumers. Utilities today know what goes into the distribution grid and they know the amount they issue bills for. The difference is some type of loss. We just discussed the loss incurred through known end-customers; but how do utilities know where they are incurring technical losses, e.g. line losses and step-down losses, as well as sophisticated energy theft by people tapping directly into higher-voltage distribution lines? The roll-out of smart metering devices to both end-consumers and various points on the distribution grid opens up the ability to aggregate-and-balance the distribution grid at high frequencies. The processing speed of SAP HANA plays very well here. However, of even greater interest to our customers is that SAP HANA has embedded statistical capabilities, e.g. R libraries and the Predictive Analysis Library (PAL), that enable statistical models to be published directly into the database script rather than moving data from the database to a client-server system where data sets are built. SAP HANA allows for the operationalizing of statistical models, and that is game-changing; coupled with the processing prowess, this makes for a compelling Big Data storyline.
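To make the aggregate-and-balance idea concrete, here is a minimal sketch in plain Python. It assumes one grid-side meter per feeder and a set of end-consumer meters below it, all read on the same intervals. The data layout, the names, and the 10% threshold are hypothetical illustrations, not an SAP HANA API or a recommended setting.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IntervalBalance:
    interval: str        # label of the metering interval, e.g. "00:15"
    injected_kwh: float  # energy measured going into the feeder
    metered_kwh: float   # sum of all end-consumer meter reads

    @property
    def loss_kwh(self) -> float:
        return self.injected_kwh - self.metered_kwh

    @property
    def loss_pct(self) -> float:
        if self.injected_kwh == 0:
            return 0.0
        return 100.0 * self.loss_kwh / self.injected_kwh


def balance_feeder(injected: Dict[str, float],
                   meter_reads: Dict[str, List[float]]) -> List[IntervalBalance]:
    """Aggregate consumer reads per interval and compare with feeder input."""
    return [
        IntervalBalance(interval, kwh_in, sum(meter_reads.get(interval, [])))
        for interval, kwh_in in sorted(injected.items())
    ]


def flag_intervals(balances: List[IntervalBalance],
                   threshold_pct: float) -> List[str]:
    """Return intervals whose loss share exceeds the (hypothetical) threshold."""
    return [b.interval for b in balances if b.loss_pct > threshold_pct]


# Made-up example data for a single feeder and two 15-minute intervals.
injected = {"00:00": 100.0, "00:15": 120.0}
meter_reads = {"00:00": [40.0, 30.0, 25.0], "00:15": [40.0, 30.0, 20.0]}

balances = balance_feeder(injected, meter_reads)
suspicious = flag_intervals(balances, threshold_pct=10.0)
print(suspicious)
```

A fixed threshold is the simplest possible rule. In practice the expected technical loss varies with load and topology, which is where statistical models running inside the database would replace the constant.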
What distribution-grid algorithms should SAP embed in SAP HANA as part of a standard solution offering?