SAP provided this webcast today, giving a background on data science.

/wp-content/uploads/2014/07/1figds_502633.png

Figure 1: Source: SAP

A data scientist uses mathematics and IT to solve business problems, asks the right questions, and uses technical tools and programming languages.

/wp-content/uploads/2014/07/2fig_502652.png

Figure 2: Source: SAP

How does data science differ from BI? The SAP speaker said BI defines standard reporting functionality, while data science contains a math component.

The maturity model in Figure 2 includes data mining: applying standard math algorithms, such as decision trees, to a dataset to find a pattern, create clusters, or forecast a time series.

Modeling describes a business process with a causal model: identify the driving factors, devise a math formula, and use the data to fine-tune its parameters.

Optimization looks at deviations and acts on them, for example by changing safety stocks.

/wp-content/uploads/2014/07/3fig_502653.png

Figure 3: Source: SAP

Figure 3 was a quiz. The numbers are in euros.

One set of numbers is true.

The other set of numbers is made up, invented by a person.

54% of the attendees (including me) thought the left column was false. Wait until the end for the “final answer”.

/wp-content/uploads/2014/07/4fig_502654.png

Figure 4: Source: SAP

Figure 4 shows a retail example of how customers buy things

Retail generates data that can be used to measure the impact of sales promotions.

Retailers introduce new products that need to be tested; the problem is that a large number of new products fail.

A company puts two new products on the shelf and wants to know which one sells more.

In Figure 4, product A looks more successful, so the company might leave product A on the shelf and remove product B.

There are two effects. First, you may buy a new flavor out of curiosity. Second, once you have eaten it, you either like it and buy it again or you do not.

During the first month the first effect is stronger, but the second effect matters more for keeping customers in the long term.

To determine success, you want to see whether people buy the product repeatedly; this is an example of asking the right questions.
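The difference between trial purchases and repeat purchases can be sketched in a few lines of Python. This is only an illustration, not the analysis SAP showed: the purchase log, the customer ids, and the function name `repeat_rate` are all hypothetical.

```python
from collections import Counter

def repeat_rate(purchases, product):
    """Return (buyers, repeat_buyers, share of buyers who bought again)."""
    # Count how many times each customer bought the given product.
    per_customer = Counter(c for c, p in purchases if p == product)
    buyers = len(per_customer)
    repeat_buyers = sum(1 for n in per_customer.values() if n >= 2)
    rate = repeat_buyers / buyers if buyers else 0.0
    return buyers, repeat_buyers, rate

# Hypothetical purchase log: (customer id, product).
purchases = [
    ("c1", "A"), ("c2", "A"), ("c3", "A"), ("c4", "A"), ("c5", "A"), ("c6", "A"),
    ("c1", "B"), ("c1", "B"), ("c5", "B"), ("c5", "B"), ("c6", "B"),
]

for product in ("A", "B"):
    units = sum(1 for _, p in purchases if p == product)
    print(product, "units sold:", units, "repeat stats:", repeat_rate(purchases, product))
```

In this made-up log, product A sells more units, but nobody buys it twice, while most buyers of product B come back. Looking only at unit sales would hide that.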

/wp-content/uploads/2014/07/5fig_502655.png

Figure 5: Source: SAP

Figure 5 is a supply chain optimization example for a railway that manages a large spare-parts supply chain with a complex setup: different locations for servicing trains, parts available, and broken parts.

Stock is managed at each location with a replenishment policy: spare parts are consumed over time until stock drops to the reorder point, which triggers a new order. The policy is controlled by parameters.

Who says the reorder point is where it should be?

The solution simulates the future using historical information, namely the statistical distribution of demand.

They then optimize the parameters to lower the reorder point so that inventory is smaller, which also enables forecasting.
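The idea of simulating a reorder-point policy against historical demand can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the SAP simulator (which the speaker said was built in Java): demand is drawn at random from a hypothetical history, unmet demand is lost, and the function names, the fixed order quantity, the lead time, and the target fill rate are all invented for the example.

```python
import random

def simulate(reorder_point, order_qty, lead_time, history, days=2000, seed=0):
    """Simulate a reorder-point policy; return (fill_rate, avg_on_hand)."""
    rng = random.Random(seed)
    on_hand = reorder_point + order_qty
    pipeline = []  # open orders as (arrival_day, quantity)
    demanded = served = stock_sum = 0
    for day in range(days):
        # Receive orders that are due.
        on_hand += sum(q for d, q in pipeline if d <= day)
        pipeline = [(d, q) for d, q in pipeline if d > day]
        # Draw today's demand from the historical (empirical) distribution.
        demand = rng.choice(history)
        shipped = min(demand, on_hand)  # assumption: unmet demand is lost
        on_hand -= shipped
        demanded += demand
        served += shipped
        # Reorder when stock on hand plus stock on order drops to the reorder point.
        position = on_hand + sum(q for _, q in pipeline)
        if position <= reorder_point:
            pipeline.append((day + lead_time, order_qty))
        stock_sum += on_hand
    return served / max(demanded, 1), stock_sum / days

def smallest_reorder_point(history, order_qty, lead_time, target=0.98, max_rp=50):
    """Return the first (reorder_point, fill_rate, avg_on_hand) meeting the target."""
    for rp in range(max_rp + 1):
        fill, avg = simulate(rp, order_qty, lead_time, history)
        if fill >= target:
            return rp, fill, avg
    return None

# Hypothetical daily demand history for one spare part at one location.
history = [0, 1, 1, 2, 3, 5]
print(smallest_reorder_point(history, order_qty=10, lead_time=3))
```

The search walks the reorder point upward and stops at the first value that meets the target fill rate, so inventory is no larger than it needs to be for that service level. A real study would also vary the order quantity and run several random seeds.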

/wp-content/uploads/2014/07/6fig_502656.png

Figure 6: Source: SAP

The next example, newspaper sales in Figure 6, combines forecasting with optimization.

If too few copies are sent, sales are lost; if too many are sent, the newspaper incurs the cost of sending copies back.

The model determines how many newspapers to send each day.

Look at history to forecast future sales, then add safety stock.

As an example, say Shop “B” in Figure 6 is a small shop next to a football stadium: on game days it sells a lot, on other days not. The model needs to take such special factors into account.

A more precise approach also considers the variability of demand.

The model is then used to optimize the number of papers to print and sell.
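The "forecast plus safety stock" idea can be sketched as a small Python function. This is a simplified illustration, not the model from the webcast: the forecast is a plain average of past sales, the safety stock is a multiple `z` of the standard deviation, and the sales figures, the function name `copies_to_send`, and the choice of `z` are hypothetical.

```python
import math
from statistics import mean, stdev

def copies_to_send(history, z=1.0):
    """Forecast = mean of past sales; safety stock = z * standard deviation."""
    forecast = mean(history)
    safety = z * stdev(history) if len(history) > 1 else 0.0
    return math.ceil(forecast + safety)

# Hypothetical sales history for a shop next to a stadium, split by day type.
normal_days = [20, 22, 18, 20]
game_days = [80, 90, 85, 85]

print("normal day:", copies_to_send(normal_days))
print("game day:  ", copies_to_send(game_days))
print("pooled:    ", copies_to_send(normal_days + game_days))
```

Splitting the history by day type is one simple way to account for a special factor such as game days. Pooling all days inflates the variability, so the pooled figure is too high for a normal day and too low for a game day.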

/wp-content/uploads/2014/07/7fig_502659.png

Figure 7: Source: SAP

Another use case covered was sensor data analytics for power utilities, used to support their processes.

“Before data science, get the data quality in place,” the speaker said.

A dataset could have millions of entries and could be incomplete.

Use data science to improve data quality:

  • Look at the data and update it manually; labor intensive
  • Define business rules that business experts apply to the dataset; takes time
  • Use math algorithms to identify patterns in data

Combine all three approaches to improve data quality
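As a rough sketch of how business rules and a math algorithm can be combined, the Python below flags suspicious sensor readings two ways and merges the results for manual review. The readings, the valid range, and the function names are hypothetical. The statistical check uses the median absolute deviation with the commonly used 0.6745 scaling and 3.5 threshold, which is my choice for illustration and not something the speaker named.

```python
from statistics import median

def rule_violations(readings, low, high):
    """Business rule: a reading must be present and within [low, high]."""
    return {i for i, v in enumerate(readings)
            if v is None or not (low <= v <= high)}

def statistical_outliers(readings, threshold=3.5):
    """Flag values far from the median, using the median absolute deviation (MAD)."""
    values = [v for v in readings if v is not None]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return set()
    return {i for i, v in enumerate(readings)
            if v is not None and 0.6745 * abs(v - med) / mad > threshold}

# Hypothetical voltage readings from one sensor; None marks a missing entry.
readings = [230.1, 229.8, None, 231.0, 230.4, 310.0, 230.2, -5.0]

by_rule = rule_violations(readings, low=0.0, high=400.0)
by_stats = statistical_outliers(readings)
# Entries flagged by either method go to a person for manual review.
print("rule:", sorted(by_rule), "stats:", sorted(by_stats),
      "review:", sorted(by_rule | by_stats))
```

The rule catches the missing and negative readings, while the statistical check also catches a value that is inside the allowed range but far from the rest, which is why combining the approaches helps.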


A data science team combines people with a math background with people who have technical and visualization skills (to hide complexity) and, on the back end, big data skills. See http://readwrite.com/2014/07/21/data-scientist-income-skills-jobs

/wp-content/uploads/2014/07/8fig_502657.png

Figure 8: Source: SAP

Figure 8 shows an SAP UI5 front end that combines a good user experience with functionality.

/wp-content/uploads/2014/07/9fig_502658.png

Figure 9: Source: SAP

Figure 9 shows how often customers buy two products at the same time, which helps in planning promotions (does this mean orange juice is bought together with frozen bread?).
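Counting how often two products land in the same basket is simple to sketch. The Python below is an illustration with hypothetical baskets and a hypothetical function name `pair_counts`; it is not how the Figure 9 view was built.

```python
from collections import Counter
from itertools import combinations

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair is counted once per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

# Hypothetical baskets (one list per checkout).
baskets = [
    ["orange juice", "frozen bread", "milk"],
    ["orange juice", "frozen bread"],
    ["milk", "butter"],
    ["orange juice", "milk"],
]

for pair, n in pair_counts(baskets).most_common():
    print(pair, n, f"{n / len(baskets):.0%} of baskets")
```

The share of baskets containing a pair is often called its support. A high count alone does not show that one product drives sales of the other, since two popular products will appear together often anyway.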

/wp-content/uploads/2014/07/10fig_502661.png

Figure 10: Source: SAP

Figure 10 shows the “Least common denominator”

BI and UI5, with UI5 combining the transactional and analytical worlds in real time, provide nice-looking graphs with limited effort.

The SAP Big Data Platform includes HANA and the Sybase portfolio.

The algorithm side includes different tools: PAL in HANA or SQL algorithms, and Java for specific coding.

The spare parts simulator was built using Java.


How to start a data science project:

  • Use cases workshop
  • Proof of concept project
  • Business case for full solution
  • Working with both business and IT

Data Science Quiz – Results

/wp-content/uploads/2014/07/11fig_502660.png

Figure 11: Source: SAP

Figure 11 shows how the first digits of numbers are distributed.

When you falsify a tax statement, you tend to make the numbers look random, so that every digit has the same probability.

The speaker said to open Wikipedia, look at numbers that describe quantities (the length of a wall, for example), and write down the first digit of each number: 1 appears often, 2 often, 3 less often,

and 8 and 9 appear rarely. This is Benford’s law.

The digit 1 leads only once in the right column, so the right side is not real.

The left side is real.
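Benford's law gives the expected share of leading digit d as log10(1 + 1/d), which is about 30.1% for 1 and about 4.6% for 9. The Python sketch below compares observed leading digits with that expectation. The function names are my own, and since the quiz figures are not reproduced here, the demo uses powers of two, a sequence known to follow the law closely.

```python
import math
from collections import Counter

def benford_expected(digit):
    """Expected share of leading digit `digit` (1-9) under Benford's law."""
    return math.log10(1 + 1 / digit)

def first_digit(value):
    """Leading non-zero digit of a non-zero number."""
    return int(str(abs(value)).lstrip("0.")[0])

def first_digit_shares(values):
    """Observed share of each leading digit 1-9, ignoring zeros."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    total = len(digits) or 1
    return {d: counts[d] / total for d in range(1, 10)}

# Demo data: the first 1000 powers of two.
observed = first_digit_shares(2 ** n for n in range(1, 1001))
for d in range(1, 10):
    print(d, f"observed {observed[d]:.3f}", f"expected {benford_expected(d):.3f}")
```

To check a column of amounts such as the ones in the quiz, pass the amounts to `first_digit_shares` and look for digits whose share is far from the expected value. With only a handful of numbers the comparison is a hint rather than proof.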

Question & Answer:

Q: Retail example – are there other data points besides repeat purchasing to determine whether products are more popular?

A: Visibility of shelf space, for example. These were not data points used here, but they can be included as influencing factors.

Systematically test programs

Q: Data mining – use the full dataset or a sample?

A: It depends on business problem

Q: On data quality: when you take out outliers, you need to understand where they come from. How do you assess that?

A: It depends on the business problem.

Q: Related to data quality, the example had corrected sensor data. Assuming you are not dropping wrong data, how do you ensure a single source of truth?

A: Uncleansed data was sent, data mining was used to produce a data cleansing proposal, and field examinations were used as well. The datasets are then compared; where they don’t agree, a method has failed.

Q: How far can you use Hadoop for data science and analysis, and with SAP HANA?

A: Connect Hadoop with HANA (smart query layer). See the Adobe example from ASUG Annual Conference: 0404 Adobe’s Story of Integrating Hadoop … | ASUG

Q: How do you build data science skills? Reading list, training?

A: It depends on where you start from: math, statistics, or high performance computing. Look at data mining tools (SAP Predictive Analysis, InfiniteInsight), which are really meant to make data science accessible to the end user, and see SAP training.

_____________________________________________________________________________________________________________

For more information, SAP TechEd && d-code Las Vegas has 136 “Big Data” sessions and ASUG has 10 Big Data sessions

Monday, October 20th, ASUG will host a hands-on BI session – more to come soon.


ASUG has a Harness the Big Data Monster webcast on August 5 – register here
