Author: Thadeus Suzenski

Machine Learning Thursdays: Ask an Expert—Legal Implications for Big Data

Today’s Ask an Expert interview is with Joshua Sun. Joshua is Assistant General Counsel at SAP Global Legal, where his practice focuses on product development, new business models, and technology partnering initiatives. He is based in SAP’s Palo Alto, California office.

What is your involvement with Machine Learning/Artificial Intelligence?

I’ve recently started to advise the machine learning teams at SAP.  I’ve been actively involved in learning this side of the business and how they consume data.  Prior to working with the machine learning team at SAP, I worked with the SAP Data Network team under Helen Arnold, advising on data use.  The experience I developed working with Helen’s team was useful when it came to working with the machine learning teams as both lines of business are driven by the use of data—although in different ways.

The SAP Data Network seeks to create data products based upon existing data sets. Machine learning teams, by contrast, typically use data to train machine learning systems, so the data isn’t the product per se; the insights and learning derived from the data become the product. My role has primarily been to advise the teams on using data in a manner that complies with applicable laws and regulations, and to help them think through the variety of options for utilizing data.

How do you see the market shift going forward?

Currently, many machine learning initiatives involve training machines to perform basic human functions such as seeing, listening, or sorting. We’re training these machines to recognize images or text much as a human would, and to understand natural language.

Once a machine begins to recognize relevant information, the machine can then start to process the information in basic ways, such as sorting and classifying data, which saves time and effort that a human would have to expend to perform these basic tasks.  As we advance from the basic building blocks, machines will be able to perform more sophisticated tasks, learn to perform tasks more efficiently, or provide insights from rapid analysis of large data sets that humans might have missed.

What hurdles do you foresee from a software/implementation standpoint?

As my involvement is less technical and more legal, the obstacles I see tend to be related to obtaining the rights to use data. There is tremendous demand for data sources to train and improve our machine learning software, and it is often difficult to keep up with that demand. Machine learning requires many types of data, both structured and unstructured.

There are two basic considerations before we can use data for machine learning: (1) obtaining the right to use data; and (2) complying with data protection laws. (I tend to focus on the first issue of advising our clients on how to obtain the right to use data while the Data Protection and Privacy team advises product teams on data protection regulations.)

There are many options for obtaining the right to use data.  For example:

  • We may be able to purchase a license to use data
  • We may be able to use publicly available data or open source data sets
  • In some cases, we may have the right to use data on our systems for machine learning
  • Or we may be able to partner with our customers on machine learning projects

We work with our machine learning teams to help them identify ways to properly obtain the rights to data sets. Data protection laws are fixed and non-negotiable, so we work with the teams to figure out whether they really need to use regulated data or whether there are options that allow them to do what they need to do without it. If they do need to use regulated data, we advise them on ways to use it in compliance with those laws.

What is the most exciting part of AI for SAP and the public?

The area of innovation that I think will be most exciting for SAP and the public is also one of the most basic: the ability of machine learning to perform many of the mundane tasks that people do today, freeing them to do more interesting activities.

This also happens to be one of the areas of AI that we believe will come to fruition soon, with wide-ranging impacts and opportunities.

Are there any major regulatory/legal issues on the horizon that you expect to impact the industry?

The European Commission proposed the General Data Protection Regulation (GDPR) in 2012, and the EU adopted it in 2016 as a replacement for the Data Protection Directive 95/46/EC. It will apply from May 25, 2018. SAP has been working to update our data protection agreements and procedures to comply with the GDPR, but we don’t believe it will significantly change what we can do with respect to machine learning activities.

In the United States, the Federal Trade Commission has from time to time provided guidance on proper practices for collecting, using, and protecting consumer data. Much of the focus is on proper disclosure, keeping promises, and maintaining adequate security, which SAP typically already does.

How have current regulations struggled to keep up with the expansion of the industry?

There are a host of issues that make it difficult to work with regulated data.  One is that every country has different laws and there isn’t uniformity across geographic regions.  As an international company operating globally, SAP needs to comply with a wide variety of regulations.  What is regulated and how it is regulated differs from country to country so we often must find common denominators when we create policies.

Another issue is determining where people are located and whose laws apply.  When the EU first passed Data Protection Directive 95/46/EC in 1995, the world was a very different place.  With the internet and cloud services, particularly when those technologies are used by multinational companies that SAP serves, data flows across geographic regions and you wind up having to navigate many layers of regulation.  Rules tend to be broad and can sometimes create obligations that are not necessarily more protective of individuals.

There may be ways to balance protection of individuals while allowing data to be used for productive purposes.  For example, the new GDPR introduces the concept of “pseudonymization”—a process rendering data neither anonymous nor directly identifying—as a method of complying with the GDPR’s data security requirements.  It also relaxes some requirements when personal data is pseudonymized.

Approaches like this may help regulation evolve to meet the demands of our changing business landscape.
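To make the idea of pseudonymization more concrete, here is a minimal Python sketch of one common technique: replacing direct identifiers with a keyed hash (HMAC-SHA256), where the key is stored separately from the data. This is an illustration only, not a description of how SAP handles data. The field names, the sample record, and the key are hypothetical, and whether a particular technique meets the GDPR's requirements in a given case is a question for legal review.

```python
import hashlib
import hmac

# Hypothetical secret key. The assumption is that it is stored separately
# from the data set, so the pseudonyms cannot be linked back to a person
# by someone who holds only the data.
SECRET_KEY = b"example-key-kept-separately"

# Hypothetical names of the fields that directly identify a person.
DIRECT_IDENTIFIERS = ("name", "email")


def pseudonym(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash (HMAC-SHA256) of an identifier."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Copy a record, replacing direct identifiers with keyed pseudonyms."""
    return {
        field: pseudonym(str(value), key) if field in DIRECT_IDENTIFIERS else value
        for field, value in record.items()
    }


# Hypothetical sample record.
record = {"name": "Jane Doe", "email": "jane@example.com", "purchase_total": 42.50}
print(pseudonymize(record))
```

Because the same input and key always produce the same pseudonym, records belonging to one person can still be grouped for analysis or model training, while the identifying values themselves no longer appear in the data set.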

Thanks, Joshua!

      1 Comment
      Former Member

      Thanks for this overview, which clearly describes the legal requirements. There are further regulatory requirements, e.g. industry-specific ones, that add to the complexity.

      The question is how to protect your organization from the risks and financial losses of non-compliance while still avoiding excessive costs for implementing compliance.

      Very often, regulatory compliance is an afterthought within projects, which makes it more complex and costly when such requirements are added after project implementation. We have all seen this too often.

      The right combination of skills for embedding compliance and regulatory requirements into projects is more than scarce.

      What is required is a combination of profound technical knowledge of the machine learning system and its data structures, compliance, regulatory, and legal experience, and the business know-how to create solutions with embedded compliance. Finding this combined experience is a challenge, but it is the only way to minimize the risk of non-compliance with legal and regulatory requirements.

      Pseudonymization is a good example of a measure that minimizes GDPR risk while still allowing business processes to be developed and tested across on-premise and cloud systems.

      I hope your article prompts a good discussion, Joshua.
      Thanks, Heinrich