This blog describes issues that can occur when installing, configuring or running Native Spark Modeling in SAP BusinessObjects Predictive Analytics.  It explains the root causes of those issues and, where possible, provides solutions or workarounds.

The official SAP BusinessObjects Predictive Analytics documentation contains full details of the connection configuration in the “Connecting to your Database Management System” guides, which can be found at SAP Predictive Analytics – SAP Help Portal Page.

There is also a very useful blog post on configuring Native Spark Modeling, with screenshots of each step: Configure Native Spark Modeling in SAP BusinessObjects Predictive Analytics 3.0.

What is Native Spark Modeling?

Native Spark Modeling builds the Automated predictive models by leveraging the combined data store and processing power of Apache Spark and Hadoop.

Native Spark Modeling was introduced in Predictive Analytics 2.5.  The concept is also sometimes called in-database processing or modeling.  Note that both Data Manager and Model Apply (scoring) already support in-database functionality. For more details on Native Spark Modeling, have a look at:

Big Data : Native Spark Modeling in SAP Predictive Analytics 2.5

How to..

  • How to find the logs.
    • By default the Spark logs are written to the native_spark_log.log file. The location depends on the operating system and the installation type.
    • Workstation/desktop on Windows: the logs are in %TEMP% by default.
    • Server mode: the logs are written to the tmp folder under the installation directory.
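
A quick way to check both locations is a small script. This is only a sketch in Python: the file name comes from the list above, while the function names and the example installation directory are made up for illustration.

```python
import tempfile
from pathlib import Path

LOG_NAME = "native_spark_log.log"  # default log file name mentioned above


def candidate_log_paths(install_dir=None):
    """Return the places where the Spark log is expected to be written."""
    # Desktop: the operating system temp directory (%TEMP% on Windows).
    paths = [Path(tempfile.gettempdir()) / LOG_NAME]
    if install_dir is not None:
        # Server mode: the tmp folder under the installation directory.
        paths.append(Path(install_dir) / "tmp" / LOG_NAME)
    return paths


def existing_logs(install_dir=None):
    """Return only the candidate paths that actually exist on disk."""
    return [p for p in candidate_log_paths(install_dir) if p.is_file()]
```

For example, `existing_logs("/opt/pa_server")` (a hypothetical installation directory) lists whichever of the two log files are present.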

Troubleshooting

Configuration

Issue

Native Spark Modeling does not start.  When it is configured correctly, the Desktop client shows the “Negotiating resource allocation with YARN” progress message.

NegotiatingResourceAllocationWithYARN.png

Solution

Check that the Native Spark Modeling checkbox is enabled in the preferences (under Preferences -> Model Training Delegation).

PreferencesModelTrainingDelegation.png

Check that the configuration contains at least the minimum settings: the hadoopConfigDir and hadoopUserName entries in the SparkConnections.ini file for the Hive DSN, and the Hadoop client XML files in the folder referenced by the hadoopConfigDir property.
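
The check for the two minimum entries can be scripted. The sketch below (Python) assumes the key layout used elsewhere in this post, SparkConnection.<DSN>.<property>=<value>, and assumes that lines starting with “#” are comments; the function name and sample values are illustrative only.

```python
REQUIRED_PROPERTIES = ("hadoopConfigDir", "hadoopUserName")


def missing_properties(ini_text, dsn):
    """Return the required properties that have no non-empty value for this DSN."""
    prefix = f"SparkConnection.{dsn}."
    found = set()
    for line in ini_text.splitlines():
        line = line.strip()
        # Skip blanks, assumed "#" comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith(prefix) and value.strip():
            found.add(key[len(prefix):])
    return [name for name in REQUIRED_PROPERTIES if name not in found]


# Hypothetical file content with one of the two entries missing.
SAMPLE_INI = """
SparkConnection.MY_HIVE_DSN.hadoopConfigDir=../../../SparkConnector/hadoopConfig/MY_HIVE_DSN
"""
```

With the sample above, `missing_properties(SAMPLE_INI, "MY_HIVE_DSN")` reports that hadoopUserName is missing.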

 

 

Issue

SparkConnections.ini file has limited support for full path names with spaces on Windows.

Solution

Prefer relative paths instead.

For example, for an ODBC DSN called MY_HIVE_DSN, use the following relative path instead of the full path for the hadoopConfigDir parameter:

SparkConnection.MY_HIVE_DSN.hadoopConfigDir=../../../SparkConnector/hadoopConfig/MY_HIVE_DSN

Issue

Error message includes “Connection specific Hadoop config folder doesn’t exist”. (error from Spark)

Solution

Check that the SparkConnections.ini file contains a valid path to the configuration folder.

Issue

Error message contains “For Input String”.  For example “Unexpected Java internal error…For Input String “5s””. (error from Spark)

Solution

Check the hive-site.xml file for the DSN and remove the property that is causing the issue.

For example, open hive-site.xml, search for the string “5s” as mentioned in the error message and remove that property.
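
Searching by hand works, but in a large hive-site.xml a few lines of Python can list the offending properties. This is a sketch only: it assumes the standard Hadoop layout of property elements with name and value children, and the property name “example.timeout” in the sample is made up.

```python
import xml.etree.ElementTree as ET


def properties_with_value(xml_text, needle):
    """Return the names of all properties whose value equals `needle`."""
    root = ET.fromstring(xml_text)
    hits = []
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        value = prop.findtext("value", default="")
        if value.strip() == needle:
            hits.append(name.strip())
    return hits


# Hypothetical hive-site.xml content; "example.timeout" is not a real Hive property.
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://your_thriftserver_hostname:9083</value>
  </property>
  <property>
    <name>example.timeout</name>
    <value>5s</value>
  </property>
</configuration>"""
```

Calling `properties_with_value(SAMPLE_HIVE_SITE, "5s")` returns the names of the properties to remove.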

Issue

Error message “JNI doesn’t find class”.

Solution

This can be a JNI (Java Native Interface) classpath issue.  Restarting the desktop client normally fixes it.  Otherwise, check that the classpath settings in the KJWizardJni.ini file refer to the correct jar files.

Issue

Error message “The dataset default.IDBM_xxxxx cannot be dropped”.

Solution

If training was successful but the error appears in one of the final generation steps, it is caused by missing permissions for deleting the intermediate tables (table names prefixed with ‘IDBM’). Make sure the ODBC user has sufficient rights to delete or update the tables. At a minimum, make sure the user belongs to the ‘hive’ and ‘spark’ groups in Hadoop.

 

Monitoring and Logging

Issue

The Spark logs in the native_spark_log.log file are missing any entries.

Solution #1

The property relating to logging configuration in the Spark.cfg file could be wrong.

To correct it, edit the Spark.cfg file by adding “File” to the end of the property.

   Spark.cfg before fix

Spark.log4jConfiguration="../../../SparkConnector/log4j.properties"

   Spark.cfg after fix

Spark.log4jConfigurationFile="../../../SparkConnector/log4j.properties"

Solution #2

This can also happen when there are other issues with configuration, particularly the hive-site.xml or core-site.xml files.

The hive-site.xml file should contain only one property, “hive.metastore.uris”. For example:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://your_thriftserver_hostname:9083</value>
</property>

Issue

The Spark logs in the native_spark_log.log file contain too little detail.

Solution

The logging level can be increased by modifying the log4j.properties file (under the SparkConnector directory). For example, change the log4j.rootLogger level from ERROR to INFO to show more logging information for Spark.

log4j.rootLogger=INFO,file
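
If the level has to be switched often, the edit can be scripted. The following Python sketch rewrites only the log4j.rootLogger line and leaves the appender list untouched; the function name is illustrative, and reading and writing the actual file is left out.

```python
def set_root_logger_level(properties_text, level):
    """Return the properties text with the log4j.rootLogger level replaced."""
    lines = []
    for line in properties_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "log4j.rootLogger":
            # The value has the form "<LEVEL>,<appender>[,<appender>...]".
            _, comma, appenders = value.partition(",")
            line = f"{key}={level}{comma}{appenders}"
        lines.append(line)
    # Note: a trailing newline in the input is not preserved.
    return "\n".join(lines)
```

For example, passing the text “log4j.rootLogger=ERROR,file” with level “INFO” yields the line shown above.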

Also refer to the logs on the Hadoop cluster for additional logging and troubleshooting information.

For example use the YARN Resource Manager web UI to monitor the Spark and Hive logs to help troubleshoot Hadoop specific issues.  The Resource Manager web UI URL is normally

http://your_resourcemanager_hostname:8088/cluster/apps

Support for Multiple Spark versions

Issue

Only one Spark version (jar file) can be used at a time with Native Spark Modeling.

HortonWorks HDP and Cloudera CDH are running Spark 1.4.1 and Spark 1.5.0 respectively.

Solution

Switch the configuration to the appropriate Spark version before modeling.

See the “Connecting to your Database Management System” guide in the official documentation (SAP Predictive Analytics – SAP Help Portal Page) for more information on switching between cluster types.

Please restart the Server or Desktop after making this change.

 

Training Data Content Advice

Issue

The data values in the training data set cannot contain commas. For example, a field containing the value “Dublin, Ireland” is not supported.

Solution

Pre-process the data to cleanse commas from the data or disable Native Spark Modeling for such data sets.
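
One way to do the pre-processing is sketched below in Python. It assumes the source data is a CSV file in which values containing commas are quoted; replacing the commas with a single space is just one possible cleansing rule, and the function name is made up. The optional drop_header flag also removes a header row before the data is loaded into Hive.

```python
import csv
import io


def strip_commas(csv_text, drop_header=False):
    """Return CSV text in which no field value contains a comma."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    if drop_header:
        next(reader, None)  # skip the row with the column names
    for row in reader:
        # "Dublin, Ireland" becomes "Dublin Ireland".
        cleaned = [" ".join(part.strip() for part in field.split(",")) for field in row]
        writer.writerow(cleaned)
    return out.getvalue()
```

With the input `'id,city\n1,"Dublin, Ireland"\n'` and `drop_header=True`, the result is a single data row with the value “Dublin Ireland”.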

Also make sure, when creating a table in Hive, that the data does not contain a header row with the column names; otherwise the header is included as a data row.

KxIndex Inclusion

Issue

A crash occurs when KxIndex is included as an input variable.  By default, Automated Analytics adds the KxIndex variable to the training data set description, but it is normally an excluded variable.  There is a limitation that the KxIndex column cannot be part of the list of included variables with Native Spark Modeling.

Solution

Exclude the KxIndex variable (this is the default behaviour).

HadoopConfigDir Subfolder Creation

Issue

The configuration property HadoopConfigDir in Spark.cfg by default uses the temporary directory of the operating system.

This property is used to specify where to copy the Hadoop client configuration XML files (hive-site.xml, yarn-site.xml and core-site.xml).

If this is changed to use a subdirectory (e.g. \tmp\PA_HADOOP_FILES) it is possible to get a race condition that causes the files to be copied before the subdirectory is created.

Solution

Manually create the subdirectory.

 

OutOfMemoryError and Memory Configuration Tuning (Desktop only)

Issue

Training on the Desktop fails with an error message “java.lang.OutOfMemoryError: Java heap space”.

The issue is caused by the Automated Desktop user interface sharing the same Java (JVM) process memory with the Spark connection component (Spark Driver).  Also the default “out of the box” configuration is set to use a relatively low amount of memory.

Solution

Modify the configuration parameters to increase the memory for the Desktop user interface and the Spark Driver.

The KJWizardJni.ini configuration file contains the total memory available to the Automated Desktop user interface and SparkDriver.

The Spark.cfg configuration file contains the optional property DriverMemory.  This should be configured to be approximately 25% less than the total memory (the -Xmx value) set in the KJWizardJni.ini file.

The SparkConnections.ini configuration file can be further configured to tune the Spark memory.

Please restart the Desktop client after making configuration changes.

Example Automated Desktop memory and Spark configuration settings:

In Spark.cfg

Spark.DriverMemory=6144

In KJWizardJni.ini

vmarg.1=-Xmx8096m

In SparkConnections.ini

SparkConnection.MY_HIVE_DSN.native."spark.driver.maxResultSize"="4g"
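
In the example values above the driver gets roughly three quarters of the JVM heap (6144 is about 75% of 8096). The arithmetic can be sketched in Python as follows; the function name and the 0.75 default are illustrative, and only -Xmx values given in megabytes or gigabytes are handled.

```python
import re


def suggested_driver_memory_mb(vmarg, fraction=0.75):
    """Derive a Spark.DriverMemory value (in MB) from a -Xmx JVM argument."""
    match = re.fullmatch(r"-Xmx(\d+)([mMgG])", vmarg.strip())
    if match is None:
        raise ValueError(f"unsupported JVM argument: {vmarg!r}")
    size, unit = int(match.group(1)), match.group(2).lower()
    heap_mb = size * 1024 if unit == "g" else size
    # Leave the remaining heap to the Automated Desktop user interface.
    return int(heap_mb * fraction)
```

For `-Xmx8096m` this gives 6072, close to the 6144 used in the example; for `-Xmx8g` it gives exactly 6144.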

 

Spark/YARN Connectivity

Issue

Virtual Private Network (VPN) connection issue (mainly Desktop).

Native Spark Modeling uses YARN for the connection to the Hadoop cluster.  There is a limitation that the connectivity does not work over VPN.

Solution

Choose one of the following solutions:

  • revert to non-VPN connection if possible e.g. connect to a terminal/Virtual Machine that can connect to the cluster on the same network
  • switch from Desktop to Client-Server installation.  It is recommended to install Automated Server on an edge node/jumpbox co-located with the cluster for best performance

 

Issue

Single SparkContext issue (Desktop only).

A SparkContext is the main entry point for Spark functionality.  There is a known limitation in Spark that there can be only one SparkContext per JVM.

For more information see https://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/SparkContext.html

This issue may appear when a connection to Spark cannot be created correctly (e.g. due to a configuration issue) and subsequently the SparkContext cannot be restarted.  This is an issue that only affects the Desktop installation.

Solution

Restart the Desktop client.

Issue

Get error message

Unexpected Spark internal error.  Error detail: Cannot call methods on a stopped SparkContext

Solution

Troubleshoot by looking in the diagnostic messages or logs on the cluster (for example using the web UI).

One possible cause is over-committing CPU resources in the SparkConnections.ini configuration file, that is, requesting more cores than the cluster can provide.

Example of Hadoop web UI error diagnostics showing over-commitment of resources:

OvercommitResources.png

SparkConnections.ini file content with too many cores specified:

OvercommitResourcesSparkConnections.png

Issue
You get the error message “Unable to connect to Spark Cluster after waiting for 100 seconds. Check that the Yarn master ports are open”
or, in the logs, IDBM_SPARK_CONTEXT_CREATION_TIMEOUT

Solution
This issue occurs because Native Spark Modeling cannot reach the YARN Resource Manager.
Make sure the XML files are up to date and that yarn-site.xml points to the correct YARN Resource Manager address. The XML files are downloaded from the cluster and placed under the “hadoopConfig” folder; at run time the Native Spark Modeling process also keeps a copy under the “%TMP%” directory of the operating system.

The path to the default %TMP% directory is specified in the KJWizardJni.ini and Spark.cfg files with the classpath variable. You can either leave the default path in these configuration files or change it to the folder where you intend to keep these XML files temporarily while Native Spark Modeling is running.
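
To check quickly whether the address in yarn-site.xml is reachable from the client machine, something like the following Python sketch can help. It reads the standard YARN properties yarn.resourcemanager.address and yarn.resourcemanager.hostname and falls back to the usual default port 8032; values that use variable substitution are not resolved, and the function names and the sample host are made up.

```python
import socket
import xml.etree.ElementTree as ET

DEFAULT_RM_PORT = 8032  # usual default port of yarn.resourcemanager.address


def resource_manager_endpoint(yarn_site_xml):
    """Return (host, port) of the Resource Manager, or None if not configured."""
    root = ET.fromstring(yarn_site_xml)
    props = {
        p.findtext("name", default="").strip(): p.findtext("value", default="").strip()
        for p in root.iter("property")
    }
    address = props.get("yarn.resourcemanager.address", "")
    if ":" in address:
        host, _, port = address.rpartition(":")
        return host, int(port)
    hostname = props.get("yarn.resourcemanager.hostname", "")
    if hostname:
        return hostname, DEFAULT_RM_PORT
    return None


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical yarn-site.xml content.
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>rm.example.com:8032</value>
  </property>
</configuration>"""
```

If `is_reachable(*resource_manager_endpoint(xml_text))` returns False from the machine running Native Spark Modeling, the XML files or the network route (ports, VPN) are the first things to look at.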

Issue
You get the error message “Lost task 4.3 in stage 0.0”

Or in the native_spark_log.log : “org.apache.spark.SparkException: Job aborted due to stage failure:Lost task 4.3 in stage 0.0 java.io.InvalidClassException: org.apache.spark.rdd.MapPartitionsRDD; local class incompatible: stream classdesc serialVersionUID = -1059539896677275380, local class serialVersionUID = 6732270565076291202”

Solution
The probable cause of this error is a Spark version mismatch between the client and the cluster. For example, if “spark-1.6.1-bin-hadoop2.6” (Spark 1.6.1) is under the jars folder of the Automated Analytics installation but Spark 1.6.0 is installed on the cluster, this error occurs.
Make sure the Spark jars on the client are the same version as those on the cluster. In the example above, download “spark-1.6.0-bin-hadoop2.6” instead, place it under the jars folder, and specify the same Spark assembly jar version on HDFS in the “SparkConnections.ini” file.
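
The comparison can be automated with a few lines of Python. This sketch only parses the version out of a folder name such as “spark-1.6.1-bin-hadoop2.6”; how you obtain the cluster version (for example from the cluster administration console) is up to you, and the function names are illustrative.

```python
import re


def spark_version_from_name(name):
    """Extract 'x.y.z' from a name like 'spark-1.6.1-bin-hadoop2.6'."""
    match = re.search(r"spark-(\d+\.\d+\.\d+)", name)
    return match.group(1) if match else None


def versions_match(client_folder_name, cluster_version):
    """Return True if the client Spark folder matches the cluster version."""
    return spark_version_from_name(client_folder_name) == cluster_version
```

With the values from the example, `versions_match("spark-1.6.1-bin-hadoop2.6", "1.6.0")` is False, which is the situation that produces the serialVersionUID error.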

Hive

Issue

Hive on Tez execution memory issue

Scope: HortonWorks clusters only (with Hive on Tez) and Data Manager functionality.

HortonWorks HDP uses Hive on Tez to greatly improve SQL execution performance.  The SQL generated by the Data Manager functionality for Analytical Data Sets (ADS) can be complicated.  There is a possibility the Tez engine will run out of memory with default settings.

Solution

Increase the memory available to Tez through the Ambari web administrator console.

Go to the tez-configs under Hive and change the tez.counters.max setting to 16000.  It is also recommended to increase the tez.task.resource.memory.mb setting.  The Hive and Tez services must be restarted after this change. If this still does not work, it is possible to switch the execution engine back to MapReduce, again through Ambari.

Issue

It is possible to set the database name in the ODBC Hive driver connection configuration.  For example, instead of using the “default” database, it is possible to configure a different database in the ODBC Administrator dialog on Windows or the ODBC connection file for the UNIX operating system.

Native Spark Modeling requires the default database for the Hive connection.

Solution

Keep the database setting at default for the Hive DSN connection.  It is still possible to use a Hive table/view in a database other than default.

HiveDriverDatabaseSetting.png

Data Manager

Issue

A time-stamped population with a user-defined target field is not contained in a Temporal ADS (Analytical Data Set). That is, when you train a model using Data Manager with a “Time-stamped Population” that has a target variable, the target variable may not be visible in the list of variables in the modeler.

Solution

If you want to include the target field you can either have it as part of the original data set or define a variable (with relevant equation) in the “Analytical Record” instead.

Metadata Repository

Issue

The metadata repository cannot be in Hive.  Also output results cannot be written directly into Hive from In-database Apply (model Scoring) or Model Manager.

Solution

Write the results to the local filesystem instead.

High Availability

Starting with release 3.1, Native Spark Modeling supports the HA (High Availability) option on Hadoop clusters. This means that you need an additional hdfs-site.xml file that points to the NameService address when HA is enabled.

An issue might occur in Native Spark Modeling if HA was previously not enabled but has since been enabled, and you intend to use Native Spark Modeling with the HA settings.

Issue
You get the error message “java.lang.IllegalArgumentException: Wrong FS: hdfs:///<xxxxx>:8020/user/hive/warehouse/<idbm_xxxx>, expected: hdfs://nameservice1”

Solution
Enabling HA requires a logical address for the NameService; in this case it is called “nameservice1” (this is usually the default name).  After enabling HA, you must update the Hive metastore to use the new NameService for its HDFS warehouse, i.e. “nameservice1” and not the address of the active HDFS node. Update this property in hive-site.xml to get past the issue.
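
To see what the corrected location should look like, the two addresses can be pulled out of the error message. The Python sketch below is only an illustration: the host name “namenode01” and the table name in the usage example are made up, and the functions only rewrite a string, they do not change anything in Hive.

```python
import re
from urllib.parse import urlsplit


def parse_wrong_fs(message):
    """Return (actual_location, expected_fs) from a 'Wrong FS' error, or None."""
    match = re.search(r"Wrong FS: (\S+), expected: (\S+)", message)
    if match is None:
        return None
    return match.group(1), match.group(2)


def rewrite_to_nameservice(location, nameservice_uri):
    """Replace the host:port part of an HDFS location with the NameService URI."""
    path = urlsplit(location).path
    return nameservice_uri.rstrip("/") + path
```

For a message reporting `hdfs://namenode01:8020/user/hive/warehouse/idbm_1` with expected `hdfs://nameservice1`, the rewritten location is `hdfs://nameservice1/user/hive/warehouse/idbm_1`.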
