How SAP HANA PAL can help to reduce cost-intensive manual post-processing
The challenge – cost-intensive manual post-processing in accounting processes
In financial accounting, transaction data transferred to the corresponding accounting system often contains errors or risks. When such erroneous or risky business transactions are processed, errors or problems can occur in the accounting system itself, which then have to be corrected through manual post-processing. Because working through these post-processing orders takes a lot of time, they are very cost-intensive.
By analyzing transaction data before it is actually processed in the accounting system, conditions indicating errors or risks can be identified in advance, thereby avoiding or significantly reducing the need for manual and cost-intensive post-processing.
In general, erroneous data sets can relate to problems that are rather simple to detect, such as missing field entries or values unknown to the system. On the other hand, there are also more complex patterns that merely indicate a possible error or risk. Because analyzing large amounts of data for patterns that are difficult to detect can easily exceed human capabilities, it makes sense to use artificial intelligence that automatically learns from and adapts to the given data sets.
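The first category can be covered by plain validation rules without any learning involved. The following minimal sketch illustrates this; the field names and the set of known company codes are hypothetical examples and not taken from the thesis:

```python
# Minimal sketch of rule-based checks for the "simple" error category.
# Field names and the set of known company codes are hypothetical examples.
KNOWN_COMPANY_CODES = {"1000", "2000", "3000"}
REQUIRED_FIELDS = ["company_code", "account", "amount", "posting_date"]

def simple_checks(transaction: dict) -> list:
    """Return the problems that can be found without any learning."""
    problems = []
    for field in REQUIRED_FIELDS:
        if transaction.get(field) in (None, ""):
            problems.append("missing field: " + field)
    code = transaction.get("company_code")
    if code is not None and code not in KNOWN_COMPANY_CODES:
        problems.append("unknown company code: " + code)
    return problems

# Example: one unknown value, one missing field
print(simple_checks({"company_code": "9999", "account": "400000",
                     "amount": 120.0, "posting_date": None}))
```

The more complex patterns of the second category are what the neural network described below is meant to pick up.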
The solution approach – identifying erroneous data by applying machine learning algorithms from SAP HANA PAL
As part of a bachelor thesis, this approach was prototypically implemented by building and training a neural network with the SAP HANA Predictive Analysis Library (PAL). This makes it possible to investigate the potential benefit of applying neural networks to manual and cost-intensive processes.
Because, in the given scenario, it is too complex to access “real” productive customer data that reflects the errors and conditions occurring in a productive system, the data used by the network is generated artificially. For this, suitable input features as well as their characteristics are defined manually so that training and test data can be generated accordingly. By manually predefining a perfect, expected scorecard model for the error/risk classification and the data generation, the performance of the trained network can be validated precisely by comparing the predefined weightings with the learned ones.
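Such a generator is easy to express in code. The following is a minimal sketch of the idea; the number of features, the scorecard weights and the threshold are hypothetical, since the thesis's concrete scorecard model is not reproduced here:

```python
import numpy as np

# Minimal sketch of generating labeled training data from a predefined
# scorecard model. Feature count, weights and threshold are hypothetical.
rng = np.random.default_rng(seed=42)

N_SAMPLES = 100_000
TRUE_WEIGHTS = np.array([0.8, 0.0, 1.5, -0.7, 0.3])  # 0.0 -> deliberately irrelevant feature
THRESHOLD = 3.0                                       # chosen so roughly 5% are labeled erroneous

X = rng.normal(size=(N_SAMPLES, TRUE_WEIGHTS.size))   # artificial input features
scores = X @ TRUE_WEIGHTS                             # scorecard: weighted sum of features
y = (scores > THRESHOLD).astype(int)                  # 1 = erroneous/risky transaction

print("error rate: {:.2%}".format(y.mean()))
```

Because the labels are derived from known weightings, the weights learned by the network can later be compared against them.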
The outcome – highly accurate predictions with neural networks, offering a huge potential for cost reduction
Applying a neural network to data generated as described above shows that errors and/or risks can be detected very accurately in the given example. With a training data set of 100,000 examples and an error rate of about 5%, a prediction accuracy of close to 100 percent can be reached.
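To illustrate the training and evaluation step on the artificial data from the sketch above: the thesis trains the network with SAP HANA PAL, but the same loop can be reproduced locally, here with scikit-learn as a stand-in; the network topology and hyperparameters below are guesses, not the values used in the thesis.

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Reuses X and y from the data-generation sketch above.
# scikit-learn serves as a local stand-in for HANA PAL here;
# topology and hyperparameters are illustrative guesses.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = make_pipeline(
    StandardScaler(),                                # z-transform, see key findings below
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=0))
model.fit(X_train, y_train)

print("test accuracy: {:.4f}".format(model.score(X_test, y_test)))
```

On this cleanly separable synthetic data the accuracy should indeed end up very close to 1.0; real, noisy transaction data will behave less favorably.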
Furthermore, several tests and experiments conducted within the scope of this project provide helpful hints for possible future applications that might be based on productive data.
The key findings are:
- The more data is available for training the network, the better the performance and thus the prediction accuracy of the network will be
- The higher the percentage of erroneous/risky transactions within the data, the fewer training examples are required to achieve a satisfactory result
- The network can precisely indicate irrelevant input features that have no effect on the error classification by learning a weight value close to 0 for the respective connections
- Normalizing the data before feeding it into the network results in a faster training time as well as much more accurate predictions
- With HANA PAL, data stored on regular HANA instances can easily be used for applying appropriate machine learning algorithms in the same place, so that the data does not need to be moved (see the sketch after this list). However, the library comes with some restrictions that might cause problems when working on more complex scenarios
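Training in place typically happens either through PAL's SQL interface or, for example, through the hana_ml Python client. The following is a minimal sketch assuming the hana_ml package and a HANA table with an ID column, feature columns and a LABEL column; host, credentials, table and column names are placeholders, and the parameter values should be checked against the PAL documentation of the HANA version in use.

```python
from hana_ml.dataframe import ConnectionContext
from hana_ml.algorithms.pal.neural_network import MLPClassifier

# Minimal sketch, assuming the hana_ml Python client. Host, credentials,
# table and column names are placeholders; verify parameter values against
# the PAL documentation of the HANA version in use.
conn = ConnectionContext(address="<hana-host>", port=30015,
                         user="<user>", password="<password>")

# Only a reference to the table; the data itself stays in HANA.
df = conn.table("TRANSACTION_TRAINING_DATA")

mlp = MLPClassifier(hidden_layer_size=[10],
                    activation="tanh",
                    output_activation="tanh",
                    training_style="batch",
                    max_iter=500,
                    normalization="z-transform")  # normalization is done by PAL

mlp.fit(df, key="ID", label="LABEL")              # training runs inside HANA
print(mlp.score(df, key="ID", label="LABEL"))     # so does the accuracy calculation
```

Scoring on the training table is only done here to keep the sketch short; in practice a separate test table would be used.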
Because the result and the accuracy of the network depend strongly on the training data, it is very important to preprocess the data before feeding it into the network and to carefully choose relevant input features.
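Both aspects can be checked on the artificial data from the sketches above. The scikit-learn stand-in already normalizes the inputs with a z-transform (StandardScaler); in addition, a crude relevance check is to look at the magnitude of the learned first-layer weights per input feature, which ties in with the key finding about near-zero weights for irrelevant features. A hypothetical inspection on the stand-in model:

```python
import numpy as np

# Continues the scikit-learn stand-in from the training sketch above.
# Mean absolute first-layer weight per input feature as a crude relevance check.
first_layer = model.named_steps["mlpclassifier"].coefs_[0]  # shape: (n_features, n_hidden)
relevance = np.abs(first_layer).mean(axis=1)

for i, value in enumerate(relevance):
    print("feature {}: mean |weight| = {:.3f}".format(i, value))
# Feature 1 was generated with a scorecard weight of 0.0, so its learned
# weights are expected to stay comparatively small.
```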
All in all, the findings provide important hints and insights for future applications. In particular, the demonstrated influence of the amount and structure of the training data and features needs to be considered when developing such applications.
Apart from the use case described above, further possible use cases include credit scoring, error classification and fraud detection:
In credit scoring, neural networks are used to give recommendations and to decide whether a customer should be granted credit based on their past behavior and credit history.
Furthermore, neural networks can be used to classify errors that occurred in a particular system. By analyzing error messages, the corresponding data and contributing influences, errors can be classified in order to help speed up the search for root causes and the correction of errors.
Finally, neural networks are often used to detect fraud based on transaction and customer characteristics – a popular use case is the detection of credit card fraud.
Overall, the thesis has shown that the application of neural networks for supporting and/or optimizing processes can be very beneficial, provided that enough data of high quality is available.