NextLytics Blog

In-Database Machine Learning with SAP HANA Predictive Analysis Library

Written by Sebastian | Jul 13, 2020 8:31:24 AM

There is no question that data is the new currency of the digital age. However, very few companies exploit the full potential of their data, for example to uncover previously unknown correlations or to predict trends. The SAP HANA Predictive Analysis Library (PAL) is designed to make this step easier in the SAP ecosystem. As part of the Application Function Library (AFL), PAL contains predefined algorithms for predictive analysis and data mining.

These can be used immediately and are executed within SQLScript procedures, so no additional hardware or software is required. In addition, you also benefit from high speed, since the algorithms are executed directly within SAP HANA at database level. In this article we will take a look at the functions of PAL.

 

Machine learning in the SAP ecosystem without the PAL

In machine learning projects, the data is usually transferred from the HANA database to an external application server. There, a machine learning model is used to generate forecasts based on the transferred data. The results are then transferred back to HANA and can be integrated into the existing reporting.

This procedure has several disadvantages. Investments in additional hardware and maintenance have to be made, and additional license fees are often incurred. In addition, very large amounts of data have to be moved back and forth between the SAP HANA database and the machine learning server, which takes up a lot of time and resources. The Predictive Analysis Library addresses precisely these issues.

 

Application areas of the PAL

The Predictive Analysis Library provides functions for predictive analytics and data mining that can be called and executed as SQLScript procedures directly within HANA. This removes the need for additional servers or applications and for data transfer to and from the HANA database. Algorithms are delivered for the following use cases:

  • Preprocessing
  • Clustering
  • Classification
  • Regression
  • Time Series
  • Association
  • Statistics
  • Social Network Analysis
  • Recommender System
  • Miscellaneous (e.g. ABC Customer Analysis)
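To make the idea of algorithms "called and executed as SQLScript procedures" more concrete, the following Python sketch only assembles the SQL statements for a k-means run; it does not connect to a database. It assumes the direct-call convention of HANA 2.0, in which PAL procedures live in the `_SYS_AFL` schema and read their settings from a four-column parameter table. The table names `CUSTOMER_FEATURES` and `#PAL_PARAMS`, the parameter values and the number of output placeholders are illustrative assumptions and should be checked against the PAL reference for your HANA release.

```python
from typing import Dict, List, Union

ParamValue = Union[int, float, str]


def param_row(name: str, value: ParamValue) -> str:
    """Render one INSERT for the parameter table (name, int, double, string)."""
    if isinstance(value, bool):
        raise TypeError("use 0/1 instead of bool for PAL flags")
    if isinstance(value, int):
        cols = f"{value}, NULL, NULL"
    elif isinstance(value, float):
        cols = f"NULL, {value}, NULL"
    else:
        escaped = value.replace("'", "''")  # escape single quotes in strings
        cols = f"NULL, NULL, '{escaped}'"
    return f"INSERT INTO #PAL_PARAMS VALUES ('{name}', {cols});"


def build_pal_call(procedure: str, data_table: str,
                   params: Dict[str, ParamValue], n_outputs: int) -> List[str]:
    """Return the SQL statements for one PAL call (assumed HANA 2.0 convention)."""
    statements = [
        "CREATE LOCAL TEMPORARY COLUMN TABLE #PAL_PARAMS ("
        "PARAM_NAME NVARCHAR(256), INT_VALUE INTEGER, "
        "DOUBLE_VALUE DOUBLE, STRING_VALUE NVARCHAR(1000));"
    ]
    statements += [param_row(name, value) for name, value in params.items()]
    outputs = ", ".join("?" for _ in range(n_outputs))  # one "?" per result table
    statements.append(
        f"CALL _SYS_AFL.{procedure}({data_table}, #PAL_PARAMS, {outputs});"
    )
    return statements


# Hypothetical example: cluster the rows of CUSTOMER_FEATURES into 3 groups.
for stmt in build_pal_call("PAL_KMEANS", "CUSTOMER_FEATURES",
                           {"GROUP_NUMBER": 3, "MAX_ITERATION": 100}, n_outputs=5):
    print(stmt)
```

The generated statements could then be run in any SQL console connected to HANA; the data never leaves the database, only the procedure call is sent.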

PAL also offers several incremental machine learning algorithms that learn and update a model "on the fly", so that predictions are based on a dynamic model. This approach is particularly useful for streaming analytics. PAL also covers more complex algorithms such as the multilayer perceptron (MLP), a neural network that can be configured with multiple hidden layers (deep learning).
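The idea of a model that is updated "on the fly" can be illustrated without PAL. The sketch below is plain Python and not a PAL algorithm: a simple linear regression that keeps only running sums, so each new record updates the model without re-reading earlier data. The class name and the sample values are made up for illustration.

```python
class IncrementalLinearRegression:
    """Simple linear regression y = slope * x + intercept, updated per record."""

    def __init__(self) -> None:
        self.n = 0
        self.sum_x = 0.0
        self.sum_y = 0.0
        self.sum_xy = 0.0
        self.sum_xx = 0.0

    def update(self, x: float, y: float) -> None:
        """Fold one new observation into the running sums."""
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xy += x * y
        self.sum_xx += x * x

    def predict(self, x: float) -> float:
        """Predict y from the least-squares line through all records seen so far."""
        if self.n < 2:
            raise ValueError("need at least two observations")
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if denom == 0:
            raise ValueError("x values must not all be identical")
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope * x + intercept


model = IncrementalLinearRegression()
for x, y in [(1, 3), (2, 5), (3, 7)]:  # records arriving one at a time
    model.update(x, y)
forecast = model.predict(10)
```

Because only five numbers are stored, the model can absorb a stream of any length; this is the property that makes incremental algorithms attractive for streaming scenarios.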

 

Technical requirements

To use the PAL procedures, the following requirements must be met. First, you need the SAP HANA Platform with the version that corresponds to the PAL version. Second, you must install the Application Function Library (AFL), which contains the Predictive Analysis Library (PAL). Finally, you need to activate the Script Server on the SAP HANA instance. For more information, see SAP Notes 1898497 and 1650957.
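These prerequisites can be checked from any SQL client. The sketch below assumes a Python DB-API cursor (for example from SAP's `hdbcli` driver) and queries the system views `SYS.AFL_AREAS` and `SYS.M_SERVICES`. The function name and the keys of the returned dictionary are hypothetical, and the view and column names should be verified against your HANA release.

```python
from typing import Any, Dict

# Assumed system views: AFL_AREAS lists installed AFL areas,
# M_SERVICES lists the services of the HANA instance.
CHECKS = {
    "pal_installed":
        "SELECT COUNT(*) FROM SYS.AFL_AREAS WHERE AREA_NAME = 'AFLPAL'",
    "script_server_running":
        "SELECT COUNT(*) FROM SYS.M_SERVICES "
        "WHERE SERVICE_NAME = 'scriptserver' AND ACTIVE_STATUS = 'YES'",
}


def check_pal_prerequisites(cursor: Any) -> Dict[str, bool]:
    """Run each check and report True if at least one matching row exists."""
    results = {}
    for name, sql in CHECKS.items():
        cursor.execute(sql)
        row = cursor.fetchone()
        results[name] = bool(row and row[0] > 0)
    return results
```

A result such as `{"pal_installed": True, "script_server_running": False}` would point to the Script Server still needing to be activated.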

Advantages of the PAL

Everything in one place

The main advantage of PAL is that the data can be processed directly at the source where it is created or kept. You can run the algorithms directly in S/4HANA or BW/4HANA or BW on HANA. This significantly reduces the number of complex ETL processes. Huge amounts of data no longer have to be exported to third-party analytics tools via interfaces or manual exports.

HANA Power

Another great advantage of the Predictive Analysis Library is the enormous computing power available through SAP HANA. HANA servers are usually equipped with several terabytes of main memory and a large number of CPUs. Furthermore, the algorithms delivered in the PAL are optimized for use in HANA to achieve short computing times even with resource-intensive algorithms.

HANA native algorithms

The HANA native algorithms are another advantage of the PAL. For example, if you want to use algorithms for forecasting or to identify anomalies in your business processes, as an SAP HANA user you do not need to purchase expensive machine learning software. You do not have to worry about database interfaces either, because the processing of the data and the output of the results are carried out directly on the SAP HANA database.

No additional license fees

When using the PAL, you do not incur any additional license fees, unlike with the SAP Predictive Analytics product. However, greater expertise is required with regard to the analytical functions, since the tool as a whole is aimed more at data scientists than at business users.

Easy modeling and access via Python

The training of the models and the execution of forecasts can be initiated either in HANA Studio via HANA SQLScript procedures or via the SAP HANA Application Function Modeler Web IDE in the browser. In contrast to HANA 1.0, where the models had to be written using wrapper procedures, HANA 2.0 offers the possibility to use flow graphs in the web-based modeler.

In a flow graph, individual functions are represented as graphical objects and can be linked to each other. The result is a flowchart that shows the flow of data through the individual process steps and algorithms. The functions can be inserted from a repository using drag & drop and parameterized using input fields. The procedures and tables required to process the algorithm are automatically created in the background. Thus, modeling is even possible without any code at all.

Advanced users can also run algorithms within the PAL using an official Python module (hana-ml). The advantage here is that Python has a significantly higher level of acceptance among data scientists than the SAP machine learning tools. Therefore, modeling can be performed much closer to the industry standard.
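As a rough sketch of what this looks like, the function below uses `hana-ml` to run PAL k-means on a HANA table and fetch the cluster labels. It needs a reachable HANA system with PAL and the `hana-ml` package installed, which is why the imports sit inside the function. The table name `CUSTOMER_FEATURES`, the key column `CUSTOMER_ID` and the connection details are placeholders, and parameter names may differ between `hana-ml` versions.

```python
def cluster_customers(address, port, user, password,
                      table="CUSTOMER_FEATURES", key="CUSTOMER_ID",
                      n_clusters=3):
    """Cluster the rows of a HANA table with PAL k-means via hana-ml."""
    # Imported lazily so this module loads even where hana-ml is not installed.
    from hana_ml import dataframe
    from hana_ml.algorithms.pal.clustering import KMeans

    conn = dataframe.ConnectionContext(address=address, port=port,
                                       user=user, password=password)
    try:
        data = conn.table(table)         # lazy reference, no rows are fetched
        kmeans = KMeans(n_clusters=n_clusters)
        kmeans.fit(data, key=key)        # the training itself runs inside HANA
        return kmeans.labels_.collect()  # only the labels come back, as pandas
    finally:
        conn.close()
```

The design point is that the HANA DataFrame is only a handle to the table: the feature data stays in the database, and Python receives just the result set it explicitly collects.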

Integration in SAP BW

The results obtained using a PAL algorithm can be exported to a HANA table. This allows you to access the results with all common reporting tools (e.g. SAP Lumira or SAP Analytics Cloud) and present them graphically to the business department in a prepared report. Since the entire process can be automated end-to-end, the report can always be supplied with the latest data, either at the push of a button or through a scheduled process that runs every night.

Disadvantages of the PAL

However, in addition to the numerous advantages of PAL, there are also some disadvantages that should be considered.

Limitation of the algorithms

PAL already provides a multitude of classical algorithms for advanced analytics. However, it should be noted that only the predefined algorithms can be selected. The use of customer-defined algorithms is not possible within the PAL.

Cumbersome use and lack of acceptance by data scientists

Data scientists in particular usually work with the programming languages Python or R and with open source frameworks such as TensorFlow or PyTorch for machine learning and analytics solutions. Using the PAL via SQLScript or the Web IDE may seem attractive to SAP BW users, but it is relatively unusual in an industry comparison. The Python module hana-ml for controlling the PAL is a step in the right direction, but it currently has a limited range of functions and certain restrictions. In practice, these factors often lead to a low acceptance of PAL by data scientists.

Lack of GPU support

Many modern and popular machine learning algorithms, especially in the area of deep learning, rely on the use of GPUs instead of CPUs. In this way, performance increases by a factor of over 10 can easily be achieved compared to the use of CPUs. Unfortunately, PAL does not offer the possibility of using GPUs and thus loses part of the performance advantage in many use cases. 

Complicated access to external data sources

Nowadays, much of the data for analysis comes from external sources, such as non-SAP databases, APIs and social media. With PAL, only HANA tables can be used as data sources. Although it is possible to connect external data sources to HANA or to import them via ETL processes, this detour is more complex than in many other advanced analytics tools, where external data sources can often be integrated directly for analysis.
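For small data sets, the import detour can be as modest as a bulk insert over a database driver. The sketch below reads CSV text and writes the rows into a table through a generic Python DB-API connection; an in-memory SQLite database stands in for HANA so that the example runs anywhere. With HANA you would pass a connection from a driver such as `hdbcli` instead, assuming it accepts `?` placeholders. The table `EXTERNAL_SALES` and its columns are hypothetical.

```python
import csv
import io
import sqlite3


def load_csv_rows(conn, table: str, csv_text: str) -> int:
    """Insert all CSV rows into `table`; the first CSV line names the columns.

    Table and column names are interpolated into the SQL text, so they must
    come from a trusted source. The values themselves are bound as parameters.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [tuple(row) for row in reader if row]  # skip empty lines
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    cursor = conn.cursor()
    cursor.executemany(
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)


# SQLite is only a stand-in for the target database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EXTERNAL_SALES (REGION TEXT, REVENUE REAL)")
loaded = load_csv_rows(
    conn, "EXTERNAL_SALES", "REGION,REVENUE\nNorth,120.5\nSouth,98.0\n"
)
```

Real ETL processes add scheduling, type mapping and error handling on top of this, which is where the additional effort mentioned above comes from.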

 

Possible PAL alternatives

If you want to use the HANA in-database analytics options, but do not want to worry about a selection of algorithms or parameter tuning, you can use the SAP HANA Automated Predictive Library (APL), which has a very similar workflow to PAL. The difference here is that APL focuses on automated analytics: algorithms are automatically selected and fine-tuned to answer simple business questions.

Another option is SAP Predictive Analytics (PA), which is aimed more at business users than at data scientists and takes a more GUI-based modeling approach. However, additional license costs are incurred and the advantage of in-database analytics is lost, since the processing runs on the application server. We cover SAP Predictive Analytics in more detail in a separate article.

Furthermore, the SAP Data Intelligence solution can be named as an alternative to PAL. However, since this is a very new solution from SAP, there has so far been little experience and limited documentation. Nevertheless, it is a technologically strong solution that also has a variety of functions in the area of data management.

Finally, an R integration and the SAP HANA External Machine Learning Library (EML) are available for data science users in the SAP HANA ecosystem. The R integration allows code written in the popular open source programming language R to be embedded in SQLScript and executed on an external server. Similarly, the SAP HANA External Machine Learning Library allows you to access pre-trained, externally hosted machine learning models from the TensorFlow framework and to generate forecasts using SQLScript (we cover EML in a separate article).

If none of the SAP solutions for machine learning and advanced analytics are suitable for you, there is also the option of providing a database interface for your data scientists and using an additional, individually configurable open source stack. In addition to a high-performance HANA database interface, our NextLytics Python Software Development Kit (NLY-SDK) also offers you many useful additional functionalities with which you can provide the optimal conditions and maximum flexibility for your machine learning and advanced analytics developments.

 

Our Summary - SAP HANA Predictive Analysis Library

As part of the SAP Application Function Library (AFL), the SAP Predictive Analysis Library contains SQLScript procedures for analytical algorithms such as clustering, classification and regression. Using the PAL, the algorithms can be executed directly in HANA. This integration into the SAP BW landscape is an enormous advantage. There is no need to transfer data from the HANA database to third-party machine learning or advanced analytics tools and back. Thanks to HANA's powerful hardware, the execution time is significantly reduced. 

However, there is only a limited number of algorithms that can be used. Furthermore, the tool cannot directly access external data sources. Although the data can be loaded redundantly into HANA, this is often not desired due to the widespread silo thinking. Furthermore, most data scientists and business users have probably never worked with SQLScript and the SAP-specific tools. This could reduce user acceptance. 

Whether the SAP HANA Predictive Analysis Library is the right solution for you depends on many factors. To answer this question, we recommend our white paper "SAP BW and State of the Art Machine Learning". In this white paper, in addition to the entire SAP machine learning portfolio, we also examine an open source-supported approach based on the NextLytics Python Software Development Kit (NLY-SDK) and give clear recommendations on how you can get the most out of your data.