SAP HANA & Python: State of the Art Data Transfer for Machine Learning
Growing data volumes and rapid changes in everyday business require analyses that keep pace with the data. The Python programming language, with its flexible and extensive data science libraries, and the high-performance in-memory database management system SAP HANA are an unbeatable combination for fast data analysis. The traditional database connection, however, holds this combination back. Data in HANA can be accessed in real time, but the HANA database client (hdbcli) driver transfers it only gradually. Uploads are further complicated because, unlike in NoSQL databases, the relational schema must be defined in advance, and connection credentials are often stored in plain text in the code.
Developing the database connection between SAP HANA and Python further, with state-of-the-art parallelization and user-friendly standardization of data download and upload, makes the development process faster and simpler.
Performance, security and usability for the connection of SAP HANA with Python
The NLY SAP HANA Python Connector is a Python module that establishes a high-performance, secure connection to the SAP HANA database via the Open Database Connectivity (ODBC) interface. PyArrow, Turbodbc and the secure user store (hdbuserstore) provide an up-to-date data connection, particularly for data engineering and data science:
Perfectly suited for advanced analyses
Performance through parallelization
Through parallelization, data download with the HANA connector reaches up to 9 times the speed of a connection with the standard SAP driver. Uploads can reach up to 10 times the speed.
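The general idea behind a parallel download can be sketched as follows. This is only an illustration of the technique, not the connector's actual implementation: the function names are hypothetical, and `fetch_chunk` stands in for a callable that would open its own database connection and select one slice of a numeric key range.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, parts):
    """Split the inclusive range [lo, hi] into at most `parts` contiguous chunks."""
    if hi < lo:
        return []
    total = hi - lo + 1
    size = -(-total // parts)  # ceiling division
    return [(start, min(start + size - 1, hi)) for start in range(lo, hi + 1, size)]


def parallel_download(fetch_chunk, lo, hi, workers=4):
    """Fetch all chunks concurrently and return the rows in range order.

    `fetch_chunk(start, end)` is a hypothetical callable, for example one that
    runs SELECT ... WHERE ID BETWEEN ? AND ? on its own connection.
    """
    chunks = split_range(lo, hi, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the order of `chunks`, so the result stays sorted by range
        results = pool.map(lambda bounds: fetch_chunk(*bounds), chunks)
        return [row for rows in results for row in rows]
```

Threads suit this pattern because each worker spends most of its time waiting on the database and the network rather than computing.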
Secure authentication
The connector ideally authenticates against the HANA database with the token (key) held in the HDBUSERSTORE, which stores the HANA connection data securely on the client side. Logging in with username and password is possible as an alternative, but not necessary.
Full compatibility with Pandas DataFrames
Pandas DataFrames are widely used for data manipulation in artificial intelligence work. The HANA connector loads data directly into a DataFrame. During an upload, it maps the DataFrame's data types to the corresponding HANA data types and automatically creates tables that do not yet exist.
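How such a type mapping and automatic table creation might work can be sketched in a few lines. The mapping table, the NVARCHAR fallback and the function names below are assumptions made for illustration; the connector's real mapping is not documented here.

```python
# Assumed mapping from pandas dtype names to SAP HANA column types.
HANA_TYPES = {
    "int32": "INTEGER",
    "int64": "BIGINT",
    "float64": "DOUBLE",
    "bool": "BOOLEAN",
    "datetime64[ns]": "TIMESTAMP",
    "object": "NVARCHAR(5000)",
}


def quote(identifier):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def create_table_sql(schema, table, dtypes):
    """Build a CREATE COLUMN TABLE statement from a {column: dtype name} dict."""
    columns = []
    for name, dtype in dtypes.items():
        # Unknown dtypes fall back to a wide text column (an assumption).
        hana_type = HANA_TYPES.get(dtype, "NVARCHAR(5000)")
        columns.append(f"{quote(name)} {hana_type}")
    return f"CREATE COLUMN TABLE {quote(schema)}.{quote(table)} ({', '.join(columns)})"
```

For a DataFrame `df`, the `dtypes` argument can be obtained with `df.dtypes.astype(str).to_dict()`.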
Practical functions for the development process
In terms of its functions, the connector is closely aligned with the everyday work of developers. Frequently used SQL statements, such as listing all tables of a schema, listing all columns of a table, and checking whether a table exists, are integrated into the cursor as easily accessible functions.
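The kind of statements such helpers wrap can be sketched against the SAP HANA system views SYS.TABLES and SYS.TABLE_COLUMNS. The function names are hypothetical and are not the connector's API; `cursor` is assumed to be any DB-API cursor that uses `?` placeholders.

```python
def list_tables(cursor, schema):
    """Return the names of all tables in a schema."""
    cursor.execute(
        "SELECT TABLE_NAME FROM SYS.TABLES WHERE SCHEMA_NAME = ? ORDER BY TABLE_NAME",
        [schema],
    )
    return [row[0] for row in cursor.fetchall()]


def list_columns(cursor, schema, table):
    """Return (column name, data type) pairs of a table in column order."""
    cursor.execute(
        "SELECT COLUMN_NAME, DATA_TYPE_NAME FROM SYS.TABLE_COLUMNS "
        "WHERE SCHEMA_NAME = ? AND TABLE_NAME = ? ORDER BY POSITION",
        [schema, table],
    )
    return [(row[0], row[1]) for row in cursor.fetchall()]


def table_exists(cursor, schema, table):
    """Return True if the table exists in the given schema."""
    cursor.execute(
        "SELECT COUNT(*) FROM SYS.TABLES WHERE SCHEMA_NAME = ? AND TABLE_NAME = ?",
        [schema, table],
    )
    return cursor.fetchone()[0] > 0
```

Passing schema and table names as bound parameters rather than formatting them into the SQL string avoids quoting problems and SQL injection.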
Performance
The download speed increases by up to a factor of 9.
In addition to automatic data type conversion, parallelization makes uploads up to 10 times faster.
Areas of application
Artificial Intelligence
In the field of artificial intelligence, the amount of analysis data required is often immense. A fast data download not only helps in the data exploration phase, but also in model generation as well as future prediction. Especially in the prototyping phase, the time gained can be used productively.
Data Engineering
Flexible data manipulation is quickly implemented in Python with the HANA Connector and runs with a speed advantage. Production systems additionally benefit from the failover support provided by the HDBUSERSTORE. Efficient storage formats greatly reduce the amount of memory required for data retrieval.
Installation
The NextLytics Python SAP HANA connector is installed as a Python module. The documentation also covers setup of the hdbuserstore, and a CLI tool simplifies setup and connection management.
Do you have questions or are you interested in the connector?
Please do not hesitate to contact us.
NextLytics is always at your side as an experienced project partner. We help you to effectively solve your data problems from data integration to the use of machine learning models. Use the form below to ask your question and we will get back to you as soon as possible.
We look forward to hearing from you!