Performance is one of the main arguments for using SQLScript. With the introduction of the HANA database, SAP has introduced the "code pushdown" concept. With this new paradigm, time-consuming calculations with high data volumes are delegated to the database level. Using SQLScript, you have native access to the HANA database and can avoid unnecessary data transfers between the database and the application server.
However, SQL code can also be written in a way that makes programs run very slowly. In this article, we therefore present proven tips for achieving high SQLScript performance.
High performance is more achievable when working with a small amount of data. Therefore, limit the data volume as early as possible and select only the data that you really need. A few easy-to-follow rules will help you:
Get into the habit of always explicitly selecting the columns of a table. So instead of just selecting everything, as in the following statement ...
SELECT * FROM Customers;
... you should include the required columns in the statement:
SELECT
    CustomerName,
    City
FROM
    Customers;
In addition to explicitly selecting columns, you should include a WHERE clause in your SELECT statements wherever possible. The WHERE clause filters the data so that only the records that meet a certain condition are read. This pays off because the calculations often need only a subset of the existing data.
SELECT
    CustomerName,
    City
FROM
    Customers
WHERE
    Country = 'Germany';
Please note that the HAVING clause does not have the same effect. While the WHERE clause reduces the amount of data during selection, the HAVING clause filters data that has already been grouped and aggregated. The HAVING clause is therefore not a substitute for WHERE when it comes to improving performance.
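The difference between the two clauses can be sketched with the Customers table from the examples above. The grouping by City and the threshold of 10 are assumptions chosen purely for illustration.

```sql
-- Filter on a non-aggregated column: use WHERE so that the rows are
-- discarded before the grouping takes place.
SELECT
    City,
    COUNT(*) AS CustomerCount
FROM
    Customers
WHERE
    Country = 'Germany'
GROUP BY
    City;

-- Filter on an aggregated value: HAVING is needed here, because the
-- count only exists after the groups have been formed.
-- The WHERE clause still reduces the data volume beforehand.
SELECT
    City,
    COUNT(*) AS CustomerCount
FROM
    Customers
WHERE
    Country = 'Germany'
GROUP BY
    City
HAVING
    COUNT(*) > 10;
```

In other words, conditions on plain columns belong in WHERE, and HAVING should be reserved for conditions on aggregates.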
If you have worked with ABAP before, you are probably used to not having to think about the client, because Open SQL selects it implicitly. With SQLScript, on the other hand, you must always specify the required client explicitly.
By filtering out the data of other clients, you increase performance. This is especially true for joins. If the client is not part of the join condition, records from all clients are combined with one another, which inflates the result unnecessarily.
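A client-aware join might look like the following sketch. The tables VBAK and VBAP with their client column MANDT serve only as an example, and the sketch assumes that the session variable CLIENT has been set by the calling ABAP system. If that is not the case in your scenario, pass the client in as a parameter instead.

```sql
-- Assumption: VBAK (header) and VBAP (item) are example tables;
-- MANDT is the client column in both.
SELECT
    head.VBELN,
    item.POSNR,
    item.MATNR
FROM
    VBAK AS head
INNER JOIN
    VBAP AS item
    ON  item.MANDT = head.MANDT   -- client is part of the join condition
    AND item.VBELN = head.VBELN
WHERE
    -- Assumption: the CLIENT session variable is set for this connection.
    head.MANDT = SESSION_CONTEXT('CLIENT');
```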
When processing SQLScript queries, SAP HANA uses different engines. Basically, a distinction can be made between the Row Engine and the Column Engine. These are specialized for different tasks. To achieve the highest possible performance, you should avoid switching between the engines in your code.
For example, row-by-row processing, as in the case of a window function, takes place in the Row Engine. In contrast, joins, aggregations and calculations are performed in the Column Engine. If a query has to switch between the engines during execution, the intermediate results must be materialized. This comes at the expense of performance.
Just like switching between Row and Column Engine, you should avoid switching between declarative and imperative statements. The reason is that only declarative statements can be optimized by SAP HANA. Therefore, you should use them preferentially and avoid imperative statements.
What is the difference between declarative and imperative statements? With declarative statements, you tell the system what data you would like to have. The system independently finds the best way to provide you with the data. With imperative statements, on the other hand, you explicitly tell the system how it should deliver the data. However, this "micromanagement" also limits the system's potential to optimize execution.
Declarative constructs include SELECT statements with their JOIN, WHERE, WITH, GROUP BY, HAVING and ORDER BY clauses, as well as predefined functions such as MAP. Imperative constructs include FOR and WHILE loops as well as IF/ELSE branches.
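The contrast can be sketched with a simple counting task on the Customers table used earlier. The variable and cursor names are hypothetical, and the exact syntax may need adjusting for your HANA revision.

```sql
-- Imperative: the code spells out HOW to count, row by row.
DO BEGIN
    DECLARE lv_count INTEGER = 0;
    DECLARE CURSOR c_customers FOR
        SELECT Country FROM Customers;

    FOR cur_row AS c_customers DO
        IF cur_row.Country = 'Germany' THEN
            lv_count = :lv_count + 1;
        END IF;
    END FOR;

    SELECT :lv_count AS GermanCustomers FROM DUMMY;
END;

-- Declarative: the statement only says WHAT is needed and leaves the
-- execution strategy to the optimizer.
SELECT
    COUNT(*) AS GermanCustomers
FROM
    Customers
WHERE
    Country = 'Germany';
```

Both variants return the same number, but only the second one gives SAP HANA the freedom to choose the execution plan.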
Loops are generally bad for performance, and the effect becomes even worse when you execute Data Manipulation Language (DML) statements inside them. This includes any statement that reads or modifies data in tables, for example SELECT, INSERT, UPDATE or DELETE.
Instead of inserting or deleting data within the loops, try to bundle the adjustments together and then execute them in a single statement whenever possible.
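As a minimal sketch of this bundling, the following example copies the German customers into a second table. GermanCustomers is a hypothetical target table, and the column lengths are assumptions.

```sql
-- Assumption: GermanCustomers is a hypothetical target table with the
-- columns CustomerName and City.

-- Pattern to avoid: one INSERT per loop iteration.
DO BEGIN
    DECLARE lv_name NVARCHAR(100);
    DECLARE lv_city NVARCHAR(100);
    DECLARE CURSOR c_customers FOR
        SELECT CustomerName, City
        FROM Customers
        WHERE Country = 'Germany';

    FOR cur_row AS c_customers DO
        lv_name = cur_row.CustomerName;
        lv_city = cur_row.City;
        INSERT INTO GermanCustomers (CustomerName, City)
        VALUES (:lv_name, :lv_city);
    END FOR;
END;

-- Bundled alternative: a single set-based statement does the same work.
INSERT INTO GermanCustomers (CustomerName, City)
SELECT
    CustomerName,
    City
FROM
    Customers
WHERE
    Country = 'Germany';
```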
In contrast to a table function, a scalar function returns a scalar (i.e. a single) value. In general, scalar functions are slower than table functions. The difference becomes especially apparent with large amounts of data, because the function is evaluated separately for each record. Before going live, you should therefore run a performance test with a sufficiently large amount of data.
In addition, you should not execute SELECT queries in scalar functions. Because the function is called very often, each contained query is executed just as often, which is detrimental to performance. Please note that this also includes queries on the DUMMY table.
If your scalar function always returns the same result for the same input parameters, you can use the DETERMINISTIC keyword to turn on buffering of the results. The keyword is placed in the function header, after the definition of the parameters and the return value. Buffering is particularly useful for functions with binary results. If you expect many different results, DETERMINISTIC should not be used, because the caching overhead may even make performance worse.
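A deterministic scalar function with a binary result could look like the following sketch. The function name, the parameter and the logic are hypothetical and serve only to show where the keyword goes.

```sql
-- Hypothetical function, for illustration only.
CREATE FUNCTION is_domestic (IN iv_country NVARCHAR(40))
RETURNS rv_flag INTEGER
DETERMINISTIC
AS
BEGIN
    -- Pure calculation on the input: no SELECT, not even on DUMMY.
    IF :iv_country = 'Germany' THEN
        rv_flag = 1;
    ELSE
        rv_flag = 0;
    END IF;
END;

-- Usage: the function is evaluated for each record of Customers.
SELECT
    CustomerName,
    is_domestic(Country) AS IsDomestic
FROM
    Customers;
```

Because the Country column typically contains only a few distinct values, buffered results can be reused for many records in a case like this.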
Now you know the typical pitfalls and how to avoid them. Follow our guidelines and get the most out of your SQLScript programs!
Do you have questions about SQLScript? Or do you want to convert your transformation routines to SQLScript and are looking for experienced developers with SQLScript know-how? Please do not hesitate to contact us.