Skip to content
NextLytics
Megamenü_2023_Über-uns

Shaping Business Intelligence

Whether clever add-on products for SAP BI, development of meaningful dashboards or implementation of AI-based applications - we shape the future of Business Intelligence together with you. 

Megamenü_2023_Über-uns_1

About us

As a partner with deep process know-how, knowledge of the latest SAP technologies as well as high social competence and many years of project experience, we shape the future of Business Intelligence in your company too.

Megamenü_2023_Methodik

Our Methodology

The mixture of classic waterfall model and agile methodology guarantees our projects a high level of efficiency and satisfaction on both sides. Learn more about our project approach.

Products
Megamenü_2023_NextTables

NextTables

Edit data in SAP BW out of the box: NextTables makes editing tables easier, faster and more intuitive, whether you use SAP BW on HANA, SAP S/4HANA or SAP BW 4/HANA.

Megamenü_2023_Connector

NextLytics Connectors

The increasing automation of processes requires the connectivity of IT systems. NextLytics Connectors allow you to connect your SAP ecosystem with various open-source technologies.

IT-Services
Megamenü_2023_Data-Science

Data Science & Engineering

Ready for the future? As a strong partner, we will support you in the design, implementation and optimization of your AI application.

Megamenü_2023_Planning

SAP Planning

We design new planning applications using SAP BPC Embedded, IP or SAC Planning which create added value for your company.

Megamenü_2023_Dashboarding

Dashboarding

We help you with our expertise to create meaningful dashboards based on Tableau, Power BI, SAP Analytics Cloud or SAP Lumira. 

Megamenü_2023_Data-Warehouse-1

SAP Data Warehouse

Are you planning a migration to SAP HANA? We show you the challenges and which advantages a migration provides.

Business Analytics
Megamenü_2023_Procurement

Procurement Analytics

Transparent and valid figures are important, especially in companies with a decentralized structure. SAP Procurement Analytics allows you to evaluate SAP ERP data in SAP BI.

Megamenü_2023_Reporting

SAP HR Reporting & Analytics

With our standard model for reporting from SAP HCM with SAP BW, you accelerate business activities and make data from various systems available centrally and validly.

Megamenü_2023_Dataquality

Data Quality Management

In times of Big Data and IoT, maintaining high data quality is of the utmost importance. With our Data Quality Management (DQM) solution, you always keep the overview.

Career
Megamenü_2023_Karriere-2b

Working at NextLytics

If you would like to work with pleasure and don't want to miss out on your professional and personal development, we are the right choice for you!

Megamenü_2023_Karriere-1

Senior

Time for a change? Take your next professional step and work with us to shape innovation and growth in an exciting business environment!

Megamenü_2023_Karriere-5

Junior

Enough of grey theory - time to get to know the colourful reality! Start your working life with us and enjoy your work with interesting projects.

Megamenü_2023_Karriere-4-1

Students

You don't just want to study theory, but also want to experience it in practice? Check out theory and practice with us and experience where the differences are made.

Megamenü_2023_Karriere-3

Jobs

You can find all open vacancies here. Look around and submit your application - we look forward to it! If there is no matching position, please send us your unsolicited application.

Blog
NextLytics Newsletter Teaser
Sign up now for our monthly newsletter!
Sign up for newsletter
 

Apache Airflow Summit 2022 - Trends and Topics

The Apache Airflow Summit is the highlight of the year for the community of developers, contributors, and users of the popular workflow orchestration platform. The 2022 edition of the Summit was held from May 23rd to May 27th as a global hybrid conference with local sessions in 13 different cities around the globe, 53 sessions of presentations and discussions, and boasting an audience of almost 8000 individuals following along in the online conference interface. Our NextLytics consultants joined the conference to look out for the newest developments and trending topics and have consolidated their observations into this conference summary for you!

airflow_summit_loading

Apache Airflow Summit conference welcome screen (image source: airflowsummit.org/)

Clustering Airflow Summit Sessions

With 53 sessions spanning from 30 to 60 minutes and no officially announced conference tracks to help navigating, it can be challenging to find the most interesting presentations and discussions. Following along many of the live sessions and re-watching almost all sessions from the 5 day Summit, we have derived 4 clusters of major topics addressed:

  1. Apache Airflow core developments and new features
  2. Apache Airflow operations and monitoring
  3. Apache Airflow application examples and best practices
  4. Apache Airflow Machine Learning

If you don’t have the opportunity or resources to check out the massive amount of recorded content from the Summit yourself, we present you a quick rundown of the most important developments in the following sections. Read to the very end for an additional glance at the most promising future features of Airflow that were hinted at during the conference!

summit_topics_EN

Main topics of the Apache Airflow Summit 2022 conference

Airflow Core Features

Considering Airflow 2.3 had only been released on April 30th, 2022, one of the main topics of the Summit was obviously going to be the showcasing of new features and recap of some of the internal workings of the newest release.

A major development in Airflow 2.3 has been the introduction of proper “Dynamic DAG” functionality. Whereas before all best practice examples for dynamically creating tasks during DAG runtime have been introduced with the customary phrase “we know this is NOT good practice”, there now is one! As part of the Airflow core, any type of Operator can be multiplied (“expanded”) by iterating through contents of Python lists or dictionaries. No more custom helper-code required.

The new Grid View component in the user interface is a highly visible new feature. Displaying all tasks and their individual executions and state in a compact form, this is a significant step forward from previously available visualizations. Expanded tasks created by the new dynamic DAG features are displayed in grouped form and are intuitively accessible in this view. The additional Details panel allows users to quickly check in-depth information about a DAG to further improve the overall user experience of the web interface.

grid_airflow.apache.org

Screenshot of the Grid View display component introduced in Airflow 2.3 (image source: airflow.apache.org)

More of a hidden but important architectural development is the introduction of “deferrable” Operators. To achieve this, a new component called Triggerer has been introduced into the Airflow core to handle all asynchronous jobs, i.e. jobs that wait for an external condition for longer periods of time instead of actively processing anything themselves. This interface is a logical successor to the experimental “Smart Sensors” concept which enabled resource-efficient scanning for trigger conditions via a grouping of sensor activities. Performance improvements of up to 90% less compute resources spent are reported by the developers. The implementation is as easy as using the asynchronous type of available operators, which will be implemented bit by bit by the Airflow community.

For more information about the latest features in Airflow, check out the official release blog post or contact us to discuss the potential of upgrading your Airflow system with regard to your specific requirements and environment!


Optimize your workflow management with Apache Airflow!

NextLytics Whitepaper Apache Airflow


Airflow Operation and Monitoring

Another large part of sessions focused on how to operate Airflow in different (cloud) environments and especially how to address the challenges arising from having a growing number of workflows or instances running in production: monitoring and (data) observability.

As Apache Airflow reaches a mature state, focus shifts from introducing sparkling new features to optimizing day-to-day operations for many teams. This shift has been clearly visible at the 2022 conference. Software-as-a-Service providers shared their insights into running many Airflow environments efficiently on Kubernetes clusters and/or cloud infrastructure. While some of this knowledge may be applicable in smaller scale operations, the more prevalent and omnipresent hot topic has been how to monitor Airflow, its overall performance, and specifically the success state of the business logic packages into DAGs.

With Apache Airflow ETL operations in mind, the question of how to reduce “data downtime” has been discussed in multiple sessions. Slightly different definitions of the concept have been presented but in general it refers to problems arising from errors somewhere in the data pipeline that is orchestrated by Airflow as the central component. Robust system operation based on DevOps principles and automation are one part of the solution. Check out our previous article about how to create a robust system using GitLab CI/CD if you want to know more about this. The second and still more challenging part is to monitor data flows and data quality across workflows. Airflow is a great tool to ensure robust execution of DAGs but needs extension to enable the monitoring of success or failure of the actual business logic implemented.

This is where the concept of data lineage and the OpenLineage specification come into play. Data lineage information is metadata about the origin and processing steps taken to create a certain piece of information, for example a value in a data warehouse table. Lineage is expected to enable a boost in efficiency for tracing problems through complex data ecosystems as well as communicating meaning and responsibility attached to datasets and data products. The goal of OpenLineage is to become the communication standard for reporting and checking data lineage metadata across all systems contributing to data pipelines. Current developments aim to provide out-of-the-box compatibility for Airflow and data processing DAGs and tasks, and some examples have been shown at the Summit. While this integration effort is making progress, it is still a complex procedure that requires deep understanding of both Airflow internals and the tools used for the integration.

Currently, there is no single best way to implement data observability and companies like Monte Carlo, Databand, or Acryl Data (with their open source project DataHub) actively push their services with promises of low-effort plug-and-play solutions.

Airflow Best Practices and Applications

Presentations from many vendors and conference sponsors about how to best use Airflow in different scenarios and in combination with different third party solutions from the data ecosystems are another traditionally strong theme of the Summit.

Among the many presentations on these topics, using CI/CD tools to automate Airflow operations and DAG development processes has been a frequent recommendation. Details of how to implement such DevOps principles for Airflow paralleled our own collection of best practices. Other best practices for workflow and data flow implementation included the application of containerized process steps using the DockerOperator or KubernetesOperator, the use of (automated) testing regimes, and utilization of features like Pools and Service Level Agreements. We have covered some of these Apache Airflow best practices in the NextLytics blog before and will highlight others in more detail in the near future.

One interesting observation for customers of Airflow SaaS offerings is that access to the Airflow API has been one of the most sought after features in multiple sessions. Chances are that SaaS providers like AWS are prioritizing work on enabling full use of the REST API for customers to allow for deep integration with other systems like SAP BW process chains and services.

Airflow for Machine Learning

Last but not least, and very much in line with how we often deploy Airflow for our customers, aspects of using the orchestration platform for Machine Learning applications and operation was a major topic in 2022.

Airflow has shaped up to be one of the most popular orchestration tools in the Machine Learning operations (MLOps) landscape. Scalability, extensibility, and richness in features make Airflow a great choice for managing ML workflows - a view shared by many leading companies from the tech industry as contributions to this Summit have again demonstrated.

Major reasons why Airflow is considered perfect for orchestrating MLOps workflows mentioned in presentations and discussions at this conference were:

  • DAG execution is very robust and gives many options to recover from failed states and/or to re-run tasks as necessary
  • DAGs can freely integrate with many different systems through the many available Operators, allowing teams to choose the best tools and ML frameworks for their specific needs
  • Recurring steps in the ML pipeline can be easily containerized and run using Docker or Kubernetes Operators, introducing security features and version control into the process
  • Airflow can be integrated with third party system through use of the REST API, again enabling for more advanced integration with the existing system context

Apache Airflow Summit Highlights_EN

Apache Airflow is a popular solution for orchestrating Machine Learning development and operations frameworks.
The diagram shows a schematic of how Airflow can be used to manage ML workflows around a Feature Store.

Airflow Machine Learning best practices displayed at the Summit 2022 also revolved around the emerging components like feature stores and model registries to create user-friendly end-to-end ML platforms.

The (near) Future of Airflow

With so many improvements presented, best practices shared, and ideas sparked - what more can be expected from Apache Airflow in the near future? To conclude our first recap of the Airflow Summit 2022, here are some of the features that have been announced for coming releases:

The ability to have multi-event triggers based on the deferrable operator interface will further enhance the design possibilities for DAGs. A native DAG versioning interface will provide a reliable way to migrate through changes in different environments, make custom versioning workarounds obsolete, and allow for more robust deployments. Data-driven scheduling and data-aware DAG execution will introduce the ability to actively refer to Dataset objects in DAGs and use these (and changes on them) to create cross-DAG dependencies. And finally, the Airflow web interface will soon see a complete relaunch as a modern React-based web application to lift user experience to a whole new level.

We already can’t wait to witness the next releases of Airflow and the inspiring community exchange about new trends and topics at next year's Airflow Summit! In the meantime, we would be happy to hear about your teams’ challenges and collaborate on creating efficient solutions by making use of Airflow’s broad capabilities.

Learn more about Apache Airflow

, ,

avatar

Markus

Markus has been a Senior Consultant for Machine Learning and Data Engineering at NextLytics AG since 2022. With significant experience as a system architect and team leader in data engineering, he is an expert in micro services, databases and workflow orchestration - especially in the field of open source solutions. In his spare time he tries to optimize the complex system of growing vegetables in his own garden.

Got a question about this blog?
Ask Markus

Blog - NextLytics AG 

Welcome to our blog. In this section we regularly report on news and background information on topics such as SAP Business Intelligence (BI), SAP Dashboarding with Lumira Designer or SAP Analytics Cloud, Machine Learning with SAP BW, Data Science and Planning with SAP Business Planning and Consolidation (BPC), SAP Integrated Planning (IP) and SAC Planning and much more.

Subscribe to our newsletter

Related Posts

Recent Posts