Running Apache Airflow on Windows Server - Does this make sense?

The right data at the right time in the right place, prepared appropriately for the respective recipients. This is how one could describe the ideal vision of a functioning business intelligence infrastructure. In reality, achieving this often requires a large number of system components that have to harmonise with each other at the right pace. As the number of source systems and consumers of an analysis data warehouse increases, the coordination and monitoring of status and timeliness of processes alone becomes a challenge. An orchestration service such as Apache Airflow can be the right tool for maintaining an overview. Airflow not only offers a wide range of functions for controlling, securing and tracking planned tasks, but can also be operated free of licensing costs as an open source project. But for many companies, the question quickly arises: Can Apache Airflow be operated on a Windows server? Today we would like to present the possibilities, advantages and disadvantages.

We keep coming across companies that are looking for a suitable service for defining and managing workflows and data pipelines. NextLytics strongly recommends Apache Airflow. Airflow relies entirely on a code-based approach for the definition of workflows and processes, which makes this beating heart of a business intelligence infrastructure reliable and its operation fully traceable. Airflow also focuses on extensibility, is anchored in Python, one of the most popular programming languages of our time, and is highly customisable with countless extension modules. However, Airflow is primarily designed for operation on Linux operating systems, in a container virtualisation platform or in the cloud, and this is sometimes a hurdle: data may only be processed in the company's own data centre, or there is not yet a cloud strategy ready for implementation. And finally, the in-house IT department may offer only Windows Server as an operating system.
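To illustrate what "code-based" means here, the following minimal sketch describes a workflow as a dependency graph in plain Python and derives a valid execution order from it. It deliberately uses only the standard library module graphlib and not Airflow's own API; the task names and the helper execution_order are hypothetical and serve only as an illustration of the idea.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_sales": set(),
    "extract_customers": set(),
    "transform": {"extract_sales", "extract_customers"},
    "load_warehouse": {"transform"},
    "refresh_dashboard": {"load_warehouse"},
}


def execution_order(dependencies):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(PIPELINE)
print(" -> ".join(order))
```

Because the workflow is ordinary code, it can be versioned, reviewed and tested like any other software artefact, which is the basis for the traceability mentioned above.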

Apache Airflow and Windows: the possibilities

Apache Airflow is based on a microservices architecture, i.e. different, individually scalable services take on different tasks and communicate with each other via web service interfaces. Such an architecture can be operated with little effort using container virtualisation, provided the appropriate infrastructure is available. A virtual server with Docker Engine is often sufficient for small to medium-sized systems. More powerful systems feel particularly at home in a Kubernetes cluster. Docker in particular has gone to great lengths in recent years to be able to run as a virtualisation layer on Windows Server operating systems. At first glance, the case seems clear: deploy Windows Server with Docker Engine, roll out the Airflow containers and you're ready to go.

In practice, you are quickly confronted with different variants and problems. The most common operating variants for Airflow using Docker on a Windows system are:

  1. Operation of the Docker Engine using Docker Desktop
  2. Operation of a Docker Engine in a virtual Linux environment provided via "Windows Subsystem for Linux" (WSL)

There are numerous instructions and best practices for both variants on the Internet, but beware: neither works entirely smoothly!

Apache Airflow with Docker Desktop

Probably the easiest way to run Airflow on a Windows operating system is to use the "Docker Desktop" programme. Docker Desktop provides a graphical user interface to activate the Docker Engine and start Docker containers. The current version of Docker Desktop uses the Windows Subsystem for Linux v2 to set up a virtual Linux environment in the background and run the Docker Engine in it, completely unnoticed by the user. The command line interface of the Docker Engine is then passed through to the Windows operating system and can be accessed directly from PowerShell using familiar commands such as "docker run" or "docker compose".

In this variant, integration into the Windows server is as seamless as possible. Base directories for a compose-based installation of Airflow are located directly in the Windows file system and can be managed with the usual tools such as PowerShell, Git for Windows and Explorer.

Another big plus point is that Docker Desktop automates port forwarding between the operating system and Docker containers, so no additional effort or configuration is required. Continuous Integration and Continuous Delivery (CI/CD) operations of Airflow can be facilitated with the toolset of choice, be it GitLab CI/CD or Azure DevOps Pipelines.

Docker Desktop is designed as a graphical programme that is executed by a logged-in user. If Apache Airflow is to be operated as a permanently running service, this is initially a hindrance. However, the desired behaviour can still be achieved using an additional start script that is executed via Windows Service Management when the server is started. The Docker Engine can also be set up using configuration parameters so that it is started automatically on reboot.
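A start script of this kind could look roughly like the following sketch. It waits until the Docker Engine responds to "docker info" and then starts the Airflow containers with "docker compose up -d". The function names, the retry values and the project directory are assumptions for illustration; registering the script with Windows so that it runs at server start is not shown.

```python
import subprocess
import time


def wait_for_docker(run=subprocess.run, attempts=30, delay=10.0, sleep=time.sleep):
    """Poll `docker info` until the Docker Engine answers or the attempts run out."""
    for _ in range(attempts):
        try:
            if run(["docker", "info"], capture_output=True).returncode == 0:
                return True
        except FileNotFoundError:
            pass  # the docker CLI is not (yet) available on the PATH
        sleep(delay)
    return False


def start_airflow(compose_dir, run=subprocess.run, attempts=30, delay=10.0,
                  sleep=time.sleep):
    """Start the compose project in `compose_dir` once the engine is reachable."""
    if not wait_for_docker(run, attempts, delay, sleep):
        raise RuntimeError("Docker Engine did not become available")
    # -d starts the containers in the background and returns immediately.
    run(["docker", "compose", "up", "-d"], cwd=compose_dir, check=True)
```

A call such as start_airflow(r"C:\airflow"), with a hypothetical directory containing the compose file, would then be the entry point of the script. The run and sleep parameters exist only so that the logic can be exercised without a running Docker Engine.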

In this variant, our tests have shown that integrating Airflow directories from the Windows file system into the Docker containers is problematic. Apparently, file locking mechanisms are negotiated differently than on Linux-based servers, so that the import of Airflow DAG definition files is prone to errors. It is therefore advisable to build the Airflow Docker images in such a way that the directories usually mounted from the host system are fully included in the image.
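One way to include these directories in the image is a small Dockerfile that copies them on top of the official Airflow base image. The following sketch generates such a Dockerfile as text. The image tag, the directory names and the helper render_dockerfile are assumptions for illustration; the target paths below /opt/airflow follow the layout of the official apache/airflow image.

```python
def render_dockerfile(base_image, baked_dirs):
    """Build Dockerfile text that copies local directories into the image.

    baked_dirs is a list of (source_on_host, target_in_image) pairs.
    """
    lines = [f"FROM {base_image}"]
    for source, target in baked_dirs:
        # Files are owned by the airflow user so the services can read them.
        lines.append(f"COPY --chown=airflow:root {source} {target}")
    return "\n".join(lines) + "\n"


dockerfile = render_dockerfile(
    "apache/airflow:2.9.3",  # assumed tag, pick the version you actually run
    [
        ("dags/", "/opt/airflow/dags/"),
        ("plugins/", "/opt/airflow/plugins/"),
    ],
)
print(dockerfile)
```

With the DAG files baked into the image, a change to a DAG requires a new image build and rollout, which fits well with a CI/CD pipeline.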


What is the obstacle to operating Apache Airflow in this way? None from a technical point of view, only the fact that Docker Desktop is a product that requires a licence. For test operations and small companies, use can be free of charge. Above a certain annual turnover or number of employees (Docker's terms name 10 million US dollars in revenue or 250 employees), however, licences must be purchased. Prices start at between 60 and a few hundred US dollars per year for a few users.

Is there another way to run the open source system Airflow on Windows Server without additional licence costs then? We have also tested the following variant for you:

Apache Airflow with WSL

The Windows Subsystem for Linux has been integrated into Microsoft's operating systems practically across the board for several years now. It enables the operation of lightweight virtual operating system environments based on Linux within a running Windows operating system. Shouldn't it also be possible to run Apache Airflow using Docker in such a WSL environment, just as on a native Linux server?

We have tested the variant with an Ubuntu WSL environment. The WSL environment is created according to the usual instructions and controlled via the command line, just like a fully-fledged Ubuntu server. The Docker Engine in the WSL environment is installed in the usual way and a standard local operating environment for Apache Airflow with Git and docker-compose is prepared.

This variant becomes complicated at the interface between the Windows operating system and the WSL Linux environment. WSL environments are designed to be started and operated by a logged-in user at runtime. An automatic start together with the host operating system is not provided for and must be constructed using scripts. There is also no integration of user IDs between the two environments, which means that the WSL environment must have its own local user management. Finally, there is also no communication between the host operating system and the Docker Engine in the WSL environment, meaning that port forwarding has to be set up manually.
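As an illustration of the manual port forwarding, the following sketch builds the Windows "netsh interface portproxy" command that forwards a port of the host to the WSL environment. The helper names and the example IP address are assumptions; the address of the WSL environment can be read with "wsl hostname -I" and may change after a restart, so such a rule typically has to be recreated by a script.

```python
import ipaddress


def first_wsl_address(hostname_output):
    """Pick the first address from the output of `wsl hostname -I`."""
    parts = hostname_output.split()
    if not parts:
        raise ValueError("no IP address found in output")
    return parts[0]


def portproxy_command(wsl_ip, port, listen_address="0.0.0.0"):
    """Build the netsh command that forwards host `port` to the WSL environment."""
    ipaddress.IPv4Address(wsl_ip)  # raises ValueError for malformed addresses
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return [
        "netsh", "interface", "portproxy", "add", "v4tov4",
        f"listenport={port}", f"listenaddress={listen_address}",
        f"connectport={port}", f"connectaddress={wsl_ip}",
    ]


# Hypothetical address; 8080 is the default port of the Airflow web server.
command = portproxy_command(first_wsl_address("172.20.0.2 \n"), 8080)
print(" ".join(command))
```

The sketch only assembles the command. Executing it requires an elevated shell on the Windows host, and a matching firewall rule may be needed in addition.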

The conclusion of our tests for this variant is that the Linux operating system expertise required is even greater than for the operation of a native server. The number of complex workarounds due to the lack of automatic integration with the Windows server operating system also speaks against a serious productive operation of Airflow.

In a comparison of the two most promising options for installing Apache Airflow on Windows Server operating systems, Docker Desktop is the clear winner despite the additional licence costs. Nevertheless, this variant is not ideal, as it is optimised more for testing and developing systems in single-user mode.

The alternatives

Based on our many years of experience, we still recommend that even customers with a Windows server landscape run Apache Airflow on a native Linux operating system. The integration of established distributions such as Ubuntu, Red Hat Enterprise Linux or openSUSE with Microsoft Active Directory domains is very mature and supported out of the box these days. Identity management and access control can therefore also be easily transferred to a Linux server. Technical support for the operating system itself is available as a service from many providers; we ourselves are happy to offer this service directly as part of Airflow support contracts. Running the Docker Engine natively on the Linux server that hosts Airflow eliminates many workarounds, and stable operation has been tested and proven countless times worldwide.

If compliance does not allow for a server with a Linux operating system under in-house responsibility, obtaining Apache Airflow as a cloud service is a good alternative. There are turnkey offerings from Amazon Web Services ("Amazon Managed Workflows for Apache Airflow", MWAA) or Google Cloud Platform ("Cloud Composer"). If the requirements are more specialised or if direct contact and customer-oriented technical support are desired, NextLytics is also happy to operate an Airflow instance on the infrastructure of your choice. In any case, a cloud strategy and definition of data protection guidelines are required.

Apache Airflow on Windows Server - Our Conclusion

As an orchestration service for data warehouse and business intelligence platforms, Apache Airflow offers many advantages that appeal just as much to companies with a Windows-based server landscape. Running Airflow directly on Windows Server operating systems is possible in principle, but always involves complex technical tricks and/or additional licence costs. Apache Airflow works best on Linux servers or directly as a cloud service and can usually also be integrated well into Windows environments in these variants. We will be happy to advise you on the best solution - NextLytics would like to open up the possibilities of Apache Airflow to everyone.

Markus

Markus has been a Senior Consultant for Machine Learning and Data Engineering at NextLytics AG since 2022. With significant experience as a system architect and team leader in data engineering, he is an expert in micro services, databases and workflow orchestration - especially in the field of open source solutions. In his spare time he tries to optimize the complex system of growing vegetables in his own garden.
