Apache Airflow is an open-source orchestration platform that provides a Python code–based interface to schedule, manage, and scale workflows, and a user interface to monitor the status of all workflows and tasks. Due to its code-first nature, Airflow is highly customizable and extensible: users can add their own operators and hooks. It scales elastically to host any number of workflow pipelines and improves resource efficiency by executing tasks in parallel. As a system of microservices, it benefits from scalability, fault isolation, and deployment flexibility, since individual components can be deployed or updated independently. Though often used to orchestrate data processing pipelines, Airflow is agnostic of the kinds of tasks it schedules and can be used wherever digital workloads need scheduling and orchestration.
Small intro to Single Sign On (SSO)
These days, managing numerous online accounts and remembering multiple passwords can be a daunting task. SSO offers a solution by enabling users to access multiple accounts with a single set of credentials. This approach not only benefits users, who no longer have to remember different credentials or repeat the sign-in process for each resource, but also enhances security and governance. Using Microsoft Entra ID as the Identity Provider (IdP) for SSO strengthens IT security and gives the organization more effective corporate governance through a centralized authenticator.
As a practice, SSO is widely used to streamline user access and safeguard sensitive information. There are several ways to implement SSO in an application, such as SAML or the different OAuth 2.0 flows; we are going to use the latter in this example.
How it works
SSO works by centralizing the authentication process at an Identity Provider. When a user is prompted for credentials in a protected application, the user is redirected to the IdP and authenticates there. After successful authentication, the IdP redirects the user back to the application they were logging in to, and, depending on the specific authentication flow, the application obtains and parses the tokens issued by the IdP.
There is a variety of OAuth flows for different use cases. In our case, we are going to implement the “OAuth 2.0 authorization code grant”, also known as the “auth code flow”. This flow is commonly used by single-page applications (SPAs) and server-based web applications like Airflow.
Figure: OAuth 2.0 authorization code flow.
Source: https://learn.microsoft.com/en-us/entra/identity-platform/v2-oauth2-auth-code-flow (article), https://learn.microsoft.com/en-us/entra/identity-platform/media/v2-oauth2-auth-code-flow/convergence-scenarios-native.svg (image)
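The two requests at the heart of the auth code flow can be sketched in a few lines of Python. This is a minimal illustration, not what Airflow executes internally: the function names are hypothetical, the endpoints are the Microsoft identity platform v2.0 endpoints from the article linked above, and nothing is actually sent over the network.

```python
import secrets
from urllib.parse import urlencode

AUTHORITY = "https://login.microsoftonline.com"


def build_authorize_url(tenant_id, client_id, redirect_uri,
                        scope="openid profile email", state=None):
    """Step 1: the URL the browser is redirected to so the user can sign in."""
    # The state value is echoed back by the IdP and protects against CSRF.
    state = state or secrets.token_urlsafe(16)
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",  # ask for an authorization code
        "redirect_uri": redirect_uri,
        "response_mode": "query",
        "scope": scope,
        "state": state,
    })
    return f"{AUTHORITY}/{tenant_id}/oauth2/v2.0/authorize?{query}", state


def build_token_request(tenant_id, client_id, client_secret, redirect_uri, code):
    """Step 2: the POST target and form fields that exchange the code for tokens."""
    url = f"{AUTHORITY}/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "authorization_code",
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
        "redirect_uri": redirect_uri,
    }
    return url, form
```

In practice the OAuth client library used by the web application performs both steps; the sketch only makes visible which parameters travel in each direction.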
A small intro to Microsoft Entra ID
Microsoft Entra ID, formerly known as Azure Active Directory (Azure AD), is a cloud-based identity and access management service that users within an organization can use to access internal resources (such as cloud apps developed by your organization) or external resources (Microsoft 365, the Azure portal, or one of the numerous SaaS applications). It provides different benefits depending on the user's role, as shown in the table below:
| Role | Benefit |
| --- | --- |
| IT Admins | Can manage access to applications and resources based on business needs, for example by requiring multi-factor authentication for access to critical resources. |
| App Developers | Can use Microsoft Entra ID as a standards-based authentication provider that helps them add single sign-on (SSO) to apps that work with a user's existing credentials. Developers can also use Microsoft Entra APIs to build personalized experiences using organizational data. |
| Cloud services users | Already use Microsoft Entra ID, as every Microsoft 365, Office 365, Azure, and Dynamics CRM Online tenant is automatically a Microsoft Entra tenant. |
Configuring Airflow to use OAuth2
Authentication in Apache Airflow is handled by the underlying Flask-AppBuilder framework. OAuth authentication for Apache Airflow is therefore configured in the configuration file of the webserver component (webserver_config.py). Because the mechanism comes from Flask-AppBuilder, the same approach can be applied to other applications built on it, such as Apache Superset.
First of all, the “AUTH_TYPE” variable has to be set to “AUTH_OAUTH”.
The configuration in “OAUTH_PROVIDERS” varies with the identity provider, so this setup may not apply one-to-one to other OAuth identity providers, for example on AWS.
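A fragment of webserver_config.py for these two settings might look like the following. Treat it as a sketch under assumptions: the environment variable names, the placeholder defaults, the provider name “azure” and the requested scopes are illustrative choices, and the endpoints are the Microsoft identity platform v2.0 endpoints. The fragment only runs inside an environment where Flask-AppBuilder is installed, as is the case for the Airflow webserver.

```python
# webserver_config.py (fragment)
import os

from flask_appbuilder.const import AUTH_OAUTH

# Switch the webserver from database logins to OAuth.
AUTH_TYPE = AUTH_OAUTH

# Assumed environment variable names; adapt them to your deployment.
TENANT_ID = os.environ.get("AZURE_TENANT_ID", "your-tenant-id")
AUTHORITY = f"https://login.microsoftonline.com/{TENANT_ID}"

OAUTH_PROVIDERS = [
    {
        "name": "azure",           # provider name, also part of the callback URL
        "icon": "fa-windows",      # icon shown on the login button
        "token_key": "access_token",
        "remote_app": {
            "client_id": os.environ.get("AZURE_CLIENT_ID", "your-client-id"),
            "client_secret": os.environ.get("AZURE_CLIENT_SECRET", "your-client-secret"),
            "api_base_url": f"{AUTHORITY}/oauth2/v2.0/",
            "client_kwargs": {"scope": "openid profile email"},
            "request_token_url": None,
            "access_token_url": f"{AUTHORITY}/oauth2/v2.0/token",
            "authorize_url": f"{AUTHORITY}/oauth2/v2.0/authorize",
        },
    }
]
```

Reading the client secret from the environment keeps it out of the configuration file itself; in a real deployment you would drop the placeholder defaults so that a missing variable fails loudly.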
Afterwards you will need to create a role mapping between the access roles configured in your Azure Application and the role structure in your instance of Apache Airflow. This ensures that each user will be able to access only the scope you have defined for them, which is especially useful in larger deployments or development teams. In our case this means mapping the Azure Application role “airflow_prod_admin” to the Airflow role “Admin” and so on.
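In Flask-AppBuilder, such a mapping is expressed with the “AUTH_ROLES_MAPPING” setting. The sketch below shows what it could look like; only “airflow_prod_admin” comes from our setup, while the other two Azure role names are hypothetical. The helper function is not part of Airflow; it merely mimics, in simplified form, how role names from the token resolve to Airflow roles.

```python
# Azure app role -> list of Airflow roles.
AUTH_ROLES_MAPPING = {
    "airflow_prod_admin": ["Admin"],
    "airflow_prod_user": ["User"],      # hypothetical Azure role name
    "airflow_prod_viewer": ["Viewer"],  # hypothetical Azure role name
}

# Re-evaluate the mapping on every login so role changes in Azure take effect.
AUTH_ROLES_SYNC_AT_LOGIN = True

# Create Airflow users on first login and give unmapped users minimal rights.
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Public"


def resolve_airflow_roles(role_keys, mapping=AUTH_ROLES_MAPPING,
                          default=AUTH_USER_REGISTRATION_ROLE):
    """Illustrative helper: translate Azure role names into Airflow role names."""
    resolved = {role for key in role_keys for role in mapping.get(key, [])}
    return sorted(resolved) or [default]
```

For example, a user whose token carries the role “airflow_prod_admin” ends up with the Airflow role “Admin”, while a user without any mapped role falls back to “Public”.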
Last but not least, you will have to implement the “AzureCustomSecurity” class, which handles requesting and parsing the Access Token from the Identity provider.
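The parsing half of such a class can be sketched with the standard library alone. The function names below are hypothetical, and the sketch assumes that the IdP returns a JWT whose claims include the fields configured in Azure plus the app roles in a “roles” claim. It deliberately does not verify the token signature, which a production implementation must do.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with header, payload and signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def claims_to_user_info(claims: dict) -> dict:
    """Map token claims to the user info fields Flask-AppBuilder works with."""
    return {
        "username": claims.get("preferred_username") or claims.get("upn", ""),
        "email": claims.get("email") or claims.get("upn", ""),
        "first_name": claims.get("given_name", ""),
        "last_name": claims.get("family_name", ""),
        "role_keys": claims.get("roles", []),  # matched against the role mapping
    }
```

One plausible way to wire this up, stated here as an assumption rather than as the exact implementation, is to subclass Airflow's security manager, override its get_oauth_user_info(provider, response) method so that it returns claims_to_user_info(...) for the Azure provider, and register the subclass via SECURITY_MANAGER_CLASS in the webserver configuration.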
How to configure the App in Azure
Setting up SSO for Airflow requires creating and configuring an application in Azure, which takes a number of steps. First, in the Azure portal, click Microsoft Entra ID and then, under the Manage menu, Enterprise applications. There, create a new application and select “Integrate any other application you don’t find in the gallery”.
Next, create the application and add a redirect URL and a logout URL that match your Airflow instance. Then configure the app roles described above in the App roles tab. Lastly, set up an optional claim and a group claim in the “Token configuration” section, allowing access to the fields “email”, “family_name”, “given_name”, “preferred_username” and “upn”. The token issued by the IdP then contains this information, which Airflow requires to perform the authentication.
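Two small helpers can make this part of the setup easier to check. Both are hypothetical and only a sketch: the first builds the redirect URL from the base URL of the Airflow instance, assuming Flask-AppBuilder's OAuth callback route and a provider named “azure” in “OAUTH_PROVIDERS”; the second lists which of the fields named above are absent from a set of decoded token claims.

```python
# Fields that the token configuration in Azure should expose.
REQUIRED_CLAIMS = ("email", "family_name", "given_name", "preferred_username", "upn")


def redirect_uri(base_url: str, provider: str = "azure") -> str:
    """Redirect URL to register in Azure for the given Airflow base URL."""
    return f"{base_url.rstrip('/')}/oauth-authorized/{provider}"


def missing_claims(claims: dict, required=REQUIRED_CLAIMS) -> list:
    """Names of required claims that are absent or empty in the decoded token."""
    return [name for name in required if not claims.get(name)]
```

If missing_claims returns a non-empty list for a freshly issued token, the “Token configuration” section in Azure is the first place to look.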
Implementing Single Sign On Authentication - Our conclusion
Integrating SSO into Apache Airflow can significantly enhance both security and user convenience while aligning with modern authentication standards. By using an IdP, the organization simplifies access management, and users no longer have to juggle numerous sets of credentials, which improves productivity and the user experience. As organizations continue to embrace cloud-based solutions, implementing SSO is a crucial step toward letting both organizations and users work efficiently and securely. We at NextLytics will be happy to advise you on the best solution, whether for this specific use case or other challenges you might face.