Efficient Continuous Deployment Monitoring with Apache Airflow

Written by Apostolos | Jul 11, 2024 9:09:00 AM

Continuous Integration and Continuous Deployment (CI/CD) is an integral part of every modern software development life cycle. A DevOps engineer is tasked with ensuring that the application is tested, built, and deployed properly to a QA environment and/or delivered to the customers in a timely manner. This pipeline usually runs many times a day, and the results of each individual job are documented immediately. For example, if a unit test fails or the code has syntax errors, the team is notified right away and can start fixing the issue. Once the deployment is finished, however, runtime errors can still arise and cause the application to exit abnormally. Failing to establish a connection to the database, failed validation of environment variables, and misconfigured Docker containers are problems that, among others, can only be detected after the CI/CD pipeline has finished.

If your system does not include sophisticated monitoring tools like Prometheus and Grafana, a simple Apache Airflow DAG can do the job just fine. Running every few hours, it identifies problematic Docker containers and composes a report that is emailed to the DevOps team or posted as a notification to a Google Chat space. Let’s see how this DAG can be implemented.

Collect problematic containers

Our application consists of a REST API and a Postgres database. This pair is deployed to the QA environment every time we open a merge request and is redeployed whenever new commits are pushed. All container names are prefixed with the project name (“my-project” in this example), followed by the branch name.

First, we develop two shell scripts to identify abnormally exited and restarting containers. Fortunately, this can be accomplished by leveraging the powerful Docker CLI and the grep tool.

collect_exited_containers.sh

#!/bin/bash

docker ps -a --filter 'status=exited' --format '{{.Names}}\t\t\t\t{{.Status}}' | grep -v 'Exited (0)' | grep "my-project"

collect_restarting_containers.sh

#!/bin/bash

docker ps -a --filter 'status=restarting' --format '{{.Names}}\t\t\t\t{{.Status}}' | grep "my-project"

The scripts filter all Docker containers by their status (exited or restarting), format the output to show the container name and status, and keep only the ones associated with our project. Note that we exclude containers that exited with status code 0, since this indicates a normal termination.

We also give execute permissions to both scripts.

$ chmod +x collect_exited_containers.sh collect_restarting_containers.sh

Develop Airflow DAG for monitoring

The DAG consists of five tasks:

  • collect_exited_containers
  • collect_restarting_containers
  • compose_report
  • send_email_report
  • send_notification_report

The first two are self-explanatory: we have already outlined the shell scripts that identify problematic containers, so the task implementations are fairly straightforward, as sketched below.
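
As a minimal sketch, assuming the two scripts live under /opt/scripts on the Airflow worker (the DAG id, schedule and path below are placeholders), the collection tasks could be defined with the PythonOperator:

import subprocess
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_script(script_path: str) -> str:
    # Run a collection script and return its full output. grep exits with a
    # non-zero code when nothing matches, which simply means there are no
    # problematic containers, so we deliberately do not raise in that case.
    result = subprocess.run([script_path], capture_output=True, text=True)
    return result.stdout.strip()


with DAG(
    dag_id="qa_container_monitoring",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 */4 * * *",  # every four hours
    catchup=False,
) as dag:
    collect_exited_containers = PythonOperator(
        task_id="collect_exited_containers",
        python_callable=run_script,
        op_args=["/opt/scripts/collect_exited_containers.sh"],
    )

    collect_restarting_containers = PythonOperator(
        task_id="collect_restarting_containers",
        python_callable=run_script,
        op_args=["/opt/scripts/collect_restarting_containers.sh"],
    )

The return value of each callable is pushed to XCom automatically, so the report task can pick it up later.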


The report will be composed by this simple Python callback function.
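
A minimal sketch of such a callback, assuming the collection tasks push their output to XCom as above and that this task lives inside the same DAG block, could look like this:

def compose_report(ti) -> str:
    # Pull whatever the two collection tasks found from XCom.
    exited = ti.xcom_pull(task_ids="collect_exited_containers")
    restarting = ti.xcom_pull(task_ids="collect_restarting_containers")

    sections = []
    if exited:
        sections.append("Abnormally exited containers:\n\n" + exited)
    if restarting:
        sections.append("Restarting containers:\n\n" + restarting)
    if sections:
        sections.append("Please check the containers logs in order to fix the related issues.")
    return "\n\n".join(sections)


compose_report_task = PythonOperator(
    task_id="compose_report",
    python_callable=compose_report,
)

If neither task found anything, the report stays empty; in that case you may want to skip the downstream notification tasks, for example with a ShortCircuitOperator.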

We fetch the results from the two previous tasks via XCom and, if they are not empty, add them to the final message. A typical message would look like this:

Abnormally exited containers:

my-project_api_documentation-upgrades                        Exited (137) 2 days ago

my-project_api_localization-data-refactoring                   Exited (137) 2 days ago

Restarting containers:

my-project_api_establish_db_connection                         Restarting (2) 6 seconds ago

Please check the containers logs in order to fix the related issues.

 

Sending an email report can be accomplished with the EmailOperator.
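
A minimal sketch, assuming the report is pulled from XCom via templating; the recipient address is a placeholder, and Airflow’s SMTP connection has to be configured separately:

from airflow.operators.email import EmailOperator

send_email_report = EmailOperator(
    task_id="send_email_report",
    to="devops-team@example.com",  # placeholder recipient
    subject="QA container monitoring report",
    html_content="{{ ti.xcom_pull(task_ids='compose_report') }}",
)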

And pushing a notification to a Google Chat space is done by sending a POST request to a predefined webhook.
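
A sketch of that task, assuming a webhook has already been generated for the target space (the URL below is a placeholder) and reusing the PythonOperator import from above:

import requests

# Placeholder; Google Chat generates this URL when a webhook is added to a space.
GOOGLE_CHAT_WEBHOOK_URL = "https://chat.googleapis.com/v1/spaces/XXXX/messages?key=..."


def send_notification(ti) -> None:
    report = ti.xcom_pull(task_ids="compose_report")
    if report:
        # Google Chat incoming webhooks accept a JSON payload with a "text" field.
        requests.post(GOOGLE_CHAT_WEBHOOK_URL, json={"text": report}, timeout=10)


send_notification_report = PythonOperator(
    task_id="send_notification_report",
    python_callable=send_notification,
)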

Finally, the tasks can be arranged to create the dependencies between them.
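
Based on the task objects sketched above, this arrangement could look like:

# Both collection tasks feed the report, which is then delivered by email
# and as a chat notification in parallel.
[collect_exited_containers, collect_restarting_containers] >> compose_report_task
compose_report_task >> [send_email_report, send_notification_report]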

Continuous Deployment with Apache Airflow - Our Conclusion

With this simple sequence of tasks, we can create a mechanism that monitors the deployed Docker containers of an application and notifies us if something is wrong. This use case once again demonstrates the versatility of Airflow and its ability to integrate and communicate with multiple systems.

You can find the whole code for this DAG here. If you have questions about this implementation, or if you are wondering how you can use Apache Airflow to empower your data-driven business, we are happy to help you. Contact the Data Science and Engineering team at NextLytics today.