Efficient Continuous Deployment Monitoring with Apache Airflow

Written By: Apostolos - 11 July 2024

Continuous Integration and Continuous Deployment (CI/CD) is an integral part of every modern software development life cycle. A DevOps engineer is tasked with ensuring that the application is tested, built and deployed properly to a QA environment and/or delivered to the customers in a timely manner. This pipeline usually runs many times a day and the results of each individual job are immediately documented. For example, if a unit test fails or the code has syntax errors, the team is immediately notified and can start working on a fix. Once the deployment is finished, however, runtime errors can still arise and cause the application to exit abnormally. Failing to establish a connection to the database, failed validation of environment variables and misconfigured Docker containers are problems that, among others, can only be detected after the CI/CD pipeline has finished.

If your system does not include sophisticated monitoring tools like Prometheus and Grafana, a simple Apache Airflow DAG can do the job just fine. Running every few hours, it can identify problematic Docker containers and compose a report that is emailed to the DevOps team or sent to a Google Chat space as a notification. Let’s see how this DAG can be implemented.

Collect problematic containers

Our application consists of a REST API and a Postgres database. This pair is deployed on the QA environment every time we open a merge request and gets redeployed when new commits are available. All container names are prefixed with the project name (“my-project” in this example), followed by the branch name.

First, we develop two shell scripts to identify abnormally exited and restarting containers. Fortunately, this can be accomplished by leveraging the powerful Docker CLI and the grep tool.

collect_exited_containers.sh

#!/bin/bash

docker ps -a --filter 'status=exited' --format '{{.Names}}\t\t\t\t{{.Status}}' | grep -v 'Exited (0)' | grep "my-project"

collect_restarting_containers.sh

#!/bin/bash

docker ps -a --filter 'status=restarting' --format '{{.Names}}\t\t\t\t{{.Status}}' | grep "my-project"

The scripts filter all Docker containers by their status (exited or restarting), format the output and keep only the containers associated with our project. Note that we exclude containers that exited with status code 0, since this indicates a normal termination.

We also give execute permissions to both scripts.

$ chmod +x collect_exited_containers.sh collect_restarting_containers.sh

Develop Airflow DAG for monitoring

The DAG consists of five tasks:

  • collect_exited_containers
  • collect_restarting_containers
  • compose_report
  • send_email_report
  • send_notification_report

The first two are self-explanatory: we have already outlined the shell scripts that identify problematic containers, so the task implementations are fairly straightforward.

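The exact implementation depends on your setup; a minimal sketch of the DAG definition and these two tasks is shown below. The DAG id matches the monitoring DAG shown later in the Airflow grid view, while the schedule, the script location under /opt/airflow/scripts and the small subprocess wrapper (which tolerates grep's non-zero exit code when no containers are found and pushes the complete script output to XCom) are assumptions.

import subprocess
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Assumed location of the two collection scripts on the Airflow worker.
SCRIPTS_DIR = "/opt/airflow/scripts"


def run_script(script_name: str) -> str:
    """Run a collection script and return its complete stdout.

    grep exits with a non-zero code when no problematic containers are
    found, so we deliberately do not raise in that case and simply
    return an empty string.
    """
    result = subprocess.run(
        [f"{SCRIPTS_DIR}/{script_name}"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


with DAG(
    dag_id="cicd_check_monitoring",
    start_date=datetime(2024, 7, 1),
    schedule_interval="0 */4 * * *",  # every four hours
    catchup=False,
) as dag:
    # The return value of the callable is automatically pushed to XCom.
    collect_exited_containers = PythonOperator(
        task_id="collect_exited_containers",
        python_callable=run_script,
        op_args=["collect_exited_containers.sh"],
    )

    collect_restarting_containers = PythonOperator(
        task_id="collect_restarting_containers",
        python_callable=run_script,
        op_args=["collect_restarting_containers.sh"],
    )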


The report will be composed by this simple Python callback function.

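Assuming the collection tasks push their script output to XCom as sketched above, the callable and its task (defined inside the same DAG) could look roughly like this; the wording of the message follows the example further below.

def compose_report_callback(ti, **kwargs):
    """Build the report from the output of the two collection tasks."""
    exited = ti.xcom_pull(task_ids="collect_exited_containers")
    restarting = ti.xcom_pull(task_ids="collect_restarting_containers")

    sections = []
    if exited:
        sections.append(f"Abnormally exited containers:\n\n{exited}")
    if restarting:
        sections.append(f"Restarting containers:\n\n{restarting}")
    if not sections:
        # Nothing to report; downstream tasks could be skipped in this case.
        return ""

    sections.append("Please check the container logs in order to fix the related issues.")
    return "\n\n".join(sections)


# Inside the same `with DAG(...)` block as the collection tasks.
compose_report = PythonOperator(
    task_id="compose_report",
    python_callable=compose_report_callback,
)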

We fetch the results of the two previous tasks and, if they are not empty, add them to the final message. A typical message would look like this:

Abnormally exited containers:

my-project_api_documentation-upgrades                        Exited (137) 2 days ago

my-project_api_localization-data-refactoring                   Exited (137) 2 days ago

Restarting containers:

my-project_api_establish_db_connection                         Restarting (2) 6 seconds ago

Please check the container logs in order to fix the related issues.

 

Sending an email report can be accomplished with the EmailOperator.

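A minimal sketch of this task, assuming an SMTP connection is already configured for Airflow and using a placeholder recipient address, could look like this; the composed report is pulled from XCom via a templated field.

from airflow.operators.email import EmailOperator

# Inside the same `with DAG(...)` block as the other tasks.
send_email_report = EmailOperator(
    task_id="send_email_report",
    to="devops-team@example.com",  # placeholder recipient
    subject="CI/CD container monitoring report",
    # <pre> keeps the tab-aligned docker output readable in the email client.
    html_content="<pre>{{ ti.xcom_pull(task_ids='compose_report') }}</pre>",
)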

And pushing a notification to a Google Chat space is done by sending a POST request to a predefined webhook.

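A sketch of the callback and its task could look like the following; the webhook URL is a placeholder and in practice would typically be stored in an Airflow Variable or Connection rather than hard-coded.

import requests

# Placeholder for the incoming webhook URL of the Google Chat space.
GOOGLE_CHAT_WEBHOOK_URL = "https://chat.googleapis.com/v1/spaces/<SPACE_ID>/messages?key=<KEY>&token=<TOKEN>"


def push_notification_callback(ti, **kwargs):
    """Send the composed report to the Google Chat space."""
    report = ti.xcom_pull(task_ids="compose_report")
    # Google Chat incoming webhooks expect a JSON payload with a "text" field.
    response = requests.post(GOOGLE_CHAT_WEBHOOK_URL, json={"text": report}, timeout=10)
    response.raise_for_status()


# Inside the same `with DAG(...)` block as the other tasks.
send_notification_report = PythonOperator(
    task_id="send_notification_report",
    python_callable=push_notification_callback,
)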

The tasks can be arranged like so, thus creating dependencies between them.
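One natural arrangement, using the task objects from the sketches above, is to run both collection tasks first, then compose the report, and finally fan out to the two delivery tasks:

# Both collection tasks feed the report, which is then sent out via email
# and as a Google Chat notification.
[collect_exited_containers, collect_restarting_containers] >> compose_report
compose_report >> [send_email_report, send_notification_report]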


(Screenshot: the cicd_check_monitoring DAG in the Airflow grid view)

Continuous Deployment with Apache Airflow - Our Conclusion

With this simple sequence of tasks, we can create a mechanism that monitors the deployed Docker containers of an application and notifies us if something is wrong. This use case once again clearly demonstrates the versatility of Airflow and its ability to integrate and communicate with multiple systems.

You can find the whole code for this DAG here. If you have questions about this implementation or are wondering how you can use Apache Airflow to empower your data-driven business, we are happy to help you. Contact the Data Science and Engineering team at NextLytics today.
