Corporate sustainability is in demand, whether the goal is to make products more sustainable or to reduce a company's own ecological footprint. For the trend to take hold in companies in the long term, it is important to establish and track key indicators. The CO2 footprint is a prominent measure of ecological sustainability: the greenhouse gas emissions caused by systems, processes, and resources are converted into a carbon dioxide equivalent. Carbon accounting also matters in data science, where it helps make the design of artificial intelligence more sustainable.
The term "carbon accounting" describes the systematic recording of emissions in companies. This is a complex undertaking, as various data about the object of measurement must be entered or measured. In the case of AI, the emissions generated during model runtime and training are especially important, while the provision of hardware is often neglected. Nevertheless, recording emissions raises awareness and is the first step toward positive change.
In this article, we will introduce you to metrics and frameworks to measure the footprint of your AI applications. First, we explain the benefits you can gain from tracking emissions.
Measuring a subject provides insight into its current state and makes it possible to initiate changes in the right direction. For communication with management, a reporting system can be set up that shows the change process. It tracks whether the footprint of all AI applications is increasing or is on a downward trend thanks to the measures introduced. Capturing emissions also enables root cause analysis to derive measures: Which models have the highest footprint, and why? Especially when high-quality data on the carbon intensity of energy is available, it is possible to identify times and places that lead to increased emissions. Comparing alternatives can answer which option is appropriate for sustainable AI design. For example, different types of hardware can be compared by adjusting only the parameters in a tool, without making a real change. The theoretical execution time can be varied in the same way. For example, there are studies showing that an expensive model training run during the week has a 5.7-8.5% higher footprint than the same training on the weekend, when server load is lower. You can find more measures in our whitepaper on Green AI.
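Such a what-if comparison can be sketched in a few lines of Python. This is a minimal illustration, not a real tool: the `Scenario` class, the power draw values, the runtimes, and the carbon intensities are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One hypothetical training setup to compare (all values are assumptions)."""
    name: str
    power_draw_kw: float                # average power draw of the hardware in kW
    runtime_hours: float                # expected training time in hours
    carbon_intensity_g_per_kwh: float   # grid carbon intensity in gCO2eq/kWh

    def emissions_g(self) -> float:
        # energy (kWh) = power (kW) * time (h); emissions = energy * intensity
        return self.power_draw_kw * self.runtime_hours * self.carbon_intensity_g_per_kwh


def rank_scenarios(scenarios):
    """Return the scenarios sorted from lowest to highest estimated emissions."""
    return sorted(scenarios, key=lambda s: s.emissions_g())


# Illustrative values only; replace them with figures for your own setup.
scenarios = [
    Scenario("GPU A, weekday", 0.30, 10.0, 400.0),
    Scenario("GPU A, weekend", 0.30, 10.0, 370.0),
    Scenario("GPU B, weekday", 0.25, 14.0, 400.0),
]

for s in rank_scenarios(scenarios):
    print(f"{s.name}: {s.emissions_g():.0f} gCO2eq")
```

Only the parameters change between scenarios, so alternatives can be ranked before any hardware or schedule is actually touched.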
Many influencing factors determine the actual footprint of AI applications.
For companies, incorporating the current energy mix and carbon intensity values to accurately estimate the footprint of a machine learning model is complex. On the one hand, the data is often unavailable or difficult to access. On the other hand, mapping it to the execution times of processes is costly and has to be planned and implemented with a dedicated infrastructure.
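To illustrate what this mapping involves, here is a minimal sketch that matches hourly energy readings of a job against hourly carbon intensity values. The function name, the data layout (dictionaries keyed by an hour label), and all numbers are assumptions for illustration; a real setup would have to obtain both series from metering and from a grid data source.

```python
def time_resolved_emissions(energy_kwh_by_hour, intensity_g_per_kwh_by_hour):
    """Sum emissions hour by hour: energy used in an hour times that hour's intensity.

    Both arguments are dicts keyed by an hour label (hypothetical layout).
    Raises KeyError if an hour with energy use has no intensity value.
    """
    total_g = 0.0
    for hour, energy_kwh in energy_kwh_by_hour.items():
        total_g += energy_kwh * intensity_g_per_kwh_by_hour[hour]
    return total_g


# Illustrative values only: a three-hour job and the grid intensity per hour.
energy = {"10:00": 0.5, "11:00": 0.75, "12:00": 0.25}
intensity = {"10:00": 400.0, "11:00": 360.0, "12:00": 320.0}

total = time_resolved_emissions(energy, intensity)
print(f"time-resolved estimate: {total:.1f} gCO2eq")
```

The sketch deliberately fails with a `KeyError` when an intensity value is missing, which mirrors the data availability problem described above.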
For day-to-day operations, some substitute metrics have therefore emerged that are sufficient for a first insight.
Energy consumption can be used to approximate the footprint, given an averaged carbon intensity for the site. Monitoring purely by cost runs the risk that, even with a constant instance type and configuration, an aberrant price increase will distort the metric.
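The energy-based approximation boils down to a single multiplication. The following is a minimal sketch with a hypothetical function name; the sample values are taken from the CarbonTracker output shown later in this article.

```python
def estimate_emissions_g(energy_kwh: float, avg_carbon_intensity_g_per_kwh: float) -> float:
    """Approximate emissions in gCO2eq from energy use and an averaged carbon intensity."""
    if energy_kwh < 0 or avg_carbon_intensity_g_per_kwh < 0:
        raise ValueError("energy and carbon intensity must be non-negative")
    return energy_kwh * avg_carbon_intensity_g_per_kwh


# 0.040940 kWh at 107.49 gCO2/kWh, the figures from the CarbonTracker sample log
print(f"{estimate_emissions_g(0.040940, 107.49):.2f} gCO2eq")
```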
In fact, the step of recording emissions is so fundamental to the emerging topic of Green AI that a number of emissions calculators are already available. The first companies are incorporating them into self-service systems. Business users, for example, become aware of the emissions caused when they select a time series forecasting model.
Developers can either integrate an emissions calculator as a program library or use a web-based variant.
Machine Learning Emissions Calculator
The web-based Machine Learning Emissions Calculator emerged from a research project and has been around since 2020. It uses public data sources to obtain current values for the energy consumption of the hardware and the energy mix of the site. The calculator is suitable for a first projection: differences between hardware types and cloud locations are easy to make visible. A calculation for your own infrastructure can also be specified.
Machine Learning CO2 Impact Calculator: The web-based Machine Learning Emissions Calculator requires only a few pieces of information to make an initial projection.
CodeCarbon
CodeCarbon is a Python library that determines the footprint of software. It can also quantify execution on a personal computer. The publicly available package is easy to use and writes a CSV file with the measured CO2 emissions. An API that tracks the emissions over time is currently available in an alpha version. The emissions are also converted to kilometers driven to make external communication easier. CodeCarbon is recommended because it integrates easily into existing code.
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()
tracker.start()
# code to track
tracker.stop()
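The CSV output lends itself to simple reporting. The sketch below sums emissions per project from such a file using only the Python standard library. It assumes the file contains the columns `project_name`, `emissions` (kg CO2eq), and `energy_consumed` (kWh); check the header of your own file, since column names may differ between CodeCarbon versions. The function name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def summarize_emissions(csv_file):
    """Aggregate emissions (kg CO2eq) and energy (kWh) per project from CSV rows.

    Assumes the columns `project_name`, `emissions`, and `energy_consumed`.
    """
    totals = defaultdict(lambda: {"emissions_kg": 0.0, "energy_kwh": 0.0})
    for row in csv.DictReader(csv_file):
        project = totals[row["project_name"]]
        project["emissions_kg"] += float(row["emissions"])
        project["energy_kwh"] += float(row["energy_consumed"])
    return dict(totals)


# Made-up sample rows standing in for a real CodeCarbon output file
sample = io.StringIO(
    "project_name,emissions,energy_consumed\n"
    "forecasting,0.0020,0.0050\n"
    "forecasting,0.0030,0.0075\n"
    "nlp,0.0100,0.0250\n"
)

for name, values in summarize_emissions(sample).items():
    print(f"{name}: {values['emissions_kg']:.4f} kg CO2eq, {values['energy_kwh']:.4f} kWh")
```

With a real file, pass an opened file object (for example `open(path, newline="")`) instead of the in-memory sample.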
CarbonTracker
For the development of deep learning models, the CarbonTracker library is a good way to estimate the footprint. It measures the energy consumption of a given number of epochs and projects it onto the full training run. The result is printed directly in the program output.
CarbonTracker: The following components were found: CPU with device(s) cpu:0.
CarbonTracker: Carbon intensity for the next 2:59:06 is predicted to be 107.49 gCO2/kWh at detected location: Copenhagen, Capital Region, DK.
CarbonTracker: Predicted consumption for 1000 epoch(s):
    Time: 2:59:06
    Energy: 0.040940 kWh
    CO2eq: 4.400445 g
    This is equivalent to:
    0.036549 km traveled by car
CarbonTracker: Finished monitoring.
https://github.com/lfwa/carbontracker
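The projection idea can be illustrated without the library. The following sketch is not CarbonTracker's implementation; it assumes a simple linear extrapolation from the average energy of a few measured epochs, and the function name and the per-epoch measurements are hypothetical. The carbon intensity is taken from the sample log above.

```python
def project_training_footprint(measured_epoch_energy_kwh, total_epochs, carbon_intensity_g_per_kwh):
    """Linearly extrapolate energy (kWh) and emissions (gCO2eq) to a full training run."""
    if not measured_epoch_energy_kwh:
        raise ValueError("at least one measured epoch is required")
    mean_energy = sum(measured_epoch_energy_kwh) / len(measured_epoch_energy_kwh)
    energy_kwh = mean_energy * total_epochs
    return energy_kwh, energy_kwh * carbon_intensity_g_per_kwh


# Hypothetical energy readings for three measured epochs, projected to 1000 epochs
energy_kwh, co2_g = project_training_footprint([4.0e-5, 4.2e-5, 4.1e-5], 1000, 107.49)
print(f"Energy: {energy_kwh:.4f} kWh, CO2eq: {co2_g:.2f} g")
```

A linear projection is only a rough guide, since it assumes every epoch costs about as much energy as the measured ones.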
EnergyVis
Based on CarbonTracker, EnergyVis provides a dashboard for live tracking of emissions. The downside is that the current version only covers regions in the US; regions in Europe are to be added in further iterations. A live demo can be viewed at https://poloclub.github.io/EnergyVis/.
Capturing the footprint of machine learning models is an important cornerstone for creating awareness of environmental sustainability in the data science domain. In principle, initial steps can be taken without quantifying the impact, but it then becomes harder to communicate their success.
Many libraries and web-based tools are already available for developers to estimate the footprint. Embeddable libraries are particularly suitable for automation.
Do you have further questions about the sustainable design of your data science area? We will be happy to advise you on possible steps and support you in the implementation. Please contact us.