Feature Stores in Machine Learning Architectures

As more and more business intelligence use cases rely on machine learning (ML) models to support advanced analytics, operating these models in a reliable and scalable framework becomes a cornerstone of data teams’ work. An emerging logical component of such a framework is the feature store, which bridges data sources and model development. Where this bridge has previously been a socio-technical interface between data warehouse teams and data scientists, introducing an actual technological component to harmonize data usage in ML development may greatly improve efficiency.

This article will introduce the concept of the feature store and highlight the promises it holds. You will learn whether your organization could benefit from a feature store or not. 

[Figure: Feature store overview]

The feature store acts as an additional abstraction layer between data sources and data scientists. Source systems in this context may refer to already well-defined and curated data warehouse or data lake ecosystems. Data retrieved from these systems for machine learning applications is subject to a feature engineering process, usually applied by the data scientist working on an ML model. Feature engineering adds further transformation and cleaning steps to the data retrieved from a warehouse or data lake to meet the syntactic and semantic demands of the chosen machine learning algorithms. This may be as simple as querying aggregates from the right data warehouse tables, or it may require immense effort when data from multiple (internal and external) sources has to be merged, complex aggregates have to be calculated, normalizations applied, and so on.

[Figure: Typical example of feature engineering steps in a usual ML project with three sources. The result will be stored in a feature store for direct access instead of repeating those steps.]
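
To make the steps in the figure more concrete, here is a minimal sketch of such a feature engineering job, assuming three hypothetical sources (a CRM extract, ERP sales orders, and a web-tracking export) that are merged into customer-level features with pandas; all file, table, and column names are purely illustrative:

```python
import pandas as pd

# Hypothetical source extracts -- in practice these would be queried from a
# data warehouse, data lake, or external API.
crm = pd.read_parquet("crm_customers.parquet")        # customer master data
sales = pd.read_parquet("erp_sales_orders.parquet")   # order line items
web = pd.read_parquet("web_tracking_events.parquet")  # click stream export

# Aggregate raw records to one row per customer.
sales_agg = (
    sales.groupby("customer_id")
    .agg(total_revenue=("net_amount", "sum"),
         order_count=("order_id", "nunique"))
    .reset_index()
)
web_agg = (
    web.groupby("customer_id")
    .agg(sessions_30d=("session_id", "nunique"))
    .reset_index()
)

# Merge the three sources into a single feature table.
features = (
    crm[["customer_id", "segment", "signup_date"]]
    .merge(sales_agg, on="customer_id", how="left")
    .merge(web_agg, on="customer_id", how="left")
)

# Typical cleaning and normalization steps: fill missing values, derive ratios.
features[["total_revenue", "order_count", "sessions_30d"]] = (
    features[["total_revenue", "order_count", "sessions_30d"]].fillna(0)
)
features["revenue_per_order"] = (
    features["total_revenue"] / features["order_count"].clip(lower=1)
)

# Persist the result so a feature store ingestion stage can pick it up.
features.to_parquet("customer_features.parquet", index=False)
```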

As more ML models are established in the enterprise, the same features are used more and more often. Customer features, for example, can feed customer segmentation for targeted marketing, incoming payment forecasts, or recommendation systems. The complexity of the feature engineering steps multiplies when they have to be applied in multiple ML workflows: not only does the computational cost increase, but the resulting duplication of code invites errors, divergence of processes, and increased maintenance effort.

The feature store aims to reduce these risks and efforts. All feature engineering steps are decoupled from model development and moved to a dedicated feature store ingestion stage. This centralization is supplemented by explicit versioning of the features, so that the same features can be reused when retraining a model while new versions (created, for example, by adjusting the fill value for missing data) can be tried out flexibly at the same time. Data scientists only have to query the feature store API to retrieve input for their work in the required syntactic and semantic form, e.g. loading data directly as a DataFrame object instead of downloading and reading from a file first. Apart from decoupling and streamlining the development process, this enables reuse and usage tracking of features. Validation may be included in the feature ingestion stage so that only feature data that meets the necessary quality criteria is served.
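
What querying the feature store API can look like in practice depends on the product in use; with the open-source feature store Feast, for example, loading features directly as a DataFrame might look roughly like the following sketch (the repository path, feature view, and feature names are assumptions for illustration):

```python
import pandas as pd
from feast import FeatureStore

# Point the client at an existing Feast repository (path is an assumption).
store = FeatureStore(repo_path="feature_repo/")

# Entity dataframe: which customers, and at which point in time, features are needed for.
entity_df = pd.DataFrame({
    "customer_id": [1001, 1002, 1003],
    "event_timestamp": pd.to_datetime(["2022-01-31"] * 3),
})

# Retrieve a point-in-time correct feature set directly as a DataFrame,
# instead of downloading and reading files manually.
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "customer_features:total_revenue",   # illustrative feature view and columns
        "customer_features:order_count",
        "customer_features:revenue_per_order",
    ],
).to_df()
```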

Feature Store Functionality

Introducing a feature store into the machine learning workflow promises to decouple and disentangle the data flow between source systems and the model building/serving stages. Additionally, feature stores offer distinct functionality, not typically provided by common source systems, that further improves the efficiency of the process:

  • Integration of online and offline data sources
  • Training data management
  • Feature registration and discovery
  • Feature versioning and “time travel” functionality
  • Tracking of data lineage


Online/Offline Source Integration

The feature store provides a single point of contact for downstream data consumers like machine learning models. The task of collecting data items from different source systems is hidden from the data scientist behind the feature store API. The feature store can internally split between low-latency “online” sources and bulk data “offline” sources to optimize performance. If the newest data items from a streaming source are required for a downstream application, they can be pulled directly from the message queue system, aggregated with the most recent items from the online storage, and delivered in near real time. If a large historical set of data items is required, it can be loaded from offline sources such as data objects in a data lake system, aggregated, and delivered as a single batch in due time. The complexity of joining semantically identical data from different sources is no longer of any concern for the data scientist preparing the model.
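
As a hedged sketch of how this split can surface in a client API, Feast, for example, distinguishes between online and offline retrieval calls (feature and entity names are again illustrative):

```python
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path="feature_repo/")

# "Online" path: low-latency lookup of the freshest feature values,
# e.g. to score a single customer in a near real-time application.
online_values = store.get_online_features(
    features=["customer_features:total_revenue", "customer_features:order_count"],
    entity_rows=[{"customer_id": 1001}],
).to_dict()

# "Offline" path: bulk retrieval of historical values for many customers,
# served from the offline store (e.g. data lake objects) as a single batch.
entity_df = pd.DataFrame({
    "customer_id": [1001, 1002, 1003],
    "event_timestamp": pd.to_datetime(["2022-01-31"] * 3),
})
historical_df = store.get_historical_features(
    entity_df=entity_df,
    features=["customer_features:total_revenue", "customer_features:order_count"],
).to_df()
```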

Training Data Management

A feature store may provide an API endpoint to join feature sets into training datasets internally. A training dataset managed by the feature store engine, together with pre-calculated splits into test and validation subsets, can be retrieved directly by a downstream consumer. Data scientists working with a feature store that handles training data like this can offload another recurring task from the actual model development jobs. The same training dataset can be reused in an arbitrary number of downstream jobs.
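
A minimal sketch of what this can look like for the data scientist, assuming the joined feature set is retrieved from the feature store and then split locally with scikit-learn; feature stores with built-in training data management would return the pre-calculated subsets instead (all names are illustrative):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from feast import FeatureStore

store = FeatureStore(repo_path="feature_repo/")

# Entities and labels for which the training dataset should be assembled
# (hypothetical file with customer_id, event_timestamp, and a label column).
entity_df = pd.read_parquet("churn_labels.parquet")

# The feature store joins the requested feature sets into one training dataset.
dataset = store.get_historical_features(
    entity_df=entity_df,
    features=["customer_features:total_revenue", "customer_features:order_count"],
).to_df()

# Split locally; stores that manage training datasets can return
# pre-calculated train/test/validation subsets directly instead.
train_df, test_df = train_test_split(dataset, test_size=0.2, random_state=42)
```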

Feature Registration and Discovery

Depending on the feature store software, user interfaces and API endpoints for self-service feature registration and feature discovery can be part of the system. Feature or data discovery is an integral part of data scientists’ work. Depending on the maturity of source systems or (meta)data catalog systems, it can also be a time-consuming and expensive task. A feature store with a rich user interface may enable fast discovery of the right data items for a new machine learning project or help to point out missing items. If feature registration is offered as self-service, data scientists could be encouraged to create the necessary items themselves without waiting for data engineering or data warehouse teams to implement pipelines or views.
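
As an illustration of self-service registration, defining and registering a feature set with Feast could look roughly like this; the entity, view, and source definitions are assumptions, and the exact classes and parameters vary between versions:

```python
from datetime import timedelta
from feast import Entity, FeatureStore, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Entity and source definitions (names and paths are illustrative).
customer = Entity(name="customer", join_keys=["customer_id"])

source = FileSource(
    path="customer_features.parquet",
    timestamp_field="event_timestamp",
)

customer_features = FeatureView(
    name="customer_features",
    entities=[customer],
    ttl=timedelta(days=30),
    schema=[
        Field(name="total_revenue", dtype=Float32),
        Field(name="order_count", dtype=Int64),
    ],
    source=source,
)

# Register the definitions so they become discoverable for other projects.
store = FeatureStore(repo_path="feature_repo/")
store.apply([customer, customer_features])
```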

Feature Versioning / Time Travel

Acquiring different chronological snapshots of the same data can be necessary for appropriate training and evaluation of machine learning models. Feature stores combine every registered value with a timestamp and can provide comprehensive versioning of all data items. Retrieval of snapshots is generally a simple matter of including an additional timestamp parameter when querying data from the API.
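
With Feast, for instance, the timestamps passed along with the entities act as that parameter: a point-in-time join returns the feature values that were valid at each requested timestamp. A minimal sketch with the illustrative names from the previous examples:

```python
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path="feature_repo/")

# Two snapshots of the same customer, one month apart: the timestamps in the
# entity dataframe select which historical feature version is returned.
entity_df = pd.DataFrame({
    "customer_id": [1001, 1001],
    "event_timestamp": pd.to_datetime(["2021-12-31", "2022-01-31"]),
})

snapshots = store.get_historical_features(
    entity_df=entity_df,
    features=["customer_features:total_revenue"],
).to_df()
# Each row now contains the feature value as it was valid at that timestamp.
```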

Data Lineage Tracking

Based on the feature versioning mechanisms, feature stores can collect, track, and sometimes visualize data lineage. Lineage across the entire data engineering and machine learning pipeline may be crucial for building reliable and trusted data-driven applications. Feature stores typically come equipped with the necessary API endpoints to retrieve lineage metadata for registered items, e.g. when the respective feature version was created, retrieved, or combined into a training dataset.
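
Concrete lineage APIs differ widely between products; purely as a hypothetical illustration (the endpoint and field names below are invented for this sketch and do not refer to a specific feature store), querying lineage metadata for a registered feature could look like this:

```python
import requests  # hypothetical REST-style lineage endpoint, not a specific product's API

def get_feature_lineage(store_url: str, feature: str) -> dict:
    """Retrieve lineage metadata (creation time, source tables, downstream
    training datasets) for a registered feature from a hypothetical endpoint."""
    response = requests.get(f"{store_url}/lineage/{feature}", timeout=10)
    response.raise_for_status()
    return response.json()

lineage = get_feature_lineage("https://feature-store.example.internal",
                              "customer_features.total_revenue")
print(lineage["created_at"], lineage["source_tables"], lineage["used_in_datasets"])
```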

Feature Stores - Our Conclusion

Feature stores aim to further harmonize the machine learning development and operations processes. As more and more models find their way into business applications, the additional logical feature layer increases data teams’ efficiency by enabling reuse and easy reproduction of tasks.

The feature store can be an important component in a machine learning operations framework and contribute significantly to an overall advanced analytics self-service infrastructure. It can help to declutter model development workflows and provide data engineering and data science teams with a well-defined interface that improves collaboration. Introducing a new layer of abstraction comes at the cost of increased system complexity and maintenance, though. The decision whether to invest in a dedicated feature store should therefore be well considered, and the actual pain points of your organization's current architecture carefully weighed against the possible benefits.

We will focus on the topic of feature stores again next week with an overview of implementation options and available software solutions in the market.

Do you have further questions about feature stores as an addition to your data science infrastructure or do you need an implementation partner? We are happy to support you from problem analysis to technical implementation. Get in touch with us today!


Markus

Markus has been a Senior Consultant for Machine Learning and Data Engineering at NextLytics AG since 2022. With significant experience as a system architect and team leader in data engineering, he is an expert in microservices, databases and workflow orchestration - especially in the field of open source solutions. In his spare time he tries to optimize the complex system of growing vegetables in his own garden.
