Which feature store fits your machine learning framework?

In last week’s NextLytics blog article, we highlighted the common capabilities of feature store software and discussed the reasons to consider investing in such an additional component.

This week, we focus on the options your organization has for implementing a feature store and the arguments to weigh for each of them.

Recap

Let’s begin with a short recap of what a feature store is and how your organization or team might benefit from introducing one into your machine learning architecture.

The Feature Store Concept

The concept of dedicated feature stores has been around for a few years now. Large tech and internet companies have coined the term to describe their emergent solutions to a remaining blank spot on the data landscape right in the middle of the data processing chain. As of today, feature stores are reportedly an integral part of the infrastructure of successful data-driven companies and a cornerstone of scalable machine learning frameworks. Feature stores act as an interface between different data sources and machine learning development and operations frameworks. They can contribute to a more efficient feature engineering and data discovery process and free data scientists from repetitive tasks. 

 

Do you need a Feature Store?

Are your data scientists spending more time on data wrangling and feature engineering than actually developing and optimizing machine learning models? Are the same features prepared again and again for every AI application you operate? Do historical copies of feature data sets and training data sets pile up in a data lake? Are your data engineers launching more and more RESTful APIs to serve custom metrics to machine learning applications? Or does the number of custom tables created in an analytics database by your ML team grow exponentially?

If you can answer one or more of these questions with a definitive “yes”, you might want to investigate the possibility of introducing a dedicated feature store into your data science infrastructure.

Implementation Options

But what options are available for your organization if you decide to incorporate a feature store into your current data science architecture?

Three approaches recur in blog articles, whitepapers, and conference talks:

  1. Use an integrated solution from your machine learning framework vendor
  2. Adapt open source software to your custom infrastructure
  3. Build an in-house solution to solve specific needs

Integrated Product

Relying on the existing product of your established vendor is definitely the most convenient option. AWS, Databricks, and Microsoft Azure incorporate feature store components in their general “AI” frameworks, which come fully integrated with user interfaces and user management services. The same goes for most vendors that offer on-premises turnkey solutions based on proprietary software.

Integrated feature stores require little to no effort when first added to your machine learning architecture. Apart from some initial configuration, these solutions are ready for immediate use, and most of the work goes into defining workflow conventions among the users, probably your organization’s data engineering and data science teams. These quick wins may come at the expense of working around the product’s specific functionality or API, which is predefined and not necessarily a perfect fit for your current workflows and processes. New functionality may also arrive more slowly in these vendor-managed products, and a single customer has less influence on it than in an open source project.

Furthermore, these feature store solutions are off-limits if your organization is not already using the respective product and is not able or willing to migrate.

Open Source Software

Adopting an open source solution allows your organization to hand-pick the best suited solution, possibly testing multiple alternatives before deciding on the optimal fit. Dedicated open source solutions for feature stores are still few and far between, though.

Feast (Feature Store) was the first open source solution on the market and is still growing. At first glance, Feast comes across as an easy-to-use system that is well integrated with the Python data science stack and supports a diverse set of third-party service connections. With Azure building their feature store solution based on this project and willing to contribute to the code base, Feast looks very promising.

An alternative solution is the Hopsworks feature store, which comes integrated into a larger full-service machine learning operations framework. Built on top of proven big data technologies like Apache Spark and Apache Hudi, the Hopsworks solution is designed for large-scale workloads. This scalability comes at the cost of operational complexity, which makes it hard to spin up a local instance for a test ride if you don’t have access to at least a small Kubernetes cluster. The overall software is well accessible from both an API and a user interface perspective; good visualizations and navigation of the feature store in an intuitive web-based interface are a definite advantage compared to Feast.



In-house Development

If neither a fully integrated product nor an open source solution meets your specific requirements, building a custom in-house solution may be a viable alternative.

Most data-driven companies that have publicly discussed their feature store solutions chose to build one in-house. Building your own is a sizable upfront investment, since you need either qualified software engineers on your staff or the right development partner. In return, fitting the resulting system exactly to your established system context, processes, and strategy may result in significantly higher user satisfaction as well as lower change management effort and maintenance costs.

We have already implemented machine learning architecture solutions for our customers that re-used existing systems as a feature store. In these projects, we achieved exactly the functionality required by the data science team with little custom software development. In other use cases, implementing a lean RESTful API as a harmonization layer may serve all of a team’s needs with regard to optimizing the feature engineering stage.
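
To make the idea of re-using an existing system more tangible, here is a minimal sketch of such a harmonization layer. It is an illustration under assumptions, not one of the customer solutions mentioned above: SQLite stands in for the existing analytics database, and the class name `SqliteFeatureStore`, the table layout, and the feature names are all hypothetical.

```python
import sqlite3
from typing import Dict, List, Mapping


class SqliteFeatureStore:
    """Toy feature store on top of an existing SQL database (here: SQLite)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        # One narrow table keeps every feature value addressable by
        # feature group, version, entity, and feature name.
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS features (
                   feature_group TEXT, version INTEGER,
                   entity_id TEXT, feature TEXT, value REAL,
                   PRIMARY KEY (feature_group, version, entity_id, feature))"""
        )

    def register(self, group: str, version: int,
                 rows: Mapping[str, Mapping[str, float]]) -> None:
        """Insert or update feature values, keyed by entity id."""
        records = [
            (group, version, entity_id, name, value)
            for entity_id, feats in rows.items()
            for name, value in feats.items()
        ]
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT OR REPLACE INTO features VALUES (?, ?, ?, ?, ?)", records
            )

    def get(self, group: str, version: int,
            feature_names: List[str]) -> Dict[str, Dict[str, float]]:
        """Return the requested features per entity for one group version."""
        if not feature_names:
            return {}
        placeholders = ",".join("?" for _ in feature_names)
        cursor = self.conn.execute(
            "SELECT entity_id, feature, value FROM features "
            "WHERE feature_group = ? AND version = ? "
            f"AND feature IN ({placeholders}) ORDER BY entity_id",
            (group, version, *feature_names),
        )
        result: Dict[str, Dict[str, float]] = {}
        for entity_id, feature, value in cursor:
            result.setdefault(entity_id, {})[feature] = value
        return result
```

A feature engineering job would call `register("customer_metrics", 1, {...})` once, and every model that needs these features would call `get("customer_metrics", 1, [...])` instead of recomputing them. A lean RESTful API could expose the same two operations to other teams.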

Example: Integrating Feature Store and Apache Airflow

The following Python code snippet illustrates how simple interaction with a feature store can be, in this case using the Hopsworks feature store and its Python library “hsfs”. First, we create a feature group based on a Pandas DataFrame object:


Python Code Example 1: How to create a new feature group based on a Pandas DataFrame and upload it to the Hopsworks feature store.
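
For readers who want a text version to experiment with, the registration step can be sketched roughly as follows. This is our own reconstruction rather than the snippet from the figure. It assumes the `hsfs` 2.x client API (`hsfs.connection`, `get_feature_store`, `create_feature_group`, `save`), which may differ in other library versions. The helper names, host, project, and feature group names are hypothetical.

```python
def connect_feature_store(host: str, project: str, api_key: str):
    """Connect to a Hopsworks instance and return its feature store handle."""
    # Imported lazily so that the helper below can be tried without hsfs installed.
    import hsfs

    connection = hsfs.connection(host=host, project=project, api_key_value=api_key)
    return connection.get_feature_store()


def register_feature_group(fs, df, name: str, primary_key: list,
                           version: int = 1, description: str = ""):
    """Create feature group metadata and upload the DataFrame's rows to it."""
    feature_group = fs.create_feature_group(
        name=name,
        version=version,
        description=description,
        primary_key=primary_key,  # column(s) that identify one entity
    )
    feature_group.save(df)  # schema is taken from the DataFrame's columns
    return feature_group
```

In a feature engineering job, this might be called as `register_feature_group(connect_feature_store(host, project, api_key), customer_df, "customer_metrics", ["customer_id"])`, where `customer_df` is the prepared Pandas DataFrame.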

The example code for creating and registering features would be isolated into a dedicated feature engineering stage of our overall workflow. Retrieving the data from the feature store in the actual machine learning development boils down to three simple statements in the code, as shown in the second code snippet:

  1. Connecting to the store (lines 1-12)
  2. Creating pointer variables (15-16) 
  3. Retrieving the actual data that is already formatted correctly for downstream processing (19-24)

Python Code Example 2: How to join, filter, and retrieve data from a Hopsworks feature store.
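
The retrieval side can be sketched in the same spirit. Again, this is not the code from the figure, so the line numbers in the list above do not apply to it. It assumes the `hsfs` 2.x query API (`get_feature_group`, `select_all`, `join`, `filter`, `read`) and expects `fs` to be an already connected feature store handle. The function name and its parameters are hypothetical.

```python
def load_training_frame(fs, left_name: str, right_name: str, join_keys: list,
                        version: int = 1, condition=None):
    """Join two feature groups, optionally filter, and read the result."""
    # Pointer variables: nothing is read from the store at this stage.
    left = fs.get_feature_group(left_name, version=version)
    right = fs.get_feature_group(right_name, version=version)

    # Build the query lazily; the join is described, not yet executed.
    query = left.select_all().join(right.select_all(), on=join_keys)
    if condition is not None:
        # `condition` is a filter expression built by the caller.
        query = query.filter(condition)

    # Only this call retrieves data, returning a DataFrame for downstream use.
    return query.read()
```

Keeping the query construction separate from `read()` mirrors the three steps listed above: connect, create pointers, retrieve.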

In the full processing chain, creating, registering, uploading, and updating features would be extracted from the actual code that trains or applies a machine learning model. These extracted steps form a new layer of feature ingestion jobs. Feature engineering and ingestion can be managed by any capable ETL orchestration tool, for example an instance of Apache Airflow acting as the central ML workflow controller.

Using Apache Airflow to orchestrate a dedicated feature engineering stage in the machine learning development and operations workflow.

This example shows how a feature store can be integrated into an existing machine learning operations framework. This blueprint works with a specialized open source feature store software as well as with any in-house feature store implementation. It demonstrates how the benefits of a feature store component and dedicated feature engineering stage in the processing pipeline may be accessible for your organization regardless of what your current architecture looks like.
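
As a rough illustration of this blueprint, a two-stage Airflow workflow might look like the sketch below. It assumes Apache Airflow 2.x (`DAG`, `PythonOperator`, `schedule_interval`). The DAG id, task names, and placeholder task bodies are hypothetical and would be replaced by real feature store and training code.

```python
from datetime import datetime


def ingest_features() -> str:
    """Feature engineering stage: compute features and write them to the store."""
    # Placeholder body; a real task would call the feature store client here.
    return "features ingested"


def train_model() -> str:
    """Model stage: read prepared features from the store and fit a model."""
    # Placeholder body; a real task would read features and train here.
    return "model trained"


def build_dag():
    """Assemble the two-stage DAG (requires Apache Airflow 2.x)."""
    # Imported lazily so the task functions stay usable without Airflow.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="feature_store_ml_workflow",  # hypothetical name
        start_date=datetime(2022, 3, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        ingest = PythonOperator(
            task_id="ingest_features", python_callable=ingest_features
        )
        train = PythonOperator(task_id="train_model", python_callable=train_model)
        # Training only starts after feature ingestion has succeeded.
        ingest >> train
    return dag
```

In a DAG file, assigning `dag = build_dag()` at module level lets the Airflow scheduler discover the workflow. Because the two tasks only communicate through the feature store, the same structure works with Hopsworks, Feast, or an in-house implementation.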

Conclusion

The feature store is one of the most promising trends in the data engineering and machine learning community. Platform service providers have recognized the potential and added integrated feature stores to their machine learning infrastructure portfolios. Open source software alternatives are available both for autonomous adoption by your infrastructure team and as managed services.

Dedicated feature store implementations are still maturing, though. To get the most out of the feature store concept, a custom in-house solution may be a viable quick-win option for your organization.

The success of introducing a feature store depends largely on precise analysis, requirements engineering, and a realistic evaluation of the risks and rewards of implementation choices. An experienced partner like NextLytics can provide valuable insights in the assessment of alternative solutions and guide you to the best decision for your organization.

Do you have further questions about feature stores as an addition to your data science infrastructure or do you need an implementation partner? 

We are happy to support you from problem analysis to technical implementation. Get in touch with us today!

Markus

Markus has been a Senior Consultant for Machine Learning and Data Engineering at NextLytics AG since 2022. With significant experience as a system architect and team leader in data engineering, he is an expert in micro services, databases and workflow orchestration - especially in the field of open source solutions. In his spare time he tries to optimize the complex system of growing vegetables in his own garden.
