Update from 31.10.2024 - Version 1.1:
Since we have received a particularly large number of inquiries regarding data integration, we have significantly expanded and detailed this area. These additions are outlined in the chapter “Sources and Inbound Integration”.
With SAP Datasphere gaining popularity in the market, we see an increasing demand for guidance on governance topics like space setup, layer architecture, and the self-service approach. SAP BW customers in particular are asking how they can design a future-proof BI landscape that makes use of the far broader possibilities Datasphere offers over a traditional data warehouse. After partnering with various organizations to guide and implement Datasphere systems with a variety of focus points, we have compiled our learnings into a comprehensive Datasphere Reference Architecture.
This architecture represents a superset of best practices and follows a modular design, allowing you to select the parts that fit your needs and make sense for your system landscape. In this blog, we aim to provide a high-level explanation of its individual components, with a section at the end that puts all those ideas together. Stay tuned for more detailed breakdowns of individual topic areas in future posts. We are also eager to share our best practices on other important Datasphere governance topics, such as developer guidelines and naming conventions, object- and row-level security concepts, and ETL and data integration strategy, at a later stage.
Key Components
Sources and inbound integration
First, we look at the integration options of SAP Datasphere: from source systems to various middleware and corresponding Datasphere ETL objects, to the table types into which these data sources are loaded.
Based on our project experience, we have carried out a structured evaluation of the integration options for each source system and mapped the most advisable option. The most powerful form of integration, replication flows with change data capture (CDC), is located at the top of the figure. The weakest, generic delta via data flows, is further down. The connection via Smart Data Integration (SDI) with the Data Provisioning Agent (DPA) as middleware sits between these two options. Although it also offers change data capture and is therefore preferable to data flows, it has proven less reliable in practice than replication flows, which are based on Data Intelligence (DI).
Some sources, such as REST APIs and event streaming, can currently only be connected to Datasphere via a middleware, such as SAP Integration Suite or SAP Open Connectors.
BW Bridge is depicted as a special case outside of this evaluation construct.
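The ranking described above can be sketched as a small lookup. This is only an illustration of the ordering, not an official capability matrix: the enum, the function name `pick_integration`, and the idea of passing in a set of supported options per source are all assumptions made for this example.

```python
from enum import IntEnum


class IntegrationOption(IntEnum):
    """Hypothetical ranking; a higher value means a more advisable option."""
    DATA_FLOW_GENERIC_DELTA = 1   # weakest: generic delta via data flow
    SDI_WITH_DPA = 2              # CDC available, but less reliable in practice
    REPLICATION_FLOW_CDC = 3      # strongest: replication flow with CDC


def pick_integration(supported):
    """Return the strongest option a source supports.

    Returns None when the source supports none of the direct options,
    which in this sketch stands for "connect via middleware".
    """
    return max(supported, default=None)


# Illustrative input only; real capabilities depend on the source system.
source_options = {
    IntegrationOption.SDI_WITH_DPA,
    IntegrationOption.DATA_FLOW_GENERIC_DELTA,
}
choice = pick_integration(source_options)
print(choice.name if choice else "middleware required")
```

An empty set models sources such as REST APIs or event streams, which according to the evaluation above currently need a middleware in between.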
Consumers & outbound integration
The Datasphere outbound process can be divided into four major categories:
- ODBC: The most basic yet reliable and universal technique. Third-party applications and clients can pull Datasphere data from the underlying HANA database. However, it does not support application layer-specific concepts such as semantics, associations, DACs, or non-relational objects like Analytic Models.
- OData/Native: Native connections, such as SAC and Excel add-in, are first-class citizens in terms of data consumption, supporting all functionalities. OData is theoretically almost on par but lacks good out-of-the-box solutions (e.g. Power BI + Datasphere OData limitations).
- Premium Outbound Integration: Allows replication flows to push data to various target systems. This method comes with high additional costs.
- Other: Any scenario (e.g. REST) which is not covered by the other three categories has to be implemented with the help of a middleware.
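To make the four categories more tangible, here is a minimal sketch that assigns a consumer to one of them. The `Consumer` fields and the order in which the checks run are assumptions chosen for illustration; in a real project the decision also depends on cost and on which features the consumer needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consumer:
    """Hypothetical description of a downstream system."""
    name: str
    is_native_client: bool = False        # e.g. SAC or the Excel add-in
    speaks_odata: bool = False
    needs_push_replication: bool = False  # data must be pushed to the target
    speaks_odbc: bool = False


def outbound_category(consumer: Consumer) -> str:
    """Map a consumer to one of the four outbound categories."""
    if consumer.is_native_client or consumer.speaks_odata:
        return "OData/Native"
    if consumer.needs_push_replication:
        return "Premium Outbound Integration"
    if consumer.speaks_odbc:
        return "ODBC"
    # Anything else (e.g. REST) has to go through a middleware.
    return "Other (middleware)"


print(outbound_category(Consumer("sac", is_native_client=True)))
print(outbound_category(Consumer("third_party_bi", speaks_odbc=True)))
print(outbound_category(Consumer("rest_app")))
```

Keep in mind that a consumer classified as ODBC only sees relational data: semantics, associations, DACs, and Analytic Models are not available on that path.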
Simple Layer Architecture
For the Layer setup we follow a minimal approach with three layers:
Inbound Layer (IL):
- data ingestion from various sources
- mostly 1-1 with slight enhancements like a source system field or a load time field
- obligatory persistence layer (local/remote tables)
Propagation Layer (PL):
- harmonizations between different sources
- semantic enrichment & data access controls happen here at the latest
- contains the bulk of expensive business logic and transformations
- optional second persistence layer (view persistence) depending on performance impact
Reporting Layer (RL):
- facilitates data consumption as the access layer for reporting clients and consumers
- limited modeling (only run-time relevant logic) and no persistence
- object type used depends on consumer system (Analytic Model or exposed View)
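The persistence rules of the three layers can be written down as data and checked automatically, for example as part of a design review. The following is a minimal sketch; the `Layer` class and the `persistence_allowed` helper are hypothetical names and not part of any Datasphere API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    code: str
    name: str
    persistence: str  # "required", "optional" or "none"


# Persistence rules as described for the three layers above.
LAYERS = (
    Layer("IL", "Inbound Layer", "required"),
    Layer("PL", "Propagation Layer", "optional"),
    Layer("RL", "Reporting Layer", "none"),
)


def persistence_allowed(layer_code: str, is_persisted: bool) -> bool:
    """Check whether an object's persistence setting fits its layer."""
    layer = next(l for l in LAYERS if l.code == layer_code)
    if layer.persistence == "required":
        return is_persisted
    if layer.persistence == "none":
        return not is_persisted
    return True  # "optional": both settings are acceptable


print(persistence_allowed("IL", is_persisted=True))
print(persistence_allowed("RL", is_persisted=True))
```

A check like this could run over an exported object list, assuming the layer code can be derived from each object's name or space.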
Space Concept - “less is more”
Datasphere Architects will often face a decision: Do we create a separate space for topic XYZ, or not?
This question is obviously nuanced, but to make it short - the two biggest factors you should consider are:
- Is a separate set of object level authorizations required?
- Does the workload need to be managed separately?
Barring any special cases (see below), if your answer is “no” to both, we recommend keeping it simple and not creating additional spaces, as they offer little benefit and increase the maintenance effort.
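Expressed as code, the rule of thumb is a simple boolean check. This is a sketch of the decision logic only; the function and parameter names are made up for illustration, and the `special_case` flag stands for technical exceptions such as the BW Bridge or ODBC spaces.

```python
def needs_separate_space(
    separate_object_authorizations: bool,
    separate_workload_management: bool,
    special_case: bool = False,
) -> bool:
    """Return True only if at least one of the deciding factors applies."""
    return (
        separate_object_authorizations
        or separate_workload_management
        or special_case
    )


# A topic that shares authorizations and workload with an existing space:
print(needs_separate_space(False, False))
# A team that requires its own object-level authorizations:
print(needs_separate_space(True, False))
```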
Following this “less is more” approach, we have compiled a generic representation of a common space setup across our client base. These are divided into two parts:
Generic Spaces:
Layer-agnostic spaces used for authorization, monitoring, and administration.
Main Spaces:
- includes a Central IT Space with data models for central reporting and IT-managed data products,
- a few special spaces with the BW Bridge Inbound Space (which is a technical requirement for a BW Bridge system) and an ODBC Consumption Space (which is a necessary workaround for a variety of ODBC use cases),
- as well as multiple Business Spaces representing varying self-service maturity levels and correspondingly featuring a varying depth of integration into the Simple Layer Architecture, with objects shared from the Central IT Space via the appropriate layer.
Self Service Maturity Model
To implement an effective self-service strategy for your organization, it is necessary to analyze your user base. Showcased here is a simple framework to do just that. We can group our users, for example by their respective functional business teams, and define three degrees of maturity into which these teams are classified. Each team gets its own version of these template spaces, with authorizations corresponding to its classification.
The “Central Reporting Consumers” are not actively working in the Datasphere system. They are solely interested in consuming the IT-managed central reporting in various frontend applications.
The “Self Service Modellers” want to work within Datasphere to enrich data models, build their own KPIs and combine data models in new ways.
The “Data Product Team” is responsible for providing their data models from A-Z. Only the initial integration into the Inbound Layer and monitoring of data loads is still handled by the central IT team.
Each classification is handled differently. The higher the maturity level, the more autonomy they receive organizationally, the more technical privileges they get in the system and the more responsibilities they have to provide a form of data product for the rest of the organization.
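One way to make the three classifications explicit is an ordered enum with a derived profile per level. The sketch below only encodes what is described above; the names `Maturity` and `team_profile` and the profile keys are hypothetical, and a real setup would map them to concrete Datasphere roles and space authorizations.

```python
from enum import IntEnum


class Maturity(IntEnum):
    """Hypothetical ordering of the three maturity levels."""
    CENTRAL_REPORTING_CONSUMER = 1
    SELF_SERVICE_MODELLER = 2
    DATA_PRODUCT_TEAM = 3


def team_profile(level: Maturity) -> dict:
    """Derive a simple responsibility profile from the maturity level."""
    return {
        # Consumers only use frontends and do not work in Datasphere itself.
        "works_in_datasphere": level >= Maturity.SELF_SERVICE_MODELLER,
        # Modellers and data product teams enrich and combine data models.
        "builds_own_models": level >= Maturity.SELF_SERVICE_MODELLER,
        # Only data product teams provide their data models from A to Z.
        "owns_data_product": level is Maturity.DATA_PRODUCT_TEAM,
        # Inbound integration and load monitoring stay with central IT.
        "central_it_handles_inbound": True,
    }


for level in Maturity:
    print(level.name, team_profile(level))
```

Using an ordered enum keeps the idea that each level includes the autonomy of the one below it, which matches the "higher maturity, more privileges and responsibilities" principle.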
This whole process is less a technical challenge within Datasphere and more an organizational and change management effort. It could simply involve an analysis of the status quo, or it could be part of a larger data democratization strategy to reach a to-be goal in terms of self-service ability. It could also involve different classifications and responsibilities than shown here, depending on your organization's user base and goals.
In any case the Datasphere Architecture has to consider these circumstances.
Datasphere Reference Architecture - Putting it all together
By combining these core ideas of:
- a simple layer architecture,
- a “less is more” approach to a space concept,
- a self service strategy that considers user base maturity classifications
- and how various types of upstream and downstream systems interact with the core architecture,
we provide a modular framework to help your organization drive a future-proof Datasphere implementation that supports your overall data strategy.
Stay tuned for detailed blogs and deep dives on individual areas of the architecture. Let us know your thoughts and questions, as well as what areas you are interested in hearing more about.
Would you rather exchange thoughts in person? No problem, just follow the link: https://www.nextlytics.com/meetings/irvin-rodin
We look forward to exchanging ideas with you!