SAP and its metadata
In a recent interview with The Wall Street Journal, Arvind Krishna, IBM’s senior vice president of cloud and cognitive software, said “data-related challenges are a top reason IBM clients have halted or cancelled artificial-intelligence (AI) projects.”
Much of this results from the nature of SAP and its metadata. Without an easy and fast way to access and exploit the SAP data model, projects can overrun, falter and sometimes fail to deliver as data teams struggle to understand the meaning of the metadata.
Customers often comment to us that they anticipated that the data model would be made easily available. For example, after they had requested a copy of the SAP data model to use with their SAP Business Objects project, the Business Intelligence team at Hydro Tasmania were told that it was ‘not available’.
We hear about similar challenges faced by data professionals engaged in data catalog, data governance, master data and data migration projects involving SAP. In fact, SAP provide nothing to help data analysts navigate the metadata, even though it is critical to their effectiveness.
There are 5 reasons why it is so difficult to access and make sense of the SAP data model. There is also an alternative to the time-consuming, expensive and often inaccurate methods used to find and use SAP metadata, which I will introduce later.
Size of SAP data model
A typical SAP ECC system has over 90,000 base tables. Trying to navigate this number of tables without a dedicated discovery tool is impractical and time-consuming. Many of these tables have hundreds of attributes and, while many may not be in use, they are still included in the overall model. The resulting data model is huge and defies any attempt by data analysts to make sense of it. SAP BW is also large and complex, meaning it can take a long time to understand its structures.
If you were to look at SAP ECC in terms of an application hierarchy which breaks the data model down into its application components, function groups, programs etc., we have calculated that this would have over 3 million nodes.
Even if it were practical to use a data modeling tool to reverse engineer the metadata from the database, the model would be too large to navigate.
Finally it is clearly not feasible to print a copy of the entire data model.
To add to the problem associated with the size of the data model, the Database System Catalogue does not provide business descriptions for tables or columns. In addition, there are no primary and foreign key relationships between the tables defined there. This means that unless someone happens to know how tables are related, you will spend a lot of time and effort trying to figure it out. Knowing, for example, how tables relating to “customer” are linked to “order” is critical.
This means that even if you can connect to the database or reverse engineer it with a standard data modeling tool, the information gleaned is of very limited use. Rich metadata does exist; it is just not easily available.
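To make the gap concrete, here is a minimal sketch in Python. It assumes you have somehow obtained two small extracts: the physical table and column names a catalogue scan would return, and the descriptions and relationships held in the application's own data dictionary. The dictionaries below are hand-written illustrative samples, not output from a real system, and the helper names `describe` and `related_tables` are hypothetical.

```python
# Illustrative sample data only; a real extract would be far larger.

# What a database catalogue scan typically yields: physical names, no meaning.
catalog_columns = {
    "KNA1": ["MANDT", "KUNNR", "NAME1"],
    "VBAK": ["MANDT", "VBELN", "KUNNR"],
}

# Assumed extract of business descriptions from the application data dictionary.
table_descriptions = {
    "KNA1": "General Data in Customer Master",
    "VBAK": "Sales Document: Header Data",
}

# Assumed extract of relationships: (child table, linking field) -> parent table.
relationships = {
    ("VBAK", "KUNNR"): "KNA1",
}


def describe(table):
    """Return the business description, or a placeholder if none is known."""
    return table_descriptions.get(table, "(no description available)")


def related_tables(table):
    """Return (other table, linking field) pairs in either direction."""
    links = []
    for (child, field), parent in relationships.items():
        if child == table:
            links.append((parent, field))
        elif parent == table:
            links.append((child, field))
    return sorted(links)


for table in catalog_columns:
    print(table, "|", describe(table), "|", related_tables(table))
```

With only `catalog_columns`, nothing tells an analyst that KNA1 holds customers or that VBAK links to it through KUNNR. Both facts come from the dictionary extracts.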
It also means that if the vendors who sell information and data management tools for tasks such as ETL, Data Migration, Master Data, Governance etc. are proposing to connect to your SAP system, the question to ask is how do they know which tables to connect to?
They might suggest the use of templates, however unless you can be sure that their template exactly matches how your SAP system has been implemented you will still need to do a significant amount of work to compare the two versions of SAP.
Often these vendors will expect your staff to know where the data is within SAP for a specific task, whilst you might expect them to have that knowledge.
Obviously SAP is designed and built to fulfil a broad range of business processes and functions. This explains the large data model and also accounts for its complexity. In fact the model is so intricate as to be almost impenetrable without specialist knowledge.
This is why organisations routinely have to employ expensive consultants or re-task scarce internal specialist technical staff to assist their data projects. The result is that control over when and how metadata discovery is performed, together with its cost and effectiveness, is passed to resources often beyond the control of the project.
As new data requirements come on stream, for example for data quality or governance (e.g. GDPR or CCPA), understanding the SAP data model becomes more important. As an illustration, for GDPR it is necessary to know where you store Personal Data so that you can be certain of data lineage, respond to requests and understand how data is used and shared.
In many systems this is probably quite straightforward. The tables have names like “customer” and attributes are also easy to understand such as “firstname”, “telephone”, “birth” etc.
But doing that in SAP is much more difficult. Say, for instance, that you needed to find every attribute containing the string “birth” so that you can establish where that data is held and used. To start with, of course, that information is not in the RDBMS System Catalogue.
Therefore any scanning or modeling tools which only look there for their information will be of no value, and so it will be necessary to fall back on traditional manual methods for discovery. For reference, in our SAP system “birth” appears in over 300 tables.
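The difference between searching physical names and searching business descriptions can be sketched as follows. This is a minimal illustration in Python, assuming the field descriptions have already been extracted from the application's data dictionary into a simple list. The sample rows are illustrative and the function names are hypothetical.

```python
from typing import NamedTuple


class FieldInfo(NamedTuple):
    table: str
    field: str
    description: str


# Illustrative sample rows; a real extract would contain many thousands.
SAMPLE_FIELDS = [
    FieldInfo("PA0002", "GBDAT", "Date of Birth"),
    FieldInfo("PA0002", "GBORT", "Birthplace"),
    FieldInfo("KNVK", "GBDAT", "Date of birth of contact person"),
    FieldInfo("KNA1", "NAME1", "Name 1"),
]


def find_by_name(fields, term):
    """Search physical field names only, as a catalogue scan would."""
    needle = term.casefold()
    return [f for f in fields if needle in f.field.casefold()]


def find_by_description(fields, term):
    """Search the business descriptions held in the data dictionary."""
    needle = term.casefold()
    return [f for f in fields if needle in f.description.casefold()]


def tables_containing(fields, term):
    """Distinct tables with at least one field whose description matches."""
    return sorted({f.table for f in find_by_description(fields, term)})


print(find_by_name(SAMPLE_FIELDS, "birth"))       # no physical name matches
print(tables_containing(SAMPLE_FIELDS, "birth"))  # tables found via descriptions
```

In this sample, the name search finds nothing because none of the physical field names contain “birth”, while the description search locates the relevant tables.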
Virtually all SAP systems have had customisations made to their data model. This means that any reference model or templates are of limited value, as they may not represent how the application has actually been implemented. Precious time is also taken up comparing what has actually been implemented with the baseline model.
The customisations can add significantly to the problem. For instance we know of one SAP customer whose data model has expanded to 117,000 tables.
In addition, as the data model for an SAP implementation goes through different iterations over time, it is important to be able to keep a record of those changes. This is potentially critical for data governance, reporting and analytics and master data. Without a mechanism to compare partial or whole SAP data models accurately and quickly using technology, this is another time-consuming and costly task with potentially inaccurate outcomes.
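As a rough illustration of what such a comparison involves, the sketch below diffs two snapshots of a data model in Python. It assumes each snapshot has been exported as a mapping of table name to field names. The function `diff_models` is hypothetical, and `ZCUSTSEG` and `ZZREGION` are made-up examples of a custom table and a custom field.

```python
def diff_models(old, new):
    """Compare two snapshots of the form {table name: [field names]}."""
    old_tables, new_tables = set(old), set(new)
    changed = {}
    for table in sorted(old_tables & new_tables):
        added = set(new[table]) - set(old[table])
        removed = set(old[table]) - set(new[table])
        if added or removed:
            changed[table] = {
                "fields_added": sorted(added),
                "fields_removed": sorted(removed),
            }
    return {
        "tables_added": sorted(new_tables - old_tables),
        "tables_removed": sorted(old_tables - new_tables),
        "tables_changed": changed,
    }


# Illustrative snapshots: a baseline and a later, customised version.
baseline = {
    "KNA1": ["MANDT", "KUNNR", "NAME1"],
    "VBAK": ["MANDT", "VBELN", "KUNNR"],
}
current = {
    "KNA1": ["MANDT", "KUNNR", "NAME1", "ZZREGION"],  # made-up custom field
    "VBAK": ["MANDT", "VBELN", "KUNNR"],
    "ZCUSTSEG": ["MANDT", "SEGID"],                   # made-up custom table
}

report = diff_models(baseline, current)
print(report)
```

The same function could compare a vendor template against an implemented system, or two versions of the same system taken at different times.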
Whilst there are some SAP tools which have capabilities for finding metadata, these are reserved for use by technical specialists rather than data analysts or architects.
Neither SAP nor any of the main Information or Data Management software vendors provide tools for SAP metadata discovery which are flexible enough or simple enough to be used by data specialists. This is also the case for Oracle, Salesforce and Microsoft packaged applications.
The impact of this is that much time and money is spent doing what should be a relatively straightforward task using predominantly long-winded and costly manual methods and techniques instead of exploiting technology.
What you can do
Silwood Technology believe that since you have paid so much to acquire, implement and manage these applications, you should be able to have access to the data model that underpins them in a practical and consumable way. We also believe that you should be able to deliver data and information projects which include data from these systems more quickly and effectively.
Our software, Safyr, gives your data team back control over how and when they make use of SAP metadata. They can use Safyr to extract rich business metadata from the application’s data dictionary, then use its search and analysis capabilities to locate the relevant tables quickly and easily and save them as subject areas which represent the business artefacts they need. They can then use those results with a variety of information platforms such as SAP PowerDesigner, Collibra, Informatica EDC, ASG Data Intelligence, Erwin and ER/Studio, and in a variety of other technical formats.