
Part 3: The Challenges of Metadata Discovery

If your organisation is planning a data catalog, a governance platform, an AI initiative, or an analytics project, one of the first tasks you will face is building a comprehensive picture of the metadata held in the business applications involved. In principle, this sounds straightforward. In practice, it is frequently one of the most underestimated challenges in the entire project.
Infographic showing metadata discovery project.

This series is written for data professionals working on governance, analytics, data catalog, migration, and integration projects, particularly those whose organisations run ERP applications from SAP, Microsoft, Salesforce, or Oracle.

Each part can be read independently, but together they trace a path from the fundamentals of metadata through to the practical challenges of working with large, complex enterprise systems.


Contents

Part 1: Metadata Discovery and Data Intelligence

Part 2: Using Metadata with Data Intelligence Projects

Part 3: The Challenges of Metadata Discovery

Part 4: SAP and ERP Metadata: A Suitable Case for Treatment?

Part 5: Using Safyr for SAP and ERP Metadata Discovery

Before we get into the detail, here are four practical recommendations worth considering before you begin talking with vendors:

  1. Compile a comprehensive list of the applications that any platform will need to support before you start vendor conversations. Include those that may not be in the first or second phase of implementation.
  2. Take time to make that list as complete as possible, thinking through the use cases each application supports and the data it holds.
  3. Ask each vendor to confirm and demonstrate that their solution can connect to and collect both business and technical metadata from every source on your list.
  4. For data catalog, governance and lineage projects especially, ask each vendor how their solution handles large volumes of source metadata, and specifically what facilities exist to select and curate relevant subsets. Loading tens or hundreds of thousands of uncurated data assets into a catalog makes it significantly less useful to its users.
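Recommendations 1 to 3 amount to comparing your list of sources against what each vendor has confirmed and demonstrated, which can be treated as a simple set difference. This is a minimal Python sketch; the source names, the "technical"/"business" labels and the vendor claims are hypothetical placeholders, not an assessment of any product.

```python
# Hypothetical example: the sources your platform must support, and the kinds
# of metadata you need from each. Names and requirements are placeholders.
REQUIRED = {
    "SAP S/4HANA": {"technical", "business"},
    "Salesforce": {"technical", "business"},
    "Microsoft Dynamics 365": {"technical", "business"},
    "Oracle E-Business Suite": {"technical"},
}

# Hypothetical capabilities a vendor has confirmed and demonstrated.
VENDOR_A = {
    "SAP S/4HANA": {"technical"},
    "Salesforce": {"technical", "business"},
    "Microsoft Dynamics 365": {"technical", "business"},
}


def coverage_gaps(required, vendor_support):
    """Return {source: [missing metadata types]} for every unmet requirement."""
    gaps = {}
    for source, needed in required.items():
        missing = needed - vendor_support.get(source, set())
        if missing:
            gaps[source] = sorted(missing)
    return gaps


print(coverage_gaps(REQUIRED, VENDOR_A))
# {'SAP S/4HANA': ['business'], 'Oracle E-Business Suite': ['technical']}
```

A source that is absent from the vendor's list entirely shows up with all of its requirements unmet, which is the case a complete list from recommendation 1 is meant to catch.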

Not all metadata is created equal

One of the less-discussed realities of metadata discovery is that there is no single approach that works across all source types. Applications have fundamentally different data models and metadata structures, shaped by decades of development, vendor design decisions, and customer customisation. Commercial ERP applications typically have large, well-structured, but heavily customised data models. Cloud data platforms tend to have more flexible schemas, augmented for specific customer needs. And increasingly, organisations want to incorporate metadata relating to unstructured assets (spreadsheets, documents, images, video) alongside the structured data from their core systems.

The scale of the metadata discovery task depends on the answers to a number of questions that are worth working through early in any project. What applications are in scope, and in what order of priority? Where does the metadata live in each application, and what mechanisms exist to access it? Do you need technical metadata, business metadata, or both, and are both available from the source? How much metadata does each source contain, and do you need all of it? How will you manage changes to source metadata over time?
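These questions map naturally onto a simple inventory record, with one field per question. This minimal Python sketch assumes a hypothetical SourceScope structure; the three sources and their table counts are placeholders (130,000 echoes the SAP S/4HANA figure quoted later in this post), and the access labels are free text rather than any product's terminology.

```python
from dataclasses import dataclass


@dataclass
class SourceScope:
    """One row of a hypothetical scoping inventory, one field per question."""
    name: str
    priority: int               # 1 = first implementation phase
    access: str                 # how metadata is reached, e.g. "API"
    needs_business: bool        # do we need business metadata from it?
    business_via_access: bool   # can that access route actually retrieve it?
    table_count: int            # rough size of the metadata estate
    change_process: str = ""    # how changes will be picked up; "" = undecided


def business_gaps(sources):
    """Sources, in priority order, whose access route cannot supply the
    business metadata the project needs."""
    ordered = sorted(sources, key=lambda s: s.priority)
    return [s.name for s in ordered if s.needs_business and not s.business_via_access]


def undecided_change_process(sources):
    """Sources with no agreed way of picking up metadata changes yet."""
    return [s.name for s in sources if not s.change_process]


sources = [
    SourceScope("CRM", 2, "API", True, True, 900, "scheduled"),
    SourceScope("ERP", 1, "system catalog", True, False, 130_000),
    SourceScope("Document store", 3, "file reader", False, False, 50, "triggered"),
]

print(business_gaps(sources))                # ['ERP']
print(undecided_change_process(sources))     # ['ERP']
print(sum(s.table_count for s in sources))   # 130950
```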

How is metadata typically collected?

The most common approaches to metadata extraction include API connectors, metadata crawlers, batch collection processes, reverse-engineering the RDBMS system catalog via a data modelling tool or SQL over ODBC, file readers for unstructured content, and specialist tools designed for specific application types.

Each approach has its place, but each also has its limitations. API connectors work well for SaaS platforms like Salesforce, where the API is designed to expose metadata in a structured way. Reverse-engineering the RDBMS system catalog is a natural choice for standard databases, but for complex ERP applications it returns only physical, technical names with no business context and no relationship information, which significantly limits its usefulness for data catalog and governance work.

This is a point worth emphasising, because it is where many projects encounter their first serious difficulty. The gap between what a standard system catalog scan returns and what a data team actually needs (business-friendly names, meaningful descriptions, and an understanding of how tables relate to one another) can be substantial. In some applications, bridging that gap requires specialist knowledge or tooling that the data team may not have.
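To see the gap in miniature, the sketch below uses Python's built-in sqlite3 module as a stand-in for an application's database and performs a basic system catalog scan. It is a hypothetical illustration: SQLite merely plays the role of the RDBMS, and the two tables use SAP-style names (MARA, MARC) created without declared foreign keys, on the assumption that relationships are not recorded in the database catalog.

```python
import sqlite3

# Stand-in database. SAP-style table/column names are used for illustration;
# no foreign keys are declared, so the catalog holds no relationship metadata.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MARA (MATNR TEXT PRIMARY KEY, MTART TEXT, MATKL TEXT);
    CREATE TABLE MARC (MATNR TEXT, WERKS TEXT, PRIMARY KEY (MATNR, WERKS));
""")


def scan_catalog(conn):
    """Return ({table: [columns]}, {table: [foreign keys]}) from the catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # Table names come from the catalog itself, so interpolating them is safe here.
    columns = {t: [r[1] for r in conn.execute(f"PRAGMA table_info({t})")]
               for t in tables}
    foreign_keys = {t: conn.execute(f"PRAGMA foreign_key_list({t})").fetchall()
                    for t in tables}
    return columns, foreign_keys


columns, foreign_keys = scan_catalog(conn)
print(columns)       # {'MARA': ['MATNR', 'MTART', 'MATKL'], 'MARC': ['MATNR', 'WERKS']}
print(foreign_keys)  # {'MARA': [], 'MARC': []}
```

Nothing in this output says what MATNR means, or that MARC rows belong to MARA rows; the shared column name is only a hint. That is the missing business context and relationship information described above.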

Key questions to resolve during metadata extraction

Are you getting business names as well as technical names?

Purely technical names (the kind that a system catalog scan typically returns) may be meaningful to a database administrator but opaque to a data analyst or governance officer. If the metadata you are collecting consists of cryptic abbreviations with no business context, you will face a significant additional task in creating a semantic layer that makes it usable. It is worth establishing early whether your source applications hold business-friendly names for their tables and fields, and whether your extraction approach can retrieve them.
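A semantic layer can start as little more than a lookup from technical to business names, plus a report of what is still missing. The glossary in this Python sketch is hypothetical and hand-written for illustration; ideally it would be extracted from the source application itself, if your extraction approach can reach it.

```python
# Hypothetical glossary of technical -> business names (SAP-style examples).
GLOSSARY = {
    "MARA": "General Material Data",
    "MATNR": "Material Number",
    "MTART": "Material Type",
}


def enrich(technical_names, glossary):
    """Pair each technical name with its business name (None if unknown)."""
    enriched = {name: glossary.get(name) for name in technical_names}
    missing = [name for name, business in enriched.items() if business is None]
    return enriched, missing


enriched, missing = enrich(["MARA", "MATNR", "MTART", "MATKL"], GLOSSARY)
coverage = 1 - len(missing) / len(enriched)
print(missing)             # ['MATKL']
print(f"{coverage:.0%}")   # 75%
```

Tracking a coverage figure like this per source is one way to size the additional semantic-layer work early, rather than discovering it after ingestion.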

Do you need all of it?

Some source applications contain very large quantities of metadata. An SAP S/4HANA system, for example, typically holds over 130,000 tables and more than 1.5 million attributes. Loading all of that into a data catalog uncurated would make the catalog effectively unusable, as users would have no way of navigating to what they actually need. The ability to select and curate relevant subsets before ingestion is therefore not a nice-to-have; it is essential. Check whether your discovery tooling (or your catalog platform) supports this.
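Curation before ingestion is essentially a filter over the discovered inventory. This Python sketch assumes two hypothetical selection criteria, a functional-area tag and a minimum row count; real projects will choose their own, and the row counts here are placeholders rather than figures from any system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMeta:
    name: str
    area: str        # hypothetical functional-area tag, e.g. "materials"
    row_count: int   # placeholder; empty tables are rarely worth cataloguing


def curate(tables, areas, min_rows=1):
    """Keep only tables in the chosen areas that actually hold data."""
    return [t for t in tables if t.area in areas and t.row_count >= min_rows]


inventory = [
    TableMeta("MARA", "materials", 120_000),
    TableMeta("MARC", "materials", 480_000),
    TableMeta("ZOLD_MAT", "materials", 0),      # hypothetical unused custom table
    TableMeta("BKPF", "finance", 2_000_000),
]

subset = curate(inventory, areas={"materials"})
print([t.name for t in subset])   # ['MARA', 'MARC']
```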

How will you manage metadata changes over time?

Metadata is not static. Applications are upgraded, configurations change, and customisations evolve. A metadata extraction process that runs once at the start of a project and is never repeated will quickly drift out of alignment with the source system. It is worth understanding what facilities your tooling provides for detecting changed metadata and updating the catalog accordingly, and whether scheduled or triggered extraction is supported.
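One common way to detect changed metadata is to store a fingerprint of each table's structure after every extraction and compare it with the next run. This Python sketch assumes snapshots are simple {table: [columns]} mappings and uses SHA-256 over a JSON serialisation as the fingerprint; both are illustrative choices, and ZZCOLOUR stands for a hypothetical custom field.

```python
import hashlib
import json


def snapshot(catalog):
    """Reduce {table: [columns]} to {table: fingerprint} for cheap storage."""
    return {table: hashlib.sha256(json.dumps(cols).encode("utf-8")).hexdigest()
            for table, cols in catalog.items()}


def diff_snapshots(old, new):
    """Compare fingerprints from two successive extraction runs."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(t for t in old.keys() & new.keys() if old[t] != new[t]),
    }


previous = {"MARA": ["MATNR", "MTART"], "MARC": ["MATNR", "WERKS"]}
current = {"MARA": ["MATNR", "MTART", "ZZCOLOUR"], "MVKE": ["MATNR", "VKORG"]}
print(diff_snapshots(snapshot(previous), snapshot(current)))
# {'added': ['MVKE'], 'removed': ['MARC'], 'changed': ['MARA']}
```

Because the column list is serialised in order, a reordering also counts as a change; whether that matters depends on the source. A scheduled or triggered job could run this comparison and update only the affected catalog entries.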

Choosing a metadata discovery approach that works for your most straightforward sources is the easy part. The harder question is whether it works for your most complex ones, which are usually the ERP applications at the heart of your business.

In the next blog, "Part 4: SAP and ERP Metadata: A Suitable Case for Treatment?", we look in depth at why ERP applications (and SAP in particular) present such a distinctive challenge for metadata discovery, and what that means for your project.
