
Part 2: Using Metadata with Data Intelligence Projects

Part 2 of our 'What is Metadata Discovery?' series. Explore why metadata is essential for data governance, data catalog, data analytics, AI, data lineage, data observability, and other transformation projects.

This series is written for data professionals working on governance, analytics, data catalog, migration, and integration projects, particularly those whose organisations run ERP applications from SAP, Microsoft, Salesforce, or Oracle.

Each part can be read independently, but together they trace a path from the fundamentals of metadata through to the practical challenges of working with large, complex enterprise systems.


Contents

Part 1: Metadata Discovery and Data Intelligence

Part 2: Using Metadata with Data Intelligence Projects

Part 3: The Challenges of Metadata Discovery

Part 4: SAP and ERP Metadata: A Suitable Case for Treatment?

Part 5: Using Safyr for SAP and ERP Metadata Discovery

Using Metadata with Data Intelligence Projects

Data Intelligence is an umbrella term that covers a wide range of project types, each with different goals but a common dependency: they all need accurate, accessible metadata to succeed. In this blog, we look at the main Data Intelligence disciplines and explore what metadata means for each of them, and what the consequences are when it is missing or hard to find.

Data Governance

Metadata is the foundation of effective data governance. It is what allows an organisation to align data with policies, compliance requirements, and business rules. Governance initiatives depend on metadata to identify data ownership and stewardship, to enforce access controls, to capture data lineage, and to record compliance-related information, whether that is in support of GDPR, the California Privacy Rights Act (CPRA), or other regulatory frameworks. Without reliable metadata, governance becomes an exercise in approximation.
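
To make this concrete, here is a minimal sketch in Python of how governance metadata can drive decisions. It assumes a hypothetical structure in which each column carries an owner and a sensitivity classification, plus a hypothetical mapping of roles to the classifications they may read. None of these names come from a particular governance tool. The same metadata answers both an access-control question and a compliance question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnPolicy:
    """Governance metadata for one column (hypothetical structure)."""
    column: str
    owner: str           # accountable data owner or steward
    classification: str  # e.g. "public", "internal", "personal"


# Hypothetical governance metadata for a customer table.
POLICIES = [
    ColumnPolicy("customer_id", "sales-ops", "internal"),
    ColumnPolicy("email", "crm-team", "personal"),
    ColumnPolicy("country", "sales-ops", "public"),
]

# Hypothetical role clearances: the classifications each role may read.
CLEARANCE = {
    "analyst": {"public", "internal"},
    "privacy_officer": {"public", "internal", "personal"},
}


def readable_columns(role: str, policies: list[ColumnPolicy]) -> list[str]:
    """Return the columns a role may read; unknown roles get nothing."""
    allowed = CLEARANCE.get(role, set())
    return [p.column for p in policies if p.classification in allowed]


def personal_data_register(policies: list[ColumnPolicy]) -> list[tuple[str, str]]:
    """List personal-data columns with their owners, e.g. for a compliance record."""
    return [(p.column, p.owner) for p in policies if p.classification == "personal"]


if __name__ == "__main__":
    print(readable_columns("analyst", POLICIES))
    print(personal_data_register(POLICIES))
```

Run as a script, it prints `['customer_id', 'country']` for the analyst role and `[('email', 'crm-team')]` for the personal-data register. In practice these policies would be read from a catalog or governance platform rather than hard-coded.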

Data Catalog

A data catalog is essentially a structured inventory of data assets, designed to make data discoverable and to support collaboration across the organisation. Building and maintaining one requires continuous access to two types of metadata: technical metadata (schema details, formats, sizes, and locations) and business metadata (definitions, tags, and business glossary terms).

A good catalog also captures usage metadata: who accesses what data, and how often. But all of this depends on being able to reliably extract metadata from source systems and keep it current as those systems evolve. That extraction process is frequently where catalog projects encounter their first serious technical obstacle.
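
As an illustration of the extraction step, the sketch below reads technical metadata (table names, column names, declared types, nullability, and primary-key flags) from a database's own system catalog. It uses SQLite through Python's standard `sqlite3` module purely as a stand-in source. The `ColumnEntry` record and the example tables are hypothetical, and enterprise applications generally expose their metadata through different, often vendor-specific, mechanisms.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ColumnEntry:
    """One technical-metadata record for a hypothetical catalog."""
    table: str
    column: str
    data_type: str
    nullable: bool
    is_primary_key: bool


def extract_technical_metadata(conn: sqlite3.Connection) -> list[ColumnEntry]:
    """Read table and column metadata from a SQLite database's system catalog."""
    entries: list[ColumnEntry] = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        # notnull reflects only an explicit NOT NULL constraint.
        for _cid, name, col_type, notnull, _default, pk in conn.execute(
            f"PRAGMA table_info('{table}')"
        ):
            entries.append(ColumnEntry(table, name, col_type, not notnull, pk > 0))
    return entries


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, country TEXT);
        CREATE TABLE sales_order (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        """
    )
    for entry in extract_technical_metadata(conn):
        print(entry)
```

Each `ColumnEntry` could then be enriched with business metadata such as a glossary term or an owner. Re-running the extraction on a schedule and comparing the results is one simple way to keep a catalog current as the source evolves.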

Data Analytics and Data Warehouse

For analytics teams, access to metadata is what makes data easier to find and interpret. It enables analysts to locate the data sources they need, understand their structure and content, and link data sets to the business metrics or KPIs they are trying to illuminate. A data warehouse without well-documented metadata is a resource that can only be used by the people who built it, which is a significant limitation as organisations scale their analytics capabilities.

Data Lineage and Data Observability

Data lineage and data observability are related but distinct disciplines. Lineage is concerned with tracing the origin and movement of data: where it came from, how it has been transformed, and where it ends up. Observability is concerned with the runtime health of data pipelines: detecting anomalies, monitoring data quality at the point of processing, and alerting teams when something goes wrong. Both disciplines depend critically on metadata: lineage to document the journey, and observability to define what 'normal' looks like so that deviations can be identified.
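
A minimal sketch of both ideas, assuming that lineage is recorded as a simple mapping from each dataset to the datasets it is derived from, and that observability metadata records an expected row-count range per dataset. All dataset names and thresholds are hypothetical. `upstream_sources` walks the lineage graph breadth-first to find everything a dataset depends on, and `row_count_is_normal` compares an observed value with the recorded expectation.

```python
from collections import deque

# Hypothetical lineage metadata: each dataset maps to the datasets it is derived from.
LINEAGE: dict[str, list[str]] = {
    "erp.orders": [],
    "erp.customers": [],
    "staging.orders_clean": ["erp.orders"],
    "warehouse.fact_sales": ["staging.orders_clean", "erp.customers"],
    "dashboard.revenue_kpi": ["warehouse.fact_sales"],
}

# Hypothetical observability metadata: the expected row-count range per load.
EXPECTED_ROWS: dict[str, tuple[int, int]] = {
    "warehouse.fact_sales": (10_000, 50_000),
}


def upstream_sources(dataset: str, lineage: dict[str, list[str]]) -> set[str]:
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen: set[str] = set()
    queue = deque(lineage.get(dataset, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # guard against revisiting shared ancestors
            seen.add(parent)
            queue.extend(lineage.get(parent, []))
    return seen


def row_count_is_normal(dataset: str, observed: int) -> bool:
    """Compare an observed row count with the range recorded in metadata."""
    low, high = EXPECTED_ROWS[dataset]  # KeyError means no baseline has been defined
    return low <= observed <= high


if __name__ == "__main__":
    print(sorted(upstream_sources("dashboard.revenue_kpi", LINEAGE)))
    print(row_count_is_normal("warehouse.fact_sales", 120))
```

For example, `upstream_sources("dashboard.revenue_kpi", LINEAGE)` returns all four upstream datasets, which is the set you would inspect if the KPI looked wrong. Real observability tools typically use richer baselines than a fixed range; the fixed range simply shows where the definition of 'normal' lives.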

Data Quality

Data quality initiatives are focused on assessing the accuracy and fitness for purpose of data across the enterprise, and then improving it. Metadata plays a dual role here: it provides the context needed to evaluate whether data is being used appropriately for its intended purpose, and it documents the standards and rules against which quality is measured. Without metadata, data quality assessments are largely guesswork.
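
The sketch below assumes that quality rules are stored as per-column metadata (here a required flag, an optional regular-expression pattern, and an optional minimum value) and shows how the same rules can be applied to each record. The column names and rules are hypothetical.

```python
import re
from typing import Any

# Hypothetical quality rules recorded as metadata for each column.
QUALITY_RULES: dict[str, dict[str, Any]] = {
    "customer_id": {"required": True},
    "country_code": {"required": True, "pattern": r"[A-Z]{2}"},
    "credit_limit": {"required": False, "min": 0},
}


def validate(record: dict[str, Any], rules: dict[str, dict[str, Any]]) -> list[str]:
    """Check one record against metadata-defined rules and return any failures."""
    failures: list[str] = []
    for column, rule in rules.items():
        value = record.get(column)
        if value is None:
            if rule.get("required", False):
                failures.append(f"{column}: missing required value")
            continue  # nothing further to check on an absent value
        pattern = rule.get("pattern")
        if pattern is not None and re.fullmatch(pattern, str(value)) is None:
            failures.append(f"{column}: {value!r} does not match {pattern}")
        minimum = rule.get("min")
        if minimum is not None and value < minimum:
            failures.append(f"{column}: {value!r} is below the minimum of {minimum}")
    return failures


if __name__ == "__main__":
    good = {"customer_id": 42, "country_code": "GB", "credit_limit": 500}
    bad = {"country_code": "gbr", "credit_limit": -5}
    print(validate(good, QUALITY_RULES))
    for failure in validate(bad, QUALITY_RULES):
        print(failure)
```

Here the `bad` record produces three failures: the missing `customer_id`, the badly formatted country code, and the negative credit limit. Keeping the rules in metadata rather than in code means that, in principle, data stewards can review and change them without touching the validation logic.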

Data Migration

Data migration projects move data from one application or system to another. This might mean migrating a legacy database to a cloud data platform such as Snowflake or Databricks, or moving from one instance of an ERP system to another, perhaps as part of a rationalisation programme or a move to a new version of the application.

In migration projects, metadata is what makes the difference between a smooth transfer and a costly remediation exercise. It documents the schemas and formats of both source and target systems, identifies the data relationships and dependencies that must be preserved to maintain integrity, and provides the knowledge base for mapping data between systems. Tools for data modelling, architecture, or ETL rely on this metadata to do their work accurately.
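
As a hedged illustration of using metadata for mapping, the sketch below compares a source-to-target column mapping against the documented source and target schemas and reports gaps before any data is moved. The column names are hypothetical (loosely styled on a legacy ERP source and a cloud target), and a real mapping check would also compare data types, lengths, and relationships.

```python
# Hypothetical source (legacy ERP) and target (cloud platform) column names.
SOURCE_COLUMNS = ["CUST_NO", "CUST_NAME", "CREATED_ON"]
TARGET_COLUMNS = ["customer_id", "customer_name", "created_at"]

# Hypothetical source-to-target mapping, as recorded in migration metadata.
MAPPING = {"CUST_NO": "customer_id", "CUST_NAME": "customer_name"}


def mapping_gaps(source: list[str], target: list[str], mapping: dict[str, str]) -> list[str]:
    """Report unmapped source columns, mappings to unknown targets, and unfed targets."""
    issues: list[str] = []
    target_set = set(target)
    for column in source:
        dest = mapping.get(column)
        if dest is None:
            issues.append(f"unmapped source column: {column}")
        elif dest not in target_set:
            issues.append(f"{column} maps to unknown target column: {dest}")
    fed = set(mapping.values())  # target columns that receive data from some source
    for column in target:
        if column not in fed:
            issues.append(f"target column with no source: {column}")
    return issues


if __name__ == "__main__":
    for issue in mapping_gaps(SOURCE_COLUMNS, TARGET_COLUMNS, MAPPING):
        print(issue)
```

With the sample data, `mapping_gaps` reports that `CREATED_ON` is unmapped and that `created_at` has no source: the kind of gap that is cheaper to catch at design time than after a load.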

Data Integration

Integration projects, which connect data across systems in real time or in batches, depend on metadata to discover schemas and formats across sources, to identify the primary and foreign keys used to join datasets, and to capture the information needed for transformations and mappings. Poor metadata discovery at the outset of an integration project is a reliable source of delays and rework later on.
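
One way to discover join keys is to read the foreign-key declarations from a source database's catalog. The sketch below does this for SQLite via Python's standard `sqlite3` module, which again stands in for a real source; the example tables are hypothetical. Note its limitation: it only finds relationships declared in the database itself, so where an application defines relationships in its own layer rather than as database constraints, this approach will miss them.

```python
import sqlite3
from typing import Optional


def discover_join_keys(conn: sqlite3.Connection) -> list[tuple[str, str, str, Optional[str]]]:
    """Return (child_table, child_column, parent_table, parent_column) per declared foreign key."""
    keys: list[tuple[str, str, str, Optional[str]]] = []
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' "
            "AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in tables:
        # Each row is (id, seq, table, from, to, on_update, on_delete, match).
        # "to" may be None when the constraint implicitly references the parent's primary key.
        for _id, _seq, parent, child_col, parent_col, *_rest in conn.execute(
            f"PRAGMA foreign_key_list('{table}')"
        ):
            keys.append((table, child_col, parent, parent_col))
    return keys


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sales_order (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(id),
            amount REAL
        );
        """
    )
    print(discover_join_keys(conn))
```

For the sample schema it returns `[('sales_order', 'customer_id', 'customer', 'id')]`, a key pair an integration job could use to join orders to customers.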

The common thread across all of these disciplines is this: without the ability to discover and use source metadata quickly and accurately, you are almost certain to experience delays, inaccuracies, or under-delivery. The question is not whether metadata matters. It is whether you have a reliable way to get it.

In Part 3: The Challenges of Metadata Discovery, we examine the specific obstacles that metadata discovery presents and offer some practical guidance on how to approach them.
