Data Lineage, Data Models and Enterprise Applications
Wikipedia does not offer a definition of Data Lineage in the context of business data. A good general definition is: the tracking and management of data, i.e. where it comes from, where it flows to, and how it is transformed as it travels through the enterprise.
One of the main reasons for wanting to understand data lineage is to enable the traceability of information. A clear set of data lineage information makes it possible to take any data item in the flow and determine where it came from and whether, and how, it was transformed. This is particularly applicable to any BI process (“The figure on my report says ‘Total Sales by Region’: how did we arrive at that figure?”), but it is equally applicable to any data governance, master data management or data integration initiative, and it is increasingly important for meeting data regulation requirements.
The Challenge
Tracking data lineage in enterprise projects such as data warehousing, application integration and master data management is critical if a company is to understand accurately how data items flow from their sources and how they are moved or transformed across the relevant applications and business units.
Enterprise applications such as SAP, Siebel, PeopleSoft, JD Edwards and Oracle EBS present unique problems in this context because of the complexity and opacity of their data architectures.
Given that such packages are now the major delivery mechanism for most corporate business processes and important sources of data, it is vital that they can be integrated into any data modelling or metadata management strategy to support the data lineage requirements of any major project.
The practicalities of Data Lineage
Any data lineage solution will involve recording the data sources that are relevant to each stage of the process and then mapping these sources to each other to show how the data flows and is transformed.
Each of the data sources must therefore be analysed and the relevant data items extracted, which means getting at the metadata that describes the data source.
There is no quick fix to this metadata challenge. Understanding and documenting the mappings and transformations is an ongoing process of discovery, documentation and debate between the technical and business communities in the organisation.
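The idea of recording sources and mapping them to each other can be sketched in a few lines of Python. This is a minimal illustration only, not how any particular product stores lineage: the `Mapping` record, the `trace_upstream` and `original_sources` helpers, and every table and column name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One recorded hop: a source item feeding a target item."""
    source: str
    target: str
    transformation: str  # free-text rule describing what happens on this hop


# Hypothetical mappings for a sales report; every name here is made up.
MAPPINGS = [
    Mapping("erp.sales_item.net_value", "staging.sales.net_value", "copy"),
    Mapping("staging.sales.net_value", "dw.fact_sales.amount", "convert to EUR"),
    Mapping("dw.fact_sales.amount", "report.total_sales_by_region", "SUM by region"),
    Mapping("dw.dim_region.name", "report.total_sales_by_region", "group key"),
]


def trace_upstream(item, mappings):
    """Return every mapping that contributes to `item`, nearest hop first."""
    hops = []
    seen = set()          # guards against loops in badly recorded lineage
    frontier = [item]
    while frontier:
        current = frontier.pop(0)
        for m in mappings:
            if m.target == current and m not in seen:
                seen.add(m)
                hops.append(m)
                frontier.append(m.source)
    return hops


def original_sources(item, mappings):
    """Return the upstream items that are not fed by any recorded mapping."""
    targets = {m.target for m in mappings}
    upstream = {m.source for m in trace_upstream(item, mappings)}
    return sorted(upstream - targets)


if __name__ == "__main__":
    for hop in trace_upstream("report.total_sales_by_region", MAPPINGS):
        print(f"{hop.target} <- {hop.source} ({hop.transformation})")
    print(original_sources("report.total_sales_by_region", MAPPINGS))
```

Asking for the lineage of the hypothetical `report.total_sales_by_region` item walks back through the warehouse and staging layers to the ERP column it started from, which is the question a BI user is really asking when they query a report figure.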
How Saphir supports the data lineage component of a project
If SAP, Oracle eBusiness Suite, PeopleSoft, Siebel or JD Edwards is part of the application landscape for your project, then Saphir will accelerate your ability to find the relevant tables, their relationships and their location in the application hierarchy, and to make that information available to other tools you may be using to support intelligence about data lineage.
Saphir extracts all the source metadata from standard and customised implementations of those major applications. It provides highly intuitive search, filtering and analytical capabilities which enable technical and non-technical users alike to quickly and accurately find the relevant subject areas and export them for use in other tools or for viewing with its own diagrammer.
Some of the other tools and products which are commonly used are described below.
Available Data Lineage Solutions
Many companies have some element of the data lineage story as part of their solution.
Some CASE tools, like ERwin from CA Inc and PowerDesigner from Sybase, have product features that will support the data lineage process. Typically these are ‘mapping’ features for recording details of each data source and the mappings between these sources.
Repository vendors like ASG, with its Data Warehouse Metadata Management Application, and IBM, with its InfoSphere Metadata Workbench, allow users to record lineage rules and mappings, and have impact analysis features to help understand the effect of making a change to any of the data items.
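Impact analysis is the same lineage information read in the opposite direction: start from the item that is changing and follow the mappings downstream. The sketch below shows the principle only and does not represent any vendor's implementation; the `impacted_items` helper and all item names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges (source item, target item); names are made up.
EDGES = [
    ("erp.customer.region_code", "dw.dim_region.code"),
    ("dw.dim_region.code", "report.total_sales_by_region"),
    ("dw.dim_region.code", "report.headcount_by_region"),
    ("erp.sales_item.net_value", "dw.fact_sales.amount"),
    ("dw.fact_sales.amount", "report.total_sales_by_region"),
]


def impacted_items(changed, edges):
    """Return every item downstream of `changed`, sorted by name."""
    downstream = defaultdict(list)
    for source, target in edges:
        downstream[source].append(target)

    impacted = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for target in downstream.get(current, []):
            if target not in impacted:   # visit each item once
                impacted.add(target)
                queue.append(target)
    return sorted(impacted)


if __name__ == "__main__":
    for item in impacted_items("erp.customer.region_code", EDGES):
        print(item)
```

In this made-up example, a change to the ERP region code would touch the region dimension and both regional reports, while the sales amount column is unaffected.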
Given the importance of data lineage to the ETL (Extract, Transform, Load) space, a number of the tools in this area also offer Data Lineage solutions. Informatica’s PowerCenter addresses the wider data integration requirement, including the recording of lineage information and rules.
All these vendors (and many others not mentioned) have features for recording lineage information, but Saphir remains the only toolset that can deliver the ‘source’ data knowledge where the data source is an ERP or CRM package. Saphir can then interface directly with the leading data modelling and repository tools to populate those environments automatically with the required metadata.
Without Saphir, these ERP packages are ‘black boxes’ with no visible and meaningful data definitions to be captured and brought into the project.







