Metadata: the foundation of data transformation
If you have ever travelled on the London Underground, you may have heard the phrase “Mind The Gap” announced at certain stations, or perhaps you’ve seen the sign on the platform edge.
The reason is that at some stations where the platform is curved, the gap between the carriage and the platform is wider than you might imagine. The message encourages you to take care not to fall between the train and the platform by taking too small a step. Failing to do so can cause injury or worse.
This parallels some of our experience of working with customers on large data transformation initiatives. The steps needed to deliver effectively and on time are often much bigger than originally envisaged. One common reason is the difficulty of accessing, understanding and using the metadata in the applications and systems that are part of those projects.
Not understanding how wide the gulf can be between reality and what is expected, what is achievable and sometimes what vendors have promised can lead to cost overruns, delays in realising benefits and even project cancellation.
Where are the gaps?
The metadata in some systems is more difficult to utilise and share than in others. Again, in our experience, organisations that need to incorporate data from ERP and CRM packages from SAP, Oracle, Microsoft and Salesforce into their projects often encounter the biggest problems in finding the data they need. This is because their metadata is inaccessible and their data models are large and complex.
There are many instances where these gaps, caused by a lack of understanding of metadata, can occur in data management projects. Some examples of how these challenges manifest themselves include:
Data catalog or governance projects
The first example relates to a Data Catalog project undertaken by a large bank. Amongst the hundreds of data sources whose metadata and lineage needed to be cataloged were a number of SAP applications, including SAP BW. Despite having access to a number of SAP tools, as well as the data catalog product’s own connectors and scanners, the bank was unable to provision the catalog with SAP metadata. As well as wasting time and effort, this was extremely frustrating and threatened both the progress and the ultimate effectiveness of the project.
Business intelligence and data warehouse projects
A further example is a Business Intelligence project initiated after SAP was implemented to replace a number of legacy applications. As well as SAP’s ERP solution, the customer also purchased SAP Data Services and SAP Business Objects. Data Services, an ETL product originally from ACTA, was one of the ancillary tools, along with Information Steward and Crystal Reports, that SAP acquired together with Business Objects.
The customer believed that the products worked together seamlessly. Data Services does, of course, connect to SAP to extract data; however, their challenge was knowing which of the 90,000+ tables in SAP held the data they needed.
Data Services does not provide any metadata discovery capabilities. So when the BI team tried to find the relevant tables, and the tables related to them, in their new SAP system, they ran into a serious problem. It caused significant delays, reduced the business’s trust in the data it was given and could have threatened the BI project altogether.
Data migration projects
This example involved replacing a large number of legacy applications with a new SAP system. Here the challenge was not BI but data migration, because the data from the legacy systems had to be moved into the new application. As soon as the customer’s data migration team started to familiarise themselves with the SAP data model, they realised the scale of the problem they faced. Without an easy way to understand its complex metadata, made worse by the level of customisation of the data model, they would struggle to meet deadlines and, more importantly, would increase the risk of an inaccurate data migration. Unfortunately, there are no SAP tools that make metadata analysis and introspection easy for data analysts and architects. Naturally, their implementation partner would have been happy to take on this work, adding to the cost and extending the time to delivery.
Application rationalisation or consolidation
A JD Edwards application rationalisation project in a large energy company provides the final example. The objective was to consolidate 8 separate JD Edwards implementations into 2.
The gap appeared when it became clear that each of the 8 systems had been implemented differently by different integration partners, and that there was no simple, effective way to understand their highly customised individual data landscapes. It was critical to understand these and then work out how those data models mapped onto the new applications. Existing JD Edwards and data modeling tools were incapable of providing the depth of metadata analysis required. In addition, doing this manually would have taken longer and cost more than the project plan allowed, which could in turn have delayed delivery.
What can you do to mind the gap?
Before answering what can be done to mind the gap, it is important to summarise why it exists in the first place. You could also use the questions below to start analysing how you do this work now, before considering an alternative approach.
Metadata challenges for ERP and CRM packages
Large, complex and usually customised ERP or CRM systems from vendors such as SAP, Oracle, Salesforce and Microsoft do not give up their metadata easily.
With the exception of Salesforce, which has other challenges, they do not hold any useful business metadata in their RDBMS system catalog. Table names are not descriptive: how would you know what the SAP table called KNA1 contains? The ‘useful’ metadata is held in the application layer, which is where you will find that KNA1 actually means “General Data in Customer Master”. Therefore, traditional tools that scan the database for table and attribute names reveal virtually nothing of value to a business user or analyst about what the data actually means.
These systems are also large and often heavily customised, which means reference models or templates are of minimal use. For instance, an SAP ECC system has over 90,000 base tables before customisation, and even large Salesforce systems can have several thousand tables. Scanning all of these is of limited value because you are probably not using all of the tables anyway. Additionally, an SAP system may have over 1000 tables related to the concept of “customer”, which makes it difficult to decide which are relevant to the task at hand without further detail.
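To make the contrast concrete, here is a minimal Python sketch using an in-memory SQLite database. It assumes a heavily simplified stand-in for SAP’s dictionary table of table descriptions (DD02T, reduced here to three columns; the real table has more key fields and is normally read through the application layer). The first function shows roughly what a scan of the database’s own catalog returns; the second joins in the application-layer description.

```python
import sqlite3

# In-memory stand-in for an ERP database (hypothetical and heavily simplified).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A physical table with a terse name, as an RDBMS catalog scanner would see it.
cur.execute("CREATE TABLE KNA1 (KUNNR TEXT PRIMARY KEY, NAME1 TEXT, LAND1 TEXT)")

# Simplified stand-in for the application-layer table of table descriptions,
# loosely modelled on SAP's DD02T (assumption: only three columns kept here).
cur.execute("CREATE TABLE DD02T (TABNAME TEXT, DDLANGUAGE TEXT, DDTEXT TEXT)")
cur.execute(
    "INSERT INTO DD02T VALUES (?, ?, ?)",
    ("KNA1", "E", "General Data in Customer Master"),
)


def database_catalog_view(cursor):
    """What scanning the RDBMS system catalog reveals: bare table names."""
    rows = cursor.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name <> 'DD02T' ORDER BY name"
    )
    return [name for (name,) in rows]


def application_layer_view(cursor, language="E"):
    """Table names enriched with descriptions from the dictionary table."""
    rows = cursor.execute(
        "SELECT m.name, t.DDTEXT FROM sqlite_master AS m "
        "LEFT JOIN DD02T AS t ON t.TABNAME = m.name AND t.DDLANGUAGE = ? "
        "WHERE m.type = 'table' AND m.name <> 'DD02T' ORDER BY m.name",
        (language,),
    )
    return dict(rows.fetchall())


print(database_catalog_view(cur))   # ['KNA1']
print(application_layer_view(cur))  # {'KNA1': 'General Data in Customer Master'}
```

The catalog scan alone yields only “KNA1”; the meaning appears only once the application-layer text is joined in.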
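A rough sketch of why descriptions alone are not enough to scope a subject area: the Python below filters a small, illustrative set of table descriptions by keyword, then follows a map of relationships to pull in related tables whose descriptions never mention the keyword. The table names are real SAP names, but the descriptions are approximations and the `RELATIONSHIPS` map is hypothetical, included only for illustration; in a real system that relationship metadata would itself have to be read from the application layer.

```python
from collections import deque

# Illustrative sample only: a few SAP table names with approximate descriptions.
TABLE_TEXTS = {
    "KNA1": "General Data in Customer Master",
    "KNB1": "Customer Master (Company Code)",
    "KNVV": "Customer Master Sales Data",
    "LFA1": "Vendor Master (General Section)",
    "MARA": "General Material Data",
    "VBAK": "Sales Document: Header Data",
    "VBAP": "Sales Document: Item Data",
}

# Hypothetical parent -> child relationships (e.g. via a shared key field).
RELATIONSHIPS = {
    "KNA1": ["KNB1", "KNVV", "VBAK"],
    "VBAK": ["VBAP"],
    "MARA": ["VBAP"],
}


def find_by_keyword(texts, keyword):
    """Return table names whose description contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return sorted(name for name, text in texts.items() if needle in text.lower())


def expand_subset(seeds, relationships):
    """Breadth-first walk from the seed tables, collecting every reachable child table."""
    subset = set(seeds)
    queue = deque(seeds)
    while queue:
        table = queue.popleft()
        for child in relationships.get(table, []):
            if child not in subset:
                subset.add(child)
                queue.append(child)
    return sorted(subset)


seeds = find_by_keyword(TABLE_TEXTS, "customer")
print(seeds)                                # ['KNA1', 'KNB1', 'KNVV']
print(expand_subset(seeds, RELATIONSHIPS))  # ['KNA1', 'KNB1', 'KNVV', 'VBAK', 'VBAP']
```

Following only child relationships is one design choice; a real scoping exercise might also follow parent links or stop at a depth limit. With over 1000 customer-related tables, the keyword step alone would return far too many candidates to review by hand.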
The usual methods for finding, analysing and sharing this type of metadata do not meet the demands of today’s enterprise or business users. Project teams usually have to rely on technical specialists with vendor tools which are not optimised for this kind of work, manual searching techniques or even guesswork.
Alternatively, organisations might engage expensive external consultants or even resort to internet searches to find appropriate metadata.
Finally, there is the challenge of making whatever metadata you find available to the other technologies you are using e.g. your data catalog, ETL or data modeling tools.
Take some time to consider or ask your team:
- What methods do you usually employ for discovering and using metadata from any source? How effective are they?
- Are you able to include data from your ERP and CRM systems in information management projects quickly and accurately? If not, have you considered why this is and whether there are alternatives to your current methods?
- Do these packages act as a drag on your projects and delay the delivery of their benefits? What is the cost of that delay to your organisation?
- How does your organisation currently undertake metadata discovery or source data analysis for your CRM and ERP packages?
- Who does that work? Are they internal technical specialists or data analysts? Do you have to use third-party external resources?
- What is the cost of this work? What tools and techniques do they use?
- How long does it take to find the tables and related tables which contain the data needed?
- How accurate are the results?
- How easy is it to check accuracy and how much rework is needed to correct mistakes?
- How far-reaching is any rework, and what does it cost?
- Finally, ask your vendors of data catalog, ETL, data modeling or master data management tools what solutions they provide for discovering, analysing and most importantly, making use of the rich business metadata that does exist in your ERP or CRM packages.
Evaluate Safyr for metadata discovery and analysis
You can use Safyr to give you deeper insight into the business and technical metadata in your ERP and CRM packages. This will help you to overcome the metadata challenges they present and accelerate the delivery of your projects whilst at the same time managing costs and accuracy. Using Safyr typically helps you to reduce the time taken for this task by about 90%.
You can then navigate, analyse and scope the metadata to create subsets that represent the business topics you are interested in. Finally, those subsets can be shared automatically with a variety of technology platforms and formats, eliminating the need for rekeying or manual integration.
Silwood Technology Limited
(Note: Image of London Underground attributed to By WillMcC – Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=4379199)