Well…er…Yes, and No. We get this comment lots of times. It’s sometimes hard to explain the difference. Strictly speaking, we do ‘metadata discovery’. I like analogies.
Imagine I’ve moved to a new town and I want to know the location of the nearest Supermarket, Post Office, ATM, etc. What do I do? I can ask people who know, I can walk or drive around and see where things are, or I can look at Google Maps. Note: other Digital Map providers are also available :)
Each has its advantages and disadvantages. Asking people can often get you the quickest answer, but of course they have to be on hand, and you may have to filter their advice (“what’s the best restaurant around here?” might not give you an answer that helps if you’re on a budget or do not like their favourite type of meal). Walking or driving around will eventually get the whole picture in your head; you might find what you are looking for and you will see other things on the way, but it can be time consuming (and expensive). Google Maps lets you work out where things are ‘virtually’. You can find things dramatically quicker than by actually driving around, and it will provide directions. It’s no replacement for the real thing, but if the area to be explored is large, then it’s going to save a lot of time. And of course if the area was very large, say a whole country, driving around would be impractical.
The analogy doesn’t fully work, but data discovery is similar. Imagine we are trying to do some Data Discovery on applications such as SAP Business Suite, or PeopleSoft, or Oracle eBusiness Suite… The first step might be to ask someone who knows: “Where can I find the Purchasing Document information in the ERP?” And if you ask the right person, they may know. But you have to keep asking for each new enquiry. You might also try to find the documentation, if it exists, or you could use ‘informed guesswork’ and trial and error.
Before I go too much further, perhaps I should mention that in Information Technology parlance there are at least three uses for the phrase Data Discovery.
Firstly there is Data Discovery for Analytics, whereby someone makes use of tools such as Tableau, Qlik, Panorama and a whole host of others to try to turn raw data from one or more sources into insight and actionable information for solving business problems.
Secondly there is Data Discovery for Quality assessment and management. This typically involves using tools such as Trillium, DQ Global and others to scan the actual data in the database, with the objective of working out how it hangs together by looking for unique identifiers, duplicates, missing mandatory fields, potential Foreign Keys, etc.
Although they are targeted at different uses and business challenges, both of these types of Data Discovery can give insight and understanding where none existed before. In fact, one might use a Data Discovery for Quality tool prior to loading data into a Data Discovery for Analytics product, so as to avoid decisions being made from inaccurate data. Many of these products are highly effective and produce high quality results.
Sometimes, however, using them with Enterprise systems such as those from SAP, Oracle and increasingly Salesforce is simply not practicable, because of the sheer number of database Tables and the complexity of the data models underpinning those applications, which are also hidden from casual viewing. So, in much the same way that it is simply not practical to drive round every road in a large city looking for a particular store, there needs to be a better, more effective way of finding what you need for the Quality and Analytics Data Discovery tools to work from.
As an example, and as we’ve explained on this blog before, SAP has 90,000+ tables and a very complex and mostly obscure data model. It is therefore not realistic to profile or analyse an entire system with terabytes or petabytes of data spread over so many ‘buckets’, and it is very difficult and time consuming to find what you are looking for.
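To make the ‘Quality’ flavour of Data Discovery a little more concrete, here is a minimal Python sketch of the kind of checks mentioned above (unique identifiers, duplicates, missing mandatory fields) run against a single, tiny table. The rows, the column names and the `profile` function are all made up for illustration; this is not how any particular product works, just the general idea.

```python
from collections import Counter

# Hypothetical sample rows; a real profiling tool would read these from the database.
rows = [
    {"customer_id": 1, "name": "Acme Ltd", "country": "GB"},
    {"customer_id": 2, "name": "Bolt GmbH", "country": "DE"},
    {"customer_id": 3, "name": "Acme Ltd", "country": None},
    {"customer_id": 4, "name": "Cargo SA", "country": "FR"},
]


def profile(rows):
    """Return simple per-column statistics for a list of row dictionaries."""
    report = {}
    for col in rows[0].keys():
        values = [row[col] for row in rows]
        present = [v for v in values if v is not None]
        counts = Counter(present)
        report[col] = {
            # How many rows have no value in this column.
            "missing": len(values) - len(present),
            # Values that occur more than once.
            "duplicates": sorted(v for v, n in counts.items() if n > 1),
            # A column is a candidate unique identifier if every row has a
            # value and no value repeats.
            "candidate_key": len(present) == len(rows) and len(counts) == len(rows),
        }
    return report


report = profile(rows)
for col, stats in report.items():
    print(col, stats)
```

This is cheap for one table you already know about. The difficulty described above is that nobody wants to run it blindly across tens of thousands of tables just to find the handful that matter.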
And that’s where the third kind, Data Discovery for Application Metadata (or perhaps Source Data Discovery), which is what we do, comes in.
Metadata Discovery allows the user to ‘scope’ the likely set of tables and relationships that are required. It’s similar to data discovery, but rather than doing that on the data, it’s doing it on the metadata. And whereas Data Discovery for Quality is about the actual rows and columns in the database, and Data Discovery for Analytics is about the content of those tables, rows and columns, Metadata Discovery is about providing the context of that data: “Where is the Customer Master data stored?” and “Which Tables are used by the Customer Payments function?” That way, those other Discovery solutions can be more effective on large, complex and customised applications.
So what we do is like ‘Google Maps’ for ERP (meta)data. It provides a practical mechanism for working out where things are stored in the ERP from the comfort of your own PC.
Take a look at our website for more information.
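For readers who like to see the idea in code, here is a toy Python sketch of what ‘searching the metadata rather than the data’ means. The catalogue below is hand-written for illustration: the four table names are familiar SAP examples, but the descriptions, the `references` links and both functions are simplified assumptions, not an extract from a real system or from our product.

```python
# A tiny hand-written metadata catalogue (illustrative only).
# Each entry describes a table; no business data is involved at all.
catalog = [
    {"table": "KNA1", "description": "Customer master: general data", "references": []},
    {"table": "KNB1", "description": "Customer master: company code data", "references": ["KNA1"]},
    {"table": "EKKO", "description": "Purchasing document header", "references": []},
    {"table": "EKPO", "description": "Purchasing document item", "references": ["EKKO"]},
]


def find_tables(catalog, keyword):
    """Return names of tables whose description mentions the keyword."""
    keyword = keyword.lower()
    return [t["table"] for t in catalog if keyword in t["description"].lower()]


def related_tables(catalog, table):
    """Return tables linked to `table`, following references in either direction."""
    related = set()
    for t in catalog:
        if t["table"] == table:
            related.update(t["references"])
        elif table in t["references"]:
            related.add(t["table"])
    return sorted(related)


# "Where is the Customer Master data stored?"
print(find_tables(catalog, "customer master"))
# Which tables hang off the first hit?
print(related_tables(catalog, "KNA1"))
```

The point of the sketch is only that both questions are answered from descriptions and relationships, without reading a single row of customer or purchasing data.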