Data, data everywhere. Can you find and manage it all easily and quickly?
It’s probably no surprise that you are noticing an increased focus on the management, governance and exploitation of the data your organisation collects and stores. Why is this?
Firstly, governments and regulatory bodies across the globe are imposing ever more stringent rules on how you look after data. Think GDPR, CCPA in California, MiFID for finance in Europe, PIPEDA in Canada, HIPAA for healthcare in the USA and more. Fines for non-compliance are rising, and data leaks can be very damaging to your brand reputation in both the short and long term.
Secondly, in common with most others, your organisation probably spends a lot of time and money collecting, processing and storing data. This can include data in internal systems such as your ERP or CRM packages, files, email, other home-grown applications, unstructured data and even machine or sensor data. It may also include data sourced from outside your organisation, for example market or financial information.
Thirdly, more and more organisations are turning to their data to gain greater intelligence about how effective they are or could be, to provide competitive advantage and to deliver insight into what opportunities might exist for the future.
In fact, given the amount you probably spend on your IT infrastructure, it would be irresponsible not to try to gain as much value as possible from the data it produces.
Of course, in the past you may have used Business Intelligence or reporting tools for this. These give some data analysts and managers access to some of the data for reporting and analysis, but in most cases they have been used by a relatively small group within an organisation. They have also rarely given a holistic view of data across the organisation, focusing instead on discrete areas such as finance, sales or production. All this is understandable given the technologies available, the methods for ingesting data into, say, a data warehouse or data silo, and the cost and time involved in setting up the environment.
So what has changed?
There is now a growing acknowledgement of the need to put more data (as long as it is timely and accurate) into the hands of staff in order to help them be more effective and make better decisions. This will help organisations be more agile and customer focused. In addition, starting on the journey toward regulatory compliance requires your organisation to have an inventory of what data it holds, where it is, what is done with it, who it is shared with, who has access to it and so on.
Doing this requires the adoption of new technologies, some of which are still in the early stages of development.
For example, there is a growing roster of data catalog, enterprise metadata management and data governance platforms which promise to make it easier for staff to know what data is stored, its meaning and provenance, and where it can be accessed. Data lakes are designed to make it easier to pull data together, with the intention of using advanced data science or analytics tools to provide insights. Ubiquitous and easy-to-use data analysis tools are promoted as being able to remove the reliance on spreadsheets and allow more and more staff to find their own insights and make their own decisions based on well-curated data.
Underpinning all this are tools that scan and identify metadata; that quickly connect to, extract and load data; or that provide access to large quantities and varieties of fast-moving data, where it resides, for analysis.
These new or upgraded tools and technologies offer enterprises the ability to quickly and easily include all data types in these solutions.
So all is good, right? Well, yes and no.
It is true that many data sources can be incorporated into these new platforms without too much manual intervention. Metadata and data from home-grown RDBMSs, files, data warehouses, machine and sensor systems, and even social media and other external sources can be scanned and utilised by them quite easily.
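To make the idea of “scanning” more concrete, here is a minimal sketch of how a home-grown relational source might be inventoried through the standard information_schema views. It assumes a PostgreSQL database and the psycopg2 driver, and the connection string and schema name are purely illustrative; real catalog scanners work along broadly similar lines, at much greater scale and with far more polish.

```python
# Minimal sketch: harvesting table and column metadata from a home-grown
# relational source via the ANSI information_schema views.
# Assumes a PostgreSQL database and the psycopg2 driver; the connection
# string and schema name are illustrative only.
import psycopg2

conn = psycopg2.connect("dbname=appdb user=catalog_reader")

with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT table_name, column_name, data_type
        FROM information_schema.columns
        WHERE table_schema = 'public'
        ORDER BY table_name, ordinal_position
    """)
    for table, column, dtype in cur.fetchall():
        # In a real scan these rows would be pushed into a catalog or
        # metadata repository rather than printed.
        print(f"{table}.{column}: {dtype}")
```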
One problem you might encounter, however, is the sheer scale of the challenge. Many organisations have hundreds or even thousands of potential sources. Cataloging all of these and then governing the data is a major undertaking, and finding insight in so much data will probably require blending some form of machine learning with human intelligence and intuition.
Another problem with extending data catalogs or analytics tools across the enterprise is that some sources do not give up their useful metadata easily, which also makes it difficult to use their data effectively. Examples are the large, complex and usually customised ERP or CRM systems from vendors such as SAP, Oracle, Salesforce and Microsoft. Packages such as these typically do not hold any valuable metadata in their RDBMS. Table names are not very descriptive: for example, what does the table named KNA1 in SAP refer to? The useful metadata is held in the application layer, which is where you will find that KNA1 actually means “General Data in Customer Master”. This means that scanning the database for table and attribute names reveals virtually nothing of value to a business user or analyst in terms of what the data means.
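To illustrate the gap, here is a minimal sketch of where the business description for a table like KNA1 actually lives. The RDBMS catalog only knows the cryptic physical name; the human-readable text sits in SAP's own data dictionary, for example in the text table DD02T. The direct database access shown here is an assumption made for illustration (driver, credentials and parameter style will vary), and in practice this metadata is usually reached through the SAP application layer or a specialised tool.

```python
# Minimal sketch: the physical name (e.g. KNA1) is all the RDBMS catalog
# exposes; the business description lives in SAP's ABAP Dictionary, for
# instance in the text table DD02T (table name + language -> short text).
# Assumes a DB-API cursor with %s paramstyle (e.g. psycopg2-style); direct
# database access to an SAP system is shown purely for illustration.

def describe_sap_table(cur, table_name: str, language: str = "E") -> str:
    cur.execute(
        "SELECT DDTEXT FROM DD02T WHERE TABNAME = %s AND DDLANGUAGE = %s",
        (table_name, language),
    )
    row = cur.fetchone()
    return row[0] if row else "(no description found)"

# Expected result for the example in the text:
# describe_sap_table(cur, "KNA1")  ->  "General Data in Customer Master"
```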
These systems are also large. For instance, an SAP Business Suite system has over 90,000 base tables before customisation, and even large Salesforce systems can have several thousand tables. Scanning all of these into a data catalog is of limited value because you may not need them all; in fact, you may not even be using all of the tables. An SAP system may have over 1,000 tables related to “customer”, so how would a business user decide which are relevant to what they are seeking? Similarly, loading the data into a data lake without knowing what it means will reduce the value of any insights you might be able to derive.
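As a small illustration of why subsetting matters, the sketch below filters a scanned metadata inventory by a business term. The CSV file name and column headings are assumptions about whatever export your scanning tool produces; the point is simply that a naive keyword search across tens of thousands of tables still leaves a long candidate list that someone, or something, has to curate.

```python
# Minimal sketch: filtering a scanned metadata inventory by business term.
# Assumes a CSV export with 'table_name' and 'description' columns; the
# file name and headings are hypothetical placeholders for whatever your
# scanning or extraction tool actually produces.
import csv

def tables_matching(inventory_path: str, term: str):
    with open(inventory_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if term.lower() in row["description"].lower():
                yield row["table_name"], row["description"]

# Even a simple filter like list(tables_matching("sap_tables.csv", "customer"))
# can return hundreds or thousands of candidates, which is why curation and
# subsetting are needed before loading anything into a catalog or data lake.
```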
Traditional methods for finding, analysing and sharing this type of metadata do not meet the needs of today's speed-conscious users. They are mostly manual, or rely on technical specialists using vendor or third-party tools that are not optimised for this kind of work, for example trying to locate all instances of Personal Data in an SAP system for GDPR, or bulk-loading metadata into a data catalog.
Alternatively, organisations might engage consultants or even resort to internet searches to find appropriate metadata. Then, of course, there is the problem of making it available in the data catalog or other platforms. All of this can delay the implementation of data catalog and data analytics solutions, cause cost overruns, risk inaccuracies and, in some cases, put the whole project in jeopardy.
We specialise in metadata discovery, analysis and integration for the SAP, Oracle, Salesforce and Microsoft ERP and CRM packages. Our product Safyr offers a faster and more effective approach. It automates the discovery of rich metadata, including customisations, and then makes it available for analysis and subsetting. The results can then be shared with a wide range of other tools and technologies. We can also do the same for other packages, provided certain parameters are met.
This means that you can be confident of including metadata from these systems in your solutions quickly and easily, as long as we have a pre-built mechanism for doing so (e.g. as we have for Collibra, Informatica EDC, ASG, Adaptive, Datum, Erwin or ER/Studio) or they can import one or more of the standard export formats we support.
For other ‘difficult’ sources, it is worth scouring the market for a solution. The last thing you need is to have to wait or omit some key data sources because their metadata is hidden and difficult to access.
Recommendation
If your organisation has some of these packages, or other sources whose metadata may not be easily found, then before purchasing or implementing any of these solutions I would recommend asking the vendors to demonstrate how they will ensure that metadata and data from those sources can be quickly and easily provisioned into the platform they are proposing.