Why finding metadata for personal data in ERP and CRM systems for CCPA and GDPR is hard.
The California Consumer Privacy Act (CCPA) and the EU General Data Protection Regulation (GDPR), together with regulations imposed by other governments and states, have placed more responsibility on organisations for managing and releasing the personal data they hold about individuals. (This is an edited and updated version of a blog first posted in 2018.)
Depending on the regulation, these data items can refer to customers, employees, citizens, business partners and so on. As time goes on, more countries and individual states are likely to implement similar regulations for companies operating under their jurisdiction.
One of the prerequisites for being able to identify and manage personal data is to know where it resides across your organisation’s IT systems. To do this, it is critical to be able to understand the metadata in these systems. Many systems will give up the metadata relevant to personal data relatively easily.
For major application packages, however, including SAP ECC, SAP S/4HANA, JD Edwards, Salesforce, Microsoft Dynamics 2012, Oracle E-Business Suite and Siebel, finding where they store personal data for GDPR and CCPA is more of a challenge.
There are two main reasons for this. Firstly, the names of tables and attributes in the database are often very opaque. For example, you will find a table called LFA1 in the SAP database which has attribute names such as LIFNR or KUNNR amongst its 150 or so attributes. These names are not very helpful if you are trying to find personal data items for an inventory or data governance program.
Secondly, the size of the database makes it difficult to navigate and find what you need quickly and easily. Using SAP as an example, an SAP ECC or S/4HANA system will have over 90,000 tables and in excess of 1 million attributes. Searching through this quantity of information for personal data items for CCPA and GDPR is a significant challenge without a specialist product.
The following is a worked example which describes how to use Safyr® to ‘scope’ the potential tables that store ‘relevant’ personal data in an SAP system. In this case we are looking for tables which store ‘Date of Birth’ information. However, the process would work for any data which comes under the general definition of personal data for CCPA and GDPR.
Of course many SAP systems have been customised so rather than providing a reference model, Safyr is more effective and useful because it extracts metadata from the application as implemented – including customisations.
Worked example: finding personal data in SAP for CCPA and GDPR
The screenshot below shows a list of tables from a typical SAP system which has been extracted into Safyr, in this case just short of 100,000 tables, which is around the number in most such systems.
We can search across all these tables to find any that contain a field whose ‘business name’ contains the string ‘Date of Birth’. Remember that it is unlikely that SAP developers or implementers will have used just one standard definition for this, so you will also need to look for “birthdate”, “birth date”, “birthday” and so on. All of these variations exist in SAP.
In the screenshot below you can see that this has reduced the list of nearly 100,000 tables to just 90. So there are 90 tables that have a field with a description containing the text string ‘date of birth’.
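The search described above can be sketched in a few lines of Python. This is not how Safyr works internally; it is a minimal illustration of matching several spelling variants against field descriptions, assuming the metadata has already been extracted into simple (table, field, description) records. The sample rows, the Z-prefixed table names and the function name are all hypothetical.

```python
import re

# Hypothetical field metadata records: (table, field, business description).
# These rows are illustrative only, not an extract from a real SAP system.
FIELDS = [
    ("PA0002", "GBDAT", "Date of Birth"),
    ("PA0002", "FIELD_X", "Year of Birth"),
    ("ZCUST01", "BDAY", "Customer birthday"),
    ("ZCUST01", "NAME1", "Name"),
    ("ZHR_X", "BIRTHDT", "Birthdate of dependant"),
    ("LFA1", "LIFNR", "Account Number of Vendor or Creditor"),
]

# One case-insensitive pattern covering the variants mentioned in the text:
# "date of birth", "birthdate", "birth date" and "birthday".
BIRTH_PATTERN = re.compile(r"date of birth|birth\s?date|birthday", re.IGNORECASE)


def find_matching_fields(fields, pattern=BIRTH_PATTERN):
    """Return the rows whose business description matches the pattern."""
    return [row for row in fields if pattern.search(row[2])]


matches = find_matching_fields(FIELDS)
# Distinct tables that contain at least one matching field.
tables = sorted({table for table, _, _ in matches})
print(tables)
```

Note that a description such as "Year of Birth" is not caught by this pattern, which is a reminder that the list of variants needs to be reviewed against the descriptions actually present in your own system.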
On the right of the list in the screenshot above is a column called Row Count, which gives the number of records in each table. Quite a few have zero, and this is not unusual in an SAP system, as SAP delivers a full set of features and tables that may or may not actually be used by a given customer.
It is easy to further refine the query on ‘date of birth’ to filter out any of the tables with no data, and the results can be seen below. There are just 5 SAP tables which have a field description containing ‘date of birth’ and which contain data. This is likely to be very different in another SAP system, depending on what features and modules of SAP the customer uses and what customisations have been made.
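The refinement by row count amounts to a simple filter. The sketch below assumes each candidate table is held as a small record with a name and a record count; the `TableHit` class, the table names other than PA0002 and all the counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableHit:
    """A table with at least one field matching the search (hypothetical type)."""
    name: str
    row_count: int  # number of records, as in the Row Count column


# Illustrative values only; real names and counts depend on the system.
HITS = [
    TableHit("PA0002", 18250),
    TableHit("ZTAB_A", 0),
    TableHit("ZTAB_B", 412),
    TableHit("ZTAB_C", 0),
]


def tables_with_data(hits):
    """Keep only the tables that actually hold records."""
    return [hit for hit in hits if hit.row_count > 0]


populated = tables_with_data(HITS)
print([hit.name for hit in populated])
```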
Having found a set of tables that contain likely Personal Data attributes, the results can be recorded using what Safyr calls a Subject Area. This is like a folder where you can group any number of tables. This can be refined further by identifying the ‘date of birth’ fields. It’s easy to select the tables and add them to a Subject Area, and there is an option to ‘mark’ those fields that meet the selection criteria used (in this case ‘date of birth’ fields).
So the result shown below is a group of tables that contain data and have a field with the string ‘date of birth’ in the ‘business name’, stored as a Subject Area called ‘Date of Birth for CCPA/GDPR’. The ‘Marked Fields’ column shows how many fields on each table meet the search criteria. In this example, table PA0002 has 3 such fields.
If you want to look at the details for an individual table, all you need to do is select the table, and the individual fields for that table will be displayed. As seen in the image below, by selecting “Show Fields – Marked Fields” you can see the 3 “Date of Birth” fields and their different technical field names.
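Conceptually, a Subject Area with marked fields is a named grouping of tables, each carrying the set of fields that met the search criteria. The following is only a sketch of that idea as a data structure; the `SubjectArea` class and the placeholder field names are hypothetical and do not represent Safyr’s own implementation or file format.

```python
from collections import defaultdict


class SubjectArea:
    """Hypothetical stand-in for a named group of tables with marked fields."""

    def __init__(self, name):
        self.name = name
        # table name -> set of marked field names
        self._marked = defaultdict(set)

    def mark(self, table, field):
        """Add the table to the group (if new) and mark one of its fields."""
        self._marked[table].add(field)

    def marked_counts(self):
        """Return {table: number of marked fields}, like a Marked Fields column."""
        return {table: len(fields) for table, fields in sorted(self._marked.items())}


area = SubjectArea("Date of Birth for CCPA/GDPR")
# Placeholder field names; PA0002 is given three marked fields, as in the text.
for table, field in [
    ("PA0002", "FIELD_1"),
    ("PA0002", "FIELD_2"),
    ("PA0002", "FIELD_3"),
    ("ZTAB_B", "FIELD_1"),
]:
    area.mark(table, field)

counts = area.marked_counts()
print(counts)
```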
This process can be repeated for other Personal Data fields until we have assembled a set of Subject Areas that represent all the Personal Data categories that need to be assessed for CCPA and GDPR. Safyr then has features for merging these Subject Areas to create a consolidated list of Personal Data items. This brings together the Personal Data fields for each of the categories (Birth, Address, Credit Card Number and so on) into one integrated set.
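To illustrate what a consolidated list might look like, the sketch below merges several per-category groupings into a single structure keyed by table and field, recording which categories each field belongs to. The input layout, the function name and the sample table and field names are assumptions made for this example, not a description of Safyr’s merge feature.

```python
def merge_subject_areas(areas):
    """Merge {category: {table: {fields}}} into {table: {field: {categories}}}."""
    merged = {}
    for category, tables in areas.items():
        for table, fields in tables.items():
            for field in fields:
                merged.setdefault(table, {}).setdefault(field, set()).add(category)
    return merged


# Illustrative input: one entry per Personal Data category.
AREAS = {
    "Birth": {"PA0002": {"FIELD_DOB"}, "ZTAB_B": {"FIELD_DOB"}},
    "Address": {"ZTAB_B": {"FIELD_STREET", "FIELD_CITY"}},
}

merged = merge_subject_areas(AREAS)
for table in sorted(merged):
    for field in sorted(merged[table]):
        print(table, field, sorted(merged[table][field]))
```

Keying the result by table and field means a field that falls into more than one category appears once, with all of its categories attached.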
Having identified and marked the Personal Data fields using the method described above, you might want to make that information easily available to a wider audience. Safyr Subject Areas can be exported to a wide range of other products, including Collibra, Alation, Infogix, Informatica EDC, DAG Metadacenter, erwin, Idera ER/Studio and ASG Data Intelligence. Apart from those, the most popular export format for CCPA and GDPR is Excel.
In the screenshot below the ‘Date of Birth for CCPA/GDPR’ Subject Area has been exported from Safyr into an Excel file and filtered so that you are only seeing the fields marked with “Date of Birth”. You can see the technical and logical name for each table where they occur, along with the descriptions. More information is included in the other spreadsheet tabs.
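If you hold a similar inventory outside Safyr, a filtered export can be approximated with the Python standard library. The sketch below writes CSV, which Excel can open, rather than a native Excel workbook, and keeps only the rows flagged as marked. The column names, sample rows and function name are assumptions for illustration.

```python
import csv
import io

COLUMNS = ["table", "field", "description"]

# Illustrative inventory rows; "marked" flags fields that met the search criteria.
ROWS = [
    {"table": "PA0002", "field": "FIELD_1", "description": "Date of Birth", "marked": True},
    {"table": "PA0002", "field": "FIELD_9", "description": "Language", "marked": False},
    {"table": "ZTAB_B", "field": "FIELD_1", "description": "Birthdate", "marked": True},
]


def export_marked_fields(rows):
    """Return CSV text containing a header plus only the marked rows."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys such as "marked" that are not columns.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        if row["marked"]:
            writer.writerow(row)
    return buffer.getvalue()


csv_text = export_marked_fields(ROWS)
print(csv_text)
```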
Identifying candidate Personal Data attributes is only one of the key steps of any CCPA and GDPR strategy. In the case of large application packages like SAP, it can be a very challenging first step. Safyr is not a full CCPA or GDPR solution. However, you can use it to accelerate the delivery of a data inventory of the personal data assets in your key ERP and CRM packages from SAP as well as Oracle, Salesforce and Microsoft.
And a final thought. Unlike Y2K, CCPA and GDPR are not a one-time job. There is a responsibility on each organisation to monitor and manage its storage and use of personal data on an ongoing basis. Executive sponsorship, a data governance strategy, staff awareness and training, appropriate policies and the correct use of technology are all critical to maintaining momentum. More information on the CCPA is available here. The full text of the EU GDPR is here.
If you would like to learn more please click here to book a call to discuss how Safyr can help you to identify the Personal Data in your ERP or CRM systems. You can download a free trial of Safyr here.