Polytechnic University of Valencia Congress, CARMA 2016 - 1st International Conference on Advanced Research Methods and Analytics

Big Data Matching Using the Identity Correlation Approach
Mary Smyth, Kevin McCormack

Last modified: 27-06-2016



The Identity Correlation Approach (ICA) is a statistical technique for matching big data where no unique identifier exists. It was developed to match the Irish Census 2011 dataset to Central Government Administrative Datasets in order to attach a unique identifier to each individual in the Census dataset (McCormack & Smyth, 2015). The identifier attached is the PPS No. (Personal Public Service Number). With the PPS No. attached, each individual in the Census dataset can be linked to datasets held centrally by Public Sector Organisations, which expands the range of variables available for statistical analysis at the individual level. The statistical techniques developed here were applied to a major European Structure of Earnings Survey (SES), compiled by the CSO (Central Statistics Office) using administrative data only, thus eliminating the need for an expensive business survey (NES, 2007). This paper describes how the Identity Correlation Approach was developed and presents data matching results and conclusions in relation to the SES results for 2011.
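
The abstract does not specify the matching variables or rules used by the ICA, so the following is only a generic sketch of the underlying idea: attaching an identifier from an administrative dataset to records that lack one, by matching on a combination of shared variables and accepting a match only when it is unambiguous. The function name `attach_identifier`, the field names (`date_of_birth`, `sex`, `county`, `pps_no`), and the placeholder values are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def attach_identifier(census, admin, key_fields, id_field="pps_no"):
    """Attach admin identifiers to census records via a composite key.

    Illustrative assumption, not the authors' method: a match is accepted
    only when the key is unique in both datasets.
    """
    # Index the administrative identifiers by the combination of key values.
    index = defaultdict(list)
    for rec in admin:
        index[tuple(rec[f] for f in key_fields)].append(rec[id_field])

    # Count census records per key so duplicated census keys are rejected too.
    census_counts = defaultdict(int)
    for rec in census:
        census_counts[tuple(rec[f] for f in key_fields)] += 1

    matched = []
    for rec in census:
        key = tuple(rec[f] for f in key_fields)
        candidates = index.get(key, [])
        unique = len(candidates) == 1 and census_counts[key] == 1
        # Unmatched or ambiguous records get None instead of a guess.
        matched.append({**rec, id_field: candidates[0] if unique else None})
    return matched


# Hypothetical toy data; the identifiers are placeholders, not real PPS numbers.
census = [
    {"person": "A", "date_of_birth": "1980-01-02", "sex": "F", "county": "Cork"},
    {"person": "B", "date_of_birth": "1975-06-30", "sex": "M", "county": "Dublin"},
    {"person": "C", "date_of_birth": "1990-11-11", "sex": "F", "county": "Galway"},
]
admin = [
    {"pps_no": "ID-001", "date_of_birth": "1980-01-02", "sex": "F", "county": "Cork"},
    {"pps_no": "ID-002", "date_of_birth": "1975-06-30", "sex": "M", "county": "Dublin"},
    {"pps_no": "ID-003", "date_of_birth": "1975-06-30", "sex": "M", "county": "Dublin"},
]

result = attach_identifier(census, admin, ["date_of_birth", "sex", "county"])
for row in result:
    print(row["person"], row["pps_no"])
```

In this toy example, person A receives an identifier, person B is left unmatched because two administrative records share the same key, and person C is left unmatched because no administrative record has that key. Leaving such cases unassigned rather than guessing is a design choice of this sketch only.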
