11.5 million files.
2 terabytes of data.
200,000 offshore entities.
The biggest data leak ever
The Panama Papers were a massive 2016 leak of confidential documents from the Panamanian law firm Mossack Fonseca, which prompted an extensive investigation into offshore accounts and tax evasion. The leak consisted of millions of files, including emails, financial spreadsheets, and other records. Calimere Point provided critical data analytics, specifically the matching of account names, giving our client a robust and repeatable regulatory response, delivered fast.
The Data Picture
Superficially, matching a company’s client list (we call this the “Private List”) against the Panama Papers is a straightforward exercise.
However, a number of practical issues immediately become apparent once the data is examined. Both the Panama Papers and the Private Lists have their own data nuances. Critically, the data nuances may not be consistent within a given data set nor between the data sets.
The data needs to be normalised to ensure we are matching names on a consistent basis.
Industry: Financial Services - Wealth Management
Challenges
Beyond the high-profile casualties, our clients have been faced with an apparently easy question: “How many of my clients are in the 11.5 million files?” Because the IRS wants to know. And so do my regulator(s).
Finding one name in the Panama List is relatively easy. What if you have to check one thousand names? Or ten thousand? Or a million?
And do it quickly.
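As a rough illustration of why volume alone need not be the bottleneck, the sketch below indexes the leaked names once and then checks each Private List name with a dictionary lookup, so the cost grows with the size of the two lists rather than with their product. The function names, the crude key and the sample data are all hypothetical; this is not a description of the delivered solution.

```python
def crude_key(name):
    """Lower-case and collapse whitespace. A deliberately crude key for illustration."""
    return " ".join(name.lower().split())

def build_index(leak_names):
    """Map each crude key to the leaked names that share it."""
    index = {}
    for name in leak_names:
        index.setdefault(crude_key(name), []).append(name)
    return index

def screen(private_list, index):
    """Return the Private List names whose key appears in the index."""
    hits = {}
    for name in private_list:
        key = crude_key(name)
        if key in index:
            hits[name] = index[key]
    return hits

# Made-up sample data.
leak = ["Acme Holdings Ltd", "Blue  Harbour SA"]
clients = ["ACME HOLDINGS LTD", "Green Field Inc", "blue harbour sa"]
hits = screen(clients, build_index(leak))
```

Exact lookups like this only catch names that agree after trivial clean-up, which is why the data nuances matter so much.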
Any matching process would have to take these nuances into account. The traditional approach to this is some form of “Fuzzy Matching”. But fuzzy matching comes with its own issues. The amount of fuzziness will be determined by a number of criteria:
1) Balance between false positives and false negatives
2) The increased run time from the fuzzy process
3) Black box (e.g. numerical threshold based) versus rules based
It’s heuristic, or put more prosaically, it’s try, see and modify. It is important for clients (and their regulators and tax authorities) to understand that such an exercise is empirical and has no right answer. Sophisticated clients recognise that they are accepting and agreeing to the logic of a process and there is not a black and white answer to all combinations. Data matching is an iterative process that may involve multiple rounds of matching and refinement to improve accuracy. The specific techniques and algorithms used can vary depending on the requirements of the data matching system and the quality of the data being matched.
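The trade-off between false positives and false negatives can be seen in a small sketch. It uses the similarity ratio from Python's standard `difflib` module as a stand-in for whatever fuzzy measure is actually chosen; the thresholds and sample names are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_hits(name, candidates, threshold):
    """Return the candidates whose similarity to `name` meets the threshold."""
    return [c for c in candidates if similarity(name, c) >= threshold]

# Made-up candidates.
candidates = ["Jon Smith", "John Smith", "Joan Smythe", "Mossfon Trust"]

# A strict threshold risks false negatives; a loose one admits more
# candidates and therefore more false positives to review.
strict = fuzzy_hits("John Smith", candidates, 0.95)
loose = fuzzy_hits("John Smith", candidates, 0.60)
```

Lowering the threshold can only ever add candidates, never remove them, so each round of "try, see and modify" amounts to deciding how much extra review work an extra catch is worth.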
For some countries and business units, client data is highly unlikely to be allowed to cross borders. This has a significant practical effect: the matching process cannot be coordinated on a powerful central server, which conflicts directly with an exercise that is complex, involves computationally intensive fuzzy matching and runs over a large data set.
If the matching process needs to be deployed at several different locations, differences and limitations of hardware and IT expertise need to be considered.
Punctuation (including full stops for abbreviations, commas to separate names, and hyphens) needs to be cleaned. And then accents (or more generally diacritics) need to be considered. And diacritics shine a spotlight on character encoding: it’s not always going to be UTF-8.
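A minimal sketch of this kind of clean-up, using only Python's standard library, might look as follows. The list of encodings to try and the decision to drop full stops while turning other punctuation into spaces are assumptions for illustration; a real exercise would agree these rules with the client.

```python
import unicodedata

def decode_name(raw, encodings=("utf-8", "cp1252")):
    """Try each assumed encoding in turn; fall back to latin-1, which accepts any byte."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1")

def normalise(name):
    """Strip diacritics, clean punctuation, collapse whitespace and fold case."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = []
    for ch in no_marks:
        if ch == ".":
            continue  # drop full stops so "S.A." becomes "SA"
        kept.append(ch if ch.isalnum() else " ")  # other punctuation becomes a space
    return " ".join("".join(kept).casefold().split())

same_name_latin = decode_name(b"M\xfcller")             # cp1252 bytes
same_name_utf8 = decode_name("Müller".encode("utf-8"))  # UTF-8 bytes
```

Note that stripping the umlaut maps "Müller" to "muller", which will not equal the common transliteration "Mueller"; whether that case needs its own rule is exactly the kind of decision that has to be made empirically.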
Legal entity types can come in multiple flavours, and could be abbreviated or written in full. GmbH or Gesellschaft mit beschränkter Haftung? The latter also comes with multiple cases and an umlaut.
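One way to handle this is a lookup table that maps the long form to a single canonical abbreviation. The sketch below assumes the name has already been lower-cased and had its diacritics and punctuation removed; the table is a hypothetical four-entry sample, not a complete list of legal forms.

```python
# Hypothetical sample table: long form (already normalised) -> canonical short form.
LEGAL_FORMS = {
    "gesellschaft mit beschrankter haftung": "gmbh",
    "limited": "ltd",
    "incorporated": "inc",
    "sociedad anonima": "sa",
}

def canonical_legal_form(normalised_name):
    """Replace a trailing long-form legal type with its short form, if present."""
    for long_form, short_form in LEGAL_FORMS.items():
        if normalised_name.endswith(" " + long_form):
            # Keep everything before the long form, including the separating space.
            return normalised_name[: -len(long_form)] + short_form
    return normalised_name

full = canonical_legal_form("alpen invest gesellschaft mit beschrankter haftung")
short = canonical_legal_form("alpen invest gmbh")
```

Both spellings then reduce to the same string and can be compared on a consistent basis.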
What would 007 be? Bond; James Bond; Mr (optional full stop) Bond or just Agent Bond? We cannot rely upon names being in a consistent order.
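A common way to make a comparison indifferent to word order is to drop titles and sort the remaining words before comparing. The sketch below is a minimal illustration under that assumption; the set of titles is hypothetical and deliberately short.

```python
# Hypothetical, deliberately short set of titles to ignore.
TITLES = {"mr", "mrs", "ms", "dr", "agent"}

def name_key(name):
    """Build an order-independent key: lower-case, drop titles, sort the words."""
    cleaned = name.lower()
    for mark in ".,;":
        cleaned = cleaned.replace(mark, " ")
    tokens = [t for t in cleaned.split() if t not in TITLES]
    return " ".join(sorted(tokens))

key_surname_first = name_key("Bond, James")
key_with_title = name_key("Mr. James Bond")
key_surname_only = name_key("Agent Bond")
```

"Bond, James" and "Mr. James Bond" share a key, but a bare "Bond" does not; treating a surname-only hit as a weaker category of match, rather than as a match or a miss, is one way to deal with it.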
Data Solution
Rule-based Analytics Application
Calimere Point developed an analytical application containing a series of rules that address the nuances in the data, carry out a series of matching processes of differing strength and categorise the resulting matches. The solution was reviewed by a Big Four auditor and classified as best-of-breed.
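To illustrate what "matching processes of differing strength" can mean in practice, the sketch below tries progressively weaker tests and labels a pair of names with the strongest test it passes. The category names, the order of the tests and the 0.85 threshold are assumptions for illustration only and do not describe the rules in the Calimere Point application.

```python
from difflib import SequenceMatcher

def simple_key(name):
    """Lower-case, turn punctuation into spaces and collapse whitespace."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return " ".join(cleaned.split())

def classify(client, leaked, fuzzy_threshold=0.85):
    """Return the strongest category of match between two names (hypothetical labels)."""
    if client == leaked:
        return "exact"
    a, b = simple_key(client), simple_key(leaked)
    if a == b:
        return "normalised"
    if sorted(a.split()) == sorted(b.split()):
        return "reordered"
    if SequenceMatcher(None, a, b).ratio() >= fuzzy_threshold:
        return "fuzzy"
    return "none"

# Made-up pairs of (Private List name, leaked name).
pairs = [
    ("James Bond", "James Bond"),
    ("JAMES BOND", "James Bond."),
    ("Bond, James", "James Bond"),
    ("Jon Smith", "John Smith"),
    ("James Bond", "Mossack Fonseca"),
]
categories = [classify(c, l) for c, l in pairs]
```

Because every match carries the rule that produced it, a reviewer can see why a pair was flagged, which is the practical advantage of a rules-based approach over a single opaque score.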
Data Visualisation
Calimere Point produced a robust and repeatable regulatory response – delivered fast.
Our complete Panama Papers solution was delivered at a fraction of the cost of the alternative solutions assessed by our client.
The portability of our solution allowed our client to deliver a co-ordinated and coherent global response to regulators, resulting in a huge efficiency gain.
Our Panama Papers client data alignment solution became a valuable asset for our client, and was used to generate a single view of each client across their complex and geographically diverse business post Panama.