Matching Account Names – The Panama Papers

11.5 million files.

2 terabytes of data.

200,000 offshore entities.

The biggest data leak ever

The Panama Papers, a massive 2016 leak of confidential documents from the Panamanian law firm Mossack Fonseca, prompted an extensive investigation into offshore accounts and tax evasion. The leaked documents consisted of millions of files, including emails, financial spreadsheets, and other records. Calimere Point provided critical data analytics, specifically the matching of account names: a robust and repeatable regulatory response, delivered fast.

The Data Picture

Superficially, matching a company’s client list (we call this the “Private List”) against the Panama Papers is a straightforward exercise.
However, a number of practical issues become apparent as soon as the data is examined. Both the Panama Papers and the Private Lists have their own data nuances. Critically, these nuances may not be consistent either within a given data set or between the data sets.

The data needs to be normalised to ensure we are matching names on a consistent basis.

Industry: Financial Services - Wealth Management

Challenges

Classic dirty data matching issue

Beyond the high-profile casualties, our clients have been faced with an apparently easy question: “How many of my clients are in the 11.5 million files?” The IRS wants to know, and so do their regulators.
Finding one name in the Panama List is relatively easy. What if you have to check one thousand names? Or ten thousand? Or a million?

And do it quickly.

Fuzziness

Any matching process has to take the nuances in the data into account. The traditional approach is some form of “Fuzzy Matching”, but fuzzy matching comes with its own issues. The amount of fuzziness will be determined by a number of criteria:
1) The balance between false positives and false negatives
2) The increased run time from the fuzzy process
3) Black box (e.g. numerical threshold-based) versus rules-based matching
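
The trade-off in the first criterion can be illustrated with a small sketch. It uses a numerical threshold on a string similarity score from Python's standard difflib module. The names, the threshold values and the helper functions are illustrative assumptions, not the logic of the application described here.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1, ignoring letter case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_matches(name: str, candidates: list[str], threshold: float) -> list[str]:
    """Return the candidates whose similarity to `name` meets the threshold."""
    return [c for c in candidates if similarity(name, c) >= threshold]


# Hypothetical leaked names; none of these come from the real data.
leaked = ["Jon Smith", "John Smyth", "Joan Smith", "Jane Doe"]

# A strict threshold risks false negatives (plausible variants are missed).
strict = fuzzy_matches("John Smith", leaked, threshold=0.95)

# A loose threshold risks false positives (different people are flagged).
loose = fuzzy_matches("John Smith", leaked, threshold=0.80)

print("strict:", strict)
print("loose: ", loose)
```

A single threshold like this is the "black box" end of the spectrum: it is quick to set up, but it is hard to explain to a regulator why one pair of names scored just above the line and another just below it.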

Iterative Process

The process is heuristic or, put more prosaically, it is try, see and modify. It is important for clients (and their regulators and tax authorities) to understand that such an exercise is empirical and has no single right answer. Sophisticated clients recognise that they are accepting and agreeing to the logic of a process, and that there is no black and white answer for every combination. Data matching is an iterative process that may involve multiple rounds of matching and refinement to improve accuracy. The specific techniques and algorithms used can vary depending on the requirements of the data matching system and the quality of the data being matched.

Local data privacy laws

For some countries and business units it is highly unlikely that client data is allowed to cross borders. This has a significant practical effect: the matching process cannot be coordinated on a powerful central server. That is in direct conflict with an exercise that is complex, incorporates computationally intensive fuzzy matching and runs on a large data set.
If the matching process needs to be deployed at several different locations, differences and limitations of hardware and IT expertise need to be considered.

Data Nuances - Punctuation

Punctuation (including full stops for abbreviations, commas to separate names, and hyphens) needs to be cleaned. Then accents (or, more generally, diacritics) need to be considered. Diacritics in turn shine a spotlight on character encoding: it is not always going to be UTF-8.
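
A minimal sketch of this kind of cleaning is shown below, using only the Python standard library. The list of fallback encodings and the function names are assumptions made for illustration; real source files may need a different set of encodings and different punctuation rules.

```python
import string
import unicodedata


def decode_bytes(raw: bytes, encodings=("utf-8", "cp1252")) -> str:
    """Try each candidate encoding in turn (the list is an assumption)."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, marking undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def normalise(name: str) -> str:
    """Strip diacritics and punctuation, fold case, collapse whitespace."""
    # NFKD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = stripped.translate(_PUNCT_TO_SPACE).casefold()
    return " ".join(cleaned.split())


print(normalise("Müller-Lüdenscheidt, S.A."))
print(normalise(decode_bytes("José".encode("cp1252"))))
```

Replacing punctuation with a space means an abbreviation such as "S.A." becomes two separate letters. Whether that is desirable depends on how suffixes are handled later, which is exactly the sort of decision that has to be agreed and documented.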

Data Nuances - Company types and suffixes

These can come in multiple flavours, and could be abbreviated or written in full. GmbH or Gesellschaft mit beschränkter Haftung? The full form also comes with multiple cases and an umlaut.

Data Nuances - Word order

What would 007 be? Bond; James Bond; Mr (optional full stop) Bond or just Agent Bond? We cannot rely upon names being in a consistent order.
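
A common way to make a comparison independent of word order is to split each name into tokens, drop titles, and compare the sorted tokens. The sketch below assumes a small hypothetical title list and is not taken from the application described here.

```python
import re

# Hypothetical titles to ignore; a real list would be longer.
TITLES = {"mr", "mrs", "ms", "dr", "agent"}


def name_key(name: str) -> tuple[str, ...]:
    """Return the name's tokens, without titles, in sorted order."""
    tokens = re.findall(r"\w+", name.casefold())
    return tuple(sorted(t for t in tokens if t not in TITLES))


def is_partial_match(a: str, b: str) -> bool:
    """True if one name's tokens are all contained in the other's."""
    ka, kb = set(name_key(a)), set(name_key(b))
    return bool(ka) and bool(kb) and (ka <= kb or kb <= ka)


print(name_key("Bond, James"))
print(name_key("Mr. James Bond"))
print(is_partial_match("Agent Bond", "James Bond"))
```

Names with the same sorted tokens can be treated as a strong match, while a partial match such as "Agent Bond" against "James Bond" is much weaker and would normally be flagged for review rather than accepted outright.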

Data Solution

Rule-based Analytics Application

Calimere Point developed an analytical application containing a series of rules that address the nuances in the data. It carries out a series of matching processes of differing strength and categorises the resulting matches. The solution was reviewed by a Big Four auditor and classified as best-of-breed.
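
To illustrate what matching processes of differing strength might look like, the sketch below applies three rules in order, strongest first, and labels each pair of names with the first rule that fires. The category names, the threshold and the rules themselves are assumptions for illustration only; they are not the rules used in the application.

```python
import re
from difflib import SequenceMatcher


def tokens(name: str) -> list[str]:
    """Lower-case the name and split it into word tokens."""
    return re.findall(r"\w+", name.casefold())


def classify(private_name: str, leaked_name: str, fuzzy_threshold: float = 0.85) -> str:
    """Return 'exact', 'reordered', 'fuzzy' or 'none' for a pair of names."""
    a, b = tokens(private_name), tokens(leaked_name)
    if a == b:
        return "exact"        # same tokens in the same order
    if sorted(a) == sorted(b):
        return "reordered"    # same tokens in a different order
    ratio = SequenceMatcher(None, " ".join(sorted(a)), " ".join(sorted(b))).ratio()
    if ratio >= fuzzy_threshold:
        return "fuzzy"        # close, but not identical
    return "none"


def match_lists(private_list: list[str], leaked_list: list[str]) -> list[tuple[str, str, str]]:
    """Compare every pair and keep those that fall into a match category."""
    results = []
    for p in private_list:
        for l in leaked_list:
            category = classify(p, l)
            if category != "none":
                results.append((p, l, category))
    return results


# Hypothetical names, for illustration only.
private = ["James Bond", "Jane Doe"]
leaked = ["Bond, James", "James Bonde", "John Smith"]
for row in match_lists(private, leaked):
    print(row)
```

Comparing every pair, as this sketch does, becomes slow on large lists; at scale the strong rules would typically be run first through an index on the normalised name, with the fuzzy rule reserved for what remains. Recording the category alongside each match lets reviewers start with the strongest matches and makes the logic easier to audit.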

Data Visualisation

The logical data flow is represented as a living picture in many of our applications: this aids understanding and auditing, and is highly flexible.

Benefits

Calimere Point produced a robust and repeatable regulatory response – delivered fast.

A highly visible regulatory and tax authority investigation was prompted by the Panama Papers data leak.
It was necessary for us to deliver robust data services, even from desktop computers in local offices, despite the complexity of the data issue.

Related Data Solutions

Derisking One of Europe’s Largest SQL-to-Cloud Migrations

Forward Congruency: Modernising Legacy Systems for SAP S4 HANA Migration

Logistics Automation

Strategic Data Alignment

Generative AI for Investment Bank Equity Research

Generative AI in Automating Counterparty Credit Risk

Peter Griffiths

Co-Founder & CEO

Dominique Nelson-Esch

Chief Marketing Officer
