We do have quite a bit of that sort of data, including from merging entities across sanctions lists. (E.g., one fun asset that we have and should talk more about is a pairwise match file of persons and companies that’s generated from the main OS data.)
What I’m trying to chase down at the moment is a more domain-inspired typology of name matching error types.
For example:
- A screening system should consider John B. Roberts and John A. Roberts to be different people, especially if we know they’re in America…
- LLC ORION and ORION OOO are the same Russian company.
- Ben Netanyahu and Benjamin Netanyahu - is that a match? Does that get too broad?
- …
We often get screening false positives sent in by people (unfortunately, we rarely get false negatives!), and I think those can serve as a harness on the API to make sure we at least don’t make the same mistake twice.
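
As a rough sketch of what such a harness could look like: the `Case` structure, the error-type labels and `naive_matcher` below are all made up for illustration and are not the actual API or matcher. The two pairs come from the examples above; the Netanyahu pair is left out because the expected answer is still an open question.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Case:
    """One reported name pair plus the outcome we expect from screening."""
    left: str
    right: str
    expect_match: bool
    error_type: str  # hypothetical typology label, not an established scheme


# Hypothetical labels; the name pairs are taken from the list above.
CASES: List[Case] = [
    Case("John B. Roberts", "John A. Roberts", False, "middle-initial-conflict"),
    Case("LLC ORION", "ORION OOO", True, "legal-form-equivalence"),
]


def naive_matcher(left: str, right: str) -> bool:
    """Stand-in matcher: equal if both names have the same set of tokens."""
    def tokens(name: str) -> frozenset:
        # Lowercase, drop periods, split on whitespace.
        return frozenset(name.lower().replace(".", "").split())
    return tokens(left) == tokens(right)


def failures(matcher: Callable[[str, str], bool], cases: List[Case]) -> List[Case]:
    """Return every case where the matcher disagrees with the expected outcome."""
    return [c for c in cases if matcher(c.left, c.right) != c.expect_match]


if __name__ == "__main__":
    for case in failures(naive_matcher, CASES):
        print(f"{case.error_type}: {case.left!r} vs {case.right!r}")
```

With this stand-in, the Roberts pair passes and the ORION pair is reported as a failure, since the token comparison knows nothing about LLC and OOO being equivalent legal forms. Each false positive that gets sent in would become one more `Case`, tagged with whichever error type it falls under.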