Call for false positives: help us build out a great set of name tests

Hey all! We’re working on some refinements for our yente matching API, and to do good quality assurance on them I’m collecting some ground truth for complex name matching problems here:

If anyone has an example we should test, and can share it without revealing customer PII, please post it here :slight_smile:

The UK Information Commissioner’s Office publishes some dummy data here: https://ico.org.uk/media/1432969/exampledataset.csv

The way they generated it is explained here: https://ico.org.uk/media/for-organisations/documents/2021/2618998/how-to-disclose-information-safely-20201224.pdf

As these people are made up, any match against them should be a false positive (though there could be coincidental name matches).

Hello. I noticed that one of the examples was that a title such as “Mr.” could be added to a name in error (…or perhaps deliberately, in the hope of evading a check or control). A similar possibility would be the addition of post-nominals at the end of a name. In the UK these can be honours such as OBE (see “Orders, Decorations and Medals - UK Honours System”), as well as academic qualifications (such as BA, BSc and PhD), professional qualifications (such as ACA, CEng), political offices (MP) and religious societies (SSF). In addition, men can use “Esq.” as a post-nominal in place of “Mr”.

Ooooh this is a fun resource, thank you for sharing it! As you know, a list of people who don’t exist is actually a very valuable asset for us when testing the overall scoring system: we can just assume that every row in it is a negative match against every sanctions list. These sorts of “external truth” resources have gotten us a lot of mileage already :slight_smile:
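
To make the "every row is a negative match" idea concrete, here is a minimal sketch of such a check. It is not yente's scoring: `naive_score` is a stand-in built on plain string similarity, and the `name` column, the threshold and the sample names are all assumptions made up for illustration.

```python
import csv
import io
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def naive_score(query: str, candidate: str) -> float:
    """Stand-in scorer (an assumption, not yente's algorithm): similarity in [0, 1]."""
    return SequenceMatcher(None, query.casefold(), candidate.casefold()).ratio()


def read_names(csv_text: str, column: str) -> List[str]:
    """Read one column of names from CSV text; the column name is hypothetical."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column].strip() for row in reader if row.get(column)]


def false_positives(
    fictitious: List[str],
    listed: List[str],
    score: Callable[[str, str], float] = naive_score,
    threshold: float = 0.7,
) -> List[Tuple[str, str, float]]:
    """Return every (fictitious name, listed name, score) at or above the threshold.

    Because the fictitious people do not exist, each returned pair is either a
    false positive or a coincidental name match that needs a manual look.
    """
    hits = []
    for name in fictitious:
        for entry in listed:
            value = score(name, entry)
            if value >= threshold:
                hits.append((name, entry, value))
    return hits


# Made-up sample data standing in for the dummy dataset and a sanctions list.
sample_csv = "name\nJane Example\nJohn Placeholder\n"
listed_names = ["Jon Placeholder", "Someone Else"]
hits = false_positives(read_names(sample_csv, "name"), listed_names, threshold=0.9)
```

Swapping `naive_score` for a call to the real matcher would turn the length of `hits` into a rough false-positive count for a given threshold.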

Regarding prefix removal on names, we’re actually building up a lot of name reference data, including a prefix list in rigour/resources/names/stopwords.yml on the main branch of opensanctions/rigour on GitHub. We need to make this more broadly findable at some point (also check the org types file in the same folder).
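
As a rough illustration of what stripping titles and post-nominals could look like, here is a small sketch. It does not use rigour and does not reflect how rigour applies its reference data; `PREFIXES`, `POST_NOMINALS` and `strip_affixes` are hypothetical names, and the lists are deliberately tiny samples taken from the examples in this thread.

```python
from typing import List

# Hypothetical sample lists for illustration only; real reference data is far larger.
PREFIXES = {"mr", "mrs", "ms", "dr", "sir"}
POST_NOMINALS = {"obe", "mbe", "ba", "bsc", "phd", "aca", "ceng", "mp", "ssf", "esq"}


def _key(token: str) -> str:
    """Normalise a token for lookup: drop surrounding dots/commas, ignore case."""
    return token.strip(".,").casefold()


def strip_affixes(name: str) -> str:
    """Remove leading titles and trailing post-nominals from a name.

    At least one token is always kept, so a name consisting only of a
    listed token (e.g. the surname "Ba") is left untouched.
    """
    tokens: List[str] = name.split()
    # Drop titles from the front.
    while len(tokens) > 1 and _key(tokens[0]) in PREFIXES:
        tokens.pop(0)
    # Drop post-nominals from the end.
    while len(tokens) > 1 and _key(tokens[-1]) in POST_NOMINALS:
        tokens.pop()
    # Tidy a comma left behind by e.g. "Jane Doe, PhD".
    if tokens:
        tokens[-1] = tokens[-1].rstrip(",")
    return " ".join(tokens)
```

For example, `strip_affixes("Mr. John Smith OBE")` gives `"John Smith"`. One caveat with this naive approach: some post-nominals are also real name parts, so a genuine trailing name such as "Ba" would be stripped from a multi-token name, which is a good reason to test against ground truth.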

I have raised a pull request: “Update stopwords.yml” by confirmordeny (Pull Request #28 on opensanctions/rigour on GitHub). Hope this is of some help.

This is fantastic! You are, according to the data, The Worshipful!

Thank you for your kind words. I have been called many things but that’s a first!

I have submitted another pull request, this time for OBE, MBE etc.: “Update symbols.yml - UK/British Orders of Knighthood” by confirmordeny (Pull Request #30 on opensanctions/rigour on GitHub).