Hey all! We’re working on some refinements for our yente matching API, and to do good quality assurance on them I’m collecting some ground truth for complex name matching problems here:
If anyone has an idea for an example we should test, and can share it without revealing customer PII, please post it here.
Hello. I noticed that one of the examples was that a title such as “Mr.” could be added to a name in error (…or perhaps deliberately, in the hope of evading a check or control). A similar possibility would be the addition of post-nominals at the end of a name. In the UK these can be honours such as OBE (see “Orders, Decorations and Medals - UK Honours System”), as well as academic qualifications (such as BA, BSc and PhD), professional qualifications (such as ACA, CEng), political offices (MP) and religious societies (SSF). In addition, men can use “Esq.” as a post-nominal in place of “Mr”.
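To make the post-nominal case concrete, here is a minimal Python sketch of how trailing post-nominals could be stripped before comparison. The `POST_NOMINALS` set and the `strip_post_nominals` helper are hypothetical and only cover the examples mentioned above; this is not how yente or rigour handle it.

```python
import re

# Hypothetical, illustrative list built from the examples in this post;
# not taken from any real reference data set.
POST_NOMINALS = {
    "obe", "ba", "bsc", "phd", "aca", "ceng", "mp", "ssf", "esq",
}


def strip_post_nominals(name: str) -> str:
    """Drop trailing tokens that look like post-nominals.

    At least one token is always kept, so a name is never emptied.
    """
    tokens = name.replace(",", " ").split()
    while len(tokens) > 1:
        # Normalise "Ph.D." to "phd" and "Esq." to "esq" before the lookup.
        key = re.sub(r"[^a-z]", "", tokens[-1].lower())
        if key not in POST_NOMINALS:
            break
        tokens.pop()
    return " ".join(tokens)


print(strip_post_nominals("John Smith OBE"))
print(strip_post_nominals("Jane Doe, BSc PhD"))
print(strip_post_nominals("John Smith Esq."))
```

One caveat with this approach: short post-nominals collide with real name parts (“Ba” is also a surname), so a test set probably wants cases where stripping would be wrong as well as cases where it is right.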
Ooooh this is a fun resource, thank you for sharing it! As you know: a list of people who don’t exist is actually a very valuable asset for us in terms of testing the overall scoring system - we can just assume that every row in it is a negative match against every sanctions list. These sorts of “external truth” things have gotten us a lot of mileage already.
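As a rough illustration of that idea, a list of known non-existent people can be run against listed names, with every hit above the threshold counted as a false positive. In this sketch, `score` is a hypothetical stand-in built on Python’s `difflib`, not the yente scoring logic, and the names and the 0.7 threshold are made up.

```python
from difflib import SequenceMatcher


def score(query: str, candidate: str) -> float:
    """Hypothetical stand-in for a real matcher: plain string similarity."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def false_positives(non_existent, listed, threshold=0.7):
    """Return every (query, candidate, score) pair at or above the threshold.

    Since the queries are assumed not to exist, each returned pair is a
    false positive by construction.
    """
    hits = []
    for query in non_existent:
        for candidate in listed:
            s = score(query, candidate)
            if s >= threshold:
                hits.append((query, candidate, s))
    return hits


# Made-up names, for illustration only.
fake_people = ["Zed Qux"]
listed_names = ["Vladimir Example"]
print(false_positives(fake_people, listed_names))
```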
Regarding prefix removal on names, we’re actually building up a lot of name reference data, including a prefix list here: rigour/resources/names/stopwords.yml on the main branch of the opensanctions/rigour repository on GitHub. We need to make this more broadly findable at some point (check the org types file in the same folder).