How to deduplicate custom datasets with the default OpenSanctions dataset?

jordangoulet13 · November 17, 2025, 7:16pm

I’ve created a custom dataset (World Bank Leadership) and deployed it to yente. It
works, but I’m seeing duplicate entities in search results.

For example:

My dataset has: “Ajay Banga” (wb-lead-120ea…)
OpenSanctions default has: “Ajaypal Singh Banga” (Q4699676)

// Entity 1: Custom dataset
{
  "id": "wb-lead-120ea365ea249e476b8507d585cfdb7e24fa21dd",
  "caption": "Ajay Banga",
  "schema": "Person",
  "datasets": ["worldbank_leadership"],
  "properties": {
    "name": ["Ajay Banga"],
    "topics": ["gov.igo", "role.pep"],
    "sourceUrl": ["https://www.worldbank.org/ext/en/who-we-are/leadership/ajay-banga"]
  },
  "target": true,
  "first_seen": "2025-11-17T18:09:43",
  "last_seen": "2025-11-17T18:09:43"
}

// Entity 2: OpenSanctions Default
{
  "id": "Q4699676",
  "caption": "Ajaypal Singh Banga",
  "schema": "Person",
  "datasets": ["wikidata"],
  "properties": {
    "name": ["Ajaypal Singh Banga"],
    "wikidataId": ["Q4699676"],
    "country": ["zz"]
  },
  "target": true
}

These are the same person but appear as separate results.

Question: How do I deduplicate my custom dataset with the default OpenSanctions
catalog?

pudo · November 19, 2025, 3:20pm

Hey! That’s a bigger project

We deduplicate the OpenSanctions data internally with our own open source framework, nomenklatura. Basically, it works in three steps: run a comparison on the entities in the dataset (nomenklatura xref), choose which candidate pairs to auto-merge and which ones to verify by hand (nomenklatura dedupe), and then use the resulting SQL lookup table (the so-called resolver) to rewrite and merge the entities of both datasets into one.
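To make the resolver idea a bit more concrete, here is a minimal Python sketch of the last step: a lookup table that maps entity IDs to one canonical ID, and a function that rewrites and merges entities with it. This is not nomenklatura’s API. ToyResolver and apply_resolver are hypothetical names made up for illustration, the table lives in memory rather than in SQL, and the rule for picking the canonical ID is arbitrary. The two entities are trimmed copies of the ones in the question.

```python
class ToyResolver:
    """Hypothetical stand-in for a resolver: entity ID -> canonical ID."""

    def __init__(self):
        self._parent = {}

    def canonical(self, entity_id):
        # Follow merge links until we reach the cluster's canonical ID.
        while self._parent.get(entity_id, entity_id) != entity_id:
            entity_id = self._parent[entity_id]
        return entity_id

    def merge(self, left_id, right_id):
        # Record a positive judgement: both IDs refer to the same thing.
        left, right = self.canonical(left_id), self.canonical(right_id)
        if left != right:
            # Arbitrary rule for this sketch: the smaller string wins.
            keep, drop = sorted((left, right))
            self._parent[drop] = keep
        return self.canonical(left_id)


def apply_resolver(resolver, entities):
    """Rewrite IDs to their canonical form and merge entities that collide."""
    merged = {}
    for entity in entities:
        canonical_id = resolver.canonical(entity["id"])
        target = merged.setdefault(
            canonical_id,
            {"id": canonical_id, "schema": entity["schema"],
             "datasets": [], "properties": {}},
        )
        for dataset in entity["datasets"]:
            if dataset not in target["datasets"]:
                target["datasets"].append(dataset)
        for prop, values in entity["properties"].items():
            bucket = target["properties"].setdefault(prop, [])
            for value in values:
                if value not in bucket:
                    bucket.append(value)
    return list(merged.values())


entities = [
    {
        "id": "wb-lead-120ea365ea249e476b8507d585cfdb7e24fa21dd",
        "schema": "Person",
        "datasets": ["worldbank_leadership"],
        "properties": {"name": ["Ajay Banga"], "topics": ["gov.igo", "role.pep"]},
    },
    {
        "id": "Q4699676",
        "schema": "Person",
        "datasets": ["wikidata"],
        "properties": {"name": ["Ajaypal Singh Banga"], "wikidataId": ["Q4699676"]},
    },
]

resolver = ToyResolver()
# The judgement a reviewer would record after looking at the candidate pair.
resolver.merge(entities[0]["id"], entities[1]["id"])
result = apply_resolver(resolver, entities)
print(result)
```

With that single judgement recorded, both records collapse into one entity that carries both names and both dataset labels. The real work is in producing and reviewing the judgements, not in applying them.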

We wrote a blog post about the process a long time ago, and there are some superficial instructions in the nomenklatura README file. We’d love any help documenting this further.

Doing this is a. lot. of. work. - so I’d also make sure that you can’t just live with multiple hits on the PEP….
