We’re running into an issue with our OpenSanctions API integration, and I wanted to see if anyone else has encountered this or found a workaround.
Basically, when we query a name like “Johan Björnsson”, we get a match. But when we run the exact same query later, we sometimes get a different entity with a different ID (e.g., “Johan Bjornsson”), even though they appear to be the same person with only minor differences in the details. This is a problem for us because our users keep seeing and rejecting the same results, but we can’t reliably block them from appearing again since the IDs keep changing.
From what we can tell, it looks like new IDs are being assigned when minor changes occur, and previous IDs aren’t always linked in a way we can track. Is there a way to modify our integration so we get more consistent results over time? Specifically, how can we prevent already rejected entities from resurfacing under a different ID?
Thanks for raising this! What you’re describing is most likely the result of a new entity appearing independently in two sources and only being merged into a single cluster a bit later. Until they are discovered as duplicates, the data contains two copies of the entity, each from a different source. Here’s what’s happening and how to handle it:
Why do different IDs appear?
Each new entity gets a source-specific ID (e.g., ofac-123). If duplicates are later detected, they are merged under a new cluster ID (e.g., NK-456). Previous IDs are stored in the referents field. Check out our doc page on Identifiers and De-duplication for more information.
Why doesn’t the API always return a consistent ID?
Our system updates entities dynamically, merging duplicates over time. This ensures accuracy but can result in ID changes. The referents field helps track these transitions.
How to prevent rejected entities from resurfacing?
Track all known IDs for every entity a user rejects (its current ID plus those in referents). If the ID of a new result, or any ID in its referents, matches a previously rejected ID, you can filter it out before presenting it to users.
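As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes each result has already been parsed into a dict with an "id" string and a "referents" list, as described above. The names known_ids and RejectionStore are made up for this example and are not part of any OpenSanctions client, and the sample IDs are placeholders.

```python
def known_ids(entity):
    """Return the entity's current ID plus every ID listed in its referents."""
    return {entity["id"], *entity.get("referents", [])}


class RejectionStore:
    """Hypothetical in-memory store of IDs belonging to rejected entities."""

    def __init__(self):
        self._rejected = set()

    def reject(self, entity):
        # Remember the current ID and all referents at the time of rejection.
        self._rejected |= known_ids(entity)

    def filter(self, results):
        # Keep only results that share no ID with anything rejected earlier.
        return [e for e in results if self._rejected.isdisjoint(known_ids(e))]


store = RejectionStore()

# A user rejects a match that still carries its source-specific ID.
store.reject({"id": "ofac-123", "referents": []})

# Later, the same entity comes back under a cluster ID, alongside an unrelated one.
later_results = [
    {"id": "NK-456", "referents": ["ofac-123", "eu-789"]},
    {"id": "NK-999", "referents": []},
]
visible = store.filter(later_results)
```

Here visible contains only the NK-999 entity, because NK-456 lists the rejected ofac-123 among its referents. Note that a copy of the entity from a second source that has not yet been merged shares no ID with the rejected one, so a filter like this would only start catching it once the merge has happened. In a real integration the set of rejected IDs would presumably live in a database rather than in memory.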
We’re constantly improving entity matching to reduce these inconsistencies. Let us know if you need help implementing these strategies!
Thank you, I have a follow-up question. Does the information published by your team in the OFAC dataset contain the same number of records as the OFAC website for both the SDN and Non-SDN lists? Is it possible that both (OFAC and OpenSanctions) contain the same number of records?