and discovered that the sanctions column appears to be entirely missing for every row. For example, when I try to use the sanctions field to compare PEPs with the Consolidated Sanctions dataset and compute Jaccard similarity, the lack of values makes that impossible.
This is surprising because, on the OpenSanctions website, I can see specific PEP records that clearly have sanctions designations.
Is targets.simple.csv expected to include sanctions values for PEPs?
If not, which OpenSanctions export should I use to get PEP records with actual sanctions designations?
If yes, is this a bug in the current PEP export or a problem with the dataset generation pipeline?
I need to join the PEP dataset with the Consolidated Sanctions dataset for overlap analysis. Without valid sanctions data in the PEP export, I cannot reliably identify sanctioned PEPs or compute overlap metrics.
If useful, I can share the exact CSV header and a small sample where sanctions is null.
Edit: if I intersect the two datasets “Consolidated sanctions” and “PEPs” (both as targets.simple.csv), I find that 3,367 people appear in both. Does this mean that those are the PEPs subject to sanctions?
Yes, we usually refer to those individuals as “the Russian government” (lots of others, too, of course).
Regarding the sanctions designation: they wouldn’t be in the PEPs collection because those sources don’t contain the designation texts. The correct thing to do would be to filter the default dataset for entities with both the topics sanction and role.pep - but topics are not included in the targets.simple.csv (only in the JSON). So doing a join on both CSV files using the ID is a pretty clever workaround.
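For anyone who wants to reproduce this, here is a rough sketch of both approaches in Python. It assumes the CSV exports have an ID column named `id` and that the JSON export stores topics under `properties.topics`; both are assumptions on my part, so check them against the actual files. The helper names and the sample rows are made up for illustration.

```python
import csv
import io
import json


def read_ids(csv_file):
    """Collect the values of the `id` column (assumed name) from an open CSV file."""
    return {row["id"] for row in csv.DictReader(csv_file)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def has_both_topics(json_line, wanted=("sanction", "role.pep")):
    """True if one JSON entity line lists every wanted topic.

    Assumes topics live under properties.topics (an assumption, not verified).
    """
    entity = json.loads(json_line)
    topics = set(entity.get("properties", {}).get("topics", []))
    return set(wanted) <= topics


# Made-up sample rows standing in for the two targets.simple.csv files.
sanctions_csv = io.StringIO("id,name\nQ1,Alice\nQ2,Bob\nQ3,Carol\n")
peps_csv = io.StringIO("id,name\nQ2,Bob\nQ3,Carol\nQ4,Dave\n")

sanctioned = read_ids(sanctions_csv)
peps = read_ids(peps_csv)

# The CSV workaround: entities whose ID appears in both exports.
sanctioned_peps = sanctioned & peps
print(sorted(sanctioned_peps))
print(jaccard(sanctioned, peps))
```

With real data, replace the `io.StringIO` samples with `open(path, newline="", encoding="utf-8")` on each downloaded file. `has_both_topics` is meant to be applied line by line to a line-delimited JSON export, if that is the format you download.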
Thank you very much for your feedback! I am currently working on a university project. The aim is to extract second-order intelligence from sanctions data — such as entity similarity, circumvention patterns and network-level risk — that goes beyond what raw list access provides, using standard BI methods applied to the consolidated sanctions and PEPs datasets. Two enrichment layers are provided by the maritime and securities sanctions datasets. The goal is to build something that provides as many insights as possible from the available data!