How to retrieve source URLs and evidence metadata for entity properties via API?

Hi OpenSanctions team,

I’m integrating with the official OpenSanctions API to retrieve sanctioned entities and PEP data for due diligence. I’m using the yente endpoints (e.g., /entities/{id}, /match).

For compliance and audit trails, I need source-level provenance for the data shown on an entity:

  1. The original source URLs for each dataset entry that contributed to the entity (e.g., us_ofac_sdn, gb_hmt_sanctions, us_cia_world_leaders).

  2. Any credibility or confidence scores the project maintains for specific statements.

  3. Source document titles or descriptions.

    Is there an API-native way (within yente) to fetch source URLs, document titles, and dates per statement/property on an entity?

    Thank you

Hey Jordan,

Sounds like you’re planning to do some really cool stuff with our data - I’d love to learn more at some point :slight_smile: (just our type of nerdy). Maybe some leads of where to look:

  • For dataset source URLs, check out the metadata catalog - this has a data section for each source dataset with a url. In yente, the metadata is retrievable via /catalog in a slightly expanded form that includes whether the dataset is up to date in the screening API. This also has other details about the source document and its publisher.
  • For field-level attribution stuff, you need to look into our statements data format. This is an ultra-detailed format that includes dataset attribution per value, and some helpful metadata fields: timestamps, but also original_value (what the value looked like pre-parsing), and we’re currently in the process of populating origin - a new field that documents a more narrow source spec for the value. This can be an OpenAI model name if we used some LLM parsing, the word patch if we applied a data override manually, or the source URL or file name if it’s coming from an API where each entity has a separate URL.
  • The statements data is ca. 3x larger than the normal data format we ship (entities.ftm.json), so we don’t load it into on-prem yentes (nerds interested in data lineage are rare). But our hosted API has a /statements/ endpoint which lets you access a current version of the table and filter it by column (e.g. ?canonical_id=). Top secret: this endpoint isn’t metered - as long as you use a valid API key to access it, it’s on the house.
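
A minimal sketch of what a filtered /statements/ request could look like, using only the Python standard library. Only the endpoint path and the canonical_id filter come from the description above; the base URL, the "Authorization: ApiKey ..." header scheme, the prop filter and the entity ID are assumptions for illustration, so check them against the API documentation before relying on them. The request is only prepared here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: base URL of the hosted API; adjust if your deployment differs.
API_BASE = "https://api.opensanctions.org"


def build_statements_request(api_key: str, **filters: str) -> Request:
    """Prepare (but do not send) a GET request for the /statements/ endpoint."""
    # Sort the filters so the resulting URL is deterministic.
    query = urlencode(sorted(filters.items()))
    url = f"{API_BASE}/statements/?{query}" if query else f"{API_BASE}/statements/"
    # Assumption: the API key is passed as "Authorization: ApiKey <key>".
    return Request(url, headers={"Authorization": f"ApiKey {api_key}"})


# "Q-hypothetical" is a made-up entity ID; filtering on prop is also assumed.
req = build_statements_request(
    "YOUR_API_KEY", canonical_id="Q-hypothetical", prop="sourceUrl"
)
print(req.full_url)
```

Passing the prepared request to urllib.request.urlopen would execute it; the response layout is not covered here and should be inspected before writing a parser.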

Hope this provides some starting points!

– Friedrich

Hi Friedrich,

Thank you so much for the detailed response! Really appreciate the
insights into the statements data format.

I’ve started exploring the /statements endpoint with ?canonical_id=
filtering, and it’s fantastic to see the field-level attribution. I
noticed the new origin field you mentioned is being populated - that’s
going to be super helpful.

Quick follow-up question: For getting the actual source document URLs, I
see that some statements have prop: “sourceUrl” entries, but not all
entities seem to have these. Is the best approach to:

  1. First check for sourceUrl statements with matching entity_id
  2. Fall back to the catalog’s dataset-level URL if no entity-specific
    source exists
  3. Use the origin field once it’s more widely populated

Or is there another pattern I should follow? Essentially, I’m trying to
show our compliance team “this specific sanction entry came from this
specific source” - even if that source is a bulk file rather than an
individual URL.

Best,
Jordan

What would be the best way to get in touch with someone from your team to talk about collaboration? Thank you.

Hi Jordan,

The approach you’ve outlined corresponds perfectly to what we’re aiming for. We set sourceUrl only when there is one that is a) not a raw API endpoint but something that an analyst might reasonably want to browse to, and b) more narrow than the dataset-level URL. Quite a few of the lists we import come in the form of a single file, so deep links aren’t available. Is there a world where you can reference the OpenSanctions website in such cases?
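
The fallback order discussed in this thread could be sketched roughly as follows. This is an illustration under assumptions, not an official client: the statement keys (dataset, prop, value, origin) follow the statements format described earlier, while the catalog entry shape (name, title, data.url), the function name and all sample values are hypothetical.

```python
from typing import Optional


def resolve_source(
    statements: list[dict], datasets: list[dict], dataset: str
) -> Optional[dict]:
    """Pick the narrowest available source reference for one dataset."""
    own = [s for s in statements if s.get("dataset") == dataset]
    # Collect origin values as supplementary detail while the field fills in.
    origins = sorted({s["origin"] for s in own if s.get("origin")})
    # 1. Prefer an entity-specific sourceUrl statement.
    for stmt in own:
        if stmt.get("prop") == "sourceUrl" and stmt.get("value"):
            return {"level": "entity", "url": stmt["value"], "origins": origins}
    # 2. Fall back to the dataset-level URL from the catalog.
    for entry in datasets:
        if entry.get("name") == dataset:
            url = (entry.get("data") or {}).get("url")
            if url:
                return {
                    "level": "dataset",
                    "url": url,
                    "title": entry.get("title"),
                    "origins": origins,
                }
    return None


# Hypothetical sample data, not real OpenSanctions records.
statements = [
    {"dataset": "list_a", "prop": "name", "value": "Jane Doe", "origin": None},
    {"dataset": "list_a", "prop": "sourceUrl",
     "value": "https://example.org/list-a/entry/123", "origin": None},
    {"dataset": "list_b", "prop": "name", "value": "Jane Doe",
     "origin": "bulk-export.csv"},
]
datasets = [
    {"name": "list_b", "title": "Example List B",
     "data": {"url": "https://example.org/list-b/bulk-export.csv"}},
]

print(resolve_source(statements, datasets, "list_a"))
print(resolve_source(statements, datasets, "list_b"))
```

Returning the level alongside the URL lets an audit view state whether the reference is a deep link or only the bulk file for the whole list.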

Best,

– Friedrich

p.s. To discuss a collaboration, reach out to sales@opensanctions(dot)org, or set up a meeting via our public calendar.