I’m working with a self-hosted OpenSanctions Yente server and several custom datasets. I understand how statements work for the public OpenSanctions datasets using the API, but I’m unclear on the intended approach when it comes to custom datasets.
Are there any recommended patterns or best practices for exposing statement-level data from a custom dataset running on self-hosted Yente?
If there’s existing documentation, examples, or design rationale around this that I’ve missed, I’d really appreciate being pointed in the right direction.
Thanks in advance — and thanks for all the work on OpenSanctions!
The very short of it is: yente doesn’t know about statement data. For what we usually want the self-hosted API to do (screen some entities, maybe run a search, traverse the knowledge graph), statement data seemed like overkill. Each statements release is 10GB and really requires building an ad-hoc local graph store to be useful. Of course, we’re now discovering things that would be nice to have in the API (like filtering the properties of combined entities by source, or filtering names by language), but such a migration still seems like a distant roadmap item.
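To make that concrete: a merged FollowTheMoney entity only hands you the values of a property, while statement records keep per-value provenance (source dataset, language), which is what those filters would need. A quick illustration with made-up data (the entity and statement values below are placeholders, not real records):

```python
# Why per-source / per-language filtering needs statements: a merged entity
# only exposes plain values, while each statement keeps the dataset and lang
# it came from. All data below is invented for illustration.
from followthemoney import model

entity = model.make_entity("Person")
entity.id = "ent-001"
entity.add("name", "Jane Doe")
entity.add("name", "Джейн Доу")
print(entity.get("name"))  # ['Jane Doe', 'Джейн Доу'] - no source or language attached

# The same values as statement-shaped records retain that context:
statements = [
    {"entity_id": "ent-001", "prop": "name", "value": "Jane Doe",
     "dataset": "my_custom_dataset", "lang": "eng"},
    {"entity_id": "ent-001", "prop": "name", "value": "Джейн Доу",
     "dataset": "other_source", "lang": "rus"},
]
russian_names = [s["value"] for s in statements if s["lang"] == "rus"]
print(russian_names)
```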
Regarding tools for building the statement data: the basic model is now in followthemoney.statement, but some of the tooling for building graphs out of it lives in nomenklatura.store (we use the LevelDB implementation there).
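Roughly, the flow looks like the sketch below. The class names, constructor signatures and Statement fields are from memory of the store layer and may differ between nomenklatura/followthemoney versions, so treat this as a sketch rather than gospel:

```python
# Sketch only: exact module paths, constructor signatures and Statement field
# names are approximate and may differ between library versions.
from pathlib import Path

from followthemoney.statement import Statement        # statement model
from nomenklatura.dataset import Dataset
from nomenklatura.resolver import Resolver
from nomenklatura.store.level import LevelDBStore     # LevelDB-backed store

dataset = Dataset.make({"name": "my_custom_dataset", "title": "My custom dataset"})
resolver = Resolver()  # default resolver; newer versions may want more config

# Open (or create) a local LevelDB store and write statements into it.
store = LevelDBStore(dataset, resolver, Path("custom-store.db"))
writer = store.writer()
writer.add_statement(
    Statement(
        entity_id="ent-001",
        prop="name",
        schema="Person",
        value="Jane Doe",
        dataset="my_custom_dataset",
        lang="eng",
    )
)
writer.flush()

# Read the combined entity back out via a view over the dataset.
view = store.view(dataset)
entity = view.get_entity("ent-001")
if entity is not None:
    print(entity.schema.name, entity.get("name"))
```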
As for the /statements API: it’s basically a SQL endpoint that runs against a table that our ETL jobs are pushing data into (see nomenklatura.db).
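So if you want something equivalent for a custom dataset today, the pragmatic route is to load your dataset’s statements export into your own SQL table and filter there. A minimal sketch (the file name and the exact set of export columns are assumptions on my part, check your export):

```python
# Minimal sketch: load a statements CSV export into SQLite and filter it,
# roughly what the hosted /statements endpoint does against its own table.
# File name and column selection are assumptions.
import csv
import sqlite3

conn = sqlite3.connect("statements.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS statements (
        entity_id TEXT, prop TEXT, schema TEXT, value TEXT,
        dataset TEXT, lang TEXT, first_seen TEXT, last_seen TEXT
    )"""
)

with open("my_custom_dataset.statements.csv", newline="", encoding="utf-8") as fh:
    rows = [
        (r["entity_id"], r["prop"], r["schema"], r["value"],
         r["dataset"], r.get("lang", ""), r.get("first_seen", ""), r.get("last_seen", ""))
        for r in csv.DictReader(fh)
    ]
conn.executemany("INSERT INTO statements VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
conn.commit()

# Example query: all name statements for one entity, with source and language.
cursor = conn.execute(
    "SELECT dataset, value, lang FROM statements WHERE entity_id = ? AND prop = 'name'",
    ("ent-001",),
)
for dataset, value, lang in cursor:
    print(dataset, value, lang)
```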