Yente 5.5 — more robustness, better monitoring

Note: Requires Elasticsearch 9.x — full index rebuild required

This is mostly a maintenance and hardening release: yente now requires an Elasticsearch 9.x server, verifies the integrity of downloaded entity data via checksums, exposes index freshness as an OpenTelemetry gauge, and ships container images with signatures and bill-of-materials attestations for downstream supply-chain scanners.

If you haven’t moved your search cluster off ES 8 yet, do that before upgrading yente — see the upgrade guide. The transition path is: 8.x → 8.19.x → 9.x, then upgrade yente. v5.4 was the last release that talked to ES 8 servers.
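
As a rough illustration of the path above, the following Python sketch maps a cluster's reported version to the next step. The helper name next_upgrade_step is hypothetical and not part of yente; the input is assumed to be the version.number string that Elasticsearch reports on its root endpoint.

```python
def next_upgrade_step(es_version: str) -> str:
    """Suggest the next step on the 8.x -> 8.19.x -> 9.x -> yente path.

    Hypothetical helper for illustration only; not part of yente.
    """
    # Only major and minor matter here, so suffixes such as
    # "-SNAPSHOT" on the patch component are ignored.
    parts = es_version.split(".")
    major, minor = int(parts[0]), int(parts[1])

    if major == 9:
        return "upgrade yente"
    if major == 8 and minor >= 19:
        return "upgrade Elasticsearch to 9.x"
    if major == 8:
        return "upgrade Elasticsearch to 8.19.x"
    # Anything else is outside the path described in these notes.
    raise ValueError(f"no documented upgrade path from {es_version}")
```

Calling next_upgrade_step("8.12.2") returns "upgrade Elasticsearch to 8.19.x", which matches the first hop of the path described above.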

The changes in more detail:

  • Entity data integrity checks. When the catalog metadata advertises a checksum for an entities resource, yente now verifies the downloaded data against it during indexing. A mismatch raises a clear error instead of silently indexing a possibly truncated/corrupt file. This behavior can be turned off via the new YENTE_VERIFY_CHECKSUM=false setting if needed. Thanks @jbothma for driving this work.
  • Bounded match compute. The /match endpoint now retrieves and scores at most YENTE_MAX_MATCH_CANDIDATES candidates per query (default 500). Previously a large limit could pull in thousands of candidates per query, causing slow responses and occasional OOMs under load. The cap is set high enough that real-world matching results don’t change.
  • HTTP 414 for oversized URLs. Requests whose URL (path + query string) exceeds YENTE_MAX_URL_LENGTH (default 60000 bytes) are now rejected with a 414 response. This works around a bug in uvicorn that silently drops long GET requests.
  • Index freshness as an OTel gauge. A new indexed_dataset_version_time gauge exposes each indexed dataset’s last_export as a Unix timestamp, read from the index _meta written at index time and refreshed after each catalog reload. Wire it into your monitoring to alert on stale indices. See the monitoring docs.
  • Supply-chain artifacts on tag push. Alongside the multi-arch image, every release now produces per-platform SBOMs in CycloneDX 1.6 and SPDX 2.3 formats.
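
Two of the items above lend themselves to a short illustration. The Python sketch below is not yente's implementation: verify_checksum and is_stale are hypothetical helpers, the hash algorithm is left as a parameter because these notes do not name the one the catalog uses, and the staleness threshold is an arbitrary choice. It shows the general shape of checking a download against an advertised checksum, and of turning the Unix timestamp reported by indexed_dataset_version_time into a staleness decision.

```python
import hashlib
import time
from typing import BinaryIO, Optional


def verify_checksum(stream: BinaryIO, expected_hex: str, algorithm: str) -> None:
    """Raise ValueError if the stream's digest differs from expected_hex.

    Hypothetical helper; the algorithm name (e.g. "sha256") is an assumption
    that the caller has to supply.
    """
    digest = hashlib.new(algorithm)
    # Read in chunks so large entity files are never held in memory at once.
    for chunk in iter(lambda: stream.read(65536), b""):
        digest.update(chunk)
    actual = digest.hexdigest()
    if actual != expected_hex.lower():
        raise ValueError(
            f"checksum mismatch: expected {expected_hex}, got {actual}"
        )


def is_stale(
    last_export_ts: float,
    max_age_seconds: float,
    now: Optional[float] = None,
) -> bool:
    """Return True if the dataset's last_export is older than max_age_seconds.

    last_export_ts is the gauge value: a Unix timestamp in seconds.
    """
    current = time.time() if now is None else now
    return (current - last_export_ts) > max_age_seconds
```

For example, is_stale(value, 48 * 3600) would flag a dataset whose last_export is more than 48 hours old. In practice that comparison would more likely live in an alerting rule in your monitoring system than in application code.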

As usual, this release contains updates across the whole stack: followthemoney 4.8.2 → 4.9.2, nomenklatura 4.9.0 → 4.10.0, rigour 2.1.0 → 2.1.2, fastapi to 0.137.1, uvicorn to 0.49.0, cryptography 48 → 49, plus the usual CI action and dev-dependency bumps.

Full Changelog: v5.4.0...v5.5.0 (opensanctions/yente on GitHub)