Dataset query DSL
A new query language for filtering datasets by name, collection, or tag. Queries can be written either as a dict-based AST or in a compact string syntax that combines terms with the operators | (or), & (and), and - (not), grouped with parentheses:
(#issuer.west | #list.sanction) - lt_fiu
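To make the string syntax concrete, here is a small, self-contained sketch of how such a query could be tokenized, parsed into a dict-based AST, and evaluated against a catalogue of datasets. It is not followthemoney's parse_query() or evaluate_query(). The AST keys (name, tag, or, and, not) are assumptions, and so is the rule that #x selects datasets tagged x while a bare word selects a dataset by name. The precedence (| binds more loosely than & and -), the reading of binary - as "and not", and the sample catalogue are assumptions too. Collections are not modelled.

```python
"""Toy tokenizer, parser and evaluator for a dataset query string.

Hypothetical sketch, NOT followthemoney's parse_query()/evaluate_query().
Assumed: "#tag" selects tagged datasets, a bare word selects a dataset by name,
"|" = union, "&" = intersection, binary "-" = "and not"; "|" binds loosest.
"""
import re
from typing import Any, Dict, List, Set

# A term is an optional "#" plus word characters and dots; operators are single chars.
TOKEN_RE = re.compile(r"\s*(#?[\w.]+|[|&()-])")


def tokenize(text: str) -> List[str]:
    text = text.strip()
    tokens: List[str] = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"Unexpected input: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text: str) -> Dict[str, Any]:
    """Recursive-descent parser producing a dict-based AST."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance() -> str:
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or() -> Dict[str, Any]:
        node = parse_and()
        while peek() == "|":
            advance()
            node = {"or": [node, parse_and()]}
        return node

    def parse_and() -> Dict[str, Any]:
        node = parse_atom()
        while peek() in ("&", "-"):
            op = advance()
            right = parse_atom()
            # Binary "-" is read as "and not".
            node = {"and": [node, right if op == "&" else {"not": right}]}
        return node

    def parse_atom() -> Dict[str, Any]:
        token = peek()
        if token is None or token in ("|", "&", "-", ")"):
            raise ValueError(f"Expected a term, got {token!r}")
        advance()
        if token == "(":
            node = parse_or()
            if peek() != ")":
                raise ValueError("Missing closing parenthesis")
            advance()
            return node
        if token.startswith("#"):
            return {"tag": token[1:]}
        return {"name": token}

    ast = parse_or()
    if pos != len(tokens):
        raise ValueError(f"Unexpected token {tokens[pos]!r}")
    return ast


def evaluate(node: Dict[str, Any], datasets: Dict[str, Set[str]]) -> Set[str]:
    """Resolve an AST node to dataset names; `datasets` maps name -> tags."""
    if "name" in node:
        return {node["name"]} & set(datasets)
    if "tag" in node:
        return {name for name, tags in datasets.items() if node["tag"] in tags}
    if "or" in node:
        return set().union(*(evaluate(child, datasets) for child in node["or"]))
    if "and" in node:
        return set.intersection(*(evaluate(child, datasets) for child in node["and"]))
    if "not" in node:
        # Complement relative to every known dataset.
        return set(datasets) - evaluate(node["not"], datasets)
    raise ValueError(f"Unknown node: {node!r}")


# Hypothetical catalogue: dataset name -> tags.
DATASETS = {
    "west_list": {"issuer.west", "list.sanction"},
    "other_list": {"list.sanction"},
    "lt_fiu": {"issuer.west"},
    "registry": {"list.registry"},
}
query = "(#issuer.west | #list.sanction) - lt_fiu"
print(sorted(evaluate(parse(query), DATASETS)))  # ['other_list', 'west_list']
```

Reading - as "and not" lets the example query read left to right: take everything tagged issuer.west or list.sanction, then drop lt_fiu.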
The DSL is exposed via evaluate_query(), parse_query(), and match_datasets() from followthemoney.dataset. The sieve CLI command gains -d/--datasets to filter entity streams using this syntax.
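A rough sketch of how a dataset selection could then be applied to a JSONL entity stream, in the spirit of sieve -d/--datasets. It assumes the query has already been resolved to a set of dataset names, that each entity's JSON carries a datasets list, and that an entity is kept when any of its datasets is selected. None of these details are confirmed by the notes, and filter_entities is a hypothetical name.

```python
import json
from typing import Iterable, Iterator, Set


def filter_entities(lines: Iterable[str], selected: Set[str]) -> Iterator[dict]:
    """Yield entities from JSONL lines that belong to at least one selected dataset.

    Hypothetical helper: assumes each entity has a "datasets" list and that
    "any overlap" is the rule for keeping an entity.
    """
    for line in lines:
        line = line.strip()
        if len(line) == 0:
            continue  # skip blank lines
        entity = json.loads(line)
        if selected.intersection(entity.get("datasets", [])):
            yield entity


stream = [
    '{"id": "e1", "schema": "Person", "datasets": ["west_list"], "properties": {}}',
    '{"id": "e2", "schema": "Person", "datasets": ["lt_fiu"], "properties": {}}',
    "",
]
kept = [entity["id"] for entity in filter_entities(stream, {"west_list", "other_list"})]
print(kept)  # ['e1']
```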
Schema changes
- Person.biography: new text property for biographical descriptions.
- Vehicle is now matchable: vehicles can be used in entity matching and deduplication (fixes opensanctions/yente issue #1066, "Vehicle not matchable?").
- Thing.wikipediaUrl is excluded from matching: Wikipedia URLs are no longer considered when comparing entities, which reduces false positives.
CLI: migrate from EntityProxy to ValueEntity
All CLI commands that read JSONL entity streams (aggregate, sorted-aggregate, sieve, map, map-csv) now deserialize entities as ValueEntity instead of EntityProxy. This fixes incorrect merging of temporal metadata and of dataset/referent fields during aggregation.
- map and map-csv accept -d/--dataset to set dataset metadata on emitted entities.
- Breaking: the legacy import-vis command has been removed.
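The notes say the ValueEntity migration fixes how aggregation merges temporal metadata and dataset/referent fields, but they do not spell out the corrected rules. The following is only a sketch of what merging two fragments of the same entity plausibly involves: union the property values, datasets, and referents, and keep the earliest first_seen and the latest last_seen and last_change. The field names and rules are assumptions, and merge_fragments is a hypothetical helper, not ValueEntity's implementation. Timestamps are assumed to be ISO 8601 strings in a single format, so that plain string comparison orders them correctly.

```python
from typing import Any, Callable, Dict, List, Optional


def _pick(a: Optional[str], b: Optional[str], choose: Callable[[List[str]], str]) -> Optional[str]:
    """Apply min/max to whichever timestamps are present; None if neither is."""
    present = [value for value in (a, b) if value is not None]
    return choose(present) if present else None


def merge_fragments(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two JSON fragments of one entity (hypothetical rules, not ValueEntity's code)."""
    if a["id"] != b["id"]:
        raise ValueError("Fragments describe different entities")
    properties: Dict[str, List[str]] = {}
    for fragment in (a, b):
        for prop, values in fragment.get("properties", {}).items():
            merged = properties.setdefault(prop, [])
            for value in values:
                if value not in merged:
                    merged.append(value)
    return {
        "id": a["id"],
        "schema": a["schema"],  # assumes both fragments share a schema
        "properties": properties,
        "datasets": sorted(set(a.get("datasets", [])) | set(b.get("datasets", []))),
        "referents": sorted(set(a.get("referents", [])) | set(b.get("referents", []))),
        "first_seen": _pick(a.get("first_seen"), b.get("first_seen"), min),
        "last_seen": _pick(a.get("last_seen"), b.get("last_seen"), max),
        "last_change": _pick(a.get("last_change"), b.get("last_change"), max),
    }


left = {
    "id": "p1", "schema": "Person",
    "properties": {"name": ["Jane Doe"]},
    "datasets": ["west_list"], "referents": ["ref-a"],
    "first_seen": "2023-01-05T00:00:00", "last_seen": "2024-03-01T00:00:00",
}
right = {
    "id": "p1", "schema": "Person",
    "properties": {"name": ["Jane Doe"], "country": ["lt"]},
    "datasets": ["lt_fiu"], "referents": ["ref-b"],
    "first_seen": "2022-11-20T00:00:00", "last_seen": "2024-02-01T00:00:00",
}
merged = merge_fragments(left, right)
print(merged["datasets"], merged["first_seen"], merged["last_seen"])
```

The point of the sketch is that each field needs its own merge rule; a merge that simply let one fragment's metadata overwrite the other's would lose the earlier first_seen or one of the datasets.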
Other changes
- Code style aligned with explicit emptiness checks (len(x) == 0 instead of if not x) in compare.py.
- Significant new test coverage for CLI commands, ValueEntity, dataset queries, address types, and mapping.
- Python 3.11 compatibility fix.
- Dependency updates: banal, CI actions, Jackson (Java), npm packages. Node.js 24 adopted for JS builds.
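A small illustration of the difference between the two emptiness checks named in the list above. The function names are made up and the code is not taken from compare.py. One plausible reason for the switch, assumed here rather than stated in the notes, is that if not x silently treats None as empty, while len(x) == 0 only accepts sized collections and raises TypeError for None.

```python
from typing import Optional, Sequence


def is_empty_implicit(values: Optional[Sequence[str]]) -> bool:
    # Truthiness: None, [], (), and "" all silently count as empty.
    return not values


def is_empty_explicit(values: Sequence[str]) -> bool:
    # Explicit length check: states that a sized collection is expected;
    # passing None raises TypeError instead of being treated as empty.
    return len(values) == 0


print(is_empty_implicit(None))  # True
print(is_empty_explicit([]))    # True
try:
    is_empty_explicit(None)  # type: ignore[arg-type]
except TypeError as error:
    print("TypeError:", error)
```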
Downloads: Release v4.8.0 on GitHub (opensanctions/followthemoney)