Hello. First of all: thank you for all the work you have put into this project. I am very new to both yente and zavod, so I would appreciate any hints to get started.
I have both tools running locally.
Q1: I created a custom yente manifest that points to local files I was able to create with zavod.
datasets:
  - name: cz_national_sanctions
    title: "CZ National Sanctions (Zavod)"
    path: /Users/sbz/code/opensanctions/data/datasets/cz_national_sanctions/entities.ftm.json
    version: "20251016"
  - name: de_bka_wanted
    title: "DE BKA Wanted (Zavod)"
    path: /Users/sbz/code/opensanctions/data/datasets/de_bka_wanted/entities.ftm.json
    version: "20251016"
  - name: default
    title: "Default: All (CZ + DE BKA)"
    datasets:
      - cz_national_sanctions
      - de_bka_wanted
Does this look correct? It seems to be working fine so far.
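A quick way to sanity-check the files the manifest points to is sketched below. It assumes that entities.ftm.json is line-delimited JSON with one FollowTheMoney entity per line, each carrying "id", "schema" and "properties" keys. The function name summarize_entities is hypothetical and not part of zavod or yente.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_entities(path: Path) -> Counter:
    """Count entities per schema in a line-delimited FtM JSON file.

    Hypothetical helper, not part of zavod or yente. Assumes one JSON
    object per line with "id", "schema" and "properties" keys.
    """
    schemata: Counter = Counter()
    with path.open("r", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            entity = json.loads(line)
            for key in ("id", "schema", "properties"):
                if key not in entity:
                    raise ValueError(f"{path}:{line_no}: missing {key!r}")
            schemata[entity["schema"]] += 1
    return schemata
```

Calling summarize_entities(Path(...)) on each path from the manifest and printing the result shows how many entities of each schema yente would be asked to index.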
Q2: Can zavod automatically create catalog files?
The default yente manifest looks like this:
catalogs:
  - url: "https://data.opensanctions.org/datasets/latest/index.json"
    scope: default
    resource_name: entities.ftm.json
Can zavod create such an index.json file and update it automatically every time I run zavod run --latest datasets/example/example.yml?
Q3: ZAVOD_RESOLVER_PATH setting
The docs state that
ZAVOD_RESOLVER_PATH must be set to the path to a nomenklatura resolver JSON lines file. It can be an empty file. e.g. data/resolver.ijson
I did not set this setting, but it still seems to work. Is there a default resolver.ijson config I should be using?
Q4: post zavod run tasks
Are there any zavod commands I should execute after a zavod run for deduplication or other data improvements, or is that done automatically during the run?
Q5: Transliteration support
Am I correct in seeing that transliteration support (example: the .ftm data file contains the name "Berti" but I search for "Берти") works ONLY on the /match endpoint and not on the /search endpoint?
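The comparison behind this question can be reproduced with a sketch like the one below. It assumes yente is reachable at http://localhost:8000 (adjust BASE_URL otherwise), uses the "default" dataset from the manifest in Q1, and assumes the name belongs to a Person entity. The helper names are hypothetical, and the response keys read in compare_endpoints ("results", "responses") are my understanding of the yente API rather than something confirmed here.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # assumption: local yente instance
DATASET = "default"  # dataset name taken from the manifest in Q1


def build_search_request(name: str) -> Request:
    """Build GET /search/{dataset}?q=... (full-text search)."""
    url = f"{BASE_URL}/search/{quote(DATASET)}?{urlencode({'q': name})}"
    return Request(url, method="GET")


def build_match_request(name: str) -> Request:
    """Build POST /match/{dataset} with a query-by-example entity."""
    payload = {
        "queries": {
            # "q1" is an arbitrary query key; schema "Person" is an assumption
            "q1": {"schema": "Person", "properties": {"name": [name]}}
        }
    }
    return Request(
        f"{BASE_URL}/match/{quote(DATASET)}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch(request: Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urlopen(request) as response:
        return json.load(response)


def compare_endpoints(name: str) -> None:
    """Print how many results each endpoint returns for the same name."""
    search = fetch(build_search_request(name))
    match = fetch(build_match_request(name))
    print("search hits:", len(search.get("results", [])))
    print("match hits:", len(match.get("responses", {}).get("q1", {}).get("results", [])))
```

Running compare_endpoints("Берти") and then compare_endpoints("Berti") against the local instance shows side by side whether the Cyrillic spelling is found by each endpoint.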
That's all for now. I would greatly appreciate your guidance.