I’ve been working on Ohio politicians in Wikidata.
One thing in the Subnational government positions list on the EveryPolitician site has me wondering how positions across data sources should be represented. Are they normalized, or should they have separate entries?
For example, the Plural position for an Ohio state senator is "Member of the Ohio Senate (United States)", while the Wikidata representation is "member of the State Senate of Ohio (United States)". Both are listed on EveryPolitician, and the first could be mapped to the second.
Just to be clear, the same split between Plural and Wikidata appears across most Senate positions for the US states. From a user standpoint, it would make sense to have a single position and indicate any source variations within the position view.
Hey Wolfgang!
Thanks for raising this!
Positions from different sources are intended to resolve to a single entity. The mechanism that makes this work is the Wikidata QID: when h.make_position is called with a wikidata_id, the position’s entity ID is set directly to that QID rather than a hash of the name. Two crawlers independently emitting the same QID will produce the same entity, which gets deduplicated automatically during export, with both name variants preserved on the merged entity.
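To make that concrete, here is a rough sketch of the idea. It is not the actual crawler or export code: `position_id`, `merge_positions`, the `pos-` prefix, and the `Q000` QID are all made up for illustration, and the real hashing and merging logic may differ.

```python
import hashlib
from typing import Optional


def position_id(name: str, country: str, wikidata_id: Optional[str] = None) -> str:
    """Hypothetical ID rule: use the QID when given, otherwise hash the name."""
    if wikidata_id:
        return wikidata_id
    digest = hashlib.sha1(f"{country}:{name}".lower().encode("utf-8")).hexdigest()
    return f"pos-{digest}"


def merge_positions(records: list) -> dict:
    """Group records by entity ID, keeping every distinct name variant."""
    merged = {}
    for rec in records:
        pid = position_id(rec["name"], rec["country"], rec.get("wikidata_id"))
        entity = merged.setdefault(pid, {"id": pid, "names": []})
        if rec["name"] not in entity["names"]:
            entity["names"].append(rec["name"])
    return merged


# "Q000" is a placeholder, not the real QID for this position.
with_qid = merge_positions([
    {"name": "Member of the Ohio Senate", "country": "us", "wikidata_id": "Q000"},
    {"name": "member of the State Senate of Ohio", "country": "us", "wikidata_id": "Q000"},
])
without_qid = merge_positions([
    {"name": "Member of the Ohio Senate", "country": "us"},
    {"name": "member of the State Senate of Ohio", "country": "us"},
])
```

In this sketch, `with_qid` ends up with one entity carrying both names, while `without_qid` ends up with two separate entities, which is the kind of split you noticed.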
The split between the EveryPolitician and Wikidata representations of Ohio Senate positions indicates that one or both crawlers are not yet supplying the QID for those positions. We run periodic cycles of position deduplication and normalization for exactly these cases: when two position entities refer to the same real-world role, they get merged and any source-specific name variants are retained as alternate labels.
I have just manually merged the two positions you pointed out. During the next export run, they will become the same entity. I’ll also take a closer look at the rest of the US states.
Thanks again for digging into this.