False negatives

What should we do if we find false negatives while using the site? By that I mean one entity listed two or more times - so a match could have been made but was not made.

The answer may be nothing, because fixing individual entries manually is time-consuming and takes time away from automation. I just thought I should ask.

Hi! It’s a very fair question, and maybe I can respond a bit more broadly:

  • First off, we wouldn’t necessarily refer to this as “false negatives”, but rather as “unmerged duplicates”. I want to clarify this because it provides context for the many contributions you’re making to rigour. There are two similar but distinct processes happening here. The “false positive/negative” terminology mainly applies to people using our matching tool to see if a given name they are interested in is mentioned on any watchlists. So it’s a match between arbitrary data points we have basically no control over and the database we do control. That’s mainly what I’m working on with rigour right now: cleaning the uncontrolled inputs enough that we can then correlate them with our watchlists.
  • The other process is our internal data cleansing process leading into the database. This also relies on some of the tooling in rigour, but we also do a lot of bespoke work per source (check e.g. here).
  • Internal deduplication between the lists (and sometimes inside one list) is something we do partly automatically (maybe 5-10%) and partly with human review. Given that it’s a moving target, we prioritize which sections receive human attention: key NATO sanctions lists get more attention than e.g. the GEM or UANI lists, which have no regulatory significance.
  • I’m a little too much of a snob to want to open up merge decisions to the public (besides that being technically hard). The legal and data source context involved in making those decisions makes this a bad idea IMO.
  • Of course, we’d love to know if anyone notices any unmerged duplicates in the database. So I’ve just created a public spreadsheet that will send us a notification and let us track which suggestions we’ve already reviewed and accepted/suggested.
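
To make the distinction between the two processes a bit more concrete, here is a rough Python sketch. It is purely illustrative and is not how rigour or the actual pipeline works: the function names (`normalize`, `screen`, `triage_duplicates`), the thresholds, and the use of the standard library's `difflib` for scoring are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace (illustrative only)."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def similarity(a: str, b: str) -> float:
    """Score two names between 0.0 and 1.0 after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def screen(query: str, records: dict[str, str], threshold: float = 0.85) -> list[str]:
    """Process 1: match an uncontrolled input against the controlled database.

    This is where "false positive" and "false negative" apply: a hit that
    should not have matched, or a listed name that the query failed to hit.
    """
    return [eid for eid, name in records.items() if similarity(query, name) >= threshold]


def triage_duplicates(
    records: dict[str, str],
    auto_threshold: float = 0.97,    # hypothetical cut-off for automatic merges
    review_threshold: float = 0.85,  # hypothetical cut-off for human review
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Process 2: look for unmerged duplicates inside the database itself.

    Near-certain pairs are proposed for automatic merging; plausible pairs
    are queued for a human reviewer; everything else is left alone.
    """
    auto, review = [], []
    for (id_a, name_a), (id_b, name_b) in combinations(records.items(), 2):
        score = similarity(name_a, name_b)
        if score >= auto_threshold:
            auto.append((id_a, id_b))
        elif score >= review_threshold:
            review.append((id_a, id_b))
    return auto, review
```

Calling `triage_duplicates` on a small dict of entity IDs and names returns two lists of ID pairs. Only the first would be merged without anyone looking at it, which mirrors the split between automatic merging and human review described above.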

Hope this all makes sense :slight_smile:

Thank you. That’s all very interesting. It is also helpful to better understand what Rigour is used for.