About the Use Cases & Case Studies category

Community-driven examples of how different organizations use OpenSanctions data (e.g., compliance teams, investigative journalists, academics).

You can see our Showcase, as well as our Bibliography of research using OpenSanctions data.

If you would like to submit your research, article, or use case for us to feature on our site or in our newsletter, please contact us at support@opensanctions.org.

Hopefully this fits the category; if not, where would be a better place for it?

Our team at Senzing uses OpenSanctions data in several ways across customer use cases. Recently we saw a need for synthetic data, and I wanted to introduce the approach here and ask whether it might help others with their use cases.

Consider a data model with three parts: risk data (sanctions lists), link data (UBO, i.e. ultimate beneficial ownership, relations), and event data. We have excellent sources for the first two categories, but event data is often highly confidential. There are some open examples, such as the OCCRP data for the “Azerbaijani Laundromat” case.

We thought it might be a good idea to simulate patterns of fraud and generate synthetic datasets to cover needs for event data. In other words, we can combine techniques from graph analytics and from queueing theory to build parameterized models for the OCCRP data about money transfers. Then we can generate more simulated money transfers.
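To make the queueing-theory part concrete, here is a minimal sketch of one possible parameterized model: treat transfers as a Poisson process, estimate the arrival rate from observed timestamps, then draw new timestamps with exponentially distributed gaps. The function names and the sample dates are made up for illustration; this is not the model used in the kleptosyn repo, and real transfer data may well need something richer than a constant rate.

```python
import random
from datetime import datetime, timedelta


def fit_arrival_rate(timestamps: list[datetime]) -> float:
    """Estimate transfers per day: number of gaps divided by total elapsed days.

    This is the maximum-likelihood rate when gaps are assumed exponential.
    """
    ordered = sorted(timestamps)
    span_days = (ordered[-1] - ordered[0]).total_seconds() / 86400.0
    return (len(ordered) - 1) / span_days


def simulate_arrivals(rate_per_day: float, start: datetime, n: int,
                      rng: random.Random) -> list[datetime]:
    """Generate n synthetic timestamps with exponentially distributed gaps."""
    current = start
    arrivals = []
    for _ in range(n):
        current += timedelta(days=rng.expovariate(rate_per_day))
        arrivals.append(current)
    return arrivals


# Hypothetical observed timestamps, standing in for real transfer dates.
observed = [datetime(2013, 1, 1) + timedelta(days=d) for d in (0, 2, 3, 7, 8, 12)]
rate = fit_arrival_rate(observed)

rng = random.Random(42)  # fixed seed so runs are repeatable
synthetic = simulate_arrivals(rate, datetime(2014, 1, 1), 5, rng)
```

The same idea extends to amounts (for instance, fitting a distribution to observed transfer sizes), but that is left out here to keep the sketch short.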

The general architecture, so far, has been:

  1. Use a slice of the OpenSanctions plus Open Ownership datasets, with entity resolution to merge records across them. This produces a kind of “structural graph” of UBO: people, companies, and the links among them.
  2. Analyze patterns of fraud (a.k.a. tradecraft) from data sources such as OCCRP to build parameterized simulations.
  3. Sample subgraphs from the “structural graph”, then simulate money laundering transactions as a “temporal graph” atop it.
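As a rough sketch of step 3, the snippet below samples a neighborhood from a toy “structural graph” and lays a simple layering chain of transfers on top of it as a separate “temporal graph”. Everything here is an assumption for illustration: the toy nodes, the `kind` attribute, the function names, the 2% skim, and the choice of `nx.ego_graph` as the sampling method. The actual project may sample and simulate quite differently.

```python
import random

import networkx as nx


def sample_subgraph(structural: nx.Graph, rng: random.Random, radius: int = 2) -> nx.Graph:
    """Pick a random seed node and return its neighborhood within `radius` hops."""
    seed = rng.choice(sorted(structural.nodes))
    return nx.ego_graph(structural, seed, radius=radius)


def simulate_layering(subgraph: nx.Graph, rng: random.Random, hops: int,
                      amount: float, rate_per_day: float = 1.0) -> nx.MultiDiGraph:
    """Move funds along a random chain of companies, one timestamped edge per hop."""
    companies = sorted(n for n, d in subgraph.nodes(data=True) if d.get("kind") == "company")
    chain = rng.sample(companies, min(hops + 1, len(companies)))
    temporal = nx.MultiDiGraph()
    day = 0.0
    for src, dst in zip(chain, chain[1:]):
        day += rng.expovariate(rate_per_day)  # exponential gap between hops
        temporal.add_edge(src, dst, day=day, amount=round(amount, 2))
        amount *= 0.98  # hypothetical 2% skim at each hop
    return temporal


# Toy structural graph: people linked to the companies they own or control.
structural = nx.Graph()
structural.add_nodes_from(["p1", "p2"], kind="person")
structural.add_nodes_from(["c1", "c2", "c3", "c4"], kind="company")
structural.add_edges_from([("p1", "c1"), ("p1", "c2"), ("p2", "c2"),
                           ("p2", "c3"), ("c3", "c4")])

rng = random.Random(0)
subgraph = sample_subgraph(structural, rng)
temporal = simulate_layering(subgraph, rng, hops=3, amount=100_000.0)
```

Keeping the temporal graph separate from the structural one means the same ownership structure can be reused across many simulated transfer runs.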

Also, since fraud-related transactions are a small portion of annual international B2B money transfers, we generate a much larger number of “legit” transactions, sampling other entities from the OpenSanctions-based “structural graph” to serve as “legit” companies in the simulation.
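A minimal sketch of that blending step, using only the standard library. The 50:1 ratio, the field names, the amount distribution, and the entity names are all placeholders I made up; the right ratio is exactly the kind of parameter a real simulation would need to calibrate.

```python
import random


def generate_legit(entities: list[str], rng: random.Random, n: int,
                   horizon_days: float) -> list[dict]:
    """Create n background transfers between randomly chosen distinct entities."""
    txns = []
    for _ in range(n):
        src, dst = rng.sample(entities, 2)
        txns.append({
            "src": src,
            "dst": dst,
            "day": rng.uniform(0, horizon_days),
            "amount": round(rng.lognormvariate(9, 1), 2),  # placeholder distribution
            "label": "legit",
        })
    return txns


def blend(fraud: list[dict], entities: list[str], rng: random.Random,
          legit_per_fraud: int = 50) -> list[dict]:
    """Surround the fraud transfers with background traffic, ordered by time."""
    horizon = max(t["day"] for t in fraud)
    legit = generate_legit(entities, rng, legit_per_fraud * len(fraud), horizon)
    return sorted(fraud + legit, key=lambda t: t["day"])


# Hypothetical inputs: two simulated fraud transfers and a pool of other entities.
fraud = [
    {"src": "shell_a", "dst": "shell_b", "day": 3.0, "amount": 250_000.0, "label": "fraud"},
    {"src": "shell_b", "dst": "shell_c", "day": 9.5, "amount": 245_000.0, "label": "fraud"},
]
legit_entities = [f"company_{i}" for i in range(20)]

blended = blend(fraud, legit_entities, random.Random(1))
```

Carrying the `label` field through gives downstream users ground truth for evaluating detection methods.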

If we structure this open source project carefully, the components could be sub-classed to customize the simulation for other patterns. For example, I have friends at the CISO level at large banks who would like to use their confidential patterns in a simulation and would be interested in participating.
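One way the sub-classing could look, as a hedged sketch: a small abstract base class with one method, and one subclass per tradecraft pattern. The class and method names below are hypothetical and are not taken from the kleptosyn code; the point is only that a bank could add a private subclass without touching the rest of the simulation.

```python
import random
from abc import ABC, abstractmethod

Transfer = tuple[str, str, float]  # (sender, receiver, amount)


class TransferPattern(ABC):
    """Hypothetical base class: each tradecraft pattern is one subclass."""

    @abstractmethod
    def generate(self, entities: list[str], rng: random.Random) -> list[Transfer]:
        """Return the transfers this pattern produces over the given entities."""


class LayeringChain(TransferPattern):
    """Pass the same amount through a chain of distinct entities."""

    def __init__(self, amount: float, hops: int) -> None:
        self.amount = amount
        self.hops = hops

    def generate(self, entities, rng):
        chain = rng.sample(entities, self.hops + 1)
        return [(src, dst, self.amount) for src, dst in zip(chain, chain[1:])]


class FanOut(TransferPattern):
    """Split one amount evenly from a single source to several beneficiaries."""

    def __init__(self, amount: float, n_beneficiaries: int) -> None:
        self.amount = amount
        self.n_beneficiaries = n_beneficiaries

    def generate(self, entities, rng):
        source, *targets = rng.sample(entities, self.n_beneficiaries + 1)
        share = self.amount / self.n_beneficiaries
        return [(source, target, share) for target in targets]


def run_simulation(patterns: list[TransferPattern], entities: list[str],
                   rng: random.Random) -> list[Transfer]:
    """Run each pattern in turn and collect all generated transfers."""
    transfers: list[Transfer] = []
    for pattern in patterns:
        transfers.extend(pattern.generate(entities, rng))
    return transfers


entities = [f"company_{i}" for i in range(10)]
patterns = [LayeringChain(1000.0, hops=3), FanOut(900.0, n_beneficiaries=3)]
transfers = run_simulation(patterns, entities, random.Random(7))
```

A confidential pattern would then be just another `TransferPattern` subclass kept in a private package.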

We had a discussion earlier today at GraphGeeks.org where I learned A LOT about graph analytics which could be leveraged. Admittedly, it’s not my main field. There’s other pre-existing work we can probably use. Also, I’ve been posting some insights along the way on Bluesky to see who might be interested.

We have a GitHub repo started at https://github.com/DerwenAI/kleptosyn/tree/main/kleptosyn
and the project has the rather clumsy name of kleptosyn (so far).

All criticisms and suggestions are very welcome. We recognize that synthetic data has limitations and perils, and we are trying to be careful. Perhaps by combining large parts of open datasets with small parts of simulated data, we can help illustrate a wider range of applications and use cases which might otherwise be blocked by confidential data sources, such as money transfers, SARs, log files in mission-critical systems, and so on.

An example of the UBO “structural graph”, using betweenness centrality to rank interesting entities within each subgraph:
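For anyone who wants to reproduce this kind of ranking, here is a small NetworkX sketch on a toy graph. It assumes “subgraph” means a connected component of an undirected graph, which may not match how the subgraphs were actually produced; the node names and the `rank_by_betweenness` helper are made up.

```python
import networkx as nx


def rank_by_betweenness(graph: nx.Graph, top_k: int = 3) -> list[list[tuple[str, float]]]:
    """For each connected component, list the top_k nodes by betweenness centrality."""
    rankings = []
    for nodes in nx.connected_components(graph):
        scores = nx.betweenness_centrality(graph.subgraph(nodes))
        # Highest score first; ties broken by node name for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        rankings.append(ranked[:top_k])
    return rankings


# Toy UBO-style graph with two components: a star and a short path.
ubo = nx.Graph()
ubo.add_edges_from([("hub", "a"), ("hub", "b"), ("hub", "c")])
ubo.add_edges_from([("x", "y"), ("y", "z")])

rankings = rank_by_betweenness(ubo)
top_nodes = {ranked[0][0] for ranked in rankings}
```

In both toy components the node that every shortest path passes through comes out on top, which is the intuition behind using betweenness to surface “interesting” intermediaries.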

A visualization of the OCCRP money transfers in the “Azerbaijani Laundromat” case:

take-aways:

  • this is a relatively sparse graph with diameter = 4
  • 423 nodes out of the 437 total are in the periphery
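These measures come from NetworkX's distance functions. The sketch below shows the calls on a toy transfer graph shaped to have the same diameter of 4; it assumes distances are measured on the undirected version of the largest connected component, which is my guess at a reasonable setup rather than a record of exactly how the numbers above were computed.

```python
import networkx as nx

# Toy directed transfer graph: senders -> shell -> core -> shell -> beneficiaries.
transfers = nx.DiGraph()
transfers.add_edges_from([
    ("A1", "S1"), ("A2", "S1"),
    ("S1", "CORE"), ("CORE", "S2"),
    ("S2", "B1"), ("S2", "B2"),
])

# Distance measures need a connected graph, so ignore edge direction
# and keep only the largest connected component.
undirected = transfers.to_undirected()
largest = undirected.subgraph(max(nx.connected_components(undirected), key=len))

# Compute eccentricities once and reuse them for each measure.
ecc = nx.eccentricity(largest)
diameter = nx.diameter(largest, e=ecc)            # largest eccentricity
periphery = set(nx.periphery(largest, e=ecc))     # nodes whose eccentricity equals the diameter
center = set(nx.center(largest, e=ecc))           # nodes with the smallest eccentricity
density = nx.density(transfers)                   # share of possible directed edges present
```

Nodes that are in neither the periphery nor the center (here `S1` and `S2`) are the intermediaries worth a closer look.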

questions:

  • does the flow hierarchy show that few edges participate in cycles?
  • do the many 021U triads indicate the “burst in beneficiaries” AML tradecraft pattern?
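Both questions map onto NetworkX calls: `nx.flow_hierarchy` returns the fraction of edges that do not participate in cycles, and `nx.triadic_census` counts the 16 directed triad types (021U is the one where two nodes each point at a third and there are no other ties). A small sketch on a made-up graph, just to show the calls and how to read the results:

```python
import networkx as nx

# Toy directed graph: three senders converge on one target,
# plus a separate pair of accounts that send money back and forth.
g = nx.DiGraph()
g.add_edges_from([("s1", "t"), ("s2", "t"), ("s3", "t")])  # fan-in, no cycles
g.add_edges_from([("c1", "c2"), ("c2", "c1")])              # a 2-cycle

# Fraction of edges NOT in any cycle: 3 of the 5 edges here.
hierarchy = nx.flow_hierarchy(g)

# Counts for all 16 triad types, keyed by the standard codes.
census = nx.triadic_census(g)
count_021u = census["021U"]
```

A flow hierarchy close to 1.0 on the real data would suggest that few edges participate in cycles. Whether a high 021U count reflects a particular tradecraft pattern is a separate judgment that the census alone cannot settle.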

OCCRP analysis and the subsequent investigative journalism articles mention the 4 top shell companies involved in the “Azerbaijani Laundromat”, in order of centrality: LCM, Hilux, Polux, MetaStar.

NetworkX identifies 423 of the 437 nodes as “peripheral”, which leaves 14. Setting aside the 4 top shell companies above, what kinds of patterns involve the other 10 intermediate nodes?

The leaked transaction data came from an Estonian branch of Danske Bank, which is itself one of the intermediary nodes identified: “DANSKE BANK A/S EESTI FILIAAL” https://thebanks.eu/banks/13002

Two others of interest form the graph’s “center”: “RIVERLANE LLP”, “GLOBECOM TRADE L.P”

And the remaining 7 shell companies of interest for tradecraft patterns are:

  • “MOLONEY TRADE LLP”
  • “DEUTDEFFXXX”
  • “WILLROCK UNITED LLP”
  • “HARDWARE SYSTEM LLP”
  • “BONDWEST LLP”
  • “DELFRONT IMPORT LLP”
  • “REDPARK SALES CORP”