May 16, 2025

How we are gathering our data

Dermot Doyle, Dynaccurate CEO

In our execution of the MDIP project, we use a mixture of approaches and tools, both technical and legal. We can group these approaches under three headings.

Identification:

  1. Identify likely holders of medications data.
  2. Review websites for publicly available data.
  3. Write to the holders and formally request the data under Directives 2019/1024 and 2014/25/EU.
  4. Draw down existing data records where available.

Processing:

  1. Create parsers to ingest each record into a database in a common format.
  2. Write sources to Blockchain.
  3. Match drugs to ATC codes.
  4. Match ATC codes to pharmacogenomic flags using AI technology in semantic interoperability.

Publishing:

  1. Create the online database for end-user queries.
  2. Publish the database of medications with known pharmacogenomic flags.
  3. Publish artefacts and findings from the project.

Holders of medications data

There are two main public sector entities that hold the relevant medications data: Medications Regulators and Health Interoperability organisations. All regulators in the European Union have a searchable database, but this feature is only useful for a direct lookup. More important are the published data extracts, some of which are posted on a regular basis and some of which can be downloaded directly as search results from the searchable database GUI.

Publicly available data – formats and structures

Where public bodies provide downloadable data, it is typically reached either through a hyperlink on a web page or, as mentioned above, as search results exported from the online searchable database. In some cases, a workaround is needed to extract the full database from the GUI, such as using a special prompt or applying a very broad filter value. When the data is available for download, it is typically a .csv file or another Excel-compatible file.
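To illustrate what ingesting such a download into a common format can look like, here is a minimal sketch in Python. It is not the project's actual parser: the `MedicationRecord` fields, the `COLUMN_MAP` headers, the source name and the sample rows are all hypothetical, and a real export will need its own mapping.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class MedicationRecord:
    """Hypothetical common format; the real schema may differ."""
    source: str
    name: str
    active_substance: str
    atc_code: str


# Hypothetical mapping from one regulator's column headers to the common fields.
COLUMN_MAP = {
    "Product Name": "name",
    "Active Substance": "active_substance",
    "ATC": "atc_code",
}


def parse_export(text: str, source: str, column_map: dict[str, str],
                 delimiter: str = ",") -> list[MedicationRecord]:
    """Read a delimited export and return records in the common format."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    records = []
    for row in reader:
        # Pick out only the mapped columns and trim stray whitespace.
        fields = {target: (row.get(column) or "").strip()
                  for column, target in column_map.items()}
        if not fields["name"]:
            continue  # skip rows without a product name
        # Normalise the code's casing and spacing so later matching is consistent.
        fields["atc_code"] = fields["atc_code"].upper().replace(" ", "")
        records.append(MedicationRecord(source=source, **fields))
    return records


# Invented sample: semicolon-delimited, with one empty row.
SAMPLE = (
    "Product Name;Active Substance;ATC\n"
    "Example Tablets 500 mg;metformin;a10ba02\n"
    ";;\n"
)
records = parse_export(SAMPLE, source="example-regulator",
                       column_map=COLUMN_MAP, delimiter=";")
```

Keeping the column mapping separate from the parsing logic means each new source mostly needs a new mapping and delimiter, although in practice some sources will still need custom handling.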

The more sophisticated organisations provide an API, which gives the largest amount of information and is the best tooling for interoperability, although the end user must be able to work with an API.
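Working with such an API usually means paging through the results rather than fetching everything in one call. The sketch below shows that pattern only. The `fetch_page(offset, limit)` callable is a hypothetical stand-in for a real HTTP request, and the offset/limit scheme is an assumption, since each organisation's API defines its own paging.

```python
from typing import Callable, Iterator

Page = list[dict]


def iter_all(fetch_page: Callable[[int, int], Page],
             page_size: int = 100) -> Iterator[dict]:
    """Yield every item by requesting pages until a short page is returned."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            return  # a short (or empty) page means there is nothing left
        offset += page_size


# Stand-in for a real API: 250 invented items served from memory.
FAKE_DATA = [{"id": i} for i in range(250)]


def fake_fetch(offset: int, limit: int) -> Page:
    return FAKE_DATA[offset:offset + limit]


items = list(iter_all(fake_fetch, page_size=100))
```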

Still others provide a 'Database Dump', which consists of several files corresponding to the database content. The issue with this approach is that the files have to be reconstituted, often with special knowledge of the database data model, which is very tedious and requires expert abilities.
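As a simplified illustration of what reconstituting a dump involves, the sketch below joins two files on a shared key. The table names, columns and rows are invented for the example. A real dump will have more tables and a data model that has to be worked out first.

```python
import csv
import io

# Invented two-file dump: products refer to substances by substance_id.
PRODUCTS = (
    "product_id,name,substance_id\n"
    "1,Example Tablets 500 mg,10\n"
    "2,Sample Capsules 20 mg,11\n"
    "3,Orphan Row,99\n"
)
SUBSTANCES = (
    "substance_id,substance_name,atc_code\n"
    "10,metformin,A10BA02\n"
    "11,omeprazole,A02BC01\n"
)


def read_table(text: str) -> list[dict]:
    """Load one dump file into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def join_dump(products: list[dict], substances: list[dict]):
    """Attach substance details to each product; collect rows that do not match."""
    by_id = {row["substance_id"]: row for row in substances}
    joined, unmatched = [], []
    for product in products:
        substance = by_id.get(product["substance_id"])
        if substance is None:
            unmatched.append(product)  # keep for manual review rather than dropping
            continue
        joined.append({
            "name": product["name"],
            "active_substance": substance["substance_name"],
            "atc_code": substance["atc_code"],
        })
    return joined, unmatched


joined, unmatched = join_dump(read_table(PRODUCTS), read_table(SUBSTANCES))
```

Returning the unmatched rows separately makes gaps in the dump visible instead of silently losing records.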

Directives 2019/1024 and 2014/25/EU

For stakeholders who wish to obtain data held by a ministry, agency, university or other publicly funded body, the above Directives are powerful tools. Directive (EU) 2019/1024 (also known as the Open Data Directive) compels a public body to provide, on request, data that it holds. It covers:

(i) Governmental authorities governed by public law whose purpose is to serve the general public interest, which have legal personality, and which are financed for the most part by a governmental authority, are subject to management supervision by those authorities, or have a state-dominated board; or

(ii) Bodies active in high-value public tender work (as defined in Directive 2014/25/EU). A university is to be regarded as a public body for the purposes of the Directive and the associated implementing laws in the various member states.

Requests made under these Directives need to be acknowledged and actioned within 20 working days, with an additional 20 working days for complex requests. There are only limited criteria under which an authority can refuse such requests.
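To track when a response falls due, the working-day count can be sketched as follows. This assumes that counting starts on the day after the request is received and that only weekends and an explicitly supplied set of public holidays are excluded. National implementing laws may count differently, so treat the result as indicative.

```python
from datetime import date, timedelta


def add_working_days(start: date, working_days: int,
                     holidays: frozenset = frozenset()) -> date:
    """Return the date that falls `working_days` working days after `start`."""
    current = start
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        # Monday to Friday are weekday() 0 to 4; skip any listed holidays.
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


request_received = date(2025, 5, 16)  # a Friday
standard_deadline = add_working_days(request_received, 20)
extended_deadline = add_working_days(request_received, 40)
print(standard_deadline, extended_deadline)  # 2025-06-13 2025-07-11
```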

For the purposes of our project, we currently believe there are no credible grounds for a public authority to withhold an index of medications, given that no personal data is captured.

Processing tools

We use a mixture of technical tools for processing the data. These are:

  1. Creating parsers to ingest each record into a database in a common format. These have to be created per record received.
  2. Writing sources to Blockchain. We use Microsoft Azure blockchain services to achieve this. This is a demonstrator approach to show, primarily to public bodies, how to achieve provenance and immutability for medications data. It can serve as both a quality-control measure and a security measure, since compromised medications data can and does impact medical care.
  3. Matching drugs to ATC codes. This is achieved using DyMap, a proprietary system developed by Dynaccurate for large scale medications mappings.
  4. Matching ATC codes to pharmacogenomic flags using AI technology in semantic interoperability. Again we use DyMap for this task, but we only prepare the mappings. The actual confirmation of the mappings, i.e. confirming which drugs have pharmacogenomic flags, is carried out by our Canadian partner Pillcheck.
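Two of the ideas above can be illustrated without the tools named. The sketch below is not the Azure or DyMap implementation. It assumes that provenance starts from a SHA-256 fingerprint of each source file, which is the kind of value that could be written to a ledger, and it checks only the structure of a full seven-character ATC code (letter, two digits, two letters, two digits), not whether the code actually exists in the ATC index.

```python
import hashlib
import re

# Structure of a full (5th-level) ATC code, e.g. A10BA02.
ATC_PATTERN = re.compile(r"[A-Z]\d{2}[A-Z]{2}\d{2}")


def fingerprint(raw: bytes) -> str:
    """Return a SHA-256 hex digest; any change to the bytes changes the digest."""
    return hashlib.sha256(raw).hexdigest()


def split_atc(code: str) -> dict[str, str]:
    """Split a full ATC code into its five hierarchical levels."""
    code = code.strip().upper()
    if not ATC_PATTERN.fullmatch(code):
        raise ValueError(f"not a full ATC code: {code!r}")
    return {
        "anatomical_group": code[:1],
        "therapeutic_subgroup": code[:3],
        "pharmacological_subgroup": code[:4],
        "chemical_subgroup": code[:5],
        "chemical_substance": code,
    }


# Invented source content, standing in for a downloaded file.
original = b"Product Name;ATC\nExample Tablets 500 mg;A10BA02\n"
tampered = original.replace(b"A10BA02", b"A10BA03")

digest = fingerprint(original)
levels = split_atc("a10ba02")
```

Comparing a stored digest with a freshly computed one is enough to detect that a source file has changed. It does not say what changed or why.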

Publication

The final part of our project will be to publish the collated data in an online database available here at www.mdip.eu. The website will be fully updated at that point, hosting the database GUI, the data artefacts gathered and a project report. A good part of the effort involves building the database and GUI, which, along with developing parsers, takes up most of our development time.