So, I'm in Copenhagen for the International Conference on Systems Biology (ICSB). The conference hasn't even started yet and I'm already having an (intellectual) ball. This afternoon I attended a workshop on the journey from bioinformatics to medical informatics, run by a group called DNAdigest - a bunch of open data/open science enthusiasts by the sounds of it.
First up was a chap by the name of Soren Brunak - apparently one of the founding fathers of bioinformatics in Denmark. Technically, I think the talk was about the intention to align fine-grained phenotypic information with genomic overlays.
First off he asked us to consider that we do, in fact, do large amounts of human experimentation - another description of hospitals. The trouble is that we don’t collect the data. Denmark has an advantage here, in that it has an opt-out system with regard to the collection of health-related data. Even better, there is a standard international vocabulary, ICD-10 if I recall correctly, that is used to describe patient symptoms. Better still, the data is not anonymised, meaning that you can correlate incidence of disease with income level or environmental factors from where people live or work. My first thought here was obviously that this is a data scientist’s wet dream. There’s so much potentially relevant data here that is otherwise lost in the usual anonymization process.
My second thought was, I hope, equally obvious. This is a complete nightmare from the perspective of a privacy advocate. There’s a significant amount of red tape that Soren’s lab has to go through to get the data and, by all accounts, the data is stored by Soren’s group more securely than at the hospitals. Even so, I have a lot of difficulty imagining the amount of trust that the population of Denmark must have in their health system to allow the aggregation of such data in the first place, let alone allowing it to be accessed by researchers. I certainly don’t have this amount of confidence in the NZ health system. I doubt they could coordinate to actually collect the data in the first place.
There are, of course, problems with dealing with this sort of data. Soren described systems biology as the movement from thinking about single genes to thinking about all the genes. A description that, quite frankly, I like. One of the problems though is that in the messy environment of the real world, removed from the lab where you can isolate the effect of a single gene or its response to a single drug, there are often multiple diseases interacting with multiple drugs. Comorbidities, I believe he called them.
And there are a number of natural language processing problems in extracting the phenotypic data from doctors’ records. They think a lot of that has been sorted though. And they haven’t got all the genomic data yet. Despite that, there is still utility in the processing of the data. Correlations between … I don’t know the word here - certain sub-genres of diseases and other diseases are being discovered. Between certain types of schizophrenia and, say, kidney disease. This could conceivably suggest better treatments for people exhibiting similar comorbidities. Or alternatively suggest places to look for genetic clues when certain comorbidities aren’t present.
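I don’t know which statistic Soren’s group actually uses, so treat the following as a minimal sketch of the general idea rather than their method. It compares how often two diagnosis codes turn up in the same patient with how often you would expect that if the two were independent. The patient records are made up for illustration, and the `relative_risk` helper is my own hypothetical name. F20 and N18 are the ICD-10 codes for schizophrenia and chronic kidney disease.

```python
def relative_risk(patients, a, b):
    """Observed co-occurrence of codes a and b, divided by the
    co-occurrence expected if the two diagnoses were independent."""
    n = len(patients)
    n_a = sum(a in p for p in patients)
    n_b = sum(b in p for p in patients)
    n_ab = sum(a in p and b in p for p in patients)
    if n_a == 0 or n_b == 0:
        return float("nan")  # ratio is undefined if either code never occurs
    return (n_ab / n) / ((n_a / n) * (n_b / n))


# Made-up toy records: one set of ICD-10 codes per patient.
patients = [
    {"F20", "N18"},
    {"F20", "N18"},
    {"F20"},
    {"N18"},
    {"I10"},
    {"I10"},
    {"E11"},
    {"J45"},
]

rr = relative_risk(patients, "F20", "N18")
print(f"relative risk F20/N18: {rr:.2f}")
```

A value above 1 means the pair shows up together more often than chance alone would predict. It is still only a correlation, and a real analysis would need to adjust for things like age and sex.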
There’s also the possibility of looking for adverse drug reactions (ADRs) amongst all this data. Often when a drug is tested, its interaction is tested with some of the other drugs that it is likely to be prescribed with to look for ADRs. In reality though, drugs are often given with drugs that they haven’t been tested with. It’s actually an argument that I have heard before from woo-meisters as to exactly why drugs made by big pharma are all bad mmmkay? People who seem to think that every possible interaction of a drug with every possible other drug and a given human should be tested before it’s considered safe.
In Denmark, where there are almost 7500 different approved drugs, that is a lot of testing. Using the data from doctors’ records - especially from mental patients, who are often given significant drug cocktails - they are attempting to identify possible drug interactions for certain types of patients that haven’t previously been identified. The theory is that there might be drugs that work well for certain people with certain diseases but that could actually be dangerous for other people with similar conditions who also have another set of diseases in play.
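To put a rough number on "a lot of testing", here is a quick back-of-the-envelope count. It assumes exactly 7500 drugs (the talk said "almost") and only counts unordered combinations, ignoring dose, patient type and everything else.

```python
from math import comb

N_DRUGS = 7500  # assumption: rounding "almost 7500" up to a round number

pairs = comb(N_DRUGS, 2)    # every possible two-drug combination
triples = comb(N_DRUGS, 3)  # every possible three-drug cocktail

print(f"two-drug combinations:   {pairs:,}")
print(f"three-drug combinations: {triples:,}")
```

That is about 28 million pairs before you even get to three-drug cocktails, which is why mining existing records looks more practical than testing every combination up front.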
And then there’s the whole business of trying to work temporal data into the equation. Which disease arrived first, making the patient susceptible to which other conditions? All very interesting stuff. It doesn’t sound like they’ve managed to integrate much in the way of sequence data yet, but there are still some interesting correlations (stressing correlation rather than causal relationships) popping up.
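As a rough illustration of the "which came first" question, and not a description of what the group does, one simple approach is to count in how many patients diagnosis A was first recorded before diagnosis B. The histories below are invented, and `directed_pair_counts` is a hypothetical helper name.

```python
from collections import Counter
from datetime import date
from itertools import combinations


def directed_pair_counts(histories):
    """For each ordered pair (a, b), count the patients whose first
    record of a is strictly earlier than their first record of b."""
    counts = Counter()
    for events in histories:
        first_seen = {}
        for code, when in events:
            if code not in first_seen or when < first_seen[code]:
                first_seen[code] = when
        ordered = sorted(first_seen.items(), key=lambda item: item[1])
        for (a, t_a), (b, t_b) in combinations(ordered, 2):
            if t_a < t_b:  # same-day diagnoses give no usable ordering
                counts[(a, b)] += 1
    return counts


# Made-up histories: one list of (ICD-10 code, date recorded) per patient.
histories = [
    [("F20", date(2001, 3, 1)), ("N18", date(2005, 6, 1))],
    [("F20", date(2003, 1, 1)), ("N18", date(2004, 1, 1)), ("F20", date(2006, 1, 1))],
    [("N18", date(2002, 1, 1)), ("F20", date(2002, 1, 1))],
]

counts = directed_pair_counts(histories)
print(counts)
```

A lopsided count in one direction hints at an ordering worth investigating, but it says nothing about cause on its own.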
And for my money, that was the second most interesting talk of the workshop this afternoon. This evening I’ll try to write about the most interesting one, which used linkage analysis, sequencing and mouse models to determine the genetic causes of deafness in an extended family. Very cool stuff from a woman by the name of Mette Nyegaard. Apparently there’s a significant correlation between deafness and kidney disease.
For now though I have to go find someplace for dinner.