Saturday, August 31, 2013

It has begun

The conference kicked off yesterday. I would have written about it last night, but afterwards I went and found some pizza for dinner, came back to the hotel, did the jet lag zombie thing and fell asleep. Still, it's getting better: I didn't wake up until just after 5 this morning and there was some cold pizza waiting for me.

First observation. It's a lot bigger than I thought it would be. I have no idea why, but on the basis of no evidence whatsoever, I had assumed that there would be 200, maybe 300 people. There are at least 500, probably closing in on 600.

Second observation, related to some of the tea room conversation I had just before I left NZ. A casual glance at the speaker list: ~1 in 10 speakers are female. A casual glance at the audience: I'm guessing ~2 in 5 are female. I could be wrong with those numbers, but not sufficiently wrong to prevent me from asking: what gives? Disappointing.

The first keynote speaker, Stuart Kauffman, was apparently in the building but had gone missing, so we launched straight into the science. It took me a few minutes to realize that we had launched straight in with a talk on nucleosome-mediated epigenetics: cis-regulation of genes (not transcription factor based). Interesting at the mechanism level, yes, but it didn't really grab me.

Marc Vidal, the second speaker, talked about exploring the unexplored regions of the human interactome, of which, funnily enough, there is quite a bit. He was doing this multiple ways - text mining the literature for protein-protein interactions was a big one, though I thought the point that interactions mentioned only once aren't as trustworthy as interactions mentioned multiple times was a bit of stating the obvious.
So the immediate thing that I already have out of this conference is text mining. I've been aware of its existence obviously, but not so much of its utility applied to the field of systems biology.
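
To make the "mentioned once versus mentioned several times" idea concrete, here is a minimal Python sketch. It is my own toy, not Vidal's pipeline: the sentences, the placeholder names PROT_A/PROT_B/PROT_C and the `min_mentions` cut-off are all invented for illustration, and real text mining would need proper named entity recognition rather than simple word matching.

```python
from collections import Counter
from itertools import combinations

def count_co_mentions(sentences, proteins):
    """Count how many sentences mention each pair of proteins together."""
    counts = Counter()
    for sentence in sentences:
        # Crude tokenisation: strip commas and full stops, split on whitespace.
        words = set(sentence.replace(",", " ").replace(".", " ").split())
        found = sorted(p for p in proteins if p in words)
        for pair in combinations(found, 2):
            counts[pair] += 1
    return counts

def split_by_support(counts, min_mentions=2):
    """Separate pairs seen at least `min_mentions` times from the rest."""
    trusted = {pair for pair, n in counts.items() if n >= min_mentions}
    tentative = set(counts) - trusted
    return trusted, tentative

# Hypothetical toy corpus with placeholder protein names.
sentences = [
    "PROT_A binds PROT_B in stressed cells.",
    "PROT_B interacts with PROT_A, as reported previously.",
    "PROT_C may associate with PROT_A.",
]
proteins = ["PROT_A", "PROT_B", "PROT_C"]

counts = count_co_mentions(sentences, proteins)
trusted, tentative = split_by_support(counts)
print("trusted:", trusted)
print("tentative:", tentative)
```

A pair that shows up in a single sentence lands in `tentative`, which is roughly the "less trustworthy" bucket from the talk.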

Kauffman's talk about personalized medicine had good points and bad. The bad being that he'd prepared it on the plane and it was a bit ... rambling or vague at times. Our health care system, or at least the US health care system, is still in the mindset where you find one drug to treat one condition while trying to minimize the side effects. So this talk, for me, paralleled Soren's talk from the workshop. Kauffman was talking more about extracting the data directly from the biology, something that I suspect is still a little out of our grasp, given the magnitude of the system involved (people). Soren's was a more realistic, data-driven approach, extracting information from clinical reports at a coarser grain than desired by Kauffman.

Of the industry keynotes, the chap from SGI, Eng Lim Goh, was both entertaining and informative. He briefly (and humorously) compared systems biologists to the NSA in our desire to collect all the information, in case there is something we don't know about that we don't know is in the data we currently collect. The amount of data SGI deals with is staggering - they deal with NASA and PayPal and all sorts of other big data people as well. Though if I read one slide right, their 2nd biggest customer is the people who are working with the wheat genome. Also from the very random cool factoid box: when the Square Kilometre Array gets turned on, it's going to be generating data equivalent to YouTube. Every day.

Interesting if not mind-bending keynotes. And a couple of good random conversations at the drinks afterwards. Productive first day. Can't help but feel I won't be getting to the meat of the conference today. Not sure what I'm doing this afternoon, but I'm definitely hitting the Temporal Phenomena across biological timescales session this morning. For now, it's either try and get another hour's sleep or coffee. Can't decide.

Friday, August 30, 2013

While it's fresh-ish

There were four talks at the workshop that I went to yesterday. All of them were good - I was in what was essentially a classroom listening to talks for close to four hours and didn't once get drowsy, a very good sign. I wrote about one of them yesterday. One was from Jennifer Becq from Illumina, largely about exactly how fast they can go from taking a sample to finding variants in a person's genome - think finding the mutations that might be responsible for someone's cancer. It's very fast, as in less than a week. Very cool stuff they're doing there. And one was from Fiona Nielsen of DNAdigest, the people who were running the workshop. They've got some nice ideas about how to make it easier to share data, increasing the power of all the data that is currently locked up in various public institutions.

The standout talk of the workshop for me, though, was from Mette Nyegaard of Aarhus University. There is an extended family with a genetic predisposition towards deafness. Some of the family are born deaf. Some become deaf when they are six. Some become deaf in their twenties. She started off with linkage analysis to identify where in the genome the problem is - she started with 50 known genes and 80 loci (where the general area is known but the actual gene is not) involved with hearing loss. Her analysis narrowed this down to one locus. Which was a nice start.

There were multiple false starts with candidate genes being misidentified - sequencing and analysis eventually knocking out various contenders. There were some interesting bits in here that I don't fully understand (I need to read up on her work), but there were candidate genes that looked iffy which, once sequenced, were shown to be completely normal - there was a pseudogene duplicated from the same gene, in the same region, that was making it look damaged. And another one that appeared to be responsible but then turned out to be present in the greater population with no ill effect.

She finally tracked it down to an 18 base pair deletion which had initially been interpreted as a frameshift mutation. The deleted 18 bp, though, appear to be a sorting signal responsible for moving the protein from the cell surface to the lysosome (where it would normally be degraded). Even with all the work she's done, it's not guaranteed that the loss of this signal is the cause of the deafness. This is the only likely candidate in the coding region of the locus. If this doesn't pan out, then it will be off to look at the regulatory regions of the locus, to see if there's anything abnormal there.

The next step is apparently an animal model (the signal part of the protein is highly conserved across many species). Which could be tricky - another reason to be impressed is that none of this research has been externally funded in any way, shape or form. It's taken a long time because of that. Animal models, however, require money. Imagine though if she did get an animal model: you could get a time course of expression in the cells in the ear where the problem occurs. And then possibly go further than identifying the mutation that causes the problem, to the process behind it.

Sigh. I don't really feel like I've done justice to her talk here; it was yesterday and I need coffee. I'm going to have to do some reading. It was a thoroughly good talk though. I'd like to try and make this clearer, but it'll have to wait. The conference proper starts today. And I have a sneaking suspicion that as of tomorrow I'm going to be inundated with things I'd like to write about.

Adventures in Daneland

So, I'm in Copenhagen for the International Conference on Systems Biology (ICSB). The conference hasn't even started yet and I'm already having an (intellectual) ball. There was a workshop this afternoon I attended on the journey from bioinformatics to medical informatics, run by a group called DNAdigest - a bunch of open data/open science enthusiasts by the sounds of it.

First up was a chap by the name of Soren Brunak - apparently one of the founding fathers of bioinformatics in Denmark. Technically, I think the talk was about the intention to align fine-grained phenotypic information with genomic overlays.

First off he asked us to consider that we do, in fact, do large amounts of human experimentation - another description of hospitals. The trouble is that we don’t collect the data. Denmark has an advantage here, in that it has an opt-out system with regards to the collection of health-related data. Even better, there is a standard international vocabulary, ICD-10 if I recall correctly, that is used to describe patient symptoms. Better still, the data is not anonymised, meaning that you can correlate incidence of disease with income level or environmental factors from where people live or work. My first thought here was obviously that this is a data scientist’s wet dream. There’s so much potentially relevant data here that is otherwise lost in the usual anonymization process.

My second thought was, I hope, equally obvious. This is a complete nightmare from the perspective of a privacy advocate. There’s a significant amount of red tape that Soren’s lab has to go through to get the data and, by all accounts, the data is stored by Soren’s group more securely than at the hospitals, but I just have a large amount of difficulty imagining the amount of trust that the population of Denmark have in their health system to allow the aggregation of such data in the first place, let alone allowing it to be accessed by researchers. I certainly don’t have this amount of confidence in the NZ health system. I doubt they could coordinate to actually collect the data in the first place.

There are, of course, problems with dealing with this sort of data. Soren described systems biology as the movement from thinking about single genes to thinking about all the genes. A description that, quite frankly, I like. One of the problems though is that in the messy environment of the real world, removed from the lab where you can isolate the effect of a single gene or its response to a single drug, there are often multiple diseases interacting with multiple drugs. Comorbidities, I believe he called them.

And there’s a number of natural language processing problems extracting the phenotypic data from doctors’ records. They think a lot of that has been sorted though. And they haven’t got all the genomic data yet. Despite that, there is still utility in the processing of the data. Correlations between … I don’t know the word here - certain sub-genres of diseases and other diseases are being discovered. Between certain types of schizophrenia and, say, kidney disease. This could conceivably suggest better treatments for people exhibiting similar comorbidities. Or alternatively suggest places to look for genetic clues when certain comorbidities aren’t present.
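
For what it's worth, the simplest version of "these two diseases turn up together more often than you'd expect" can be sketched in a few lines of Python. This is only my guess at the general flavour, not the method used by Soren's group: the diagnosis codes D1 to D3 and the patient records are invented, and the "expected" count simply assumes the two diagnoses are independent.

```python
from collections import Counter
from itertools import combinations

def comorbidity_lift(patients):
    """Return the observed/expected co-occurrence ratio for each diagnosis pair.

    `patients` is a list of sets of diagnosis codes. The expected count
    assumes the two diagnoses occur independently of each other.
    """
    n = len(patients)
    single = Counter()
    pair = Counter()
    for diagnoses in patients:
        single.update(diagnoses)
        pair.update(combinations(sorted(diagnoses), 2))
    lift = {}
    for (a, b), observed in pair.items():
        expected = single[a] * single[b] / n
        lift[(a, b)] = observed / expected
    return lift

# Hypothetical records: one set of placeholder diagnosis codes per patient.
patients = [
    {"D1", "D2"},
    {"D1", "D2"},
    {"D1"},
    {"D3"},
    {"D2", "D3"},
    {"D3"},
]

lift = comorbidity_lift(patients)
for pair_codes, value in sorted(lift.items()):
    print(pair_codes, round(value, 2))
```

A ratio above 1 means the pair co-occurs more often than independence would predict, below 1 less often. Pairs that never co-occur don't appear at all, and none of this says anything about cause.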

There’s also the possibility of looking for adverse drug reactions (ADRs) amongst all this data. Often when a drug is tested, its interaction is tested with some of the other drugs that it is likely to be prescribed with, to look for ADRs. In reality though, drugs are often given with drugs that they haven’t been tested with. It’s actually an argument that I have heard before from woo-meisters as to exactly why drugs made by big pharma are all bad mmmkay? People who seem to think that every possible interaction of a drug with every possible other drug and a given human should be tested before it’s considered safe.
In Denmark, where there are almost 7500 different approved drugs, that is a lot of testing. Using the data from doctors’ records - especially from mental patients, who are often given significant drug cocktails - they are attempting to identify possible combinations of drug interactions for certain types of patients that haven’t previously been identified. The theory being that there might be drugs that work well for certain people with certain diseases but that could actually be dangerous for other people with similar conditions who also have another set of diseases interacting with them.
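
Just how much testing? A quick back-of-the-envelope count in Python, taking the "almost 7500" figure at face value and only counting which drugs are combined (not doses, not patients, not order):

```python
from math import comb

approved_drugs = 7500  # "almost 7500" in the talk, so treat this as an upper bound

pairs = comb(approved_drugs, 2)    # every possible two-drug combination
triples = comb(approved_drugs, 3)  # every possible three-drug combination

print(f"pairs:   {pairs:,}")
print(f"triples: {triples:,}")
```

That comes to roughly 28 million pairs and about 70 billion triples, before you even get to the "and a given human" part.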

And then there’s the whole trying to work temporal data into the equation. Which disease arrived first? Making the patient susceptible to what other conditions? All very interesting stuff. It doesn’t sound like they’ve managed to integrate much in the way of sequence data yet, but there are still some interesting correlations (stressing correlation rather than causal relationships) popping up.
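
The "which disease arrived first" question can be sketched the same way. Again, this is a toy of my own and not anything shown in the talk: the histories are invented, days are just integers, and D1 to D3 are placeholder diagnosis codes.

```python
from collections import Counter
from itertools import combinations

def ordered_pair_counts(histories):
    """Count, across patients, how often one diagnosis precedes another.

    Each history is a list of (day, code) tuples for a single patient.
    """
    counts = Counter()
    for history in histories:
        # Keep only the earliest day on which each diagnosis was recorded.
        earliest = {}
        for day, code in history:
            if code not in earliest or day < earliest[code]:
                earliest[code] = day
        ordered = sorted(earliest.items(), key=lambda item: item[1])
        for (first, day_first), (later, day_later) in combinations(ordered, 2):
            if day_first < day_later:  # same-day diagnoses have no clear order
                counts[(first, later)] += 1
    return counts

# Hypothetical patient histories.
histories = [
    [(10, "D1"), (200, "D2")],
    [(5, "D1"), (90, "D2"), (400, "D3")],
    [(30, "D2"), (31, "D1")],
]

counts = ordered_pair_counts(histories)
for (first, later), n in sorted(counts.items()):
    print(f"{first} before {later}: {n}")
```

If D1 comes before D2 far more often than the other way round, that is a hint worth following up. It is still only a correlation.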

And for my money, that was the 2nd most interesting talk of the workshop this afternoon. This evening I’ll try and write about the one using linkage analysis, sequencing and mouse models to determine the genetic causes of deafness in an extended family. Very cool stuff from a woman by the name of Mette Nyegaard. Apparently there’s a significant correlation between deafness and kidney disease.

For now though I have to go find someplace for dinner.