"If nothing in biology makes sense except in the light of evolution, ...the modern view of disease holds no meaning whatsoever." -Nick Lane

Thursday, April 29, 2010

Bioinformatic analysis of Ephemeroptera

This semester I have been creating a workflow using Taverna to help Dr. Ogden at UVU with his analysis of mayflies and dragonflies.
Taverna is a platform for building scientific workflows. It has a drag-and-drop interface where existing services can be dragged into the workflow and integrated with your experiment. Taverna integrates with myExperiment, which makes it easy to publish and share your workflows. Dr. Ogden is trying to solve a particular problem in his research that has not yet been addressed using this platform, so many custom components have to be developed in Java to complete this project. This semester I got a working version, but there is still a lot of work to do.
My workflow queries GenBank and retrieves all of the sequences for the order Ephemeroptera. It then extracts the taxon and the element name from each record. There are a lot of duplicates, so it sorts them by date and keeps the most recent; also, if one sequence is marked as complete and another as partial, it uses the complete one. For each element, if there are enough taxa, it creates a dataset for that element. Wingless, for example, has only 3 taxa, so it is skipped.
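The deduplication and filtering steps above can be sketched in Python roughly as follows. This is not the actual Taverna component (those are written in Java); the `SequenceEntry` record, its fields, and the `MIN_TAXA = 4` cutoff are hypothetical stand-ins, with the cutoff chosen only so that an element with 3 taxa, like wingless, gets skipped. I also assume that completeness takes priority over date when the two disagree.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical cutoff: the workflow only needs "enough" taxa per element.
# 4 is an assumption, picked so an element with 3 taxa (e.g. wingless) is dropped.
MIN_TAXA = 4

@dataclass(frozen=True)
class SequenceEntry:
    """Hypothetical summary of one GenBank hit after parsing."""
    accession: str
    taxon: str
    element: str    # gene or marker name, e.g. "18S" or "wingless"
    updated: date   # record date, used to prefer the most recent copy
    complete: bool  # True if annotated as complete, False if partial

def pick_best(entries):
    """Keep one entry per (taxon, element): complete beats partial, then newest wins."""
    best = {}
    for entry in entries:
        key = (entry.taxon, entry.element)
        rank = (entry.complete, entry.updated)  # tuples compare field by field
        if key not in best or rank > (best[key].complete, best[key].updated):
            best[key] = entry
    return list(best.values())

def build_datasets(entries, min_taxa=MIN_TAXA):
    """Group the deduplicated entries by element and drop sparse elements."""
    by_element = defaultdict(list)
    for entry in pick_best(entries):
        by_element[entry.element].append(entry)
    return {el: group for el, group in by_element.items() if len(group) >= min_taxa}
```

Ranking with a `(complete, updated)` tuple keeps the priority rule in one place, so switching to "newest wins regardless of completeness" would be a one-line change.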
Then it runs a multiple alignment on each dataset. After that, it creates a merged dataset containing all of the taxa that appear in enough of the individual datasets (a taxon that doesn't appear in enough of them is left out).
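Once each element's dataset has been aligned, the merge step amounts to concatenating the alignments into one supermatrix. A minimal sketch, assuming the alignments are already done and held as hypothetical `{element: {taxon: aligned_sequence}}` dictionaries, and assuming `?` is used to pad a kept taxon that is missing from an element:

```python
from collections import Counter

MISSING = "?"  # assumed missing-data symbol for taxa absent from an element

def merge_alignments(alignments, min_datasets):
    """Concatenate per-element alignments into one merged matrix (supermatrix).

    alignments: {element: {taxon: aligned_sequence}}, already aligned per element.
    min_datasets: a taxon must occur in at least this many elements to be kept.
    """
    counts = Counter(taxon for aln in alignments.values() for taxon in aln)
    kept = sorted(taxon for taxon, n in counts.items() if n >= min_datasets)
    merged = {taxon: [] for taxon in kept}
    for element in sorted(alignments):  # fixed element order, so columns line up
        aln = alignments[element]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{element}: sequences do not share one aligned length")
        length = lengths.pop()
        for taxon in kept:
            # A kept taxon missing from this element gets a block of '?' characters.
            merged[taxon].append(aln.get(taxon, MISSING * length))
    return {taxon: "".join(parts) for taxon, parts in merged.items()}
```

Padding with missing-data characters rather than dropping columns keeps every element's block at its full aligned width, so each column in the merged matrix still corresponds to one aligned site.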
The next step is to run a phylogenetic analysis on the final merged dataset. I am still experimenting with how to do this. Taverna likes to do everything with web services, but this dataset is large, so that might not be practical. It would probably be best to fire off a MrBayes or PAUP block on the local machine. Either way, that part is relatively easy.
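Running the analysis locally mostly means writing the merged matrix out as a NEXUS file with a MrBayes block appended. A sketch, where the `to_nexus` helper, the substitution model (`lset nst=6 rates=invgamma`, i.e. GTR+I+G), and the run length are placeholder choices of mine, not settings Dr. Ogden specified:

```python
def to_nexus(matrix, ngen=1_000_000, samplefreq=1000):
    """Render a {taxon: aligned_sequence} dict as NEXUS text with a MrBayes block."""
    lengths = {len(seq) for seq in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    nchar = lengths.pop()
    # NEXUS taxon names cannot contain unquoted spaces, so swap them for underscores.
    names = {taxon: taxon.replace(" ", "_") for taxon in matrix}
    width = max(len(name) for name in names.values()) + 2
    rows = "\n".join(
        f"    {names[taxon].ljust(width)}{seq}" for taxon, seq in matrix.items()
    )
    return (
        "#NEXUS\n"
        "begin data;\n"
        f"    dimensions ntax={len(matrix)} nchar={nchar};\n"
        "    format datatype=dna missing=? gap=-;\n"
        "    matrix\n"
        f"{rows}\n"
        "    ;\n"
        "end;\n\n"
        "begin mrbayes;\n"
        "    set autoclose=yes nowarn=yes;\n"
        # Placeholder model: GTR + invariant sites + gamma rate variation.
        "    lset nst=6 rates=invgamma;\n"
        f"    mcmc ngen={ngen} samplefreq={samplefreq};\n"
        "    sump;\n"
        "    sumt;\n"
        "end;\n"
    )
```

The resulting file could then be run locally with something like `mb merged.nex`, assuming the MrBayes executable is installed as `mb`; a PAUP block could be appended the same way.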
Over the summer I will be experimenting with creating a slick web interface, which I will design using OpenLaszlo, a platform for building rich web interfaces that I have been learning on the side.

My workflow needs to be smart enough to automatically create a merged dataset for any order specified, but also flexible enough that the user can customize how the dataset is created and analyzed. Currently, a lot of time is spent creating these datasets by hand; this project will not only speed up that process but also make the research more traceable and consistent.
Dr. Ogden was impressed by my demo and believes that this project may be publishable. Currently, there are no workflow components that fill this need. If I make the workflow and its components flexible enough, other researchers may be able to integrate them easily into their own projects, and this could be a big contribution to the bioinformatics community!



Monday, April 26, 2010

Alu elements and uniquely human traits

Just finished a paper about Alu proliferation in primates and its connection to higher cognition in humans. I hope you like it! Download it here

Monday, April 5, 2010

Did population bottlenecks 1 mya and before the Great Leap Forward accelerate human evolution?

Here is a possible question for my paper:
Did population bottlenecks 1 mya and before the Great Leap Forward accelerate human evolution?

Han and Xing argue in their 2005 paper, "Under the genomic radar", that the capacity for Alu proliferation can lurk in the primate genome for millions of years. This is because a low-activity variant known as a stealth driver may occasionally spawn a high-activity daughter sequence. These daughter sequences and their progeny are likely to be purged by natural selection, but the parent stealth driver would be untouched: it is not active enough for natural selection to even see it. In this sense, Alu sequences have evolved to subvert natural selection. If Hedges is correct, then even a highly active daughter sequence might escape natural selection during a population bottleneck. In such a bottleneck, these highly active daughter sequences would be free to wreak havoc, while any disastrous insertions would have to be selected out one by one. Therefore, population bottlenecks promote Alu proliferation. (See my more detailed summary of these two articles in my previous post: Did we evolve to evolve?)
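The intuition that selection is nearly blind in a small population can be made concrete with Kimura's classic diffusion result for the probability that a single new mutation with selection coefficient s eventually fixes in a population of effective size N. This is a textbook formula I am swapping in for illustration, not a model taken from Han and Xing or Hedges, and it assumes a constant population size rather than an actual bottleneck:

```python
import math

def fixation_probability(s: float, n: int) -> float:
    """Kimura's fixation probability for one new mutant (initial frequency 1/(2N)).

    Uses u = (1 - exp(-2s)) / (1 - exp(-4Ns)), assuming a diploid population
    whose census and effective sizes are both n. s < 0 means deleterious.
    """
    if s == 0:
        return 1.0 / (2 * n)  # neutral limit
    x = -4.0 * n * s
    if x > 700:
        # exp(x) would overflow a float; the probability is effectively zero.
        return 0.0
    # expm1 keeps precision when s is small.
    return math.expm1(-2.0 * s) / math.expm1(x)

# The same mildly harmful variant (s = -0.001) in a small vs. a large population.
for n in (1_000, 10_000):
    print(n, fixation_probability(-0.001, n))
```

With N = 1,000 this variant still fixes with a small but non-negligible probability (a few times 10^-5), while with N = 10,000 the probability drops to around 10^-20, which is the sense in which a small population lets harmful insertions slip past selection.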

In their 2009 article "Mobile elements reveal small population size in the ancient ancestors of Homo sapiens", Huff, Rogers, et al. up at the University of Utah compared the complete genomes of two individuals. They note that back in the 1980s it was predicted that regions of the genome with a rare insertion event (such as an Alu insertion) are twice as old as regions without such an event. This is because a region with such an event has two sub-regions, one before and one after. They observed that regions with an insertion event had twice the nucleotide diversity, which confirms this idea. They determined that the average time since the most recent common ancestor for regions with an insertion event was just under a million years (half that for other regions). By studying these regions, they were able to determine with high confidence that the effective population size a million years ago was less than 26,000 and probably closer to 18,500. This surprisingly low number may be explained by a series of population bottlenecks or by different competing populations of Homo replacing one another.
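Why older regions should carry more diversity can be written out with the standard neutral coalescent and an infinite-sites mutation model. This is my own back-of-the-envelope version under those assumptions, not the derivation from the paper. With μ the mutation rate per site per generation and T the time to the most recent common ancestor of two sequences (in generations), mutations accumulate along both lineages, so:

```latex
\mathbb{E}[\pi] = 2\mu\,\mathbb{E}[T],
\qquad
\frac{\mathbb{E}[\pi_{\text{ins}}]}{\mathbb{E}[\pi_{\text{other}}]}
  = \frac{\mathbb{E}[T_{\text{ins}}]}{\mathbb{E}[T_{\text{other}}]} \approx 2,
\qquad
\mathbb{E}[T_{\text{other}}] \approx 2N_e \ \text{generations}
```

So regions that are twice as old should show about twice the nucleotide diversity, which is the pattern Huff et al. report. And because the expected time to the common ancestor scales with N_e, dating these regions (converted to generations) is also what constrains the ancient effective population size.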
Now, this paper used Alu sequence data as a marker to determine that there was a population bottleneck a million years ago. It doesn't say anything about the possibility that this bottleneck facilitated increased Alu proliferation, which is what I would expect from reading Xing and Hedges. Whether or not I can connect these two things is something I will have to determine.
In another 2009 paper from the U, "Mobile elements create structural variation", Xing, Huff, et al. show how Alu proliferation can contribute to structural variation in the human genome. They compared sequences in the HuRef sequencing project to those from the Human Genome Project. They focused on insertions and deletions that were particular to the HuRef genome so that they could identify structural variations that are polymorphic in humans. They found that as many as a third of these polymorphic Alu insertions occurred within genic regions. However, only 3 were within exons. But as I discussed in previous posts, Alu sequences within introns have been shown in some cases to regulate the expression of their host gene, so I believe these variations could be significant. I have a lot of literature supporting this. Also, as Wray argues (see previous post), phenotypic variation arising from regulatory changes may be more directly shaped by natural selection because such changes tend to be co-dominant.
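Sorting insertion sites into exonic, intronic, and intergenic bins, as in the counts above, is at heart an interval lookup. A minimal sketch, assuming hypothetical inputs: insertion positions plus gene and exon intervals on the same chromosome and assembly, in half-open `[start, end)` coordinates. Pulling those coordinates out of GenBank annotations would be the real work and is not shown:

```python
from collections import Counter

def in_any(pos, intervals):
    """True if pos falls inside any half-open [start, end) interval."""
    return any(start <= pos < end for start, end in intervals)

def classify_insertion(pos, genes, exons):
    """Label one insertion site as 'exonic', 'intronic' or 'intergenic'."""
    if in_any(pos, exons):
        return "exonic"
    if in_any(pos, genes):
        return "intronic"  # inside a gene span but outside every exon
    return "intergenic"

def summarize(positions, genes, exons):
    """Count insertion sites per category; genic = exonic + intronic."""
    return dict(Counter(classify_insertion(p, genes, exons) for p in positions))

# Hypothetical coordinates, for illustration only.
genes = [(100, 500), (1000, 2000)]
exons = [(100, 150), (400, 500), (1000, 1100)]
print(summarize([120, 300, 450, 700, 1500, 3000], genes, exons))
# {'exonic': 2, 'intronic': 2, 'intergenic': 2}
```

The linear scan is fine for a handful of insertions; for genome-wide data a sorted interval structure would be the natural replacement.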
My next step is to find the connection between the literature about bottlenecks facilitating proliferation and the findings about Alu sequences generating structural variation in the genome. Is there something I can test here using sequence data available on GenBank? The accession numbers for the Alus studied in the structural variation paper are there; what can I do with them?
After I have connected those things, I want to show that selection for higher cognition molded the human brain from this novel variation.
To sum up what I have to work with so far:
Rogers et al.: Analyzing Alu sequence insertions can tell us about past population bottlenecks.
Han and Xing: The stealth driver model predicts that these bottlenecks promote Alu proliferation.
Xing, Huff et al.: Alu proliferation creates structural variation. (This is the part where I think I will be able to do some sequence analysis. All of the Alus studied in the paper are in GenBank.)
I have lots of material explaining how natural selection can take advantage of this variation (exonization, alternative splicing, A-to-I editing, etc.) to build the human brain.

In connecting all of these things, I want to answer the question at the top of this post: did past bottlenecks accelerate Alu proliferation, and more importantly, was this extra variation needed to produce the human brain? In other words, is it selection pressure that determines the outcome, or is natural selection constrained by the available materials?