"If nothing in biology makes sense except in the light of evolution, ...the modern view of disease holds no meaning whatsoever." -Nick Lane

Thursday, April 29, 2010

Bioinformatic analysis of Ephemeroptera

This semester I have been creating a workflow using Taverna to help Dr. Ogden at UVU with his analysis of mayflies and dragonflies.
Taverna is a platform for creating scientific workflows. Its drag-and-drop interface lets you drag existing services into a workflow and integrate them with your experiment. Taverna also integrates with myExperiment, which makes it easy to publish and share workflows. Dr. Ogden is trying to solve a particular problem in his research that has not yet been addressed on this platform, so many custom components have to be developed in Java to complete this project. This semester I got a working version, but there is still a lot of work to do.
My workflow queries GenBank and retrieves all of the sequences for the order Ephemeroptera. It then extracts the taxon and the element (gene) name from each record. There are a lot of duplicates, so it sorts them by date and keeps the most recent; also, if one sequence is marked as complete and another as partial, it uses the complete one. For each element with enough taxa, it creates a dataset for that element. Wingless, for example, has only 3 taxa, so it is skipped.
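The duplicate-handling and per-element filtering rules above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the workflow's actual Taverna component: the `SeqRecord` fields, the `MIN_TAXA` threshold of 4, and the rule that completeness is compared before date are hypothetical choices, and the records are made-up toy data.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SequenceFilter {

    /** One parsed GenBank record; this field set is a simplified, hypothetical model. */
    record SeqRecord(String accession, String taxon, String gene,
                     LocalDate date, boolean complete, String sequence) {}

    /** Hypothetical threshold: minimum number of taxa needed to build an element dataset. */
    static final int MIN_TAXA = 4;

    /** Complete beats partial; between two equally complete records, the newer one wins. */
    static final Comparator<SeqRecord> PREFERENCE =
            Comparator.comparing(SeqRecord::complete)
                      .thenComparing(SeqRecord::date);

    /** Keeps the single most preferred record for every (gene, taxon) pair. */
    static Map<String, Map<String, SeqRecord>> dedupe(List<SeqRecord> records) {
        Map<String, Map<String, SeqRecord>> byGene = new TreeMap<>();
        for (SeqRecord r : records) {
            byGene.computeIfAbsent(r.gene(), g -> new TreeMap<>())
                  .merge(r.taxon(), r,
                         (kept, next) -> PREFERENCE.compare(kept, next) >= 0 ? kept : next);
        }
        return byGene;
    }

    /** Dedupes, then drops any gene that has fewer than MIN_TAXA taxa. */
    static Map<String, Map<String, SeqRecord>> buildDatasets(List<SeqRecord> records) {
        Map<String, Map<String, SeqRecord>> datasets = dedupe(records);
        datasets.values().removeIf(taxa -> taxa.size() < MIN_TAXA);
        return datasets;
    }

    /** Made-up toy records for illustration only (not real GenBank data). */
    static List<SeqRecord> sampleRecords() {
        return List.of(
            new SeqRecord("X1", "taxonA", "18S", LocalDate.of(2009, 5, 1), false, "ACG"),
            new SeqRecord("X2", "taxonA", "18S", LocalDate.of(2006, 3, 1), true, "ACGT"),
            new SeqRecord("X3", "taxonA", "18S", LocalDate.of(2008, 7, 1), true, "ACGA"),
            new SeqRecord("X4", "taxonB", "18S", LocalDate.of(2007, 1, 1), true, "ACGG"),
            new SeqRecord("X5", "taxonC", "18S", LocalDate.of(2007, 1, 1), true, "ACTT"),
            new SeqRecord("X6", "taxonD", "18S", LocalDate.of(2007, 1, 1), true, "ACCT"),
            new SeqRecord("X7", "taxonA", "wingless", LocalDate.of(2008, 1, 1), true, "GGT"),
            new SeqRecord("X8", "taxonB", "wingless", LocalDate.of(2008, 1, 1), true, "GGA"),
            new SeqRecord("X9", "taxonC", "wingless", LocalDate.of(2008, 1, 1), true, "GCA"));
    }

    public static void main(String[] args) {
        buildDatasets(sampleRecords()).forEach((gene, taxa) -> {
            System.out.println(gene + ":");
            taxa.forEach((taxon, r) -> System.out.println("  " + taxon + " -> " + r.accession()));
        });
    }
}
```

With the toy data, `taxonA` keeps `X3` (complete, and newer than `X2`), and the `wingless` dataset is dropped because it has only three taxa, mirroring the example in the post.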
Then it runs a multiple alignment on each dataset. After that, it creates a merged dataset from all of the taxa that appear in enough of the datasets (if a given taxon doesn't appear in enough of the individual datasets, it is left out).
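The merge step amounts to building a concatenated supermatrix. Here is a minimal Java sketch, assuming the aligned sequences arrive as a map from gene to taxon to sequence; the `MIN_GENES` cutoff of 2 and padding a kept taxon's missing genes with `?` are illustrative assumptions rather than the workflow's actual settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class DatasetMerger {

    /** Hypothetical threshold: a taxon must appear in at least this many gene datasets. */
    static final int MIN_GENES = 2;

    /**
     * Concatenates per-gene alignments into one supermatrix.
     *
     * @param alignments gene name -> (taxon -> aligned sequence); all sequences
     *                   within one gene are assumed to have the same length
     * @return taxon -> concatenated sequence, with genes in alphabetical order
     */
    static Map<String, String> merge(Map<String, Map<String, String>> alignments) {
        // Count how many gene datasets each taxon appears in.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> aln : alignments.values()) {
            for (String taxon : aln.keySet()) {
                counts.merge(taxon, 1, Integer::sum);
            }
        }

        // Keep only the taxa that occur in enough datasets; the rest are left out.
        List<String> kept = new ArrayList<>();
        counts.forEach((taxon, n) -> {
            if (n >= MIN_GENES) kept.add(taxon);
        });

        Map<String, StringBuilder> rows = new TreeMap<>();
        for (String taxon : kept) rows.put(taxon, new StringBuilder());

        for (String gene : new TreeSet<>(alignments.keySet())) {
            Map<String, String> aln = alignments.get(gene);
            if (aln.isEmpty()) continue;
            int length = aln.values().iterator().next().length();
            for (String taxon : kept) {
                // Assumption: a kept taxon with no sequence for this gene is padded with '?'.
                rows.get(taxon).append(aln.getOrDefault(taxon, "?".repeat(length)));
            }
        }

        Map<String, String> merged = new TreeMap<>();
        rows.forEach((taxon, row) -> merged.put(taxon, row.toString()));
        return merged;
    }

    /** Made-up toy alignments for illustration only. */
    static Map<String, Map<String, String>> sampleAlignments() {
        return Map.of(
            "18S", Map.of("taxonA", "ACGT", "taxonB", "ACGA", "taxonC", "AC-T"),
            "28S", Map.of("taxonA", "GGC", "taxonB", "GGA"),
            "COI", Map.of("taxonA", "TTA", "taxonC", "TTG", "taxonD", "TAA"));
    }

    public static void main(String[] args) {
        merge(sampleAlignments()).forEach((taxon, row) -> System.out.println(taxon + " " + row));
    }
}
```

In the toy example, `taxonD` is dropped because it occurs in only one dataset, while `taxonB` is kept and gets `???` where its COI sequence would be.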
The next step is a phylogenetic analysis of the final merged dataset, and I am still experimenting with how to do it. Taverna prefers to do everything through web services, but this dataset is large, so that might not be practical. It would probably be best to run a MrBayes or PAUP* block on the local machine. That part is relatively easy anyway.
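Running MrBayes locally could look roughly like this: write the merged matrix to a NEXUS file with a `mrbayes` block, then launch the `mb` executable through `ProcessBuilder`. This is a hedged sketch: it assumes `mb` is on the `PATH` and that taxon names contain no spaces, and the model and MCMC settings (`nst=6 rates=invgamma`, `ngen`, `samplefreq=1000`) are placeholders to check against the MrBayes manual, not values chosen for this dataset.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

public class MrBayesRunner {

    /** Builds a NEXUS file: a DNA data block followed by a placeholder MrBayes block. */
    static String toNexus(Map<String, String> matrix, long ngen) {
        if (matrix.isEmpty()) throw new IllegalArgumentException("empty matrix");
        int nchar = matrix.values().iterator().next().length();
        StringBuilder sb = new StringBuilder("#NEXUS\n\nbegin data;\n");
        sb.append(String.format("  dimensions ntax=%d nchar=%d;\n", matrix.size(), nchar));
        sb.append("  format datatype=dna missing=? gap=-;\n  matrix\n");
        // Assumes taxon names contain no spaces (otherwise they would need quoting).
        matrix.forEach((taxon, seq) ->
                sb.append("    ").append(taxon).append(' ').append(seq).append('\n'));
        sb.append("  ;\nend;\n\n");
        // Placeholder analysis settings, not tuned for any real dataset.
        sb.append("begin mrbayes;\n")
          .append("  set autoclose=yes nowarn=yes;\n")
          .append("  lset nst=6 rates=invgamma;\n")
          .append("  mcmc ngen=").append(ngen).append(" samplefreq=1000;\n")
          .append("  sump;\n  sumt;\n  quit;\nend;\n");
        return sb.toString();
    }

    /** Prepares (but does not start) an `mb <file>` process; assumes `mb` is on the PATH. */
    static ProcessBuilder mrBayesCommand(Path nexusFile) {
        Path abs = nexusFile.toAbsolutePath();
        return new ProcessBuilder("mb", abs.toString())
                .directory(abs.getParent().toFile())
                .redirectErrorStream(true)
                .redirectOutput(abs.resolveSibling("mrbayes.log").toFile());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Made-up toy matrix for illustration only.
        Map<String, String> matrix = new TreeMap<>(Map.of(
                "taxonA", "ACGTGGCTTA",
                "taxonB", "ACGAGGA???",
                "taxonC", "AC-T???TTG"));
        Path nexus = Path.of("merged.nex");
        Files.writeString(nexus, toNexus(matrix, 1_000_000));
        // Only launch MrBayes when explicitly asked, e.g. `java MrBayesRunner --run`.
        if (args.length > 0 && args[0].equals("--run")) {
            int exit = mrBayesCommand(nexus).start().waitFor();
            System.out.println("MrBayes exited with code " + exit);
        } else {
            System.out.println("Wrote " + nexus.toAbsolutePath());
        }
    }
}
```

By default `main` only writes `merged.nex`; passing `--run` starts MrBayes and sends its output to `mrbayes.log`. A PAUP* block could be generated the same way.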
Over the summer I will be experimenting with a slick web interface, which I plan to build with OpenLaszlo, a platform for rich web applications that I have been learning on the side.

My workflow needs to be smart enough to automatically create a merged dataset for any order specified, but also flexible enough that the user can customize how the dataset is created and analyzed. Currently a lot of time is spent creating these datasets by hand; this project will not only speed up that process but also make the research more traceable and consistent.
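One way to keep the workflow general while still letting the user adjust it is to collect those choices in a single settings object and derive the GenBank query from it. The sketch below builds an NCBI E-utilities `esearch` URL for whatever order is given. The `Settings` record, its field names, and its default values are hypothetical, and the code only builds the URL rather than fetching anything.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class WorkflowConfig {

    /** User-adjustable settings; names and defaults are hypothetical. */
    record Settings(String order, int minTaxaPerGene, int minGenesPerTaxon, int maxRecords) {
        Settings {
            if (order == null || order.isBlank())
                throw new IllegalArgumentException("order is required");
            if (minTaxaPerGene < 1 || minGenesPerTaxon < 1 || maxRecords < 1)
                throw new IllegalArgumentException("thresholds must be positive");
        }

        /** Hypothetical defaults; a user interface would let the researcher override them. */
        static Settings defaults(String order) {
            return new Settings(order, 4, 2, 10_000);
        }
    }

    static final String ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    /** Builds an esearch query for all nucleotide records of the given order. */
    static String searchUrl(Settings s) {
        String term = s.order() + "[Organism]";
        return ESEARCH + "?db=nucleotide"
                + "&term=" + URLEncoder.encode(term, StandardCharsets.UTF_8)
                + "&retmax=" + s.maxRecords()
                + "&usehistory=y";
    }

    public static void main(String[] args) {
        Settings s = Settings.defaults(args.length > 0 ? args[0] : "Ephemeroptera");
        System.out.println(searchUrl(s));
    }
}
```

The same `Settings` object could then supply the thresholds to the filtering and merging steps, so changing the order or a cutoff does not require editing the workflow itself.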
Dr. Ogden was impressed by my demo and believes that this project may be publishable. Currently, there are no workflow components that fill this need. If I make this workflow and its components flexible enough, other researchers may be able to integrate them easily into their own research projects, and this could be a big contribution to the bioinformatics community!


