One way that Alu sequences are recruited into the functional genome is through a process called exonization.
Lei and Day explain that Alu elements actually contain enhancer sequences that facilitate this process.
Häsler and Strub explain that many Alus are found within the coding regions of genes, and that they get there through exonization. This can happen when an Alu inserted in the middle of an intron comes to be spliced into the mature transcript as a new exon. There are several possibilities for alternative splicing in this scenario: the new exon might include only the Alu sequence itself, or it might also take in a portion of the intron upstream or downstream of it. Not all potential splice sites are used at the same frequency, which may be evidence that selection is shaping how these genes are alternatively spliced. Another interesting observation is that every gene that contains an exonized Alu undergoes some kind of alternative splicing. Adding a new exon to a gene would probably cause a frameshift (unless the exon's length happened to be a multiple of three) and would presumably be deleterious. So perhaps, unless the original transcript is preserved through alternative splicing, any new configuration is eliminated by selection.
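To make the frameshift point concrete, here is a minimal Python sketch (my own illustration, not something from Häsler and Strub) showing that an insert shifts the downstream reading frame unless its length is a multiple of three. The toy sequences and the helper names `frame_shift` and `codons` are hypothetical; the two-base "new exon" simply stands in for an exonized piece of an Alu.

```python
def frame_shift(insert_length: int) -> int:
    """Bases (0, 1 or 2) by which inserting `insert_length` nucleotides into a
    coding sequence shifts the downstream reading frame; 0 means it is preserved."""
    return insert_length % 3


def codons(seq: str) -> list[str]:
    """Split a sequence into complete codons, dropping any leftover bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


# Hypothetical toy transcript: an upstream and a downstream exon joined at a codon boundary.
upstream = "ATGGCT"        # ATG GCT
downstream = "GAAAAATGA"   # GAA AAA TGA (TGA is a stop codon)
new_exon = "GC"            # hypothetical 2-nt insert standing in for an exonized Alu piece

original = codons(upstream + downstream)
with_new_exon = codons(upstream + new_exon + downstream)
print(original)       # codons of the unmodified toy transcript
print(with_new_exon)  # downstream bases are read in a new frame and the TGA stop is lost
print(frame_shift(len(new_exon)), frame_shift(99))
```

A real exonized Alu fragment is far longer than two bases, but the same rule applies: only a length divisible by three keeps the downstream codons intact, and even then the insert could introduce a stop codon of its own.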
The exonization process depends on multiple successive mutations. In some cases these have been mapped out by Singer and Schmitz in their paper "From Junk to Gene." I haven't been able to read the whole article yet because I am waiting on an interlibrary loan, but the authors map out the exact sequence of steps involved in the exonization of particular Alu sequences, which is pretty amazing.
Since exonization depends on multiple successive mutations, and almost all of the time those mutations bring no benefit to the organism, there would need to be a large pool of proliferating Alus to draw from in order to explore all of the possibilities. The human genome contains over a million such sequences.
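A back-of-the-envelope way to see why the size of the pool matters: assume, purely for illustration, that each Alu copy independently has the same small probability p of completing the required series of mutations. Then the chance that at least one of N copies does so is 1 - (1 - p)^N. The value of p below is made up, not a measured rate; only the "over a million" copy number comes from the text, and the function name is hypothetical.

```python
import math


def prob_at_least_one(copies: int, p_per_copy: float) -> float:
    """Probability that at least one of `copies` independent elements completes
    a rare event of per-copy probability `p_per_copy`: 1 - (1 - p)^N.
    log1p/expm1 keep the result accurate when p is tiny and N is huge."""
    if not 0.0 <= p_per_copy <= 1.0:
        raise ValueError("p_per_copy must be between 0 and 1")
    if copies <= 0 or p_per_copy == 0.0:
        return 0.0
    if p_per_copy == 1.0:
        return 1.0
    return -math.expm1(copies * math.log1p(-p_per_copy))


P_HYPOTHETICAL = 1e-6  # made-up per-copy chance; NOT a measured rate
for n in (1_000, 1_000_000):
    print(f"{n:>9,} copies: P(at least one success) = {prob_at_least_one(n, P_HYPOTHETICAL):.4f}")
```

Under these invented numbers, a thousand copies almost never produce a success, while a million copies make one more likely than not. The independence assumption is a simplification.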
Another area I have been reading about is the role of Alus in RNA editing. Eisenberg explains that, in addition to sometimes changing the proteins encoded by mRNA transcripts, RNA editing may subtly affect the stability of the RNA molecule, which in the end would affect expression.
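The classic illustration of editing changing a protein product is A-to-I editing: an ADAR enzyme deaminates adenosine to inosine, and the ribosome reads inosine as guanosine. The Python sketch below mimics the well-known glutamine-to-arginine (Q/R) site in the GluR-B glutamate receptor transcript, which ADAR2 edits (CAG becomes CIG, read as CGG). It is a generic illustration of recoding rather than an Alu-specific case, and the helper functions and trimmed codon table are hypothetical.

```python
# Only the codons needed for this example; a full table has 64 entries.
CODON_TABLE = {"CAG": "Gln", "CGG": "Arg"}


def a_to_i(rna: str, index: int) -> str:
    """Return `rna` with the adenosine at 0-based `index` deaminated to inosine (I)."""
    if rna[index] != "A":
        raise ValueError(f"position {index} is {rna[index]!r}, not an adenosine")
    return rna[:index] + "I" + rna[index + 1:]


def as_decoded(rna: str) -> str:
    """Inosine pairs like guanosine, so the ribosome effectively reads I as G."""
    return rna.replace("I", "G")


def translate(rna: str) -> list[str]:
    """Translate complete codons using the trimmed table above."""
    return [CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - len(rna) % 3, 3)]


unedited = "CAG"              # glutamine codon at a Q/R-style site
edited = a_to_i(unedited, 1)  # the middle A becomes I
print(translate(unedited), "->", translate(as_decoded(edited)))
```

Most Alu-associated editing falls in untranslated regions and introns rather than in codons, which is where the subtler effects on RNA stability that Eisenberg describes would come in.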
Mattick links Alu sequences to memory formation in the human brain. He points out that ADAR2, a protein involved in RNA editing, binds to Alu sequences, and that ADAR2 is linked to cell signaling pathways involved in memory formation. Mattick believes that this may be a way for an organism's environment and experiences to shape how memories are formed.
Gommans and Maas describe RNA editing in terms of evolvability. They argue that selection may favor systems with high levels of diversity, and that the drive toward complexity may arise from the need to keep selfish elements under control. In their view, the more complex an organism is, the more resilient it is against the effects of RNA editing. That resilience in turn permits the tolerance and proliferation of Alus that induce RNA editing, and the new editing increases the organism's complexity even further, producing a feedback loop of ascending complexity.
The debate about how much of the noncoding genome is functional is fundamentally misguided. Transposable elements in general, and Alus in particular, are all in some sense evolutionarily successful, or they would not be there at all. But are they functional? I suppose it depends on your perspective. I am reminded of Lane's musings on the troubled birth of the individual. Even the cells in our own bodies, which generally share the same interests, cannot cooperate without mechanisms that force them into compliance. The genes and other elements in the genome also have a strong incentive to cooperate, because if the organism dies, none of them are passed on. But every element in the genome must balance these common interests against its own, and unlike rebellious cancer cells, which can never break free of the body they live in, selfish genetic elements are free to evolve and explore new ways to pass themselves on. So if a transposable element is recruited by natural selection to perform some function, it has not lost some kind of battle. From its perspective, this is the best thing that can happen, because being recruited to a task grants the element job security. The same thing happens when a protein-coding gene is duplicated: the newly spawned gene must contribute something useful or risk being thrown out by natural selection. Whether you think of the individual as a colony of cells or a group of genes, nothing noteworthy, beautiful or complex can arise without some kind of conflict among the constituent parts. Natural selection does not just magically push life toward complexity. Conflict in the genome isn't about "parasitic" sequences "invading" the genome; rather, it is about millions of elements cooperating, if reluctantly, to form a cohesive whole. Without conflict, natural selection would have nothing to work with at all.
"If nothing in biology makes sense except in the light of evolution,  ...the modern view of disease holds no meaning whatsoever." -Nick Lane
Monday, March 29, 2010
Monday, March 15, 2010
Did we evolve to evolve?
As I discussed in previous posts, Alu elements are unique to primates, and the fact that their proliferation rates and transcription patterns differ between humans and other primates suggests that they may be responsible for some uniquely human traits. In "Under the Genomic Radar: The Stealth Model of Alu Amplification," Han and Xing put forward some ideas about why some Alu subfamilies suddenly proliferate after millions of years of inactivity. They explain that the AluYa and AluYb subfamilies actually date back 18 to 26 million years, yet these sequences have proliferated only in the human lineage. Since the parent sequences date much further back in the primate genome, it seems that the capacity for proliferation has been there for quite some time.
The traditional model has been that a master Alu sequence continually spawns daughter sequences that proliferate throughout the genome. However, this simplistic model fails to account for the fact that proliferation rates vary widely over time and across taxa. Han and Xing offer their stealth driver model as an alternative explanation. They propose that a low-activity sequence can lurk in the genome over long periods of time. Occasionally, this sequence may spawn a much more active daughter sequence, and these daughters are responsible for the vast majority of the activity.
These highly active daughter sequences are likely to be detrimental to the organism. Natural selection may weed them out, but the low-activity parent, the stealth driver, would be effectively invisible to natural selection, so it could persist and spawn another high-activity daughter in the future.
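The logic of the stealth driver model can be explored with a toy stochastic simulation. This is a minimal sketch under made-up assumptions, not Han and Xing's own model: every rate is a hypothetical per-generation probability, the copies that daughters make are treated as inactive, and selection acts only on the hyperactive daughters while the quiet parent is never removed.

```python
import random


def simulate_stealth_driver(generations: int, seed: int = 0,
                            stealth_rate: float = 0.001,   # hypothetical: parent makes an ordinary copy
                            spawn_rate: float = 0.002,     # hypothetical: parent spawns a hyperactive daughter
                            daughter_rate: float = 0.5,    # hypothetical: a live daughter makes a copy
                            purge_rate: float = 0.1) -> dict:  # hypothetical: selection removes a daughter
    """Toy stealth-driver model; all rates are made-up per-generation probabilities."""
    rng = random.Random(seed)
    stealth_copies = 0     # inactive copies made directly by the stealth driver
    daughter_copies = 0    # inactive copies made by hyperactive daughters
    daughters_spawned = 0
    active_daughters = 0   # hyperactive daughters still present in the population
    for _ in range(generations):
        # Selection removes each conspicuous daughter with probability purge_rate.
        # The stealth driver itself is never purged: it is "invisible" to selection here.
        active_daughters = sum(rng.random() >= purge_rate for _ in range(active_daughters))
        # Every surviving daughter may insert one new copy this generation.
        daughter_copies += sum(rng.random() < daughter_rate for _ in range(active_daughters))
        # The stealth driver rarely makes an ordinary copy ...
        if rng.random() < stealth_rate:
            stealth_copies += 1
        # ... and occasionally spawns a new hyperactive daughter.
        if rng.random() < spawn_rate:
            daughters_spawned += 1
            active_daughters += 1
    return {
        "stealth_copies": stealth_copies,
        "daughter_copies": daughter_copies,
        "daughters_spawned": daughters_spawned,
        "active_daughters": active_daughters,
    }


print(simulate_stealth_driver(100_000, seed=42))
```

With these invented rates each daughter gets about nine chances to copy itself before selection catches it, so in expectation the short-lived daughters contribute roughly nine times as many copies as the stealth driver, even though none of them lasts. The exact counts vary with the seed.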
In this sense, Alu elements themselves have in fact evolved to evolve. However, there is no reason to think that they have our interests in mind. These stealth drivers lie low because if they killed their hosts, they would end up dead themselves. But if they lie low forever, they will be drowned out by their competitors.
But when do they come out of hiding? According to Hedges in "Differential Alu mobilization and polymorphism among human and chimp lineages," these sequences may proliferate when natural selection is unable to weed them out, as would happen during a population bottleneck. I found this idea very intriguing because, as Wray pointed out in my last post, some uniquely human characteristics may be regulated by Alu transcription in the human brain; one example is the regulation of the prodynorphin gene, which has roles in memory, emotional status and perception. If Hedges' idea is true, then a population bottleneck 3 million years ago could explain why certain subfamilies that did not expand in other primate lineages ended up proliferating in the human lineage. Hedges explains that deleterious Alu insertions would still be removed by natural selection as they appeared. But during a bottleneck, selection may not be able to get rid of the high-activity master Alu sequence itself, so the sequences would continue to proliferate, and the vast majority of the copies that persisted would be either neutral or very slightly deleterious. The human lineage is known for its bushiness (many branching hominin species), and this extra variation may have driven more speciation events. Is it possible that this extra raw material gave rise to some unique human traits? I believe the answer might be yes.
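Why a bottleneck blunts selection can be sketched with Kimura's classic diffusion approximation for the fixation probability of a single new mutation in a diploid population, u = (1 - e^(-2s)) / (1 - e^(-4Ns)). This is a standard population-genetics result, not something taken from Hedges; the selection coefficient below is hypothetical, and treating a bottleneck as simply a small effective size N is a simplification.

```python
import math


def fixation_probability(N: int, s: float) -> float:
    """Kimura's approximation for the probability that one new mutation
    (initial frequency 1/(2N)) fixes in a diploid population of effective
    size N with selection coefficient s (s < 0 means deleterious):
        u = (1 - exp(-2s)) / (1 - exp(-4Ns))
    """
    if s == 0:
        return 1 / (2 * N)       # neutral limit
    x = -4 * N * s
    if x > 700:                  # exp(x) would overflow; expm1(x) ~ exp(x) here
        return math.expm1(-2 * s) * math.exp(-x)
    return math.expm1(-2 * s) / math.expm1(x)


s = -0.001  # hypothetical, very slightly deleterious insertion
for N in (50, 10_000):
    ratio = fixation_probability(N, s) / (1 / (2 * N))
    print(f"N={N}: fixation probability relative to a neutral mutation = {ratio:.3g}")
```

Under this approximation, a slightly deleterious insertion in a population of 50 fixes almost as often as a neutral one, while in a population of 10,000 its chance is vanishingly small, which is the sense in which selection "cannot weed out" weakly harmful elements during a bottleneck.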
I have a lot more on my list to read on this topic. My research has led to some papers coming from the U, which is pretty cool. My goal is to shed some light on the unique path that human evolution has trod and perhaps to show that, whether or not it is always beneficial to the organism as a whole, selfish elements in our genome have evolved to evolve. Perhaps the friction among these rival elements is the spark that ignited human evolution.
Monday, March 8, 2010
ALU elements and human evolution
One of the most exciting areas of noncoding DNA research is the discovery of Alu elements expressed in the brain. This article is a good review of the current research. A 2006 paper by Häsler and Strub, "Alu elements as regulators of gene expression," in the journal Nucleic Acids Research, explains that Alu elements arose exclusively in the primate lineage and are implicated in shaping the evolution of the primate brain.
Alus have been shown to be involved in alternative splicing, and they have been shown to influence mRNAs in other ways as well. The paper describes a process called exonization, in which a previously intronic sequence is recruited into the coding region of a gene. When this region includes an Alu element, the new exon has the potential to be alternatively spliced. Computational studies have confirmed that this does indeed happen.
Because Alus arose relatively recently, their copies remain highly similar to one another, which makes them easier to study bioinformatically. And the fact that they are exclusive to the primate lineage is exciting, because they may be the secret that sets humans apart.
Junk DNA: how much really is junk?
Junk DNA:
The question of how much of our junk DNA may turn out to be functional is a fascinating topic. I wrote a paper on this topic about 5 years ago based largely on the work of John Mattick. Mattick argues that because most of the genome is actually transcribed, and because it seems to be transcribed in a programmatic way (i.e., some noncoding regions are transcribed in patterns unlike those of adjacent genes), most of the genome is probably functional. I agreed with this idea when I wrote the paper, but I think my position is more nuanced today. A few weeks before I finished my report, a paper was published in Nature showing that a mouse with many noncoding regions deleted from its genome showed no obvious differences from other mice. Among the deleted noncoding regions were areas referred to as "ultraconserved regions" (UCRs). Some UCRs are more conserved across vertebrates than some of the most constant proteins. If these sequences were conserved so pristinely by natural selection, they must have a crucial function, yet this experiment seemed to show otherwise. So there is still a mystery here: if these sequences were not crucial for the survival of the mouse, why were they conserved over the course of 100 million years of evolution?
Going back to my notes on Power, Sex, Suicide by Nick Lane: on page 187 he discusses how the extra DNA in eukaryotes may function as raw material for new genes. Bacteria are under selection for trim genomes because fast division matters to them. It's not that eukaryotes are under selection for bigger genomes so that they can later evolve new genes; natural selection cannot and does not plan ahead. Rather, the tendency toward bigger genomes more likely arises because mutations that add something, anything, to the genome are more likely to be neutral than ones that delete something. So if eukaryotes are not under selection for small genomes, they will tend to collect junk over time.
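Lane's asymmetry argument can be illustrated with a deliberately crude expected-value model. This is a hypothetical sketch, not taken from Lane's book: each step brings one insertion or one deletion of equal size, insertions land in junk DNA, and deletions that hit functional DNA are purged by selection. All sizes, and the `insertion_tolerated` knob standing in for selection for a trim genome, are assumptions.

```python
def expected_junk(functional_bp: float, junk_bp: float, steps: int,
                  indel_bp: float = 100.0, insertion_tolerated: float = 1.0) -> list[float]:
    """Expected junk-DNA size over time in a toy indel model (all values hypothetical).

    Each step, one mutation occurs: an insertion or a deletion of `indel_bp`,
    each with probability 1/2. Insertions land in junk and are kept with
    probability `insertion_tolerated` (1.0 = no selection for a small genome).
    A deletion hits a random position; if it hits functional DNA it is purged,
    so only deletions that fall in junk persist.
    """
    if functional_bp <= 0:
        raise ValueError("functional_bp must be positive")
    trajectory = [junk_bp]
    for _ in range(steps):
        total = functional_bp + junk_bp
        gain = 0.5 * indel_bp * insertion_tolerated   # tolerated insertions
        loss = 0.5 * indel_bp * junk_bp / total       # only deletions landing in junk survive
        junk_bp = max(0.0, junk_bp + gain - loss)
        trajectory.append(junk_bp)
    return trajectory


no_size_selection = expected_junk(1_000, 0, 500)
trim_genome = expected_junk(1_000, 1_000, 200, insertion_tolerated=0.0)
print(no_size_selection[-1], trim_genome[-1])
```

With `insertion_tolerated=1.0` the junk compartment grows at every step and never levels off; with any tolerance q below 1 it settles where junk makes up a fraction q of the genome, loosely mirroring a lineage under selection for a small genome.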
Why our 'junk DNA' may be useful after all:
Pearson, Aria. New Scientist, 7/14/2007, Vol. 195, Issue 2612, pp. 42-45.
Finally, a balanced and reasonable article on the subject of junk DNA. Pearson points out that we have known about functional noncoding DNA since the early '70s. In his writings, Mattick accuses the scientific establishment of completely ignoring noncoding regions for decades. From what I've been reading, I don't think that is actually the case. Science has focused on coding regions because they are better understood and mutations in them are more easily pinpointed.
The article compares junk DNA to the bloatware on new computers: some of it may appear to be doing something, and even appear to be functional, but whether it is doing us any good is debatable. This is an interesting analogy, and I can't help but think it holds in many cases.
Many researchers believe that most noncoding RNAs that are transcribed are really just noise generated by the transcription of nearby genes. However, Gingeras argues that this can't explain much noncoding transcription, because many noncoding RNAs are transcribed in areas with no genes nearby. I wonder if a lot of this fits the bloatware analogy: perhaps something is happening, but that doesn't mean it is necessarily benefiting the organism.
One interesting observation that Mattick makes is that long noncoding RNAs in the mouse brain are transcribed in different patterns from the genes closest to them. This suggests that their transcription is not an accident but is controlled programmatically.
What this article shows is that the debate is not about whether or not any sequences outside of protein coding sequences are functional. We have known that many are for decades. The debate is about how much will end up being functional. Mattick argues that more than 50% of the genome is functional and perhaps up to 80% or 90% based on the amount of the genome that is transcribed. Other researchers put the estimate at below 5% of the genome and there are plenty in between.
This is a fascinating debate. To me it seems unlikely that Mattick is right. He cannot explain why some species, such as the pufferfish, can get away with such a trim genome, or why some relatively simple organisms, such as some species of amoeba, carry so much junk DNA that their genomes are actually larger than ours.
But at the same time, even if the most conservative biologists are right and less than 5% of the genome is functional, that is still a lot of functional noncoding DNA that we don't understand yet!
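Some rough arithmetic shows how much sequence even the conservative estimate implies. The genome size (about 3.1 billion base pairs) and the protein-coding fraction (about 1.5%) are commonly cited approximations that I'm assuming here, not figures from the article; the function name is hypothetical.

```python
GENOME_BP = 3.1e9        # assumption: approximate haploid human genome size
CODING_FRACTION = 0.015  # assumption: roughly 1.5% of the genome is protein-coding


def functional_noncoding_bp(functional_fraction: float) -> float:
    """Base pairs of functional noncoding DNA implied by an estimate of the
    total functional fraction of the genome (coding DNA subtracted out)."""
    return GENOME_BP * max(0.0, functional_fraction - CODING_FRACTION)


print(f"{functional_noncoding_bp(0.05) / 1e6:.0f} Mb of functional noncoding DNA at 5%")
```

Under these assumptions, a 5% functional genome still leaves on the order of 100 million base pairs of functional noncoding DNA.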
This paper discusses the experiment that deleted ultraconserved regions from the mouse genome. One explanation offered was that the effect of these regions may be subtle: if a sequence made a mouse just 1% more likely to survive, it would still be preserved. I don't like this explanation. If the effect were really that subtle, the sequence should be freer to change over time. If these sequences are more conserved than protein-coding genes, then subtle effects do not explain their preservation.
Another explanation given by Kelly Frazer is that redundancy could be built in and there were other regions that compensated for the ones that were deleted. I don't like this explanation either. If there really is this much redundancy then how could the region be more conserved than protein coding genes? I don't see how natural selection could keep these regions so pristine if there is redundancy in the system.
This is a great mystery to me because I don't agree with any of the explanations put forward. There are many potential angles for my thesis here.
Sean Carroll:
Scientific American May 2008: Regulating Evolution
This Scientific American article by one of my favorite authors, Sean Carroll (Endless Forms Most Beautiful), explores the evolution of "enhancers," which he refers to as "switches." Eukaryotes uniquely promote transcription of their genes through these switches, which can lie far upstream or far downstream of a gene, or even within its introns! They are hard to detect experimentally, which is why many genetic mutations have been shown to be regulatory in nature even though the exact mutation remains elusive.
The article explores some examples of phenotypes that can be modified by these regulatory sequences without affecting how the gene is expressed in other parts of the body or in other life stages.
One of the main sources that Carroll references is Wray's paper in Nature Reviews Genetics, which I review below:
Gregory Wray: Nature Reviews Genetics, March 2007: The evolutionary significance of cis-regulatory mutations
Wray argues that cis-regulatory and protein-coding mutations may be phenotypically distinct, and he gives two reasons for this.
1. Each allele in a diploid organism is transcribed independently, so mutations in regulatory regions tend to be codominant, whereas recessive structural mutations may be invisible to natural selection until genetic drift raises their frequency to the point where a significant number of organisms are homozygous for the mutation.
2. Cis-regulatory mutations may affect the organism only in a particular tissue or life stage, whereas a structural mutation will affect the organism everywhere the protein is expressed. Alternative splicing could potentially cushion this effect, but clear examples of this are rare.
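Wray's first point can be made concrete with Hardy-Weinberg genotype frequencies. The sketch below compares two idealized extremes, a fully codominant mutation whose effect shows in every carrier and a fully recessive one that shows only in homozygotes; the function names are hypothetical, and real mutations fall somewhere in between.

```python
def genotype_frequencies(p: float) -> tuple[float, float, float]:
    """Hardy-Weinberg frequencies (wild-type homozygote, heterozygote,
    mutant homozygote) for a mutant allele at frequency p."""
    q = 1.0 - p
    return q * q, 2.0 * p * q, p * p


def expressing_fraction_of_carriers(p: float, recessive: bool) -> float:
    """Fraction of mutation carriers whose phenotype differs from wild type.
    Codominant: every carrier shows an effect. Fully recessive: only homozygotes."""
    if not 0.0 < p <= 1.0:
        raise ValueError("allele frequency must be in (0, 1]")
    _, het, hom = genotype_frequencies(p)
    return hom / (het + hom) if recessive else 1.0


for p in (0.001, 0.01, 0.1):
    print(f"p={p}: recessive {expressing_fraction_of_carriers(p, True):.4f}, "
          f"codominant {expressing_fraction_of_carriers(p, False):.1f}")
```

For a mutation at 1% frequency, only about 0.5% of the individuals carrying a recessive allele actually express it, which is why such alleles can drift for a long time before selection sees them.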
One example he discusses is lactose tolerance in adult northern Europeans. The switch that enabled this actually lies inside an intron of MCM6, a gene neighboring the lactase gene it regulates. This is interesting because Mattick talks a lot about the potential for introns to evolve function, and this is a great example of one that did.
Another example comes from microarray comparisons of gene expression in the brains of humans and chimps: expression levels differ for more than 10% of the genes that are expressed, and the author points out that this is probably an underestimate. It is unknown where in the genome the regulatory sequences responsible for these differences are encoded. Mattick has argued that trans-regulation may be at work, that is, regulation by sequences nowhere near the genes being expressed.
Another example is the increased expression of prodynorphin in humans relative to other primates. This gene is involved in emotional status and perception of pain. A mutation in the regulatory sequence of this gene is responsible for the increased expression in humans.
I have a lot of material to digest here, and many angles from which I could approach it. Mattick may be wrong that most of the genome is functional, but from what I am reading, that may not matter. Even if only a small percentage is functional, noncoding sequences could still be the main drivers of eukaryotic evolution.
Monday, March 1, 2010
Junk DNA and the foundation of eukaryotic complexity
I've been considering some new ideas for my capstone thesis. In 2004, for my science writing class, I wrote a paper about the possible function of junk DNA. You can download and read my paper here. I was reluctant to post it because, rereading it now, I don't necessarily agree with all of the conclusions I came to. I now think the idea that most junk DNA is regulatory is probably just wrong. I argued that biological complexity scales better with genome size than with gene count, and that perhaps some junk DNA has a regulatory function. I explored the potential roles that microRNAs, RNA interference and introns might play in specifying complexity in higher eukaryotes. My main reference was the work of John Mattick, who argued that the genome may contain regulatory networks he called Endogenous Controlled Multitasking. I just searched for him on nature.com, and he has done a lot more work since I last looked, mostly on microRNAs. This week I am going to spend some time catching up on this material. I think it has great potential to turn into my thesis. Even though my original idea might have been wrong, this is still a hot area. Perhaps only a small percentage of junk DNA is functional, but it doesn't all have to be functional for this to be an interesting area of study!