Oligonucleotide
composition is not constant along genomes. To show this phenomenon, we
have computed oligo-skews for all prokaryotes sequenced to date.
Two versions of oligo-skews have been generated:
Oligo-skew: version 1
These may be considered the simple oligo-skews. To compute them,
each sequenced prokaryotic genome was scanned with
a sliding window, and the frequencies of all oligonucleotides of the
selected
length were computed within each window. These frequencies were compared
with the frequencies obtained from the complete genome by computing the
Euclidean distance between them. The graphs are
clickable: clicking on a graph magnifies the clicked area (10x). The
second image is also clickable; clicking on it shows the
distance at each point in the area, together with the distances for the
individual ORFs in
the same area.
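The computation described above can be sketched as follows. This is only a minimal illustration, not the code used by the service: the function names, the default window size, the non-overlapping step, and the choice to count words on a single strand and skip words containing ambiguous bases are all assumptions made for the example.

```python
from collections import Counter
from itertools import product
from math import sqrt


def oligo_frequencies(seq, k):
    """Relative frequencies of all 4**k oligonucleotides of length k."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    # Assumption: words containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def euclidean(a, b):
    """Euclidean distance between two frequency vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def oligo_skew_v1(genome, k=4, window=20000, step=None):
    """Distance of each window's frequencies from the whole-genome frequencies.

    Returns a list of (window_start, distance) pairs.
    Assumption: windows do not overlap unless a smaller step is given.
    """
    step = step or window
    reference = oligo_frequencies(genome, k)
    skew = []
    for start in range(0, len(genome) - window + 1, step):
        freqs = oligo_frequencies(genome[start:start + window], k)
        skew.append((start, euclidean(freqs, reference)))
    return skew
```

Plotting the distance against the window start gives a curve of the kind shown in the graphs: windows whose composition differs from the rest of the genome appear as high values.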
The peaks shown
in the graphs are likely to originate from Horizontal Transfer (HT)
events, since foreign sequences will have a
different
oligonucleotide composition. For species
often involved in HT events, the
graphs show many peaks, which may be considered the
result of multiple HT events. In contrast, genomes with few HT
events (for example, endosymbionts) show oligo-skews with few or no
peaks.
The service
allows searching for the position of specific genes within
the genome, so that one can check whether a gene is located in an
area with a distinctive oligonucleotide composition.
Example of Oligo-skews v1:
Escherichia coli K12
Oligo-skew: version 2
The original idea behind the
development of this version of oligo-skews is quite simple: if the
peaks observed in the version 1 oligo-skew of a specific genome (e.g. E. coli above) are due to
Horizontal Transfer events, the sequences responsible for those peaks are
likely to show an oligonucleotide composition similar to that of genomes
from other taxonomic groups (similarity to non-E. coli strains).
The following procedure was
used to generate version 2 oligo-skews:
- Oligonucleotide frequencies for all sequenced prokaryotic
genomes were computed.
- The selected genome was scanned with
a 20,000 bp sliding window, and the frequencies of all oligonucleotides
within each window were computed.
- The frequencies in each window were compared with the frequencies
in every sequenced genome, and the Euclidean
distances obtained were sorted from
lowest to highest.
- In each sorted list of genomes, the position of the selected
genome was determined. Consequently, a number (position)
was obtained for each window. A value of "1" means that, among all
genomes, the complete genome under study is the one whose
oligonucleotide composition is most similar to that of the window. A
value of "100" means that there are 99 genomes whose oligonucleotide
composition is more similar to that of the window than the
oligonucleotide composition of the genome under study.
- A graphical representation was generated. When the resulting
number for a window is "1", a dot is drawn at the top of the
graph. When the resulting number for a window is "100", the dot is drawn
100 pixels away from the top.
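The ranking steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used by the service: the function names, the in-memory dictionary of genomes, the non-overlapping step, and the counting of words on a single strand are all hypothetical choices made for the example.

```python
from collections import Counter
from itertools import product
from math import sqrt


def oligo_profile(seq, k=4):
    """Relative frequencies of all 4**k oligonucleotides of length k."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def euclidean(a, b):
    """Euclidean distance between two frequency vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def oligo_skew_v2(selected, genomes, k=4, window=20000, step=None):
    """Rank of the selected genome among all genomes, for each window.

    genomes: dict mapping a genome name to its sequence (hypothetical input).
    Returns (window_start, rank) pairs; rank 1 means the selected genome
    itself is the closest match to the window.
    """
    step = step or window  # assumption: non-overlapping windows by default
    # Whole-genome profile for every sequenced genome, including the selected one.
    profiles = {name: oligo_profile(seq, k) for name, seq in genomes.items()}
    target = genomes[selected]
    ranks = []
    for start in range(0, len(target) - window + 1, step):
        win = oligo_profile(target[start:start + window], k)
        # Sort genome names by distance to the window, lowest first.
        ordered = sorted(profiles, key=lambda name: euclidean(win, profiles[name]))
        ranks.append((start, ordered.index(selected) + 1))
    return ranks
```

The returned rank is the number that is plotted for each window. Because every window is compared with every genome, the cost grows with both genome length and the number of sequenced genomes.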
Due to the very intensive computing required to generate version 2
oligo-skews, they have been computed only for tetranucleotides.
Example of Oligo-skews v2:
Escherichia coli K12
Window size: 20,000 bp; Tetranucleotides.
To understand this graph,
note that dots at the very top of the image mean that the
oligonucleotide composition in the window is similar to the
oligonucleotide composition of E.
coli K12. In contrast, the peaks mean that there are other
genomes showing an oligonucleotide composition more similar to the one in
the window.
To view version 2 oligo-skews,
users must first find the version 1 oligo-skews. At the bottom of the page
showing the version 1 oligo-skews, a link is available to see both
oligo-skew versions simultaneously.
NOTES:
- To our understanding, the oligonucleotide composition
variations along genomes shown in version 1 oligo-skews may be due to
two basic phenomena: Horizontal Transfer events, and background. The
background is also observable when generating oligo-skews for random
DNA sequences, but variations are very small in those "perfect"
sequences. The background is likely to be higher in real genomes.
- Version 2 oligo-skews may change each time new
strains are added to the list of sequenced genomes. If the newly
added sequence is phylogenetically related to the genome being
searched with the version 2 oligo-skew, the position of many dots may
change. In contrast, when the oligonucleotide composition of the newly
added sequence is very different from that of the genome being
searched (and from that of each window), the version 2 oligo-skew will not
change.
- When searching version 2 oligo-skews for a specific
strain, it is crucial to know how many phylogenetically related genomes
are included in the list of sequenced prokaryotes. For a
strain with no sequenced relatives, the version 2
oligo-skew is likely to be very flat (for each window, the most similar
genome is the one being searched). In contrast, when a few
strains of the same species or of related species have been sequenced
(as for example E. coli
above), such a flat image is not expected (for each
window, the most similar genome may be any of the genomes from the
same species as the one being searched).
- A sharp peak in the version 1 oligo-skew may not be present in the
version 2 oligo-skew. The sequence responsible for the peak may show an
oligonucleotide composition quite different from that of the complete
genome, and yet the genome being studied may still be the one most
similar to the sequence responsible for the peak.