Oligo-skews

Oligonucleotide composition is not constant along genomes. To show this phenomenon, we have computed oligo-skews for all prokaryotes sequenced to date. Two versions of oligo-skews have been generated:


Oligo-skew: version 1

These can be considered the simple oligo-skews. To compute them, each sequenced prokaryotic genome was scanned with a sliding window, and the frequencies of all oligonucleotides of the selected length were computed in each window. Those frequencies were compared with the frequencies obtained from the complete genome by computing the Euclidean distance between the two frequency profiles. Graphs are clickable: clicking zooms into the clicked area (10x). The second image is also clickable, and clicking on it shows the distance at each point in the area, together with the distances for the individual ORFs in the same area.
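The computation described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used by the service: the function names (`kmer_frequencies`, `oligo_skew_v1`), the default window size and step, and the choice to count oligonucleotides on one strand only are all assumptions, since the text does not specify them.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(seq, k):
    """Relative frequencies of all 4**k oligonucleotides of length k in seq.

    Oligonucleotides containing symbols other than A, C, G, T are ignored
    (an assumption; the text does not say how ambiguous bases are handled).
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def euclidean(a, b):
    """Euclidean distance between two equally long frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def oligo_skew_v1(genome, k=4, window=20000, step=5000):
    """Return (window_start, distance) pairs along the genome.

    Each distance compares the window's frequencies with those of the
    complete genome. The window and step defaults are hypothetical.
    """
    genome = genome.upper()
    reference = kmer_frequencies(genome, k)
    profile = []
    for start in range(0, len(genome) - window + 1, step):
        freqs = kmer_frequencies(genome[start:start + window], k)
        profile.append((start, euclidean(freqs, reference)))
    return profile


# Toy example: a repetitive "host" with a compositionally different insert.
host = "ACGT" * 250
foreign = "GGGCCC" * 50
toy_profile = oligo_skew_v1(host + foreign + host, k=2, window=300, step=100)
peak_start = max(toy_profile, key=lambda p: p[1])[0]
print(peak_start)  # start of the window that differs most from the whole genome
```

In the toy example the largest distance falls on the window that lies entirely inside the inserted segment, which is the kind of peak the graphs display.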

The peaks shown in the graphs are likely to originate from Horizontal Transfer (HT) events, since foreign sequences tend to carry a different oligonucleotide composition. For species often involved in HT events, the graphs show many peaks, which may be interpreted as the result of multiple HT events. In contrast, genomes with few HT events (for example, endosymbionts) show oligo-skews with few or no peaks.

The service allows users to search for the position of specific genes within the genome, so that they can check whether a gene is located in an area with a distinctive oligonucleotide composition.
It also includes some statistical data for each genome.

Example of Oligo-skews v1: Escherichia coli K12

Genome NC_000913


Oligo-skew: version 2

The idea behind this version of oligo-skews is quite simple: if the peaks observed in the version 1 oligo-skew of a specific genome (e.g. E. coli above) are due to Horizontal Transfer events, the sequences responsible for those peaks are likely to show an oligonucleotide composition similar to that of genomes from other taxonomic groups (that is, similarity to non-E. coli strains).

The following procedure was performed to generate version 2 oligo-skews:

  • Oligonucleotide frequencies for all sequenced prokaryotic genomes were computed.
  • The selected genome was scanned with a 20,000 bp sliding window, and the frequencies of all oligonucleotides within each window were computed.
  • The frequencies in each window were compared with the frequencies in every sequenced genome, and the resulting Euclidean distances were sorted from lowest to highest.
  • For each sorted list of genomes, the position of the selected genome was determined, so that a number (position) was obtained for each window. A value of "1" means that the complete genome under study is the genome whose oligonucleotide composition is most similar to that of the window. A value of "100" means that 99 genomes have an oligonucleotide composition more similar to that of the window than the genome under study does.
  • A graphical representation was generated. When the number for a window is "1", a dot is drawn at the top of the graph. When the number is "100", the dot is drawn 100 pixels away from the top.
Because generating version 2 oligo-skews is computationally very intensive, they have been computed only for tetranucleotides.
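The ranking procedure listed above could look roughly like the following Python sketch. It is an illustration under assumptions, not the service's implementation: the function names, the `genomes` dictionary (name to sequence), the non-overlapping step, and the tie-breaking rule (ties keep dictionary order) are all hypothetical choices that the text does not specify.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(seq, k=4):
    """Relative frequencies of all 4**k oligonucleotides of length k in seq."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def euclidean(a, b):
    """Euclidean distance between two equally long frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def oligo_skew_v2(target, genomes, k=4, window=20000, step=20000):
    """Return (window_start, rank) pairs for the genome named `target`.

    `genomes` maps a genome name to its sequence and must include `target`.
    Rank 1 means that the target genome itself is the closest genome to the
    window; rank 100 means that 99 other genomes are closer.
    """
    # Step 1: whole-genome frequencies for every genome in the collection.
    references = {name: kmer_frequencies(seq.upper(), k)
                  for name, seq in genomes.items()}
    seq = genomes[target].upper()
    ranks = []
    # Step 2: scan the selected genome with a sliding window.
    for start in range(0, len(seq) - window + 1, step):
        freqs = kmer_frequencies(seq[start:start + window], k)
        # Step 3: sort all genomes by distance to the window, lowest first.
        ordered = sorted(references,
                         key=lambda name: euclidean(freqs, references[name]))
        # Step 4: the 1-based position of the selected genome in that list.
        ranks.append((start, ordered.index(target) + 1))
    return ranks


# Toy collection: the "host" carries a segment resembling the "donor".
toy_genomes = {
    "host": "ACGT" * 250 + "GGGCCC" * 50 + "ACGT" * 250,
    "donor": "GGGCCC" * 200,
    "other": "AATT" * 300,
}
toy_ranks = dict(oligo_skew_v2("host", toy_genomes, k=2, window=300, step=100))
print(toy_ranks[0], toy_ranks[1000])  # rank in a host-like window, then in the insert
```

In the toy collection, a window of typical host sequence gets rank 1, while the window covering the inserted segment gets rank 2 because the donor genome is closer to it than the host genome is. Each rank would then be drawn as a dot at the corresponding distance from the top of the graph.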


Example of Oligo-skews v2: Escherichia coli K12
Window size: 20,000 bp; Tetranucleotides.

Version 2 oligo-skew for E. coli K12

To read this graph, note that dots at the very top of the image mean that the oligonucleotide composition of the window is most similar to that of E. coli K12. In contrast, the peaks mean that other genomes show an oligonucleotide composition more similar to that of the window.

To view version 2 oligo-skews, users must first open the version 1 oligo-skew. At the bottom of the page showing the version 1 oligo-skew, a link is available to see both oligo-skew versions simultaneously.

NOTES:

  • To our understanding, the variations in oligonucleotide composition along genomes shown in version 1 oligo-skews may be due to two basic phenomena: Horizontal Transfer events and background. The background is also observable when generating oligo-skews for random DNA sequences, although the variations are very small in those "perfect" sequences. The background is likely to be higher in real genomes.
  • Version 2 oligo-skews may change each time new strains are added to the list of sequenced genomes. If the newly added sequence is phylogenetically related to the genome being examined, the position of many dots may change. In contrast, when the oligonucleotide composition of the newly added sequence is very different from that of the genome being examined (and of each of its windows), the version 2 oligo-skew will not change.
  • When examining the version 2 oligo-skew of a specific strain, it is crucial to know how many phylogenetically related genomes are included in the list of sequenced prokaryotes. For a strain with no sequenced relatives, the version 2 oligo-skew is likely to be very flat (for each window, the most similar genome is the one being examined). In contrast, when several strains of the same species or of related ones have been sequenced (for example E. coli above), such a flat image is not expected (for each window, the most similar genome may be any of the genomes from the same species).
  • A sharp peak in a version 1 oligo-skew may not be present in the version 2 oligo-skew. The sequence responsible for the peak may show an oligonucleotide composition quite different from that of the complete genome, and yet the genome being studied may still be the one most similar to that sequence.
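The notes on how ranks react to the genome collection can be illustrated with a toy calculation. The sketch below uses made-up two-component frequency vectors in place of real tetranucleotide profiles, and the genome names and the helper `rank_of_target` are hypothetical. It only shows the ranking logic, not real data.

```python
import math


def rank_of_target(window, references, target):
    """1-based position of `target` when genomes are sorted by distance to `window`.

    Uses math.dist (Python 3.8+) as the Euclidean distance.
    """
    ordered = sorted(references, key=lambda name: math.dist(window, references[name]))
    return ordered.index(target) + 1


# Hypothetical frequency vectors (two components only, for readability).
references = {"target": (0.40, 0.60), "unrelated": (0.90, 0.10)}
window = (0.30, 0.70)

print(rank_of_target(window, references, "target"))  # 1

# Adding a genome with a very different composition leaves the rank unchanged.
references["distant"] = (0.95, 0.05)
print(rank_of_target(window, references, "target"))  # 1

# Adding a close relative can push the target genome down the list.
references["relative"] = (0.32, 0.68)
print(rank_of_target(window, references, "target"))  # 2

# A window far from its own genome (a version 1 peak) can still rank 1
# when every other genome in the collection is even farther away.
odd_window = (0.10, 0.90)
no_relatives = {"target": (0.40, 0.60), "unrelated": (0.90, 0.10)}
print(math.dist(odd_window, no_relatives["target"]) > 0.4)    # True
print(rank_of_target(odd_window, no_relatives, "target"))     # 1
```

The last case corresponds to a sharp version 1 peak that leaves no trace in version 2: the window is unlike its own genome, but no sequenced genome resembles it more closely.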