Sequence comparison

By comparing genetic sequences, we can tell how they are related. While there are fancy programs (such as BLAST and ALIGN) to make these comparisons, even a very simple comparison will reveal a lot.

An easy way to compare two sequences is to make a big grid with one DNA sequence along the X axis and the other along the Y axis. Wherever the corresponding DNA bases from the two sequences match, you put a dot. Now, if the sequences are nearly the same, there will be a diagonal line where they match up. The closer the match, the stronger the line.

There will also be random dots where they coincidentally match. If you mark every match, about 1 in 4 positions will be a random match. If you only count spots where, say, 5 in a row match, then the number of random matches will be much lower.
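
To put rough numbers on "much lower", here is a back-of-the-envelope calculation. It assumes the four bases are equally common and that positions are independent, which real sequences only approximate. The symbols N, M, and k (the two sequence lengths and the required run length) are introduced here for illustration.

```latex
\begin{align*}
P(\text{one position matches by chance}) &= \tfrac{1}{4} \\
P(\text{5 in a row match by chance}) &= \left(\tfrac{1}{4}\right)^{5} = \tfrac{1}{1024} \approx 0.001 \\
E[\text{random dots in an } N \times M \text{ grid}] &\approx \frac{N M}{4^{k}}
\end{align*}
```

As a hypothetical example, for two sequences of 10,000 bases each (a grid of 100,000,000 positions), marking every match (k = 1) would give about 25,000,000 random dots, while requiring 5 in a row (k = 5) would give roughly 98,000.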

I've written a short C program to do this matching. It puts a dot if 5 bases in a row match. This page shows the results of various matchings.
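
The original program isn't reproduced on this page, so the following is only a minimal sketch of how such matching might look in C. The function name dot_plot, the row-major grid layout, and the rule of placing a dot at the start of every run of `window` matching bases are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only; the name and grid layout are assumptions,
 * not the original program.
 *
 * Sequence a runs along the X axis and sequence b along the Y axis.
 * grid[y * cols + x] is set to '.' when the `window` bases starting at
 * a[x] equal the `window` bases starting at b[y], and to ' ' otherwise,
 * where cols = strlen(a) - window + 1 and rows = strlen(b) - window + 1.
 * The caller must supply a grid of at least rows * cols chars.
 *
 * Returns the number of dots placed, or 0 (leaving grid untouched) if
 * window is 0 or either sequence is shorter than window. */
size_t dot_plot(const char *a, const char *b, size_t window, char *grid)
{
    assert(a != NULL && b != NULL && grid != NULL);

    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    if (window == 0 || len_a < window || len_b < window)
        return 0;

    size_t cols = len_a - window + 1;
    size_t rows = len_b - window + 1;
    size_t dots = 0;

    for (size_t y = 0; y < rows; y++) {
        for (size_t x = 0; x < cols; x++) {
            /* Compare `window` consecutive bases from each sequence. */
            int match = memcmp(a + x, b + y, window) == 0;
            grid[y * cols + x] = match ? '.' : ' ';
            dots += (size_t)match;
        }
    }
    return dots;
}
```

For example, calling dot_plot("ACGTTGCA", "ACGTTGCA", 5, grid) with a 16-char grid places 4 dots, all on the diagonal, since a sequence compared with itself matches at every position. Passing a window of 1 marks every single-base match instead.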


(The exact sequences used were HIV-1=HIVBRUCG, HIV-2=HIVV2RODX, BLV=BLVCG, SIV=SIVAGMTYO, visna=VLVCG, HTLV-1=HTVPRCAR.)


Conclusions

From these comparisons, several things are clear. Most importantly, HIV-1 is much closer to SIV than to visna, HTLV-1, or BLV. This indicates that HIV came from SIV (or that they both came from some closely related virus). Second, HIV does not show long sequences that closely match visna, HTLV-1, or BLV. This shows that HIV was not formed by splicing together parts of these viruses.


Ken Shirriff: shirriff@eng.sun.com
This page: http://www.righto.com/theories/seq_comp.html Copyright 2000 Ken Shirriff.