Exercise 3: The Origin of the HIV-1 Pandemic

In trying to understand why the HIV-1 pandemic started, one of the key questions has been: when did it start? Analyses ranging from early studies of limited molecular sequence data to recent studies of much larger datasets have produced the following hypotheses:
  1. HIV-1 has been circulating in humans for a long time, perhaps thousands of years.
  2. HIV-1 is a new pathogen in humans, having jumped from a primate species in the last 100 years or so.
  3. HIV-1 was introduced into humans inadvertently during a massive oral polio vaccine trial in central Africa in the late 1950s, in which chimpanzee tissue was used to prepare the vaccines.
Test these hypotheses by estimating when the HIV-1 pandemic strains originated. This is possible because, although the sequences do not evolve in a clock-like manner across the entire tree, the HIV-1 pandemic isolates do appear to evolve in a clock-like manner, so their branch lengths can be converted into time.

Using the tree you generated, view the lengths of the branches that lead from the HIV-1 Group M sequences (subtypes A–K) to their common ancestor. Use the zoom function under the General tab to see these branches and their lengths clearly, and adjust the font size as needed.

Question 8: Calculate the mean sum of branch lengths from the tip of each Group M branch to the common ancestor of Group M. (To display the branch lengths on the tree, select "Substitutions per site" from the dropdown box next to "Display" under "Show Branch Labels".)
> 
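A minimal Python sketch of the Question 8 arithmetic, assuming you have written down, for each Group M tip, the branch lengths (substitutions per site) along its path to the Group M common ancestor. The tip names and branch-length values below are hypothetical placeholders, not values from the tree; replace them with your own readings.

```python
from statistics import mean

# HYPOTHETICAL placeholder data: for each Group M tip, the branch lengths
# (substitutions per site) along its path to the Group M common ancestor.
# Replace these with the values read from your own tree.
paths = {
    "tip_1": [0.05, 0.03, 0.02],
    "tip_2": [0.04, 0.06],
    "tip_3": [0.02, 0.02, 0.03, 0.01],
}


def tip_to_ancestor(branch_lengths):
    """Sum the branch lengths along one tip-to-ancestor path."""
    return sum(branch_lengths)


def mean_tip_to_ancestor(paths):
    """Average the tip-to-ancestor distances over all tips."""
    return mean(tip_to_ancestor(p) for p in paths.values())


print(f"Mean tip-to-ancestor distance: {mean_tip_to_ancestor(paths):.4f} substitutions/site")
```

Each path is summed before averaging, so an internal branch shared by several tips is counted once for each tip that passes through it, as a mean tip-to-ancestor distance requires.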


Question 9: Assuming that the substitution rate of HIV-1 is approximately 10⁻⁵ substitutions per nucleotide site per generation, use the mean sum of branch lengths to calculate how many HIV-1 generations have elapsed since the Group M sequences diverged from their common ancestor.
> 
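Question 9 reduces to a single division: if each generation adds about 10⁻⁵ substitutions per site along a lineage, the number of generations is the mean tip-to-ancestor distance divided by that rate. A minimal sketch; the distance used below is a hypothetical placeholder, not the answer to Question 8.

```python
def generations_elapsed(mean_distance, rate_per_generation=1e-5):
    """Generations = distance (subs/site) / rate (subs/site/generation)."""
    if rate_per_generation <= 0:
        raise ValueError("rate must be positive")
    return mean_distance / rate_per_generation


# HYPOTHETICAL mean tip-to-ancestor distance; substitute your Question 8 result.
example_distance = 0.1
print(f"{generations_elapsed(example_distance):,.0f} generations")
```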


Question 10: An HIV-1 generation lasts about 2 days. This is the time it takes for a virus particle to infect a cell and produce new virus particles ready to infect new cells. Using this generation time, calculate how many years ago the Group M sequences diverged from their common ancestor.
> 
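Converting generations to years multiplies by the generation time (about 2 days) and divides by the number of days in a year. A minimal sketch, assuming an average year length of 365.25 days; the generation count below is a hypothetical placeholder, not the answer to Question 9.

```python
DAYS_PER_YEAR = 365.25  # assumed average year length, including leap years


def years_elapsed(generations, days_per_generation=2.0):
    """Convert a number of HIV-1 generations into years."""
    return generations * days_per_generation / DAYS_PER_YEAR


# HYPOTHETICAL generation count; substitute your Question 9 result.
example_generations = 10_000
print(f"{years_elapsed(example_generations):.1f} years")
```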


Question 11: Approximately in which year did the HIV-1 pandemic sequences originate, assuming the average sampling year was 1990?
> 
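The origin year is the average sampling year minus the elapsed time in years. A minimal sketch; the elapsed time below is a hypothetical placeholder, not the answer to Question 10.

```python
def origin_year(years_before_sampling, mean_sampling_year=1990):
    """Estimate the calendar year of the Group M common ancestor."""
    return mean_sampling_year - years_before_sampling


# HYPOTHETICAL elapsed time in years; substitute your Question 10 result.
example_years = 50.0
print(f"Estimated origin: around {origin_year(example_years):.0f}")
```

Because the substitution rate and the generation time are both approximate, treat the result as a rough point estimate rather than a precise date.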


Question 12: Which of the above hypotheses about when the pandemic sequences originated does your result support?
> 


You have completed this tutorial. Answers to the questions can be found here.

Background Reading

Much more information about HIV can be found on this Wikipedia page. Information on the origin of both HIV-1 and HIV-2 can be found here.

You can also read the included PDFs: