Assembling to Reference

To assemble the two A. dumetorum sequences that failed to assemble previously (because the overlapping parts of the reads were of poor quality and were trimmed off), we will assemble the partial sequences against a reference. Click the Unused Reads sequence list from your assembly, then hold down the Ctrl/Command key and click the dum3 consensus sequence, which we will use as the reference. If you did not extract this consensus sequence in the previous exercise, do so now.

Click Align/Assemble→Map to Reference. Ensure that the dum3 consensus sequence is set as the reference sequence, choose Assemble by, and select 1st part of name, separated by underscore, so that the reads from each sample are assembled separately. Set the other options as in the screenshot below.
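The effect of the Assemble by option can be pictured as grouping the reads on the first part of their names before each group is assembled against the reference. The Python sketch below is only an illustration of that idea, not the software's own implementation; the read names (dum2_F, dum2_R and so on) and the function name group_by_name_prefix are assumptions made for the example.

```python
from collections import defaultdict


def group_by_name_prefix(read_names, separator="_", part=0):
    """Group read names by one separator-delimited part of each name.

    With the defaults, "dum2_F" and "dum2_R" both fall into the group "dum2".
    """
    groups = defaultdict(list)
    for name in read_names:
        key = name.split(separator)[part]
        groups[key].append(name)
    return dict(groups)


# Hypothetical read names, assumed to follow a <sample>_<direction> pattern.
reads = ["dum2_F", "dum2_R", "dum4_F", "dum4_R"]

for sample, members in group_by_name_prefix(reads).items():
    # Each group would be assembled to the reference as a separate contig.
    print(sample, members)
```

Under these assumed names the reads fall into two groups, which corresponds to getting one assembly per sample.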


You should now have two new contig assemblies, one for dum2 and one for dum4. Open the dum2 assembly. You should now be able to see why these reads did not assemble de novo: there is a 4 bp region with no good-quality sequence, so the trimmed F and R reads do not overlap. A region of double peaks, which has been trimmed in both reads (shown as crossed-out sequence), begins here. This probably represents an indel, where one of the two alleles contains a deletion.
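The reason a reference helps here can be shown with a little arithmetic: once both trimmed reads are placed on the reference, any stretch of reference positions that neither read covers appears as a gap, even though the reads share no overlap with each other. The sketch below assumes 1-based, inclusive reference coordinates; the positions 300 and 305 and the function name uncovered_gap are hypothetical and chosen only to reproduce a 4 bp gap.

```python
def uncovered_gap(forward_end, reverse_start):
    """Return (start, end, length) of the reference region covered by neither read.

    forward_end   -- last reference position covered by the trimmed F read
    reverse_start -- first reference position covered by the trimmed R read
    Coordinates are 1-based and inclusive. Returns None when the reads
    overlap or sit directly next to each other, so no gap exists.
    """
    length = reverse_start - forward_end - 1
    if length <= 0:
        return None
    return (forward_end + 1, reverse_start - 1, length)


# Hypothetical coordinates: F read ends at 300, R read starts at 305.
print(uncovered_gap(300, 305))  # (301, 304, 4)
```

A de novo assembler has nothing to join in this situation, because it depends on overlap between the reads themselves, whereas mapping to a reference only needs each read to match the reference.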


To highlight the indel, add an annotation to the consensus sequence: select the 4 bp gap in the consensus sequence and click Add Annotation. Set the annotation type to Polymorphism and the name to Indel. Click OK, and the annotation should now appear on the consensus sequence. Click Save, then extract the dum2 consensus sequence to a new file.

Repeat this process for the other reference assembly containing the dum4 sequences.


Exercise 2d: Analysing consensus sequences