Once we have a set of DNA coding sequences aligned by codons and their corresponding gene phylogeny, we can test for protein-coding adaptation along any specified branch using the branch-sites test for positive selection.
For this assignment, you will write a program that accepts two inputs: 1 a file containing a set of DNA coding sequences, aligned by codon
in FASTA (Links to an external site.) Links to an external
site. format, and
2 a file containing the corresponding gene phylogeny, in
Newick (Links to an external site.) Links to an external site. format, with branch lengths and a branch label “#1” indicating the branch to test for positive selection.
The program will use PAML (Links to an external site.)
Links to an external site.
to run the branch-sites test for positive selection, given the input alignment and tree with specified branch label. The program will print the log-likelihood of the null model (no positive selection) and the alternative model (yes positive selection on the indicated branch) to the screen.
You will be responsible for filling in the main portion of the runBranchSitesTest function, which will need to calculate the log-likelihood of both the null and alternative models.
To complete this assignment:
1 Log on to the course UNIX server and navigate to your local GIT repository.
2 Examine the file testPosSel.py; this is the file that you will need to edit to complete this assignment.
3 Read through the file to make sure you understand what it is doing. 4 Edit the testPosSel.py file by implementing the required
runBranchSitesTest function.
5 Test your code by running the provided test script, examining the
output and comparing it to the provided correct output.
6 Once you are confident that your program is working correctly, commit it in to the central GIT repository for thorough testing.
When your code passes the GIT testing suite, you can be sure it is working correctly. There is nothing else to turn in for this assignment.
I couldn’t think of a clean way to create a ‘breakout’ script for this assignment, so you’ll have to work through the entire assignment directly. But as always, breaking the assignment into a simpler series of problems can help you start making progress and testing as you go. Hint: PAML doesn’t seem to like not having the alignment and tree files in the current working directory. So, the first thing your program might need to do is to copy the user-supplied files into the current working directory. I have provided you with a few “TEMP_…_TEMP” files you can use for this program.
Hint: This assignment has a number of steps, and you really want to be able to focus on doing each step, and then testing it, before moving on to the next step. To facilitate this development procedure, you first need to get your function to ‘run’ without syntax errors, even if the results are wrong. To do this, I would do the following:
1 define and initialize the values of null_lnL and pos_lnL to 0.0 (or some other ‘dummy’ values) at the start of your function.
2 comment out the ‘clean up’ line [os.system(“rm 2NG.* 4fold.nuc TEMP_*_TEMP.* lnf rst rst1 rub”)], so that temporary files are not automatically deleted. Then you can look at them to make sure they are correct as you go.
Now you can run your program, and it will produce output (although with the wrong values). Next, plan a series of small steps that you could implement to get you closer and closer to the correct output. Maybe something like this will get you started:
1 First, calculate the correct null_lnL value:
1 copy the alignment and tree files to the local directory
2 write the correct control file for PAML
3 run PAML using the control file
4 parse the PAML rst file to extract the null_lnL value Implement and test the first sub-step above, and get it working. Then implement and test the second sub-step, etc.
Hint: Use 1 for the value of fix_blength in
the codeml_ctl_string. I’d also recommend using 3 for the initial value of omega when setting up the positive-selection model (note: omega must be fixed at 1 for the null or “neutral” model).