Why the results of codeml method with same data vary a lot?

Recently, one gKaKs user notice the problem: the results of codeml method in ten times with same data vary a lot, I think it may be caused by “the random seed used in the codeml”. In this situation, yn00 or KaKs_Calculator is suggested.

Also, I check some data I used, it seems in most time, the results are stable only a few have slightly changes, and I guess the short alignment and almost the same sequences may affect the results.

More detail of this problem is showing as follow (Thanks to Vid Jelen at University of Ljubljana, Slovenia):


Hi Mr. Zhang,

Thank you very much for your help.

… After that I used the "command.sh" bash script to run the analysis 10 times, so I could see how does the Ka/Ks come up on average. I parsed the dN/dS value from each of the 10 "kaks_rel.txt" files for each gene and put them into a tab delimited file "results.txt". The values seem to vary too much to be useful.


My biggest 2 questions would be if I used the script correctly and why do the results vary so much?

Regards,

Vid


Hi,

Here are the results I got for 1 strain of all 10 kaks_rel.txt files parsed into the all_results.txt file.

By varying too much I mean the Ka/Ks values for certain genes differ a lot in the 10 runs. (look at the 8th line -> evm.TU.chr7_742YN.268_chr7_742YN-840379-841914... in the 10 runs the lowest is 0.4302 and the highest is 99.0) I cannot say for sure whether this gene is under purifying or positive selection.

And there are quite a few genes like this one where there appear 0.001 or 99.0 values in at least one of the 10 runs. (Also why 0.001 and 99.0? Are those the border or even in a way infinite values for codeml?)


Regards,

Vid


HI Vid,

It seems really wierd. I never pay attention on it previously, could you please check the alignment in the DNA2AA_alignment.log, if the alignment for the certain gene is the same, then, it must be the problem caused by codeml, then, you can chose other parameter, like YN00 or KaKs_Calculator; if the alignment is different, could you please send me all the data, i want to figure out the problem.

Thanks for your valuable questions.


Hi,

The alignments in DNA2AA_alignment.log seem to be the same on each of the 10 runs. I guess the problem lies with codeml then.

I will do 10 runs with yn00 to see if that turns out to be more stable.

Regards,

Vid


Command used for 10 times calculation.

#!/usr/bin/bash
for i in {1..10}
do mkdir -p kaks_analysis/$i
perl gKaKs_v1.3.pl -query_seq="CDS.fasta" -gff="changed_annotations.gff3" -hit_seq="strain.fasta" -spe=6 -chrom="all" -is_genome="T" -fastacmd="/usr/bin/fastacmd" -blast2seq="/usr/bin/bl2seq" -blat="blat" -program="codeml"
mv tmp* kaks_analysis/$i
mv temp* kaks_analysis/$i
mv rst* kaks_analysis/$i
mv rub kaks_analysis/$i
mv kaks* kaks_analysis/$i
mv codeml* kaks_analysis/$i
mv DNA* kaks_analysis/$i
mv probl* kaks_analysis/$i
mv new* kaks_analysis/$i
done