DIVER Help


Sequence file

The input sequence file is an alignment of nucleotide sequences. We accept two types of sequence files, NEXUS and PHYLIP (without restrictions on name length). The sequence data set must be in sequential format. Sequence names should consist only of letters, digits and underscores. The program replaces any other character with an underscore "_".
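
The name rule above can be checked before upload. The following is a minimal sketch, not DIVER's own code: the function name sanitize_name is hypothetical, and it assumes a one-for-one replacement of each disallowed character (whether DIVER collapses runs of such characters is not stated here).

```python
import re

def sanitize_name(name):
    """Replace every character other than A-Z, a-z, 0-9 or "_" with "_"."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)

print(sanitize_name("ES9v1e2.1"))       # ES9v1e2_1
print(sanitize_name("patient-7|wk12"))  # patient_7_wk12
```

Renaming sequences yourself keeps the names in the sequence, outgroup and group files identical, which matters because those files are matched by name.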

Example of a NEXUS file in sequential format:

#NEXUS 

BEGIN DATA;
	DIMENSIONS  NTAX=6 NCHAR=80;
	FORMAT DATATYPE=DNA  MISSING=? GAP=- ;
MATRIX
[                      10        20        30        40        50        60        70        80   ]
[                      .         .         .         .         .         .         .         .    ]

ES9v3ed_1     ATGAAAGTGAAGCCGAGCAGGAGGAATTATCAGCACTTGTGGTGGGGCATCATGCTCCTTGGGATGTTAATGATCTGTAA   [80]
ES9v1e2_1     ATGAAAGTGAAGCCCAGCAGGAGGAATTATCAGCACTTGTGGTGGGGCATCATGCTCCTTGGGATGTTAATGATCTGTAA   [80]
ES9v1e4_1     ATGAAAGTGAAGGGGAGCAGGAGGAATTATCAGCAGTTGTGGTGAAGCATCATGCTCCTTGGGATGTTAATGATCTGTAA   [80]
ES9v1e11_1    ATGAATGTGAAGGGGAGCAGGAGGAATTATCAGCAGTTGTGGTGAAGCATCATGCTCCTTGGGATGTTAATGATCTGTAA   [80]
ES9v1e3_1     ATGAAAGTGAAGCGGAGCAGGAGGAATTATCAGCACTTGTGGTGAAGCATCATGCTCCTTGGGATGTTTTTGATCTGTAA   [80]
ES9v1e6_1     ATGAATGTGAAGGGGAGCAGGAGGAATTATCAGCACTTGTGGTGAAGCATCATGCTCCTTGGGATGTTTTTGATCTGTAA   [80]
;
END;

Example of a PHYLIP file in sequential format:

6 80
ES9v3ed_1     ATGAAAGTGAAGCCGAGCAGGAGGAATTATCAGCACTTGTGGTGGGGCATCATGCTCCTTGGGATGTTAATGATCTGTAA
ES9v1e2_1     ATGAAAGTGAAGCCCAGCAGGAGGAATTATCAGCACTTGTGGTGGGGCATCATGCTCCTTGGGATGTTAATGATCTGTAA
ES9v1e4_1     ATGAAAGTGAAGGGGAGCAGGAGGAATTATCAGCAGTTGTGGTGAAGCATCATGCTCCTTGGGATGTTAATGATCTGTAA
ES9v1e11_1    ATGAATGTGAAGGGGAGCAGGAGGAATTATCAGCAGTTGTGGTGAAGCATCATGCTCCTTGGGATGTTAATGATCTGTAA
ES9v1e3_1     ATGAAAGTGAAGCGGAGCAGGAGGAATTATCAGCACTTGTGGTGAAGCATCATGCTCCTTGGGATGTTTTTGATCTGTAA
ES9v1e6_1     ATGAATGTGAAGGGGAGCAGGAGGAATTATCAGCACTTGTGGTGAAGCATCATGCTCCTTGGGATGTTTTTGATCTGTAA
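
A file in this layout can be checked locally before submission. The sketch below is an illustration only: read_sequential_phylip is a hypothetical helper, and it assumes that each sequence sits on a single line, as in the example above, with the name separated from the sequence by whitespace.

```python
def read_sequential_phylip(text):
    """Parse 'ntax nchar' followed by one 'name sequence' line per taxon."""
    lines = [line for line in text.splitlines() if line.strip()]
    ntax, nchar = (int(value) for value in lines[0].split())
    records = {}
    for line in lines[1:]:
        name, sequence = line.split(None, 1)
        # Remove any blanks inside the sequence itself.
        records[name] = "".join(sequence.split())
    if len(records) != ntax:
        raise ValueError(f"header says {ntax} taxa, found {len(records)}")
    for name, sequence in records.items():
        if len(sequence) != nchar:
            raise ValueError(f"{name}: expected {nchar} sites, found {len(sequence)}")
    return records

example = "2 8\nseqA  ACGTACGT\nseqB  ACGTAC-T\n"
print(read_sequential_phylip(example))
```

A mismatch between the header and the data raises an error, which is usually the quickest way to spot a truncated or unaligned sequence.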

Outgroup file
A text file containing a list of outgroup sequence names, one name per line. Each name must match a sequence name in the sequence file.

Group file
A text file containing two tab-delimited columns: (1) the group name (every sequence must be assigned to a group) and (2) the corresponding sequence name, which must match a sequence name in the sequence file. For example, groups could be different sampling time points or tissues/compartments.

Example:

group1	ES9v1e2_1
group1	ES9v1e4_1
group1	ES9v1e11_1
group2	ES9v1e3_1
group2	ES9v1e6_1
If your sequence file contains an MRCA (most recent common ancestor) sequence and you want to calculate the divergence from the MRCA, include the MRCA sequence name in your group file. Type "MRCA" (not case-sensitive) in the group field and the name of the sequence in the sequence name field (second column):
MRCA	name_of_MRCA
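
The group file rules can be verified with a short script. This is a sketch under stated assumptions, not the tool's parser: read_group_file is a hypothetical name, and the set of valid sequence names is assumed to come from your own sequence file.

```python
def read_group_file(text, sequence_names):
    """Return ({group: [names]}, mrca_name) from tab-delimited 'group<TAB>name' lines."""
    known = set(sequence_names)
    groups = {}
    mrca = None
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected two tab-delimited columns")
        group, name = fields[0].strip(), fields[1].strip()
        if name not in known:
            raise ValueError(f"line {line_no}: {name!r} is not in the sequence file")
        if group.upper() == "MRCA":  # the MRCA keyword is not case-sensitive
            mrca = name
        else:
            groups.setdefault(group, []).append(name)
    return groups, mrca

names = ["ES9v3ed_1", "ES9v1e2_1", "ES9v1e4_1", "ES9v1e3_1"]
text = "group1\tES9v1e2_1\ngroup1\tES9v1e4_1\ngroup2\tES9v1e3_1\nmrca\tES9v3ed_1\n"
print(read_group_file(text, names))
```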

Perform bootstrap and Number of bootstrap data sets
You can ask PHYML to generate bootstrapped pseudo data sets from the original data set. PHYML then returns the bootstrap tree with branch lengths and bootstrap values, using standard NEWICK format.
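
For readers unfamiliar with the term, a bootstrapped pseudo data set is usually built by sampling alignment columns with replacement until the original alignment length is reached. The sketch below illustrates that standard idea only. It is not PHYML's implementation, and bootstrap_alignment is a hypothetical name.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample the columns of {name: aligned_sequence} with replacement."""
    n_sites = len(next(iter(alignment.values())))
    # Draw n_sites column indices; the same column may be drawn more than once.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}

alignment = {"seqA": "ACGTACGT", "seqB": "ACGTAC-T", "seqC": "ACCTACGA"}
replicate = bootstrap_alignment(alignment, random.Random(1))
for name, seq in replicate.items():
    print(name, seq)
```

Every sequence is resampled with the same column indices, so each pseudo data set remains a valid alignment of the original taxa.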

Substitution model
The nucleotide substitution model. The default choice is the GTR substitution model (e.g., Lanave et al. 1984, Tavaré 1986, Rodriguez et al. 1990). The other models are JC69 (Jukes and Cantor, 1969), K80 (Kimura, 1980), F81 (Felsenstein, 1981), HKY85 (Hasegawa et al., 1985) and TN93 (Tamura and Nei, 1993). The rate matrices of these models are given in Swofford et al. (1996).

Transition/transversion ratio
With DNA sequences, it is possible to set the transition/transversion ratio, except for the JC69 and F81 models, or to estimate its value by maximizing the likelihood of the phylogeny. The latter makes the program slower. The default value is 4.0. The definition of the transition/transversion ratio is the same as in PAML (Yang, 1994). In PHYLIP, the "transition/transversion rate ratio" is used instead. A value of 4.0 in PHYML roughly corresponds to 2.0 in PHYLIP.
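
The factor of two between the two conventions can be illustrated numerically. The sketch below rests on assumptions that are not stated on this page: that the PHYML value is the rate ratio (kappa) of an HKY85-type model, and that the PHYLIP value is the expected ratio of transition to transversion events. PHYLIP uses a slightly different model (F84), so with unequal base frequencies the mapping is only approximate. The function name is hypothetical.

```python
def expected_ts_tv_ratio(kappa, freqs=(0.25, 0.25, 0.25, 0.25)):
    """Expected transitions/transversions under HKY85; freqs are in the order A, C, G, T."""
    a, c, g, t = freqs
    purines, pyrimidines = a + g, c + t
    # Transitions: A<->G and C<->T. Transversions: any purine<->pyrimidine change.
    return kappa * (a * g + c * t) / (purines * pyrimidines)

print(expected_ts_tv_ratio(4.0))  # 2.0 with equal base frequencies
```

With equal base frequencies the expression reduces to kappa / 2, which matches the correspondence between 4.0 and 2.0 given above.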

Proportion of invariable sites
The default is to consider that the data set does not contain invariable sites (0.0). However, this proportion can be set to any value in the 0.0-1.0 range. This parameter can also be estimated by maximizing the likelihood of the data conditioned on a phylogeny. The latter makes the program slower.

Number of substitution rate categories
The default is to have all sites evolving at the same rate, that is, a single substitution rate category. Alternatively, a discrete-gamma distribution can be used to account for variation in substitution rates among sites, where the number of categories that defines this distribution is supplied by the user. The larger this number, the better the discrete distribution approximates the continuous one. The default number of categories for this distribution is four. In this case the likelihood of the phylogeny at one site is averaged over four conditional likelihoods corresponding to four rates, and the computation of the likelihood is four times slower than with a single rate. Fewer than four or more than eight categories are not recommended. In the first case, the discrete distribution is a poor approximation of the continuous one. In the second case, the computational burden becomes high and a higher number of categories is not likely to enhance the accuracy of phylogeny estimation.

Gamma distribution parameter
This value specifies the degree of variability in evolutionary rates among sites. A gamma distribution is used to model this variation. The distribution is defined by two parameters, α and β. The accepted convention in phylogenetic analyses is to set α = β, so a single shape parameter remains, which you may enter here. The higher its value, the lower the variation of substitution rates among sites (this option is used only with more than one substitution rate category). The default value is 1.0, which corresponds to moderate variation. Values less than 0.7 correspond to high variability. Values between 0.7 and 1.5 correspond to moderate variation. Higher values correspond to lower variability among sites. This value can be specified by the user or it can be estimated by maximizing the likelihood of the data conditioned on a phylogeny.
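
To see how the shape parameter and the number of categories interact, the relative rate of each category can be computed directly. The sketch below is an illustration under stated assumptions, not PHYML's code: it assumes equally probable categories, each represented by the mean rate within the category, with α = β so that the rates average to 1. All function names are hypothetical.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-16 * abs(total):
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def gamma_quantile(p, shape):
    """Quantile of a Gamma(shape, rate=1) variable, found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(shape, hi) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(shape, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def discrete_gamma_rates(alpha, n_cat):
    """Mean relative rate of each of n_cat equally probable categories (alpha = beta)."""
    # Category boundaries, expressed on the scale x = alpha * rate.
    cuts = [0.0] + [gamma_quantile(k / n_cat, alpha) for k in range(1, n_cat)]
    # The integral of rate * density over a category is a difference of P(alpha + 1, x).
    cdf = [reg_lower_gamma(alpha + 1.0, c) for c in cuts] + [1.0]
    return [n_cat * (cdf[k + 1] - cdf[k]) for k in range(n_cat)]

for alpha in (0.5, 1.0, 2.0):
    print(alpha, [round(rate, 4) for rate in discrete_gamma_rates(alpha, 4)])
```

Running it for several shape values shows the pattern described above: small values spread the category rates widely, while large values pull them toward 1.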

Starting tree(s)
The tree(s) used as the starting point to be refined by the maximum likelihood algorithm. The default (and for the time being the only) option is to use a BIONJ distance-based tree.

Optimize starting tree(s) options
There are three ways to optimize the starting tree(s):
1. optimize the topology, the branch lengths and rate parameters (transition/transversion ratio, proportion of invariant sites, gamma distribution parameter).
2. keep the topology and optimize the branch lengths and rate parameters (it is not possible to optimize the tree topology and keep the branch lengths).
3. perform no optimization; the likelihood(s) of the specified starting tree(s) is (are) returned.

Distance file
DIVER accepts two layouts of distance data for calculating divergence and diversity: a matrix, or a column list with one pair of taxa and its distance per line. The data must be tab delimited.

Examples for matrix:
lower-triangular:

	taxa1	taxa2	taxa3	taxa4
taxa1
taxa2	0.0056
taxa3	0.0027	0.0138	
taxa4	0.0078	0.0023	0.0123
upper-triangular:
	taxa1	taxa2	taxa3	taxa4
taxa1		0.0056	0.0027	0.0078
taxa2			0.0138	0.0023
taxa3				0.0123
taxa4			
square:
	taxa1	taxa2	taxa3	taxa4
taxa1	0.0000	0.0056	0.0027	0.0078
taxa2	0.0056	0.0000	0.0138	0.0023
taxa3	0.0027	0.0138	0.0000	0.0123
taxa4	0.0078	0.0023	0.0123	0.0000
Example for column:
taxa1	taxa2	0.0056
taxa1	taxa3	0.0027
taxa1	taxa4	0.0078
taxa2	taxa3	0.0138
taxa2	taxa4	0.0023
taxa3	taxa4	0.0123
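
The layouts above carry the same information, which a short reader makes explicit. This is a hedged sketch, not DIVER's parser: read_distances is a hypothetical name, and it assumes that a matrix file starts with a header row whose first cell is empty, as in the examples.

```python
def read_distances(text):
    """Return {frozenset({taxon_a, taxon_b}): distance} from tab-delimited text."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    pairs = {}
    if rows[0][0].strip() == "":
        # Matrix layout: the header row starts with an empty cell.
        names = [cell.strip() for cell in rows[0][1:]]
        for row in rows[1:]:
            row_name = row[0].strip()
            for col_name, cell in zip(names, row[1:]):
                # Skip empty cells (triangular layouts) and the diagonal.
                if col_name and cell.strip() and col_name != row_name:
                    pairs[frozenset((row_name, col_name))] = float(cell)
    else:
        # Column layout: taxon, taxon, distance.
        for row in rows:
            a, b, d = (field.strip() for field in row[:3])
            pairs[frozenset((a, b))] = float(d)
    return pairs

lower = "\ttaxa1\ttaxa2\ttaxa3\ntaxa1\ntaxa2\t0.0056\ntaxa3\t0.0027\t0.0138\t\n"
column = "taxa1\ttaxa2\t0.0056\ntaxa1\ttaxa3\t0.0027\ntaxa2\ttaxa3\t0.0138\n"
print(read_distances(lower) == read_distances(column))  # True
```

Keying each distance by an unordered pair means lower-triangular, upper-triangular and square matrices all reduce to the same table.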

Divergence and Diversity Results
DIVER allows the user to specify whether divergence and diversity are calculated as tree-based (patristic) distances or genetic distances (not conditioned on a tree topology).
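
This page does not give the formulas DIVER uses, so the sketch below relies on commonly used definitions that are assumptions here: diversity as the mean pairwise distance among the sequences of a group, and divergence as the mean distance from a reference sequence (for example the MRCA) to the sequences of a group. The function names and the choice of taxa1 as the reference are hypothetical. The same functions apply whether the distances are patristic or genetic.

```python
from itertools import combinations

def diversity(members, dist):
    """Mean pairwise distance among the members of one group (assumed definition)."""
    pairs = list(combinations(members, 2))
    return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)

def divergence(members, reference, dist):
    """Mean distance from the reference sequence to each member (assumed definition)."""
    return sum(dist[frozenset((reference, m))] for m in members) / len(members)

# Pairwise distances taken from the distance file example; taxa1 plays the MRCA.
dist = {
    frozenset(("taxa1", "taxa2")): 0.0056,
    frozenset(("taxa1", "taxa3")): 0.0027,
    frozenset(("taxa1", "taxa4")): 0.0078,
    frozenset(("taxa2", "taxa3")): 0.0138,
    frozenset(("taxa2", "taxa4")): 0.0023,
    frozenset(("taxa3", "taxa4")): 0.0123,
}
group = ["taxa2", "taxa3", "taxa4"]
print("diversity:", diversity(group, dist))
print("divergence:", divergence(group, "taxa1", dist))
```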

Tree output files
DIVER outputs tree files in Newick format.

We welcome your comments on this tool. Please send your feedback to the webmaster.