InSites, developed by Wenjie Deng and Jim Mullins at the University of Washington, is a web interface for detecting phylogenetically informative and private sites in a set of aligned nucleotide or amino acid sequences. An informative site is a site where two or more sequences share the same base (or residue) and that base differs from the reference sequence. A private site is a site where only one sequence has a base (or residue) that differs from the reference sequence. InSites can detect two types of informative sites:
Input
Alignment sequence file: A nexus or phylip file in sequential format, or a fasta file. Sequence names must NOT contain any blank spaces. If you include a reference sequence in your alignment, it must be listed first. If no reference sequence is included, the program calculates a consensus sequence and uses it as the reference to detect informative sites.
Example of nexus file:
#NEXUS
BEGIN DATA;
DIMENSIONS NTAX=11 NCHAR=100;
FORMAT DATATYPE=DNA MISSING=? GAP=- ;
MATRIX
[                 10        20        30        40        50        60        70        80        90       100]
[                  .         .         .         .         .         .         .         .         .         .]
Consensus ATGGGTGCGAGAGCGTCAATATTAAGAGGGGGAAAATTAGATACATGGGAGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGAAACACC [100]
Dec03_1   ATGGGTGCGAGAGCGTCAATATTAAGAGGGGGAAAATTATATACATGGGAGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGCAACACC [100]
Dec03_2   ATGGGTGCGAGAGCGTCAATATTAAGAGGGGGAAAATTATATACATGGGAGAAAATTAGGTTAAGGCCAGGGGGAAAGCAAACTTATAGGATGAAACACC [100]
Dec03_3   ATGGGTGCGAGAGCGTCAATATTAAGAGGGGGAAAATTAGATACATGGGAGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGAAACACC [100]
Dec03_4   ATGGGTGGGAGAGCATCAATATTAAGAGGGGGAAAATTAGATACATGGGAGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGAAACACC [100]
Dec03_5   ATGGGTGGGAGAGCATCAATATTAAGAGGGGGAAAATTAGATACATGGGAGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGAAACACC [100]
Mar04_1   ATGGGTGCGAGAGCGTCAATATTAAGAGGGGGAAAATTATATACATGGGAGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGCAACACC [100]
Mar04_2   ATGGGTGGGAGAGCGTCAATATTAAGAGGCTTAAAATTAGATACATGGGAGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGAAACACC [100]
Mar04_7   ATGGGTGCGAGAGCGTCAATATTAAGAGGGGGAAAATTAGATACATGGGAGAAAATTAGGTTAAGGCCAGGGGGAAAGCAAACTTATAGGATGAAACGCC [100]
Mar04_8   ATGGATGCGAGAGCGTCAATATTAAGAGGGGGAAAATTAGATACATGGGGGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGAAACACC [100]
Mar04_9   ATGGGTGCGAGAGCGTCAATATTAAGAGGTTTAAAATTAGATACATGGGAGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGAAACACC [100]
;
END;
Example of phylip file:
11 100
Consensus ATGGGTGCGAGAGCGTCAATATTAAGAGGGGGAAAATTAGATACATGGGAGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGAAACACC
Dec03_1   ATGGGTGCGAGAGCGTCAATATTAAGAGGGGGAAAATTATATACATGGGAGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGCAACACC
Dec03_2   ATGGGTGCGAGAGCGTCAATATTAAGAGGGGGAAAATTATATACATGGGAGAAAATTAGGTTAAGGCCAGGGGGAAAGCAAACTTATAGGATGAAACACC
Dec03_3   ATGGGTGCGAGAGCGTCAATATTAAGAGGGGGAAAATTAGATACATGGGAGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGAAACACC
Dec03_4   ATGGGTGGGAGAGCATCAATATTAAGAGGGGGAAAATTAGATACATGGGAGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGAAACACC
Dec03_5   ATGGGTGGGAGAGCATCAATATTAAGAGGGGGAAAATTAGATACATGGGAGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGAAACACC
Mar04_1   ATGGGTGCGAGAGCGTCAATATTAAGAGGGGGAAAATTATATACATGGGAGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGCAACACC
Mar04_2   ATGGGTGGGAGAGCGTCAATATTAAGAGGCTTAAAATTAGATACATGGGAGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGAAACACC
Mar04_7   ATGGGTGCGAGAGCGTCAATATTAAGAGGGGGAAAATTAGATACATGGGAGAAAATTAGGTTAAGGCCAGGGGGAAAGCAAACTTATAGGATGAAACGCC
Mar04_8   ATGGATGCGAGAGCGTCAATATTAAGAGGGGGAAAATTAGATACATGGGGGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGAAACACC
Mar04_9   ATGGGTGCGAGAGCGTCAATATTAAGAGGTTTAAAATTAGATACATGGGAGAGAATTAGGTTAAGGCCAGGGGGAAAGAAAACTTATAGGATGAAACACC
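When the alignment contains no reference sequence, the input description says a consensus sequence is calculated and used as the reference. The following is a minimal sketch of a column-wise majority consensus, not InSites' own code. The function name `majority_consensus` is hypothetical, and breaking ties alphabetically is an assumption, since the rule InSites uses is not documented here.

```python
from collections import Counter

def majority_consensus(seqs):
    """Return the most common character in each alignment column.

    Ties are broken alphabetically; this rule is an assumption.
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*seqs):
        counts = Counter(column)
        best = max(counts.values())
        # Among the characters tied for the highest count, take the smallest.
        consensus.append(min(c for c, n in counts.items() if n == best))
    return "".join(consensus)

aligned = ["ATGGC", "ATGAC", "ATGGT", "CTGGC"]
print(majority_consensus(aligned))  # ATGGC
```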
Group file (optional): A tab-delimited file with two columns in which the first column defines a group name and the second column lists a sequence name belonging to that group. For example, groups could be sequences derived from different sample time-points or tissues/compartments.
Example of a group file defining two groups based on the sampling date of 10 sequences:
Dec03	Dec03_1
Dec03	Dec03_2
Dec03	Dec03_3
Dec03	Dec03_4
Dec03	Dec03_5
Mar04	Mar04_1
Mar04	Mar04_2
Mar04	Mar04_7
Mar04	Mar04_8
Mar04	Mar04_9
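A two-column, tab-delimited group file like the one above is straightforward to read with standard tools. The sketch below is illustrative only: `read_groups` is a hypothetical helper, and skipping blank lines and rejecting rows without exactly two columns are assumptions about how such a file might be validated.

```python
import csv
import io
from collections import defaultdict

def read_groups(handle):
    """Map each group name to the list of sequence names assigned to it."""
    groups = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 tab-separated columns, got {row!r}")
        group, seq_name = (cell.strip() for cell in row)
        groups[group].append(seq_name)
    return dict(groups)

example = io.StringIO("Dec03\tDec03_1\nDec03\tDec03_2\nMar04\tMar04_1\n")
print(read_groups(example))
```

For the three-line example this returns a dictionary with `Dec03` mapped to two sequence names and `Mar04` mapped to one.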
Output
Aligned informative sites: An output file where all detected informative sites are aligned.
Example of aligned informative sites:
           1334579
          85120394
Consensus CGGGGGAA
Dec03_1   ....T..C
Dec03_2   ....TAC.
Dec03_3   ........
Dec03_4   GA......
Dec03_5   GA......
Mar04_1   ....T..C
Mar04_2   G.TT....
Mar04_7   .....AC.
Mar04_8   ........
Mar04_9   ..TT....
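The definitions of informative and private sites given at the top of this page can be expressed as a short per-column check. This is a sketch under stated assumptions rather than the InSites implementation: `classify_sites` is a hypothetical name, gaps and ambiguity codes are compared like any other character, and a site is allowed to be reported as both informative and private.

```python
from collections import Counter

def classify_sites(reference, seqs):
    """Return (informative, private) lists of 1-based site positions.

    A site is informative if some non-reference character is shared by
    two or more sequences, and private if some non-reference character
    occurs in exactly one sequence. A site can be both (an assumption).
    """
    informative, private = [], []
    columns = zip(reference, zip(*seqs))
    for pos, (ref_char, column) in enumerate(columns, start=1):
        variants = Counter(c for c in column if c != ref_char)
        if any(n >= 2 for n in variants.values()):
            informative.append(pos)
        if any(n == 1 for n in variants.values()):
            private.append(pos)
    return informative, private

reference = "ACGTAC"
seqs = ["ACGTAC", "TCGAAC", "TCGCAC", "ACGTAG"]
print(classify_sites(reference, seqs))  # ([1], [4, 6])
```

In the toy alignment, site 1 is informative because two sequences share T, while site 4 is private because A and C each occur in a single sequence, as at position 30 of the example alignment.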
Tab-delimited informative sites & summary: A tab-delimited output file containing all detected informative sites and a summary of the informative sites.
Example of tab-delimited informative sites & summary:
           Total_informative  Informative(noGaps)  8   15  31  32  40  53  79  94
Consensus  0                  0                    C   G   G   G   G   G   A   A
Dec03_1    2                  2                    .   .   .   .   T   .   .   C
Dec03_2    3                  3                    .   .   .   .   T   A   C   .
Dec03_3    0                  0                    .   .   .   .   .   .   .   .
Dec03_4    2                  2                    G   A   .   .   .   .   .   .
Dec03_5    2                  2                    G   A   .   .   .   .   .   .
Mar04_1    2                  2                    .   .   .   .   T   .   .   C
Mar04_2    3                  3                    G   .   T   T   .   .   .   .
Mar04_7    2                  2                    .   .   .   .   .   A   C   .
Mar04_8    0                  0                    .   .   .   .   .   .   .   .
Mar04_9    2                  2                    .   .   T   T   .   .   .   .
Alignment  8                  8
A                                                  0   2   0   0   0   2   8   8
C                                                  7   0   0   0   0   0   2   2
G                                                  3   8   8   8   7   8   0   0
T                                                  0   0   2   2   3   0   0   0
Total                                              10  10  10  10  10  10  10  10
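The A, C, G, T and Total rows of the summary tally how often each base occurs at each reported site; in the example the totals are 10, so the reference is evidently not counted. A minimal sketch of such a tally follows. `site_summary` is a hypothetical helper, and restricting the count to a fixed nucleotide alphabet is an assumption.

```python
from collections import Counter

def site_summary(seqs, positions, alphabet="ACGT"):
    """Count each character of `alphabet` at the given 1-based positions."""
    table = {}
    for pos in positions:
        counts = Counter(s[pos - 1] for s in seqs)
        table[pos] = {base: counts.get(base, 0) for base in alphabet}
    return table

# Pass only the non-reference sequences, mirroring the example totals.
seqs = ["ACGT", "ACGA", "TCGA", "TCGT"]
for pos, counts in site_summary(seqs, [1, 4]).items():
    print(pos, counts, "Total", sum(counts.values()))
```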
Aligned private sites: An output file where all detected private sites are aligned.
Example of aligned private sites:
           359
          5008
Consensus GGAA
Dec03_1   ....
Dec03_2   ....
Dec03_3   ....
Dec03_4   ....
Dec03_5   ....
Mar04_1   ....
Mar04_2   .C..
Mar04_7   ...G
Mar04_8   A.G.
Mar04_9   .T..
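In the aligned output, site positions are written vertically above each column, one digit per row, so the header above reads 5, 30, 50 and 98 from top to bottom. The sketch below shows one way to produce such a header. `vertical_header` and its `indent` parameter are hypothetical and are not part of InSites.

```python
def vertical_header(positions, indent=0):
    """Render positions as stacked digit rows, most significant digit first."""
    width = max(len(str(p)) for p in positions)
    # Right-justify so that the ones digits share the bottom row.
    labels = [str(p).rjust(width) for p in positions]
    return [" " * indent + "".join(label[row] for label in labels)
            for row in range(width)]

for line in vertical_header([5, 30, 50, 98], indent=10):
    print(line)
```

The `indent` value would be set to the width of the sequence-name column so that the digits line up with the site characters.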
Tab-delimited private sites & summary: A tab-delimited output file containing all detected private sites and a summary of the private sites.
Example of tab-delimited private sites & summary:
           Total_private  Private(noGaps)  5   30  50  98
Consensus  0              0                G   G   A   A
Dec03_1    0              0                .   .   .   .
Dec03_2    0              0                .   .   .   .
Dec03_3    0              0                .   .   .   .
Dec03_4    0              0                .   .   .   .
Dec03_5    0              0                .   .   .   .
Mar04_1    0              0                .   .   .   .
Mar04_2    1              1                .   C   .   .
Mar04_7    1              1                .   .   .   G
Mar04_8    2              2                A   .   G   .
Mar04_9    1              1                .   T   .   .
Alignment  4              4
A                                          1   0   9   9
C                                          0   1   0   0
G                                          9   8   1   1
T                                          0   1   0   0
Total                                      10  10  10  10