This app is for analyzing and visualizing DNA methylation data.
Get started by loading a multiple-sequence alignment in FASTA format. The first sequence in the alignment should be an (unconverted) reference sequence. The remaining sequences must be bisulfite-converted.
Sequence names must be unique within the alignment.
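If you want to check for duplicate names before loading a file, a minimal sketch might look like the following. It assumes the whole header line after ">" (tags included) counts as the sequence name; the function name `duplicate_names` is hypothetical and not part of this app.

```python
from collections import Counter


def duplicate_names(fasta_text):
    """Return the header names that occur more than once in FASTA text.

    Assumption: the name is everything after '>' on a header line.
    """
    names = [
        line[1:].strip()
        for line in fasta_text.splitlines()
        if line.startswith(">")
    ]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

An empty list means every sequence name in the alignment is unique.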
At CpG sites, your converted sequences may use the standard IUPAC symbol Y to represent a position that’s observed as both C and T in your sample population.
While such sites are plotted and counted towards the total number of CpG sites, they are not considered methylated when calculating the methylation level of each sequence, because there is no simple way to specify the ratio of C to T.
Y may also be used at CpH (non-CpG cytosine) sites in your converted sequences, and in this case such sites are considered conversion failures when calculating the conversion rate. The asymmetry of YpG vs. YpH handling is the result of differences in what it means to be conservative when judging the methylation level vs. the quality of the conversion.
All other ambiguous nucleotides are ignored.
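The rules above can be illustrated with a short sketch. This is not the app's actual code. It assumes the methylation level is methylated CpG sites divided by counted CpG sites, that the conversion rate is the fraction of counted CpH sites that did not fail, that "ignored" means a site is left out of both counts, and that a cytosine's context is read from the next alignment column of the reference. The function name is hypothetical.

```python
from typing import Optional, Tuple


def methylation_and_conversion(
    reference: str, converted: str
) -> Tuple[Optional[float], Optional[float]]:
    """Return (methylation level, conversion rate) for one aligned sequence."""
    ref, seq = reference.upper(), converted.upper()
    if len(ref) != len(seq):
        raise ValueError("aligned sequences must have the same length")

    cpg_total = cpg_methylated = cph_total = cph_failed = 0
    for i, base in enumerate(ref):
        # Only reference cytosines with a following column are considered.
        if base != "C" or i + 1 >= len(ref):
            continue
        call = seq[i]
        if call not in {"C", "T", "Y"}:
            continue  # gaps and all other ambiguous symbols are ignored
        next_base = ref[i + 1]
        if next_base == "G":
            # CpG: Y counts towards the total but is not called methylated.
            cpg_total += 1
            if call == "C":
                cpg_methylated += 1
        elif next_base in {"A", "C", "T"}:
            # CpH: both C and Y are treated as conversion failures.
            cph_total += 1
            if call in {"C", "Y"}:
                cph_failed += 1

    methylation = (cpg_methylated / cpg_total) if cpg_total else None
    conversion = (1 - cph_failed / cph_total) if cph_total else None
    return methylation, conversion
```

For example, with the reference `CGCGCACT` and the converted sequence `CGYGTAYT`, one of two CpG sites is called methylated and one of two CpH sites is a conversion failure, so both values come out as 0.5.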
The heatmap can optionally group your sequences together by a given field, such as sample, tissue, culture, patient, or any other property you provide.
In order to group sequences, each sequence name in your FASTA file must be tagged using a special syntax which assigns the sequence a value for a field name. That syntax looks like this:
>sequence_name [field=value] [subject=B] [identity=Cluster 1]
The first tag must be separated from the rest of the sequence name by a space. You may use any field name and value you want (even spaces are OK), as long as each tag is surrounded by square brackets ([ ]) and the field name and value are separated by an equals sign (=). The field names will automatically show up in the grouping dropdown within the Heatmap options panel.
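To make the tag syntax concrete, here is a minimal sketch of how a tagged header line could be split into a name and its fields. It is an illustration only and may differ from how the app parses names. The pattern and the function name `parse_header` are assumptions, and the sketch treats everything before the first tag as the untagged part of the name.

```python
import re
from typing import Dict, Tuple

# One tag: "[", a field name without brackets or "=", "=", a value, "]".
TAG_PATTERN = re.compile(r"\[([^\[\]=]+)=([^\[\]]*)\]")


def parse_header(header: str) -> Tuple[str, Dict[str, str]]:
    """Split a FASTA header into (untagged name, {field: value})."""
    text = header.lstrip(">").strip()
    first_tag = TAG_PATTERN.search(text)
    if first_tag is None:
        return text, {}
    name = text[: first_tag.start()].strip()
    tags = {
        field.strip(): value.strip()
        for field, value in TAG_PATTERN.findall(text)
    }
    return name, tags
```

Applied to the example header shown earlier, this yields the name `sequence_name` and the fields `field`, `subject`, and `identity`, with the value `Cluster 1` keeping its space.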
Take a look at the plain text of the example FASTA to see a full alignment where each sequence has two tagged fields, “subject” and “identity”.