Methylation Station: All aboard for the CpG islands! 🏝

This app is for analyzing and visualizing DNA methylation data.

Get started by loading a multiple-sequence alignment in FASTA format. The first sequence in the alignment should be an (unconverted) reference sequence. The remaining sequences must be bisulfite-converted.

Sequence names must be unique within the alignment.
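The loading rules above can be illustrated with a minimal FASTA reader. This is a sketch, not the app's own parser. It assumes the full header line (tags included) is the name that must be unique, and that every aligned sequence has the same length. The name `parse_fasta` is hypothetical.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (name, sequence) pairs, keeping file order.

    Record 0 is treated as the unconverted reference; the rest are assumed
    to be bisulfite-converted. Raises ValueError on duplicate names or
    (assumed rule) sequences of unequal aligned length.
    """
    records: List[Tuple[str, str]] = []
    name = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks).upper()))
            name = line[1:].strip()
            chunks = []
        else:
            if name is None:
                raise ValueError("sequence data before the first '>' header")
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks).upper()))

    # Names must be unique within the alignment.
    names = [n for n, _ in records]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate sequence names: {duplicates}")
    # Assumption: an alignment means every row has the same length.
    if len({len(s) for _, s in records}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return records
```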

All sequences should read 5′ → 3′ left to right, so that CpG sites appear as CG in your file. Sequence contexts other than CpG, such as the CHG and CHH contexts common in plants, are not currently considered.
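Because sequences read 5′ → 3′, reference CpG sites can be located by looking for a C followed by a G. The sketch below returns 0-based alignment columns. It assumes, hypothetically, that alignment gaps (`-`) between the C and the G are skipped, so `C-G` in the aligned reference still counts; the app may treat gaps differently.

```python
from typing import List


def reference_cpg_sites(reference: str) -> List[int]:
    """Return 0-based alignment columns holding the C of a reference CpG.

    Assumption: gap characters ('-') after the C are skipped when looking
    for the following G.
    """
    ref = reference.upper()
    sites: List[int] = []
    for i, base in enumerate(ref):
        if base != "C":
            continue
        j = i + 1
        while j < len(ref) and ref[j] == "-":  # skip alignment gaps
            j += 1
        if j < len(ref) and ref[j] == "G":
            sites.append(i)
    return sites


print(reference_cpg_sites("ACGTC-GACC"))  # → [1, 4]
```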

Try it out for yourself by loading an example alignment (or download the FASTA).

Ambiguous sites

At CpG sites, your converted sequences may use the standard IUPAC symbol Y to represent a position that’s observed as both C and T in your sample population.

While such sites will be plotted and counted towards the total number of CpG sites, they are not considered methylated when calculating the methylation level of each sequence, because there is no simple way to specify the relative ratio of C to T.
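One way to read the rule above: at each reference CpG site, C counts as methylated, T as unmethylated, and Y counts toward the total but never as methylated. This is a sketch under that interpretation; `methylation_level` is a hypothetical name, and skipping gaps and other ambiguity codes follows the "ignored" rule below.

```python
from typing import Optional, Sequence


def methylation_level(converted: str, cpg_sites: Sequence[int]) -> Optional[float]:
    """Fraction of reference CpG sites read as methylated in one sequence.

    Assumed counting rule:
      C -> methylated (numerator and denominator)
      T -> unmethylated (denominator only)
      Y -> mixed: counted toward the total, never as methylated
      other symbols (gaps, N, other IUPAC codes) -> ignored
    Returns None when no site is informative.
    """
    methylated = 0
    total = 0
    for pos in cpg_sites:
        base = converted[pos].upper()
        if base == "C":
            methylated += 1
            total += 1
        elif base in ("T", "Y"):
            total += 1
    return methylated / total if total else None


print(methylation_level("ACGTYG", [1, 4]))  # → 0.5 (one C, one Y)
```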

Y may also be used at CpH (non-CpG cytosine) sites in your converted sequences; in this case, such sites are counted as conversion failures when calculating the conversion rate. The asymmetry between YpG and YpH handling reflects what it means to be conservative in each case: treating YpG as unmethylated avoids overstating the methylation level, while treating YpH as a failure avoids overstating the quality of the conversion.
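The conversion-rate rule can be sketched the same way. Here a CpH site is a reference C whose next base is A, C, or T. T in the converted sequence counts as converted, and both C and Y count as failures. The function names, the gap skipping, and the exclusion of a terminal C with no following base are all assumptions for illustration.

```python
from typing import List, Optional


def cph_sites(reference: str) -> List[int]:
    """0-based columns where the reference C is followed by A, C, or T (CpH).

    Assumptions: gaps ('-') are skipped to find the next base, and a C at
    the very end of the reference is excluded because its context is unknown.
    """
    ref = reference.upper()
    sites: List[int] = []
    for i, base in enumerate(ref):
        if base != "C":
            continue
        j = i + 1
        while j < len(ref) and ref[j] == "-":
            j += 1
        if j < len(ref) and ref[j] in "ACT":
            sites.append(i)
    return sites


def conversion_rate(reference: str, converted: str) -> Optional[float]:
    """Fraction of reference CpH cytosines read as T (successfully converted).

    C and Y both count as conversion failures; other symbols are ignored.
    Returns None when no CpH site is informative.
    """
    converted_count = 0
    total = 0
    for pos in cph_sites(reference):
        base = converted[pos].upper()
        if base == "T":
            converted_count += 1
            total += 1
        elif base in ("C", "Y"):
            total += 1
    return converted_count / total if total else None


print(conversion_rate("ACAGCTCG", "ATAGYTCG"))  # → 0.5 (one T, one Y)
```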

All other ambiguous nucleotides are ignored.

Grouping sequences

The heatmap can optionally group your sequences together by a given field, such as sample, tissue, culture, patient, or any other property you provide.

In order to group sequences, each sequence name in your FASTA file must be tagged using a special syntax which assigns the sequence a value for a field name. That syntax looks like this:

>sequence_name [field=value] [subject=B] [identity=Cluster 1]

The first tag must be separated from the rest of the sequence name by a space. You may use any field name and value you want (even spaces are OK), as long as each tag is surrounded by square brackets ([]) and its field name and value are separated by an equals sign (=). The field names will automatically show up in the grouping dropdown within the Heatmap options panel.
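A header in this syntax can be split with a regular expression. The sketch below assumes the name ends at the first " [" and that field names contain no "=" or brackets; the exact grammar the app accepts may differ, and `parse_header` and `TAG` are hypothetical names.

```python
import re
from typing import Dict, Tuple

# Assumed tag grammar: "[field=value]". The field may not contain "=" or
# brackets; the value may contain spaces (and "=") but no brackets.
TAG = re.compile(r"\[([^\[\]=]+)=([^\[\]]*)\]")


def parse_header(header: str) -> Tuple[str, Dict[str, str]]:
    """Split a FASTA header into (sequence name, {field: value})."""
    header = header.lstrip(">").strip()
    start = header.find(" [")  # the first tag must follow a space
    if start == -1:
        return header, {}
    name, tags = header[:start].strip(), header[start:]
    fields = {m.group(1).strip(): m.group(2).strip() for m in TAG.finditer(tags)}
    return name, fields


name, fields = parse_header(">sequence_name [field=value] [subject=B] [identity=Cluster 1]")
print(name)    # sequence_name
print(fields)  # {'field': 'value', 'subject': 'B', 'identity': 'Cluster 1'}
```

Grouping by a field then amounts to collecting sequences by `fields.get("subject")` or whichever field is chosen.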

Take a look at the plain text of the example FASTA to see a full alignment where each sequence has two tagged fields, “subject” and “identity”.

Other resources

Other tools for methylation analysis and visualization include BISMA and BiQ Analyzer’s diagram tool.



Black circles are methylated CpG sites. White circles are unmethylated CpG sites. Half-filled black circles are mixed methylation sites (Y in the alignment). Non-reference (novel) CpG sites are not shown. Small red circles are bisulfite-conversion failures. Half-filled red circles are partially converted sites. Site locations refer to alignment base positions.
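The legend can be read as a lookup from (site type, observed base) to a symbol. This sketch assumes the caller passes only columns where the reference has a C, that a partially converted CpH site is written as Y, and that a successfully converted CpH site (T) draws nothing. `Glyph` and `glyph_for` are hypothetical names, not the app's code.

```python
from enum import Enum
from typing import Optional


class Glyph(Enum):
    METHYLATED = "black circle"
    UNMETHYLATED = "white circle"
    MIXED = "half-filled black circle"
    CONVERSION_FAILURE = "small red circle"
    PARTIAL_CONVERSION = "half-filled red circle"


def glyph_for(site_is_cpg: bool, base: str) -> Optional[Glyph]:
    """Map a converted base at a reference C column to a legend symbol.

    site_is_cpg: True for a reference CpG column, False for a CpH column.
    Returns None when nothing is drawn (e.g. a converted CpH, or N).
    """
    base = base.upper()
    if site_is_cpg:
        return {"C": Glyph.METHYLATED, "T": Glyph.UNMETHYLATED, "Y": Glyph.MIXED}.get(base)
    return {"C": Glyph.CONVERSION_FAILURE, "Y": Glyph.PARTIAL_CONVERSION}.get(base)
```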

Resize your browser window to make the image wider. Increase the font size in your browser to make the whole image larger.

Diagram options

Summary heatmap

Reference CpG sites are shown across the top. Groups of converted sequences are shown along the left side. Percentages and colors indicate the methylation level of a particular site within a group of sequences. The final column is the mean per-site methylation level for the given group.
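The heatmap values can be approximated by grouping sequences, computing a percentage at each reference CpG site, and averaging those percentages per group. This sketch reuses the counting rule assumed earlier (C methylated, T unmethylated, Y counted but not methylated, other symbols skipped) and averages only over sites with at least one informative read; `summary_heatmap` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple


def summary_heatmap(
    reads: Sequence[Tuple[str, str]],  # (group value, converted sequence)
    cpg_sites: Sequence[int],
) -> Dict[str, Tuple[List[Optional[float]], Optional[float]]]:
    """Per group: methylation % at each reference CpG site, plus their mean."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for group, seq in reads:
        groups[group].append(seq.upper())

    result: Dict[str, Tuple[List[Optional[float]], Optional[float]]] = {}
    for group, seqs in groups.items():
        row: List[Optional[float]] = []
        for pos in cpg_sites:
            # Informative bases only; Y stays in the denominator.
            bases = [s[pos] for s in seqs if s[pos] in ("C", "T", "Y")]
            row.append(100 * bases.count("C") / len(bases) if bases else None)
        informative = [v for v in row if v is not None]
        # Final column: mean of the per-site levels for this group.
        result[group] = (row, mean(informative) if informative else None)
    return result


reads = [("A", "ACGTCG"), ("A", "ATGTTG"), ("B", "ACGTYG")]
print(summary_heatmap(reads, [1, 4]))
# → {'A': ([50.0, 50.0], 50.0), 'B': ([100.0, 0.0], 50.0)}
```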

Heatmap options