Methylation Station All aboard for the CpG islands! 🏝

Open an alignment

Export diagram as SVG

Export summary spreadsheet

Export analyzed data

This app is for analyzing and visualizing DNA methylation data.

Get started by loading a multiple-sequence alignment in FASTA format. The first sequence in the alignment should be an (unconverted) reference sequence. The remaining sequences must be bisulfite-converted.

Sequence names must be unique within the alignment.

All sequences should read 5′ → 3′ left to right, so that CpG sites appear as CG in your file. Sequence contexts other than CpG, such as the CHG and CHH contexts found in plants, are not currently considered.
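As an illustration of what "CpG sites appear as CG" means in practice, here is a minimal sketch that locates CG dinucleotides in a reference sequence. The function name `find_cpg_sites` is hypothetical, and the sketch assumes the C and G are adjacent in the aligned reference (no gap characters between them); it is not the app's own code.

```python
def find_cpg_sites(reference: str) -> list[int]:
    """Return the 0-based columns of the C in every CG dinucleotide.

    Assumes the reference reads 5' to 3' and that the C and G of a site
    sit in adjacent alignment columns.
    """
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i] == "C" and ref[i + 1] == "G"]


# Three CG dinucleotides, at columns 2, 6, and 10.
sites = find_cpg_sites("ATCGGACGTTCG")
```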

Try it out for yourself by loading an example alignment (or download the FASTA).

Ambiguous sites

At CpG sites, your converted sequences may use the standard IUPAC symbol Y to represent a position that’s observed as both C and T in your sample population.

While such sites will be plotted and counted towards the total number of CpG sites, they are not considered methylated when calculating the methylation level of each sequence, because there is no simple way to specify the relative ratio of C to T.

Y may also be used at CpH (non-CpG cytosine) sites in your converted sequences, and in this case such sites are considered conversion failures when calculating the conversion rate. The asymmetry between YpG and YpH handling is deliberate: being conservative means not overstating methylation when judging the methylation level, and not overstating success when judging the quality of the conversion.

All other ambiguous nucleotides are ignored.
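The rules in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not the app's implementation: the function name `score_sequence` and the exact denominators (all scored CpG sites for the methylation level, all scored CpH sites for the conversion rate) are assumptions, and gaps in the reference are not handled.

```python
def score_sequence(reference: str, converted: str) -> dict:
    """Count site calls for one converted sequence against the reference."""
    ref, seq = reference.upper(), converted.upper()
    if len(ref) != len(seq):
        raise ValueError("aligned sequences must have the same length")

    counts = {"methylated": 0, "unmethylated": 0, "mixed": 0,
              "converted": 0, "failed": 0}
    # The final column is skipped because its downstream context is unknown.
    for i in range(len(ref) - 1):
        if ref[i] != "C":
            continue
        observed = seq[i]
        if ref[i + 1] == "G":            # CpG site
            if observed == "C":
                counts["methylated"] += 1
            elif observed == "T":
                counts["unmethylated"] += 1
            elif observed == "Y":        # counted as a site, never as methylated
                counts["mixed"] += 1
        elif observed == "T":            # CpH site, converted as expected
            counts["converted"] += 1
        elif observed in ("C", "Y"):     # CpH site, Y counts as a failure
            counts["failed"] += 1
        # Any other symbol (N, gaps, other ambiguity codes) is ignored.

    cpg_total = counts["methylated"] + counts["unmethylated"] + counts["mixed"]
    cph_total = counts["converted"] + counts["failed"]
    counts["methylation_level"] = counts["methylated"] / cpg_total if cpg_total else None
    counts["conversion_rate"] = counts["converted"] / cph_total if cph_total else None
    return counts


# One methylated, one unmethylated, and one mixed CpG site;
# one converted and one failed (Y) CpH site.
result = score_sequence("ACGTCGACGCATCA", "ACGTTGAYGTATYA")
```

In this example the mixed site lowers the methylation level to one in three, while the Y at a CpH site lowers the conversion rate to one in two.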

Grouping sequences

The heatmap can optionally group your sequences together by a given field, such as sample, tissue, culture, patient, or any other property you provide.

In order to group sequences, each sequence name in your FASTA file must be tagged using a special syntax which assigns the sequence a value for a field name. That syntax looks like this:

>sequence_name [field=value] [subject=B] [identity=Cluster 1]

The first tag must be separated from the rest of the sequence name by a space. You may use any field name and value you want (even spaces are OK) as long as each tag is surrounded by square brackets ([]) and its field name and value are split by an equals sign (=). The field names will automatically show up in the grouping dropdown within the Heatmap options panel.

Take a look at the plain text of the example FASTA to see a full alignment where each sequence has two tagged fields, “subject” and “identity”.
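If you generate or check tagged sequence names in a script, the tag syntax can be parsed with a short regular expression. The sketch below is one possible reading of the rules above; the function name `parse_header` is hypothetical, and the app's own parser may treat edge cases (such as nested brackets) differently.

```python
import re

# A tag is "[field=value]"; neither part may contain square brackets.
TAG = re.compile(r"\[([^\[\]=]+)=([^\[\]]*)\]")
# The first tag must be preceded by whitespace.
FIRST_TAG = re.compile(r"\s\[[^\[\]=]+=")


def parse_header(header: str) -> tuple[str, dict[str, str]]:
    """Split a FASTA header line into (sequence name, {field: value})."""
    text = header.lstrip(">").strip()
    first = FIRST_TAG.search(text)
    if first is None:
        return text, {}
    name = text[:first.start()]
    tags = {field.strip(): value.strip()
            for field, value in TAG.findall(text[first.start():])}
    return name, tags


name, tags = parse_header(">sequence_name [field=value] [subject=B] [identity=Cluster 1]")
# name is "sequence_name"; tags maps "subject" to "B" and "identity" to "Cluster 1".
```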

Other resources

Other tools for methylation analysis and visualization include BISMA and BiQ Analyzer’s diagram tool.

😟 {{ app.error }}

🤔 Hmmm, there don’t appear to be any CpG sites in this alignment! Did you open the right file?

{{ app.fasta.name }} {{ app.alignment.sequences[0].seq.length | number }}bp alignment of {{ app.alignment.sequences.length | number }} sequences

Black circles are methylated CpG sites. White circles are unmethylated CpG sites. Half-filled black circles are mixed methylation sites (Y in the alignment). Non-reference (novel) CpG sites are not shown. Small red circles are bisulfite-conversion failures (currently hidden). Half-filled red circles are partially converted sites.

Methylation level for each reference CpG site is shown using symbols that represent an evenly quantized scale from always methylated to never methylated. Non-reference (novel) CpG sites, mixed methylation (YpG) sites, and failed conversions are not shown.

Site locations refer to alignment base positions {{ diagram.signals.siteLabelOffset > 0 ? 'plus' : 'minus' }} {{ diagram.signals.siteLabelOffset | abs | number: 0 }}. Sites are numbered in order from left to right, starting at {{ 1 + diagram.signals.siteLabelOffset }}. Percentages are the mean per-site methylation level. Mixed methylation sites are considered unmethylated for this calculation.
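The two labeling modes described here, alignment position or left-to-right order, each shifted by an offset, can be illustrated with a small sketch. The function name `site_labels`, the mode names, and the use of 1-based alignment positions are assumptions made for the example.

```python
def site_labels(columns: list[int], offset: int = 0, by: str = "position") -> list[int]:
    """Label CpG sites given their 0-based alignment columns.

    by="position": 1-based alignment base position plus the offset.
    by="order":    left-to-right ordinal, starting at 1 + offset.
    """
    if by == "position":
        return [column + 1 + offset for column in columns]
    if by == "order":
        return [n + offset for n in range(1, len(columns) + 1)]
    raise ValueError(f"unknown labeling mode: {by!r}")


# Sites at columns 2, 6, and 10, labeled relative to a position 100 bases downstream.
by_position = site_labels([2, 6, 10], offset=-100)
by_order = site_labels([2, 6, 10], offset=-100, by="order")
```

A negative offset like this is useful when you want labels relative to a landmark such as a transcription start site rather than the start of the alignment.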

Resize your browser window to make the image wider. Increase the font size in your browser to make the whole image larger.

Diagram options

Hide failed conversions

Hide non-reference CpG sites

Hide mixed methylation (YpG) sites

Hide sequence labels

Sort by percent methylated

Group sequences

Summarize groups

Label CpG sites with

offset by

Summary heatmap

Reference CpG sites are shown across the top. Groups of converted sequences are shown along the left side. Percentages and colors indicate the methylation level of a particular site within a group of sequences. The final column is the mean per-site methylation level for the given group.
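To make the heatmap values concrete, here is a rough sketch of how per-group, per-site methylation levels and the final mean column could be computed. The function name `summarize_groups` and the data layout (one list of calls per sequence, with `True` for methylated, `False` for unmethylated or mixed, and `None` for unscored) are assumptions for illustration, not the app's internals.

```python
from collections import defaultdict


def summarize_groups(calls: dict, groups: dict) -> dict:
    """Return {group: (per-site levels, mean of those levels)}.

    calls:  {sequence name: [True | False | None for each reference CpG site]}
    groups: {sequence name: group value, e.g. from a [subject=...] tag}
    """
    rows_by_group = defaultdict(list)
    for seq_name, row in calls.items():
        rows_by_group[groups[seq_name]].append(row)

    summary = {}
    for group, rows in rows_by_group.items():
        levels = []
        for column in zip(*rows):  # one column per reference CpG site
            observed = [call for call in column if call is not None]
            levels.append(sum(observed) / len(observed) if observed else None)
        known = [level for level in levels if level is not None]
        mean = sum(known) / len(known) if known else None
        summary[group] = (levels, mean)
    return summary


calls = {
    "a1": [True, False, True],
    "a2": [True, False, False],
    "b1": [False, None, True],
}
groups = {"a1": "A", "a2": "A", "b1": "B"}
summary = summarize_groups(calls, groups)
```

Here group A has per-site levels of 100%, 0%, and 50%, for a mean of 50%. Group B has no call at the second site, so its mean is taken over the two sites that were scored.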

Heatmap options

Group sequences

Order by