Mullins Lab Computational Biology Software


Our lab uses molecular, computational, and virus biology techniques to provide insights into the relationship between HIV and its human hosts in an effort to fight the AIDS pandemic.

Below is a catalog of software we’ve produced during this work to help us and others study retroviruses. Applications marked “requires login” are currently only available to lab members and our collaborators. Please email us if you’d like to learn more about them.


We provide technical and scientific support for those using our software, so please email us if you have questions or run into trouble. Our participation in a Center for AIDS Research (CFAR) provides funding for these services, so please acknowledge the CFAR in your work if you use our software.



Study divergence, diversity, informative sites, and other phylogenetic features of nucleotide and amino acid sequences using automatically constructed maximum likelihood trees

Methylation Station

Analyze and visualize DNA methylation data in the blink of an eye


An open-source software toolkit for building databases of proviral integration sites


A platform for the collection, storage, retrieval, and analysis of experimental data for microbiology workflows and the principal data store for HIV and SIV sequencing experiments conducted in the Mullins Lab. It currently houses over 50,000 viral nucleotide sequences, together with comprehensive metadata about their creation, including PCR protocols, gel images, de-identified subject clinical data, and more.


A database application supporting large-scale tissue culture of primary cells in a microplate format and multi-locus qPCR analysis of HIV genomes. It provides start-to-finish workflow management, reduces or eliminates many possible sources of data loss, enables review of work in progress, and serves as a data repository for downstream analysis.



Query, download, and visualize HIV-1 integration sites from the proviral reservoir, built with ISDB


Search and display information on viral primers used in our lab


Retrieve and analyze HIV immunology data, such as ELISpot and HLA assays

Web Tools


Score V3 loop co-receptor motifs


BLAST multiple viral sequence databases, including public and local data, from the web


Quantitate mixed base composition in Sanger sequencing chromatograms

Integration Sites

Determine the location of HIV integration sites in the human genome


Quantify the amplifiable viral templates under the conditions of your limiting dilution assay using a variant of the Minimum χ² (MC) method that allows specification of the probabilities of a false negative and a false positive PCR

Viral Growth Rate Calculation

Compare and evaluate competitive viral fitness


Track the scientific literature by keyword frequency in a wide array of journals


Test for the detection of latent reservoirs


Test for the detection of continued viral replication under antiviral therapies


Extract branch lengths from a Newick tree and calculate divergence and diversity


Identify and extract unique sequences from a FASTA file

Sequence Name Reformatter

Reformat sequence names in a FASTA file

PAUP* Diversity Matrix Formatter

Reformat diagonal distance matrices into columnar output

Explain Phred

Explain Phred+33 quality scores
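
For readers unfamiliar with the encoding, the following is a minimal Python sketch of how Phred+33 quality strings are conventionally decoded. It is an illustration only and is not the code behind the Explain Phred tool; the function names `phred33_to_scores` and `error_probability` are hypothetical. Each quality character's ASCII code minus 33 gives the Phred score Q, and the estimated base-call error probability is 10^(-Q/10).

```python
def phred33_to_scores(quality: str) -> list[int]:
    """Decode a Phred+33 quality string into integer Phred scores.

    Hypothetical helper for illustration: each character's ASCII code
    minus 33 is the Phred score Q.
    """
    return [ord(ch) - 33 for ch in quality]


def error_probability(q: int) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # Print each quality character with its score and error probability.
    for ch, q in zip("I!5", phred33_to_scores("I!5")):
        print(ch, q, error_probability(q))
```

Under this encoding, `!` is the lowest possible score (Q0) and `I` corresponds to Q40.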


An Excel spreadsheet that jitters points with identical (x, y) values

Downloadable Scripts


Pipeline to analyze 454 Pyrosequencing and Ion Torrent sequencing data, including read quality filtering and alignment, indel and carryforward error correction, single nucleotide variant calling, and calculation of nucleotide variant and haplotype frequencies


Identify and correct 454 Pyrosequencing errors using quality scores

Automatically annotate sequences for bulk submission to GenBank

Relabel nodes in a Newick tree

Reformat a columnar distance matrix into the input format of Peter Gilbert et al.’s diverstest

Calculate amino acid frequency for each position in an alignment

Analyze sequence divergence and diversity

Reformat output from NetNGlyc

Reformat output from NetOGlyc

Report external branch lengths of a Newick tree

Calculate nucleotide frequencies at each position of a reference sequence from a blastn XML output file

Calculate error rates of different types (insertion, deletion, and substitution) from a blastn XML output file


Quickly align two sequences for ad-hoc comparison by a variety of methods (BLAST, needle, muscle, etc.)

Remove identical and overlapping sequences from an alignment

Sequence manipulator

Convert between alignment file formats; clean FASTA sequences

Reformat output from LANL’s sequence locator


Generate a simple consensus from a SAM/BAM file for each read group


HIV Sequence Locator

A JSON API around LANL’s sequence locator tool, providing easy programmatic access to positioning, region, and protein data

Master Blaster

A basic API for quick, remote NCBI BLAST+ queries which returns results as XML

Toolkits and Libraries

RecordStream::Bio and recs-fastq

Two collections of record-handling tools related to biology, for use with the excellent data-slicing tools in RecordStream


A Perl library providing programmatic access to LANL’s sequence locator tool, used to provide our HIV Sequence Locator JSON web API


A Perl library for parsing CIGAR strings (“Compact Idiosyncratic Gapped Alignment Report”), such as those used in the SAM file format, and translating coordinates to/from the reference/query.
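
To illustrate the kind of coordinate translation described above, here is a minimal sketch in Python. It is not the Perl library's API: the names `parse_cigar` and `ref_to_query` are hypothetical, and the sketch assumes 0-based offsets measured from the start of the alignment. It relies only on the SAM convention that M, I, S, = and X consume query bases while M, D, N, = and X consume reference bases.

```python
import re

# One CIGAR operation: a run length followed by an operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

QUERY_OPS = set("MIS=X")  # operations that consume query (read) bases
REF_OPS = set("MDN=X")    # operations that consume reference bases


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, op) pairs, rejecting malformed input."""
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def ref_to_query(cigar: str, ref_offset: int):
    """Map a 0-based reference offset (from the alignment start) to a
    0-based query position, or None if that reference base is deleted
    or skipped in the query."""
    ref_pos = query_pos = 0
    for length, op in parse_cigar(cigar):
        in_ref, in_query = op in REF_OPS, op in QUERY_OPS
        if in_ref and ref_pos <= ref_offset < ref_pos + length:
            return query_pos + (ref_offset - ref_pos) if in_query else None
        if in_ref:
            ref_pos += length
        if in_query:
            query_pos += length
    raise IndexError("reference offset lies outside the alignment")


if __name__ == "__main__":
    # Two soft-clipped bases, three matches, one deletion, two matches.
    print(parse_cigar("2S3M1D2M"))
    print(ref_to_query("2S3M1D2M", 4))
```

Returning `None` for deleted reference positions is one possible design choice; a caller might instead want the nearest flanking query position.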

Source code

When available, source code is generally provided via the lab’s GitHub page or the specific software’s own homepage.

Need more?

If nothing above quite fits the bill, we’d love to talk with you about your research needs. Send us an email!

For a complete list of our CFAR-sponsored services please visit the CFAR Molecular Profiling and Computational Biology core.