Computational Biology Software

Our lab uses molecular, computational, and virus biology techniques to provide insights into the relationship between HIV and its human hosts in an effort to fight the AIDS pandemic.

Below is a catalog of software we’ve produced during this work to help us and others study retroviruses. Applications marked “requires login” are currently only available to lab members and our collaborators. Please email us if you’d like to learn more about them.

Support

We provide technical and scientific support for those using our software, so please email us if you have questions or run into trouble. Our participation in a Center for AIDS Research (CFAR) provides funding for these services, so please acknowledge the CFAR in your work if you use our software.

Applications

Viroverse (source code)

A platform for the collection, storage, retrieval, and analysis of experimental data for microbiology workflows and the principal data store for HIV and SIV sequencing experiments conducted in the Mullins Lab. Currently housing over 50,000 viral nucleotide sequences, together with comprehensive metadata about their creation including PCR protocols, gel images, de-identified subject clinical data, and more.

DIVEIN (New updates: implements PhyML v3.3.20220408, FastTree v2.1.10 and RAxML v8.1.12, sequence clustering based on sequence/tree)

Study divergence, diversity, informative sites, and other phylogenetic features of nucleotide and amino acid sequences using automatically constructed maximum likelihood trees

Methylation Station (source code)

Analyze and visualize DNA methylation data in the blink of an eye

ISDB (source code)

An open-source software toolkit for building databases of proviral integration sites

TCozy

A database application supporting large-scale tissue culture of primary cells in a microplate format and multi-locus qPCR analysis of HIV genomes. It provides start-to-finish workflow management, reduces or eliminates many possible sources of data loss, enables review of work in progress, and serves as a data repository for downstream analysis.

Databases

HIRIS

Query, download, and visualize HIV-1 integration sites from the proviral reservoir, built with ISDB

PrimerDB

Search and display information on viral primers used in our lab

EpitopeDB

Retrieve and analyze HIV immunology data, such as ELISpot and HLA assays

Web Tools

Lab Labels (source code)

Generate PDFs suitable for printing sheets of cryo-safe labels.

ViroBLAST

BLAST multiple viral sequence databases, including public and local data, from the web

Integration Sites

Determine the location of HIV integration sites in the human genome

WebPSSM

Score V3 loop co-receptor motifs

ChromatQuantitator

Quantitate mixed base composition in Sanger sequencing chromatograms

QUALITY

Quantify the amplifiable viral templates under the conditions of your limiting dilution assay using a variant of the Minimum χ² (MC) method that allows you to specify the probabilities of a false negative and a false positive PCR

Viral Growth Rate Calculation

Compare and evaluate competitive viral fitness

LiteraTracker

Track the scientific literature by keyword frequency in a wide array of journals

rdvg

Test for the detection of latent reservoirs

idvg

Test for the detection of continued viral replication under antiviral therapies

DistParser

Extract branch lengths from a Newick tree and calculate divergence and diversity

UniSeq

Identify and extract unique sequences from a FASTA file
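
The idea of collapsing a FASTA file to its unique sequences can be sketched in a few lines of Python. This is only an illustration, not UniSeq's actual implementation: the function names `read_fasta` and `unique_sequences` are hypothetical, and the sketch assumes that sequences are compared case-insensitively and must match exactly over their full length.

```python
import io

def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def unique_sequences(handle):
    """Map each distinct sequence to the names that share it (hypothetical helper).

    Assumption: sequences are identical only if they match exactly after
    upper-casing. Dicts preserve insertion order, so groups appear in the
    order each sequence was first seen.
    """
    groups = {}
    for name, seq in read_fasta(handle):
        groups.setdefault(seq.upper(), []).append(name)
    return groups

# Usage: write one representative record per unique sequence.
example = io.StringIO(">a\nACGT\n>b\nacgt\n>c\nAC\nGA\n")
for seq, names in unique_sequences(example).items():
    print(f">{names[0]} count={len(names)}\n{seq}")
```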

Sequence Name Reformatter

Reformat sequence names in a FASTA file

PAUP* Diversity Matrix Formatter

Reformat diagonal distance matrices into columnar output

Explain Phred

Explain Phred+33 quality scores
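
For readers unfamiliar with the encoding, a Phred+33 character stores a quality score Q as the ASCII character with code Q + 33, and Q relates to the base-call error probability P by Q = -10 · log10(P). The short Python sketch below decodes a quality string on that basis. It is an illustration only, not the code behind this tool, and the function name `explain_phred33` is hypothetical.

```python
def explain_phred33(quality):
    """Decode a Phred+33 string into (character, score, error probability) tuples."""
    decoded = []
    for ch in quality:
        score = ord(ch) - 33            # Phred+33: ASCII code minus 33
        p_error = 10 ** (-score / 10)   # invert Q = -10 * log10(P)
        decoded.append((ch, score, p_error))
    return decoded

# Usage: '!' is Q0, '5' is Q20 (1 error in 100), 'I' is Q40 (1 error in 10,000).
for ch, score, p_error in explain_phred33("!5I"):
    print(f"{ch!r}: Q{score}, P(error) = {p_error:.5f}")
```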

JitterCalc

An Excel spreadsheet that jitters points with identical (x, y) values

Downloadable Scripts

ICC

Pipeline to analyze 454 Pyrosequencing and Ion Torrent sequencing data, including read quality filtering and alignment, indel and carryforward error correction, single nucleotide variant calling, and calculation of nucleotide variant and haplotype frequencies

CorQ

Identify and correct 454 Pyrosequencing errors using quality scores

AutoSequin.pl

Automatically annotate sequences for bulk submission to GenBank

ChangeNewickNodeId.pl

Relabel nodes in a Newick tree

ColumnDist2diverstest.pl

Reformat a column distance matrix into the input format of Peter Gilbert et al.’s diverstest

CountAAFreq.pl

Calculate amino acid frequency for each position in an alignment
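
A per-position frequency calculation of this kind can be sketched in Python as follows. This is not CountAAFreq.pl itself: the function name `column_frequencies` is hypothetical, the alignment is assumed to be already loaded as a list of equal-length strings, and gap characters are simply counted as their own symbol.

```python
from collections import Counter

def column_frequencies(alignment):
    """Return one {residue: frequency} dict per alignment column.

    Assumptions: `alignment` is a non-empty list of equal-length strings,
    and gaps ('-') are counted like any other character.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    frequencies = []
    for column in zip(*alignment):       # iterate over columns, not rows
        counts = Counter(column)
        total = len(column)
        frequencies.append({aa: n / total for aa, n in counts.items()})
    return frequencies

# Usage: report the residues seen at each position of a toy alignment.
for position, freqs in enumerate(column_frequencies(["MKV", "MRV", "MK-"]), start=1):
    summary = ", ".join(f"{aa}={f:.2f}" for aa, f in sorted(freqs.items()))
    print(f"position {position}: {summary}")
```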

DiverAnalysis.pl

Analyze sequence divergence and diversity

NetNGlycParser.pl

Reformat output from NetNGlyc

NetOGlycParser.pl

Reformat output from NetOGlyc

NewickTermBranch.pl

Report external branch lengths of a Newick tree

parseBlastXML_calcFreq.pl

Calculate nucleotide frequencies at each position of a reference sequence from a blastn XML output file

parseBlastXML_calcErrRate.pl

Calculate error rates of different types (insertion, deletion, and substitution) from a blastn XML output file

qalign

Quickly align two sequences for ad-hoc comparison by a variety of methods (BLAST, needle, muscle, etc.)

RemoveOverlapSeq.pl

Remove identical and overlapping sequences from an alignment

Sequence manipulator

Convert between alignment file formats; clean FASTA sequences

SequLocatorParser.pl

Reformat output from LANL’s sequence locator

simple-consensus-per-read-group

Generate a simple consensus from a SAM/BAM file for each read group

AA_divergence_wbw.pl

Calculate amino acid divergence window by window from an amino acid sequence alignment

APIs

HIV Sequence Locator

A JSON API around LANL’s sequence locator tool, providing easy programmatic access to positioning, region, and protein data

Master Blaster

A basic API for quick, remote NCBI BLAST+ queries which returns results as XML

Toolkits and Libraries

RecordStream::Bio and recs-fastq

Two collections of record-handling tools related to biology, for use with the excellent data-slicing tools in RecordStream

Bio::WebService::LANL::SequenceLocator

A Perl library providing programmatic access to LANL’s sequence locator tool, used to provide our HIV Sequence Locator JSON web API

Bio::Cigar

A Perl library for parsing CIGAR strings (“Compact Idiosyncratic Gapped Alignment Report”), such as those used in the SAM file format, and translating coordinates to/from the reference/query.
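
To illustrate what translating coordinates through a CIGAR string involves, here is a minimal Python sketch. It does not reproduce Bio::Cigar's Perl interface: the function names `parse_cigar` and `ref_to_query` are hypothetical, positions are assumed to be 1-based and counted from the start of the alignment, and query positions are assumed to include soft-clipped bases. The sets of operations that consume query or reference bases follow the SAM specification.

```python
import re

CIGAR_STRING = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_QUERY = set("MIS=X")   # operations that advance along the query
CONSUMES_REF = set("MDN=X")     # operations that advance along the reference

def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) pairs."""
    if not CIGAR_STRING.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]

def ref_to_query(cigar, ref_pos):
    """Translate a 1-based reference position to a 1-based query position.

    Returns None when the reference position falls in a deletion or skipped
    region, since no query base corresponds to it.
    """
    if ref_pos < 1:
        raise ValueError("reference position must be 1 or greater")
    ref_done = query_done = 0
    for length, op in parse_cigar(cigar):
        in_ref, in_query = op in CONSUMES_REF, op in CONSUMES_QUERY
        if in_ref and ref_pos <= ref_done + length:
            if not in_query:
                return None
            return query_done + (ref_pos - ref_done)
        if in_ref:
            ref_done += length
        if in_query:
            query_done += length
    raise ValueError("reference position lies outside the alignment")

# Usage: 3 matches, a 2-base insertion, 3 matches, a 1-base deletion, 2 matches.
for pos in (1, 4, 7, 9):
    print(pos, "->", ref_to_query("3M2I3M1D2M", pos))
```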
