HIV Sequence Locator API

A dead-simple web API for LANL's HIV sequence locator providing results in JSON. Positioning, region, and protein information are all available. Most of the data shown on LANL's human-readable HTML page is extracted by this API. Get in touch if you need something that's missing!

Endpoint

POST .../within/hiv

Requires either one or more values for the POST parameter sequence, or a single-valued fasta parameter containing FASTA-formatted sequences, supplied as a URL-encoded string or as a file upload.
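
As a rough illustration of the two input styles, the sketch below builds form-encoded request bodies with Python's standard library. The helper names sequence_body and fasta_body are made up for this example; only the sequence and fasta parameter names come from the description above.

```python
from urllib.parse import urlencode

def sequence_body(sequences):
    """Encode one `sequence` parameter per input sequence."""
    # doseq=True repeats the key for each list item: sequence=A&sequence=B
    return urlencode({"sequence": list(sequences)}, doseq=True)

def fasta_body(records):
    """Encode (name, sequence) pairs as a single URL-encoded `fasta` parameter."""
    fasta = "".join(">{}\n{}\n".format(name, seq) for name, seq in records)
    return urlencode({"fasta": fasta})

# Example usage; the record name "peptide_1" is a placeholder.
print(sequence_body(["SLYNTVAVLYYVHQR", "MGGDMKDNW"]))
print(fasta_body([("peptide_1", "SLYNTVAVLYYVHQR")]))
```

Either string can be sent as the body of a POST request with the usual application/x-www-form-urlencoded content type. A file upload (as in the curl --form example) would need a multipart body instead, which this sketch does not cover.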

Both protein and nucleotide sequences are accepted, although the data returned varies by type because of what LANL returns. See the curl example, which queries a protein sequence and the same sequence as nucleotides. If you use LANL's tool directly, the reverse complements of your sequences are also tried and the best match is picked; in the interests of reliability and consistency, this API tells LANL not to reverse complement sequences. If your sequences may need reverse complementing, take care of that yourself before submitting.
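
Since reverse complementing is left to the caller, here is a minimal sketch of one way to do it in Python before submitting. The function name reverse_complement is an assumption for this example, and the table covers the IUPAC nucleotide codes (including ambiguity codes) but not gap characters.

```python
# Complement table for IUPAC nucleotide codes, upper and lower case.
# Ambiguity codes map to the code for the complementary set (R<->Y, K<->M,
# B<->V, D<->H); S, W, and N are their own complements.
_COMPLEMENT = str.maketrans(
    "ACGTURYKMBDHVSWNacgturykmbdhvswn",
    "TGCAAYRMKVHDBSWNtgcaayrmkvhdbswn",
)

def reverse_complement(nucleotides):
    """Return the reverse complement of a nucleotide sequence."""
    # Characters outside the table (e.g. gaps) pass through unchanged.
    return nucleotides.translate(_COMPLEMENT)[::-1]

print(reverse_complement("ATGC"))
```

If the orientation of a sequence is unknown, one option is to submit both the original and its reverse complement as separate sequence values and compare the results yourself.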

Optionally accepts a (highly recommended) base parameter set to nucleotide or amino acid, which forces all sequences to be interpreted as the given base type. This is necessary when submitting sequences with an ambiguous base type due to the overlap in IUPAC alphabets. In such cases, LANL seems to assume nucleotides, potentially producing incorrect results. For example, the amino acid sequence MGGDMKDNW is also a valid nucleotide sequence, albeit one with many ambiguous bases. Interpreting it as nucleotides, however, is incorrect. It is not uncommon for short amino acid peptides to exhibit this property.
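
To see why the overlap matters, the following sketch checks whether a sequence consists solely of IUPAC nucleotide codes and could therefore be misread as nucleotides. The function name could_be_nucleotides is invented for this example; it is only a client-side check and says nothing about how LANL actually decides.

```python
# The IUPAC nucleotide alphabet, including ambiguity codes.
NUCLEOTIDE_CODES = frozenset("ACGTURYSWKMBDHVN")

def could_be_nucleotides(sequence):
    """True if every character is a valid IUPAC nucleotide code."""
    return bool(sequence) and set(sequence.upper()) <= NUCLEOTIDE_CODES

for peptide in ("MGGDMKDNW", "SLYNTVAVLYYVHQR"):
    # A peptide that passes this check should be sent with base='amino acid'.
    print(peptide, could_be_nucleotides(peptide))
```

The first peptide passes the check because M, G, D, K, N, and W are all nucleotide ambiguity codes, while the second contains L and Q, which are not. Setting base explicitly on every request sidesteps the question entirely.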

On success (HTTP 200) the response body is a JSON array of objects, one per sequence. Both HTTP 4xx and 5xx status codes are used on failure with plain text bodies containing an error message.

The format parameter may be set to csv to return comma-separated values that represent only part of the full results. format may also be set explicitly to json, though there is no need to, as JSON is the default and will remain so.

HTTP Status                   Reason
405 Method Not Allowed        The request did not use the HTTP POST method
415 Unsupported Media Type    The provided fasta parameter appears to be in the wrong format
422 Unprocessable Entity      No sequence or fasta parameter was provided, or the parameter did not contain any sequences
503 Service Unavailable       An unexpected condition occurred while parsing results from LANL
500 Internal Server Error     An unexpected error occurred while processing your request

The API tries not to return incorrect data from misparses of LANL's output. If it detects an anomaly in any of its parsing stages, it will abort the request and return an HTTP 503 Service Unavailable. If this happens to your request, or if you are receiving results you don't expect, please let us know!
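
A client might turn the status codes above into readable errors along these lines. This is a sketch only: the function name explain_response and the wording of the messages are assumptions for illustration, and the plain-text error body in the usage line is a placeholder, not the API's actual error text.

```python
import json

# Reasons paraphrased from the status table above.
FAILURE_REASONS = {
    405: "request did not use the HTTP POST method",
    415: "fasta parameter appears to be in the wrong format",
    422: "no sequence or fasta parameter, or it contained no sequences",
    500: "unexpected error while processing the request",
    503: "unexpected condition while parsing results from LANL",
}

def explain_response(status, body):
    """Return (results, error): parsed JSON on HTTP 200, else an error string."""
    if status == 200:
        return json.loads(body), None
    reason = FAILURE_REASONS.get(status, "unrecognized status")
    # Failure bodies are plain text, so include them verbatim in the message.
    return None, "HTTP {}: {} ({})".format(status, reason, body.strip())

# Placeholder error body for illustration.
print(explain_response(503, "parse failure\n"))
```

Keeping the plain-text body in the error message is deliberate: it is the most useful thing to include when reporting a 503 or an unexpected result.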

Quick lookup

Submit sequences as a FASTA file and download the location results as a CSV file. Note that the CSV does not contain all of the information the API can provide since CSV does not have standard support for nested or multi-valued data structures. This form uses the API described above.



Created by Thomas Sibley of the Mullins Lab at the University of Washington, Department of Microbiology.

Questions? Drop us a line.

Source code

Examples

curl

curl -X POST https://indra.mullins.microbiol.washington.edu/locate-sequence/within/hiv \
     --data sequence=SLYNTVAVLYYVHQR \
     --data sequence=TCATTATATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGG
[
   {
      "query" : "sequence_1",
      "query_sequence" : "SLYNTVAVLYYVHQR",
      "base_type" : "amino acid",
      "reverse_complement" : "0",
      "alignment" : "\n Query SLYNTVAVLY YVHQR  15\n       :::::::.::  ::::    \n  HXB2 SLYNTVATLY CVHQR\n\n  ",
      "hxb2_sequence" : "SLYNTVATLYCVHQR",
      "similarity_to_hxb2" : "86.7",
      "start" : "77",
      "end" : "91",
      "genome_start" : "1018",
      "genome_end" : "1062",
      "polyprotein" : "Gag",
      "region_names" : [
         "Gag",
         "p17"
      ],
      "regions" : [
         {
            "cds" : "Gag",
            "aa_from_cds_start" : [
               "229",
               "273"
            ],
            "aa_from_polyprotein_start" : null,
            "aa_from_protein_start" : [
               "77",
               "91"
            ],
            "aa_from_query_start" : [
               "1",
               "15"
            ],
            "na_from_hxb2_start" : [
               "1018",
               "1062"
            ]
         },
         {
            "cds" : "p17",
            "aa_from_cds_start" : [
               "229",
               "273"
            ],
            "aa_from_polyprotein_start" : null,
            "aa_from_protein_start" : [
               "77",
               "91"
            ],
            "aa_from_query_start" : [
               "1",
               "15"
            ],
            "na_from_hxb2_start" : [
               "1018",
               "1062"
            ]
         }
      ]
   },
   {
      "query" : "sequence_2",
      "query_sequence" : "TCATTATATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGG",
      "base_type" : "nucleotide",
      "reverse_complement" : "0",
      "alignment" : "\n Query TCATTATATA ATACAGTAGC AACCCTCTAT TGTGTGCATC AAAGG  45\n       :::::::::: :::::::::: :::::::::: :::::::::: ::::: \n  HXB2 TCATTATATA ATACAGTAGC AACCCTCTAT TGTGTGCATC AAAGG  1062\n\n  ",
      "hxb2_sequence" : "TCATTATATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGG",
      "similarity_to_hxb2" : "100.0",
      "start" : "229",
      "end" : "273",
      "genome_start" : "1018",
      "genome_end" : "1062",
      "polyprotein" : "Gag",
      "region_names" : [
         "Gag",
         "p17"
      ],
      "regions" : [
         {
            "cds" : "Gag",
            "aa_from_protein_start" : [
               "77",
               "91"
            ],
            "na_from_cds_start" : [
               "229",
               "273"
            ],
            "na_from_hxb2_start" : [
               "1018",
               "1062"
            ],
            "na_from_query_start" : [
               "1",
               "45"
            ],
            "protein_translation" : "SLYNTVATLYCVHQR"
         },
         {
            "cds" : "p17",
            "aa_from_protein_start" : [
               "77",
               "91"
            ],
            "na_from_cds_start" : [
               "229",
               "273"
            ],
            "na_from_hxb2_start" : [
               "1018",
               "1062"
            ],
            "na_from_query_start" : [
               "1",
               "45"
            ],
            "protein_translation" : "SLYNTVATLYCVHQR"
         }
      ]
   }
]
curl -X POST https://indra.mullins.microbiol.washington.edu/locate-sequence/within/hiv \
     --data base='amino acid' \
     --data sequence=MGGDMKDNW
curl -X POST https://indra.mullins.microbiol.washington.edu/locate-sequence/within/hiv \
     --form base=nucleotide \
     --form fasta=@/path/to/your/input.fa
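
Note that the JSON response shown above encodes numbers as strings. The sketch below, using a trimmed copy of the first result from that response, shows one way to convert the coordinates for further processing. The summarize helper is hypothetical, and it assumes the fields it reads are present and non-null, which may not hold for every response.

```python
import json

# A trimmed copy of the first result from the curl example above.
SAMPLE = json.loads("""
[{"query": "sequence_1", "base_type": "amino acid",
  "similarity_to_hxb2": "86.7", "genome_start": "1018",
  "genome_end": "1062", "polyprotein": "Gag",
  "region_names": ["Gag", "p17"]}]
""")

def summarize(result):
    """Pick out a few fields, converting string-encoded numbers."""
    return {
        "query": result["query"],
        "genome_span": (int(result["genome_start"]), int(result["genome_end"])),
        "similarity": float(result["similarity_to_hxb2"]),
        "regions": list(result["region_names"]),
    }

for result in SAMPLE:
    print(summarize(result))
```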

Perl

Directly using Bio::WebService::LANL::SequenceLocator

#!/usr/bin/env perl
#
# First install the library:
#   cpan -i Bio::WebService::LANL::SequenceLocator
# 
use strict;
use warnings;
use Bio::WebService::LANL::SequenceLocator;

my $locator = Bio::WebService::LANL::SequenceLocator->new(
    agent_string => 'Your Organization - you@example.com',
);

my @sequences = $locator->find([
    "agcaatcagatggtcagccaaaattgccctatagtgcagaacatcc"
   ."aggggcaagtggtacatcaggccatatcacctagaactttaaatgca",
]);

Through our web API

#!/usr/bin/env perl
use strict;
use warnings;

use JSON qw< decode_json >;
use LWP::UserAgent;

my $agent    = LWP::UserAgent->new( agent => 'you@example.com' );
my $response = $agent->post(
    "https://indra.mullins.microbiol.washington.edu/locate-sequence/within/hiv" => [
        sequence => "TCATTATATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGG",
    ],
);
unless ($response->is_success) {
    die "Request failed: ", $response->status_line, "\n",
        $response->decoded_content;
}
my $results = decode_json( $response->decoded_content );

# $results is now an array ref, like the JSON above
print $results->[0]{polyprotein}, "\n";

Python

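#!/usr/bin/env python3
import json
import sys
from urllib.error import URLError
from urllib.parse import urlencode
from urllib.request import Request, urlopen

request = Request('https://indra.mullins.microbiol.washington.edu/locate-sequence/within/hiv')

# doseq=True sends one "sequence" parameter per list item.
# Passing data to urlopen() makes this a POST request.
data = urlencode({
    'sequence': [
        'SLYNTVAVLYYVHQR',
        'TCATTATATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGG'
    ]
}, doseq=True).encode('ascii')

# Initialize up front so the check below works even if the request fails
results = None

try:
    with urlopen(request, data) as response:
        text    = response.read().decode('utf-8')
        results = json.loads(text)
except URLError as e:
    # HTTP 4xx and 5xx responses arrive here as HTTPError, a URLError subclass
    print('Request failed:', e, file=sys.stderr)
except ValueError as e:
    print('Decoding JSON failed:', e, file=sys.stderr)

if results is None:
    sys.exit(1)

print(results)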

R

library("RCurl")
library("rjson")

results = tryCatch(
  fromJSON(
    postForm(
      "https://indra.mullins.microbiol.washington.edu/locate-sequence/within/hiv",
      sequence="SLYNTVAVLYYVHQR",
      sequence="TCATTATATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGG")),
  HTTPError = function(e) cat("Error making request: ", e$message),
  error = function(e) cat("Error decoding JSON"))

print(lapply(results, function(s) s$genome_start))