Instructions for parseBlastXML_calcFreq.pl

This script parses the XML output file from NCBI's Blastn program and calculates the nucleotide frequencies of the query sequences at each position of the subject (reference) sequence.
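
For readers who want to see what the parsing step involves, here is a minimal Python sketch. It is not the Perl script's actual code. It assumes the standard NCBI BLAST XML element names (Iteration_query-len, Hsp_query-from, Hsp_query-to, Hsp_hit-from, Hsp_qseq, Hsp_hseq). The function name iter_hsps and the dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET


def iter_hsps(xml_text):
    """Yield one dict per HSP (aligned segment) found in BLAST XML text.

    Assumption: the XML follows the NCBI BLAST XML layout, where each
    <Iteration> is one query and each <Hsp> is one aligned segment.
    """
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        query_len = int(iteration.findtext("Iteration_query-len"))
        for hsp in iteration.iter("Hsp"):
            yield {
                "query_len": query_len,
                "query_from": int(hsp.findtext("Hsp_query-from")),
                "query_to": int(hsp.findtext("Hsp_query-to")),
                "hit_from": int(hsp.findtext("Hsp_hit-from")),
                "qseq": hsp.findtext("Hsp_qseq"),  # aligned query, may contain '-'
                "hseq": hsp.findtext("Hsp_hseq"),  # aligned subject, may contain '-'
            }
```

To read from a file instead of a string, ET.parse(path).getroot() could replace ET.fromstring(xml_text).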

Input

  1. Blastn output XML file.
  2. Subject (reference) sequence fasta file.
  3. Cutoff value for the fraction of the query sequence that aligned to the subject sequence (0-1). An alignment whose fraction is below the cutoff is considered random. The default value is 0.6.
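
The cutoff in item 3 can be illustrated with a short Python sketch. It assumes the fraction is the aligned span of the query divided by the full query length; the script may compute it differently. The function names aligned_fraction and passes_cutoff are hypothetical.

```python
def aligned_fraction(query_from, query_to, query_len):
    """Fraction of the query covered by the alignment.

    Assumption: coordinates are 1-based and inclusive, so the aligned
    span is |query_to - query_from| + 1 bases.
    """
    span = abs(query_to - query_from) + 1
    return span / query_len


def passes_cutoff(query_from, query_to, query_len, cutoff=0.6):
    """Return True if the alignment is kept, False if it is treated as random."""
    return aligned_fraction(query_from, query_to, query_len) >= cutoff
```

With the default cutoff of 0.6, a 10-base query aligned over bases 1 to 8 (fraction 0.8) would be kept, while one aligned over bases 1 to 5 (fraction 0.5) would be discarded.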

Output

  1. Tab-delimited table of nucleotide frequencies at each position of the subject (reference) sequence.
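
As a rough picture of how such a table can be built, the Python sketch below counts query bases at each reference position and converts the counts to frequencies. The column layout (position, then A, C, G, T and gap) is an assumption for illustration only, and the script's actual columns may differ. It also assumes forward-strand alignments that fit within the reference. The function names are hypothetical.

```python
from collections import Counter


def count_bases(alignments, ref_len):
    """Count query bases observed at each reference position.

    alignments: iterable of (hit_from, qseq, hseq), where hit_from is the
    1-based start on the reference and qseq/hseq are the gapped aligned
    query and subject strings of equal length.
    """
    counts = [Counter() for _ in range(ref_len)]
    for hit_from, qseq, hseq in alignments:
        pos = hit_from - 1  # convert to a 0-based index
        for q, h in zip(qseq, hseq):
            if h == "-":
                # Insertion in the query: no reference position to assign it to.
                continue
            counts[pos][q.upper()] += 1  # q may be '-', a deletion in the query
            pos += 1
    return counts


def frequency_rows(counts, alphabet="ACGT-"):
    """Format one tab-delimited row per reference position."""
    rows = []
    for i, c in enumerate(counts, start=1):
        total = sum(c[b] for b in alphabet)
        freqs = [c[b] / total if total else 0.0 for b in alphabet]
        rows.append("\t".join([str(i)] + [f"{f:.3f}" for f in freqs]))
    return rows
```

Positions with no aligned query bases are reported with all frequencies set to zero in this sketch, which avoids a division by zero.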

How to run

  1. Download parseBlastXML_calcFreq.pl to your local computer and make it executable by typing the following command in your working directory:

    chmod +x parseBlastXML_calcFreq.pl

  2. Create a subdirectory in your working directory to hold the input and output files. In the subdirectory, type the following command:

    perl ../parseBlastXML_calcFreq.pl blastXmlOutFile referenceFastaFile outputFreqFile alignFractionCutoff