SequenceUtils

java.lang.Object
- info.bioinfweb.commons.bio.SequenceUtils

```
public class SequenceUtils
extends java.lang.Object
```
Useful tools to manipulate nucleotide and amino acid sequences stored as character sequences.

Author:

Ben Stöver

Field Summary

Fields
Modifier and Type	Field and Description
`static java.lang.String`	`ALL_DNA_CHARS`
`static java.lang.String`	`ALL_RNA_CHARS`
`static java.lang.String`	`DNA_CHARS`
`static char`	`GAP_CHAR`
`static char`	`MATCH_CHAR`
`static char`	`MISSING_DATA_CHAR`
`static java.lang.String`	`RNA_CHARS`
`static char`	`STOP_CODON_CHAR`

Constructor Summary

Constructors
Constructor and Description

`SequenceUtils()`

Method Summary

Modifier and Type	Method and Description
`static char`	`aminoAcidConsensus(java.lang.String[] alignmentColumn)`
`static java.util.Map<java.lang.Character,java.lang.Double>`	`aminoAcidFrequencies(java.lang.String[] alignmentColumn)`
`static char`	`complement(char c)` Returns the complement of the specified character.
`static java.lang.String`	`complement(java.lang.CharSequence sequence)` Returns the complementary sequence.
`static java.lang.String`	`deleteAllGaps(java.lang.CharSequence seq)` Returns the specified sequence without any gaps.
`static java.lang.String`	`deleteFromLeft(java.lang.String seq, int count)` Deletes `count` characters from the left side of the sequence which are not gaps ("-").
`static java.lang.String`	`deleteFromRight(java.lang.String seq, int count)` Deletes `count` characters from the right side of the sequence which are not gaps ("-").
`static java.lang.String`	`deleteGapsFromLeft(java.lang.String seq, int count)` Deletes the specified number of gaps from the sequence starting on the left side.
`static java.lang.String`	`deleteGapsFromRight(java.lang.String seq, int count)` Deletes the specified number of gaps from the sequence starting on the right side.
`static java.lang.String`	`deleteLeadingGaps(java.lang.String seq)` Deletes all leading gaps from a sequence.
`static java.lang.String`	`deleteLeadingTrailingGaps(java.lang.String seq)` Deletes all leading and trailing gaps of the specified sequence.
`static java.lang.String`	`deleteTrailingGaps(java.lang.String seq)` Deletes all trailing gaps from a sequence.
`static java.lang.String`	`dnaToRNA(java.lang.String dna)`
`static java.util.Set<java.lang.Character>`	`getAminoAcidOneLetterCodes(boolean includeAmbiguity)` Returns a set of all amino acid one letter codes in upper case.
`static java.util.Set<java.lang.String>`	`getAminoAcidThreeLetterCodes(boolean includeAmbiguity)` Returns a set of all amino acid three letter codes in upper case.
`static java.util.Set<java.lang.Character>`	`getNucleotideCharacters()` Returns a set of all nucleotide characters, including 'T' and 'U' as well as all IUPAC ambiguity codes in upper case.
`static boolean`	`isAminoAcidAmbiguityCode(java.lang.String code)` Determines whether the specified code is an amino acid ambiguity code.
`static boolean`	`isDNAChar(char c)`
`static boolean`	`isInTokenSet(char c, java.lang.String tokens)`
`static boolean`	`isNonAmbiguityAminoAcid(java.lang.String code)`
`static boolean`	`isNonAmbiguityNucleotide(char c)`
`static boolean`	`isNucleotideAmbuguityCode(char nucleotide)` Determines whether the specified character is an IUPAC ambiguity code.
`static boolean`	`isRNAChar(char c)`
`static java.lang.String`	`leftSubsequence(java.lang.String seq, int count)` Returns the left subsequence with the specified length (which does not include gaps).
`static int`	`lengthWOGaps(java.lang.CharSequence sequence)`
`static char`	`nucleotideConsensus(char[] alignmentColumn)` Returns the nucleotide that occurs most often in the specified alignment column.
`static char[]`	`nucleotideConstituents(char nucleotide)` If the specified nucleotide is an IUPAC ambiguity code this method returns an array containing all nucleotides that could be represented by the code.
`static java.util.Map<java.lang.Character,java.lang.Double>`	`nucleotideFrequencies(char[] alignmentColumn)` Counts the nucleotide frequencies in the specified alignment column.
`static char`	`oneLetterAminoAcidByThreeLetter(java.lang.String threeLetterCode)` Converts the specified three letter amino acid code into a one letter representation.
`static char[]`	`oneLetterAminoAcidConstituents(java.lang.String code)` Returns the one letter amino acid representations of all compounds that could be represented by the specified ambiguity code.
`static java.lang.String`	`randSequence(boolean dna, int length, double rateCG)` Returns a sequence of random DNA or RNA characters.
`static java.lang.String`	`randSequence(boolean dna, int length, double rateC, double rateG, double rateA)` Returns a sequence of random DNA or RNA characters.
`static java.lang.String`	`reverse(java.lang.CharSequence sequence)` Returns the reverse of the specified sequence.
`static java.lang.String`	`reverseComplement(java.lang.CharSequence sequence)` Returns the reverse complemented sequence.
`static java.lang.String`	`rightSubsequence(java.lang.String seq, int count)` Returns the right subsequence with the specified length (which does not include gaps).
`static char[]`	`rnaConstituents(char nucleotide)`
`static java.lang.String`	`rnaToDNA(java.lang.String rna)`
`static java.lang.String`	`threeLetterAminoAcidByOneLetter(char oneLetterCode)` Converts the specified one letter amino acid code into a three letter representation.
`static java.lang.String[]`	`threeLetterAminoAcidConstituents(java.lang.String code)` Returns the three letter amino acid representations of all compounds that could be represented by the specified ambiguity code.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - GAP_CHAR
```
public static final char GAP_CHAR
```
    See Also:
    
    Constant Field Values
  - MISSING_DATA_CHAR
```
public static final char MISSING_DATA_CHAR
```
    See Also:
    
    Constant Field Values
  - MATCH_CHAR
```
public static final char MATCH_CHAR
```
    See Also:
    
    Constant Field Values
  - STOP_CODON_CHAR
```
public static final char STOP_CODON_CHAR
```
    See Also:
    
    Constant Field Values
  - DNA_CHARS
```
public static final java.lang.String DNA_CHARS
```
    See Also:
    
    Constant Field Values
  - ALL_DNA_CHARS
```
public static final java.lang.String ALL_DNA_CHARS
```
    See Also:
    
    Constant Field Values
  - RNA_CHARS
```
public static final java.lang.String RNA_CHARS
```
    See Also:
    
    Constant Field Values
  - ALL_RNA_CHARS
```
public static final java.lang.String ALL_RNA_CHARS
```
    See Also:
    
    Constant Field Values
- Constructor Detail
  - SequenceUtils
```
public SequenceUtils()
```
- Method Detail
  - getNucleotideCharacters
```
public static java.util.Set<java.lang.Character> getNucleotideCharacters()
```
    Returns a set of all nucleotide characters, including 'T' and 'U' as well as all IUPAC ambiguity codes in upper case.
    
    Returns:
    
    a new set with all nucleotide tokens
  - nucleotideConstituents
```
public static char[] nucleotideConstituents(char nucleotide)
```
    If the specified nucleotide is an IUPAC ambiguity code this method returns an array containing all nucleotides that could be represented by the code.
    Constituents returned for an ambiguity code are always DNA nucleotides (thymine is always used instead of uracil). However, if 'U' itself is specified for nucleotide, the returned array contains 'U' as its only element and not 'T'.
    If the specified character is not an ambiguity code character, an empty array is returned.
    
    Parameters:
    
    nucleotide - the character that may be an ambiguity code
    
    Returns:
    
    an array of the nucleotides as upper case letters
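    The lookup of ambiguity codes described above can be sketched as a simple table. This is only an illustration under stated assumptions, not the implementation behind this class: `AmbiguitySketch` and `constituents` are hypothetical names, accepting lower case input is an assumption, and the sketch returns an empty array for every character that is not an ambiguity code (it does not model the special handling of 'U').

```java
// Hypothetical illustration class; not part of info.bioinfweb.commons.
final class AmbiguitySketch {
    // Expands an IUPAC ambiguity code into the DNA nucleotides it can stand for.
    // Returns an empty array for every other character (assumption of this sketch).
    static char[] constituents(char nucleotide) {
        switch (Character.toUpperCase(nucleotide)) {
            case 'R': return new char[] {'A', 'G'};        // purine
            case 'Y': return new char[] {'C', 'T'};        // pyrimidine
            case 'S': return new char[] {'C', 'G'};
            case 'W': return new char[] {'A', 'T'};
            case 'K': return new char[] {'G', 'T'};
            case 'M': return new char[] {'A', 'C'};
            case 'B': return new char[] {'C', 'G', 'T'};   // anything but A
            case 'D': return new char[] {'A', 'G', 'T'};   // anything but C
            case 'H': return new char[] {'A', 'C', 'T'};   // anything but G
            case 'V': return new char[] {'A', 'C', 'G'};   // anything but T
            case 'N': return new char[] {'A', 'C', 'G', 'T'};
            default:  return new char[0];
        }
    }

    public static void main(String[] args) {
        System.out.println(new String(constituents('R')));  // prints "AG"
        System.out.println(constituents('A').length);       // prints 0
    }
}
```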
  - rnaConstituents
```
public static char[] rnaConstituents(char nucleotide)
```
  - isNonAmbiguityNucleotide
```
public static boolean isNonAmbiguityNucleotide(char c)
```
  - isNucleotideAmbuguityCode
```
public static boolean isNucleotideAmbuguityCode(char nucleotide)
```
    Determines whether the specified character is an IUPAC ambiguity code.
    
    Parameters:
    
    nucleotide - the character that may be an ambiguity code
    
    Returns:
    
    true if the specified character is a valid ambiguity code, false otherwise.
  - getAminoAcidOneLetterCodes
```
public static java.util.Set<java.lang.Character> getAminoAcidOneLetterCodes(boolean includeAmbiguity)
```
    Returns a set of all amino acid one letter codes in upper case.
    
    Parameters:
    
    includeAmbiguity - Specify true here if amino acid ambiguity codes shall also be contained in the returned set or false if only unambiguous characters shall be contained.
    
    Returns:
    
    a new set of amino acid codes
  - getAminoAcidThreeLetterCodes
```
public static java.util.Set<java.lang.String> getAminoAcidThreeLetterCodes(boolean includeAmbiguity)
```
    Returns a set of all amino acid three letter codes in upper case.
    
    Parameters:
    
    includeAmbiguity - Specify true here if amino acid ambiguity codes shall also be contained in the returned set or false if only unambiguous characters shall be contained.
    
    Returns:
    
    a new set of amino acid codes
  - oneLetterAminoAcidByThreeLetter
```
public static char oneLetterAminoAcidByThreeLetter(java.lang.String threeLetterCode)
```
    Converts the specified three letter amino acid code into a one letter representation. Besides the defined amino acid codes this method also accepts any string which consists of one character that is repeated three times. (E.g. "---" would be converted to '-').
    
    Parameters:
    
    threeLetterCode - the three letter code to be converted
    
    Returns:
    
    the corresponding one letter code in upper case
    
    Throws:
    
    java.lang.IllegalArgumentException - if the specified code is not a valid three letter amino acid code or a three character long repetition of the same character
  - threeLetterAminoAcidByOneLetter
```
public static java.lang.String threeLetterAminoAcidByOneLetter(char oneLetterCode)
```
    Converts the specified one letter amino acid code into a three letter representation.
    
    Parameters:
    
    oneLetterCode - the one letter code to be converted
    
    Returns:
    
    the corresponding three letter code where the first letter is in upper case (e.g. "Pro") if oneLetterCode was a valid amino acid representation, or a string consisting of three repetitions of the specified character otherwise (e.g. '-' would be converted to "---")
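    The two conversion directions and their repetition rule can be sketched as follows. This is a simplified illustration, not the code of this class: `AminoAcidCodeSketch` and its method names are hypothetical, only the 20 standard amino acids are covered (ambiguity codes and the stop codon character are left out), and the case-insensitive matching is an assumption.

```java
// Hypothetical illustration class; not part of info.bioinfweb.commons.
final class AminoAcidCodeSketch {
    // One letter codes and their three letter counterparts, in matching order.
    private static final String ONE = "ARNDCQEGHILKMFPSTWYV";
    private static final String[] THREE = {
        "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
        "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val"};

    // Known code: three letter form. Anything else: the character repeated three times.
    static String threeLetterByOneLetter(char oneLetterCode) {
        int i = ONE.indexOf(Character.toUpperCase(oneLetterCode));
        if (i >= 0) {
            return THREE[i];
        }
        return new String(new char[] {oneLetterCode, oneLetterCode, oneLetterCode});
    }

    // Known code: one letter form. A triple repetition such as "---": that character.
    static char oneLetterByThreeLetter(String threeLetterCode) {
        for (int i = 0; i < THREE.length; i++) {
            if (THREE[i].equalsIgnoreCase(threeLetterCode)) {
                return ONE.charAt(i);
            }
        }
        if (threeLetterCode.length() == 3
                && threeLetterCode.charAt(0) == threeLetterCode.charAt(1)
                && threeLetterCode.charAt(1) == threeLetterCode.charAt(2)) {
            return threeLetterCode.charAt(0);
        }
        throw new IllegalArgumentException("Not a valid code: " + threeLetterCode);
    }

    public static void main(String[] args) {
        System.out.println(threeLetterByOneLetter('P'));    // prints "Pro"
        System.out.println(threeLetterByOneLetter('-'));    // prints "---"
        System.out.println(oneLetterByThreeLetter("---"));  // prints "-"
    }
}
```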
  - oneLetterAminoAcidConstituents
```
public static char[] oneLetterAminoAcidConstituents(java.lang.String code)
```
    Returns the one letter amino acid representations of all compounds that could be represented by the specified ambiguity code.
    
    Parameters:
    
    code - the one or three letter ambiguity code to be converted
    
    Returns:
    
    an array with the one letter representations of the respective amino acids or an empty array if code was not a valid ambiguity code
  - threeLetterAminoAcidConstituents
```
public static java.lang.String[] threeLetterAminoAcidConstituents(java.lang.String code)
```
    Returns the three letter amino acid representations of all compounds that could be represented by the specified ambiguity code.
    
    Parameters:
    
    code - the one or three letter ambiguity code to be converted
    
    Returns:
    
    an array with the three letter representations of the corresponding amino acids or an empty array if code was not a valid ambiguity code
  - isNonAmbiguityAminoAcid
```
public static boolean isNonAmbiguityAminoAcid(java.lang.String code)
```
  - isAminoAcidAmbiguityCode
```
public static boolean isAminoAcidAmbiguityCode(java.lang.String code)
```
    Determines whether the specified code is an amino acid ambiguity code.
    
    Parameters:
    
    code - the string that may be an ambiguity code (one and three letter codes are supported)
    
    Returns:
    
    true if the specified code is a valid amino acid ambiguity code, false otherwise.
  - reverse
```
public static java.lang.String reverse(java.lang.CharSequence sequence)
```
    Returns the reverse of the specified sequence.
    
    Parameters:
    
    sequence - the source sequence
    
    Returns:
    
    the reversed sequence
  - complement
```
public static char complement(char c)
```
    Returns the complement of the specified character. (IUPAC ambiguity codes are also supported.) If an undefined character is specified (e.g. "-") it is returned unchanged.
    
    Parameters:
    
    c - the nucleotide to be complemented
    
    Returns:
    
    the complemented nucleotide character in upper or lower case depending on c
  - complement
```
public static java.lang.String complement(java.lang.CharSequence sequence)
```
    Returns the complementary sequence. Ambiguity codes are supported. Upper case and lower case characters are replaced by their corresponding complements.
    
    Parameters:
    
    sequence - the sequence to be complemented
    
    Returns:
    
    always a DNA sequence (no matter if the input was RNA or not)
  - reverseComplement
```
public static java.lang.String reverseComplement(java.lang.CharSequence sequence)
```
    Returns the reverse complemented sequence. Ambiguity codes are supported. Upper case and lower case characters are replaced by their corresponding complements.
    
    Parameters:
    
    sequence - the sequence to be reverse complemented
    
    Returns:
    
    always a DNA sequence (no matter if the input was RNA or not)
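    The behavior documented for the complement methods (IUPAC codes supported, case preserved, undefined characters returned unchanged, DNA output) can be sketched with two parallel lookup strings. `ReverseComplementSketch` is a hypothetical stand-alone class written for illustration, and mapping 'U' to 'A' is an assumption derived from the "always a DNA sequence" remark.

```java
// Hypothetical illustration class; not part of info.bioinfweb.commons.
final class ReverseComplementSketch {
    // FROM.charAt(i) is complemented by TO.charAt(i); ambiguity codes included.
    private static final String FROM = "ACGTURYSWKMBDHVN";
    private static final String TO   = "TGCAAYRSWMKVHDBN";

    static char complement(char c) {
        int i = FROM.indexOf(Character.toUpperCase(c));
        if (i < 0) {
            return c;  // undefined characters such as '-' stay unchanged
        }
        char result = TO.charAt(i);
        return Character.isLowerCase(c) ? Character.toLowerCase(result) : result;
    }

    static String reverseComplement(CharSequence sequence) {
        StringBuilder result = new StringBuilder(sequence.length());
        // Walk the input from right to left and complement each character.
        for (int i = sequence.length() - 1; i >= 0; i--) {
            result.append(complement(sequence.charAt(i)));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseComplement("ATG-cN"));  // prints "Ng-CAT"
    }
}
```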
  - rnaToDNA
```
public static java.lang.String rnaToDNA(java.lang.String rna)
```
  - dnaToRNA
```
public static java.lang.String dnaToRNA(java.lang.String dna)
```
  - isDNAChar
```
public static boolean isDNAChar(char c)
```
  - isRNAChar
```
public static boolean isRNAChar(char c)
```
  - isInTokenSet
```
public static boolean isInTokenSet(char c,
                                   java.lang.String tokens)
```
  - lengthWOGaps
```
public static int lengthWOGaps(java.lang.CharSequence sequence)
```
  - leftSubsequence
```
public static java.lang.String leftSubsequence(java.lang.String seq,
                                               int count)
```
    Returns the left subsequence with the specified length (which does not include gaps).
    If there are fewer than count characters in seq the whole sequence is returned.
    Example: leftSubsequence("AT-A-GCCTG-CTG", 5) would return AT-A-GC
    
    Parameters:
    
    seq - the source sequence
    
    count - the number of characters that shall be contained in the returned subsequence (not including gaps)
    
    Returns:
    
    the left subsequence containing the specified number of characters and possibly additional gaps
  - rightSubsequence
```
public static java.lang.String rightSubsequence(java.lang.String seq,
                                                int count)
```
    Returns the right subsequence with the specified length (which does not include gaps).
    If there are fewer than count characters in seq the whole sequence is returned.
    Example: rightSubsequence("AT-A-GCCTG-CTG", 5) would return TG-CTG
    
    Parameters:
    
    seq - the source sequence
    
    count - the number of characters that shall be contained in the returned subsequence (not including gaps)
    
    Returns:
    
    the right subsequence containing the specified number of characters and possibly additional gaps
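    The gap-aware counting used by leftSubsequence and rightSubsequence can be sketched as follows. `GapAwareSubsequenceSketch` is a hypothetical class that reproduces the two documented examples. Treating "-" as the only gap character and returning an empty string for a count of zero or less are assumptions of this sketch.

```java
// Hypothetical illustration class; not part of info.bioinfweb.commons.
final class GapAwareSubsequenceSketch {
    static String leftSubsequence(String seq, int count) {
        if (count <= 0) {
            return "";  // assumption: nothing requested, nothing returned
        }
        int seen = 0;  // number of non-gap characters passed so far
        for (int i = 0; i < seq.length(); i++) {
            if (seq.charAt(i) != '-' && ++seen == count) {
                return seq.substring(0, i + 1);
            }
        }
        return seq;  // fewer than count non-gap characters: whole sequence
    }

    static String rightSubsequence(String seq, int count) {
        if (count <= 0) {
            return "";
        }
        int seen = 0;
        for (int i = seq.length() - 1; i >= 0; i--) {
            if (seq.charAt(i) != '-' && ++seen == count) {
                return seq.substring(i);
            }
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(leftSubsequence("AT-A-GCCTG-CTG", 5));   // prints "AT-A-GC"
        System.out.println(rightSubsequence("AT-A-GCCTG-CTG", 5));  // prints "TG-CTG"
    }
}
```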
  - deleteFromLeft
```
public static java.lang.String deleteFromLeft(java.lang.String seq,
                                              int count)
```
    Deletes count characters from the left side of the sequence which are not gaps ("-").
    
    Parameters:
    
    seq - the original sequence
    
    count - the number of characters to be removed
    
    Returns:
    
    the sequence without the deleted characters and gaps
  - deleteFromRight
```
public static java.lang.String deleteFromRight(java.lang.String seq,
                                               int count)
```
    Deletes count characters from the right side of the sequence which are not gaps ("-").
    
    Parameters:
    
    seq - the original sequence
    
    count - the number of characters to be removed
    
    Returns:
    
    the sequence without the deleted characters and gaps
  - deleteLeadingGaps
```
public static java.lang.String deleteLeadingGaps(java.lang.String seq)
```
    Deletes all leading gaps from a sequence.
    
    Parameters:
    
    seq - the sequence from which the leading gaps shall be removed
    
    Returns:
    
    the sequence without leading gaps
  - deleteTrailingGaps
```
public static java.lang.String deleteTrailingGaps(java.lang.String seq)
```
    Deletes all trailing gaps from a sequence.
    
    Parameters:
    
    seq - the sequence from which the trailing gaps shall be removed
    
    Returns:
    
    the sequence without trailing gaps
  - deleteGapsFromLeft
```
public static java.lang.String deleteGapsFromLeft(java.lang.String seq,
                                                  int count)
```
    Deletes the specified number of gaps from the sequence starting on the left side. If the sequence contains fewer than the specified number of gaps, a sequence without gaps is returned.
    
    Parameters:
    
    seq - the sequence that contains the gaps
    
    count - the number of gaps to be removed
    
    Returns:
    
    the sequence with fewer gaps
  - deleteGapsFromRight
```
public static java.lang.String deleteGapsFromRight(java.lang.String seq,
                                                   int count)
```
    Deletes the specified number of gaps from the sequence starting on the right side. If the sequence contains fewer than the specified number of gaps, a sequence without gaps is returned.
    
    Parameters:
    
    seq - the sequence that contains the gaps
    
    count - the number of gaps to be removed
    
    Returns:
    
    the sequence with fewer gaps
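    One way to read the two gap deletion methods is sketched below. `GapDeletionSketch` is a hypothetical class. The sketch assumes that "starting on the left side" means the leftmost gaps anywhere in the sequence (not only leading gaps) and that "-" is the only gap character; the actual methods may differ in those details.

```java
// Hypothetical illustration class; not part of info.bioinfweb.commons.
final class GapDeletionSketch {
    // Removes up to count gaps, scanning from the left end of the sequence.
    static String deleteGapsFromLeft(String seq, int count) {
        StringBuilder result = new StringBuilder(seq.length());
        int removed = 0;
        for (int i = 0; i < seq.length(); i++) {
            char c = seq.charAt(i);
            if (c == '-' && removed < count) {
                removed++;  // skip this gap
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    // Removes up to count gaps, scanning from the right end of the sequence.
    static String deleteGapsFromRight(String seq, int count) {
        StringBuilder result = new StringBuilder(seq.length());
        int removed = 0;
        for (int i = seq.length() - 1; i >= 0; i--) {
            char c = seq.charAt(i);
            if (c == '-' && removed < count) {
                removed++;
            } else {
                result.append(c);
            }
        }
        return result.reverse().toString();  // characters were collected backwards
    }

    public static void main(String[] args) {
        System.out.println(deleteGapsFromLeft("A-C-G-T", 2));   // prints "ACG-T"
        System.out.println(deleteGapsFromRight("A-C-G-T", 2));  // prints "A-CGT"
    }
}
```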
  - deleteLeadingTrailingGaps
```
public static java.lang.String deleteLeadingTrailingGaps(java.lang.String seq)
```
    Deletes all leading and trailing gaps of the specified sequence.
    
    Parameters:
    
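    seq - the sequence from which the leading and trailing gaps shall be removed
    
    Returns:
    
    the sequence without leading and trailing gaps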
  - deleteAllGaps
```
public static java.lang.String deleteAllGaps(java.lang.CharSequence seq)
```
    Returns the specified sequence without any gaps.
    
    Parameters:
    
    seq - the sequence from which all gaps shall be removed
    
    Returns:
    
    the sequence without any gaps
  - randSequence
```
public static java.lang.String randSequence(boolean dna,
                                            int length,
                                            double rateC,
                                            double rateG,
                                            double rateA)
```
    Returns a sequence of random DNA or RNA characters. (The three rates together have to be lower than 1. The rate for thymine or uracil is determined from the others.)
    
    Parameters:
    
    dna - determines whether a DNA sequence shall be returned (RNA if false)
    
    length - the length of the sequence
    
    rateC - the rate for cytosine
    
    rateG - the rate for guanine
    
    rateA - the rate for adenine
    
    Returns:
    
    the random sequence
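    Sampling with per-nucleotide rates can be sketched by comparing one uniform random number against cumulative thresholds. `RandomSequenceSketch` is a hypothetical class, and the explicit `java.util.Random` parameter is an addition made here for reproducibility; the documented method has no such parameter. How the single-rate variant splits its rate is not stated above, so the even split between C and G (and between A and T/U) is an assumption.

```java
import java.util.Random;

// Hypothetical illustration class; not part of info.bioinfweb.commons.
final class RandomSequenceSketch {
    static String randSequence(Random rng, boolean dna, int length,
                               double rateC, double rateG, double rateA) {
        if (rateC < 0 || rateG < 0 || rateA < 0 || rateC + rateG + rateA >= 1.0) {
            throw new IllegalArgumentException("Rates must be non-negative and sum to less than 1.");
        }
        char remainder = dna ? 'T' : 'U';  // receives the remaining probability
        StringBuilder result = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            double r = rng.nextDouble();  // uniform in [0, 1)
            if (r < rateC) {
                result.append('C');
            } else if (r < rateC + rateG) {
                result.append('G');
            } else if (r < rateC + rateG + rateA) {
                result.append('A');
            } else {
                result.append(remainder);
            }
        }
        return result.toString();
    }

    // Assumption: rateCG is shared evenly by C and G, the rest evenly by A and T/U.
    static String randSequence(Random rng, boolean dna, int length, double rateCG) {
        return randSequence(rng, dna, length, rateCG / 2, rateCG / 2, (1 - rateCG) / 2);
    }

    public static void main(String[] args) {
        // A fixed seed makes the output repeatable between runs.
        System.out.println(randSequence(new Random(42), true, 20, 0.3, 0.3, 0.2));
        System.out.println(randSequence(new Random(42), false, 20, 0.6));
    }
}
```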
  - randSequence
```
public static java.lang.String randSequence(boolean dna,
                                            int length,
                                            double rateCG)
```
    Returns a sequence of random DNA or RNA characters.
    
    Parameters:
    
    dna - determines whether a DNA sequence shall be returned (RNA if false)
    
    length - the length of the sequence
    
    rateCG - the rate for cytosine and guanine (must be lower than 1)
    
    Returns:
    
    the random sequence
  - nucleotideFrequencies
```
public static java.util.Map<java.lang.Character,java.lang.Double> nucleotideFrequencies(char[] alignmentColumn)
```
    Counts the nucleotide frequencies in the specified alignment column.
    IUPAC ambiguity codes are supported and counted accordingly for several nucleotides. (Examples: N would be counted as 0.25 for each nucleotide, R would be counted as 0.5 for A and 0.5 for G.)
    
    Parameters:
    
    alignmentColumn - the contents of the alignment column from which the consensus shall be calculated
    
    Returns:
    
    a map with the nucleotides as keys and their frequencies as values
  - nucleotideConsensus
```
public static char nucleotideConsensus(char[] alignmentColumn)
```
    Returns the nucleotide that occurs most often in the specified alignment column. IUPAC ambiguity codes are recognized and counted accordingly for several nucleotides.
    
    Parameters:
    
    alignmentColumn - the contents of the alignment column from which the consensus shall be calculated
    
    Returns:
    
    the most frequent nucleotide (If several nucleotides occur equally often, one of them is chosen arbitrarily.)
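    The fractional counting of ambiguity codes and the choice of a consensus can be sketched together. `ColumnConsensusSketch` is a hypothetical class, not the code behind nucleotideFrequencies or nucleotideConsensus. Several details are assumptions: the map holds weighted counts that are not normalized, 'U' is counted as 'T', gaps and unknown characters are ignored, and '-' is returned for a column without any nucleotide.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of info.bioinfweb.commons.
final class ColumnConsensusSketch {
    // Nucleotides a character can stand for; empty for gaps and unknown characters.
    private static String constituents(char c) {
        switch (Character.toUpperCase(c)) {
            case 'A': return "A";
            case 'C': return "C";
            case 'G': return "G";
            case 'T': case 'U': return "T";
            case 'R': return "AG";
            case 'Y': return "CT";
            case 'S': return "CG";
            case 'W': return "AT";
            case 'K': return "GT";
            case 'M': return "AC";
            case 'B': return "CGT";
            case 'D': return "AGT";
            case 'H': return "ACT";
            case 'V': return "ACG";
            case 'N': return "ACGT";
            default:  return "";
        }
    }

    // Each character contributes a total weight of 1, split evenly over its constituents.
    static Map<Character, Double> frequencies(char[] alignmentColumn) {
        Map<Character, Double> result = new TreeMap<>();
        for (char c : alignmentColumn) {
            String parts = constituents(c);
            for (int i = 0; i < parts.length(); i++) {
                result.merge(parts.charAt(i), 1.0 / parts.length(), Double::sum);
            }
        }
        return result;
    }

    // Picks the nucleotide with the highest weight; ties go to the first key in map order.
    static char consensus(char[] alignmentColumn) {
        char best = '-';
        double bestWeight = 0.0;
        for (Map.Entry<Character, Double> entry : frequencies(alignmentColumn).entrySet()) {
            if (entry.getValue() > bestWeight) {
                best = entry.getKey();
                bestWeight = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        char[] column = {'A', 'R', 'N', '-'};
        System.out.println(frequencies(column));  // prints "{A=1.75, C=0.25, G=0.75, T=0.25}"
        System.out.println(consensus(column));    // prints "A"
    }
}
```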
  - aminoAcidFrequencies
```
public static java.util.Map<java.lang.Character,java.lang.Double> aminoAcidFrequencies(java.lang.String[] alignmentColumn)
```
  - aminoAcidConsensus
```
public static char aminoAcidConsensus(java.lang.String[] alignmentColumn)
```
