Parameter-free, alignment-free comparison of regulatory sequences (cis-regulatory modules)


Abstract:
One of the most basic questions in bioinformatics is how to measure the similarity between biological sequences. For protein sequences and coding genes this is probably the most studied problem, as it is closely related to the identification of homologous sequences. Using tools like BLAST \cite{blast} to assess the degree of similarity between two sequences is nowadays a standard procedure. In this paper we focus on the same question, but for regulatory sequences such as gene promoters and enhancers.
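To make the idea of alignment-free comparison concrete, here is a minimal Java sketch of the D2 statistic. D2 is a classic fixed-resolution measure that compares two sequences through the counts of their shared words of a fixed length k, with no alignment step. It is not the UnderII method. UnderII is presented as parameter-free, whereas in this sketch k is an assumed, user-chosen parameter. The class and method names (KmerD2, countKmers, d2, normalizedD2) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative fixed-resolution alignment-free measure (the D2 statistic).
 * NOT the UnderII algorithm: here the word length k is a user-chosen parameter.
 */
public class KmerD2 {

    /** Counts every overlapping k-mer (word of length k) in seq. */
    static Map<String, Integer> countKmers(String seq, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + k <= seq.length(); i++) {
            counts.merge(seq.substring(i, i + k), 1, Integer::sum);
        }
        return counts;
    }

    /** D2 = sum over k-mers w shared by a and b of countA(w) * countB(w). */
    static long d2(String a, String b, int k) {
        Map<String, Integer> ca = countKmers(a.toUpperCase(), k);
        Map<String, Integer> cb = countKmers(b.toUpperCase(), k);
        long score = 0;
        for (Map.Entry<String, Integer> e : ca.entrySet()) {
            Integer other = cb.get(e.getKey());
            if (other != null) {
                score += (long) e.getValue() * other;
            }
        }
        return score;
    }

    /** Cosine-normalised D2 in [0, 1], which reduces the effect of sequence length. */
    static double normalizedD2(String a, String b, int k) {
        double denom = Math.sqrt((double) d2(a, a, k) * d2(b, b, k));
        return denom == 0 ? 0.0 : d2(a, b, k) / denom;
    }

    public static void main(String[] args) {
        String s1 = "ACGTACGTTGCA";
        String s2 = "ACGTACGAAGCA";
        System.out.println("D2 = " + d2(s1, s2, 3));
        System.out.printf("normalised D2 = %.3f%n", normalizedD2(s1, s2, 3));
    }
}
```

Running the same pair with different values of k gives different scores. A parameter-free approach is meant to avoid exactly this kind of dependence on k.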

Software

Here you can find the Java application UnderII, together with some examples.
Unzip the following file: ZIP

Run UnderII using the command:
java -jar UnderII.jar SequenceFileName NumberOfSequences OutputFile

Here "SequenceFileName" is a FASTA file, "NumberOfSequences" is the number of sequences in that file, and "OutputFile" is the file to which all pairwise scores are written.
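The NumberOfSequences argument must match the input file. One way to obtain it is to count the FASTA header lines, each of which starts with '>'. The following is a minimal sketch under that assumption. CountFasta and countRecords are hypothetical names and are not part of UnderII.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/**
 * Hypothetical helper (not part of UnderII): counts the records in a FASTA
 * file, i.e. the value to pass as the NumberOfSequences argument.
 */
public class CountFasta {

    /** Each FASTA record begins with a header line starting with '>'. */
    static int countRecords(List<String> lines) {
        int n = 0;
        for (String line : lines) {
            if (line.startsWith(">")) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CountFasta SequenceFileName");
            System.exit(1);
        }
        Path fasta = Paths.get(args[0]);
        // Files.readAllLines reads the whole file as UTF-8, one String per line.
        int n = countRecords(Files.readAllLines(fasta));
        System.out.println(n);
    }
}
```

Compile with "javac CountFasta.java" and run "java CountFasta ForeBrain.fasta". The printed count is the value to use for NumberOfSequences.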

To run the included example, type:
java -jar UnderII.jar ForeBrain.fasta 10 Output_ForeBrain.txt

The file ForeBrain.fasta contains a set of mouse enhancers active in the forebrain.

Licence

The software is freely available for academic use.
For questions about the tool, please contact Matteo Comin or Davide Verzotto.

Reference

Please cite the following papers:
M. Comin, D. Verzotto,
"Beyond fixed-resolution alignment-free measures for mammalian enhancers sequence comparison"
Proceedings of the 12th Asia Pacific Bioinformatics Conference, 2014.
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2014, Vol. 11, No. 4, pp. 628-637.
(impact factor 1.7)