What is Genoogle?

Genoogle is software for fast DNA sequence similarity searching. It is a fully functional sequence similarity search tool with a text-mode interface that includes a simple scripting language, a web interface, and a web service interface. Genoogle is fast, easy to use, and completely free.

As a biotechnology researcher, you can use Genoogle to search a known database for sequences similar to your new sequences. Genoogle is easy to embed in other projects, either through its Java JAR files or through its web service interface. Genoogle is also very fast: by using data indexing and parallel computing techniques, it can be more than twenty-five times faster than BLAST-like software.

Genoogle is developed entirely in Java, and its full source code is available under the GPL3 license. It has been tested and run on Linux, Windows, and Mac OS X without modification.

Current features:

  • Fast similar-sequence searching.
  • Good sensitivity.
  • Text-mode interfaces.
  • Web service interface.
  • Very simple web interface, with support for JSP.
  • Modest memory requirements (a 4-gigabyte data bank needs no more than 3 gigabytes of RAM).
  • Works (and is tested) on Windows and Linux.
  • Support for data banks larger than 8 gigabytes.
  • Console and batch interfaces.

Missing and planned features:

  • Better web interface.
  • Proteins indexing and searching.
  • Clusters implementation.

This is a beta version of Genoogle, which means it lacks some features and has some known and unknown bugs.
So I hope that users (you!) will tell me about bugs and about features you would like to have.

For further information, read the Installation Guide, FAQ, and Manuscript.

The name Genoogle comes from Genes + Google. The ultimate world-domination plan is to develop software that locates genes the way Google locates information on the Web. Genoogle does *not* have *any* affiliation with Google Inc., and I hope its name will not cause problems.

Release-0.81

Jun 10

Genoogle BETA 0.81

Date: June 10, 2011.

Download link: http://svn.pih.bio.br/genoogle-packages/releases/0.81/

If you already have Genoogle, just download the new genoogle.jar from http://svn.pih.bio.br/genoogle-packages/releases/0.81/genoogle.jar and replace the old one. Otherwise, download the full package from http://svn.pih.bio.br/genoogle-packages/releases/0.81/genoogle.tar.gz

Main changes:

  • Fixed a bug in the display of double values.
  • Changed the XML output for alignments: it now uses query, target and align.
  • Insignificant alignments, that is, alignments with an e-value greater than 0.1, are now filtered out and removed.
  • When the query is a FASTA file with a header, the header is displayed in the XML output file.
  • Support for the contig FASTA header format, added specifically for the E. coli case.
  • Support for the EMBL FASTA header format.
  • Fixed a bug where the file run_standalone_web.sh was missing from the genoogle.tar.gz package.
  • Now uses protobuf version 2.4.
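
The e-value filtering mentioned in the list above can be sketched roughly as follows. This is only an illustration in Python, not Genoogle's actual Java code: the Alignment class, its field names and the function filter_significant are hypothetical, and only the 0.1 cutoff comes from the release notes.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Cutoff stated in the release notes: alignments with an e-value
# greater than 0.1 are considered insignificant and removed.
EVALUE_CUTOFF = 0.1


@dataclass
class Alignment:
    """Hypothetical alignment record; not Genoogle's real data structure."""
    target: str
    score: float
    evalue: float


def filter_significant(alignments: Iterable[Alignment],
                       cutoff: float = EVALUE_CUTOFF) -> List[Alignment]:
    """Keep only alignments whose e-value does not exceed the cutoff."""
    return [a for a in alignments if a.evalue <= cutoff]


# Made-up example values, for illustration only.
hits = [
    Alignment(target="hit_a", score=250.0, evalue=1e-60),
    Alignment(target="hit_b", score=12.0, evalue=0.1),
    Alignment(target="hit_c", score=8.0, evalue=3.2),
]
kept = filter_significant(hits)
print([a.target for a in kept])
```

An alignment with an e-value of exactly 0.1 is kept here, because the notes describe removing only values bigger than 0.1.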

Please use this Group to send questions, suggestions, and bug reports.

Posted by albrecht

Case: Escherichia_coli_TY-2482.contig.fa against ecoli.nt

Jun 08

One of the biggest public health stories right now is the "new" E. coli bacterium, which has so far killed 22 people in Germany.
The Genoogle project is always looking for "real world" tasks, so we decided to compare the E. coli O104 genome assembly against the E. coli genome provided by NCBI.

For that, the E. coli genome, approximately 4.5 Mb in size, was formatted using the following parameters:

mask="111010010100110111" sub-sequence-length="11" low-complexity-filter="5"
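
To illustrate how the mask and sub-sequence-length values above fit together, here is a small Python sketch. It assumes the mask works as a spaced seed, where a 1 keeps the base at that position and a 0 skips it. That reading, and the function names apply_mask and masked_subsequences, are assumptions made for this example and are not taken from the Genoogle source code.

```python
from typing import Iterator, Tuple

# Mask from the formatting parameters above: 18 positions, 11 of them '1',
# which matches sub-sequence-length="11".
MASK = "111010010100110111"


def apply_mask(window: str, mask: str = MASK) -> str:
    """Keep the bases at positions where the mask has '1' (assumed meaning)."""
    if len(window) != len(mask):
        raise ValueError("window and mask must have the same length")
    return "".join(base for base, bit in zip(window, mask) if bit == "1")


def masked_subsequences(sequence: str,
                        mask: str = MASK) -> Iterator[Tuple[int, str]]:
    """Slide the mask over the sequence, yielding (offset, masked sub-sequence)."""
    for start in range(len(sequence) - len(mask) + 1):
        yield start, apply_mask(sequence[start:start + len(mask)], mask)


window = "ACGTACGTACGTACGTAC"  # made-up 18-base window
print(len(MASK), MASK.count("1"))  # 18 11
print(apply_mask(window))          # ACGATCACTAC
```

Under this assumption, each 18-base window of the data bank is reduced to an 11-base sub-sequence before indexing, so two windows that differ only at the skipped positions still produce the same index entry.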

A search was then run with the Escherichia_coli_TY-2482.contig.fa sequence (5 Mb) using the following parameters:

max-sub-sequence-distance value="11" min-hsp-length="11" extend-dropoff="5"
max-hits-results="3" max-threads-index-search="4" max-threads-extend-align="16"

The full search of all 1217 contigs took 7.5 seconds, and the results can be viewed at http://pih.bio.br/genoogle/Escherichia_coli_TY-2482_X_ecoli.xml.
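
To put these numbers in perspective, the short Python calculation below derives average rates from the figures reported here. It treats the query file size as exactly 5 Mb, which is an approximation, and the variable names are only illustrative.

```python
# Figures reported in this post.
contigs = 1217          # number of contigs searched
elapsed_seconds = 7.5   # total search time
query_size_mb = 5.0     # approximate size of the query file

# Derived average rates.
contigs_per_second = contigs / elapsed_seconds
ms_per_contig = elapsed_seconds / contigs * 1000
mb_per_second = query_size_mb / elapsed_seconds

print(f"{contigs_per_second:.1f} contigs per second")
print(f"{ms_per_contig:.2f} ms per contig")
print(f"{mb_per_second:.2f} Mb of query per second")
```

That works out to roughly 162 contigs per second, or about 6 ms per contig on average.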

This shows that Genoogle is really fast, and it also reveals some interesting facts. For example, even very similar sections still contain small mutations, as at iteration 8, where a long similar region was found in AE000437 Escherichia coli K-12 MG1655 section 327 of 400 of the complete genome.

From the Genoogle development point of view, two things were observed:
- The results really need to display the input query sequence, so that each result has context. This will be done by displaying the input sequence header in the iteration field.
- Low-scoring, high e-value alignments should be filtered from the output. The output currently displays alignments of 4 or 5 base pairs with e-values higher than one, which are completely meaningless.

Both tasks are under development now. As soon as they are done, a new version will be released and a new search between E. coli genomes will be run.

Posted by albrecht