Sphinx

MySQL fulltext search engine



Overview

Sphinx is a fulltext search engine, primarily developed to implement efficient, high-quality fulltext searching through SQL databases, but also capable of searching through other kinds of locally available documents.

The Sphinx source distribution provides the following software:

  • indexer: a utility to create fulltext indices;
  • search: a simple utility to query fulltext indices from the command line;
  • searchd: a daemon to search through fulltext indices from external software (such as Web scripts);
  • sphinxapi: a set of API libraries for popular Web scripting languages (currently, PHP and Perl);
  • sphinxlib: an embeddable C++ library.


Features

Main Sphinx features are:

  • high indexing and search speed;
  • indexing MySQL databases directly;
  • indexing any kind of document through an XML interface;
  • support for phrase proximity ranking (a kind of passage ranking);
  • support for English and Russian stemming;
  • support for any number of document fields with on-the-fly configurable weights;
  • support for document groups (i.e., limiting search to a set of database subsections on-the-fly);
  • support for stopwords;
  • support for "match all" and "match any" search modes;
  • APIs for PHP, Perl and C++.
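
To make the "match all" and "match any" modes and stopword support more concrete, here is a minimal Python sketch of their semantics. It is not Sphinx code: the function names tokenize and matches, the mode strings, and the whitespace tokenization are all assumptions made for illustration.

```python
def tokenize(text, stopwords=frozenset()):
    """Lowercase, split on whitespace, and drop stopwords (illustrative only)."""
    return [word for word in text.lower().split() if word not in stopwords]


def matches(query, document, mode="all", stopwords=frozenset()):
    """Return True if the document satisfies the query in the given mode.

    mode="all": every query word must occur in the document.
    mode="any": at least one query word must occur in the document.
    """
    query_words = set(tokenize(query, stopwords))
    doc_words = set(tokenize(document, stopwords))
    if not query_words:
        # Nothing is left to search for once stopwords are removed.
        return False
    if mode == "all":
        return query_words <= doc_words
    if mode == "any":
        return bool(query_words & doc_words)
    raise ValueError(f"unknown mode: {mode!r}")
```

With a document such as "the quick brown fox", the query "quick cat" fails in "all" mode because "cat" is missing, but succeeds in "any" mode because "quick" is present.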

Some of the features planned for the near future:

  • indexing other databases (such as PostgreSQL, Interbase, etc.) directly;
  • support for classic TF*IDF (non-phrase) ranking;
  • support for a query language allowing for boolean queries, exact phrase queries, etc.;
  • support for extracting relevant document excerpts;
  • support for fuzzy word matching and query correction services;
  • support for query results caching.

Key Sphinx features are its speed and phrase proximity ranking.

As for the speed, indexing on modern machines (i.e., on 2000-3000 MHz CPUs) runs at up to 4000-6000 KB/sec, and most queries execute in 0.1 to 0.3 seconds even without stopword removal.

With phrase proximity ranking, the better the match between the query phrase and the document field, the higher the rank, with a perfect phrase match yielding the highest rank. Compared to the usual statistical TF*IDF ranking found in most other engines, this ranking scheme usually provides better results.
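
One way to picture phrase proximity ranking is to score a field by the longest run of consecutive query words it contains in the same order. The Python sketch below assumes exactly that; the page does not describe the actual formula Sphinx uses, so longest_common_run, phrase_rank, and the normalization to the 0..1 range are hypothetical.

```python
def longest_common_run(query_words, field_words):
    """Length of the longest run of consecutive words shared in the same order."""
    best = 0
    # prev[j] holds the length of the common run that ends at the previous
    # query word and at field word number j (1-based); 0 means no run.
    prev = [0] * (len(field_words) + 1)
    for query_word in query_words:
        cur = [0] * (len(field_words) + 1)
        for j, field_word in enumerate(field_words, start=1):
            if query_word == field_word:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def phrase_rank(query, field):
    """Score in [0, 1]; 1.0 means the whole query occurs as an exact phrase."""
    query_words = query.lower().split()
    field_words = field.lower().split()
    if not query_words:
        return 0.0
    return longest_common_run(query_words, field_words) / len(query_words)
```

Under this assumption, a field containing the full query as a phrase scores 1.0, a field containing the same words scattered or reordered scores lower, and a field sharing no words scores 0.0.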


Performance examples

To give you an idea of what Sphinx performance is like, we provide the following examples. Note that the data in this table is NOT from synthetic tests; it was obtained by analyzing logs on live Web sites where Sphinx is installed.

Site info      Documents  Text size  CPU       Indexing time    Search time
Lyrics site    140,000    152 MB     1.7 GHz   50-60 sec        0.11 sec avg
General forum  65,000     548 MB     1.26 GHz  500-550 sec (*)  0.04 sec avg
Web directory  3,220,000  348 MB     2.0 GHz   120-130 sec      0.09 sec avg

(*) on this site, most of the CPU time is spent in an external HTML filtering program.
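
The indexing times in the table can be converted into effective throughput for comparison with the KB/sec figures quoted earlier. The short Python sketch below does that arithmetic. It assumes the text sizes are in megabytes and that 1 MB = 1024 KB; both are assumptions, as are the helper names.

```python
# (site, text size in MB, (fastest, slowest) indexing time in seconds),
# copied from the table; the MB interpretation of the sizes is an assumption.
SITES = [
    ("Lyrics site", 152, (50, 60)),
    ("General forum", 548, (500, 550)),
    ("Web directory", 348, (120, 130)),
]


def throughput_kb_per_sec(size_mb, seconds, kb_per_mb=1024):
    """Effective indexing throughput for one run, in KB/sec."""
    return size_mb * kb_per_mb / seconds


def throughput_range(size_mb, time_range):
    """Return (lowest, highest) throughput for a (fastest, slowest) time range."""
    fastest, slowest = time_range
    return (
        throughput_kb_per_sec(size_mb, slowest),
        throughput_kb_per_sec(size_mb, fastest),
    )


for name, size_mb, time_range in SITES:
    low, high = throughput_range(size_mb, time_range)
    print(f"{name}: {low:.0f}-{high:.0f} KB/sec")
```

Keep in mind that the forum figure includes time spent in the external HTML filter, so it understates the indexer's own speed.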


Get it

Sphinx is now finally available to the general public via its own sphinxsearch.com site.



  copyright © 1998-2004
all rights reserved