Features
The main Sphinx features are:
- high indexing and search speed;
- indexing MySQL databases directly;
- indexing any kind of document through an XML interface;
- support for phrase proximity ranking (a kind of passage ranking);
- support for English and Russian stemming;
- support for any number of document fields with on-the-fly configurable weights;
- support for document groups (i.e. limiting search to a set of database subsections on the fly);
- support for stopwords;
- support for "match all" and "match any" search modes;
- APIs for PHP, Perl and C++.
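The "match all" and "match any" search modes and the stopword support listed above can be illustrated with a minimal sketch. It is not Sphinx code and does not use the Sphinx API: the names tokenize and matches, the STOPWORDS set and the whitespace tokenizer are all assumptions made for the example.

```python
# Hypothetical sketch: none of these names come from the Sphinx API.
STOPWORDS = {"the", "a", "of", "and"}  # assumed example stopword list


def tokenize(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in stopwords]


def matches(query, document, mode="all"):
    """Return True if the document satisfies the query in the given mode."""
    query_words = set(tokenize(query))
    doc_words = set(tokenize(document))
    if not query_words:
        return False  # nothing left to match after stopword removal
    if mode == "all":
        return query_words <= doc_words  # every query word must occur
    if mode == "any":
        return bool(query_words & doc_words)  # at least one query word occurs
    raise ValueError(f"unknown mode: {mode!r}")


docs = ["the history of search engines", "fast search", "cooking at home"]
print([d for d in docs if matches("search engines", d, mode="all")])
print([d for d in docs if matches("search engines", d, mode="any")])
```

In this sketch "match all" keeps only the first document, while "match any" also keeps the second one, since it contains one of the two query words.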
Features planned for the near future include:
- indexing other databases (such as PostgreSQL, Interbase, etc.) directly;
- support for classic TF*IDF (non-phrase) ranking;
- support for a query language allowing boolean queries, exact phrase queries, etc.;
- support for extracting relevant document excerpts;
- support for fuzzy word matching and query correction services;
- support for query results caching.
Key Sphinx features are its speed and phrase proximity
ranking.
As for the speed, indexing on modern machines (i.e. on 2000-3000 MHz CPUs)
runs at up to 4000-6000 KB/sec, and most queries execute in 0.1 to 0.3 seconds
even without stopword removal.
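To put the indexing rate in perspective, the following back-of-the-envelope calculation estimates how long a corpus would take to index at the quoted 4000-6000 KB/sec. The 1 GB corpus size and the helper name indexing_seconds are assumptions chosen for illustration, not measurements.

```python
def indexing_seconds(corpus_kb, rate_kb_per_sec):
    """Time needed to index corpus_kb kilobytes at a constant rate."""
    return corpus_kb / rate_kb_per_sec


corpus_kb = 1024 * 1024  # assumed corpus size: 1 GB expressed in KB
slow = indexing_seconds(corpus_kb, 4000)  # lower end of the quoted rate
fast = indexing_seconds(corpus_kb, 6000)  # upper end of the quoted rate
print(f"{fast:.0f} to {slow:.0f} seconds")  # prints "175 to 262 seconds"
```

Under these assumptions, a 1 GB corpus would be indexed in roughly three to four and a half minutes.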
With phrase proximity ranking, the better the match between the query
phrase and the document field, the higher the rank, with a perfect phrase
match yielding the highest rank. Compared to the usual statistical TF*IDF ranking
found in most other engines, this ranking scheme usually provides better results.
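One simple way to picture phrase proximity ranking is to score a field by the longest run of consecutive query words that it contains in the same order. The sketch below follows that idea; it is an illustrative assumption and not necessarily how Sphinx computes its ranks. The function names proximity_rank and document_rank, the whitespace tokenizer, and the weighted-sum rule for combining fields are all hypothetical.

```python
def proximity_rank(query, field):
    """Length of the longest run of consecutive query words that also
    appears contiguously, in the same order, in the field."""
    q = query.lower().split()
    f = field.lower().split()
    best = 0
    # prev[j] holds the length of the common run ending at q[i-2] and f[j-1]
    prev = [0] * (len(f) + 1)
    for i in range(1, len(q) + 1):
        cur = [0] * (len(f) + 1)
        for j in range(1, len(f) + 1):
            if q[i - 1] == f[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run by one word
                best = max(best, cur[j])
        prev = cur
    return best


def document_rank(query, fields, weights):
    """Weighted sum of per-field ranks (an assumed combination rule)."""
    return sum(weights.get(name, 1) * proximity_rank(query, text)
               for name, text in fields.items())


print(proximity_rank("full text search", "a fast full text search engine"))
print(proximity_rank("full text search", "search the full text"))
```

Here the first field contains the whole query phrase and gets the maximum rank of 3, while the second contains all three words but only "full text" as a phrase, so it gets 2. With configurable field weights, document_rank would let a phrase match in a heavily weighted field count for more than the same match elsewhere.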
(*) On this site, most of the CPU time is spent
in an external HTML filtering program.