PHP Search Engine for Zen Cart: Sphinx or Sphider?
A search engine module for Zen Cart based on Sphider was recently developed
(see
http://www.zen-cart.com/index.php?main_page=product_contrib_info&cPath=40_47&products_id=1194).
This module works well for a web site with a lot of EasyPages. But for a web
site with 50,000 products and structured data, Sphinx is still the better choice.
Features of Sphinx
SQL Full-text Search Engine
- Indexes via SphinxSE direct connection to MySQL or PostgreSQL
- Accepts data in XML format when using the Sphinx API
- Note that this package does not include a robot spider for traversing web sites.
- High-speed indexing, up to 10 MB/sec on a fast CPU
- Can store up to 100 GB of text / 100 million documents on a single CPU
- Indexes any number of fields and metadata tags, which can be changed on the fly
- Stemming for English and Russian words
- Simple search modes: “match all”, “match any”, “match phrase”
- Also supports Boolean expressions (using &, |, -, and parentheses)
- Field searching, wildcarding, and proximity operators
- Fast searching (under 0.1 second on a 2-4 GB index)
- Stemming query processing for English and Russian natively, and with libstemmer for French, Spanish, Portuguese, Italian, German, Dutch, Swedish, Norwegian, Danish, Finnish, Hungarian
- Relevance based on BM25 (a flavor of TF-IDF) and phrase-based ranking
- Sort by relevance, metadata in ascending or descending order, time range, or a SQL-like expression
- Group results by day, week, month, year, or an attribute value, and show only the best match per group
- Can adjust relevance according to fields (even at runtime)
- Options for faceted metadata display
- XML results interface for custom integration
- Can replace SQL WHERE, ORDER BY, and GROUP BY query elements
- APIs for PHP, Python, Java, Perl, and Ruby
- Can utilize many CPU cores
- Distributed search: multiple search engines can search segmented indexes on multiple machines
- Scales to at least 1.2 billion records, 30 million searches per day on a 15-server installation
- Source code and binaries available
- Administration by config file
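The feature list names BM25 as the basis of Sphinx's relevance ranking. To give a rough idea of what that family of scoring does, here is a minimal Python sketch of the textbook Okapi BM25 formula. It is not Sphinx's actual ranker: the function name `bm25_scores`, the parameter values `k1=1.2` and `b=0.75`, and the toy product titles are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook Okapi BM25.

    Illustrative sketch only, not Sphinx's internal ranker; k1 and b are
    commonly used default values, assumed here.
    """
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        term_counts = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_counts[term]
            if tf == 0:
                continue
            # The "1 +" inside the log keeps the IDF non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Longer-than-average documents are penalized through b.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical product titles, already lowercased and split into words.
docs = [
    "red cotton shirt".split(),
    "blue cotton shirt with red collar".split(),
    "leather wallet".split(),
]
scores = bm25_scores(["red", "shirt"], docs)
```

In this toy data the first two titles both contain "red" and "shirt" once, but the shorter title scores higher because of the length normalization, and the wallet scores zero.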
Sphider Features:
Spidering and indexing
* Performs full text indexing.
* Can index both static and dynamic
pages.
* Finds links in href, frame, area and
meta tags, and can also follow links given in JavaScript as strings via
window.location and window.open.
* Respects robots.txt protocol, and
nofollow and noindex tags.
* Follows server side redirections.
* Allows spidering to be limited by
depth (i.e. the maximum number of clicks from the starting page), by
(sub)domain or by directory.
* Allows spidering only the URLs
matching (or not matching) certain keywords or regular expressions.
* Supports indexing of PDF and DOC files
(using external binaries for file conversion).
* Allows resuming paused spidering.
* Possibility to exclude common words
from being indexed.
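The depth and (sub)domain limits described above can be sketched as a breadth-first traversal. The Python below is only an illustration of the idea, not Sphider's own code (Sphider is written in PHP): the `crawl` function, the `get_links` callback, and the made-up link graph are all hypothetical, and a real spider would fetch and parse pages instead of reading a dictionary.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start_url, get_links, max_depth, allowed_host=None):
    """Breadth-first traversal that stops following links beyond max_depth.

    get_links is a caller-supplied function (hypothetical here) returning
    the URLs found on a page.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # index this page, but do not follow its links
        for link in get_links(url):
            if allowed_host and urlparse(link).hostname != allowed_host:
                continue  # outside the permitted (sub)domain
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# A made-up link graph standing in for real pages.
LINKS = {
    "http://shop.example.com/": [
        "http://shop.example.com/a",
        "http://other.example.org/",
    ],
    "http://shop.example.com/a": ["http://shop.example.com/b"],
    "http://shop.example.com/b": ["http://shop.example.com/c"],
}

pages = crawl(
    "http://shop.example.com/",
    lambda url: LINKS.get(url, []),
    max_depth=2,
    allowed_host="shop.example.com",
)
```

With `max_depth=2`, the page `/c` is three clicks from the start and is never reached, and the link to the other host is skipped by the domain check.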
Searching
* Supports AND, OR and phrase searches
* Supports excluding words (by putting a
‘-’ in front of a word, any page including the word will be omitted
from the results).
* Option to add and group sites into
categories
* Possibility to limit searching to a
given category and its subcategories.
* Possibility of searching in a specified
domain only.
* “Did you mean” search suggestion on
mistyped queries.
* Context-sensitive auto-completion on
search terms (a la Google Suggest)
* Word stemming for English (searching
for “run” finds “running”, “runs” etc).
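As a small illustration of the AND search and the ‘-’ exclusion described above, the following Python sketch filters a set of pages by required and excluded words. It is a simplified assumption of how such matching can work, not Sphider's implementation: the `search` function and the sample pages are hypothetical, and it ignores stemming, phrases and ranking.

```python
def search(query, pages):
    """Return the names of pages containing every plain term and none of the
    terms prefixed with '-'. A simplified sketch, not Sphider's own code."""
    required, excluded = set(), set()
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            required.add(token)

    results = []
    for name, text in pages.items():
        words = set(text.lower().split())
        # All required words must be present, no excluded word may be.
        if required <= words and not (excluded & words):
            results.append(name)
    return results


# Made-up page contents for illustration.
PAGES = {
    "page1": "red cotton shirt",
    "page2": "red leather wallet",
    "page3": "blue cotton shirt",
}

hits = search("red -wallet", PAGES)
```

Here the query `red -wallet` keeps only the page that mentions "red" and drops the one that also mentions "wallet".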
Comparison of Sphider and Sphinx PHP search engines
| Feature | Sphider | Sphinx |
|---|---|---|
| Overall ranking | *** | *** |
| Database | MySQL, SQLite | MySQL, PostgreSQL, Flat files |
| Multilanguage support | No | Yes |
| Support | Medium (forum) | Good |
| User interface | Easy | Easy |
| Customizability | High | High |
| PHP 5 compatible | Yes | Yes |
| SQLite compatible | No | No |
| URL-free crawling | Yes | Yes |
| Install package download | 44K | ~300K |
| Installation | Medium | Easy |
| Access needed to install | Root | Shell (non root) |
| Recommended file limit | High | Very high |
| Index speed | Very slow | 4-10 MB/sec |
Ref: PHP Search Engine Showdown,
http://www.onlamp.com/pub/a/php/2006/02/16/search-engine-showdown.html?page=2