Tuesday, February 12th, 2008...2:15 am

Data clustering methods

For hierarchical clustering, have a look at this resource.

While browsing the web, I came across Carrot2, an interesting open source framework for search results clustering engines. Here is their table of algorithms (original source):

Algorithm      Author               Time (s)*            Hierarchical  Other features              Example
                                    100   200   400      clustering                                results
FuzzyAnts      Steven Schockaert    2.17  8.70  16.93    yes           -                           london
HAOG-STC       Karol Gołembniak     0.04  0.11  0.28     yes           -                           london
Lingo**        Stanisław Osiński    0.34  0.52  0.84     no            multilingual clustering     london
Rough k-means  Ngo Chi Lang         1.38  6.76  27.73    no            -                           london
STC            Oren Zamir           0.04  0.10  0.23     no            -                           london
               (impl: Dawid Weiss)
Lingo3G***     Stanisław Osiński    0.03  0.06  0.13     yes           multilingual clustering,    london
               (Carrot Search)                                         synonyms, advanced tuning,
                                                                       scalability (5000 snippets
                                                                       in 1.3 s*)

*) Times are for clustering 100, 200 and 400 snippets downloaded from Yahoo! for the query ‘london’, using the Carrot2 standalone GUI application. Benchmark environment: Pentium M 1.3 GHz, 768 MB RAM, Windows XP. Java Virtual Machine: Sun JDK 1.4.2, JVM switches: -Xmx512m -Xms128m -XX:NewRatio=1 -server. Each time in the table is an average of 75 runs; for each algorithm, the timed runs were preceded by 25 untimed warm-up runs.

**) Lingo is the default clustering algorithm used in the Carrot2 live demos.

***) Lingo3G is a commercial document clustering engine and is not available in the open source part of Carrot2. Please contact Carrot Search for details.
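The timing procedure in the first footnote (untimed warm-up runs, then the mean of the timed runs) is easy to reproduce for your own experiments. Below is a minimal Python sketch of that protocol; it is not the Carrot2 benchmark code, which ran on Java, and the function name `benchmark`, its parameters and the `group_by_last_word` stand-in workload are all hypothetical.

```python
import statistics
import time
from typing import Callable, List


def benchmark(task: Callable[[], object], warmup_runs: int = 25, timed_runs: int = 75) -> float:
    """Mean wall-clock time in seconds of `task` over `timed_runs` calls,
    measured only after `warmup_runs` untimed calls (hypothetical helper)."""
    if timed_runs < 1:
        raise ValueError("timed_runs must be at least 1")
    # Untimed warm-up: lets caches settle (on a JVM it also gives the JIT time to compile hot code).
    for _ in range(warmup_runs):
        task()
    samples: List[float] = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        task()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


if __name__ == "__main__":
    snippets = [f"snippet {i} about london" for i in range(400)]

    # Hypothetical stand-in for a real clustering call: group snippets by their last word.
    def group_by_last_word() -> dict:
        groups: dict = {}
        for s in snippets:
            groups.setdefault(s.split()[-1], []).append(s)
        return groups

    print(f"mean time over 75 runs: {benchmark(group_by_last_word):.6f} s")
```

Averaging over many runs smooths out timer noise and background activity; the warm-up matters most on a JIT-compiled runtime such as the JVM used in the original benchmark.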
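The timings in the table also give a rough idea of how each algorithm scales with the number of snippets. One simple reading (my own, not something the Carrot2 authors claim) is to fit t ≈ c·n^k by least squares on a log-log scale over the 100, 200 and 400 snippet columns: k near 1 suggests roughly linear growth, k near 2 roughly quadratic. With only three points per algorithm, and the fastest times given to just two decimals, treat the exponents as rough indicators. The names `SIZES`, `TIMES` and `loglog_slope` are hypothetical.

```python
import math
from typing import Sequence

# Clustering times in seconds for 100, 200 and 400 snippets, copied from the table above.
SIZES = (100, 200, 400)
TIMES = {
    "FuzzyAnts": (2.17, 8.70, 16.93),
    "HAOG-STC": (0.04, 0.11, 0.28),
    "Lingo": (0.34, 0.52, 0.84),
    "Rough k-means": (1.38, 6.76, 27.73),
    "STC": (0.04, 0.10, 0.23),
    "Lingo3G": (0.03, 0.06, 0.13),
}


def loglog_slope(sizes: Sequence[float], times: Sequence[float]) -> float:
    """Least-squares slope k of log(t) against log(n), i.e. the exponent in t ~ c * n**k."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


for name, times in TIMES.items():
    print(f"{name:14s} k = {loglog_slope(SIZES, times):.2f}")
```

On these numbers, Rough k-means comes out at about 2.2 and FuzzyAnts at about 1.5, while Lingo lands below 1, which most likely means fixed per-run overhead dominates at these small input sizes rather than that the algorithm is sublinear.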
