bktCluster vs. k-means run times for different sample sizes on a test data set with 200 random 5-dimensional clusters.
| N/Cluster | Total Points | bktCluster Time (s) | bktCluster Clusters | k-means Time (s) | Replications required to find all 200 actual centroids | Speedup |
|---|---|---|---|---|---|---|
| 100 | 20000 | 0.749 | 196 | 10.976 | 50 | 14.654 |
| 250 | 50000 | 0.748 | 196 | 23.564 | 50 | 31.494 |
| 500 | 100000 | 0.861 | 199 | 58.080 | 50 | 67.437 |
| 1000 | 200000 | 1.039 | 200 | 122.436 | 50 | 117.827 |
| 2500 | 500000 | 1.501 | 200 | N/A | N/A | N/A |
| 5000 | 1000000 | 2.185 | 200 | N/A | N/A | N/A |
| 10000 | 2000000 | 3.453 | 200 | N/A | N/A | N/A |
| 100 | 20000 | 0.726 | 200 | 10.976 | 50 | 15.109 |
| 250 | 50000 | 0.792 | 200 | 23.564 | 50 | 29.736 |
| 500 | 100000 | 0.884 | 200 | 58.080 | 50 | 65.710 |
First seven rows: bktCluster large sample algorithm (default) using the default partition split threshold.
Last three rows: bktCluster large sample algorithm (default) using user-defined partition split thresholds.
N/A: the k-means run time exceeded 5 minutes for the 50-replication test.
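The Speedup column appears to be the k-means time divided by the bktCluster time. That definition is an assumption inferred from the figures, not something the table states. The short Python sketch below recomputes the ratio for every row that has a k-means measurement so the reported values can be checked; the small differences are consistent with the published times having been rounded.

```python
# Rows from the table that have a k-means timing:
# (total points, bktCluster time in s, k-means time in s, reported speedup)
rows = [
    (20000, 0.749, 10.976, 14.654),
    (50000, 0.748, 23.564, 31.494),
    (100000, 0.861, 58.080, 67.437),
    (200000, 1.039, 122.436, 117.827),
    (20000, 0.726, 10.976, 15.109),
    (50000, 0.792, 23.564, 29.736),
    (100000, 0.884, 58.080, 65.710),
]


def speedup(kmeans_s, bkt_s):
    """Assumed definition: k-means time divided by bktCluster time."""
    return kmeans_s / bkt_s


for total, bkt_s, kmeans_s, reported in rows:
    computed = speedup(kmeans_s, bkt_s)
    print(f"{total:>7} points: computed {computed:8.3f}, reported {reported:8.3f}")
```

Rows marked N/A are left out because no k-means time was recorded for them.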
Note 1: Cluster counts are not provided for k-means because the number of clusters, 200, is an input to the algorithm, so it always returns 200 clusters.
Note 2: The actual cluster centroids were found using k-means at a replication count of 50 for all sample sizes. For these data sets, at least 50 replications should therefore be used regardless of sample size, since the actual centroids would not be known in advance.
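To illustrate what a replication count means for k-means, here is a minimal pure-Python sketch. It is not the benchmark code used for the table: the helper names (`make_clusters`, `kmeans_once`, `kmeans_replicated`), the Gaussian test-data generator, and the scaled-down sizes (5 clusters rather than 200) are all assumptions chosen so the example runs in well under a second. Each replication is one run of Lloyd's algorithm from a different random start, and the run with the lowest within-cluster sum of squares is kept.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_clusters(n_clusters, n_per_cluster, dim, seed=0):
    """Hypothetical test data: Gaussian blobs around uniformly random centers."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_clusters):
        center = [rng.uniform(0, 100) for _ in range(dim)]
        points += [tuple(rng.gauss(c, 1.0) for c in center)
                   for _ in range(n_per_cluster)]
    return points


def kmeans_once(points, k, rng, max_iter=50):
    """One Lloyd's-algorithm run from a random start; returns (centroids, inertia)."""
    centroids = rng.sample(points, k)
    dim = len(points[0])
    for _ in range(max_iter):
        # Assignment step: group each point under its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        new_centroids = [
            tuple(sum(p[d] for p in g) / len(g) for d in range(dim)) if g
            else centroids[j]
            for j, g in enumerate(groups)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia


def kmeans_replicated(points, k, replications, seed=0):
    """Run k-means `replications` times and keep the lowest-inertia result."""
    rng = random.Random(seed)
    best = None
    for _ in range(replications):
        result = kmeans_once(points, k, rng)
        if best is None or result[1] < best[1]:
            best = result
    return best


points = make_clusters(n_clusters=5, n_per_cluster=30, dim=5)
_, inertia_1 = kmeans_replicated(points, k=5, replications=1)
centroids_10, inertia_10 = kmeans_replicated(points, k=5, replications=10)
print(f"1 replication: {inertia_1:.1f}, 10 replications: {inertia_10:.1f}")
```

Because both calls use the same seed, the 10-replication run includes the single-replication run as its first attempt, so its best inertia can never be worse. Total run time grows roughly linearly with the replication count, which is why the replication requirement matters for the timings above.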