bktCluster vs. k-means run times for different sample sizes
on a test data set of 200 random 5-dimensional clusters.

bktCluster large sample algorithm (default) with default distance threshold:

| N/Cluster | Total Points | bktCluster Time (s) | bktCluster Clusters | k-means Time (s) | Replications needed to find first actual centroid match | Speedup |
|---|---|---|---|---|---|---|
| 100 | 20000 | 0.749 | 196 | 2.120 | 10 | 2.830 |
| 250 | 50000 | 0.748 | 196 | 2.289 | 5 | 3.059 |
| 500 | 100000 | 0.861 | 199 | 5.945 | 5 | 6.903 |
| 1000 | 200000 | 1.039 | 200 | 61.204 | 25 | 58.900 |
| 2500 | 500000 | 1.501 | 200 | N/A | N/A | N/A |
| 5000 | 1000000 | 2.185 | 200 | N/A | N/A | N/A |
| 10000 | 2000000 | 3.453 | 200 | N/A | N/A | N/A |

bktCluster large sample algorithm (default) with a distance threshold of 25:

| N/Cluster | Total Points | bktCluster Time (s) | bktCluster Clusters | k-means Time (s) | Replications needed to find first actual centroid match | Speedup |
|---|---|---|---|---|---|---|
| 100 | 20000 | 0.726 | 200 | 2.120 | 10 | 2.918 |
| 250 | 50000 | 0.792 | 200 | 2.289 | 5 | 2.889 |
| 500 | 100000 | 0.884 | 200 | 5.945 | 5 | 6.726 |

N/A: k-means time exceeded 5 minutes for the 50-replication test (see Note 2).

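The Speedup column is consistent with the k-means time divided by the bktCluster time. That definition is inferred from the table values rather than stated on the page, so the following is only a minimal sketch under that assumption. The `rows` list copies the default-threshold rows that have k-means timings, and the `speedup` helper is a hypothetical name used for illustration. Computed ratios agree with the reported ones to within about 0.01, which would be expected if the published times were rounded after the ratio was taken.

```python
# (total points, bktCluster time in s, k-means time in s, reported speedup)
# Values copied from the default-threshold rows of the table.
rows = [
    (20000, 0.749, 2.120, 2.830),
    (50000, 0.748, 2.289, 3.059),
    (100000, 0.861, 5.945, 6.903),
    (200000, 1.039, 61.204, 58.900),
]


def speedup(kmeans_s, bkt_s):
    """Assumed definition: k-means run time divided by bktCluster run time."""
    return kmeans_s / bkt_s


for total, bkt_s, km_s, reported in rows:
    computed = speedup(km_s, bkt_s)
    print(f"{total:>7}  computed={computed:.3f}  reported={reported:.3f}")
```
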
Note 1:
Cluster counts are not provided for k-means since the
number of clusters, 200, is an input to the algorithm and it
will always find 200 clusters.

Note 2:
The actual cluster centroids were found using k-means for all sample
sizes at a replication count of 50. This means that for these
data sets at least 50 replications should be used, regardless
of sample size, since the actual centroids would not be known
in advance. The timing results for 50 replications can be found
here.

Note 3:
Replication counts used were: 1, 5, 10, 25, and 50.
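
The replication search described in Notes 2 and 3 can be sketched as follows. This is an illustrative pure-Python reconstruction, not the code used to produce the table: the k-means here is a plain Lloyd's iteration with random restarts, the reference centroids are taken from a 50-replication run as Note 2 describes, and all function names, the seed, the tolerance, and the tiny two-cluster demo data set are assumptions.

```python
import math
import random


def kmeans_once(points, k, rng, iters=100):
    """One Lloyd's run from a random start; returns (centroids, total squared error)."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Keep the old centroid if a group ends up empty.
        updated = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)
    return centroids, sse


def kmeans(points, k, replications, seed=0):
    """Best of `replications` independent runs, judged by total squared error."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(replications)]
    return min(runs, key=lambda run: run[1])[0]


def centroids_match(found, actual, tol):
    """True if every actual centroid has some found centroid within `tol`."""
    return all(min(math.dist(a, f) for f in found) <= tol for a in actual)


def first_matching_count(points, actual, counts=(1, 5, 10, 25, 50), tol=1e-6):
    """Smallest replication count whose result matches `actual`, else None."""
    for n in counts:
        if centroids_match(kmeans(points, len(actual), n), actual, tol):
            return n
    return None


# Hypothetical demo data: two well-separated 2-D clusters of 50 points each.
demo_rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 10.0)]
points = [
    (demo_rng.gauss(cx, 0.5), demo_rng.gauss(cy, 0.5))
    for cx, cy in centers
    for _ in range(50)
]
reference = kmeans(points, k=2, replications=50)  # "actual" centroids, per Note 2
print(first_matching_count(points, reference))
```

Because the reference centroids themselves come from the 50-replication run, the search in this sketch always succeeds by a count of 50 at the latest. On harder data sets, such as the 200-cluster test set above, more restarts are typically needed before the best run coincides with the reference solution.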