Biometrics Northwest LLC

Performing Data Analysis and Modeling

bktCluster vs. k-means run times at different sample sizes, using a test data set of 200 random 5-dimensional clusters.

N/Cluster  Total     bktCluster  bktCluster  k-means   Replications needed to find  Speedup
           Points    Time (s)    Clusters    Time (s)  first actual centroid match
100        20000     0.749       196         2.120     10                           2.830
250        50000     0.748       196         2.289     5                            3.059
500        100000    0.861       199         5.945     5                            6.903
1000       200000    1.039       200         61.204    25                           58.900
2500       500000    1.501       200         N/A       N/A                          N/A
5000       1000000   2.185       200         N/A       N/A                          N/A
10000      2000000   3.453       200         N/A       N/A                          N/A

100        20000     0.726       200         2.120     10                           2.918
250        50000     0.792       200         2.289     5                            2.889
500        100000    0.884       200         5.945     5                            6.726

  bktCluster large sample algorithm (default) with default distance threshold
  bktCluster large sample algorithm (default) with a distance threshold of 25
N/A: k-means time exceeded 5 minutes for the 50 replication test (see Note 2).
Note 1: Cluster counts are not provided for k-means since the number of clusters, 200, is an input to the algorithm and it will always find 200 clusters.
Note 2: The actual cluster centroids were found using k-means for all sample sizes at a replication count of 50. This means that for these data sets at least 50 replications should be used, regardless of sample size, since the actual centroids would not be known in advance. The timing results for 50 replications can be found here.
Note 3: Replication counts used were 1, 5, 10, 25, and 50.
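
The Speedup column appears to be the k-means time divided by the bktCluster time for the same sample size. That definition is an assumption, since the page does not state it, but it reproduces the published values to within rounding of the listed times. A minimal Python sketch that recomputes the ratio from the first five rows of the table (the function and variable names are illustrative only):

```python
# Timings copied from the first five rows of the table above.
# Each entry: (total points, bktCluster seconds, k-means seconds or None for N/A).
TIMINGS = [
    (20000, 0.749, 2.120),
    (50000, 0.748, 2.289),
    (100000, 0.861, 5.945),
    (200000, 1.039, 61.204),
    (500000, 1.501, None),
]


def speedup(bkt_seconds, kmeans_seconds):
    """Assumed definition: k-means time / bktCluster time.

    Returns None when no k-means timing is available (N/A in the table).
    """
    if kmeans_seconds is None:
        return None
    return kmeans_seconds / bkt_seconds


for total, bkt, km in TIMINGS:
    s = speedup(bkt, km)
    label = "N/A" if s is None else f"{s:.3f}"
    print(f"{total:>8} points: speedup {label}")
```

Small differences in the last digit relative to the table are expected if the published speedups were computed from unrounded timings.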

For information send email to: info@biometricsnw.com

Last Update: October 20, 2024

Copyright 2005-2024 Biometrics Northwest LLC