Biometrics Northwest LLC

Performing Data Analysis and Modeling


bktCluster vs. k-means run times for different sample sizes, using a test data set of 200 random clusters in 5 dimensions.

bktCluster large sample algorithm (default), default partition split threshold:

N/Cluster  Total Points  bktCluster Time (s)  bktCluster Clusters  k-means Time (s)  Replications  Speedup
100        20000         0.749                196                  10.976            50            14.654
250        50000         0.748                196                  23.564            50            31.494
500        100000        0.861                199                  58.080            50            67.437
1000       200000        1.039                200                  122.436           50            117.827
2500       500000        1.501                200                  N/A               N/A           N/A
5000       1000000       2.185                200                  N/A               N/A           N/A
10000      2000000       3.453                200                  N/A               N/A           N/A

bktCluster large sample algorithm (default), user-defined partition split thresholds:

N/Cluster  Total Points  bktCluster Time (s)  bktCluster Clusters  k-means Time (s)  Replications  Speedup
100        20000         0.726                200                  10.976            50            15.109
250        50000         0.792                200                  23.564            50            29.736
500        100000        0.884                200                  58.080            50            65.710

N/Cluster: number of points per cluster.
Replications: number of k-means replications required to find all 200 actual centroids.
Speedup: k-means time divided by bktCluster time.
N/A: k-means run time exceeded 5 minutes for the 50-replication test.
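The Speedup column is consistent with k-means time divided by bktCluster time. The short check below recomputes it from the rounded times in the table; the recomputed values agree with the reported ones to within about 0.1%, presumably because the reported speedups were computed before the times were rounded to three decimals. `ROWS` simply transcribes the table rows that have k-means results; `speedup` is an illustrative helper name.

```python
# Table rows that have k-means results:
# (total points, bktCluster time in s, k-means time in s, reported speedup).
ROWS = [
    (20000, 0.749, 10.976, 14.654),
    (50000, 0.748, 23.564, 31.494),
    (100000, 0.861, 58.080, 67.437),
    (200000, 1.039, 122.436, 117.827),
    (20000, 0.726, 10.976, 15.109),
    (50000, 0.792, 23.564, 29.736),
    (100000, 0.884, 58.080, 65.710),
]


def speedup(kmeans_time, bkt_time):
    """Speedup = k-means run time divided by bktCluster run time."""
    return kmeans_time / bkt_time


for n, bkt, km, reported in ROWS:
    recomputed = speedup(km, bkt)
    rel_diff = abs(recomputed - reported) / reported
    print(f"{n:>7} points: recomputed {recomputed:8.3f}, "
          f"reported {reported:8.3f}, relative difference {rel_diff:.4%}")
```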
Note 1: Cluster counts are not reported for k-means because the number of clusters (200) is an input to the algorithm, so it always returns 200 clusters.
Note 2: With 50 replications, k-means found all of the actual cluster centroids at every sample size tested. For these data sets, therefore, at least 50 replications should be used regardless of sample size, since in practice the actual centroids would not be known in advance.
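What a "replication" is, and how "found all actual centroids" can be checked, is sketched below in plain Python. This is not the benchmark code and does not touch bktCluster, whose interface the page does not describe. Everything here is a hypothetical stand-in: the test clusters are assumed to be Gaussian blobs around uniformly drawn centers (the page does not say how its random clusters were generated); k-means is plain Lloyd's algorithm started from randomly chosen data points, each replication is an independent restart, and the restart with the lowest within-cluster sum of squares (SSE) is kept; a centroid counts as "found" if some estimated center lies within an assumed distance `tol` of it. The helper names are illustrative only.

```python
import math
import random


def random_centers(n_clusters, dim, box, rng):
    """HYPOTHETICAL: cluster centers drawn uniformly from [0, box)^dim."""
    return [[rng.uniform(0.0, box) for _ in range(dim)] for _ in range(n_clusters)]


def make_clusters(centers, n_per_cluster, spread, rng):
    """HYPOTHETICAL: n_per_cluster Gaussian points (sd = spread) around each center."""
    return [[c + rng.gauss(0.0, spread) for c in center]
            for center in centers for _ in range(n_per_cluster)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, k, rng, max_iter=100):
    """One k-means run (Lloyd's algorithm) started from k random data points."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its points;
        # a center that attracted no points stays where it was.
        new_centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:  # assignments stopped changing
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


def kmeans(points, k, replications, rng):
    """Run `replications` independent restarts and keep the lowest-SSE solution."""
    runs = [lloyd(points, k, rng) for _ in range(replications)]
    best_centers, _ = min(runs, key=lambda run: run[1])
    return best_centers


def found_all(true_centers, est_centers, tol):
    """ASSUMED criterion: every actual centroid has an estimate within distance tol."""
    return all(min(math.dist(t, e) for e in est_centers) <= tol
               for t in true_centers)
```

For example, `make_clusters(random_centers(200, 5, 1000.0, rng), 100, 1.0, rng)` builds a 20,000-point, 200-cluster, 5-dimensional set shaped like the smallest case in the table, but pure Python is far too slow to say anything about the reported timings. Keeping the lowest-SSE restart mirrors how common k-means tools combine replications (MATLAB's `Replicates` option, scikit-learn's `n_init`); a single restart can converge to a local optimum that merges clusters, which is why more replications raise the chance of recovering every actual centroid.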


For information send email to: info@biometricsnw.com

Last Update: October 20, 2024

Copyright 2005-2024 Biometrics Northwest LLC