Biometrics Northwest LLC: bktCluster run times for different sample sizes and the test data set with 200 clusters.

Biometrics Northwest LLC

Performing Data Analysis and Modeling


bktCluster vs. k-means run times for different sample sizes, using a test data set with 200 random 5-dimensional clusters.

Points/Cluster	Total Points	bktCluster Time (s)	bktCluster Clusters Found	k-means Time (s)	k-means Replications Needed to First Match Actual Centroids	Speedup (k-means Time / bktCluster Time)
100	20000	0.749	196	2.120	10	2.830
250	50000	0.748	196	2.289	5	3.059
500	100000	0.861	199	5.945	5	6.903
1000	200000	1.039	200	61.204	25	58.900
2500	500000	1.501	200	N/A	N/A	N/A
5000	1000000	2.185	200	N/A	N/A	N/A
10000	2000000	3.453	200	N/A	N/A	N/A
100	20000	0.726	200	2.120	10	2.918
250	50000	0.792	200	2.289	5	2.889
500	100000	0.884	200	5.945	5	6.726
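
The Speedup column is consistent with the ratio of k-means time to bktCluster time. The short sketch below recomputes it from the rounded times in the table; the names `ROWS` and `speedup` are illustrative and not part of bktCluster. The recomputed values agree with the published ones to within about 0.01, and the small differences in the third decimal are presumably because the published times are rounded.

```python
# Each entry: (points per cluster, bktCluster time in s,
#              k-means time in s or None for N/A,
#              published speedup or None for N/A),
# copied from the table above in row order.
ROWS = [
    (100, 0.749, 2.120, 2.830),
    (250, 0.748, 2.289, 3.059),
    (500, 0.861, 5.945, 6.903),
    (1000, 1.039, 61.204, 58.900),
    (2500, 1.501, None, None),
    (5000, 2.185, None, None),
    (10000, 3.453, None, None),
    (100, 0.726, 2.120, 2.918),
    (250, 0.792, 2.289, 2.889),
    (500, 0.884, 5.945, 6.726),
]


def speedup(bkt_time, kmeans_time):
    """Return k-means time / bktCluster time, or None when k-means is N/A."""
    if kmeans_time is None:
        return None
    return kmeans_time / bkt_time


for n, bkt, km, published in ROWS:
    s = speedup(bkt, km)
    computed = "N/A" if s is None else f"{s:.3f}"
    listed = "N/A" if published is None else f"{published:.3f}"
    print(f"{n:>6} points/cluster: computed {computed}, published {listed}")
```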

First seven rows	bktCluster large sample algorithm (default) with the default distance threshold
Last three rows	bktCluster large sample algorithm (default) with a distance threshold of 25
N/A	k-means run time exceeded 5 minutes for the 50-replication test (see Note 2)
Note 1	Cluster counts are not listed for k-means because the number of clusters (200) is an input to the algorithm, so it always returns 200 clusters.
Note 2	The actual cluster centroids were found using k-means at a replication count of 50 for all sample sizes. Because the actual centroids would not be known in advance, at least 50 replications should be used for these data sets, regardless of sample size. The timing results for 50 replications can be found here.
Note 3	The replication counts tested were 1, 5, 10, 25, and 50.
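
One way to read the replications column is as the smallest tested replication count at which k-means returns the actual centroids. The sketch below illustrates that search under stated assumptions: `run_kmeans` is a hypothetical callback standing in for a real k-means call, `tol` is an assumed matching tolerance, and the one-to-one nearest-centroid comparison is only one plausible definition of a "match". None of these details come from the benchmark itself.

```python
from math import dist

# Replication counts listed in Note 3.
REPLICATION_COUNTS = (1, 5, 10, 25, 50)


def centroids_match(found, actual, tol=1e-3):
    """Return True if the two centroid sets agree one-to-one within tol.

    tol is an assumed tolerance; the benchmark does not state how a
    match was judged.
    """
    if len(found) != len(actual):
        return False
    unused = list(found)
    for target in actual:
        # Pair each actual centroid with its nearest unmatched found centroid.
        nearest = min(unused, key=lambda c: dist(c, target))
        if dist(nearest, target) > tol:
            return False
        unused.remove(nearest)
    return True


def first_matching_count(run_kmeans, actual, counts=REPLICATION_COUNTS, tol=1e-3):
    """Return the smallest replication count whose result matches actual.

    run_kmeans(n) is a hypothetical callback that runs k-means with n
    replications (random restarts) and returns the best run's centroids.
    Returns None if no tested count produces a match.
    """
    for n in counts:
        if centroids_match(run_kmeans(n), actual, tol):
            return n
    return None


# Toy usage with a stand-in for k-means: it pretends that fewer than
# 10 replications end in a poorer local optimum.
actual_centroids = [(0.0, 0.0), (10.0, 10.0)]


def fake_kmeans(n):
    if n >= 10:
        return [(10.0, 10.0), (0.0, 0.0)]
    return [(0.0, 5.0), (10.0, 5.0)]


print(first_matching_count(fake_kmeans, actual_centroids))
```

In this toy run the search stops at 10, the first tested count for which the stand-in returns the actual centroids. With a real k-means routine, the callback would also be the natural place to time each run.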

For information send email to: info@biometricsnw.com

Last Update: October 20, 2024

Copyright 2005-2024 Biometrics Northwest LLC