Flash-KMeans: Fast and Memory-Efficient Exact K-Means

(arxiv.org)

93 points | by matt_d 3 days ago

4 comments

frakt0x90 29 minutes ago
They created this in service of their video generation model, which "clusters and reorders tokens based on semantic similarity using k-means":
http://arxiv.org/pdf/2505.18875
wood_spirit 3 hours ago
Does this have corresponding speedups or memory gains for normal CPUs too? Just thinking about all the cups of coffee that have been made and drunk while scikit-learn kmeans chugs through a notebook :)
- snovv_crash 2 hours ago
  For CPU with bigger K you would put the centroids in a search tree to take advantage of the sparsity, while a GPU would calculate the full NxK distance matrix. So from my understanding the bottleneck they are fixing doesn't show up on CPU.
  - xavxav 2 hours ago
    search trees tend not to scale well to higher dimensions though, right?
    from what I've seen, my impression is that Yinyang k-means is the best way to take advantage of the sparsity.
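
For readers unfamiliar with the "sparsity" being discussed: exact CPU speedups of this family rest on the triangle inequality, which lets you skip distance computations that provably cannot change the answer. The sketch below is not Yinyang k-means itself (which maintains grouped bounds across iterations); it only shows the basic pruning test such methods build on. The function name `assign_with_pruning` is made up for illustration.

```python
import math

def assign_with_pruning(points, centroids):
    """Exact nearest-centroid assignment that skips provably-losing centroids."""
    k = len(centroids)
    # Centroid-to-centroid distances, computed once per k-means iteration (K x K).
    cc = [[math.dist(centroids[i], centroids[j]) for j in range(k)]
          for i in range(k)]
    labels, skipped = [], 0
    for x in points:
        best, best_d = 0, math.dist(x, centroids[0])
        for j in range(1, k):
            # Triangle inequality: if d(c_best, c_j) >= 2 * d(x, c_best),
            # then d(x, c_j) >= d(x, c_best), so c_j cannot be strictly closer.
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = math.dist(x, centroids[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped
```

The labels match a brute-force scan exactly; `skipped` counts how many point-to-centroid distances were never computed. How much this saves depends heavily on how well separated the centroids are.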
matrix2596 3 hours ago
looks like flash attention concepts applied to kmeans, nice speedup results
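
A rough sketch of what that analogy could mean for the assignment step, assuming the idea is to process centroids in tiles and keep only a running minimum per point instead of materializing the full NxK distance matrix. This is a plain-Python illustration, not the paper's GPU kernel; `assign_tiled` and the `tile` parameter are hypothetical names.

```python
import math

def assign_tiled(points, centroids, tile=2):
    """Nearest-centroid labels, visiting one tile of centroids at a time."""
    n = len(points)
    best_d = [math.inf] * n  # running minimum distance per point
    best_j = [-1] * n        # running argmin (centroid index) per point
    for start in range(0, len(centroids), tile):
        block = centroids[start:start + tile]
        for i, x in enumerate(points):
            for off, c in enumerate(block):
                d = math.dist(x, c)
                # Fold this tile's distances into the running result;
                # nothing of size N x K is ever stored.
                if d < best_d[i]:
                    best_d[i], best_j[i] = d, start + off
    return best_j
```

The extra state is O(N) regardless of K, and the result does not depend on the tile size. On a GPU the tile would presumably sit in fast on-chip memory, which is where the comparison to flash attention comes from.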