This reminds me of the talk by Leland McInnes (creator of UMAP, employed by a branch of the Canadian NSA) on rethinking dimensionality reduction:
Rethinking Unsupervised Learning https://youtu.be/v_jcu7rUqqE
He boiled all dimensionality reduction down to two approaches: matrix decomposition and k-nearest neighbors.
I'm seeing the newest contenders for top algorithms leaning harder into KNN dynamics and less into the more complex maths of UMAP (PaCMAP[1] and LocalMAP[2], which run an attractive-repulsive force simulation on a high-dimensional KNN graph).
Based on the end of his talk, Leland seems to broadly agree that this is where the next innovations are: KNNs.
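To make that concrete, here's a toy sketch (my own, not the actual PaCMAP or LocalMAP algorithms, which do considerably more than this) of a bare attractive-repulsive force simulation over a high-dimensional KNN graph; all parameter values are made up for illustration:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def toy_knn_force_embedding(X, k=10, n_iter=200, lr=0.05, seed=0):
        # Build the KNN graph in the original high-dimensional space.
        nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
        _, idx = nbrs.kneighbors(X)
        neighbors = idx[:, 1:]  # drop the self-neighbor
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        Y = rng.normal(scale=1e-2, size=(n, 2))  # random 2-D layout
        for _ in range(n_iter):
            grad = np.zeros_like(Y)
            for i in range(n):
                # Attraction: pull each point toward its KNN-graph neighbors.
                grad[i] += (Y[i] - Y[neighbors[i]]).mean(axis=0)
            far = rng.integers(0, n, size=(n, k))
            for i in range(n):
                # Repulsion: push each point away from a few random non-neighbors.
                diff = Y[i] - Y[far[i]]
                dist2 = (diff ** 2).sum(axis=1, keepdims=True) + 1e-3
                grad[i] -= (diff / dist2).mean(axis=0)
            Y -= lr * grad
        return Y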
Some interesting provocations are coming out:
Assessing and improving reliability of neighbor embedding methods: a map-continuity perspective (Apr 2025) https://arxiv.org/html/2410.16608v2
[1]: https://youtu.be/sD-uDZ8zXkc
[2]: https://youtu.be/vmSUQ-4LT4Y
EDIT: on reviewing it further, that last paper is not as relevant as I recalled. So the point is mainly the takeaway from Leland's expertise in his video.
I'm not sure if I'm understanding correctly, but it reminds me of the kernel trick: the distances between the training samples and a target sample are computed, scaled through a kernel function, and then used as features.
https://en.wikipedia.org/wiki/Kernel_method
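If that reading is right, a minimal sketch of the idea (my own illustration, not code from the article) would look like: compute the distances from a target sample to the training samples, pass them through an RBF kernel, and use the result as the feature vector.

    import numpy as np

    def rbf_distance_features(X_train, x, gamma=1.0):
        # Squared Euclidean distances from the target sample to every training sample.
        sq_dists = np.sum((X_train - x) ** 2, axis=1)
        # Scale the distances through a kernel; each training point yields one feature.
        return np.exp(-gamma * sq_dists)

    X_train = np.random.default_rng(0).normal(size=(100, 5))
    features = rbf_distance_features(X_train, np.zeros(5))  # shape (100,)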
Fast? The author should remind us of the computational complexity of computing the kNN...
I agree, they should have said it, and I had to go looking too. But it may be that they're just leaving this to others and talking to people who already think about KNNs?
http://ann-benchmarks.com/index.html
But yeah, my own research (not just that link) says HNSWs get approximate KNN complexity down to O(log n) per search and O(n log n) for building the whole index.
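For anyone curious, here's roughly what that looks like with the hnswlib package (parameter values are just illustrative, and the log n / n log n figures are expected costs for approximate search, not worst-case guarantees):

    import numpy as np
    import hnswlib

    dim, n = 128, 10_000
    data = np.float32(np.random.random((n, dim)))

    index = hnswlib.Index(space='l2', dim=dim)                    # squared-L2 distance
    index.init_index(max_elements=n, ef_construction=200, M=16)   # build the graph index
    index.add_items(data, np.arange(n))
    index.set_ef(50)                                              # query-time accuracy/speed trade-off
    labels, dists = index.knn_query(data[:5], k=10)               # approximate 10-NN per query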