This reminds me of the talk by Leland McInnes (creator of UMAP, employed by a branch of the Canadian NSA) on rethinking dimensionality reduction:
Rethinking Unsupervised Learning https://youtu.be/v_jcu7rUqqE
He boiled all dimensionality reduction down to two approaches: matrix decomposition and k-nearest neighbors.
I'm seeing the newest contenders for top algorithms leaning harder into KNN dynamics and less into the more complex maths of UMAP (PaCMAP[1] and LocalMAP[2], which run an attractive-repulsive force simulation on a high-dimensional KNN graph).
Based on the end of his talk, Leland seems to broadly agree that this is where the next innovations are: KNNs.
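To make that concrete, here's a toy sketch (my own, not the actual PaCMAP or LocalMAP algorithms, which do considerably more than this) of a bare attractive-repulsive force simulation over a high-dimensional KNN graph; all parameter values are made up for illustration:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def toy_knn_force_embedding(X, k=10, n_iter=200, lr=0.05, seed=0):
        # Build the KNN graph in the original high-dimensional space.
        nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
        _, idx = nbrs.kneighbors(X)
        neighbors = idx[:, 1:]  # drop the self-neighbor
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        Y = rng.normal(scale=1e-2, size=(n, 2))  # random 2-D layout
        for _ in range(n_iter):
            grad = np.zeros_like(Y)
            for i in range(n):
                # Attraction: pull each point toward its KNN-graph neighbors.
                grad[i] += (Y[i] - Y[neighbors[i]]).mean(axis=0)
            far = rng.integers(0, n, size=(n, k))
            for i in range(n):
                # Repulsion: push each point away from a few random non-neighbors.
                diff = Y[i] - Y[far[i]]
                dist2 = (diff ** 2).sum(axis=1, keepdims=True) + 1e-3
                grad[i] -= (diff / dist2).mean(axis=0)
            Y -= lr * grad
        return Y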
Some interesting provocations are coming out:
Assessing and improving reliability of neighbor embedding methods: a map-continuity perspective (Apr 2025) https://arxiv.org/html/2410.16608v2
[1]: https://youtu.be/sD-uDZ8zXkc
[2]: https://youtu.be/vmSUQ-4LT4Y
EDIT: on reviewing it further, that last paper is not as relevant as I recalled. So the point is mainly the takeaway from Leland's expertise in his video.
I'm not sure if I'm understanding correctly, but it reminds me of the kernel trick: the distances between the training samples and a target sample are computed, scaled through a kernel function, and then used as features.
https://en.wikipedia.org/wiki/Kernel_method
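If that reading is right, a minimal sketch of the idea (my own illustration, not code from the article) would look like: compute the distances from a target sample to the training samples, pass them through an RBF kernel, and use the result as the feature vector.

    import numpy as np

    def rbf_distance_features(X_train, x, gamma=1.0):
        # Squared Euclidean distances from the target sample to every training sample.
        sq_dists = np.sum((X_train - x) ** 2, axis=1)
        # Scale the distances through a kernel; each training point yields one feature.
        return np.exp(-gamma * sq_dists)

    X_train = np.random.default_rng(0).normal(size=(100, 5))
    features = rbf_distance_features(X_train, np.zeros(5))  # shape (100,)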
Fast? The author should remind us of the computational complexity of computing the kNN...
I agree, they should have said it, and I had to go looking too. But it may be that they're just leaving this to others and talking to people who already think about KNNs?
http://ann-benchmarks.com/index.html
But yeah, my own research (not just that link) says HNSWs get approximate KNN complexity down to O(log n) per search and O(n log n) for building the whole index.
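For anyone curious, here's roughly what that looks like with the hnswlib package (parameter values are just illustrative, and the log n / n log n figures are expected costs for approximate search, not worst-case guarantees):

    import numpy as np
    import hnswlib

    dim, n = 128, 10_000
    data = np.float32(np.random.random((n, dim)))

    index = hnswlib.Index(space='l2', dim=dim)                    # squared-L2 distance
    index.init_index(max_elements=n, ef_construction=200, M=16)   # build the graph index
    index.add_items(data, np.arange(n))
    index.set_ef(50)                                              # query-time accuracy/speed trade-off
    labels, dists = index.knn_query(data[:5], k=10)               # approximate 10-NN per query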