Binary vector embeddings are so cool
(emschwartz.me)
80 points by emschwartz 3 days ago | 12 comments
petra 2 days ago | root | parent | next |
Since Hamming distance calculations take relatively few gates, it might be possible to compute them in memory (RAM/flash) to achieve a massive amount of parallelism (for example, a dedicated Hamming distance calculation block for each stored vector) at much lower power.
This could be useful for web-scale embeddings search.
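(For a sense of how small that per-vector operation is, here is a minimal software sketch, purely illustrative: the whole comparison is one XOR plus a popcount.)

    # Hamming distance between two packed binary embeddings held as Python ints:
    # a single XOR followed by a popcount, which is why a per-vector hardware
    # block needs so few gates.
    def hamming(a: int, b: int) -> int:
        return (a ^ b).bit_count()  # int.bit_count() requires Python 3.10+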
emschwartz 2 days ago | root | parent | prev |
Yeah, that's exactly right. You take the embedding of your query and then compare it to the embeddings of your documents (or chunks of documents).
You can use vector databases that implement Approximate Nearest Neighbor (ANN) search, or if you don't have too many documents you can just brute-force the similarity comparison.
I'm not sure at what point you _really_ need or want a vector database, but anecdotally, calculating the Hamming distance for a couple thousand vectors seems pretty negligible.
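As a rough sketch of what that brute-force comparison can look like (the sizes and data here are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    n_docs, n_bits = 2000, 1024

    # Binary embeddings packed into bytes: 1024 bits -> 128 uint8 values per vector
    docs = rng.integers(0, 256, size=(n_docs, n_bits // 8), dtype=np.uint8)
    query = rng.integers(0, 256, size=(n_bits // 8,), dtype=np.uint8)

    # Hamming distance = popcount(query XOR doc); unpackbits expands bytes into 0/1 bits
    dists = np.unpackbits(docs ^ query, axis=1).sum(axis=1)
    top_10 = np.argsort(dists)[:10]  # indices of the 10 nearest documents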
mosselman a day ago | root | parent |
Very cool, thanks. I am using pgvector right now. I will run into scaling issues at some point, I suspect/hope.
I’ve been thinking about some setup on object storage. The data I have could be grouped per account. If you can compress the total index size to 1% of traditional vectors, you can probably get a long way by fetching the binaries from S3 only when necessary into some sort of warm cache location.
Or just get a Hetzner machine with 20TB of storage and not worry about it, I guess.
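A rough back-of-the-envelope on that 1% figure, assuming 1024-dimensional embeddings (the exact numbers depend on the model):

    float32_bytes = 1024 * 4    # 4096 bytes per vector as float32
    binary_bytes = 1024 // 8    # 128 bytes per vector as bits, ~3% of the float32 size
    # Getting closer to 1% would also need fewer dimensions,
    # e.g. truncating Matryoshka-style embeddings before binarizing.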
kevinventullo 2 days ago | prev | next |
It got me thinking, what might it look like to natively train a binary quantized embedding model? You can’t do calculus per se on {0,1}, but maybe you could do something like randomly flip bits with a probability weighted by the severity of the error during backprop… anyway, I’m sure there’s plenty of literature about this.
janalsncm 2 days ago | root | parent | next |
Probably just an extreme version of quantization-aware training? During training you round the prediction to the range you want, but keep it as a float.
Since rounding isn’t differentiable, there are fancy techniques to approximate that as well.
> QAT backward pass typically uses straight-through estimators (STE), a mechanism to estimate the gradients flowing through non-smooth functions
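A minimal sketch of that straight-through trick in PyTorch (not from the article, just the standard pattern): hard-binarize on the forward pass, but let gradients flow through as if it were the identity.

    import torch

    def binarize_ste(x: torch.Tensor) -> torch.Tensor:
        hard = (x > 0).float()          # forward pass: hard 0/1 quantization
        return x + (hard - x).detach()  # backward pass: gradient of the identity (straight-through)

At inference you keep only the hard bits; during training the float values keep flowing so the gradients have something to act on.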
alok-g 2 days ago | root | parent | prev |
You may find this interesting:
https://writings.stephenwolfram.com/2024/08/whats-really-goi...
xfalcox 2 days ago | prev | next |
I shipped this in Discourse a couple of weeks ago, and it's incredible tech. And fully supported in PostgreSQL via pgvector.
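For anyone curious what that looks like on the Postgres side, a minimal sketch (assuming pgvector 0.7+, which added bit(n) columns, an HNSW bit_hamming_ops operator class, and the <~> Hamming-distance operator; table and column names here are made up):

    import psycopg

    with psycopg.connect("dbname=app") as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks (id bigserial PRIMARY KEY, embedding bit(1024))"
        )
        conn.execute(
            "CREATE INDEX IF NOT EXISTS chunks_embedding_idx "
            "ON chunks USING hnsw (embedding bit_hamming_ops)"
        )
        query_bits = "01" * 512  # a 1024-bit query embedding as a bit string
        rows = conn.execute(
            "SELECT id FROM chunks ORDER BY embedding <~> %s::bit(1024) LIMIT 10",
            (query_bits,),
        ).fetchall()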
2-3-7-43-1807 2 days ago | prev | next |
Is it fair to say that Hamming distance relates to cosine similarity in this context like ReLU relates to the Sigmoid function for neural networks?
titusz 2 days ago | prev | next |
Yes, binary embeddings are cool indeed. And because they can be so small, they even double as powerful cluster identifiers. Built this a while ago: https://huggingface.co/spaces/iscc/iscc-sct
nalzok 2 days ago | prev | next |
I wonder what would happen if we quantized each dimension to 0.5 (or even fewer) bits instead of 1, i.e., taking 2 (or more) scalar components at a time and mapping them to 0 or 1 based on some carefully designed rules.
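One concrete (purely hypothetical) version of such a rule: project each pair of components onto a fixed random direction and keep only the sign, which gives one output bit per two input dimensions.

    import numpy as np

    dim = 1024
    rng = np.random.default_rng(0)
    # One fixed random direction per pair of components, shared across all vectors
    directions = rng.normal(size=(dim // 2, 2))

    def half_bit_quantize(x: np.ndarray) -> np.ndarray:
        pairs = x.reshape(dim // 2, 2)  # group the scalars two at a time
        return (np.einsum("ij,ij->i", pairs, directions) > 0).astype(np.uint8)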
mosselman 2 days ago | next |
That is crazy. So how do you use these right now? Do you store vectors on disk and iterate over them, or something else?
How do you do the search?