Fuzzy Memory
- jimli44
- Dec 14, 2024
- 2 min read
Updated: Dec 17, 2024
I don’t mean the kind we have after a hangover, but the kind powering some of the greatest models we know.

“But do I understand it right though: it means storing something and not getting the exact thing back?” That is correct.
“Why would such storage be useful?” How about: it kind of never runs out of space, so we don’t need to worry about deleting stuff.
Here is how it works. Assume the data we need to store comes as vectors of size 64. The chosen storage memory is a 64x64 matrix, starting empty.
(For the sake of word count KPI) The conventional and logical way would be:
Divide the memory into 64 slots and put the data in one by one.
After the space is full, choose the most outdated or least important vector and replace it with new data.
To retrieve data, simply use the corresponding slot location as the address and read the stored data out.
In practice, even if we are OK with a fixed number of data slots, making an optimal decision about which slot to replace is often not easy (a quick sketch of this approach follows below).
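A minimal sketch of that conventional approach, assuming a simple first-in-first-out eviction rule (deciding which slot is "least important" is exactly the hard part, so I'm dodging it here); `SlotMemory` and its methods are hypothetical names for illustration:

```python
import numpy as np

class SlotMemory:
    """Slot-based memory: 64 fixed slots, oldest entry gets evicted first."""
    def __init__(self, dim=64, slots=64):
        self.storage = np.zeros((slots, dim))
        self.next_slot = 0  # simple FIFO eviction, standing in for "least important"

    def store(self, data):
        slot = self.next_slot % self.storage.shape[0]
        self.storage[slot] = data   # overwrite whatever was in that slot
        self.next_slot += 1
        return slot                 # the "address" you need for retrieval

    def retrieve(self, slot):
        return self.storage[slot]   # the exact vector comes back
```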
Now let’s look at the fuzzy way:
Pair each data vector with a vector of 64 random numbers. Let’s call this random vector the “key”.
Matrix-multiplying the data vector and the key vector (an outer product) produces a data mask of size 64x64.
Add the data mask to the memory matrix and storing is done.
To retrieve the data, multiply the memory matrix with the transpose of the key, and we get the data back; well, the fuzzy version of it (a code sketch follows below).
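Here is a minimal numpy sketch of those steps. The 1/√64 scaling on the key is my own assumption, so that each key has roughly unit length and the retrieved vector comes back at about the right scale:

```python
import numpy as np

dim = 64
memory = np.zeros((dim, dim))                    # the 64x64 memory matrix, starting empty

def store(memory, data):
    """Store one 64-element data vector; returns the key needed to get it back."""
    key = np.random.randn(dim) / np.sqrt(dim)    # random key, roughly unit length
    memory += np.outer(data, key)                # 64x64 "data mask" added into memory
    return key

def retrieve(memory, key):
    """Multiply the memory by the key to get the (fuzzy) data back."""
    return memory @ key

data = np.random.rand(dim)
key = store(memory, data)
fuzzy = retrieve(memory, key)                    # approximately `data`, plus noise
```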
Whether or not you get it at this point, let’s do a real example. Here is a picture, 64x200 pixels, grayscale for simplicity.

So this picture is over 3 times bigger than the storage memory (64x64). Let’s squeeze it in and then try to reconstruct the picture from the retrieved data.
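Roughly what that experiment looks like in code. I'm using random pixels as a stand-in for the actual picture, and treating each of the 200 columns as one data vector paired with its own random key:

```python
import numpy as np

dim, cols = 64, 200
picture = np.random.rand(dim, cols)                 # stand-in for the 64x200 grayscale picture

memory = np.zeros((dim, dim))
keys = np.random.randn(dim, cols) / np.sqrt(dim)    # one random key per column

# Store: each column of the picture is one data vector, masked with its own key.
for i in range(cols):
    memory += np.outer(picture[:, i], keys[:, i])

# Retrieve: multiply the memory by each key and stack the columns back up.
reconstruction = np.stack([memory @ keys[:, i] for i in range(cols)], axis=1)

# Rough measure of how fuzzy the recall is: noticeably less than 1.
print(np.corrcoef(picture.ravel(), reconstruction.ravel())[0, 1])
```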
Here is what the memory matrix looks like after all 200 data masks have been overlapped on top of each other: pretty much just noise, as expected.

Finally, the reconstructed picture from the retrieved data. The original letters can be vaguely recognized; just as it says on the tin, fuzzy.

The trick behind it is the (approximate) orthogonality between the keys. Repeated data gets reinforced while occasional entries gradually drown in the noise, so we never need to worry about what to delete. Paired with a neural network, which has a high tolerance for noisy input, is when this method really shines ☀️
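A quick sanity check of that near-orthogonality, under the same unit-length key assumption as above: a key correlates strongly with itself and only weakly with any other key, which is why each stored vector mostly "hears" its own key on retrieval while the cross-terms stay as background noise.

```python
import numpy as np

dim = 64
keys = np.random.randn(10_000, dim) / np.sqrt(dim)

self_sim  = np.mean(np.sum(keys * keys, axis=1))                    # ~1.0
cross_sim = np.mean(np.abs(np.sum(keys[:-1] * keys[1:], axis=1)))   # ~0.1 for dim=64

print(self_sim, cross_sim)
```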