Algorithm based on LLMs doubles lossless data compression rates

NoSpotOfGround · 4 days ago

@skip0110@lemm.ee · 4 days ago

This is not new knowledge and predates the current LLM fad.

See the Hutter Prize, where “machine learning”-based compressors have led the ranking for some time: http://prize.hutter1.net/

It’s important to note that, when applied to compression, the model does produce a code (aka encoding) that exactly reproduces the input. But on a different input the same model is unlikely to compress as impressively.
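
A rough sketch of that last point, assuming a toy byte-frequency table stands in for the learned model (the names `train_order0` and `ideal_bits` are made up for illustration and are not from any real compressor). It estimates how many bits per byte an ideal entropy coder would spend on text the model was fitted to, versus on an unrelated input:

```python
import math
from collections import Counter

def train_order0(text: str) -> dict:
    """Hypothetical 'model': add-one-smoothed byte frequencies of the training text."""
    data = text.encode("utf-8")
    counts = Counter(data)
    total = len(data) + 256  # one pseudo-count for each possible byte value
    return {b: (counts.get(b, 0) + 1) / total for b in range(256)}

def ideal_bits(model: dict, text: str) -> float:
    """Total bits an ideal entropy coder would need for text under this model."""
    return sum(-math.log2(model[b]) for b in text.encode("utf-8"))

english = "the quick brown fox jumps over the lazy dog " * 50
digits = "3141592653589793238462643383279502884197" * 50

model = train_order0(english)
bits_en = ideal_bits(model, english) / len(english)  # input the model was fitted to
bits_dg = ideal_bits(model, digits) / len(digits)    # input the model has never seen

print(f"fitted input:    {bits_en:.2f} bits/byte")
print(f"different input: {bits_dg:.2f} bits/byte")
```

Both inputs can still be coded exactly. The only thing that changes is how many bits it takes: under this model the digit string comes out larger than the 8 bits per byte it started with.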

@Dragonstaff@leminal.space · 4 days ago

Can you define “compressors” here? (Google was unhelpful.)

@skip0110@lemm.ee · 4 days ago

I could have said it better.

I mean “compressor” as one half of a compression/decompression algorithm. A better way to word it: when you apply machine learning to a compression problem, you can do it losslessly. Your decompressed output will be identical to the input, every time.
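
Here is a minimal sketch of why that works, under some loud assumptions: a simple adaptive byte-count model stands in for the neural network, and exact fractions replace the finite-precision arithmetic coder a real tool would use. The functions `encode` and `decode` are illustrative, not any library's API. The key idea is that the decompressor runs the same model and makes the same updates, so it can retrace every step exactly.

```python
from fractions import Fraction

def encode(data: bytes):
    """Toy arithmetic coder: returns (code, k, n), a k-bit integer plus the length."""
    counts = [1] * 256  # adaptive order-0 model (assumed stand-in for a learned predictor)
    low, width = Fraction(0), Fraction(1)
    for sym in data:
        total = sum(counts)
        cum = sum(counts[:sym])
        low += width * Fraction(cum, total)      # narrow the interval to this symbol
        width *= Fraction(counts[sym], total)
        counts[sym] += 1                         # decoder makes the identical update
    # Smallest k with 2**-k <= width / 2, so some k-bit fraction lies in the interval.
    need = -(-2 * width.denominator // width.numerator)
    k = (need - 1).bit_length()
    code = -(-low.numerator * (1 << k) // low.denominator)  # ceil(low * 2**k)
    return code, k, len(data)

def decode(code: int, k: int, n: int) -> bytes:
    """Inverse of encode: replays the same model to recover the exact bytes."""
    counts = [1] * 256
    value = Fraction(code, 1 << k)
    low, width = Fraction(0), Fraction(1)
    out = bytearray()
    for _ in range(n):
        total = sum(counts)
        scaled = (value - low) / width * total   # position inside the current interval
        cum = 0
        for sym, cnt in enumerate(counts):
            if scaled < cum + cnt:
                break
            cum += cnt
        out.append(sym)
        low += width * Fraction(cum, total)
        width *= Fraction(cnt, total)
        counts[sym] += 1
    return bytes(out)

msg = b"abracadabra " * 20
code, k, n = encode(msg)
assert decode(code, k, n) == msg                 # bit-for-bit identical round trip
print(f"{8 * len(msg)} bits in, {k} bits out")
```

A better predictor only makes the interval shrink more slowly, which means fewer output bits. It never affects whether the round trip is exact.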

“NNCP” is a good search term to learn more, specifically about how this works.