What you gain in performance, you somewhat sacrifice in flexibility, at least in comparison with OpenEXR.
OpenEXR was designed for modularity, allowing efficient access to individual layers or channels. This is crucial in VFX workflows where only specific passes (like normals or diffuse) might be needed at any one time. This access is possible because EXR stores channels separately and supports tiled or scanline-based access.
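To make that concrete, here is a rough sketch of mine (not from the post) of pulling a single channel out of a multi-channel EXR with the OpenEXR C++ API; `readSingleChannel` and its parameters are just illustrative names. Only the slice registered in the FrameBuffer gets filled in, so the other passes never need to be materialised:

```cpp
// Rough sketch: read just one channel from a multi-channel EXR.
#include <ImfInputFile.h>
#include <ImfFrameBuffer.h>
#include <ImathBox.h>
#include <vector>

std::vector<float> readSingleChannel(const char* path, const char* channelName)
{
    Imf::InputFile file(path);
    Imath::Box2i dw = file.header().dataWindow();
    int width  = dw.max.x - dw.min.x + 1;
    int height = dw.max.y - dw.min.y + 1;

    std::vector<float> pixels(size_t(width) * size_t(height));

    Imf::FrameBuffer fb;
    fb.insert(channelName,
              Imf::Slice(Imf::FLOAT,
                         (char*)(pixels.data() - dw.min.x - dw.min.y * width),
                         sizeof(float),              // xStride
                         sizeof(float) * width));    // yStride
    file.setFrameBuffer(fb);
    file.readPixels(dw.min.y, dw.max.y);  // only the registered slice is written to
    return pixels;
}
```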
The custom compression method Aras proposes - using meshoptimizer on 16K pixel chunks, followed by zstd as a second compressor step - achieves significantly faster decompression and better compression speeds than EXR ZIP, HTJ2K, or JPEG-XL lossless. However, it trades off random access and requires decompressing the entire image at once, which increases memory usage. Individual frames for a VFX production can be multiple gigabytes (e.g. dozens of 32-bit layers at 4K resolution).
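For a sense of what that chaining looks like in code, here is a minimal sketch (mine, not the author's actual implementation) that treats each 16K-pixel chunk as a "vertex buffer" for meshoptimizer's lossless codec and then hands the filtered stream to zstd; a real format would also need to record per-chunk compressed sizes so it can be decoded again:

```cpp
// Minimal sketch of the two-stage idea: meshopt vertex codec per chunk, then zstd.
#include <algorithm>
#include <cstddef>
#include <vector>
#include <meshoptimizer.h>
#include <zstd.h>

std::vector<unsigned char> compressPixels(const float* pixels, size_t pixelCount,
                                          size_t floatsPerPixel)
{
    const size_t kChunkPixels = 16 * 1024;                    // 16K pixels per chunk
    const size_t pixelSize = floatsPerPixel * sizeof(float);  // codec wants a multiple of 4, <= 256

    std::vector<unsigned char> filtered;
    for (size_t start = 0; start < pixelCount; start += kChunkPixels)
    {
        size_t count = std::min(kChunkPixels, pixelCount - start);
        std::vector<unsigned char> buf(meshopt_encodeVertexBufferBound(count, pixelSize));
        size_t written = meshopt_encodeVertexBuffer(buf.data(), buf.size(),
                                                    pixels + start * floatsPerPixel,
                                                    count, pixelSize);
        filtered.insert(filtered.end(), buf.begin(), buf.begin() + written);
    }

    // Second stage: general-purpose compression of the filtered stream.
    std::vector<unsigned char> out(ZSTD_compressBound(filtered.size()));
    size_t compressed = ZSTD_compress(out.data(), out.size(),
                                      filtered.data(), filtered.size(), /*level*/ 1);
    out.resize(compressed);
    return out;
}
```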
The author's proposal is still compelling, and I wonder if a variant could find its way into some sort of archival format.
Reading through the author's float compression series, I can't help noticing that the plot axes got switched in this post - and somehow it's a lot easier (more elegant?) to read with speed on the vertical axis
at least for me
That you can chain one compressor with a second reminds me of QOI (a PNG competitor), whose output is often competitive with PNG (which uses gzip) even _before_ it gets further compressed with something as mundane as zstd or gzip.
My understanding is that chaining compressors is a classic technique for image compression. IIRC both PNG and Basis apply initial transformation/pre-filtering/conditioning passes designed to make the image data more compressible before feeding it to a codec like gzip or zstd.
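As a toy sketch of that pre-filtering idea (mine, with illustrative names): a PNG-style "Sub" filter, where each byte has the corresponding byte of the pixel to its left subtracted, before a general-purpose compressor. PNG itself chooses among several filters per scanline and uses DEFLATE; zstd just stands in here to show the chaining:

```cpp
// Toy PNG-style "Sub" filter followed by zstd.
#include <cstdint>
#include <vector>
#include <zstd.h>

std::vector<uint8_t> subFilterThenZstd(const uint8_t* image, size_t width,
                                       size_t height, size_t bytesPerPixel)
{
    const size_t rowBytes = width * bytesPerPixel;
    std::vector<uint8_t> filtered(rowBytes * height);

    for (size_t y = 0; y < height; ++y)
    {
        const uint8_t* row = image + y * rowBytes;
        uint8_t* out = filtered.data() + y * rowBytes;
        for (size_t x = 0; x < rowBytes; ++x)
        {
            uint8_t left = (x >= bytesPerPixel) ? row[x - bytesPerPixel] : 0;
            out[x] = uint8_t(row[x] - left);  // smooth gradients become runs of small deltas
        }
    }

    std::vector<uint8_t> compressed(ZSTD_compressBound(filtered.size()));
    size_t n = ZSTD_compress(compressed.data(), compressed.size(),
                             filtered.data(), filtered.size(), 3);
    compressed.resize(n);
    return compressed;
}
```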
This definitely works for things that aren't images too. I previously showed that you can significantly improve the compression ratio for WebAssembly by performing lossless transforms on the module before feeding it to gzip or brotli (though the gains are much smaller for brotli, since it's so good to begin with): https://github.com/WebAssembly/design/issues/1180
Exe filters are cool - I think I first saw the split-stream thing in the kkrunchy writeup (https://fgiesen.wordpress.com/2011/01/24/x86-code-compressio...), and it looks like it was first done in PPMexe.
Vidvox HAP and Resolume DXV codecs also have a fast lossless compression stage
One classic transformation for executable code is to convert memory offsets to absolute addresses for compression. Absolute addresses are more compressible than relative ones.
Probably the single oldest trick in the code compression book.
Isn’t it the other way around? Absolute addresses are all different while relatives often repeat, leading to better compression.
In sane code, there are more function calls than there are functions. Imagine, now, that there's a function at 0x1337, and it's called from 69 different places in the code.
If we're using relative addresses, this would, of course, result in 69 different addresses to compress - each relative address being the difference between 0x1337 and the position of the code that calls it.
If we're using absolute addresses, we get the same exact address 0x1337 repeated 69 times - which is way more compressor friendly.
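Here's a sketch of what that looks like for 32-bit x86 - the classic "E8 filter" (my own illustration, not taken from any particular packer): every CALL rel32 operand is rewritten to the absolute target before compression, and the inverse is applied after decompression. False positives (0xE8 bytes that aren't really a CALL) don't hurt correctness, since the decoder rewrites exactly the same positions back:

```cpp
// "E8 filter" sketch: make repeated call targets byte-identical for the compressor.
#include <cstdint>
#include <cstring>
#include <vector>

void e8FilterEncode(std::vector<uint8_t>& code)
{
    for (size_t i = 0; i + 5 <= code.size(); ++i)
    {
        if (code[i] == 0xE8)  // CALL rel32 opcode
        {
            int32_t rel;
            std::memcpy(&rel, &code[i + 1], 4);
            // Target = offset of the next instruction + relative displacement.
            int32_t abs = int32_t(i + 5) + rel;
            std::memcpy(&code[i + 1], &abs, 4);
            i += 4;  // skip the operand we just rewrote
        }
    }
}

void e8FilterDecode(std::vector<uint8_t>& code)
{
    for (size_t i = 0; i + 5 <= code.size(); ++i)
    {
        if (code[i] == 0xE8)
        {
            int32_t abs;
            std::memcpy(&abs, &code[i + 1], 4);
            int32_t rel = abs - int32_t(i + 5);  // invert the encode step
            std::memcpy(&code[i + 1], &rel, 4);
            i += 4;
        }
    }
}
```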
Thanks. I was initially thinking of memory addresses for data. Indeed, it's a nice trick for code.
Relevant discussion on the Academy Software Foundation Slack, where the author, Aras, first posted a link to the blog post about 2.5 weeks ago:
https://academysoftwarefdn.slack.com/archives/CMLRW4N73/p175...
Mesh optimizer's performance here is a nice reminder: the state of the art in general purpose compression is hard to beat, but special purpose still has room for improvement.
And not only that, but you can use a special-purpose optimiser for a different domain and somehow get great results!
As I said before, JPEG XL lossless is really, really slow. I am wondering whether that is inherent to the spec or just its implementation.