If the process of image generation / description were fully reversible, we could store image descriptions instead of a list of pixels...
But if one feeds an image description from ChatGPT to DALL-E and back in a loop, how many steps does it take to degrade into pure noise? (surely this has been tried? but I couldn't find it)
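The round-trip experiment is easy to phrase as a loop. Here `describe` and `generate` are hypothetical stand-ins for a captioning call and an image-generation call, not real APIs:

```python
# describe/generate are hypothetical stand-ins for captioning and
# image-generation API calls; each pass is lossy, so repeated
# round-trips should drift further from the original image.
def roundtrip(image, steps, describe, generate):
    for _ in range(steps):
        caption = describe(image)  # image -> lossy text description
        image = generate(caption)  # text -> a new image matching it
    return image
```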
I mean there are billions of perceptually distinct images that map to the same “text description”. So text would generally be both lossy and inefficient.
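A back-of-envelope comparison makes the lossiness concrete (the sizes below are my own assumptions, not from the thread): a modest 512x512 RGB image carries orders of magnitude more raw bits than a ~100-word caption possibly can.

```python
import math

# Assumed sizes: a 512x512 RGB image vs a ~100-word caption of
# ~6 ASCII characters per word at ~7 bits per character.
image_bits = 512 * 512 * 3 * 8                # raw pixel bits: ~6.3 million
caption_bits = int(100 * 6 * math.log2(128))  # ~4200 bits of text
print(image_bits // caption_bits)             # caption holds ~1500x fewer bits
```

So even before compression, a caption can distinguish only a vanishingly small fraction of the images it might describe.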
> instead of a list of pixels
We don’t store lists of pixels. Not even lossless formats like PNG do that. Good ole JPEG gets a 1:10 to 1:20 compression ratio, ballpark.
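A rough illustration of the lossless case: PNG is essentially DEFLATE over filtered scanlines, and plain zlib (the same DEFLATE codec) already shrinks a smooth synthetic image substantially. This is a sketch, not real PNG encoding, and real photographs compress less well:

```python
import zlib

# Synthetic 256x256 grayscale gradient -- smooth images have lots of
# redundancy, which DEFLATE (the codec inside PNG) exploits.
width, height = 256, 256
raw = bytes((x + y) % 256 for y in range(height) for x in range(width))
compressed = zlib.compress(raw, level=9)
ratio = len(raw) / len(compressed)  # well above 1:1, with zero loss
```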
“The same screenshot of an iOS app, but the Subscribe button is clear blue and more prominent”
(Although it wouldn’t work directly, since it looks like git runs a “text dump” on each side independently and then compares the resulting text naively.)
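That “text dump” behavior is git’s textconv mechanism: a driver converts each blob to text, and git then runs its ordinary line-based diff on the two outputs. A minimal sketch, assuming `exiftool` is installed (the classic example from the git docs):

```shell
# .gitattributes: route *.png through a custom diff driver
echo '*.png diff=exif' >> .gitattributes

# Tell git how that driver turns a PNG into text; git then diffs the
# two text dumps line by line, as described above.
git config diff.exif.textconv exiftool
```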