An optimization feature like the one in Microsoft's cdimage.exe/oscdimg.exe tool would be useful to save a few bytes when building our images. I see it working like this:
- When adding files, sort them by (partial) hash.
- If two files differ by this (partial) hash, then they are different.
- If two files have the same (partial) hash, they may be identical. Perform a binary comparison to know for certain whether they are the same or different.
- If they are the same, the entries in the ISO table should point to the same data; otherwise they point to different data.
The interesting work is in the (partial) hash computation and the binary comparison. By comparison, the cdimage.exe/oscdimg.exe utility from Microsoft does one of two things. With option -o it only performs a partial MD5 hash check on the first 64 kB of each file, so two files that share their first bytes but differ afterwards can be wrongly treated as identical. With option -oc it does a full binary comparison, which is obviously slower.
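
The steps above could be sketched roughly as follows. This is only an illustration in Python, not code from our image builder or from Microsoft's tool: the names partial_hash, group_identical and PARTIAL_BYTES are made up for the example, the 64 kB prefix simply mirrors the figure quoted above, and adding the file size to the key is my own assumption (a cheap extra filter, not part of the proposal).

```python
import filecmp
import hashlib
import os
from collections import defaultdict

# Assumed prefix size, mirroring the 64 kB mentioned above.
PARTIAL_BYTES = 64 * 1024


def partial_hash(path, limit=PARTIAL_BYTES):
    """Return a cheap grouping key: (file size, MD5 of the first `limit` bytes).

    Including the size is an assumption of this sketch, not part of the proposal.
    """
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read(limit)).digest()
    return (os.path.getsize(path), digest)


def group_identical(paths):
    """Map every path to a canonical path with byte-for-byte identical content."""
    # Step 1: bucket files by partial hash. Files in different buckets differ.
    buckets = defaultdict(list)
    for path in paths:
        buckets[partial_hash(path)].append(path)

    # Step 2: inside a bucket, confirm equality with a full binary comparison.
    canonical = {}
    for candidates in buckets.values():
        representatives = []
        for path in candidates:
            for rep in representatives:
                if filecmp.cmp(path, rep, shallow=False):
                    canonical[path] = rep  # same data: share one copy
                    break
            else:
                representatives.append(path)
                canonical[path] = path  # first file seen with this content
    return canonical
```

In an image builder, every path that maps to the same canonical file would get an ISO table entry pointing at the same data, and only the canonical files would be written out. Because the full comparison only runs inside a bucket, most files never get compared byte by byte, while files that merely share their first 64 kB are still told apart.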