I’ve been playing around with 3-dimensional binary data visualization, inspired by ..cantor.dust.. The results have been interesting, fun, and in some cases even useful.
My implementation is fairly rudimentary: every three bytes in a file are treated as an x, y, z coordinate for a data point in a 3D space where each axis extends from 0 through 255. This means that if the file contains data within a certain range of byte values (e.g., printable ASCII characters), those bytes will generate coordinates in the same general area of the 3D plot.
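To make the mapping concrete, here is a minimal Python sketch of the idea. It is not binwalk's actual code: the function name `bytes_to_points` is made up for illustration, and I'm assuming non-overlapping three-byte groups with any one or two leftover bytes at the end of the file simply dropped.

```python
def bytes_to_points(data: bytes) -> list[tuple[int, int, int]]:
    """Turn each consecutive, non-overlapping group of three bytes into (x, y, z).

    Hypothetical helper for illustration only. Trailing bytes that do not
    fill a complete triple are ignored (an assumption, not stated behavior).
    """
    usable = len(data) - (len(data) % 3)  # largest multiple of 3 that fits
    return [
        (data[i], data[i + 1], data[i + 2])  # each value is already 0..255
        for i in range(0, usable, 3)
    ]


if __name__ == "__main__":
    # "ABCDEFG" is 7 bytes: two full triples, and the final "G" is dropped.
    for point in bytes_to_points(b"ABCDEFG"):
        print(point)
```

Each resulting tuple can be handed to whatever 3D scatter plotter you prefer; the plotting itself is left out here.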
Since non-random data types (executable code, strings, etc.) will have an uneven distribution of bytes, different data types will generate different visual patterns. Here is one of my favorites, which was created from a file containing AVR32 executable code:
Because these byte distributions are characteristic of the data they come from, we can even start to identify file types based on their corresponding visualizations:
More importantly, we begin to see patterns associated with certain types of data.
Printable ASCII characters, for example, are represented by the bytes 32 through 126 (0x20 through 0x7E), so they are usually grouped in a small box that starts above 0 on each axis and ends just below the midpoint of each axis.
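The same grouping can be checked numerically. The sketch below is an illustration of my own, not something binwalk provides: the name `ascii_box_fraction` is hypothetical, and it assumes the same non-overlapping three-byte grouping described earlier. It counts how many points land inside the 32 through 126 box on all three axes.

```python
PRINTABLE_MIN = 32   # space
PRINTABLE_MAX = 126  # tilde


def ascii_box_fraction(data: bytes) -> float:
    """Fraction of (x, y, z) points whose three bytes are all printable ASCII.

    Hypothetical helper for illustration only. Points are non-overlapping
    byte triples; leftover trailing bytes are ignored (an assumption).
    """
    usable = len(data) - (len(data) % 3)
    total = usable // 3
    if total == 0:
        return 0.0  # no complete points, nothing to measure
    inside = sum(
        1
        for i in range(0, usable, 3)
        if all(PRINTABLE_MIN <= b <= PRINTABLE_MAX for b in data[i:i + 3])
    )
    return inside / total


if __name__ == "__main__":
    print(ascii_box_fraction(b"hello!"))             # pure text
    print(ascii_box_fraction(b"abc\x00\x01\x02"))    # half text, half binary
```

For uniformly random bytes, only about 5% of points should fall in this box, since (95/256)^3 is roughly 0.05. A noticeably higher fraction hints at embedded strings.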
We can see a grouping of ASCII characters clearly in this otherwise random (compressed) firmware image:
Likewise, executable code tends to produce thin horizontal and vertical lines.
This file contains unknown data, but its byte patterns suggest that it is primarily composed of executable code:
We can also see in the above visualization that there are few or no printable ASCII characters. This gives us a good idea of what to look for, and what not to look for, when performing a deeper analysis of this data.
I’ve integrated the 3D visualization into binwalk; just use the --3D option:
$ binwalk --3D file.bin
It’s still pretty experimental, but if you want to play with it, grab the latest binwalk code from the git repository.