Visualizing binary files with ImHex's DSL, the "pattern language"
(xy2i.blogspot.com)185 points by xy2_ 6 days ago | 16 comments
dloss 6 days ago | prev | next |
Other tools for parsing and analyzing binary data are listed here: https://github.com/dloss/binary-parsing
jcul 6 days ago | prev | next |
Great write up!
I looked at ImHex a good while back, but I think I hit some runtime or maybe even compilation issues and didn't dig deeper, even though the definition language piqued my curiosity.
These days I tend to just use xxd, bless, ghex, or seldom wxHexEditor, depending on what I need. But ImHex looks really powerful, like it could replace all the GUI ones. I'm looking forward to giving it another go tomorrow.
Though these days I spend most of my time in wireshark, which is kind of a hex viewer in a way.
How does it manage with huge files? Does it try to load the entire thing into memory? I remember wxHexEditor being good for that, and even being able to open block devices directly, and process memory IIRC. Though I might be getting it mixed up with HxD.
The decompression and combining compressed with decompressed sections looks very cool. Is the decompression in memory or written to disk?
// TagRecord Tags[while(!std::mem::eof())];
This loop based length stuff is very cool too, though for large files I'd imagine it could be slow as it will need to iterate through all records to determine the offset for records at the end of the file.
To be fair, wireshark / pcap files have this problem too.
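To illustrate the concern about loop-based lengths, here is a minimal Python sketch. It assumes a hypothetical record layout (a 4-byte little-endian length followed by the payload), which is not the format from the article and not how ImHex works internally. The point is only that finding where the last record starts means walking every record before it.

```python
import io
import struct

def record_offsets(stream):
    """Return the start offset of every record by walking the stream from the front."""
    offsets = []
    while True:
        start = stream.tell()
        header = stream.read(4)
        if len(header) < 4:  # end of data, or a truncated header
            break
        (length,) = struct.unpack("<I", header)  # hypothetical 4-byte length prefix
        offsets.append(start)
        stream.seek(length, io.SEEK_CUR)  # skip the payload without reading it
    return offsets

# Three records with payloads of 2, 0 and 5 bytes.
data = b"".join(struct.pack("<I", len(p)) + p for p in (b"ab", b"", b"hello"))
offsets = record_offsets(io.BytesIO(data))
print(offsets)
```

Even though the payloads are skipped rather than read, the number of header reads still grows with the number of records, so the cost is linear in the record count.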
GuB-42 5 days ago | root | parent | next |
I got to try it a while back and same problem, it uses very recent versions of C++ which my distro didn't support. I finally got the AppImage, but I had a few breaking bugs, making it too unreliable for actual work. But I also noticed that the project is quite active, so maybe most of the issues are fixed now. Maybe I should give it a try.
For now, my hex editor of choice is 010editor. Not free software, but the best for my application. Like ImHex, it supports huge files and block devices, and it has a powerful definition language.
viraptor 6 days ago | root | parent | prev |
> though for large files I'd imagine it could be slow as it will need to iterate through all records to determine the offset for records at the end of the file.
Yeah, it's not doing lazy evaluation, so you need to watch out. It's probably not the solution you want for (for example) looking at 500GB disk images.
rixtox 6 days ago | prev | next |
Kind of related: a tool that lets you hand-write ASCII-art-annotated hex dump files, while also being able to generate the original binary file from such a text file: https://github.com/netspooky/xx/blob/main/examples/elf.xx
octagons 6 days ago | prev | next |
I wasn’t aware that ImHex had this feature - perhaps I’ll try it!
I’ve been singing the praises of 010 Editor for years specifically because of its template and scripting features, the former of which is nearly identical to this DSL.
dannas 6 days ago | prev | next |
There's an ImHex WebAssembly build accessible online at: https://web.imhex.werwolv.net/.
netsharc 6 days ago | prev | next |
Wow, I've never thought of it, but "syntax"-highlighting for binary files would be awesome, e.g. "these bytes indicate the beginning of the next frame" (when talking about MP3/video files), maybe with mouseover support where it says e.g. "this value at this location indicates it's a $FOO variant of the file".
Anyone know of such a tool?
tripflag 6 days ago | root | parent | next |
Kaitai Struct has an online demo which basically does this; https://ide.kaitai.io/
pie_flavor 6 days ago | root | parent | prev | next |
I deal with a lot of cryptographic documents (e.g. public keys) and https://lapo.it/asn1js/ is a godsend for making sense of them. You just paste in hex or pem, and it shows the full deconstructed format along with two-way 'syntax highlighting' where if you hover over part of the deconstruction it highlights the equivalent part of the binary data. Hit the 'load' button for a representative example.
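The two-way highlighting described above depends on every decoded element remembering which bytes it came from. A rough Python sketch of that idea follows. It is not the asn1js implementation; the function name `parse_der` is made up for illustration, and it assumes single-byte tags and definite lengths only.

```python
def parse_der(data, offset=0, end=None, depth=0):
    """Walk DER tag-length-value triples, returning (depth, tag, start, end) tuples."""
    end = len(data) if end is None else end
    nodes = []
    while offset < end:
        start = offset
        tag = data[offset]
        first = data[offset + 1]
        offset += 2
        if first < 0x80:
            length = first  # short form: the byte is the length itself
        else:
            n = first & 0x7F  # long form: low 7 bits give the number of length bytes
            length = int.from_bytes(data[offset:offset + n], "big")
            offset += n
        value_end = offset + length
        nodes.append((depth, tag, start, value_end))
        if tag & 0x20:  # constructed bit set: the value holds nested elements
            nodes.extend(parse_der(data, offset, value_end, depth + 1))
        offset = value_end
    return nodes

# SEQUENCE { INTEGER 5, NULL }
sample = bytes.fromhex("30050201050500")
nodes = parse_der(sample)
for depth, tag, start, stop in nodes:
    print("  " * depth + f"tag=0x{tag:02x} bytes[{start}:{stop}]")
```

With the byte range stored on each node, a viewer can map a hover on the tree to a span in the hex view, and a hover on a byte back to the innermost node that contains it.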
frabert 6 days ago | root | parent | prev |
010 Editor has something like this. Okteta too. They both use DSLs to represent formats.
AstroJetson 5 days ago | root | parent |
+1 for the 010 editor. It comes with a number of pre-built binary templates, I use the sqlite one all the time. A recent upgrade got syntax coding by the tree-sitter environment, which is a great upgrade. Highly recommended. https://www.sweetscape.com/010editor/
crabbone 6 days ago | prev | next |
Whoa, a flashback! Many, many years ago I worked on online presentation software that dynamically assembled Flash clips from clip-art, user-provided graphics, and text. There was a lot of parsing and reassembling of SWF involved. There used to be the Swfmill library, but it didn't have enough in it to deal with animations / transformations. Sigh, that was a fun project.
My approach at the time was to have Org-mode tables detailing the meaning and composition of chunks of binary data. That wasn't nearly as comfortable as this editor seems to be, but I was told that it looked very impressive to people looking over my shoulder. Just like in those "hackers movies" where the screen is filled with gibberish sequences of digits and other random characters :)
amszmidt 6 days ago | prev | next |
Wireshark has a similar feature where you can open an ELF or PNG file and look at the sections. The Lua interface isn't too shabby either for writing such "dissectors".
fragmede 6 days ago | prev |
Looks slightly more expressive than Kaitai's binary format DSL.
genewitch 6 days ago | next |
I have a strong memory that AFL (american fuzzy lop, the binary fuzzer) had a feature similar to what this is doing, based on the highlighted portions and screenshots. It wasn't the AFL status screen, it was (or may have been) a third-party app, and it would color code parts of the input files based on the outputs or whatever from afl's processing.
For example, there was a color key that explained that, say, purple meant "magic bytes", like "0x4a46494600" for "JFIF\0", and if any part of the input file caused errors it meant it was probably a checksum and needed to be "fixed" so afl could properly fuzz all the functions in the source code.
I'm not super into fuzzing or that realm anymore, so i doubt i could describe it better than i did here. I clicked through to see if someone has leveraged the AFL stuff for use in another tool, which would be cool.
edit: i think it was afl-analyze - i had a go at the source code for aflplusplus:
> A nifty utility that grabs an input file and takes a stab at explaining its structure by observing how changes to it affect the execution path.
> Another tool in AFL++ is the afl-analyze tool. It takes an input file, attempts to sequentially flip bytes and observes the behavior of the tested program. It then color-codes the input based on which sections appear to be critical and which are not; while not bulletproof, it can often offer quick insights into complex file formats.
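A toy Python sketch of the byte-flipping idea in that description follows. It is a simplification and not afl-analyze's actual algorithm. `toy_target` is a made-up stand-in for an instrumented program, the file layout (4 magic bytes, a 1-byte checksum over the next 3 bytes, then padding) is invented, and the two labels are far coarser than the tool's real categories.

```python
def toy_target(data: bytes) -> str:
    """Hypothetical program under test: returns a label for the path it took."""
    if data[:4] != b"JFIF":
        return "bad-magic"
    if data[4] != sum(data[5:8]) % 256:
        return "bad-checksum"
    return "ok"  # trailing bytes after offset 8 are never inspected

def classify_bytes(data: bytes, target) -> list:
    """Flip each byte in turn and record whether the target's behavior changes."""
    baseline = target(data)
    labels = []
    for i in range(len(data)):
        mutated = bytearray(data)
        mutated[i] ^= 0xFF  # invert every bit of one byte
        changed = target(bytes(mutated)) != baseline
        labels.append("critical" if changed else "no-op")
    return labels

payload = b"\x01\x02\x03"
sample = b"JFIF" + bytes([sum(payload) % 256]) + payload + b"\x00\x00"
labels = classify_bytes(sample, toy_target)
for offset, (byte, label) in enumerate(zip(sample, labels)):
    print(f"{offset:02d}  0x{byte:02x}  {label}")
```

In this toy the magic, checksum, and checksummed bytes all come out as critical, while the two padding bytes are no-ops. As I understand it, the real tool goes further by observing execution paths and trying several mutations per position to tell these cases apart.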