The Binary Never Touches the LLM.

The binary never touches the LLM. That’s the rule. A 0.5B-parameter model drafts a verdict from a structured analysis report: entropy map, PE section layout, suspicious strings, embedded file offsets, API import patterns, and hardcoded IPs. A 7B-parameter model reads that draft, checks it against the same analysis, corrects what’s wrong, and produces the final report. Deterministic code processes the binary itself before either model sees anything, so the models only ever read the report.
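The draft-then-patch flow described above can be sketched in a few lines. This is a minimal illustration and not GRIMOIRE's actual code: the `triage` function, the prompt wording, and the `Model` callable type are all hypothetical, and the models are passed in as plain functions so the sketch does not assume any particular inference API.

```python
import json
from typing import Callable

# Hypothetical type: any function that takes a prompt and returns text.
Model = Callable[[str], str]

def triage(analysis: dict, draft_model: Model, patch_model: Model) -> str:
    """Run a two-stage draft/patch pass over a structured analysis report.

    Only the JSON-serialized report is sent to either model; the raw
    binary is never part of a prompt.
    """
    report = json.dumps(analysis, indent=2, sort_keys=True)

    # Stage 1: the small model writes a quick draft verdict.
    draft = draft_model(
        "Draft a triage verdict from this static analysis report:\n" + report
    )

    # Stage 2: the larger model checks the draft against the same report.
    final = patch_model(
        "Check the draft verdict against the analysis report. "
        "Correct anything that is wrong or missing.\n\n"
        f"ANALYSIS:\n{report}\n\nDRAFT:\n{draft}"
    )
    return final
```

Because the patch model receives both the draft and the full report, anything the draft omitted is still visible to it.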

2-model pipeline (PatchSpec)

0 binary execution required

YARA rule generated per analysis

GRIMOIRE is static binary triage. Feed it a PE, ELF, or raw binary and get a plain-English verdict, a prioritized list of where to look first, and a starter YARA rule — in the time it takes the local 7b model to run inference. No sandbox, no Ghidra, no setup beyond Ollama running locally.

If the draft missed the embedded JPEG, the patch model finds it. That’s what the second model is for.

The entropy scan is the first pass. A sliding window of Shannon entropy across the binary identifies encrypted regions (high entropy, statistically flat), compressed sections, and packed payloads. A packed binary has a characteristic entropy profile — low at the unpacker stub, high in the packed payload, sharp transition at the unpack routine. GRIMOIRE marks those boundaries and reports the offsets.
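A sliding-window entropy scan of this kind can be sketched as follows. The window size, step, and the 7.0 bits-per-byte threshold are illustrative assumptions, as are the function names; GRIMOIRE's actual parameters are not stated here.

```python
import math
from collections import Counter

def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def entropy_map(data: bytes, window: int = 256, step: int = 256):
    """Return (offset, entropy) pairs for each window across the data."""
    last_start = max(len(data) - window + 1, 1)
    return [
        (off, shannon_entropy(data[off:off + window]))
        for off in range(0, last_start, step)
    ]

def find_transitions(emap, threshold: float = 7.0):
    """Return offsets where entropy crosses the threshold in either direction."""
    transitions = []
    for (_, prev), (off, cur) in zip(emap, emap[1:]):
        if (prev >= threshold) != (cur >= threshold):
            transitions.append(off)
    return transitions
```

For example, `find_transitions(entropy_map(data))` on a file with a low-entropy stub followed by a packed payload would report the offset of the first high-entropy window. A smaller `step` gives finer boundaries at the cost of more computation.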

The heuristics layer runs alongside the entropy scan: packer signatures, embedded file magic bytes (a JPEG or PE header buried inside a binary is worth noting), suspicious API import combinations (VirtualAlloc + WriteProcessMemory + CreateRemoteThread is a combination characteristic of process injection), crypto constants, hardcoded IPs and URLs, and registry persistence paths. These are deterministic pattern matches: no inference, just looking for things that shouldn’t be there.
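Three of these checks are simple enough to sketch directly. The magic-byte table, the function names, and the IPv4 regex below are illustrative assumptions, not GRIMOIRE's rule set. The PE `MZ` marker is left out of the table because two bytes alone produce too many false hits without validating the header that follows.

```python
import re

# Illustrative subset of file signatures; a real table would be longer.
MAGIC = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "zip": b"PK\x03\x04",
    "elf": b"\x7fELF",
}

# The import combination named in the text above.
INJECTION_APIS = {"VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"}

# Dotted-quad candidates; octet ranges are validated separately.
IPV4_RE = re.compile(rb"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")

def find_embedded_files(data: bytes, skip_header: int = 1):
    """Return sorted (offset, type) pairs for magic bytes past the file start."""
    hits = []
    for name, magic in MAGIC.items():
        start = skip_header
        while (pos := data.find(magic, start)) != -1:
            hits.append((pos, name))
            start = pos + 1
    return sorted(hits)

def has_injection_imports(imports) -> bool:
    """True if every API in the injection combination is imported."""
    return INJECTION_APIS.issubset(set(imports))

def find_ipv4(data: bytes):
    """Return (offset, address) pairs for plausible hardcoded IPv4 strings."""
    found = []
    for m in IPV4_RE.finditer(data):
        text = m.group().decode("ascii")
        if all(int(part) <= 255 for part in text.split(".")):
            found.append((m.start(), text))
    return found
```

Checks like these are cheap but noisy: a four-part version string such as `1.2.3.4` passes the IPv4 test, so the hits are best treated as leads for the report and not as findings on their own.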

Both outputs go to the draft model, which produces a quick narrative. The patch model reads the draft alongside the raw analysis and corrects it. The final report is authoritative. The YARA rule comes from the same analysis — string signatures, entropy thresholds, PE section characteristics — assembled into a detection rule you can deploy immediately or use as a starting point.
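Assembling a starter YARA rule from analysis output might look like the sketch below. The `build_yara_rule` function and its parameters are hypothetical. The sketch covers only string signatures, a whole-file entropy threshold via YARA's `math` module, and an `MZ` header check; section-level characteristics would need YARA's `pe` module and are omitted.

```python
import re

def yara_escape(s: str) -> str:
    """Escape backslashes and double quotes for a YARA text string."""
    return s.replace("\\", "\\\\").replace('"', '\\"')

def build_yara_rule(name, strings, min_entropy=None, is_pe=True) -> str:
    """Build YARA rule text from printable strings and optional thresholds."""
    # Rule identifiers may only contain letters, digits, and underscores.
    ident = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if not ident or ident[0].isdigit():
        ident = "r_" + ident

    # Keep only non-empty printable-ASCII strings.
    printable = [s for s in strings if s and all(32 <= ord(c) < 127 for c in s)]
    if not printable:
        raise ValueError("need at least one printable string")

    lines = []
    if min_entropy is not None:
        lines += ['import "math"', ""]
    lines += [f"rule {ident}", "{", "    strings:"]
    for i, s in enumerate(printable):
        lines.append(f'        $s{i} = "{yara_escape(s)}" ascii wide')

    conditions = []
    if is_pe:
        conditions.append("uint16(0) == 0x5A4D")  # "MZ" at offset 0
    conditions.append(f"{min(2, len(printable))} of ($s*)")
    if min_entropy is not None:
        conditions.append(f"math.entropy(0, filesize) >= {min_entropy:.1f}")

    lines += ["    condition:", "        " + " and ".join(conditions), "}"]
    return "\n".join(lines)
```

Requiring two string matches where possible is a design choice to cut false positives from a single common string; a generated rule like this is a starting point to tighten by hand.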

The honest limitation: ELF support is partial. Section parsing works, but the heuristics and LLM context are oriented toward PE files. Full ELF analysis is on the roadmap but not yet implemented. For Linux binaries and firmware, the entropy scan still runs, but verdict quality degrades.

The tool sits at the end of a detection pipeline: the anomaly scorer identifies a suspicious flow, the protocol analyzer finds a binary payload in the traffic, GRIMOIRE triages the extracted binary. It can also run standalone — point it at any executable and get a triage report.
