GRIMOIRE: Static Binary Triage via PatchSpec Two-Model Inference

Abstract

GRIMOIRE is a static binary triage tool for PE and ELF executables. The analysis pipeline has four stages: binary ingestion and format detection, sliding-window Shannon entropy computation, structure heuristics (packer signatures, embedded payload detection, suspicious API scanning, crypto constant matching, hardcoded IOC extraction), and PatchSpec two-model inference via Ollama (draft model: qwen2.5:0.5b; patch model: qwen2.5-coder:7b). Output formats include terminal, markdown, JSON, and a starter YARA rule. The binary is never executed. The LLM receives a structured analysis report as context, not raw bytes.

Overview

GRIMOIRE triages binary executables using static analysis followed by LLM-assisted interpretation. The design principle is separation: deterministic analysis code processes the binary, and the LLM interprets the results of that analysis. The binary never reaches the inference step; what reaches the model is a structured summary of what the deterministic code found.

grimoire analyze malware.exe
grimoire analyze firmware.bin --output json --save report.json
grimoire analyze sample.elf --output markdown --save report.md

Requirements: Python 3.12+, uv, Ollama running locally with qwen2.5-coder:7b and qwen2.5:0.5b pulled.

Analysis Pipeline

Stage 1: Binary Ingestion

Format detection reads file magic bytes:

MZ header → PE (Windows executable)
\x7fELF → ELF (Linux executable)
Other → raw binary / firmware
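
The magic-byte check above can be sketched in a few lines. This is an illustrative sketch rather than GRIMOIRE's own code; the function names and the "RAW" label are assumptions.

```python
def detect_format(data: bytes) -> str:
    """Classify a binary by its leading magic bytes."""
    if data.startswith(b"\x7fELF"):
        return "ELF"
    if data.startswith(b"MZ"):
        return "PE"
    # Anything else is treated as raw binary / firmware.
    return "RAW"


def detect_file(path: str) -> str:
    # Only the first four bytes are needed for the check.
    with open(path, "rb") as handle:
        return detect_format(handle.read(4))
```

An MZ header alone does not prove a valid PE image (DOS executables share it), so a fuller check would also follow the header to the PE signature.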

PE parsing (stdlib only, no external dependencies):

Sections: name, virtual address, raw size, characteristics, per-section entropy
Import table: DLL names and imported function names
Export table: exported function names
PE header fields: timestamp, machine type, subsystem, characteristics
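
As an illustration of stdlib-only PE parsing, the sketch below reads the COFF file header with the struct module. The function name and the returned dictionary keys are assumptions; the field layout follows the published PE format.

```python
import struct


def parse_coff_header(data: bytes) -> dict:
    """Read the COFF file header fields of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: little-endian 32-bit offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The 20-byte COFF header follows the 4-byte signature.
    (machine, n_sections, timestamp, _symtab, _n_symbols,
     opt_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": machine,              # e.g. 0x8664 for x64, 0x014c for x86
        "number_of_sections": n_sections,
        "timestamp": timestamp,          # seconds since the Unix epoch
        "optional_header_size": opt_header_size,
        "characteristics": characteristics,
    }
```

Truncated input raises struct.error from unpack_from, which a caller would likely want to catch and report as a malformed file.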

String extraction: Printable ASCII runs of at least 6 characters are extracted from the binary and classified as URLs, IP addresses, registry paths, file paths, base64 candidates, or crypto material headers (BEGIN PRIVATE KEY, etc.).
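
A minimal sketch of string extraction and classification follows. The regular expressions, the label names, and the first-match-wins ordering are assumptions chosen for illustration, not GRIMOIRE's actual patterns.

```python
import re


def extract_strings(data: bytes, min_len: int = 6):
    """Return (offset, text) for each printable ASCII run of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


# Hypothetical classifiers, checked in order; the first match wins.
CLASSIFIERS = [
    ("crypto_header", re.compile(r"-----BEGIN [A-Z ]+-----")),
    ("url", re.compile(r"https?://[^\s\"']+", re.I)),
    ("registry_path", re.compile(r"^(?:HKEY_[A-Z_]+|HKLM|HKCU)\\", re.I)),
    ("ipv4", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("file_path", re.compile(r"^(?:[A-Za-z]:\\|/)\S+")),
    ("base64_candidate", re.compile(r"^[A-Za-z0-9+/]{16,}={0,2}$")),
]


def classify(text: str) -> str:
    for label, pattern in CLASSIFIERS:
        if pattern.search(text):
            return label
    return "other"
```

Because the first matching classifier wins, a base64 blob that happens to start with "/" would be labeled a file path; a real classifier would need tie-breaking rules.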

Stage 2: Entropy Scan (entropy.py)

A sliding window of configurable size (default 256 bytes) computes Shannon entropy at each byte offset:

H(window) = -Σ p(b) log₂ p(b)

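where the sum is over all observed byte values b in the window and p(b) is the relative frequency of b in that window. H ranges from 0 bits per byte (a single repeated byte value) to 8 bits per byte (all 256 byte values equally frequent).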
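
The formula translates directly into code. The sketch below is a straightforward, unoptimized version; the function names and the step parameter are assumptions, and entropy.py may be organized differently.

```python
import math
from collections import Counter


def shannon_entropy(window: bytes) -> float:
    """H = -sum(p(b) * log2(p(b))) over the byte values observed in the window."""
    if not window:
        return 0.0
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def sliding_entropy(data: bytes, window: int = 256, step: int = 1):
    """Yield (offset, entropy) for each window position.

    Input shorter than the window yields one value covering the whole input.
    """
    if not data:
        return
    for offset in range(0, max(len(data) - window, 0) + 1, step):
        yield offset, shannon_entropy(data[offset:offset + window])
```

Recounting every window costs O(n × window) at a step of 1; keeping a running histogram that adds the entering byte and removes the leaving byte would avoid the recount on large files.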

Entropy interpretation:

Range	Interpretation
0.0 – 2.0	Repeated data, null padding
2.0 – 5.0	Structured data, code
5.0 – 7.0	Compressed or encrypted
7.0 – 8.0	Encrypted (near-random)

Regions above the configurable high-entropy threshold are flagged as suspicious. The entropy profile across the binary is included in the analysis context passed to the LLM.
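
One way to turn the per-window profile into flagged regions is to merge consecutive windows that meet the threshold. In this sketch the default threshold of 7.0, the function name, and the (start, end) tuple format are all assumptions.

```python
def flag_regions(profile, threshold: float = 7.0, window: int = 256):
    """Merge consecutive high-entropy windows into (start, end) byte ranges.

    profile is an iterable of (offset, entropy) pairs in ascending offset order.
    """
    regions = []
    start = end = None
    for offset, entropy in profile:
        if entropy >= threshold:
            if start is None:
                start = offset
            end = offset + window  # a region extends to the end of its last window
        elif start is not None:
            regions.append((start, end))
            start = None
    if start is not None:
        regions.append((start, end))
    return regions
```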

Stage 3: Structure Heuristics (heuristics.py)

A library of pattern-matching checks runs against the extracted binary data:

Packer signatures: Known packer stubs (UPX, MPRESS, Themida, etc.) are detected by magic bytes and section name patterns.

Embedded payload detection: Magic byte scanning for files embedded within the binary:

PE headers (MZ)
JPEG (\xff\xd8\xff)
ZIP/PKZip
PDF
Other common formats

Offset of each embedded payload is reported.
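
Magic-byte scanning for embedded files can be sketched as a repeated substring search. The signature table below covers only the formats named above, and the option to skip offset 0 (the host file's own header) is an assumption.

```python
MAGIC = {
    "pe": b"MZ",
    "jpeg": b"\xff\xd8\xff",
    "zip": b"PK\x03\x04",
    "pdf": b"%PDF-",
}


def scan_embedded(data: bytes, skip_offset_zero: bool = True):
    """Return sorted (offset, kind) pairs for every magic-byte occurrence."""
    hits = []
    for kind, magic in MAGIC.items():
        pos = data.find(magic)
        while pos != -1:
            if not (skip_offset_zero and pos == 0):
                hits.append((pos, kind))
            pos = data.find(magic, pos + 1)
    return sorted(hits)
```

Short signatures such as the two-byte MZ match by chance in large files, so each hit is best treated as a candidate to confirm by parsing the header that follows it.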

Suspicious API combinations: Import table analysis flags combinations associated with specific attack techniques:

Process injection: VirtualAlloc + WriteProcessMemory + CreateRemoteThread
Credential dumping: LSASS-related imports
Persistence: RegSetValueEx + startup path strings
Network: raw socket creation outside standard networking libraries
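
The combination checks can be expressed as small declarative rules. The sketch below covers only two of the listed techniques; the rule structure, the prefix matching on import names (so that RegSetValueExW satisfies RegSetValueEx), and the startup-path hint are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComboRule:
    technique: str
    apis: tuple               # every API prefix must appear among the imports
    string_hints: tuple = ()  # if non-empty, at least one must appear in the strings


# Hypothetical rule set based on the combinations listed above.
RULES = (
    ComboRule("process_injection",
              ("VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread")),
    ComboRule("persistence", ("RegSetValueEx",), ("currentversion\\run",)),
)


def match_combinations(imports, strings=()):
    """Return the sorted technique names whose rules are fully satisfied."""
    imports = list(imports)
    lowered = [s.lower() for s in strings]
    hits = []
    for rule in RULES:
        apis_ok = all(any(imp.startswith(api) for imp in imports)
                      for api in rule.apis)
        hints_ok = not rule.string_hints or any(
            hint in s for hint in rule.string_hints for s in lowered)
        if apis_ok and hints_ok:
            hits.append(rule.technique)
    return sorted(hits)
```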

Crypto constants: Known constants for AES (S-box, round constants), RC4, and other algorithms are matched against the binary body.

IOC extraction: Hardcoded IPv4/IPv6 addresses, URLs, domain names, and registry paths are extracted and reported.
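
A minimal sketch of IOC extraction for IPv4 addresses and URLs, validating candidates with the standard ipaddress module. The patterns and the output shape are assumptions; IPv6, bare domain names, and registry paths are omitted for brevity.

```python
import ipaddress
import re

# Four dotted groups that are not part of a longer dotted number.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")
URL = re.compile(r"https?://[^\s\"'<>]+", re.I)


def extract_iocs(strings):
    """Collect validated IPv4 addresses and URLs from extracted strings."""
    ipv4, urls = set(), set()
    for text in strings:
        for candidate in IPV4_CANDIDATE.findall(text):
            try:
                ipaddress.IPv4Address(candidate)  # rejects octets above 255
            except ValueError:
                continue
            ipv4.add(candidate)
        urls.update(URL.findall(text))
    return {"ipv4": sorted(ipv4), "urls": sorted(urls)}
```

Four-part version strings such as 1.2.3.4 pass this validation, so results of this kind still need context before being reported as network indicators.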

ELF heuristics (partial): Section presence/absence checks, unusual program header counts, stripped symbol tables. Full ELF section semantic analysis is deferred to v0.2.

Stage 4: PatchSpec Two-Model Inference

PatchSpec is a two-model speculative inference pipeline adapted for binary triage.

Draft model (qwen2.5:0.5b, ~500M parameters):

Receives the structured analysis context: entropy summary, PE sections, heuristic findings, extracted strings
Produces a fast initial triage narrative: verdict, rationale, where to look first
Latency: 1–3 seconds on consumer hardware

Patch model (qwen2.5-coder:7b, ~7B parameters):

Receives: the same structured analysis context + the draft model’s output
Verifies the draft against the raw analysis
Corrects errors (missed findings, overstated confidence, incorrect attribution)
Expands with additional findings the draft missed
Produces the authoritative final report

The patch model’s output is authoritative. The draft exists to seed the patch model’s reasoning and reduce total inference time by providing a starting point rather than generating from a cold start.
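
The draft-then-patch flow can be sketched against Ollama's local HTTP API. The prompt wording, the function names, and the injectable generate callable are assumptions; only the model names and the two-step order come from the description above.

```python
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# Hypothetical prompt templates; the real prompts are not shown in this document.
DRAFT_PROMPT = "Triage this static analysis report.\n\n{context}"
PATCH_PROMPT = (
    "Verify and correct the draft triage against the analysis report.\n\n"
    "Report:\n{context}\n\nDraft:\n{draft}"
)


def ollama_generate(model: str, prompt: str) -> str:
    """Send one non-streaming generation request to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        OLLAMA_URL, data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def run_patchspec(context: str, generate=ollama_generate,
                  draft_model: str = "qwen2.5:0.5b",
                  patch_model: str = "qwen2.5-coder:7b") -> dict:
    """Run the draft model, then give its output plus the context to the patch model."""
    t0 = time.perf_counter()
    draft = generate(draft_model, DRAFT_PROMPT.format(context=context))
    t1 = time.perf_counter()
    final = generate(patch_model, PATCH_PROMPT.format(context=context, draft=draft))
    t2 = time.perf_counter()
    return {
        "draft": draft,
        "final": final,  # the patch model's output is the authoritative one
        "timing_ms": {"draft": round((t1 - t0) * 1000),
                      "patch": round((t2 - t1) * 1000)},
    }
```

Passing generate as a parameter lets the pipeline be exercised with a stub in tests, without a running Ollama server.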

Example output:

Verdict: Malicious
Confidence: High
Rationale: Hardcoded C2 IP addresses and embedded JPEG consistent
with screenshot exfiltration. Process injection API combination
(VirtualAlloc + WriteProcessMemory + CreateRemoteThread) present.

Where to Look First:
1. Embedded JPEG at offset 0x00002d42 — likely screenshot payload
2. Base64 blob at strings offset 0x00001a30 — possible encoded config
3. Gzip stream at offset 0x00003b46 — decompress for second-stage payload

YARA Rule Generation (yara_gen.py)

After analysis, GRIMOIRE generates a starter YARA rule from the findings:

rule GRIMOIRE_auto_sample {
    meta:
        generated_by = "GRIMOIRE"
        verdict = "malicious"
    strings:
        $s1 = "192.168.1.105" ascii
        $s2 = "VirtualAlloc" ascii
        $s3 = "CreateRemoteThread" ascii
    condition:
        uint16(0) == 0x5A4D and
        filesize < 10MB and
        2 of ($s*)
}

The generated rule is a starting point, not a finished detection. It captures the most distinctive strings and structural features identified during triage. Production deployment requires analyst review and testing against benign corpora.
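
Generating a rule of this shape is mostly string templating plus escaping. The sketch below is one possible approach, not the contents of yara_gen.py; the function names and parameters are assumptions, and it emits only the PE-style condition shown above.

```python
import re


def yara_escape(text: str) -> str:
    """Escape backslashes and double quotes for a YARA text string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_rule(sample_name: str, verdict: str, strings,
               min_matches: int = 2, max_size: str = "10MB") -> str:
    """Render a starter rule from a list of distinctive strings."""
    strings = list(strings)
    if not strings:
        raise ValueError("at least one string is required")
    ident = re.sub(r"\W", "_", sample_name)  # rule names must be identifiers
    lines = [
        f"rule GRIMOIRE_auto_{ident} {{",
        "    meta:",
        '        generated_by = "GRIMOIRE"',
        f'        verdict = "{yara_escape(verdict)}"',
        "    strings:",
    ]
    for index, value in enumerate(strings, start=1):
        lines.append(f'        $s{index} = "{yara_escape(value)}" ascii')
    lines += [
        "    condition:",
        "        uint16(0) == 0x5A4D and",  # MZ header at offset 0
        f"        filesize < {max_size} and",
        f"        {min(min_matches, len(strings))} of ($s*)",
        "}",
    ]
    return "\n".join(lines)
```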

Supported Formats

Format	Detection	Sections	Entropy	Heuristics
PE/EXE (x86/x64)	Full	Full	Full	Full
ELF	Magic only	Partial	Full	Partial
Raw binary / firmware	No	No	Full	Partial

Full ELF section parsing and ELF-specific heuristics are planned for v0.2.

Output Formats

Text (default): Rich terminal output with colored tables, anomaly cards, and a formatted report panel.

Markdown (--output markdown): Structured .md suitable for analyst notes, issue trackers, or incident reports.

JSON (--output json): Machine-readable. Schema:

{
  "grimoire_version": "0.1.0",
  "analyzed_at": "...",
  "file": {
    "name": "...", "size": 0, "format": "PE",
    "overall_entropy": 6.8, "arch": "x64",
    "pe_sections": [{"name": ".text", "entropy": 6.2}, ...]
  },
  "suspicious_regions": [{"offset": "0x2d42", "type": "embedded_jpeg"}],
  "findings": [...],
  "report": {
    "draft": "...",
    "final": "...",
    "timing_ms": {"draft": 1200, "patch": 8400}
  }
}

Integration Position

GRIMOIRE is the binary triage layer in the minidet detection stack. It runs asynchronously behind a token-bucket rate limiter and is triggered only when PE, ELF, or ZIP magic bytes appear in a network flow payload. This keeps expensive 7B-parameter inference from running on every flow while ensuring that binaries extracted from network traffic receive full triage.
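
A token bucket of the kind described can be sketched as follows. The class, the should_triage helper, and the rate and capacity values in the usage are hypothetical; minidet's actual limiter and trigger logic are not shown in this document.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` events, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill for the elapsed time, capped at the bucket capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


BINARY_MAGIC = (b"MZ", b"\x7fELF", b"PK\x03\x04")  # PE, ELF, ZIP


def should_triage(payload: bytes, bucket: TokenBucket) -> bool:
    """Trigger triage only for payloads containing binary magic, within the rate limit."""
    if not any(magic in payload for magic in BINARY_MAGIC):
        return False
    return bucket.try_acquire()
```

Searching for the two-byte MZ anywhere in a payload is noisy; a stricter trigger might require the magic at the start of a reassembled file.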

GRIMOIRE can also be used standalone for offline triage of any binary.

Roadmap

v0.2: Full ELF section parsing, byte-level model fine-tuning on binary corpora
v0.3: PCAPR integration — feed protocol structure analysis as model context for network capture triage
v0.4: Ghidra plugin