White Paper · AI Research · network security · intrusion detection · malware detection

minidet: A Queue-Based Multi-Signal Network Threat Detection Sidecar

Architecture and design rationale for minidet, a Suricata sidecar that runs four signal layers (LIMEN, BPB, GRIMOIRE, PCAPR) at different costs and latencies, correlates verdicts across destination IPs, and emits enriched EVE JSON.

May 11, 2026 · 12 min read · zer0contextlost
Abstract

minidet is a network threat detection sidecar for Suricata. It applies four signal layers to TCP/UDP flows:

- LIMEN: supervised cosine similarity, inline, ~200µs per flow.
- BPB: unsupervised byte-level novelty, async, token-bucket sampled.
- GRIMOIRE: static binary triage, async, triggered only on binary payloads.
- PCAPR: deep protocol analysis, triggered only by correlator-initiated investigations.

A weighted scoring system combines the signals. A sliding-window correlator groups scores by destination IP and opens investigations when thresholds are crossed. All output is EVE JSON, compatible with any SIEM already ingesting Suricata alerts.

Overview

minidet enriches an existing Suricata deployment without replacing it. Its output format is EVE JSON, the same format Suricata already produces. A SIEM that reads Suricata alerts can therefore read minidet alerts without configuration changes.

The design principle is cost-proportional analysis: cheap signals run on everything, expensive signals run only when warranted.


Signal Layers

Signal     Source                            Latency   Triggers on
LIMEN      Supervised cosine similarity      ~200µs    Every flow
BPB        Unsupervised byte-level novelty   Async     Token-bucket sampled
GRIMOIRE   Static binary triage              10–30s    PE/ELF/ZIP magic in payload
PCAPR      Deep protocol analysis            Seconds   Correlator investigation

LIMEN (Inline)

LIMEN runs synchronously in the FlowRouter on every flow. It encodes the raw flow bytes into a 256-dimensional fingerprint and compares it against the PatternStore. The verdict (MALICIOUS, BENIGN, UNKNOWN) and its confidence are written to the flow’s envelope immediately.

Cost: ~200µs per flow, which limits a single synchronous LIMEN path to roughly 5,000 flows/s. That is suitable for inline deployment at typical enterprise traffic rates. It is not suitable for 1Gbps without the FAISS migration (see LIMEN paper).
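A minimal sketch of the inline lookup follows. It assumes a brute-force scan of labeled fingerprints followed by a k-nearest-neighbor vote. The source only specifies cosine similarity against the PatternStore and the three verdicts. The `Pattern` record, `limen_verdict`, `k`, and the `min_sim` cutoff are all hypothetical. The linear scan is the part a FAISS index would replace.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    label: str                  # e.g. a top_neighbors entry such as "cobalt_strike_ssload_2024-04-18"
    malicious: bool
    vector: tuple[float, ...]   # LIMEN uses 256-dim fingerprints; any length works here


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def limen_verdict(fingerprint, store, k=9, min_sim=0.9):
    """Hypothetical k-NN vote: returns (verdict, confidence, neighbor labels)."""
    # Brute-force scan over every stored pattern: O(len(store)) work per flow.
    scored = sorted(((cosine(fingerprint, p.vector), p) for p in store),
                    key=lambda sp: sp[0], reverse=True)
    close = [p for sim, p in scored[:k] if sim >= min_sim]
    if not close:
        return "UNKNOWN", 0.0, []           # no reliable match
    labels = [p.label for p in close]
    votes = sum(p.malicious for p in close)
    if 2 * votes > len(close):
        return "MALICIOUS", votes / len(close), labels
    if 2 * votes < len(close):
        return "BENIGN", (len(close) - votes) / len(close), labels
    return "UNKNOWN", 0.5, labels           # tied vote
```

The returned labels play the role of `top_neighbors` in the flow event, and the confidence is the fraction of close neighbors that agree with the verdict.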

BPB Scorer (Async, Sampled)

The Substrate-PCAP model assigns a bits-per-byte (BPB) score to each sampled flow. A high BPB indicates structural novelty relative to benign training data. This makes it a complementary signal to LIMEN’s cosine similarity.
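Bits per byte is the model's average negative log2-likelihood over the flow's bytes: BPB = −(1/n) · Σ log2 p(b_i | b_1 … b_{i−1}).

The sketch below replaces the Substrate-PCAP model with a Laplace-smoothed unigram byte model trained on benign bytes, purely to illustrate the metric. `UnigramByteModel`, `score_flow`, and the 512-byte cap are assumptions. The default threshold of 2.0 is taken from the EVE flow example and is only meaningful on the real model's scale. This toy model scores even ordinary HTTP text above it.

```python
import math
from collections import Counter


class UnigramByteModel:
    """Toy stand-in for Substrate-PCAP: Laplace-smoothed byte frequencies."""

    def __init__(self, benign_bytes: bytes):
        self.counts = Counter(benign_bytes)   # iterating bytes yields ints 0..255
        self.total = len(benign_bytes)

    def prob(self, byte: int) -> float:
        return (self.counts[byte] + 1) / (self.total + 256)


def bits_per_byte(model: UnigramByteModel, payload: bytes) -> float:
    """Average negative log2-likelihood per byte."""
    if not payload:
        return 0.0
    return -sum(math.log2(model.prob(b)) for b in payload) / len(payload)


def score_flow(model, payload: bytes, threshold: float = 2.0, max_bytes: int = 512) -> dict:
    """Build a dict shaped like the "bpb" object in the EVE flow event."""
    head = payload[:max_bytes]                # hypothetical cap on scored bytes
    score = bits_per_byte(model, head)
    return {"score": round(score, 1), "anomaly": score > threshold,
            "n_bytes": len(head), "threshold": threshold}
```

An untrained model assigns every byte probability 1/256, which gives exactly 8 bits per byte. Training on benign traffic lowers the score for familiar byte patterns, so novel payloads stand out.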

BPB scoring runs asynchronously via a token-bucket queue capped at a configurable rate (default 50 flows/s). This prevents the unsupervised model from saturating the CPU when the UNKNOWN rate spikes. When the BPB result arrives, the EnvelopeStore merges it into the flow’s envelope.
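A token bucket admits a flow only when a token is available. Tokens refill at `rate` per second up to `capacity`, so sustained throughput stays at the configured rate while short bursts are absorbed.

This minimal sketch assumes that flows failing `try_acquire()` are simply not scored. The source says BPB is sampled, not that excess flows are buffered. `TokenBucket`, `capacity`, and the injectable `clock` are hypothetical. GRIMOIRE's separate queue would get its own instance.

```python
import time


class TokenBucket:
    """Admit about `rate` items per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float = 50.0, capacity: float = 50.0, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity          # start full
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                    # caller skips BPB scoring for this flow
```

In the router, a flow would be enqueued for BPB only when `try_acquire()` returns True. During an UNKNOWN spike, the excess flows are skipped instead of piling up in front of the model.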

GRIMOIRE (Async, Conditional)

GRIMOIRE runs async via a separate token-bucket queue, triggered only when a flow payload contains PE, ELF, or ZIP magic bytes. Binary inference is expensive (10–30 seconds for the 7b patch model); running it on every flow is not viable. Triggering on magic bytes limits GRIMOIRE to flows that actually contain an extractable binary.
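The sketch below shows the magic-byte trigger. It assumes the check scans the whole reassembled payload, because a binary usually follows HTTP or SMB headers rather than starting at byte 0.

A bare `MZ` also turns up in unrelated data. This version therefore follows the DOS header's `e_lfanew` field at offset 0x3C and requires the PE signature there. The function names are hypothetical. A compressed (e.g. gzip) content encoding would hide the magic from a check like this.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ZIP_MAGIC = b"PK\x03\x04"                # ZIP local file header


def has_pe(payload: bytes) -> bool:
    """True if some 'MZ' header's e_lfanew field points at a PE signature."""
    start = payload.find(b"MZ")
    while start != -1:
        field = start + 0x3C                         # e_lfanew offset in the DOS header
        if field + 4 <= len(payload):
            (e_lfanew,) = struct.unpack_from("<I", payload, field)
            sig = start + e_lfanew
            if payload[sig:sig + 4] == b"PE\x00\x00":
                return True
        start = payload.find(b"MZ", start + 1)       # try the next candidate
    return False


def binary_kinds(payload: bytes) -> list[str]:
    kinds = []
    if has_pe(payload):
        kinds.append("PE")
    if ELF_MAGIC in payload:
        kinds.append("ELF")
    if ZIP_MAGIC in payload:
        kinds.append("ZIP")
    return kinds


def should_run_grimoire(payload: bytes) -> bool:
    return bool(binary_kinds(payload))
```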

PCAPR (Investigation Only)

PCAPR is the most expensive signal and runs only when the Correlator opens a formal investigation. It never runs on individual flows. When an investigation fires, the InvestigationWorker slices the relevant flows from the pcap and passes them to PCAPR for deep protocol analysis. Results are cached by (pcap_path, dst_ip), so repeated investigations against the same destination do not re-run PCAPR.


Weighted Scoring

The EnvelopeStore aggregates enrichments per flow_id and computes a weighted score when all expected signals have arrived (or the 60-second TTL expires):

Signal                 Weight   Rationale
grimoire_malicious     3        Most specific: rarely fires, usually correct
limen_malicious        2        Supervised match to known-malicious pattern
pcapr_beacon           2        Highly regular timing is a strong C2 indicator
pcapr_tls_known_bad    2        JA3 match to known-bad fingerprint
pcapr_dns_tunnel       2        DNS exfiltration scoring
bpb_anomaly            1        Structural novelty: real signal, lower specificity
limen_unknown          1        No reliable match: warrants attention, not conviction
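The EnvelopeStore finalizes an envelope when every expected signal has arrived or when its 60-second TTL expires. A minimal sketch of that lifecycle follows, under two assumptions:

- The router fixes the expected set when it opens the envelope. For example, GRIMOIRE is expected only when magic bytes matched.
- Results arriving after finalization are dropped.

The class and method names are hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Envelope:
    expected: set
    created: float
    signals: dict = field(default_factory=dict)


class EnvelopeStore:
    """Hypothetical sketch: finalize on completeness or after a TTL."""

    def __init__(self, ttl: float = 60.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._open = {}                   # flow_id -> Envelope

    def open(self, flow_id: str, expected) -> None:
        self._open[flow_id] = Envelope(set(expected), self.clock())

    def add(self, flow_id: str, source: str, result: dict):
        """Merge one enrichment; return the finished Envelope if it is now complete."""
        env = self._open.get(flow_id)
        if env is None:
            return None                   # unknown or already finalized: dropped
        env.signals[source] = result
        if env.expected.issubset(env.signals):
            return self._open.pop(flow_id)
        return None

    def expire(self) -> list:
        """Finalize every envelope older than the TTL with whatever has arrived."""
        now = self.clock()
        stale = [fid for fid, env in self._open.items() if now - env.created >= self.ttl]
        return [(fid, self._open.pop(fid)) for fid in stale]
```

Either path hands back an envelope whose `signals` feed the weighted score, so a slow or missing enrichment delays a flow's verdict by at most the TTL.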

Verdict thresholds:

  • Score ≥ 4 → MALICIOUS
  • Score ≥ 2 → SUSPICIOUS
  • Score = 1 → UNKNOWN
  • Score = 0 → BENIGN
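The weight table and verdict thresholds reduce to a lookup-and-sum. This sketch assumes each named signal counts at most once per flow. `weighted_score` and `verdict` are hypothetical names.

Consider the flow event in the EVE schema section: LIMEN malicious, BPB anomaly, and GRIMOIRE malicious. This yields 2 + 1 + 3 = 6, matching that example's score of 6.

```python
WEIGHTS = {
    "grimoire_malicious": 3,
    "limen_malicious": 2,
    "pcapr_beacon": 2,
    "pcapr_tls_known_bad": 2,
    "pcapr_dns_tunnel": 2,
    "bpb_anomaly": 1,
    "limen_unknown": 1,
}


def weighted_score(fired: set[str]) -> int:
    """Sum the weights of the signals that fired; unknown names count 0."""
    return sum(WEIGHTS.get(name, 0) for name in fired)


def verdict(score: int) -> str:
    # Thresholds checked from most to least severe.
    if score >= 4:
        return "malicious"
    if score >= 2:
        return "suspicious"
    if score == 1:
        return "unknown"
    return "benign"
```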

Correlator

The Correlator watches the stream of emitted EVE events and groups them by destination IP using pluggable strategies:

Strategy    Groups by               Use case
exact_ip    Single destination IP   Targeted C2 communication
subnet_24   /24 subnet              Scanning, lateral movement
subnet_16   /16 subnet              Broad scanning campaigns
ASN         Autonomous system       Infrastructure-level attribution
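Each strategy can be expressed as a function from destination IP to group key, which keeps the set pluggable. The sketch below uses Python's `ipaddress` module and assumes IPv4 destinations. The ASN strategy needs an external IP-to-ASN lookup, represented here by a hypothetical `asn_lookup` callable.

```python
import ipaddress
from typing import Callable


def _subnet(prefix: int) -> Callable[[str], str]:
    def key(ip: str) -> str:
        # strict=False masks off host bits: 185.220.101.5/24 -> 185.220.101.0/24
        return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))
    return key


def make_strategies(asn_lookup: Callable[[str], int]) -> dict:
    """Map strategy name -> function(dst_ip) -> group key (IPv4 assumed)."""
    return {
        "exact_ip": lambda ip: str(ipaddress.ip_address(ip)),
        "subnet_24": _subnet(24),
        "subnet_16": _subnet(16),
        "ASN": lambda ip: f"AS{asn_lookup(ip)}",
    }
```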

Each strategy maintains a sliding window of event-time (not wall-clock) score sums. When the sum crosses the threshold within the window, an InvestigationCase is fired.

Default thresholds:

- exact_ip: at least 3 flows with a score sum ≥ 4 within 120 seconds.
- subnet_24: at least 5 flows with a score sum ≥ 6 within 120 seconds.

Time windowing uses event time with watermarks. GRIMOIRE enrichment can arrive 30+ seconds after the flow that delivered the binary. Wall-clock bucketing would drop these late-arriving signals or count them in the wrong window. Event time assigns each enrichment to the window of the flow it belongs to, as long as it arrives before the watermark passes that window.
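A minimal sketch of an event-time window with a watermark follows. It assumes three things:

- The watermark trails the largest event time seen by a fixed allowed lateness.
- Scores are non-negative.
- A group fires at most once.

Events are placed by their own timestamps, so a score that arrives late still lands beside the flows it belongs to. Only events older than the watermark are discarded. `EventTimeCorrelator`, `Scored`, and the 60-second lateness are hypothetical. The defaults mirror the exact_ip thresholds.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Scored:
    ts: float        # event time of the flow, not the time the score arrived
    flow_id: str
    score: int


class EventTimeCorrelator:
    """Hypothetical sketch: fire once per group when a window qualifies."""

    def __init__(self, window=120.0, min_flows=3, min_sum=4, lateness=60.0):
        self.window, self.min_flows, self.min_sum = window, min_flows, min_sum
        self.lateness = lateness
        self.max_ts = float("-inf")
        self.events = defaultdict(list)   # group key -> events sorted by ts
        self.fired = set()

    @property
    def watermark(self) -> float:
        # Event times below this are treated as too late to count.
        return self.max_ts - self.lateness

    def observe(self, key: str, ev: Scored):
        if ev.ts < self.watermark:
            return None
        self.max_ts = max(self.max_ts, ev.ts)
        evs = self.events[key]
        evs.append(ev)
        evs.sort(key=lambda e: e.ts)
        # No future on-time event can share a window with anything older than this.
        cutoff = self.watermark - self.window
        evs[:] = [e for e in evs if e.ts >= cutoff]
        if key in self.fired:
            return None
        lo, total = 0, 0
        for hi, e in enumerate(evs):          # two-pointer scan over event time
            total += e.score
            while e.ts - evs[lo].ts > self.window:
                total -= evs[lo].score
                lo += 1
            if hi - lo + 1 >= self.min_flows and total >= self.min_sum:
                self.fired.add(key)
                return {"group": key, "flow_ids": [x.flow_id for x in evs[lo:hi + 1]],
                        "flow_count": hi - lo + 1, "score_sum": total}
        return None
```

Take flows at t=100 and t=150, and suppose a score for a flow at t=130 arrives after the t=150 one. The late score is still slotted between them and completes a qualifying window. A wall-clock bucket would have counted it at arrival time instead. For subnet_24 the same class would be built with `min_flows=5, min_sum=6` and fed /24 group keys.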


Investigation Workflow

When the Correlator fires:

  1. InvestigationWorker receives the InvestigationCase (destination IP, pcap path, correlated flow IDs)
  2. Scapy slices the relevant flows from the pcap by destination IP
  3. PCAPR analyzes the slice: beacon detection, TLS fingerprinting, malware family attribution, state machine inference
  4. Results are cached by (pcap_path, dst_ip)
  5. An investigation_report EVE event is emitted into the output stream
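The five steps above can be sketched as follows. To stay self-contained, the sketch replaces Scapy with an in-memory `Packet` record and takes the PCAPR call as a parameter. The following are assumptions:

- `Packet`, the fields of `InvestigationCase`, and `run_investigation`.
- Keeping both directions of traffic, since beacon timing and TLS handshakes span both.

The real report's `verdict` and `verdict_reason` fields are omitted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Packet:                 # stand-in for one parsed pcap record
    ts: float
    src: str
    dst: str
    payload: bytes


@dataclass(frozen=True)
class InvestigationCase:
    case_id: str
    dst_ip: str
    pcap_path: str
    flow_ids: tuple
    score_sum: int


def slice_by_ip(packets, ip: str) -> list:
    """Step 2: keep both directions of traffic involving the destination."""
    return [p for p in packets if ip in (p.src, p.dst)]


def run_investigation(case: InvestigationCase, load_packets: Callable,
                      analyze: Callable, cache: dict) -> dict:
    key = (case.pcap_path, case.dst_ip)            # step 4: cache key
    if key not in cache:                           # step 3 (PCAPR) runs only on a miss
        cache[key] = analyze(slice_by_ip(load_packets(case.pcap_path), case.dst_ip))
    return {                                       # step 5: investigation_report event
        "event_type": "investigation_report",
        "case_id": case.case_id,
        "correlation": {"dst_ip": case.dst_ip, "flow_count": len(case.flow_ids),
                        "score_sum": case.score_sum},
        "pcapr": cache[key],
    }
```

With Scapy, the slice would instead filter `rdpcap(pcap_path)` to packets where `IP in pkt` and either `pkt[IP].src` or `pkt[IP].dst` equals the destination IP.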

Investigation report EVE JSON:

{
  "event_type": "investigation_report",
  "case_id": "inv-...-exact_ip-185.220.101.5",
  "correlation": {
    "dst_ip": "185.220.101.5",
    "flow_count": 3,
    "score_sum": 9
  },
  "pcapr": {
    "beacon": {"interval_mean": 60.1, "regularity": "highly regular"},
    "tls": {"ja3_hash": "...", "sni": "updates.example.com"},
    "tls_known_bad": true,
    "tls_family": "CobaltStrike-default",
    "family_matches": [
      {"family": "CobaltStrike", "confidence": 0.95, "matching_signals": ["ja3", "beacon_interval"]}
    ]
  },
  "verdict": "malicious",
  "verdict_reason": "LIMEN family=cobalt_strike_ssload (9 neighbor votes); JA3=CobaltStrike-default; 3 high-score flows to 185.220.101.5 within 120s"
}

EVE JSON Schema β€” Flow Event

{
  "event_type": "mini_detective",
  "flow_id": "sha256(src_ip+dst_ip+dst_port+proto+floor(start_ts))",
  "timestamp": "2026-05-12T19:00:00Z",
  "src_ip": "10.0.1.42",
  "src_port": 54321,
  "dest_ip": "185.220.101.5",
  "dest_port": 443,
  "proto": "TCP",
  "limen": {
    "verdict": "malicious",
    "confidence": 1.0,
    "top_neighbors": ["cobalt_strike_ssload_2024-04-18"]
  },
  "bpb": {
    "score": 7.7,
    "anomaly": true,
    "n_bytes": 512,
    "threshold": 2.0
  },
  "grimoire": {
    "report": {"final": "Verdict: Malicious - hardcoded C2 IP, process injection APIs"}
  },
  "score": 6,
  "verdict": "malicious"
}

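The flow_id join key is sha256(src_ip + dst_ip + dst_port + proto + floor(start_ts)). Host IP alone is not used as the join key, because NAT, CGNAT, and DHCP churn break host_ip joins. The key fields are stable across the lifetime of a flow.

Note that the key is not the full 5-tuple, because src_port is omitted. Concurrent flows from the same source to the same destination port and protocol therefore share a flow_id if they have the same floor(start_ts).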
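Below is a sketch of the join-key computation. The source lists the fields but not how they are concatenated, so the `|` separator is an assumption. A separator avoids ambiguous joins. Without one, dst_ip 1.1.1.1 with port 443 and dst_ip 1.1.1.14 with port 43 both produce the substring "1.1.1.1443". The sketch also assumes `start_ts` is in epoch seconds.

```python
import hashlib
import math


def flow_id(src_ip: str, dst_ip: str, dst_port: int, proto: str, start_ts: float) -> str:
    """sha256 over the join-key fields; the '|' separator is an assumption."""
    fields = [src_ip, dst_ip, str(dst_port), proto, str(math.floor(start_ts))]
    return hashlib.sha256("|".join(fields).encode("utf-8")).hexdigest()
```

Flooring the start timestamp lets enrichments that report slightly different start times within the same second join to the same envelope.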


Component Map

minidet/
  router.py                FlowRouter: pipeline entry point; LIMEN inline, queues async
  envelope.py              EnvelopeStore: collects enrichments, computes weighted score
  correlator.py            Sliding-window correlator; fires InvestigationCase
  investigation_worker.py  PCAPR investigation queue and caching
  pcap_worker.py           PCAPR subprocess wrapper + signal extraction
  bpb_scorer.py            BpbScorer wrapper around SubstrateScorer
  grimoire_worker.py       GRIMOIRE binary analysis wrapper
  capture.py               CaptureWorker: live packet capture via Scapy

Current Status and Roadmap

Operational today (offline):

  • LIMEN scoring against saved pcaps
  • GRIMOIRE triage of extracted binaries
  • PCAPR offline protocol analysis
  • Correlator and EnvelopeStore
  • EVE JSON output

Required for live 1Gbps deployment:

  • LIMEN PatternStore β†’ FAISS (Phase 0)
  • bytestrand monorepo unification to eliminate model code divergence (Phase A)
  • CaptureWorker dpkt/af-packet implementation for live interface capture (Phase B)
  • Rate limiter between LIMEN and flow_queue to handle UNKNOWN rate spikes (Phase B)
  • FAISS upgrade validation (>1K flows/s at 250K entries)

BPB + LIMEN telemetry fusion: Both signals are currently logged in parallel. The fusion rule will be derived from data after characterizing the BPB false-positive rate on production benign traffic (VPNs, game protocols, proprietary RPC). Score fusion is a stopgap. The long-term consolidation is a shared representation that replaces LIMEN’s encoder with the SubstrateNet backbone.


Smoke Test

An end-to-end test against a CobaltStrike/SSLoad pcap is available:

python smoke_test.py path/to/cobalt_strike.pcap

This runs the full pipeline and prints the resulting EVE events. Expected output:

- a LIMEN MALICIOUS verdict with CobaltStrike family attribution
- a BPB anomaly
- an investigation report with beacon detection and a JA3 match
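Below is a sketch of an automated check over the smoke-test output. It assumes `smoke_test.py` prints one EVE event per line, the JSON-lines layout Suricata uses for eve.json. If the events are pretty-printed instead, they would need a different parser. `check_smoke_output` and its check names are hypothetical. The family check looks at PCAPR's `family_matches`, one of the two places attribution appears in the report.

```python
import json


def check_smoke_output(lines) -> dict:
    """Parse JSON-lines EVE output and check each expected smoke-test result."""
    events = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue                       # skip log lines and blanks
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            continue                       # e.g. a fragment of pretty-printed JSON
    flows = [e for e in events if e.get("event_type") == "mini_detective"]
    reports = [e for e in events if e.get("event_type") == "investigation_report"]
    pcapr = [r.get("pcapr", {}) for r in reports]
    return {
        "limen_malicious": any(f.get("limen", {}).get("verdict") == "malicious" for f in flows),
        "bpb_anomaly": any(f.get("bpb", {}).get("anomaly") is True for f in flows),
        "cobaltstrike_family": any(m.get("family") == "CobaltStrike"
                                   for p in pcapr for m in p.get("family_matches", [])),
        "beacon_detected": any("beacon" in p for p in pcapr),
        "ja3_known_bad": any(p.get("tls_known_bad") is True for p in pcapr),
    }
```

To use it, pipe the smoke-test output into a small script that calls `check_smoke_output(sys.stdin)` and exits non-zero if any value is False.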