minidet: A Queue-Based Multi-Signal Network Threat Detection Sidecar

Abstract

minidet is a network threat detection sidecar for Suricata. It runs four signal layers against every TCP/UDP flow: LIMEN (supervised cosine similarity, inline, ~200µs per flow), BPB (unsupervised byte-level novelty, async, token-bucket sampled), GRIMOIRE (static binary triage, async, triggered only on binary payloads), and PCAPR (deep protocol analysis, triggered only by correlator-initiated investigations). A weighted scoring system combines signals; a sliding-window correlator groups scores by destination IP and fires investigations when thresholds are crossed. All output is EVE JSON, compatible with any SIEM already ingesting Suricata alerts.

Overview

minidet enriches an existing Suricata deployment without replacing it. The output format is EVE JSON — the same format Suricata already produces — so a SIEM that reads Suricata alerts reads minidet alerts without configuration changes.

The design principle is cost-proportional analysis: cheap signals run on everything, expensive signals run only when warranted.

Signal Layers

Signal	Source	Latency	Triggers on
LIMEN	Supervised cosine similarity	~200µs	Every flow
BPB	Unsupervised byte-level novelty	Async	Token-bucket sampled
GRIMOIRE	Static binary triage	10–30s	PE/ELF/ZIP magic in payload
PCAPR	Deep protocol analysis	Seconds	Correlator investigation

LIMEN (Inline)

LIMEN runs synchronously in the FlowRouter on every flow. It encodes the raw flow bytes into a 256-dimensional fingerprint and compares it against the PatternStore. The verdict (MALICIOUS, BENIGN, UNKNOWN) and confidence are written to the flow’s envelope immediately.

Cost: ~200µs per flow. This is suitable for inline deployment at typical enterprise traffic rates, but not for 1 Gbps links until the PatternStore is migrated to FAISS (see LIMEN paper).
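
As a rough illustration of the lookup described above, the following sketch compares a fingerprint against a small in-memory store by cosine similarity. The `classify` function, the store layout, and the 0.9 threshold are assumptions made for illustration; the actual encoder and PatternStore are not shown in this document, and the example uses 4-dimensional vectors instead of 256-dimensional ones for brevity.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def classify(fingerprint, store, threshold=0.9):
    """Return (verdict, confidence) from the nearest stored pattern.

    `store` is a hypothetical list of (label, vector) pairs. If the best
    similarity is below `threshold` (an assumed value), the verdict is UNKNOWN.
    """
    best_label, best_sim = "UNKNOWN", 0.0
    for label, vec in store:
        sim = cosine(fingerprint, vec)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim < threshold:
        return "UNKNOWN", best_sim
    return best_label, best_sim

store = [
    ("MALICIOUS", [1.0, 0.0, 0.0, 0.0]),
    ("BENIGN", [0.0, 1.0, 0.0, 0.0]),
]
print(classify([0.9, 0.1, 0.0, 0.0], store))  # close to the malicious pattern
print(classify([0.5, 0.5, 0.5, 0.5], store))  # not close to anything stored
```

The linear scan here costs time proportional to the number of stored patterns, which is the kind of cost an approximate nearest-neighbor index such as FAISS is meant to reduce.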

BPB Scorer (Async, Sampled)

The Substrate-PCAP model assigns a bits-per-byte score to each flow. High BPB indicates structural novelty relative to benign training data — a complementary signal to LIMEN’s cosine similarity.

BPB scoring runs asynchronously through a token-bucket queue capped at a configurable rate (default 50 flows/s). The cap prevents the unsupervised model from saturating the CPU when the UNKNOWN rate spikes. The EnvelopeStore merges each BPB result into the flow’s envelope when it arrives.
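
The rate cap can be pictured with a minimal token-bucket sketch. The class name, the `try_acquire` method, and the burst capacity of 50 are assumptions; the document specifies only the default rate of 50 flows/s. A fake clock is injected so the example is deterministic.

```python
import time

class TokenBucket:
    """Admit at most `rate` items per second on average, with bursts up to `capacity`."""

    def __init__(self, rate=50.0, capacity=50.0, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity      # assumed burst size, not given in the document
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        """Return True if a flow may be queued for scoring, False if it is skipped."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Deterministic demo: a burst of 200 flows, then another burst one second later.
fake_now = [0.0]
bucket = TokenBucket(rate=50.0, capacity=50.0, clock=lambda: fake_now[0])
admitted = sum(bucket.try_acquire() for _ in range(200))
fake_now[0] = 1.0
refilled = sum(bucket.try_acquire() for _ in range(200))
print(admitted, refilled)
```

Flows that fail `try_acquire` are simply not scored by BPB in this sketch, which is what makes the signal sampled rather than exhaustive.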

GRIMOIRE (Async, Conditional)

GRIMOIRE runs async via a separate token-bucket queue, triggered only when a flow payload contains PE, ELF, or ZIP magic bytes. Binary inference is expensive (10–30 seconds for the 7b patch model); running it on every flow is not viable. Triggering on magic bytes limits GRIMOIRE to flows that actually contain an extractable binary.
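
A minimal sketch of the magic-byte trigger, assuming a simple substring search over the payload. The `detect_binary` helper is hypothetical; the signatures themselves are the standard ones for PE (`MZ`), ELF (`\x7fELF`), and ZIP local file headers (`PK\x03\x04`).

```python
# Magic signatures for the three container types named above.
MAGICS = {
    "PE": b"MZ",             # DOS/PE header
    "ELF": b"\x7fELF",
    "ZIP": b"PK\x03\x04",    # ZIP local file header
}

def detect_binary(payload: bytes):
    """Return the formats whose magic appears in the payload, ordered by offset."""
    hits = []
    for name, magic in MAGICS.items():
        offset = payload.find(magic)
        if offset != -1:
            hits.append((offset, name))
    return [name for _, name in sorted(hits)]

elf_download = b"HTTP/1.1 200 OK\r\n\r\n" + b"\x7fELF\x02\x01\x01"
plain_request = b"GET / HTTP/1.1\r\n\r\n"
print(detect_binary(elf_download))
print(detect_binary(plain_request))
```

The two-byte `MZ` signature will match by chance in arbitrary data, so a production trigger would likely validate more of the header before spending 10–30 seconds of inference on the payload.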

PCAPR (Investigation Only)

PCAPR is the most expensive signal and runs only when the Correlator opens a formal investigation. It never runs on individual flows. When an investigation fires, the InvestigationWorker slices the relevant flows from the pcap and passes them to PCAPR for deep protocol analysis. Results are cached by (pcap_path, dst_ip) — repeated investigations against the same destination do not re-run PCAPR.
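
The caching behavior can be sketched as a dictionary keyed by (pcap_path, dst_ip). `InvestigationCache` and `fake_pcapr` are hypothetical names; the stand-in function only records that it was called and does not reflect PCAPR's real interface or output.

```python
class InvestigationCache:
    """Memoize analysis results by (pcap_path, dst_ip)."""

    def __init__(self, analyze):
        self._analyze = analyze    # callable(pcap_path, dst_ip) -> result
        self._results = {}

    def get(self, pcap_path, dst_ip):
        key = (pcap_path, dst_ip)
        if key not in self._results:
            # Cache miss: run the expensive analysis once and keep the result.
            self._results[key] = self._analyze(pcap_path, dst_ip)
        return self._results[key]

calls = []

def fake_pcapr(pcap_path, dst_ip):
    # Stand-in for the expensive analysis; records each invocation.
    calls.append((pcap_path, dst_ip))
    return {"dst_ip": dst_ip}

cache = InvestigationCache(fake_pcapr)
cache.get("capture.pcap", "185.220.101.5")
cache.get("capture.pcap", "185.220.101.5")   # served from the cache
cache.get("capture.pcap", "10.0.0.9")        # different destination, runs again
print(len(calls))
```

A cache keyed this way does not notice when the file at `pcap_path` changes, so it fits saved pcaps better than a capture that is still being written.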

Weighted Scoring

The EnvelopeStore aggregates enrichments per flow_id and computes a weighted score when all expected signals have arrived (or the 60-second TTL expires):

Signal	Weight	Rationale
`grimoire_malicious`	3	Most specific — rarely fires, usually correct
`limen_malicious`	2	Supervised match to known-malicious pattern
`pcapr_beacon`	2	Highly regular timing is strong C2 indicator
`pcapr_tls_known_bad`	2	JA3 match to known-bad fingerprint
`pcapr_dns_tunnel`	2	DNS exfiltration scoring
`bpb_anomaly`	1	Structural novelty — real signal, lower specificity
`limen_unknown`	1	No reliable match — warrants attention, not conviction

Verdict thresholds:

Score ≥ 4 → MALICIOUS
Score ≥ 2 → SUSPICIOUS
Score = 1 → UNKNOWN
Score = 0 → BENIGN
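
The weights and thresholds above translate directly into a short sketch. The function names `score_flow` and `verdict` are hypothetical; the weights and cutoffs are copied from the tables above.

```python
WEIGHTS = {
    "grimoire_malicious": 3,
    "limen_malicious": 2,
    "pcapr_beacon": 2,
    "pcapr_tls_known_bad": 2,
    "pcapr_dns_tunnel": 2,
    "bpb_anomaly": 1,
    "limen_unknown": 1,
}

def score_flow(signals):
    """Sum the weights of the signals that fired; each signal counts once."""
    return sum(WEIGHTS[name] for name in set(signals))

def verdict(score):
    """Map a weighted score to a verdict using the thresholds above."""
    if score >= 4:
        return "MALICIOUS"
    if score >= 2:
        return "SUSPICIOUS"
    if score == 1:
        return "UNKNOWN"
    return "BENIGN"

# LIMEN match (2) + BPB anomaly (1) + GRIMOIRE verdict (3) = 6
fired = ["limen_malicious", "bpb_anomaly", "grimoire_malicious"]
print(score_flow(fired), verdict(score_flow(fired)))
print(verdict(score_flow(["limen_unknown", "bpb_anomaly"])))
print(verdict(score_flow(["limen_unknown"])))
```

The first case reproduces the score of 6 shown in the flow event example later in this document.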

Correlator

The Correlator watches the stream of emitted EVE events and groups them by destination using pluggable strategies:

Strategy	Groups by	Use case
`exact_ip`	Single destination IP	Targeted C2 communication
`subnet_24`	/24 subnet	Scanning, lateral movement
`subnet_16`	/16 subnet	Broad scanning campaigns
ASN	Autonomous system	Infrastructure-level attribution
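
One way to derive the grouping key for the IP-based strategies is with Python's standard `ipaddress` module. The `group_key` function is a hypothetical helper and assumes IPv4 destinations; the ASN strategy is omitted because it needs an external IP-to-ASN database that this document does not describe.

```python
import ipaddress

def group_key(strategy, dst_ip):
    """Return the correlation key for a destination IP under the given strategy."""
    ip = ipaddress.ip_address(dst_ip)   # raises ValueError on malformed input
    if strategy == "exact_ip":
        return str(ip)
    if strategy == "subnet_24":
        return str(ipaddress.ip_network(f"{ip}/24", strict=False))
    if strategy == "subnet_16":
        return str(ipaddress.ip_network(f"{ip}/16", strict=False))
    raise ValueError(f"unsupported strategy: {strategy}")

for strategy in ("exact_ip", "subnet_24", "subnet_16"):
    print(strategy, group_key(strategy, "185.220.101.5"))
```

With `strict=False`, `ip_network` masks off the host bits, so every address in the same /24 or /16 maps to the same key.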

Each strategy maintains a sliding window of event-time (not wall-clock) score sums. When the sum crosses the threshold within the window, an InvestigationCase is fired.

Default thresholds: 3 flows / score sum ≥ 4 within 120 seconds (exact_ip); 5 flows / score sum ≥ 6 within 120 seconds (subnet_24).

Time windowing uses event-time with watermarks. GRIMOIRE enrichment can arrive 30+ seconds after the flow that delivered the binary; wall-clock bucketing would drop late-arriving signals. Event-time ensures all enrichments are counted in the window they belong to.
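
The windowing can be sketched as follows, under two stated assumptions: a window fires only when both the flow count and the score sum reach their thresholds, and the latest event time seen per key stands in for the watermark. The `SlidingWindow` class is hypothetical, not the Correlator's actual code.

```python
import bisect

class SlidingWindow:
    """Per-key event-time window that fires on flow count and score sum."""

    def __init__(self, window_s=120.0, min_flows=3, min_score=4):
        self.window_s = window_s
        self.min_flows = min_flows
        self.min_score = min_score
        self.events = {}      # key -> list of (event_ts, score), sorted by time
        self.watermark = {}   # key -> latest event_ts seen so far

    def add(self, key, event_ts, score):
        """Record one scored flow; return True if the key's window crosses the thresholds."""
        events = self.events.setdefault(key, [])
        bisect.insort(events, (event_ts, score))   # late events land in time order
        self.watermark[key] = max(self.watermark.get(key, event_ts), event_ts)
        cutoff = self.watermark[key] - self.window_s
        # Evict events older than the window, measured in event time.
        live = [e for e in events if e[0] >= cutoff]
        self.events[key] = live
        return len(live) >= self.min_flows and sum(s for _, s in live) >= self.min_score

w = SlidingWindow()
results = [
    w.add("185.220.101.5", 1000.0, 2),
    w.add("185.220.101.5", 1060.0, 1),
    w.add("185.220.101.5", 1030.0, 3),   # arrives late but belongs inside the window
    w.add("185.220.101.5", 2000.0, 2),   # far later: earlier events are evicted
]
print(results)
```

The third event is processed after the second but carries an earlier event time; because eviction is driven by event time, it is still counted in the window it belongs to.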

Investigation Workflow

When the Correlator fires:

1. InvestigationWorker receives the InvestigationCase (destination IP, pcap path, correlated flow IDs).
2. Scapy slices the relevant flows from the pcap by destination IP.
3. PCAPR analyzes the slice: beacon detection, TLS fingerprinting, malware family attribution, state machine inference.
4. Results are cached by (pcap_path, dst_ip).
5. An investigation_report EVE event is emitted into the output stream.

Investigation report EVE JSON:

{
  "event_type": "investigation_report",
  "case_id": "inv-...-exact_ip-185.220.101.5",
  "correlation": {
    "dst_ip": "185.220.101.5",
    "flow_count": 3,
    "score_sum": 9
  },
  "pcapr": {
    "beacon": {"interval_mean": 60.1, "regularity": "highly regular"},
    "tls": {"ja3_hash": "...", "sni": "updates.example.com"},
    "tls_known_bad": true,
    "tls_family": "CobaltStrike-default",
    "family_matches": [
      {"family": "CobaltStrike", "confidence": 0.95, "matching_signals": ["ja3", "beacon_interval"]}
    ]
  },
  "verdict": "malicious",
  "verdict_reason": "LIMEN family=cobalt_strike_ssload (9 neighbor votes); JA3=CobaltStrike-default; 3 high-score flows to 185.220.101.5 within 120s"
}

EVE JSON Schema — Flow Event

{
  "event_type": "mini_detective",
  "flow_id": "sha256(src_ip+dst_ip+dst_port+proto+floor(start_ts))",
  "timestamp": "2026-05-12T19:00:00Z",
  "src_ip": "10.0.1.42",
  "src_port": 54321,
  "dest_ip": "185.220.101.5",
  "dest_port": 443,
  "proto": "TCP",
  "limen": {
    "verdict": "malicious",
    "confidence": 1.0,
    "top_neighbors": ["cobalt_strike_ssload_2024-04-18"]
  },
  "bpb": {
    "score": 7.7,
    "anomaly": true,
    "n_bytes": 512,
    "threshold": 2.0
  },
  "grimoire": {
    "report": {"final": "Verdict: Malicious — hardcoded C2 IP, process injection APIs"}
  },
  "score": 6,
  "verdict": "malicious"
}

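The flow_id join key is sha256(src_ip + dst_ip + dst_port + proto + floor(start_ts)). Host IP alone is not used as the join key, because NAT, CGNAT, and DHCP churn break host_ip joins. The key fields are fixed for the lifetime of a flow. Note that src_port is not part of the key as written, so the key is a 4-tuple plus the floored start timestamp rather than the full 5-tuple.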
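
A minimal sketch of the key derivation using `hashlib`. The `|` separator, the decimal string encoding of the port and timestamp, and the sample timestamps are assumptions; the document gives only the field list and the hash function.

```python
import hashlib
import math

def flow_id(src_ip, dst_ip, dst_port, proto, start_ts):
    """Hash src_ip, dst_ip, dst_port, proto, and floor(start_ts) with SHA-256.

    The "|" separator and string encoding are assumed, not taken from minidet.
    """
    parts = [src_ip, dst_ip, str(dst_port), proto, str(math.floor(start_ts))]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

a = flow_id("10.0.1.42", "185.220.101.5", 443, "TCP", 1700000000.25)
b = flow_id("10.0.1.42", "185.220.101.5", 443, "TCP", 1700000000.90)
c = flow_id("10.0.1.42", "185.220.101.5", 443, "TCP", 1700000001.10)
print(a == b, a == c)
```

Flooring the timestamp lets enrichments that record slightly different sub-second start times still join on the same key, while a flow that starts in the next second gets a different one. An explicit separator avoids ambiguity between adjacent fields that plain concatenation could introduce.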

Component Map

minidet/
  router.py              FlowRouter — pipeline entry point; LIMEN inline, queues async
  envelope.py            EnvelopeStore — collects enrichments, computes weighted score
  correlator.py          Sliding-window correlator; fires InvestigationCase
  investigation_worker.py PCAPR investigation queue and caching
  pcap_worker.py         PCAPR subprocess wrapper + signal extraction
  bpb_scorer.py          BpbScorer wrapper around SubstrateScorer
  grimoire_worker.py     GRIMOIRE binary analysis wrapper
  capture.py             CaptureWorker — live packet capture via Scapy

Current Status and Roadmap

Operational today (offline):

LIMEN scoring against saved pcaps
GRIMOIRE triage of extracted binaries
PCAPR offline protocol analysis
Correlator and EnvelopeStore
EVE JSON output

Required for live 1 Gbps deployment:

LIMEN PatternStore → FAISS (Phase 0)
bytestrand monorepo unification to eliminate model code divergence (Phase A)
CaptureWorker dpkt/af-packet implementation for live interface capture (Phase B)
Rate limiter between LIMEN and flow_queue to handle UNKNOWN rate spikes (Phase B)
FAISS upgrade validation (>1K flows/s at 250K entries)

BPB + LIMEN telemetry fusion: Both signals are currently logged in parallel. The fusion rule will be defined from data after characterizing the BPB false-positive rate on production benign traffic (VPNs, game protocols, proprietary RPC). Score fusion is a stopgap; a shared representation (replacing LIMEN’s encoder with the SubstrateNet backbone) is the long-term consolidation.

Smoke Test

An end-to-end test against a CobaltStrike/SSLoad pcap is available:

python smoke_test.py path/to/cobalt_strike.pcap

The script runs the full pipeline and prints the resulting EVE events. Expected output: a LIMEN MALICIOUS verdict with CobaltStrike family attribution, a BPB anomaly, and an investigation report with beacon detection and a JA3 match.