Overview
minidet enriches an existing Suricata deployment without replacing it. The output format is EVE JSON, the same format Suricata already produces, so a SIEM that reads Suricata alerts reads minidet alerts without configuration changes.
The design principle is cost-proportional analysis: cheap signals run on everything, expensive signals run only when warranted.
Signal Layers
| Signal | Source | Latency | Triggers on |
|---|---|---|---|
| LIMEN | Supervised cosine similarity | ~200µs | Every flow |
| BPB | Unsupervised byte-level novelty | Async | Token-bucket sampled |
| GRIMOIRE | Static binary triage | 10–30s | PE/ELF/ZIP magic in payload |
| PCAPR | Deep protocol analysis | Seconds | Correlator investigation |
LIMEN (Inline)
LIMEN runs synchronously in the FlowRouter on every flow. It encodes the raw flow bytes into a 256-dim fingerprint and compares it against the PatternStore. The verdict (MALICIOUS, BENIGN, UNKNOWN) and confidence are written to the flow's envelope immediately.
Cost: ~200µs per flow. Suitable for inline deployment at typical enterprise traffic rates. Not suitable for 1Gbps without FAISS migration (see LIMEN paper).
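The comparison step can be illustrated with a minimal sketch. It assumes a brute-force linear scan, a single nearest neighbor rather than a neighbor vote, and a hypothetical `min_similarity` cut-off; the names `cosine`, `limen_verdict`, and the `(label, vector)` store layout are illustrative and not taken from the LIMEN code.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def limen_verdict(fingerprint, pattern_store, min_similarity=0.9):
    """Return (verdict, similarity) from the nearest labelled pattern.

    pattern_store: list of (label, vector) pairs, label "MALICIOUS" or "BENIGN".
    min_similarity is an assumed threshold; below it the flow is UNKNOWN.
    """
    best_label, best_sim = "UNKNOWN", 0.0
    for label, vector in pattern_store:  # linear scan over every stored pattern
        sim = cosine(fingerprint, vector)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim < min_similarity:
        return "UNKNOWN", best_sim
    return best_label, best_sim
```

A linear scan costs time proportional to the number of stored patterns, which is one plausible reason an approximate-nearest-neighbor index such as FAISS is listed as a prerequisite for higher traffic rates.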
BPB Scorer (Async, Sampled)
The Substrate-PCAP model assigns a bits-per-byte score to each flow. High BPB indicates structural novelty relative to benign training data, a complementary signal to LIMEN's cosine similarity.
BPB scoring runs asynchronously via a token-bucket queue, capped at a configurable rate (default 50 flows/s). This prevents the unsupervised model from saturating CPU when the UNKNOWN rate spikes. The BPB result is merged into the flow's envelope by the EnvelopeStore when it arrives.
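A token bucket of the kind described can be sketched as follows. The class name, the `capacity` parameter, and the injectable `clock` are assumptions for illustration; only the default rate of 50 flows/s comes from the text above.

```python
import time

class TokenBucket:
    """Admit at most `rate` items per second on average, with bursts up to `capacity`."""

    def __init__(self, rate=50.0, capacity=50.0, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum stored tokens (burst size)
        self.clock = clock               # injectable for testing
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self):
        """Take one token if available; return False so the caller can skip the flow."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In this sketch a flow that fails `try_acquire()` is simply not scored, which is what makes the signal sampled rather than exhaustive.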
GRIMOIRE (Async, Conditional)
GRIMOIRE runs async via a separate token-bucket queue, triggered only when a flow payload contains PE, ELF, or ZIP magic bytes. Binary inference is expensive (10–30 seconds for the 7b patch model), so running it on every flow is not viable. Triggering on magic bytes limits GRIMOIRE to flows that are likely to contain an extractable binary.
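The trigger condition can be sketched as a byte search. The signatures below are the standard ones (`MZ` for the DOS/PE header, `\x7fELF` for ELF, `PK\x03\x04` for a ZIP local file header); the function name and return shape are assumptions, and the two-byte `MZ` match is noisy enough that a real trigger would likely validate the header further.

```python
MAGIC_SIGNATURES = {
    "PE": b"MZ",            # DOS stub that precedes the PE header
    "ELF": b"\x7fELF",
    "ZIP": b"PK\x03\x04",   # local file header
}

def find_binary_magic(payload: bytes):
    """Return (kind, offset) of the earliest magic match in payload, or None."""
    best = None
    for kind, magic in MAGIC_SIGNATURES.items():
        offset = payload.find(magic)
        if offset != -1 and (best is None or offset < best[1]):
            best = (kind, offset)
    return best
```

Searching the whole payload, rather than only its first bytes, lets the check find a binary that follows protocol framing such as HTTP response headers.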
PCAPR (Investigation Only)
PCAPR is the most expensive signal and runs only when the Correlator opens a formal investigation. It never runs on individual flows. When an investigation fires, the InvestigationWorker slices the relevant flows from the pcap and passes them to PCAPR for deep protocol analysis. Results are cached by (pcap_path, dst_ip), so repeated investigations against the same destination do not re-run PCAPR.
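The caching behavior can be sketched with a dictionary keyed on the same tuple. The class name and the `analyze` callable are hypothetical stand-ins for the real PCAPR invocation, which is not shown here.

```python
class InvestigationCache:
    """Memoize an expensive analysis by (pcap_path, dst_ip)."""

    def __init__(self, analyze):
        self._analyze = analyze   # assumed callable(pcap_path, dst_ip) -> dict
        self._results = {}

    def get(self, pcap_path, dst_ip):
        key = (pcap_path, dst_ip)
        if key not in self._results:
            # First investigation for this destination: run the analysis once.
            self._results[key] = self._analyze(pcap_path, dst_ip)
        return self._results[key]
```

This sketch never evicts entries; a long-running deployment would presumably need a size or age bound.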
Weighted Scoring
The EnvelopeStore aggregates enrichments per flow_id and computes a weighted score when all expected signals have arrived (or the 60-second TTL expires):
| Signal | Weight | Rationale |
|---|---|---|
| grimoire_malicious | 3 | Most specific: rarely fires, usually correct |
| limen_malicious | 2 | Supervised match to known-malicious pattern |
| pcapr_beacon | 2 | Highly regular timing is a strong C2 indicator |
| pcapr_tls_known_bad | 2 | JA3 match to known-bad fingerprint |
| pcapr_dns_tunnel | 2 | DNS exfiltration scoring |
| bpb_anomaly | 1 | Structural novelty: real signal, lower specificity |
| limen_unknown | 1 | No reliable match: warrants attention, not conviction |
Verdict thresholds:
- Score ≥ 4 → MALICIOUS
- Score ≥ 2 → SUSPICIOUS
- Score = 1 → UNKNOWN
- Score = 0 → BENIGN
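The weights and thresholds above translate directly into a lookup and a comparison chain. The sketch below assumes signals are represented as a set of fired signal names; the function names are illustrative.

```python
SIGNAL_WEIGHTS = {
    "grimoire_malicious": 3,
    "limen_malicious": 2,
    "pcapr_beacon": 2,
    "pcapr_tls_known_bad": 2,
    "pcapr_dns_tunnel": 2,
    "bpb_anomaly": 1,
    "limen_unknown": 1,
}

def weighted_score(fired_signals):
    """Sum the weights of the signals that fired; unknown names contribute nothing."""
    return sum(SIGNAL_WEIGHTS.get(name, 0) for name in set(fired_signals))

def verdict_for(score):
    """Map a weighted score to a verdict using the documented thresholds."""
    if score >= 4:
        return "MALICIOUS"
    if score >= 2:
        return "SUSPICIOUS"
    if score == 1:
        return "UNKNOWN"
    return "BENIGN"
```

For example, a flow with `limen_malicious`, `bpb_anomaly`, and `grimoire_malicious` scores 2 + 1 + 3 = 6 and is MALICIOUS, while `bpb_anomaly` alone scores 1 and stays UNKNOWN.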
Correlator
The Correlator watches the stream of emitted EVE events and groups them by destination IP using pluggable strategies:
| Strategy | Groups by | Use case |
|---|---|---|
| exact_ip | Single destination IP | Targeted C2 communication |
| subnet_24 | /24 subnet | Scanning, lateral movement |
| subnet_16 | /16 subnet | Broad scanning campaigns |
| ASN | Autonomous system | Infrastructure-level attribution |
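The IP-based strategies reduce to computing a group key from the destination address. The sketch below uses the standard-library `ipaddress` module and assumes IPv4; the function name is illustrative, and the ASN strategy is omitted because it needs an external IP-to-ASN lookup that is not described here.

```python
import ipaddress

SUBNET_PREFIXES = {"subnet_24": 24, "subnet_16": 16}

def group_key(strategy, dst_ip):
    """Return the correlation key for dst_ip under the named strategy (IPv4 assumed)."""
    if strategy == "exact_ip":
        return dst_ip
    prefix = SUBNET_PREFIXES[strategy]  # KeyError for strategies not sketched here
    # strict=False masks the host bits instead of rejecting the address.
    network = ipaddress.ip_network(f"{dst_ip}/{prefix}", strict=False)
    return str(network)
```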
Each strategy maintains a sliding window of event-time (not wall-clock) score sums. When the sum crosses the threshold within the window, an InvestigationCase is fired.
Default thresholds: 3 flows / score sum ≥ 4 within 120 seconds (exact_ip); 5 flows / score sum ≥ 6 within 120 seconds (subnet_24).
Time windowing uses event-time with watermarks. GRIMOIRE enrichment can arrive 30+ seconds after the flow that delivered the binary; wall-clock bucketing would drop late-arriving signals. Event-time ensures all enrichments are counted in the window they belong to.
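A minimal event-time window might look like the sketch below. It reads the default thresholds as "at least N flows and a score sum of at least S", and uses the newest event timestamp seen so far as a crude watermark; both are assumptions, as are the class and parameter names.

```python
import bisect

class EventTimeWindow:
    """Sliding window over event timestamps, tolerant of out-of-order arrival."""

    def __init__(self, window_s=120.0, min_flows=3, min_score_sum=4):
        self.window_s = window_s
        self.min_flows = min_flows
        self.min_score_sum = min_score_sum
        self.events = []  # (event_ts, score), kept sorted by event time

    def add(self, event_ts, score):
        """Record an event at its own timestamp; return True if the threshold is met."""
        bisect.insort(self.events, (event_ts, score))
        watermark = self.events[-1][0]       # newest event time seen so far
        cutoff = watermark - self.window_s
        # Expire by event time, so a late enrichment still lands in its own window.
        self.events = [e for e in self.events if e[0] >= cutoff]
        total = sum(s for _, s in self.events)
        return len(self.events) >= self.min_flows and total >= self.min_score_sum
```

Because expiry depends on event timestamps rather than arrival time, an enrichment that arrives 30 seconds late but carries an in-window timestamp is still counted.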
Investigation Workflow
When the Correlator fires:
- InvestigationWorker receives the InvestigationCase (destination IP, pcap path, correlated flow IDs)
- Scapy slices the relevant flows from the pcap by destination IP
- PCAPR analyzes the slice: beacon detection, TLS fingerprinting, malware family attribution, state machine inference
- Results are cached by (pcap_path, dst_ip)
- An investigation_report EVE event is emitted into the output stream
Investigation report EVE JSON:
{
  "event_type": "investigation_report",
  "case_id": "inv-...-exact_ip-185.220.101.5",
  "correlation": {
    "dst_ip": "185.220.101.5",
    "flow_count": 3,
    "score_sum": 9
  },
  "pcapr": {
    "beacon": {"interval_mean": 60.1, "regularity": "highly regular"},
    "tls": {"ja3_hash": "...", "sni": "updates.example.com"},
    "tls_known_bad": true,
    "tls_family": "CobaltStrike-default",
    "family_matches": [
      {"family": "CobaltStrike", "confidence": 0.95, "matching_signals": ["ja3", "beacon_interval"]}
    ]
  },
  "verdict": "malicious",
  "verdict_reason": "LIMEN family=cobalt_strike_ssload (9 neighbor votes); JA3=CobaltStrike-default; 3 high-score flows to 185.220.101.5 within 120s"
}
EVE JSON Schema: Flow Event
{
  "event_type": "mini_detective",
  "flow_id": "sha256(src_ip+dst_ip+dst_port+proto+floor(start_ts))",
  "timestamp": "2026-05-12T19:00:00Z",
  "src_ip": "10.0.1.42",
  "src_port": 54321,
  "dest_ip": "185.220.101.5",
  "dest_port": 443,
  "proto": "TCP",
  "limen": {
    "verdict": "malicious",
    "confidence": 1.0,
    "top_neighbors": ["cobalt_strike_ssload_2024-04-18"]
  },
  "bpb": {
    "score": 7.7,
    "anomaly": true,
    "n_bytes": 512,
    "threshold": 2.0
  },
  "grimoire": {
    "report": {"final": "Verdict: Malicious - hardcoded C2 IP, process injection APIs"}
  },
  "score": 6,
  "verdict": "malicious"
}
The flow_id join key is sha256(src_ip + dst_ip + dst_port + proto + floor(start_ts)). Host IP alone is not used as the join key because NAT, CGNAT, and DHCP churn break host_ip joins. The key fields, by contrast, are stable across the lifetime of a flow.
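The join key can be computed with the standard-library `hashlib`. The exact serialization is not specified above, so the `|` separator and string formatting below are assumptions; a delimiter is used because plain concatenation can make different field combinations hash identically.

```python
import hashlib
import math

def flow_id(src_ip, dst_ip, dst_port, proto, start_ts):
    """Hex SHA-256 over the flow's key fields, with start_ts floored to whole seconds."""
    parts = (src_ip, dst_ip, str(dst_port), proto, str(math.floor(start_ts)))
    # "|" is an assumed separator; it cannot appear in any of the fields.
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```

Flooring the timestamp means enrichments that record slightly different sub-second start times for the same flow still produce the same key.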
Component Map
minidet/
  router.py                FlowRouter: pipeline entry point; LIMEN inline, queues async
  envelope.py              EnvelopeStore: collects enrichments, computes weighted score
  correlator.py            Sliding-window correlator; fires InvestigationCase
  investigation_worker.py  PCAPR investigation queue and caching
  pcap_worker.py           PCAPR subprocess wrapper + signal extraction
  bpb_scorer.py            BpbScorer wrapper around SubstrateScorer
  grimoire_worker.py       GRIMOIRE binary analysis wrapper
  capture.py               CaptureWorker: live packet capture via Scapy
Current Status and Roadmap
Operational today (offline):
- LIMEN scoring against saved pcaps
- GRIMOIRE triage of extracted binaries
- PCAPR offline protocol analysis
- Correlator and EnvelopeStore
- EVE JSON output
Required for live 1Gbps deployment:
- LIMEN PatternStore → FAISS (Phase 0)
- bytestrand monorepo unification to eliminate model code divergence (Phase A)
- CaptureWorker dpkt/af-packet implementation for live interface capture (Phase B)
- Rate limiter between LIMEN and flow_queue to handle UNKNOWN rate spikes (Phase B)
- FAISS upgrade validation (>1K flows/s at 250K entries)
BPB + LIMEN telemetry fusion: Both signals are currently logged in parallel. The fusion rule will be defined from data after characterizing the BPB false-positive rate on production benign traffic (VPNs, game protocols, proprietary RPC). Score fusion is a stopgap; a shared representation (replacing LIMEN's encoder with the SubstrateNet backbone) is the long-term consolidation.
Smoke Test
An end-to-end test against a CobaltStrike/SSLoad pcap is available:
python smoke_test.py path/to/cobalt_strike.pcap
This runs the full pipeline and prints the resulting EVE events. Expected output: a LIMEN MALICIOUS verdict with CobaltStrike family attribution, a BPB anomaly, and an investigation report with beacon detection and a JA3 match.