The Cyber Desk

Notes on Byte-Transformer Models for Detecting EDR-Evading Malware

April 21, 2026 · Anwer Gertani

AI / ML · Malware Analysis · EDR

How we trained an in-memory detection agent on raw bytes — and what surprised us about generalization to unseen packers.

Byte-level transformers don’t read assembly. They don’t care about import tables, section headers, or string artifacts — the things every AV and EDR has been trained to catch for thirty years. You feed them a sliding window of raw bytes and they return a probability. That property is exactly what makes them interesting for the category of malware that has defeated everything else: custom packers, in-memory-only loaders, and implants that spend significant engineering effort looking structurally normal.
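The scoring loop that paragraph describes can be sketched in a few lines. This is a minimal illustration, not our agent's code: the window size, the stride, the max-aggregation rule, and the `toy_model` stub are all assumptions standing in for the real transformer.

```python
from typing import Callable, Iterator

WINDOW = 4096  # hypothetical window size, in bytes
STRIDE = 2048  # hypothetical stride; 50% overlap between windows

def windows(buf: bytes, window: int = WINDOW, stride: int = STRIDE) -> Iterator[bytes]:
    """Yield overlapping raw-byte windows over a buffer."""
    if len(buf) <= window:
        yield buf.ljust(window, b"\x00")  # pad short buffers to one full window
        return
    last = 0
    for off in range(0, len(buf) - window + 1, stride):
        yield buf[off:off + window]
        last = off
    if last + window < len(buf):
        yield buf[-window:]  # tail window so trailing bytes are still scored

def score_buffer(buf: bytes, model: Callable[[bytes], float]) -> float:
    """Max-aggregate per-window probabilities: one suspicious window flags the buffer."""
    return max(model(w) for w in windows(buf))

def toy_model(w: bytes) -> float:
    """Stand-in for the transformer: a real model returns P(malicious) per window."""
    return w.count(0xCC) / len(w)
```

Note what `score_buffer` never does: parse headers, resolve imports, or extract strings. Everything the model sees is raw bytes, which is what lets it score an in-memory-only payload the same way it scores a file on disk.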

We trained our first model on a corpus of packed malware families we’d reversed over two years of IR work. The headline accuracy on a held-out test set was good, but it wasn’t the interesting result. The interesting result was generalization: the model flagged a bespoke packer we’d never seen, written by a threat actor who clearly knew about signature-based detection, at a false-positive rate we could live with in a production EDR integration. It had learned structural priors about what machine code looks like — priors the threat actor’s tooling violated even while defeating every signature.

The operational challenges are real and mostly unsolved by the research literature. Latency on the endpoint, memory footprint in a constrained EDR agent, and the trust problem — asking an analyst to act on a model’s probability when they can’t read the model’s reasoning. We solved the latency and memory problems with quantization and a fast first-stage filter. The trust problem we solved with training: analysts who understand how the model was built and what failure modes look like will use it. Analysts handed a black box will route around it.
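One way to build that fast first-stage filter is a cheap statistical gate that forwards only packed-looking buffers to the model, so most benign data never pays the transformer's latency cost. A minimal sketch, assuming a Shannon-entropy gate — the `ENTROPY_GATE` threshold is an invented value, and the gate itself is an illustration of the idea, not the production filter:

```python
import math
from collections import Counter
from typing import Callable

ENTROPY_GATE = 6.5  # bits/byte; hypothetical threshold for "packed-looking"

def shannon_entropy(buf: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for constant data, 8.0 for uniform)."""
    if not buf:
        return 0.0
    n = len(buf)
    return -sum((c / n) * math.log2(c / n) for c in Counter(buf).values())

def two_stage_score(buf: bytes, model: Callable[[bytes], float],
                    gate: float = ENTROPY_GATE) -> float:
    """Cheap path for low-entropy buffers; only suspects reach the transformer."""
    if shannon_entropy(buf) < gate:
        return 0.0      # fast first stage: looks like plain code/data, skip the model
    return model(buf)   # expensive second stage: transformer probability
```

Packed and encrypted payloads sit near 8 bits/byte while ordinary code and text sit well below, so the gate is effective and nearly free — and like the model's alert threshold, where you set it is a policy decision, not a technical one.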

The thing nobody talks about in the research papers: the organizational change required to deploy this in a real enterprise is larger than the engineering work. Detection teams need new runbooks. Incident response teams need to know what a model-flagged alert means versus a signature-flagged alert. Leadership needs to understand why the false-positive rate is a policy decision, not a technical one. Getting the model into production took three months. Getting the organization to use it well took a year.