RANDOM CHAOS

MITRE already filed your detection bypass as AML.T0015

ML malware detection is a deterministic classifier with a mappable decision boundary. Attackers exploit its learned bias. That demands more engineering.

The pitch for AI in detection is that it lets defenders do less. Less rule-writing. Less tuning. Less engineering. The pitch is backwards. Machine-learning detection - NGAV, ML-EDR, the static classifiers bolted onto modern endpoint agents - is a statistical function mapping features to a confidence score. It is deterministic. It is queryable. It has a decision boundary. That boundary is the attack surface, and most teams ship it untested.

This is not a debate about hype. It is a vulnerability class. MITRE tracks it in ATLAS, the adversarial-ML companion to ATT&CK. AML.T0015 is evade ML model. AML.T0043 is craft adversarial data. AML.T0040 is ML model inference API access. The techniques are catalogued because the attacks are repeatable, and the attacks are repeatable because the model behaves the same way every time you ask it.

Start with what the classifier actually learns. A static PE classifier trains on a feature vector. The EMBER reference set is the public template - byte histograms, byte-entropy histograms, imported and exported function lists, section names and sizes, header fields, string counts, presence of a debug section, presence of a signature. A gradient-boosted tree or a deep net learns the correlations between those features and the malicious or benign label in the training corpus. MalConv skips feature engineering and reads raw bytes through a convolutional net. Either way, the output is a score and a threshold.
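A minimal sketch of that pipeline, assuming a toy three-feature vector and a hand-written logistic scorer. The feature names are loosely modeled on EMBER-style fields, and the weights, bias, and threshold are invented for illustration. None of this is a vendor's model. It shows only the shape: bytes in, features out, score compared to a cutoff, same answer every time.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())


def extract_features(data: bytes, has_signature: bool) -> dict[str, float]:
    """Toy feature vector; real extractors produce thousands of dimensions."""
    printable = sum(32 <= b < 127 for b in data)
    return {
        "entropy": shannon_entropy(data),
        "printable_ratio": printable / len(data) if data else 0.0,
        "has_signature": 1.0 if has_signature else 0.0,
    }


# Hypothetical learned parameters, invented for this sketch.
WEIGHTS = {"entropy": 0.6, "printable_ratio": -2.0, "has_signature": -1.5}
BIAS = -3.0
THRESHOLD = 0.5


def score(features: dict[str, float]) -> float:
    """Logistic score in (0, 1); higher means more likely malicious."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def verdict(features: dict[str, float]) -> str:
    return "malicious" if score(features) >= THRESHOLD else "benign"


sample = bytes(range(256))  # stand-in for high-entropy content
unsigned = extract_features(sample, has_signature=False)
signed = extract_features(sample, has_signature=True)
print(verdict(unsigned), round(score(unsigned), 3))
print(verdict(signed), round(score(signed), 3))
```

With these invented weights, the same bytes land on opposite sides of the threshold depending on one boolean. Nothing in the scorer is random, so repeated calls return identical scores.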

The model does not learn malice. It learns correlation. That distinction is the entire problem. When the training corpus over-represents signed binaries in the benign class, the model weights a valid Authenticode signature toward benign. When benign samples carry certain version strings, manifest entries, or icon resources, the model weights those toward benign. The classifier is encoding the biases of its training distribution. Those biases are stable, and a stable bias is a predictable input-output relationship. An attacker who maps the relationship controls the output.
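To make the correlation point concrete, here is a toy sketch that derives a per-feature log-odds from a skewed corpus. The corpus counts are invented and the estimator is a simple smoothed ratio, not how a gradient-boosted tree or deep net is actually fit. The point is only that a skew in the data becomes a stable sign in the learned weight.

```python
import math

# Hypothetical toy corpus of (has_signature, label) pairs. Counts are invented:
# benign samples are mostly signed, malicious samples mostly are not.
CORPUS = (
    [(1, "benign")] * 90
    + [(0, "benign")] * 10
    + [(1, "malicious")] * 5
    + [(0, "malicious")] * 95
)


def learned_log_odds(corpus, feature_value: int) -> float:
    """Log-odds of 'malicious' given the feature value, with add-one smoothing."""
    malicious = sum(1 for f, y in corpus if f == feature_value and y == "malicious")
    benign = sum(1 for f, y in corpus if f == feature_value and y == "benign")
    return math.log((malicious + 1) / (benign + 1))


signed_weight = learned_log_odds(CORPUS, 1)    # negative: pushes toward benign
unsigned_weight = learned_log_odds(CORPUS, 0)  # positive: pushes toward malicious
print(round(signed_weight, 2), round(unsigned_weight, 2))
```

The signature says nothing about what the code does. The weight exists because of who ended up in the corpus.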

The exploit path does not require a generative model or a novel primitive. It requires query access and patience. When the classifier runs locally on the endpoint - which it does, because cloud round-trips are too slow for pre-execution verdicts - the attacker has an offline oracle with unlimited queries and zero rate limiting. AML.T0040. The attacker submits samples, reads verdicts, and reconstructs the shape of the decision boundary. This is gray-box probing against a function that never changes between queries and never logs that it was probed.

From there the work is feature-space perturbation under a functionality constraint. The attacker modifies the sample so its feature vector crosses the boundary into benign while the executable still runs. The perturbations live in regions the loader ignores. Append bytes to the overlay past the last section. Pad slack space inside section alignment. Add imports that are never called. Insert strings and resources harvested from known-benign software. Adjust section entropy by inflating low-entropy padding. None of this touches the malicious code path. The PE entry point executes exactly as before. The feature vector moves; the program does not. This is the gap between feature space and problem space, and adversarial-ML research has lived in that gap since 2018 - Demetrio and colleagues demonstrated header and padding attacks against MalConv that flipped verdicts by editing bytes the CPU never executes.
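The feature-space versus problem-space gap can be shown on plain byte strings, without touching a real executable. This is a sketch under loud assumptions: `payload` and `padding` are synthetic buffers, and the two features are the toy ones a robustness test might track. It performs no PE parsing and is not an evasion tool. It only shows that content appended after the original bytes moves the measured features while the original bytes stay identical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes in the printable ASCII range."""
    return sum(32 <= b < 127 for b in data) / len(data) if data else 0.0


def feature_vector(data: bytes) -> tuple[float, float]:
    return (shannon_entropy(data), printable_ratio(data))


# Synthetic stand-ins, invented for illustration.
payload = bytes(range(256)) * 16  # uniform bytes, maximal entropy
padding = b"Copyright (c) Example Corp. All rights reserved. " * 200
padded = payload + padding

before = feature_vector(payload)
after = feature_vector(padded)

print("original bytes unchanged:", padded[: len(payload)] == payload)
print("entropy:", round(before[0], 3), "->", round(after[0], 3))
print("printable ratio:", round(before[1], 3), "->", round(after[1], 3))
```

A release-gate test can run the same comparison in reverse: score a file with and without its trailing data and flag verdicts that depend on bytes no loader maps.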

Query access is not even a hard requirement, because adversarial examples transfer. An attacker without the deployed model trains a substitute on the same public feature space - EMBER is open, MalConv is open - and crafts evasions against the surrogate. Models trained on overlapping feature distributions learn overlapping boundaries, so a perturbation that flips the surrogate flips the target at a usable rate. ATLAS catalogs this as AML.T0005, create proxy ML model, feeding a black-box evasion. The defender’s model being proprietary buys less than the vendor implies. The feature set is the shared secret, and the feature set is published.

The canonical real-world case is CylancePROTECT. In 2019 Skylight Cyber published a universal bypass. They analyzed the model, identified features it over-weighted toward benign, and found that appending a specific block of strings pulled from a popular benign game flipped malicious samples to benign. Same appended content, many different malware families, consistent flip. That is the signature of a model bias, not a one-off evasion. The classifier had learned that the presence of those benign-associated features outweighed the malicious signal in the rest of the file. The bug was not in the code that ran the model. The bug was the model - the decision boundary it had learned from its training data. There is no CVE for a decision boundary. You cannot patch a learned weight with a hotfix. You retrain, and retraining moves the boundary somewhere else.

This maps cleanly onto attacker tradecraft. Adversarial perturbation is defense evasion, ATT&CK TA0005. It chains with T1027, obfuscated files or information, and supports T1204, user execution, by getting a malicious binary past the pre-execution gate so a lured user can run it. The attacker is not breaking cryptography or corrupting memory. They are submitting valid inputs to a classifier and observing that it returns the answer they engineered. The model performs to specification. The specification was wrong.

Now the part defenders underweight - what this produces in telemetry. Nothing. That is the finding. A conventional detection failure leaves a trail. A signature miss still logs the process create. A blocked-then-bypassed control fires an alert before it is evaded. An adversarial evasion against a static classifier produces a benign verdict and a single float below threshold. There is no Sysmon event for a model that scored 0.48 against a 0.50 cutoff. There is no Windows Security event for a confidence margin. The classifier does not emit its own uncertainty. It returns benign, the file is allowed, and the endpoint records the same process-create event, Sysmon Event ID 1, that it records for every legitimate binary. The defender sees a clean execution.
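One way to give that float somewhere to land is to log it. The sketch below assumes a hypothetical `VerdictRecord` schema and a hypothetical `near_margin` cutoff of 0.05. Neither comes from any EDR product, and the field names are placeholders. It emits one JSON line per verdict with the margin made explicit.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class VerdictRecord:
    """Hypothetical per-scan record; field names are illustrative only."""
    host: str
    sha256: str
    score: float
    threshold: float
    model_version: str


def to_log_line(record: VerdictRecord, near_margin: float = 0.05) -> str:
    """Serialize a verdict with its distance from the threshold."""
    entry = asdict(record)
    entry["verdict"] = "malicious" if record.score >= record.threshold else "benign"
    entry["margin"] = round(abs(record.score - record.threshold), 4)
    # A benign verdict this close to the cutoff is worth a second look.
    entry["near_threshold"] = entry["margin"] <= near_margin
    return json.dumps(entry, sort_keys=True)


record = VerdictRecord(
    host="host-1",
    sha256="ab" * 32,  # placeholder digest
    score=0.48,
    threshold=0.50,
    model_version="v1",
)
print(to_log_line(record))
```

The record does not detect anything by itself. It turns a silent 0.48 into a field that a downstream rule can query.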

The blind spot is structural, not a tuning miss. Teams that consolidated layered detection onto a single ML verdict now depend on one signal, and that signal can be flipped without generating a second one. The model has no concept of being attacked. Querying it ten thousand times to map its boundary looks identical to ten thousand benign scans, because locally there is no API gateway counting queries, no anomaly rule on inference volume, no log line that says this host has scanned an improbable number of near-identical samples. AML.T0040 and AML.T0043 execute in a telemetry vacuum. Compare that to a brute-force attack against an Okta tenant or a credential-stuffing run against a Cloudflare-fronted login - those generate rate-limit events, velocity alerts, and IP reputation hits. Inference against a local model generates a CPU spike at worst.
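The missing query counter is not hard to sketch. The class below is a hypothetical per-host sliding-window monitor. The name `InferenceRateMonitor`, the 60-second window, and the 500-query ceiling are assumptions chosen for illustration, not values from any product. It only helps on hosts the defender controls. An attacker probing a copy of the agent on their own machine never touches it.

```python
from collections import defaultdict, deque


class InferenceRateMonitor:
    """Counts classifier invocations per host inside a sliding time window."""

    def __init__(self, window_seconds: float = 60.0, max_queries: int = 500):
        self.window_seconds = window_seconds
        self.max_queries = max_queries
        self._events: dict[str, deque] = defaultdict(deque)

    def record(self, host: str, timestamp: float) -> bool:
        """Register one inference; return True if the host exceeds the ceiling."""
        events = self._events[host]
        events.append(timestamp)
        # Drop events that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events) > self.max_queries


monitor = InferenceRateMonitor(window_seconds=10.0, max_queries=3)
alerts = [monitor.record("host-1", t) for t in (0.0, 1.0, 2.0, 3.0)]
print(alerts)  # the fourth query inside the window trips the ceiling
```

Timestamps are passed in, not read from the clock, so the monitor can be replayed against historical scan logs.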

The discipline this demands is more, not less, and it is specific. Treat the model as a component with an attack surface and instrument it accordingly. Log inference volume and score distributions per host so boundary-probing has somewhere to register. Alert on clusters of near-duplicate samples scoring just under threshold, which is the fingerprint of a perturbation search. Keep behavioral and post-execution telemetry independent of the static verdict, so a flipped pre-execution score is not the only gate between a binary and execution - process lineage, command-line auditing, T1059 script and shell telemetry, network egress all fire after the classifier has already said benign. Version the model, track training-data provenance, and monitor drift, because a model whose boundary you cannot describe is a model whose evasions you cannot predict. Run adversarial robustness testing as a release gate the way you would fuzz a parser. Ensembles and feature ablation raise the cost of a universal bypass because the attacker now has to move multiple boundaries at once, not one.
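The near-duplicate alert can be sketched as a two-step filter: keep scans that scored just under the threshold, then group them by feature-space distance. Everything here is an assumption for illustration, including the `Scan` record, the 0.05 band, the `eps` radius, and the greedy grouping. A production system would more likely use fuzzy hashes or a proper clustering library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    """Hypothetical scan record: sample id, model score, toy feature vector."""
    sample_id: str
    score: float
    features: tuple[float, ...]


def near_threshold(scans, threshold=0.5, band=0.05):
    """Scans that passed as benign but landed within `band` of the cutoff."""
    return [s for s in scans if threshold - band <= s.score < threshold]


def group_by_distance(scans, eps=0.1):
    """Greedy grouping: a scan joins the first group whose seed is within eps."""
    groups: list[list[Scan]] = []
    for scan in scans:
        for group in groups:
            if math.dist(scan.features, group[0].features) <= eps:
                group.append(scan)
                break
        else:
            groups.append([scan])
    return groups


def suspicious_clusters(scans, threshold=0.5, band=0.05, eps=0.1, min_size=3):
    """Groups of near-identical samples all scoring just under the threshold."""
    candidates = near_threshold(scans, threshold, band)
    return [g for g in group_by_distance(candidates, eps) if len(g) >= min_size]


scans = [
    Scan("a1", 0.46, (7.10, 0.30)),
    Scan("a2", 0.47, (7.12, 0.31)),
    Scan("a3", 0.49, (7.15, 0.33)),
    Scan("b1", 0.48, (3.00, 0.90)),  # near threshold but unrelated
    Scan("c1", 0.10, (7.11, 0.30)),  # similar features, comfortably benign
]
for cluster in suspicious_clusters(scans):
    print([s.sample_id for s in cluster])
```

Three near-identical samples hugging the cutoff is the pattern a perturbation search leaves behind. One stray near-threshold file is not.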

Provenance is not paperwork here. The same statistical machinery that makes the boundary mappable makes the training pipeline a target. AML.T0020, poison training data, sits one step upstream - an attacker who lands samples into the corpus a vendor scrapes from public feeds can teach the model that an attacker-chosen feature reads as benign, building the bias deliberately instead of discovering it. That is the supply-chain version of the Cylance bias, and it inherits the same property: it produces no runtime alert, because the model is doing exactly what its training taught it. Auditing where labels come from is detection engineering, not compliance theater.
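A provenance audit can start as a simple ratio. The sketch below assumes each training record carries a `source` field and that the team maintains a set of vetted sources. The record layout, the source names, and the idea of a share ceiling are all hypothetical. It measures how much of a label class arrived through feeds nobody has vetted.

```python
def unvetted_share(records, label: str, vetted_sources: set[str]) -> float:
    """Fraction of samples with `label` that came from an unvetted source."""
    rows = [r for r in records if r["label"] == label]
    if not rows:
        return 0.0
    unvetted = sum(1 for r in rows if r["source"] not in vetted_sources)
    return unvetted / len(rows)


# Hypothetical training manifest; sources and hashes are placeholders.
manifest = [
    {"sha256": "aa" * 32, "label": "benign", "source": "internal-golden-image"},
    {"sha256": "bb" * 32, "label": "benign", "source": "public-feed"},
    {"sha256": "cc" * 32, "label": "benign", "source": "public-feed"},
    {"sha256": "dd" * 32, "label": "benign", "source": "vendor-partner"},
    {"sha256": "ee" * 32, "label": "malicious", "source": "public-feed"},
]
vetted = {"internal-golden-image", "vendor-partner"}

share = unvetted_share(manifest, "benign", vetted)
print(f"{share:.0%} of benign labels come from unvetted sources")
```

A high share does not prove poisoning. It marks where an attacker-supplied benign label would be cheapest to land.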

The residual exposure survives any single fix. Retraining on adversarial examples hardens the model against the specific perturbations in the training set and moves the boundary; it does not eliminate the boundary, and a new boundary has new blind spots an attacker can map with the same offline oracle. Local deployment keeps handing attackers unlimited queries. The functionality constraint stays satisfiable because executable formats carry slack the loader ignores by design - overlay, padding, unreferenced imports, dead resources. The mechanism is stable. The fixes raise cost and add visibility. They do not close the class.

AI did not remove the engineering. It moved it. The work shifted from writing detection logic you can read to defending a statistical boundary you cannot see, instrumenting a component that fails silently, and testing a model the way an attacker will query it. Teams that bought the less-engineering pitch shipped an untested classifier as a control and called it coverage. The attackers are not using sophisticated generative models against them. They are submitting valid files and reading the score.
