RANDOM CHAOS

135 Million Records Behind One Perimeter

McGraw Hill's 135-million-account exposure proves edtech identity was classified low-risk while attackers priced it as inventory.


Section 1: Opening position

McGraw Hill. 135 million account records. One vendor, one exposure event. That number is the entire position.

This is not a discussion of an incident. It is a discussion of what the trust model permitted. When 135 million account records sit inside a single exposure boundary, that boundary was never designed for the threat it attracts. Edtech identity at scale is attacker-grade inventory. It always has been. The industry has treated it as low-risk because the individual account holder is a student or an educator, and the unit economics of attacking one account looked weak. That calculation does not hold when the pool reaches 135 million.

Position: the breach is not the failure. The failure was the classification that allowed this volume of identities to concentrate behind a perimeter not hardened for the inventory it protected. The breach is the receipt.

Section 2: What actually failed

What is confirmed: 135 million account records were accessible to an unauthorized party. What is not confirmed: the specific access path, the duration of access, the data fields exposed beyond “account,” the authentication surface abused, and whether the exposure resulted from a single access path or from many. The absence of those details constrains the analysis. It does not imply anything about the mechanism.

What the confirmed fact establishes structurally: scope was not contained. Whatever the mechanism, a process existed by which 135 million records could be retrieved from an unauthorized position. That process was not stopped at a lower number. Whether rate limiting, per-identity scoping, segmentation, anomaly detection, or query-volume controls were present is not confirmed. What is confirmed is that if any of those controls existed, none produced an outcome smaller than 135 million.

The observable failure is at the access boundary, not at the data itself. Data exposure is the output. The input is an access path that retrieved at this scale without interruption. Every control positioned between a request and a record must account for why 135 million records passed through it. If the controls existed, they were not enforced. If they did not exist, the design was not modelled against the threat this identity population attracts. Either conclusion resolves to the same operator finding: the boundary was incorrectly placed.
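
A control that stops retrieval at a lower number does not have to be elaborate. The sketch below is a minimal illustration, assuming a hypothetical `RecordBudget` that sits between an authenticated request and the records it returns and caps each principal at a fixed number of records per window. The class, its limit, and the `drain` simulation are illustrative. They are not a description of any McGraw Hill system.

```python
from collections import defaultdict


class RecordBudget:
    """Hypothetical control: caps records released to one principal per window."""

    def __init__(self, max_records_per_window: int):
        self.max_records = max_records_per_window
        self.used = defaultdict(int)  # principal -> records released this window

    def authorize(self, principal: str, requested: int) -> int:
        """Return how many of `requested` records may be released (0 when exhausted)."""
        remaining = self.max_records - self.used[principal]
        granted = max(0, min(requested, remaining))
        self.used[principal] += granted
        return granted

    def reset_window(self) -> None:
        # Assumed to be called by a scheduler at each window boundary.
        self.used.clear()


def drain(budget: RecordBudget, principal: str, total: int, page_size: int) -> int:
    """Simulate one principal paging through `total` records; return records released."""
    released = 0
    while released < total:
        granted = budget.authorize(principal, min(page_size, total - released))
        if granted == 0:
            break  # budget exhausted: retrieval stops at a lower number
        released += granted
    return released
```

Against a simulated drain of 135 million records in pages of 500, a budget of 10,000 per principal releases 10,000 and stops. A per-principal cap bounds what one authenticated position can reach. It does not bound an attacker who holds many working credentials, so it is one control among several, not the boundary.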

Section 3: Why it failed

The specific attack method used against McGraw Hill is not confirmed. The pattern across edtech is established, and it is the pattern that makes a 135-million-record exposure possible in a single event.

Edtech identity is provisioned in bulk. Institutions create accounts for students at enrollment. Passwords are often assigned by administrators, derived from predictable patterns, or never rotated after issuance. Authentication surfaces are public. MFA enforcement at sector scale is not confirmed, but observable adoption data consistently places edtech below consumer banking, enterprise SaaS, and healthcare. The credential pool produced by this provisioning model has three structural properties: it is large, it is predictable, and it is weakly defended. Those three properties describe inventory. Attackers price inventory by volume and by conversion rate, not by per-account value.
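
The pricing logic reduces to one multiplication. A minimal sketch, with every rate and value a hypothetical placeholder rather than a measured figure for any platform or campaign:

```python
# Every rate and value used with these functions is a hypothetical placeholder,
# not a measured figure.

def expected_takeovers(pool_size: int, conversion_rate: float) -> float:
    """Accounts an attacker expects to convert from a credential pool."""
    return pool_size * conversion_rate


def pool_value(pool_size: int, conversion_rate: float, value_per_account: float) -> float:
    """What the attacker prices: volume x conversion x per-account value."""
    return expected_takeovers(pool_size, conversion_rate) * value_per_account
```

At a hypothetical 1% conversion rate and $0.10 per account, one account is worth $0.001 in expectation. The same inputs across 135 million accounts yield 1.35 million expected takeovers and $135,000 in aggregate. The per-account number is what the defender classifies. The aggregate is what the attacker prices.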

I ran large-scale credential operations before 2018. Student and educator credential sets produced higher conversion than general consumer pools in every campaign where the pool was tested. The reason was not attacker sophistication. It was that the identity issuer did not treat the credential as a load-bearing control. The institution set the password. The user was not required to change it. The platform did not enforce rotation, strength, or reuse checks. Password reuse across personal services was common. The surface named “login” was performing identity validation only. It was not performing defence.
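
The missing checks are cheap to express at the point a credential is set. A minimal sketch, assuming a hypothetical `credential_problems` function and a local set of breached SHA-1 hashes standing in for a real breach corpus. The 12-character minimum is an illustrative assumption, not a cited standard.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Uppercase SHA-1 hex digest of a candidate password."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest().upper()


def credential_problems(candidate: str, *, username: str, issued_password: str,
                        breached_sha1: set[str], min_length: int = 12) -> list[str]:
    """Hypothetical acceptance check run whenever a credential is set.

    Returns the reasons to reject `candidate`; an empty list means accept.
    """
    problems = []
    # The administrator-issued value must be replaced, not kept.
    if candidate == issued_password:
        problems.append("unchanged from the administrator-issued credential")
    if len(candidate) < min_length:
        problems.append(f"shorter than {min_length} characters")
    # Predictable derivation: the password embeds the account identifier.
    if username and username.lower() in candidate.lower():
        problems.append("contains the username or student ID")
    # Reuse signal: the password already circulates in a breach corpus.
    if sha1_hex(candidate) in breached_sha1:
        problems.append("appears in a known breach corpus")
    return problems
```

Rejecting the issued value forces the first change. Rotation would hang off the same hook, triggered by age rather than by the set event.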

Why this fails at a scale of 135 million: the authentication layer cannot function as the primary trust boundary when the identity population is provisioned cheaply and defended weakly. If login is the boundary, and login can be stuffed, brute-forced, or bypassed through session compromise, the effective boundary is the attacker’s request budget, not the platform’s control posture. Whether that is what occurred against McGraw Hill is not confirmed. What is confirmed is that a platform holding 135 million identities under a low-risk classification was operating on a trust model that has been publicly exploited against edtech for years. The control model treated these identities as lower-risk than the attacker did. Attackers set the price. Defenders do not get to override it by classification.
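
The request-budget point is arithmetic. Under a per-account lockout that fires on the k-th failure in a window, a spraying attacker makes k - 1 guesses against every account in the pool without tripping it. A minimal sketch with hypothetical thresholds; real lockout policies vary.

```python
# Hypothetical thresholds for illustration; real lockout policies vary.

def guesses_without_lockout(pool_size: int, lockout_threshold: int, windows: int = 1) -> int:
    """Guesses a spray can make while staying one failure below a per-account
    lockout that fires on the `lockout_threshold`-th failure in a window."""
    return pool_size * (lockout_threshold - 1) * windows


def effective_guess_ceiling(pool_size: int, lockout_threshold: int,
                            attacker_request_budget: int) -> int:
    """The binding limit is whichever is smaller: what the platform tolerates
    or what the attacker can afford to send."""
    return min(attacker_request_budget, guesses_without_lockout(pool_size, lockout_threshold))
```

With a lockout at 5 failures and a 135-million-account pool, the platform tolerates 540 million guesses per window without locking a single account. Against a hypothetical attacker budget of 50 million requests, the binding constraint is the attacker, not the platform. For a 1,000-account pool, the platform's ceiling of 4,000 binds first.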

Section 4: Mechanism of failure

The mechanism exposed by a 135-million-record event is classification drift. Classification drift occurs when the control model is set once, at provisioning, and never recalibrated against the inventory it now holds. Edtech identity was classified as low-risk when a platform held thousands of accounts and when those accounts mapped to course access. That classification persisted as the platform grew to hold tens of millions, then hundreds of millions. The controls did not drift. The inventory did. The gap between them is where 135 million records became retrievable.
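
Classification drift becomes detectable once the tier is recomputed from the inventory held now and compared with the tier assigned at provisioning. A minimal sketch. The tier names, the identity-count cut-offs, and the `IdentityStore` record are assumptions chosen for illustration, and a real tiering model would weigh more than headcount.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative tier floors, lowest to highest; the cut-offs are assumptions.
TIER_FLOORS = [("low", 0), ("moderate", 100_000), ("high", 1_000_000), ("critical", 10_000_000)]
TIER_RANK = {name: rank for rank, (name, _) in enumerate(TIER_FLOORS)}


def required_tier(identity_count: int) -> str:
    """Tier implied by the inventory held today, not by the inventory at launch."""
    if identity_count < 0:
        raise ValueError("identity_count must be non-negative")
    tier = TIER_FLOORS[0][0]
    for name, floor in TIER_FLOORS:
        if identity_count >= floor:
            tier = name
    return tier


@dataclass
class IdentityStore:
    name: str
    assigned_tier: str   # set once, at provisioning
    identity_count: int  # what the store holds now

    def drift(self) -> Optional[tuple[str, str]]:
        """Return (assigned, required) when the assigned tier understates the inventory."""
        needed = required_tier(self.identity_count)
        if TIER_RANK[needed] > TIER_RANK[self.assigned_tier]:
            return (self.assigned_tier, needed)
        return None
```

A store assigned `low` at launch and holding 135 million identities reports `("low", "critical")`. The check is trivial. What matters is that it runs on a schedule instead of once.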

This is a boundary-placement problem, not an attacker sophistication problem. When identity is classified low, the controls placed around it are low. MFA is optional or absent. Rate limiting is tuned for normal classroom load, not credential spray. Anomaly detection is tuned for educator behaviour, not for a credential harvester operating through residential proxies. Monitoring thresholds assume a student population, not an inventory broker. Every control derived from the original classification performs against the workload the classification describes. None of them perform against the workload the inventory attracts. Whether any of those controls were present at McGraw Hill is not confirmed. What is confirmed is that none of them produced an outcome smaller than 135 million.
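
Tuning rate limits for credential spray rather than classroom load means counting breadth as well as volume: how many distinct usernames one source fails against inside a window. A minimal sketch, assuming a hypothetical `SprayDetector` keyed by source address. The window and limit are operator choices, not recommendations.

```python
from collections import defaultdict, deque


class SprayDetector:
    """Hypothetical per-source detector: flags a source that fails logins against
    more than `max_distinct_usernames` different usernames inside a sliding window.

    The limit has to sit above legitimate shared-address traffic (a school network,
    for example) and below a spray; choosing it is left to the operator.
    """

    def __init__(self, window_seconds: float, max_distinct_usernames: int):
        self.window = window_seconds
        self.limit = max_distinct_usernames
        self.failures = defaultdict(deque)  # source -> deque of (timestamp, username)

    def record_failure(self, source: str, username: str, now: float) -> bool:
        """Record one failed login; return True if `source` is now over the limit."""
        events = self.failures[source]
        events.append((now, username))
        # Drop failures that have aged out of the window.
        while events and events[0][0] <= now - self.window:
            events.popleft()
        distinct = {user for _, user in events}
        return len(distinct) > self.limit
```

Keying on source alone is weak against residential proxies, which spread attempts across many addresses so that no single source crosses the limit.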

The structural exposure: any platform that provisions identity at institutional scale, defends it at individual scale, and aggregates it behind a single perimeter is running the same mechanism. The number 135 million is not the story. The story is that the control surface was sized for the classification, and the inventory was sized for the attacker. When those two sizing models diverge, the delta is the exposure. The receipt arrives later. The position is fixed at the point of classification. Everything after classification is execution of a predetermined outcome.

Section 5: Parallel pattern

The same mechanism produces the same outcomes in every sector where identity is provisioned in bulk and defended individually. Healthcare member portals hold credential pools issued at enrollment, rarely rotated, defended at the login page. Payroll and HRIS platforms hold credentials for entire workforces, provisioned at onboarding, with authentication surfaces exposed to the open internet. Loyalty programs hold identity pools at nine-figure scale, provisioned with low-friction signup, defended with password-only authentication. Each of these populations is classified against the individual account holder. Each of them is priced by attackers against the aggregate.

The pattern is identical. Issuer provisions at scale. Issuer sets or accepts a weak credential. Issuer does not require the user to replace it. Platform exposes login publicly. Platform aggregates hundreds of thousands to hundreds of millions of these identities behind one authentication surface. Defence is applied per request. Attack is applied per pool. The per-request control model cannot price in the pool-level economics the attacker sees. The outcome converges on the same structural event: a single exposure window that retrieves at the scale of the inventory, not at the scale of any one control.
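
The gap between per-request defence and per-pool attack can be shown directly: aggregate one window of login events across every source and compare it with a baseline window. A minimal sketch. `summarize`, `pool_alert`, and the 3x multiplier are hypothetical, and a production detector would need seasonality (enrollment weeks, exam periods) built into its baseline.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class WindowStats:
    attempts: int
    failures: int
    distinct_usernames: int
    max_failures_from_one_source: int


def summarize(events) -> WindowStats:
    """Summarize one window of (source, username, succeeded) login events."""
    failures_by_source = Counter()
    usernames = set()
    attempts = failures = 0
    for source, username, succeeded in events:
        attempts += 1
        usernames.add(username)
        if not succeeded:
            failures += 1
            failures_by_source[source] += 1
    top = max(failures_by_source.values(), default=0)
    return WindowStats(attempts, failures, len(usernames), top)


def pool_alert(current: WindowStats, baseline: WindowStats, multiplier: float = 3.0) -> bool:
    """Pool-level check: compares the platform-wide failure rate and username breadth
    against a baseline window, regardless of how traffic is spread across sources."""
    if current.attempts == 0:
        return False
    cur_rate = current.failures / current.attempts
    base_rate = baseline.failures / max(baseline.attempts, 1)
    rate_spike = cur_rate > multiplier * base_rate
    breadth_spike = current.distinct_usernames > multiplier * max(baseline.distinct_usernames, 1)
    return rate_spike or breadth_spike
```

In a simulated window where 10,000 proxy addresses each fail once against a different username, no source fails more often than the busiest baseline source, yet the platform-wide failure rate rises from 5% to above 90% and the pool-level check fires.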

This is why the McGraw Hill number is not an outlier. It is a data point in a series. The series is defined by the mechanism, not by the sector. Any platform operating at eight- or nine-figure identity scale with a classification inherited from a smaller era is running the same pattern. The attacker economics do not need to change. The inventory does not need to shift. The mechanism is already in place. What varies is which platform produces the next receipt and what the exposure number reads when it does. The pattern is not a prediction. It is a description of the current state.

Section 6: Operator position

Identity at scale is not a low-risk asset. It has never been. The classification that made it appear low-risk was an artifact of unit thinking, not of threat modelling. Attackers do not attack units. They attack pools. A control model that defends units while aggregating pools is not a control model. It is a staging area for the next 135-million-record event. The classification is the vulnerability. Everything downstream of the classification is downstream of the vulnerability.

What must be true going forward: the authentication surface cannot be the primary trust boundary when the identity population is provisioned cheaply, defended weakly, and aggregated at nine-figure scale. If login is the boundary, the boundary is ineffective at this scale. The 135 million number proves it. No other proof is required. Trust must be validated continuously against the identity, the session, the device, the request pattern, and the blast radius of any single authenticated position. A control that stops at authentication does not scale against an inventory that is priced as inventory.
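
Continuous validation means the decision is made per request, from the signals listed above, with scope checked before anything else. A minimal sketch, assuming a hypothetical `RequestContext` that carries identity, session, device, request-pattern, and tenant-scope signals. The thresholds and the choice of deny versus step-up are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    STEP_UP = "step_up"  # re-authenticate, e.g. with MFA
    DENY = "deny"


@dataclass
class RequestContext:
    mfa_satisfied: bool
    session_age_seconds: int
    device_known: bool
    records_requested: int
    typical_records_per_request: int
    target_tenant: str
    session_tenants: frozenset


def evaluate(ctx: RequestContext, *, max_session_age: int = 8 * 3600,
             pattern_multiplier: int = 10) -> Decision:
    """Hypothetical policy: re-evaluate trust on every request, not once at login."""
    # Blast radius: a session may only reach the institutions it belongs to.
    if ctx.target_tenant not in ctx.session_tenants:
        return Decision.DENY
    # Request pattern: far more records than this identity normally pulls.
    if ctx.records_requested > pattern_multiplier * max(ctx.typical_records_per_request, 1):
        return Decision.DENY
    # Identity, session, and device signals: weakness degrades to step-up, not a pass.
    if not ctx.mfa_satisfied or not ctx.device_known or ctx.session_age_seconds > max_session_age:
        return Decision.STEP_UP
    return Decision.ALLOW
```

Scope and volume failures deny outright because they bound blast radius. Weaker identity signals degrade to step-up rather than pass silently.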

The McGraw Hill event is not a failure of technology. It is a failure of classification carried from a smaller system into a larger one without re-evaluation. Every edtech, healthcare, payroll, and loyalty platform holding identity at this scale is operating under the same inherited classification. Until the classification is repriced against the attacker’s unit economics, the mechanism continues. Controls that are not enforced are not controls. Identity is the boundary. The boundary was not placed at identity. The receipt reads 135 million. The next one is already being written.

