RC RANDOM CHAOS

Contractor PAT leaked 270GB of Times source

The 2024 NYT source code leak was not a credential breach. It was a credential sprawl chain. The mechanism, telemetry gaps, and what still applies.


The New York Times source code disclosure dropped 270GB of internal repositories onto an anonymous imageboard in June 2024. The breach vector was not a sophisticated intrusion. It was a GitHub access token belonging to a third-party contractor that had been exposed in a configuration file. The token granted read access to roughly 5,000 internal repositories. The repositories contained the actual payload - embedded API keys, internal service URLs, infrastructure manifests, and CI/CD credentials for systems far outside the editorial CMS the contractor was nominally permitted to touch.

The bug class here is not memory corruption. It is credential sprawl with no enforced blast radius. CWE-798, use of hard-coded credentials. CWE-540, inclusion of sensitive information in source code. CWE-522, insufficiently protected credentials. None of these classes is novel. All of them map to categories in recent editions of the OWASP Top 10. The persistence of the class across a decade of advisory cycles reflects the cost asymmetry. Embedding a key is one line of code. Rotating it across every consuming service and CI environment is a multi-team coordination problem. The shortcut wins on every sprint that nobody audits.

The mechanism of compromise is straightforward. A contractor commits a repository configuration containing a long-lived GitHub Personal Access Token. The PAT is a classic token scoped to repo rather than a fine-grained token, so it grants access across the organisation's private repositories instead of being limited to the specific projects the contractor worked on. The token is harvested - by an attacker scanning public commit histories, through a leak from the contractor’s own environment, or from a misconfigured artifact store. The attacker authenticates to api.github.com with the token. GitHub returns a list of accessible repositories. The attacker clones them in bulk. T1213.003, data from code repositories. T1078.004, valid accounts - cloud. T1552.001, credentials in files.
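The scope problem in this step can be checked from the defender's side. For classic tokens, GitHub's REST API reports the granted scopes in the X-OAuth-Scopes response header. The following is a minimal sketch that parses such a header value and flags scopes that reach beyond a single project. The BROAD_SCOPES set and both function names are illustrative assumptions, not a GitHub-defined list, and the network call that would fetch the header is left out.

```python
# Scopes on a classic PAT treated here as over-broad. This set is an
# illustrative assumption, not an exhaustive or official list.
BROAD_SCOPES = {"repo", "admin:org", "write:org", "delete_repo", "workflow"}


def parse_scopes(header_value: str) -> set[str]:
    """Split an X-OAuth-Scopes header value into a set of scope names."""
    return {s.strip() for s in header_value.split(",") if s.strip()}


def broad_scopes(header_value: str) -> set[str]:
    """Return the scopes in the header that fall in BROAD_SCOPES."""
    return parse_scopes(header_value) & BROAD_SCOPES
```

A token whose header reads "repo, read:user" would be flagged for repo. An inventory job could run a check like this against every token it is allowed to see.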

The cloning itself is not the breach outcome. It is the staging step for the second-order extraction. Once the repositories are local to the attacker, the actual key material is mined out. Tools for this stage are public and mature. TruffleHog, Gitleaks, detect-secrets, GitGuardian’s CLI. The attacker runs entropy-based and regex-based scanners across the cloned tree, against the full git history rather than just HEAD. A key committed in 2019 and removed from the working tree in 2020 still sits in git log -p, and it is still valid unless it was also rotated at the provider. The scan returns AWS access keys, Slack webhooks, Mailchimp API keys, third-party CMS tokens, internal service JWTs, database connection strings, Algolia search admin keys, Twilio credentials, and Sentry DSNs. Each finding is a potential pivot.
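The two detection techniques named above, regex matching and entropy scoring, can be sketched in a few lines. This is a simplified illustration, not how any of the listed tools is implemented. The 4.0 bits-per-character threshold, the 20-character minimum, and the function names are assumptions chosen for the example. A real scan would feed every added line from the full history (for example the output of git log -p) through something like scan_text.

```python
import math
import re
from collections import Counter

# Long-lived IAM user access key IDs start with "AKIA" followed by 16
# uppercase alphanumerics. Other providers would need their own patterns.
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")
# Candidates for the entropy check: long runs of base64-like characters.
CANDIDATE = re.compile(r"[A-Za-z0-9+/_\-]{20,}")


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, from character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def scan_text(text: str, threshold: float = 4.0) -> list[tuple[str, str]]:
    """Return (kind, match) findings for one blob of text."""
    findings = [("aws_access_key_id", m) for m in AWS_KEY_ID.findall(text)]
    for token in CANDIDATE.findall(text):
        # Skip tokens the regex already reported; flag the rest by entropy.
        if AWS_KEY_ID.fullmatch(token):
            continue
        if shannon_entropy(token) >= threshold:
            findings.append(("high_entropy", token))
    return findings
```

The regex catches known formats with few false positives. The entropy pass catches unknown formats at the cost of noise, which is why production scanners combine both.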

This is the chain. GitHub PAT yields repository access. Repository history yields embedded keys. Embedded keys yield access to AWS, to Slack workspaces, to email infrastructure, to third-party SaaS that holds customer data. The privilege escalation does not happen by exploiting a vulnerable binary. It happens by collecting credentials that were already present, already valid, and already broadly scoped. The attacker does not need to elevate. The credentials elevate for them.
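The chain can be modelled as reachability in a directed graph, where an edge means "holding this yields that". The sketch below uses a hypothetical graph. Every node name is invented for illustration and none is taken from the incident. It computes what a single starting credential ultimately reaches.

```python
from collections import deque

# Hypothetical credential graph: each key is an asset or credential, each
# value lists what an attacker holding it can reach next.
GRAPH = {
    "contractor_pat": ["org_repos"],
    "org_repos": ["aws_key", "slack_webhook", "mailchimp_key"],
    "aws_key": ["s3_buckets", "secrets_manager"],
    "secrets_manager": ["db_connection_string"],
    "db_connection_string": ["subscriber_db"],
}


def blast_radius(graph: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first walk returning everything reachable from `start`."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order
```

In this toy graph, blast_radius(GRAPH, "contractor_pat") reaches subscriber_db four hops out, while starting from slack_webhook reaches nothing further. Removing one edge, such as the embedded aws_key, cuts off everything behind it.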

The systemic failure is the trust contract between the editorial CMS infrastructure and the surrounding developer toolchain. A CMS for a newsroom does not require write access to ad-tech infrastructure. A WordPress plugin maintained by a contractor does not require credentials capable of querying the subscription database. A CI pipeline for the iOS app does not require keys that can publish to the company’s customer email lists. Each of these adjacencies existed because the credential model was permission-by-convenience. The token that worked yesterday still works today, for everything it ever worked for, until someone notices.

The real-world precedent for this pattern is dense. Uber 2016 - AWS keys in a private GitHub repository accessed via stolen credentials, 57 million records exfiltrated. Codecov 2021 - bash uploader script modified to exfiltrate environment variables from CI, downstream compromise of HashiCorp, Rapid7, Twilio. Toyota 2022 - access keys to customer data servers committed to a public GitHub repository for five years. CircleCI 2023 - malware on an engineer’s laptop exfiltrated session tokens, leading to customer secret exposure. Microsoft 2024 - Midnight Blizzard (APT29) password-sprayed a legacy non-production test tenant account, then abused a legacy OAuth application with elevated permissions to reach corporate email. The NYT incident is not anomalous. It is the modal cloud breach of the last five years.

The threat actor profile for this class is broad. Financially motivated groups - ShinyHunters, IntelBroker - operate as the public-facing distribution layer for credential-derived data drops. The acquisition layer beneath them is more diffuse. Initial access brokers harvest credentials from infostealer logs (RedLine, Raccoon, Lumma) sold on Russian Market and the successors to Genesis Market. They harvest from leaked GitHub commits surfaced through GH Archive scraping. They harvest from misconfigured S3 buckets, exposed Elasticsearch instances, and forgotten Jenkins admin consoles. The token does not need to be stolen from inside the perimeter. The perimeter is the set of systems that accept the token, and most of those systems are external SaaS.

Telemetry for this attack chain is partial and unevenly distributed. On the GitHub side, the audit log records git.clone events with source IP, user agent, and repository identifier. A clone storm from a residential IP, a VPS provider, or a Tor exit - over a short window, across hundreds of repositories - is a strong signal. Most organisations do not stream the GitHub audit log into a SIEM. Most that do, do not alert on clone volume anomalies. GitHub Enterprise customers can enforce IP allowlists, but contractor PATs frequently exist on personal accounts that bypass organisation-level network controls. The audit log will show the event. Nothing will be watching.
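A clone-volume alert of the kind described here does not need much logic. The sketch below counts distinct repositories cloned per actor and source IP inside a sliding window. The field names (action, actor, actor_ip, repo, and @timestamp in epoch milliseconds) are assumed to match the exported audit log records and should be checked against the actual stream. The ten-minute window and the threshold of 50 repositories are arbitrary illustrative values.

```python
from collections import defaultdict, deque


def clone_storms(events, window_ms=600_000, max_repos=50):
    """Yield (actor, actor_ip, distinct_repo_count) the first time one
    actor/IP pair clones more than `max_repos` distinct repositories
    within `window_ms`.

    `events` is an iterable of dicts sorted by "@timestamp" (epoch ms).
    """
    recent = defaultdict(deque)  # (actor, ip) -> deque of (timestamp, repo)
    alerted = set()
    for ev in events:
        if ev.get("action") != "git.clone":
            continue
        key = (ev["actor"], ev["actor_ip"])
        window = recent[key]
        window.append((ev["@timestamp"], ev["repo"]))
        # Drop clone events that have aged out of the sliding window.
        while window and ev["@timestamp"] - window[0][0] > window_ms:
            window.popleft()
        distinct = len({repo for _, repo in window})
        if distinct > max_repos and key not in alerted:
            alerted.add(key)
            yield (*key, distinct)
```

Keying on the actor and IP pair keeps a busy CI account from masking a single token used from a new address. A per-actor baseline would be the next refinement.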

AWS CloudTrail captures GetCallerIdentity, ListBuckets, GetSecretValue, and the rest of the API surface across STS, S3, IAM, and Secrets Manager. An exfiltrated AWS access key used from a new geography, a new ASN, or against a previously unseen service combination produces detectable signal. GuardDuty surfaces UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration.OutsideAWS, which covers EC2 instance role credentials used from outside AWS, and Recon:IAMUser/MaliciousIPCaller for the textbook cases. The detection gap is the long-lived programmatic key that is supposed to be used from outside AWS - CI runners, contractor workstations, third-party SaaS integrations. There is no network-based behavioural baseline that distinguishes legitimate use from attacker use, because the legitimate use already comes from arbitrary internet IPs.
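One signal that does not depend on source IP is the set of services a key normally calls. The sketch below flags CloudTrail records where an access key touches an eventSource it has not used before. It relies on the userIdentity.accessKeyId and eventSource fields of the CloudTrail record format. The baseline structure and the function name are assumptions for illustration, and building the baseline is left out.

```python
def new_service_use(baseline, records):
    """Return CloudTrail records where an access key calls an AWS service
    that is absent from its baseline.

    `baseline` maps accessKeyId -> set of eventSource values seen before.
    Keys with no baseline entry are flagged on every call.
    """
    flagged = []
    for rec in records:
        key_id = rec.get("userIdentity", {}).get("accessKeyId")
        source = rec.get("eventSource")
        if key_id is None or source is None:
            continue
        if source not in baseline.get(key_id, set()):
            flagged.append(rec)
    return flagged
```

A CI key that has only ever written to S3 and suddenly calls secretsmanager.amazonaws.com stands out under this check regardless of where the request originates. It will not catch an attacker who stays inside the key's normal service set.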

For the third-party SaaS integrations downstream of the leaked keys, telemetry is whatever the vendor exposes. Slack records auth.test calls and tokens used by integration bots, but the surface for enterprise correlation is limited. Mailchimp, Twilio, SendGrid - each maintains its own audit surface, none of which a SIEM consumes by default. The defender’s visibility ends at the API boundary of every SaaS the credentials reach. The attacker’s visibility continues.

Detection engineering for this class does not begin with a SIEM rule. It begins with pre-commit secret scanning, push protection at the VCS layer, and short-lived credentials issued through OIDC federation rather than static keys. GitHub Advanced Security ships push protection that blocks pushes containing patterns matching known token formats. Assuming an AWS IAM role from GitHub Actions via OIDC removes long-lived access keys from CI entirely. HashiCorp Vault dynamic secrets, AWS Secrets Manager rotation, and short-TTL workload identities collapse the window during which a leaked credential remains valid. None of these controls were absent from public guidance in 2024. The NYT incident occurred because they were absent from the contractor pipeline.
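OIDC federation only narrows blast radius if the trust condition on the token's sub claim is narrow. GitHub Actions tokens carry a subject of the form repo:OWNER/REPO:ref:refs/heads/BRANCH for branch-triggered workflows. The sketch below imitates a wildcard condition on that claim to show the difference between a tight and a loose pattern. The organisation and repository names are hypothetical, signature and issuer validation are deliberately omitted, and fnmatchcase is used as a rough stand-in for a cloud provider's wildcard matching rather than an exact equivalent.

```python
from fnmatch import fnmatchcase

# Hypothetical allow-list: only workflows on the main branch of one repo.
ALLOWED_SUBJECTS = ["repo:example-org/cms-plugin:ref:refs/heads/main"]


def subject_allowed(claims: dict, allowed=ALLOWED_SUBJECTS) -> bool:
    """Check the token's `sub` claim against wildcard patterns.

    This covers only the subject condition. A real verifier must first
    validate the token signature, issuer, audience, and expiry.
    """
    sub = claims.get("sub", "")
    return any(fnmatchcase(sub, pattern) for pattern in allowed)
```

With the default list, a token from example-org/ios-app is rejected. Swap in the pattern "repo:example-org/*" and every repository in the organisation can assume the role, which recreates the broad-scope problem with a shorter TTL.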

The residual exposure after the disclosure is the longer tail. Source code in attacker possession enables targeted analysis for second-order vulnerabilities - authentication logic, internal API contracts, undocumented admin endpoints, third-party service integrations whose credentials may not have been rotated. Rotation of leaked tokens addresses the immediate compromise. It does not address the durable intelligence advantage of holding the internal architecture in static form. Every endpoint in the codebase is now a fixed target with full implementation visibility. The defender is patching against an attacker with the source.

The technical reality is unchanged by the patch boundary. A leaked credential is rotated. A leaked codebase is not. The contractor PAT that started the chain has been revoked. The architectural decisions that made one contractor’s token a path to organisation-wide credential extraction remain. Credential scope as policy. Secret material as ephemeral state. Pipeline trust as a graph with explicit edges. The compromise of the New York Times was the compromise of an organisation that had not implemented these as enforced controls. The same description applies to most newsrooms, most SaaS companies, and most cloud-native enterprises operating in 2026. The incident is documentation of the modal posture, not the exception to it.
