Scraping the bottom of the CORS barrel (part 1)

James Kettle’s 2016 research was instrumental in raising awareness of the deleterious effects of CORS (Cross-Origin Resource Sharing) misconfiguration on Web security. Does the story end there, though? Is writing about CORS-related security issues in 2022 futile? I don’t think so.

This post is the first in a series in which I will discuss more minor CORS-related issues and present lesser-known detection techniques. My primary audience is people on the offensive side, but folks on the defensive side may also find this series interesting.

I do not present the fundamentals of the CORS protocol here. If you need to familiarise yourself with the protocol, MDN Web Docs’ introduction is a good starting point.

Test resources, not just domains

Not only can CORS be configured at different layers of the tech stack (reverse proxy, origin server, etc.), it can also be configured at different levels of granularity. For example, Express’s CORS middleware allows developers to configure CORS not just for the whole server, but also for individual routes. When hunting for CORS misconfigurations on a given domain, bear in mind that some resources may be CORS-aware and others not, and that different CORS-aware resources may have different CORS configurations. Testing for CORS awareness and misconfiguration of only a single path on your target domain would be a mistake, because you may fail to detect misconfigured CORS-aware resources on that domain.
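As a rough illustration, the sketch below prepares one probing request per path rather than a single request per domain. The base URL, the paths, and the probing origin are all hypothetical; in practice, the list of paths would come from your own crawling or content discovery.

```python
from typing import Iterable, List
from urllib.parse import urljoin
from urllib.request import Request


def build_probes(base_url: str, paths: Iterable[str], origin: str) -> List[Request]:
    """Prepare one GET request per path, each carrying the same Origin header."""
    return [
        Request(urljoin(base_url, path), headers={"Origin": origin})
        for path in paths
    ]


# Hypothetical target, paths, and probing origin.
probes = build_probes(
    "https://example.com",
    ["/", "/api/users", "/static/app.js"],
    "https://attacker.example",
)
for probe in probes:
    print(probe.full_url, probe.get_header("Origin"))
```

Nothing is sent here; each prepared request can later be passed to urllib.request.urlopen, and the CORS response headers compared path by path.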

Broaden your search for allowed origins

Because the CORS protocol only tolerates a single serialized-origin value in the Access-Control-Allow-Origin response header, the set of allowed origins (if any) doesn’t readily reveal itself to attackers. In contrast, the script-src directive of a page’s Content Security Policy explicitly specifies sources for JavaScript that are valid for the page.

Historical note

The W3C’s 2009 Working Draft about CORS did make provision for a multivalued Origin header, as did RFC 6454 (entitled “The Web Origin Concept”). However, no major browser has ever supported this feature, which led the W3C to revise its stance in its 2013 Candidate Recommendation. The Fetch Standard, which has since supplanted the latter as the official specification of the CORS protocol, does not support a multivalued Origin header.

To even detect CORS awareness, let alone infer as much of the set of trusted origins as possible, you often have no other choice but dynamic analysis, treating the server as an oracle and repeatedly asking it yes-or-no questions, one candidate origin at a time:

Client: Do you allow this origin?

Server: No.

Client: Ok… What about that one?

Server: Nope. Keep trying.

Client: How about this other one?

Server: Yes, I do.
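The exchange above could be automated along the following lines. This is only a minimal sketch: the function names are mine, only the Access-Control-Allow-Origin header is inspected, and a stand-in oracle (fake_fetch, which pretends that only https://foo.example.com is allowed) replaces the real server so that the example runs without touching the network.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List, Mapping, Optional


def acao_verdict(headers: Mapping[str, str], origin: str) -> Optional[str]:
    """Return 'reflected', 'wildcard', or None, based on Access-Control-Allow-Origin."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    acao = lowered.get("access-control-allow-origin")
    if acao == origin:
        return "reflected"
    if acao == "*":
        return "wildcard"
    return None


def fetch_headers(url: str, origin: str, timeout: float = 10.0) -> Dict[str, str]:
    """Send a GET request with the given Origin and return the response headers."""
    request = urllib.request.Request(url, headers={"Origin": origin})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return dict(response.headers.items())
    except urllib.error.HTTPError as error:
        # Error responses (4xx, 5xx) can carry CORS headers too.
        return dict(error.headers.items())


def allowed_origins(
    url: str,
    candidates: Iterable[str],
    fetch: Callable[[str, str], Mapping[str, str]] = fetch_headers,
) -> List[str]:
    """Ask the oracle about each candidate; keep those echoed back by the server."""
    return [o for o in candidates if acao_verdict(fetch(url, o), o) == "reflected"]


def fake_fetch(url: str, origin: str) -> Dict[str, str]:
    """Stand-in oracle for illustration: allows a single hypothetical origin."""
    if origin == "https://foo.example.com":
        return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    return {"Content-Type": "application/json"}


candidates = ["https://example.com", "https://bar.example.com", "https://foo.example.com"]
print(allowed_origins("https://example.com/whatever", candidates, fetch=fake_fetch))
```

Against a real target, omit the fetch argument so that fetch_headers is used. A wildcard response is deliberately not counted as an allowed origin here, since it says nothing about which specific origins the server trusts.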

A good first step, according to accepted wisdom, consists in supplying the origin of the resource itself: for instance, if resource https://example.com/whatever is configured for CORS, it likely allows its own origin, https://example.com.

However, I’ve run into a few cases where resource https://example.com/whatever is configured for CORS and allows origin https://foo.example.com, but does not allow origin https://example.com or any subdomain of example.com other than foo.example.com. Had I not enumerated the subdomains of example.com and fed them to my dynamic analysis of the target, I would likely have failed to detect that it was even configured for CORS, because the responses to most of my probing requests didn’t contain any CORS response headers.

Beyond subdomains, which other origins is the resource of interest likely to allow in its CORS configuration (if any)? A promising source of candidate origins is other domains owned by your target, as well as their subdomains. A couple of reverse-WHOIS searches can reveal many of those domain names.
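Once you have gathered hostnames (enumerated subdomains, sibling domains found through reverse WHOIS, and so on), turning them into candidate origins is straightforward. The sketch below assumes default ports and the two usual schemes; the hostnames are hypothetical.

```python
from typing import Iterable, List, Sequence


def candidate_origins(
    hostnames: Iterable[str],
    schemes: Sequence[str] = ("https", "http"),
) -> List[str]:
    """Turn hostnames into deduplicated candidate origins, preserving order."""
    seen = set()
    origins = []
    for hostname in hostnames:
        # Normalise case and drop any trailing root dot.
        host = hostname.strip().lower().rstrip(".")
        if not host:
            continue
        for scheme in schemes:
            origin = f"{scheme}://{host}"
            if origin not in seen:
                seen.add(origin)
                origins.append(origin)
    return origins


# Hypothetical results of subdomain enumeration and reverse-WHOIS searches.
hosts = ["example.com", "Foo.Example.com", "example.org", "example.com"]
print(candidate_origins(hosts))
```

Including the http scheme is a judgment call: some configurations allow insecure origins, which is an issue in its own right.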

Don’t stop there. Meiser et al., in a fascinating 2021 paper entitled “Careful Who You Trust: Studying the Pitfalls of Cross-Origin Communication”, suggest that useful clues can be gleaned from your target’s /crossdomain.xml (Flash) and /clientaccesspolicy.xml (Silverlight) resources, if those resources exist on the server. In my experience, theirs is a good tip. Looking those pages up on the Wayback Machine may also yield a few additional candidate origins. Furthermore, as Meiser et al. rightly point out, the listing of https://a.com in the connect-src directive of the Content Security Policy served by https://b.com is a good indication that https://a.com allows origin https://b.com in its CORS configuration.
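Extracting those clues is easy to script. The following sketch relies only on Python’s standard library; the function names and the sample documents are mine. Note the direction of the last clue: sources listed under connect-src are resources to test with the CSP-serving site as the origin, not candidate origins themselves.

```python
import xml.etree.ElementTree as ET
from typing import List


def domains_from_crossdomain(xml_text: str) -> List[str]:
    """Collect the domain attribute of each <allow-access-from> element (Flash)."""
    root = ET.fromstring(xml_text)
    return [el.get("domain") for el in root.iter("allow-access-from") if el.get("domain")]


def uris_from_clientaccesspolicy(xml_text: str) -> List[str]:
    """Collect the uri attribute of each <domain> element (Silverlight)."""
    root = ET.fromstring(xml_text)
    return [el.get("uri") for el in root.iter("domain") if el.get("uri")]


def connect_src_sources(csp: str) -> List[str]:
    """Return the host sources of the connect-src directive, skipping keywords like 'self'."""
    for directive in csp.split(";"):
        tokens = directive.split()
        if tokens and tokens[0].lower() == "connect-src":
            return [token for token in tokens[1:] if not token.startswith("'")]
    return []


# Made-up sample documents, for illustration only.
CROSSDOMAIN = """<cross-domain-policy>
  <allow-access-from domain="*.example.com"/>
  <allow-access-from domain="partner.example.org" secure="true"/>
</cross-domain-policy>"""

CLIENTACCESS = """<access-policy><cross-domain-access><policy>
  <allow-from http-request-headers="*"><domain uri="https://app.example.net"/></allow-from>
  <grant-to><resource path="/" include-subpaths="true"/></grant-to>
</policy></cross-domain-access></access-policy>"""

CSP = "default-src 'self'; connect-src 'self' https://a.com https://api.a.com; script-src 'none'"

print(domains_from_crossdomain(CROSSDOMAIN))
print(uris_from_clientaccesspolicy(CLIENTACCESS))
print(connect_src_sources(CSP))
```

Wildcard entries such as *.example.com are not origins as such, but they tell you which family of subdomains is worth enumerating.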

Unescaped periods in the eTLD+1 part of a regexp

Many CORS configurations that intend to allow multiple origins rely on some regular expression for origin validation. In some cases, the regexp is flawed, insofar as it matches more origins than intended by the developers. A common blunder consists in using unescaped periods to represent DNS label separators in the regexp.

Through dynamic analysis of your target, you may be able to infer that the developers, in order to allow https://example.com and arbitrary subdomains thereof, are using something like the following regexp for origin validation:

^https:\/\/(.*\.)?example.com$

Note that the period separating the subdomain DNS label from the eTLD+1 (example.com) is escaped, as it should be, which precludes attacks from origins like https://notexample.com. If you cannot find an instance of XSS or subdomain takeover on a subdomain of example.com, you should give up and focus your efforts elsewhere.

Attentive readers may have noticed the presence of an unescaped period between the TLD (com) and second-level domain (example). Is this tantalising omission of an escape exploitable in some way? Unfortunately, the public-suffix list contains no entry that starts with example followed by a single character followed by com (at least, at the time of writing this post). Therefore, acquiring a domain name like attacker.examplezcom is impossible, unless you’re willing to go through the arduous and expensive process of registering a brand-new examplezcom eTLD for yourself.
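To see concretely what this regexp accepts and rejects, here it is transcribed into Python’s re module. This is an assumption on my part about how to model the server’s check (the target may well use a different regexp engine), but the constructs involved behave the same way in most engines.

```python
import re

# The inferred regexp, transcribed as is.
ORIGIN_RE = re.compile(r"^https:\/\/(.*\.)?example.com$")


def is_allowed(origin: str) -> bool:
    """Model of the server-side origin check."""
    return ORIGIN_RE.search(origin) is not None


for origin in [
    "https://example.com",           # intended
    "https://foo.example.com",       # intended
    "https://notexample.com",        # blocked, thanks to the escaped period
    "https://attacker.examplezcom",  # accepted by the regexp, but not registrable
    "http://example.com",            # blocked: wrong scheme
]:
    print(f"{origin:32} {is_allowed(origin)}")
```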

However, in cases where the eTLD of the allowed origin is composed of multiple DNS labels rather than just one, all is not lost. For instance, assume now that the following regexp is used for origin validation:

^https:\/\/(.*\.)?example.co.uk$

Note that the part of the regexp corresponding to the eTLD+1 contains two unescaped periods. The second unescaped period (the one between co and uk) is unexploitable, for the same reason as explained above. The first unescaped period (the one between example and co) is much more promising, because uk is itself a public suffix, and a domain like examplezco.uk, whose secure origin matches the regexp, is probably available for purchase. If the price for that domain isn’t prohibitive, you can buy it and mount your attack against your target from there.
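A short script can list the domains worth checking for availability. The sketch below substitutes each character that is valid inside a DNS label for the first unescaped period, and confirms that the resulting secure origin matches the regexp (again transcribed into Python’s re module, which is my assumption about the engine). Whether any of these domains is actually available or affordable is something you would have to check with a registrar.

```python
import re
import string
from typing import List

ORIGIN_RE = re.compile(r"^https:\/\/(.*\.)?example.co.uk$")

# Characters allowed inside a DNS label (letters, digits, hyphen).
LABEL_CHARS = string.ascii_lowercase + string.digits + "-"


def lookalike_domains() -> List[str]:
    """Replace the period between 'example' and 'co' with each valid label character."""
    return [f"example{char}co.uk" for char in LABEL_CHARS]


matching = [
    domain
    for domain in lookalike_domains()
    if ORIGIN_RE.search(f"https://{domain}")
]
print(len(matching), "candidate domains, e.g.", matching[-3:])
```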

Admittedly, because public suffixes that are composed of multiple DNS labels (co.uk, in my example) and for which a less specific public suffix exists (uk) are relatively rare, exploitable cases of unescaped periods within the eTLD+1 part of a regexp are even rarer. To this day, neither I nor infosec superstar James Kettle has ever come across one. But you should leave no stone unturned. In fact, the possibility of such a vulnerability should further spur you to broaden your search for domains allowed by your target’s CORS configuration.

CORS vs. SameSite

CORS-aware resources that are meant to allow reliable cross-site access now need their session-identifying cookies to be explicitly set with SameSite=None and Secure, rather than relying on browsers’ defaults.

However, contrary to what you may have read elsewhere, legitimate use cases for the stricter Lax and Strict values in conjunction with CORS do exist. Developers who wish to protect their CORS-aware resources against cross-site attacks but nonetheless allow (all or some) same-site origins may indeed set their cookie with either of those two SameSite values. Because the SameSite attribute only affects cross-site requests, same-site requests do unconditionally carry such a cookie.
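The rule can be summed up in a few lines. The function below is a deliberately simplified model of my own, limited to credentialed CORS requests (i.e. subresource requests, not top-level navigations); it ignores the Secure requirement that accompanies SameSite=None as well as any third-party-cookie blocking the browser may apply.

```python
def cookie_attached(samesite: str, request_is_same_site: bool) -> bool:
    """Simplified model: is the cookie attached to a credentialed CORS request?"""
    if request_is_same_site:
        # SameSite only restricts cross-site requests.
        return True
    # Cross-site subresource requests only carry SameSite=None cookies.
    return samesite.strip().lower() == "none"


for value in ("Strict", "Lax", "None"):
    for same_site in (True, False):
        kind = "same-site" if same_site else "cross-site"
        print(f"SameSite={value:6} {kind:10} -> {cookie_attached(value, same_site)}")
```

In this model, a request from a trusted same-site origin always carries the cookie, which is why a vulnerable same-site origin remains a viable launch pad.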

If you find yourself in a situation where the cookie is marked Lax or Strict, don’t dismiss the possibility of abuse too quickly. Seek instances of cross-site scripting or subdomain takeover on those same-site origins that the CORS-aware resource trusts, as Sam Curry once did to great effect:

“This was the saving grace which let me exploit a CORS misconfig which leaked the session token. Very sad to see SameSite cookies but it’s definitely a fun new challenge :)”

More about this subtlety in one of my previous posts.

Acknowledgments

I’d like to thank Alesandro Ortiz, who was kind enough to review a draft of this post before publication.