-
-
Notifications
You must be signed in to change notification settings - Fork 88
Comments
RequestFactory: optimized script path detection performance#35
RequestFactory: optimized script path detection performance #35JanTvrdik wants to merge 2 commits intonette:master from
Conversation
e241565 to
8851c4d
Compare
src/Http/RequestFactory.php
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
j should be defined outside. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? This a common pattern. Is that confusing? Or is there a bug I don't see?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's common pattern if it's used only in scope of cycle. your code use it outside, there is no bug, it's just less readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works in PHP, because it has crappy scope boundaries. But would not work e.g. in C++ (unless -fpermissive is set). +1 for moving it outside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's readable enough. And I would not say that PHP has crappy scope boundaries – they just follow different rules than C/C++ (and other languages, of course) and I took advantage of those rules. So I'm keeping it unless @dg says otherwise.
dg
commented
Dec 20, 2014
You should use sentinel (i.e. / at end of string):
$script .= '/'; $path = $url->getPath() . '/'; $max = min(strlen($path), strlen($script)); for ($i = 0, $j = 0; $i < $max; $i++) { if ($path[$i] !== $script[$i] && strcasecmp($path[$i], $script[$i])) { break; } elseif ($path[$i] === '/') { $j = $i; } }
Test it with:
$_SERVER = array(
'REQUEST_URI' => '/test/in',
'SCRIPT_NAME' => '/test/index.php',
);
8851c4d to
1c1ff1f
Compare
JanTvrdik
commented
Dec 21, 2014
Good job with finding the failing use-case, I added it to the tests. The trailing slash was something I intentionally wanted to avoid.
fa5188e to
e8350c4
Compare
JanTvrdik
commented
Dec 21, 2014
One thing I don't understand is why we bother with case-insensitive comparison at all. Shouldn't URL path be generally considered case-sensitive? Also – we don't have unit tests for that. We should either add them or remove the case-insensitive behavior altogether.
Just to make it clear – this is no longer about performance, the optimized script path detection algorithm has close to zero slowdown due to the case-insensitivity. This is about what is technically right and wrong.
JanTvrdik
commented
Dec 21, 2014
One more thing – the case sensitivity was introduced in nette/nette@838abc3. However I'm unable to find a failing use-case for the previous implementation. Both known script path issues (http://forum.nette.org/cs/5932-lepsi-detekce-requesturi-a-scriptpath and nette/nette#489) worked on the ancient version properly.
dg
commented
Dec 21, 2014
It was introduced sooner, in 2008. But I have no idea why... I think it can be removed.
xificurk
commented
Dec 21, 2014
My guess is that the intention was to be compatible with Win-world. How does your webserver on Win handle URLs like http://localhost/DIR/INDEX.HTML or http://localhost/dir/index.html without URL rewriting?
Imho it makes sense to be case insensitive in the path - BFU doesn't care about case-sensitivity and might type in URL or its part in upper case.
dg
commented
Dec 21, 2014
Mac world is case insensitive too.
JanTvrdik
commented
Dec 21, 2014
@xificurk But nette/http is (IMHO) a low-level library, it's not its job to be smart and care much about BFU (that is also why I consider the default URL filters in RouteFactory wrong). Route is case-insensitive by default, but Route is not a low-level library.
xificurk
commented
Dec 21, 2014
@JanTvrdik You're right, this should be solved somewhere else... I was just guessing (not sure if correctly), what was the reason to introduce it in the first place.
e8350c4 to
98cc19a
Compare
98cc19a to
c5aa73e
Compare
JanTvrdik
commented
Dec 21, 2014
I've found on forum and added the missing failing case for the ancient script path detection algorithms.
I've also found what does removing the case-insensitivity break: On Windows (and possibly other platforms with case-insensitive file system) if you access http://localhost/nette/Sandbox/www/ when the directory name is actually lowercase sandbox you get:
REQUEST_URI => "/nette/Sandbox/www/" (19)
SCRIPT_NAME => "/nette/Sandbox/www/index.php" (28)
SCRIPT_FILENAME => "C:/Projects/nette/sandbox/www/index.php" (39)
DOCUMENT_ROOT => "C:/Projects" (11)
Therefore it still works fine even with case-sensitive behavior. However if for some reasons the SCRIPT_NAME would not be available, it fallbacks to SCRIPT_FILENAME and DOCUMENT_ROOT which results in /nette/sandbox/www/index.php which will fail with case-sensitive behavior. Which leads to the very same question I asked in #31 (comment). Under what circumstances is SCRIPT_NAME not available.
Micro-optimization.
Benchmark:
Results: