
Due to a weird server setup on one of my client's websites, I needed to set up a script to load referenced PDF files through a case-insensitive lookup. We originally looked into mod_speling, but it was causing issues with other mod_rewrite declarations. I'm not entirely sure why, and I didn't have much control over those other declarations, since other people were working on the site as well.

So anyway, they have references to their PDF files all over the internet with inconsistent capitalization. Realizing that getting all of those references updated to the proper file names was going to be nearly impossible, I opted to write a script to do the translation for me.

The rewrite rule forces a request like http://www.somedomain.com/somedirectory/subdirectory/anypdffile.pdf to load through my pdffiles.php script.

The script seems to work perfectly. I'm not too worried about server performance, as I don't think it's going to be used all that often. I just wanted a second pair of eyes to make sure I didn't leave any major security loopholes in here, since I'm reading a directory/file path from a $_GET variable.

First, I added this to the .htaccess file to redirect all PDF requests to a PHP script:

RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_URI} \.pdf$ [NC]
RewriteRule (.*) pdffiles.php?p=1ドル

Then I put the following code in pdffiles.php:

<?php
// shorthand for our docroot
$root = $_SERVER['DOCUMENT_ROOT'];
// get our filepath (correct format is "somedir/otherdir/xyz.pdf")
$path = $_GET['p'];
// Rule 1 - Don't allow an empty path
if (!empty($path)) {
    // get the directory for our file
    $dir = dirname($root.'/'.$path);
    // get the filename
    $file = basename($path);
    // get our extension
    $ext = pathinfo($file, PATHINFO_EXTENSION);
    // Rule 2 - Don't allow direct access to the script
    if (!strstr($_SERVER['REQUEST_URI'], 'pdffiles.php')) {
        // Rule 3 - Ensure PDF extension
        if (strtolower($ext) == "pdf") {
            // Rule 4 - Make sure document root + path is a real path
            if (($dir = realpath($dir))) {
                // Loop through directory contents
                if ($fh = opendir($dir)) {
                    while (false !== ($tfile = readdir($fh))) {
                        // ignore the default directory entries
                        if ($tfile != "." && $tfile != "..") {
                            // convert both to lower case and try to match
                            if (strtolower($tfile) == strtolower($file)) {
                                // if they match, output the file
                                header("Content-type: application/pdf");
                                readfile($dir.'/'.$tfile);
                                exit;
                            }
                        }
                    }
                }
            }
        }
    }
}
// we failed one of our checkpoints, redirect to the not-found page
header("HTTP/1.1 301 Moved Permanently");
header("Location: /notfound.html");
?>
asked Jan 21, 2013 at 20:46
  • Not sure if it's an exploitable issue, but what would happen if someone set p to a relative path like ../cgi/example.pdf? I can't foresee any issues, though, as the pdf extension check will stop you from mistakenly serving up internal files. Commented Jan 22, 2013 at 5:50
  • @JasonLarke Yeah, I thought about that after I wrote it. They aren't really storing any sensitive PDFs anywhere on the server, so I'm not entirely worried about it, but it would definitely be an issue in another context. Commented Jan 28, 2013 at 16:51
  • You may not have had any sensitive documents on the server when you wrote this script, but how can you be sure that's still the case in two months? Checking for path traversal is almost always a very good idea. Commented Feb 21, 2013 at 23:33

1 Answer


As Jason stated, checking for ".." is a good idea. If the PDFs live in only a limited number of paths on the server, I would even prefer matching against that set.
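
For illustration, here is a rough sketch of both checks, reusing the notFound() helper from the refactored version below; the $allowedDirs entries are made up and just stand in for wherever the PDFs actually live:

$allowedDirs = array('documents', 'downloads/brochures'); // hypothetical whitelist, relative to the docroot

$path = $_GET['p'];
// reject any attempt at path traversal outright
if (strpos($path, '..') !== false) notFound();

// resolve the requested directory and make sure it is still inside the docroot
$root = realpath($_SERVER['DOCUMENT_ROOT']);
$dir  = realpath(dirname($root.'/'.$path));
if ($dir === FALSE || strpos($dir, $root) !== 0) notFound();

// optionally, require the directory to be one of the known PDF locations
$relative = trim(substr($dir, strlen($root)), '/');
if (!in_array($relative, $allowedDirs)) notFound();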

Apart from that, you might also want to invert your ifs and use guard conditions to get rid of the nesting:

<?php
// NOTE: comment the intention, not what your code is doing

// Don't allow direct access to the script
if (strstr($_SERVER['REQUEST_URI'], 'pdffiles.php')) notFound();

// expected format is "somedir/otherdir/xyz.pdf"
$path = $_GET['p'];
if (empty($path)) notFound();

$root = $_SERVER['DOCUMENT_ROOT'];
$dir = realpath(dirname($root.'/'.$path));
if ($dir === FALSE) notFound();

$file = basename($path);
$ext = pathinfo($file, PATHINFO_EXTENSION);
if (strtolower($ext) != "pdf") notFound();

if ($fh = opendir($dir)) {
    while (false !== ($tfile = readdir($fh))) {
        if ($tfile == "." || $tfile == "..") continue;
        if (strtolower($tfile) != strtolower($file)) continue;
        header("Content-type: application/pdf");
        readfile($dir.'/'.$tfile);
        exit;
    }
}
// nothing matched (or the directory could not be opened)
notFound();

function notFound() {
    // we failed one of our checkpoints, redirect to the not-found page
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: /notfound.html");
    exit();
}
// Omit the closing ?> tag: it is not necessary, and leaving it out avoids stray blank lines at the end of your PHP files.

You might also want to have a look at DirectoryIterator.
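
For example, the readdir() loop could be written along these lines (just a sketch of the same case-insensitive match, not tested):

foreach (new DirectoryIterator($dir) as $entry) {
    // skip "." and ".." and anything that is not a regular file
    if ($entry->isDot() || !$entry->isFile()) continue;
    // compare case-insensitively, as in the original loop
    if (strtolower($entry->getFilename()) != strtolower($file)) continue;
    header("Content-type: application/pdf");
    readfile($entry->getPathname());
    exit;
}
notFound();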

answered Jan 22, 2013 at 7:06
  • The concept of a "document directory" is a good one; most sites I've worked on generally store all their documents in a specific folder (with various levels of sub-folders). I'm not sure how the OP's filesystem is set up (and it sounds like he's inherited a bit of a mess), but something like this may be possible. Commented Jan 22, 2013 at 13:36
  • @David In addition to that, you might want to check whether the filenames are unique, so you can drop the path completely and create the new "document directory" we recommend (a quick check for that is sketched after these comments). Commented Jan 22, 2013 at 15:11
  • Ideally I would build the document directory, but this is kind of a low-budget, quick fix; the client didn't have the money for the time involved in doing it that way. As far as rewriting the code to use guard conditions: does that actually provide any benefit other than making the code more readable? Don't get me wrong, I'm all about readable code, but if the existing code works as is, is there a point to rewriting it this way? Commented Jan 28, 2013 at 16:46
  • @David The code is your realm; you have to work and live with it. Of course no customer will pay for a refactor/cleanup/test item on the invoice, so you just have to include it in your estimate and do it. Even in the short run, cleaner, more readable, and better-structured code speeds up your bug fixing, which means either more time for cleaning up other code or lower cost for your customer. Especially on long-term projects, you waste a lot of time understanding your own code after a few months. Commented Jan 29, 2013 at 4:57
  • @David Just for completeness: the broken window theory, and leave the place cleaner than you found it. Commented Feb 2, 2013 at 7:25
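
For the uniqueness idea from the comments, a quick throwaway check could look something like this (a sketch only, assuming it runs in a context where DOCUMENT_ROOT is set; otherwise hard-code the path):

<?php
// one-off check: are all PDF filenames unique across the docroot, ignoring case?
$seen = array();
$iterator = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($_SERVER['DOCUMENT_ROOT'])
);
foreach ($iterator as $entry) {
    if (!$entry->isFile() || strtolower($entry->getExtension()) != 'pdf') continue;
    $name = strtolower($entry->getFilename());
    if (isset($seen[$name])) {
        echo "Duplicate: ".$entry->getPathname()." and ".$seen[$name]."\n";
    }
    $seen[$name] = $entry->getPathname();
}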
