PDF doc search

From Meta, a Wikimedia project coordination wiki
This article is considered of unknown usefulness and may be a candidate for deletion.
If you want to revive discussion regarding the subject, you may try using the talk page or start a discussion at Meta:Babel.

See PDF doc search II for another solution to this problem.

  • Apache
  • Mandrakelinux

I modified the standard /includes/SpecialUpload.php so that the text of uploaded files with the .pdf extension becomes searchable.

I started by downloading and installing the Xpdf tools. Xpdf includes a Linux command-line utility, pdftotext, that extracts the text of a PDF document and writes it to standard output.

Then I modified SpecialUpload.php at the point after it tests for a successful upload and just before it inserts the uploaded file information into the database. The change appends the text of the PDF document, wrapped in an HTML comment block, to the description text of the image's file page.

To find this text, users must enable the Image namespace in their search preferences, or an administrator must add the Image namespace to the default search namespaces.
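For the administrator-side option, the following is a minimal sketch of a LocalSettings.php fragment. It assumes the installed MediaWiki version supports the $wgNamespacesToBeSearchedDefault setting and defines the NS_IMAGE constant (later releases use NS_FILE instead); check both against your version before relying on it.

```php
# LocalSettings.php fragment (assumption: your version supports this setting)
# Include file description pages in searches by default.
$wgNamespacesToBeSearchedDefault[NS_IMAGE] = true;
```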

 if( $this->saveUploadedFile( $this->mUploadSaveName,
                              $this->mUploadTempName,
                              !empty( $this->mSessionKey ) ) ) {
     /**
      * Update the upload log and create the description page
      * if it's a new file.
      */
     # MHART replace $textdesc with <!-- text from doc if .d
     if ( strtolower( $finalExt ) == "pdf" ) {
         $NewDesc = $this->mUploadDescription . "\r\n" . "<!-- ";
         # Quote the path so spaces or shell metacharacters cannot break the command
         $toexec = "/usr/bin/pdftotext " . escapeshellarg( $this->mSavedFile ) . " -";
         $DocText = array();
         exec( $toexec, $DocText );
         foreach ( $DocText as $DocLine ) {
             # Strip "-->" so the extracted text cannot close the comment early
             $NewDesc .= "\r\n" . str_replace( "-->", "", $DocLine );
         }
         $NewDesc .= "\r\n" . " -->";
     } else {
         $NewDesc = $this->mUploadDescription;
     }
     ####
     wfRecordUpload( $this->mUploadSaveName,
                     $this->mUploadOldVersion,
                     $this->mUploadSize,
                     $NewDesc, # MHART - this line has been changed
                     $this->mUploadCopyStatus,
                     $this->mUploadSource );
     $this->showSuccess();
 }
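
The comment-wrapping step can also be isolated into a small function, which makes it easier to test without uploading a file. The sketch below is only an illustration: buildIndexedDescription is a hypothetical helper name, not part of MediaWiki. It follows the same layout as the code above, but repeats the "-->" removal, because a single str_replace pass can leave a new "-->" behind (for example, "--->->" becomes "-->").

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): append extracted text
 * lines to a description, wrapped in an HTML comment block.
 */
function buildIndexedDescription( $description, array $lines ) {
    $result = $description . "\r\n" . "<!-- ";
    foreach ( $lines as $line ) {
        // Repeat the removal: deleting one "-->" can join the characters
        // around it into a new "-->" that would close the comment early.
        while ( strpos( $line, "-->" ) !== false ) {
            $line = str_replace( "-->", "", $line );
        }
        $result .= "\r\n" . $line;
    }
    return $result . "\r\n" . " -->";
}
```

In the upload code it would be called as buildIndexedDescription( $this->mUploadDescription, $DocText ) in place of the foreach loop.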

My actual script is a bit different, because I handle other file types in a similar fashion. Here's the combined documentation.

--MHart 17:15, 10 May 2005 (UTC)

Modification to fix broken description

I think you need to modify the $this->mUploadSize line too; otherwise your new description will be truncated. I've changed it to strlen($NewDesc).

So replace this function call

 wfRecordUpload( $this->mUploadSaveName,
 $this->mUploadOldVersion,
 $this->mUploadSize, 
 $NewDesc, # MHART - this line has been changed
 $this->mUploadCopyStatus,
 $this->mUploadSource );

with

 wfRecordUpload( $this->mUploadSaveName,
 $this->mUploadOldVersion,
 strlen($NewDesc), # MARKSW - this line has been changed
 $NewDesc, # MHART - this line has been changed
 $this->mUploadCopyStatus,
 $this->mUploadSource );

Other than that, it's great! Thanks for this :) Marksw 11:58, 7 October 2005 (UTC)
