I am reading the book Clean Code by Rob Martin and am trying to write the cleanest possible code.
I wrote this script as part of my project. Could you please criticise it and let me know how it could be improved?
/**
* File Downloading, Unzipping and extracting Content - PHP
*
* Usage:
*
* $fileImporter = new FileImporter($fileReference);
* $nextRowAsArray = $fileImporter->readFileRowAsArray();
*
* where $fileReference is File Name without Extension
* $nextRowAsArray is associative Array representing File Entry
*
* If $fileReference.zip is not found, it is downloaded using FILE_DOWNLOAD_PATH
* The archiv $fileReference.zip is automatically extracted whenever needed,
* and first archiv Entry $file.txt is taken
*
* The File $file.txt is expected of the form:
*
* field_name1|field_name2
* entry1_field1|entry1_field2
* entry2_field1|entry2_field2
* ...
* entryN_field1|entryN_field2
*
* Here FILE_ENTRY_SEPARATOR is '|'
*
* Then the returned arrays $nextRowAsArray are subsequently of the form:
*
* array( 'field_name1' => 'entry1_field1', 'field_name2' => 'entry1_field2', )
* array( 'field_name1' => 'entry2_field1', 'field_name2' => 'entry2_field2', )
* ...
* array( 'field_name1' => 'entryN_field1', 'field_name2' => 'entryN_field2', )
*/
class FileImporter {
private $_download_path = FILE_DOWNLOAD_PATH;
private $_separator = FILE_ENTRY_SEPARATOR;
private $_fileReference;
private $_fileHandleCache = null;
private $_fileHeadRow = array();
public function __construct ($fileReference) {
$this->_fileReference = $fileReference;
}
public function readFileRowAsArray () {
$fileHandle = $this->_getCachedOrNewFileHandle();
$fileEntryArray = $this->_getNextRowAsArray($fileHandle);
if ( $this->_fileHeadRow ) {
$keys = $this->_fileHeadRow;
$values = $fileEntryArray;
} else {
$this->_fileHeadRow = $fileEntryArray;
$keys = $fileEntryArray;
$values = $this->_getNextRowAsArray($fileHandle);
}
return array_combine($keys, $values);
}
private function _getCachedOrNewFileHandle () {
return $this->_fileHandleCache ? $this->_getCachedFileHandle() : $this->_getNewFileHandle();
}
private function _getNextRowAsArray ($fileHandle) {
$fileNextLine = fgets($fileHandle);
if ( feof($fileHandle) ) return array();
$fileNextLineTrimmed = trim($fileNextLine);
return explode($this->_separator, $fileNextLineTrimmed);
}
private function _getCachedFileHandle () {
return $this->_fileHandleCache;
}
private function _getNewFileHandle () {
$zipFileName = $this->_getZipFileName();
if (!file_exists($zipFileName)) $this->_downloadFile($zipFileName);
return $this->_unzipAndGetFileHandle($zipFileName);
}
private function _getZipFileName () {
// return "{$this->_fileReference . '.zip';
return "{$this->_fileReference}.zip";
}
private function _downloadFile ($fileName) {
$sourceStream = fopen($this->_downloadPath . $fileName, 'r') or die("Could not download $fileName");
$destinationStream = fopen($fileName, 'w');
echo "\n Downloading $fileName\n";
stream_copy_to_stream($sourceStream, $destinationStream);
}
private function _unzipAndGetFileHandle ($zipFileName) {
$firstZipEntryName = $this->_getFirstZipEntryName($zipFileName);
if ( !file_exists($firstZipEntryName) ) $this->_ZipExtract($zipFileName);
$fileHandle = fopen($firstZipEntryName, "rt") or die(" Can't Read File $firstZipEntryName");
$this->_fileHandleCache = $fileHandle;
return $fileHandle;
}
private function _getFirstZipEntryName ($zipFileName) {
if ( !file_exists($zipFileName) ) {
throw new Exception("Attempting to read non-existing Zip File $zipFileName, exitting ....");
}
$zipArchiv = zip_open($zipFileName);
$firstZipEntry = zip_read($zipArchiv);
$firstZipEntryName = zip_entry_name($firstZipEntry);
zip_close($zipArchiv);
return $firstZipEntryName;
}
private function _zipExtract ($zipFileName) {
echo "\n Begin extraction at ", date('r').", file $zipFileName";
$zipArchiv = new ZipArchive;
$zipArchiv->open($zipFileName);
$zipArchiv->extractTo(".");
$zipArchiv->close();
}
}
Here are my tests, which have all passed:
/**
* Testing FileImporter
*/
// case 0: both Zip and Text files exist - creating
function createTextAndZipFiles ($fileRef, $textData) {
$textFile = $fileRef . '.txt';
$zipFile = $fileRef . '.zip';
// creating text file
file_put_contents($textFile, $textData);
// creating zip file
$zip = new ZipArchive;
$zip->open($zipFile, ZipArchive::CREATE);
$zip->addFile($textFile);
$zip->close();
}
$testFile = 'file-test';
$testText = <<<EOT
day|year
10|2011
31|1843
EOT;
createTextAndZipFiles($testFile, $testText);
$importer_test = new FileImporter($testFile);
$rowArray = $importer_test->readFileRowAsArray();
$rowArray = $importer_test->readFileRowAsArray();
assert( $rowArray['year'] === '1843' );
// empty row
$testText = <<<EOT
day|year
10|2011
31|1843
EOT;
createTextAndZipFiles($testFile, $testText);
$importer_test = new FileImporter($testFile);
$rowArray = $importer_test->readFileRowAsArray();
assert( $rowArray['day'] === '10' );
$rowArray = $importer_test->readFileRowAsArray();
assert( $rowArray['year'] === '1843' );
$rowArray = $importer_test->readFileRowAsArray();
assert( $rowArray === array() );
// too long row
$testText = <<<EOT
day|year
10|2011|ha
31|1843
EOT;
createTextAndZipFiles($testFile, $testText);
$importer_test = new FileImporter($testFile);
$rowArray = $importer_test->readFileRowAsArray();
assert( $rowArray === array() );
$rowArray = $importer_test->readFileRowAsArray();
assert( $rowArray['year'] === '1843' );
$rowArray = $importer_test->readFileRowAsArray();
assert( $rowArray === array() );
// too short row
$testText = <<<EOT
day|year
2011
31|1843
EOT;
createTextAndZipFiles($testFile, $testText);
$importer_test = new FileImporter($testFile);
$rowArray = $importer_test->readFileRowAsArray();
assert( $rowArray === array() );
$rowArray = $importer_test->readFileRowAsArray();
assert( $rowArray['year'] === '1843' );
$rowArray = $importer_test->readFileRowAsArray();
assert( $rowArray === array() );
// case 1: Zip exists but Text doesn't
createTextAndZipFiles($testFile, $testText);
// delete Text
unlink($testFile . '.txt');
$importer_test = new FileImporter($testFile);
$rowArray = $importer_test->readFileRowAsArray();
assert( $rowArray === array() );
$rowArray = $importer_test->readFileRowAsArray();
assert( $rowArray['year'] === '1843' );
$rowArray = $importer_test->readFileRowAsArray();
assert( $rowArray === array() );
// case 2: Text exists but Zip doesn't
// faking download by using local subdirectory
// creating subdirectory 'test'
$subdirName = 'test';
// make sure directory exists and is not a file
if ( file_exists($subdirName) ) {
if ( !is_dir($subdirName) ) {
unlink($subdirName);
mkdir($subdirName);
}
} else {
mkdir($subdirName);
}
createTextAndZipFiles($testFile, $testText);
// now move zipfile into the subdirectory
$zipFileName = $testFile . '.zip';
rename($zipFileName, $subdirName . '/' . $zipFileName);
$importer_test = new FileImporter($testFile, $subdirName . '/');
$rowArray = $importer_test->readFileRowAsArray();
assert( $rowArray === array() );
$rowArray = $importer_test->readFileRowAsArray();
assert( $rowArray['year'] === '1843' );
1 Answer 1
I haven't read the book you mention, although it looks good so I might buy it and have a read.
A few things for me
There are echo statements in the code, I am assuming that is for debugging and you are going to remove them.
In a couple of places you use die(), and then elsewhere you throw an exception. Personally I would throw exceptions and let it bubble up to somewhere that can handle them properly.
Your class is called fileImporter, but it also has a download aspect to it as well. I am wondering if fileDownloader should be separated out.
You have a private getter function
private function _getCachedFileHandle () {
return $this->_fileHandleCache;
}
Which I find unusual, I normally only use publicly accessible getters.
// It is just as easy to use this internally
$f= $this->_fileHandleCache;
// as it is to use, so why make it more complex
$f = $this->_getCachedFileHandle();
Code reuse, when I design stuff I only create a separate method if I can reuse the same code, or due to it's complexity/size of the function it is best to break it into smaller parts to make it more readable.
// this bit of code is only called once and is simple, so it could be included inline
public function readFileRowAsArray () {
$fileHandle = $this->_getCachedOrNewFileHandle();
// like this
public function readFileRowAsArray () {
$fileHandle = $this->_fileHandleCache ? $this->_getCachedFileHandle() : $this->_getNewFileHandle();
There also appears to be no way to overide these private variables which are set to constant values.
class FileImporter {
private $_download_path = FILE_DOWNLOAD_PATH;
private $_separator = FILE_ENTRY_SEPARATOR;
If that is the case, then why re-declare then, why not use FILE_DOWNLOAD_PATH and FILE_ENTRY_SEPARATOR directly in your code
If you do want to allow them to be overridden, then you could extend your constructor
public function __construct ($fileReference, $download_path = FILE_DOWNLOAD_PATH, $separator = FILE_ENTRY_SEPARATOR) {
$this->_fileReference = $fileReference;
$this->_download_path = $download_path ;
$this->_separator = $separator ;
}
Hope that gives you some ideas to think about