I'm a hobby coder at best. I know enough to make helpful programs to assist me with day to day life and work (mostly with PHP/JS, some C#). I'm wanting to learn more about good coding practices. I try not to use existing frameworks, mostly so I can learn the basic language, unless I need to for something more advanced/complex.
Below is a class I wrote yesterday for handling CSV files. It's a simple class that either reads from a CSV file or writes to a CSV file. When reading/writing, it also has an option for having a header row (which in my use case, would be associated database columns). It works well and without issue.
I was hoping for some feedback on my style, and how I wrote the program, and if I may be doing anything "wrong". For example, I used to nest if statements a lot, but have been trying to reduce the nesting to make the code easier to read. One such practice I still do is, in the get_csv function, I could assign the results to a temporary variable to handle, and then assign that to the Class's results public variable. Here, I opted to assign directly, and then modify if needed. I'm not sure which would be considered better practice though.
class CSV{
public $header_row = array(), $results = array();
private $filepath = null, $ext = null;
public function __construct($filepath = null){
if($filepath)
$this->set_filepath($filepath);
}
public function set_filepath($filepath){
if(!$this->ValidateFilepath($filepath))
return false;
$this->filepath = $filepath;
return true;
}
public function set_header_row($header_row){
if(!is_array($header_row))
return null;
$this->header_row = $header_row;
}
public function write_csv($data, $header = null, $filepath = null){
if($filepath)
if(!$this->set_filepath($filepath))
return false;
if(!is_array($data))
return;
if(!($file = fopen($this->filepath, "w")))
return false;
if($header && is_array($header))
fputcsv($file, $header);
foreach($data as $csv_data){
if(is_string($csv_data))
$csv_data = array($csv_data);
fputcsv($file, $csv_data);
}
fclose($file);
return true;
}
public function get_csv($filepath = null, $header = false, $delimiter = ","){
if($filepath)
if(!$this->set_filepath($filepath))
return false;
if(!($file = fopen($this->filepath, "r")))
return null;
while(($data = fgetcsv($file, 10000, $delimiter)))
array_push($this->results, $data);
if($header){
$this->header_row = array_shift($this->results);
foreach($this->results as $index => $result)
$this->results[$index] = array_combine($this->header_row, $this->results[$index]);
}
fclose($file);
return $this->results;
}
private function ValidateFilepath($filepath){
$extension = explode(".", $filepath);
if(count($extension) > 2)
return false;
if($extension[1] !== "csv")
return false;
$this->ext = "csv";
return true;
}
}
4 Answers 4
design of Public API
I don't understand this:
public function __construct($filepath = null){
if($filepath)
$this->set_filepath($filepath);
}
public function set_filepath($filepath){
if(!$this->ValidateFilepath($filepath))
return false;
$this->filepath = $filepath;
return true;
}
That is, I don't understand the Use Case where a caller would want to take advantage of that much flexibility.
We have a constructor and a boolean predicate. Apparently you contemplate these two cases:
c = CSV();c.set_filepath("foo.csv")c = CSV("foo.csv")
As the author of a calling app, I can't imagine why I would want that 1st case.
And then in the 2nd case, I still retain the flexbility
to call c.set_filepath("baz.csv"),
but why would I want that?
Better to re-assign c = CSV("baz.csv").
Keep it simple.
On which topic, why do we have a predicate there?
How is it helpful to have c.set_filepath("/path/does/not-exist.csv") return false?
DbC
explains that we'd make life much simpler for the caller
if we throw a fatal error upon learning the path
does not exist, rather than returning false
and expecting the caller to remember to check that boolean.
With exceptions, there's nothing to remember;
we wind up with a Public API which is much easier to use correctly,
every time.
When employing the 2nd use case, it's not even clear to me how I would determine that the ValidateFilepath() succeeded, since I get an object back either way.
tl;dr: Stick to just that 2nd use case. And if caller does get an object back, then we guarantee it shall correspond to a correctly validated file path.
path validation
I don't understand this part of ValidateFilepath():
if(count($extension) > 2)
return false;
if($extension[1] !== "csv")
I mean, we prohibit e.g. "foo.csv.gz", fine.
The > 2 test makes sense.
But then we blindly go ahead and dereference the [1] element,
without asking if we found any "." dots in the filepath?!?
unit tests
I didn't notice any automated unit tests in the OP,
but you need a suite of such tests.
And one of them should try to access "README" (with no dot) as a .CSV,
and verify that validation reports false.
I note in passing that we choose to view "PRICES.CSV" as
invalid, and that's fine. You get to make the rules.
I'm willing to believe that only downcased file extensions
are important for the use cases you care about.
extra args
public function get_csv($filepath = null, ...
No.
Don't do it.
This goes back to "design of Public API".
Caller already specified the path, with c = CSV("foo.csv").
We don't need a setter method.
And we don't need this get_csv() method to act like a setter.
(If we did, we would choose an honest name,
like set_path_and_get_csv(), though
SRP
would have something to say about that.)
Oh, wait, now I see that write_csv() similarly accepts a path.
I'm afraid that would simply not have occurred to me -- I would
expect a separate CSV writer object instead.
Or perhaps a 3-arg function, outside of the class.
I do like that caller can e.g. specify delimiter = ";".
validating a path
if(!($file = fopen($this->filepath, "r")))
return null;
Wait, ValidateFilepath() validated the path,
yet there are still additional failure modes possible?!?
What was the point of doing the validate back in the constructor?
Consider making ValidateFilepath() responsible for verifying
the file is readable.
Yes, this might possibly involve doing extra filesystem work;
the "separation of concerns" benefit is worth it.
A method should do One Thing, and do it well.
By the time we get here, the get_csv() function
should be confident that a class invariant holds:
$this->filepath shall contain a valid path.
In any event, asking the caller to do a null check is terrible. Many callers will neglect to do that. Much better to throw a fatal "file not found" error.
magic number
while(($data = fgetcsv($file, 10000, $delimiter)))
That ten thousand seems pretty arbitrary.
If you make it a default arg in the signature, similar to the delimiter, then the caller
- knows explicitly about the restriction, and
- can easily tailor its value to the caller's Use Case.
BTW, I like the "optional header row" aspect.
Though I might have defaulted it to true,
since most .CSV's I work with begin with column headings.
-
1\$\begingroup\$ Consider making ValidateFilepath() responsible for verifying the file is readable - are you proposing to introduce a TOCTOU bug right there?
fopencan fail, no matter what validations you have performed before - an enemy process could have removed the file or changed its permissions between your validation call and the actual attempt to read the file. This is too common mistake to go unnoticed. Soget_csvcannot be infallible, it should still throw some exception on failure (IDK how exceptions work in PHP, but hopefully you can do that) \$\endgroup\$STerliakov– STerliakov2025年10月25日 23:51:27 +00:00Commented Oct 25 at 23:51 -
1\$\begingroup\$ @STerliakov No, of course not. Clearly we still get an exception raised in such a case. I was trying to preserve at least some of the OP's original design. The OP apparently wanted a "check the path argument" phase, followed by a "read the table rows" phase, perhaps to improve the quality of the diagnostic for what OP believed was a common error like misspelled filename. // I was hoping that a revised codebase might point the way to a streaming rows interface, with smaller memory footprint. As it stands we read all into RAM and return, so no need for a class object. A function suffices. \$\endgroup\$J_H– J_H2025年10月26日 03:51:00 +00:00Commented Oct 26 at 3:51
-
1\$\begingroup\$ "I mean, we prohibit e.g. "foo.csv.gz", fine. The > 2 test makes sense." — I don’t think it does. Yes, it prohibits
foo.csv.gz, but it also prohibitsfoo.bar.csvorTransactions as of 15.08.2025.csv, which are both perfectly valid file names for a CSV file. Validating a file path/name should not care how many dots the file name has, only whether it ends in.csv(if even that – a file doesn’t need to have any extension at all to be valid CSV). \$\endgroup\$Janus Bahs Jacquet– Janus Bahs Jacquet2025年10月27日 13:37:04 +00:00Commented Oct 27 at 13:37
Bug
The ValidateFilepath function is inadequate. If the file name does not contain a dot (has no extension), your code would fail with an "Undefined array key" error. You could simply rely on the pathinfo lib. And maybe make the extension check case-insensitive.
Context handling
Your code should leverage try/catch patterns for exception handling. It would make sense to have a finally clause to close files gracefully eg:
} finally {
if ($handle !== null) {
fclose($handle);
}
}
Or better yet, implement a __destruct magic method in your class to take care of that.
Extension
Some files may have a different extension eg .txt and still be CSV-parsable. So, it's not such a good idea to have a blocking, hard-coded csv extension in your code. There is lots of boilerplate in that class, but it does very little to be useful as it stands.
Delimiter
get_csv accepts a delimiter, while write_csv doesn't. It would make sense to harmonize the two functions and add some flexibility that is currently lacking.
DRY
There is unneeded repetition of code eg: if(!$this->set_filepath($filepath)). I thought the __construct method would take care of that once for all?
Exception handling
The code should be more strict in checking arguments, for example:
public function set_header_row($header_row){
if(!is_array($header_row))
return null;
$this->header_row = $header_row;
}
If the function receives an argument that is not acceptable, it should provide a meaningful error message (can be a class method of some sort). If it returns false or null, it does not tell us much about the problem. And in fact, return codes are not always checked by those who invoke the functions, so the problem could go unnoticed and take more time to debug.
Whereas in other places like write_csv, you transparently alter data so your different functions have divergent behaviors:
if(is_string($csv_data))
$csv_data = array($csv_data);
The problem is the lack of consistency across your code, at the very least, expectations and default behaviors should be documented so there is no surprise to the developer using it.
Performance
How would your code behave with very large files?
In get_csv you do this after collecting data to the results array.
if($header){
$this->header_row = array_shift($this->results);
foreach($this->results as $index => $result)
$this->results[$index] = array_combine($this->header_row, $this->results[$index]);
}
As per the docs:
array_shift() shifts the first value of the array off and returns it, shortening the array by one element and moving everything down. All numerical array keys will be modified to start counting from zero while literal keys won't be affected.
Again, I am thinking about possible performance issues with large datasets. In order to avoid displacing the whole array, I think it would make sense to add the header early, and don't shift the whole of it for nothing. It is a matter of code order.
Keep in mind that I have not run your code, my feedback is based solely on visual analysis, and could be off the mark sometimes. My PHP is very rusty too.
-
\$\begingroup\$ PHP doesn't throw an exception on fopen(). On the other hand, it will close the handle on function's exit. \$\endgroup\$Your Common Sense– Your Common Sense2025年10月28日 07:40:35 +00:00Commented 2 days ago
While it isn't required, it would be good to follow a style guide. One popular set of style guides is the PHP Standards Recommendations - specifically PSR-1 and PSR-12. They have numerous recommendations for keeping code readable and consistent - especially when collaborating with others and your future self.
One of the many recommendations in PSR-12 pertains to 4.3 Properties and Constants
There MUST NOT be more than one property declared per statement.
There are also numerous recommendations about spacing and new-lines for braces, operators, etc. See the analysis results for the code with PSR-1 and PSR-12
Addressing requests/concerns
I was hoping for some feedback on my style, and how I wrote the program, and if I may be doing anything "wrong".
It appears that many conditional statements are not wrapped in braces. This is not a requirement, however it can lead to issues in the future when one attempts to add conditional statements and braces are omitted. For example - perhaps the constructor needs to be updated:
public function __construct($filepath = null){ if($filepath) $this->set_filepath($filepath); }
One might feel the need to add another conditional statement - e.g. to call another function or method - e.g. calling a new method like cache_filepath().
public function __construct($filepath = null){
if($filepath)
$this->set_filepath($filepath);
$this->cache_filepath($filepath);
}
The problem here is that even though the newly added line is indented the same as the line above it, it is not within a conditional block, which may lead to issues if the condition is not met.
For example, I used to nest
ifstatements a lot, but have been trying to reduce the nesting to make the code easier to read.
In both write_csv() and get_csv() the first three lines are as follows:
if($filepath) if(!$this->set_filepath($filepath)) return false;
The conditionals could be combined using the && logical operator:
if ($filepath && !$this->set_filepath($filepath)) {
return false;
}
Other review points
Method naming style is inconsistent
There are many methods with kabob_case - e.g. set_filepath, set_header_row(), etc. however the last one is proper/Pascal case: ValidateFilepath. Perhaps the casing is different because that last method has private scope?
Use type declarations for properties, arguments and return types.
Hopefully given the supported versions of PHP the code is running on PHP 8 or higher. Per the PHP documentation:
Type declarations can be added to function arguments, return values, as of PHP 7.4.0, class properties, and as of PHP 8.3.0, class constants. They ensure that the value is of the specified type at call time, otherwise a TypeError is thrown. 1
Object oriented approach could reduce manual File I/O operations
There is a native class SplFileObject which offers methods like fgetcsv() and fputcsv(). Using those methods could lead to fewer calls to fopen() and fclose().
It's not a class
I would say the main problem here is CSV not really being a class. But rather a collection of functions just "namespaced" under the same class.
One can even take write_csv(), and, after removing rather useless set_filepath() call, use it as a standalone function.
If you want to make it a class, then make use of the properties you have, instead of passing filepath as every function's parameter.
On the other hand, there is a class property, $results, which is utterly useless as a property. Why would you have it? I suppose you just did it "because it's a class". Well not every variable in a class needs to be a property.
Silent behavior
Another big problem is that your code never informs you of any errors it encountered. You are saying:
It works well and without issue.
And it's a very common cognitive bias among CS interns. Yes, it works. As long as everything goes fine. But your program can be only considered "working well" if it behaves correctly when something goes wrong, namely helps you to pinpoint the problem and fix it ASAP.
So instead of silently returning false your code should raise a particular error.
Memory
Your get_csv returns an array and also it uses twice more memory than needed, at some moment keeping two copies of the same data.
Consider only a single pass and returning a generator instead of array.
-
1\$\begingroup\$ I am still wondering if anyone is reading all this feedback \$\endgroup\$Your Common Sense– Your Common Sense2025年10月28日 07:25:05 +00:00Commented 2 days ago
$this->ext, why is it an instance attribute? \$\endgroup\$