I implemented a CSV Parser with Separation of concerns and SOLID principles in mind.
Does that code match the principles?
Here a sample data (CSV):
Participant;Eagerness;Education;Stability;Total
"John Doe";6.4;"G";1.3;9.0
"John Smith";2.9;"X";1.5;;
Main file:
<?php
require __DIR__ . DIRECTORY_SEPARATOR . 'vendor' . DIRECTORY_SEPARATOR . 'autoload.php';
use XXX\Command\CSVParser;
use XXX\Command\ResultPrinter;
use XXX\Command\ScoreCalculator;
use XXX\Command\ScoreParser;
use XXX\Strategy\AverageStrategy;
use XXX\Strategy\HighestStrategy;
use XXX\Strategy\LowestStrategy;
use XXX\Strategy\TypeStrategy;
$parser = new ScoreParser(new CSVParser(), new ScoreCalculator([]), new ResultPrinter());
try {
if (!isset($argv[1]) || !isset($argv[2])) {
throw new Exception('Missing parameter!' . PHP_EOL);
}
} catch (Exception $e){
die($e->getMessage());
}
$what = ucfirst($argv[1]);
$whoHow = $argv[2];
$strategy = match ($whoHow) {
'lowest' => new LowestStrategy(),
'highest' => new HighestStrategy(),
'average' => new AverageStrategy(),
'type' => new TypeStrategy(),
default => null,
};
try {
$filePath = 'data' . DIRECTORY_SEPARATOR . 'data.csv';
$parser->processCommand($filePath, $what, $whoHow, $strategy);
} catch (Exception $e) {
die($e->getMessage());
}
CSVParser.php
<?php
namespace XXX\Command;
class CSVParser
{
public function parse(string $csvFile): array
{
$lines = file($csvFile);
$data = [];
foreach ($lines as $line) {
$data[] = str_getcsv(trim($line), ';');
}
return $data;
}
}
ResultPrinter.php
<?php
namespace XXX\Command;
use XXX\Strategy\CalculationStrategy;
class ResultPrinter
{
public function printResult(?string $participant, ?string $result, string $header, ?CalculationStrategy $strategy): void
{
if ($result === null) {
$message = "No score found for $header";
} else {
if ($result === '') {
$message = "$participant has no score for $header";
} else {
if ($participant === null) {
$message = "The " . strtolower(
str_replace('Strategy', '', basename(str_replace('\\', '/', get_class($strategy))))
) . " score for $header is $result";
} else {
$message = "$participant scored $result on $header";
}
}
}
echo $message . PHP_EOL;
}
}
ScoreCalculator.php
<?php
namespace XXX\Command;
use Exception;
class ScoreCalculator
{
private array $data;
public function __construct(array $data)
{
$this->data = $data;
}
/**
* @throws Exception
*/
public function getColumnValues(string $header): array
{
$index = $this->getColumnIndex($header);
$values = [];
foreach ($this->data as $row) {
if (isset($row[$index]) && $row[$index] !== '') {
$values[] = $row[$index];
}
}
// Remove header column if exists
$index = array_search($header, $values);
if ($index !== false) {
unset($values[$index]);
}
return $values;
}
/**
* @throws Exception
*/
private function getColumnIndex(string $header): int|string
{
$index = array_search($header, $this->data[0]);
if ($index === false) {
throw new Exception("Column not found: $header");
}
return $index;
}
/**
* @throws Exception
*/
public function getParticipantScore(string $participant, string $header)
{
$participantIndex = $this->getColumnIndex('Participant');
$headerIndex = $this->getColumnIndex($header);
foreach ($this->data as $row) {
if ($row[$participantIndex] === $participant) {
return $row[$headerIndex];
}
}
return null;
}
}
ScoreParser.php
<?php
namespace XXX\Command;
use Exception;
use XXX\Strategy\CalculationStrategy;
class ScoreParser
{
private CSVParser $parser;
private ScoreCalculator $calculator;
private ResultPrinter $printer;
public function __construct(CSVParser $parser, ScoreCalculator $calculator, ResultPrinter $printer)
{
$this->parser = $parser;
$this->calculator = $calculator;
$this->printer = $printer;
}
/**
* @throws Exception
*/
public function processCommand(string $filename, string $what, string $whoHow, ?CalculationStrategy $strategy): void
{
$participant = null;
$data = $this->parser->parse($filename);
$this->calculator = new ScoreCalculator($data);
$participantIndex = filter_var($whoHow, FILTER_VALIDATE_INT);
if ($participantIndex === false) {
$values = $this->calculator->getColumnValues($what);
try {
$result = $strategy->calculate($values);
} catch (\Throwable $t) {
die($t->getMessage() . PHP_EOL);
}
} else {
if (!isset($data[$participantIndex])) {
throw new Exception('Participant not exists!' . PHP_EOL);
}
$participant = $data[$participantIndex][0];
$result = $this->calculator->getParticipantScore($participant, $what);
}
$this->printer->printResult($participant, $result, $what, $strategy);
}
}
CalculationStrategy.php
<?php
namespace XXX\Strategy;
interface CalculationStrategy
{
public function calculate(array $values): float|string;
}
AverageStrategy.php
<?php
namespace XXX\Strategy;
class AverageStrategy implements CalculationStrategy
{
public function calculate(array $values): float
{
$count = count($values);
$sum = array_sum($values);
return $sum / $count;
}
}
The other strategies are similar to the Average (implementing the interface CalculationStrategy).
I also implemented the unit tests for each class, but it's not the focus now.
1 Answer 1
Well, issues are so numerous and diverse here that I don't even know where to start.
From positioning probably. The title says a "CSV Parser". But the actual parser takes just a couple lines. And the rest being some data analyzer and, for some reason, printer. I would rather call it a CSV Analyzer suite or something. But taking all these different things together you are making them very tightly coupled. SOLID, among other things, means that you can use each class for different purposes. But your parser is only usable with Printer.
CSV Parser
Speaking of CSV parsers, one has to keep in mind the following landmarks:
- adherence to standards
- resource efficiency
- functionality
Standards
The CSV standard actually allows a line break in the column value. As long it's quoted, any number of line breaks is allowed inside. And of course, your parser will break on them. That's why a dedicated function, fgetcsv()
should be used when reading from a file. Yes, in your case, for one particular file format, line breaks is not a problem probably. But then you shouldn't call it a CSV parser, but some specific format parser. However, your best bet here is to actually use fgetcsv()
for compatibility and performance reasons.
Besides, CSV format allows arbitrary delimiters as well as enclosure and escape characters. Your parser ignores them all, and therefore cannot be called a CSV parser. It cannot even parse actual comma-separated values. fgetcsv()
is configurable, accepting a list of parameters. Your parser should allow them as well. Then you can create a decorator, some SpecificFormatParser that can have some defaults hardcoded.
Performance
Right now your parser is twice memory inefficient. First, it reads the entire file in memory. Although this memory will be freed after parsing, still it burdens your RAM for no reason. And that's another reason why you should use fgetcsv()
instead of file()/str_getcsv()
. Second, your parser returns an array. And despite the fact that you always need only one line at a time, it reads and returns them all at once.
You should avoid such a waste whenever possible. The usual method for that is simply to do your calculations inside of while
loop that reads lines one by one. But this method is quite inconvenient, especially with OOP. But luckily, PHP offers you an instrument that can let you "pack" the while loop, carry it away and then use elsewhere, under the guise of foreach
. It is called a generator. So a basic implementation could be like this
function parse()
{
$file = new SplFileObject($this->filename);
while (!$file->eof()) {
yield $file->fgetcsv($this->separator, $this->enclosure, $this->escape);
}
}
(I am using SplFileObject
just for sake of it, it looks nicer than meddling with fopen()
)
Then you can add a function that returns a real array, in case it's needed.
Functionality
Currently your class adds nothing to the existing functionality of str_getcsv()
(and even limits it). Why not to add some functionality? Like, reading the header and returning an associative array instead of a list. Or adding a possibility to index the final array by some column's value. Etc.
ResultPrinter
Shouldn't be really a part of this suite. The purpose of this class is unclear and even why it's a class at all, while it's actually just a function. Besides, no function should actually print anything. Just make it return the value. Which then can be either printed or used any other way - added to a email for example.
The calling code.
I am genuinely puzzled, why don't you make it just
if (!isset($argv[1]) || !isset($argv[2])) { die('Missing parameter!' . PHP_EOL); }
I mean, what's the point in throwing here? In the end it's just die with a message. Why bother?
The filename shouldn't be hardcoded but provided through arguments
-
\$\begingroup\$ Thank you so much. I admit I've been too much naive. The most important lesson I learned here is about
yield
. \$\endgroup\$friday-json– friday-json2023年10月21日 17:13:17 +00:00Commented Oct 21, 2023 at 17:13
php main.php Eagerness 1
where the first parameter is the name of the column I want to calculate. the second parameter, if integer, represents the partecipant_id I want to take into account. Instead of an integer, as the second parameter, I can pass a method name, for example, average, and then get the average Eagerness of all the participants (other methods are available but for shortness, I included only the average). \$\endgroup\$