Sometimes I end up with a function that requires checking multiple conditions to determine the outcome and the logic can get to be too much to handle on a single line. Here's an example of how I currently handle this:
function validateType($type,$data) {
if ( (isset($this->validationRules[$type]['requiredThing']) && !in_array($data['thing'],$this->validationRules[$type]['requiredThing']))
|| (isset($this->validationRules[$type]['restrictedThing']) && in_array($data['thing'],$this->validationRules[$type]['restrictedThing']))
|| (isset($this->validationRules[$type]['requiredGroups']) && !array_intersect($data['groups'],$this->validationRules[$type]['requiredGroups']))
|| (isset($this->validationRules[$type]['restrictedGroups']) && array_intersect($data['groups'],$this->validationRules[$type]['restrictedGroups']))) {
return false;
} else {
return true;
}
}
Previously I was doing something a bit more readable (?) but changed my approach due to the fact that it seems repetitive and a bit redundant:
function validateType($type,$data) {
if (isset($this->validationRules[$type]['requiredThing']) && !in_array($data['thing'],$this->validationRules[$type]['requiredThing']) {
return false;
} else if (isset($this->validationRules[$type]['restrictedThing']) && in_array($data['thing'],$this->validationRules[$type]['restrictedThing']) {
return false;
} else if (isset($this->validationRules[$type]['requiredGroups']) && !array_intersect($data['groups'],$this->validationRules[$type]['requiredGroups']) {
return false;
} else if (isset($this->validationRules[$type]['restrictedGroups']) && array_intersect($data['groups'],$this->validationRules[$type]['restrictedGroups'])) {
return false;
} else {
return true;
}
}
I'm looking for advice on how this can be improved for the most readable and efficient way to format complex conditions like these. I've seen different approaches out there but so far nothing strikes me as particularly elegant and personally I'm still not completely happy with what I'm doing.
EDIT:
As requested here is a bit more context with a complete functioning example of the type of validation that's going on here:
$data = [
'person' => [
'groups' => ['dancers','clowns'],
'thing' => 'corn'
],
'place' => [
'groups' => ['tropical','volcano'],
'thing' => 'house'
]
];
$validation = new Validation();
$data = $validation->processData($data);
print_r($data);
print_r($validation->errors);
class Validation {
public $errors = [];
private $validationRules = [
'person' => [
'restrictedGroups' => ['strangers','clowns','bankers'],
],
'place' => [
'restrictedGroups' => ['desert','tundra'],
'requiredThings' => ['house','lawn','chair']
]
];
private function validateType($type,$data) {
if ( (isset($this->validationRules[$type]['requiredThings']) && !in_array($data['thing'],$this->validationRules[$type]['requiredThings']))
|| (isset($this->validationRules[$type]['restrictedThings']) && in_array($data['thing'],$this->validationRules[$type]['restrictedThings']))
|| (isset($this->validationRules[$type]['requiredGroups']) && !array_intersect($data['groups'],$this->validationRules[$type]['requiredGroups']))
|| (isset($this->validationRules[$type]['restrictedGroups']) && array_intersect($data['groups'],$this->validationRules[$type]['restrictedGroups']))) {
return false;
} else {
return true;
}
}
public function processData($datas) {
foreach ($datas as $type => $data) {
if (!$this->validateType($type,$data)) {
unset($datas[$type]);
$this->errors[] = $type . ' FAILED VALIDATION';
}
}
return $datas;
}
}
OUTPUT:
Array
(
[place] => Array
(
[groups] => Array
(
[0] => tropical
[1] => volcano
)
[thing] => house
)
)
Array
(
[0] => person FAILED VALIDATION
)
1 Answer 1
The quick answer
Given your two approaches, I would probably lean towards a slight variation on the original. All things considered, there isn't really a whole lot of difference between them. Personally, I prefer the concept of "check this, then check this, then check this" as opposed to an if with a bunch of or
conditions. To go into far too much detail (this is codereview after all), I would consider this a balancing act between three things:
- You don't want any line to be too long
- You don't want to use vertical-space unecessarily
- You want your code to be easy to understand
The first two issues tend to lead to bugs, and the latter is just a necessity when writing code. I like your original solution because I find it easier to understand than a long if
condition, but that is more of a personal preference, and I don't see either one really being "better".
I would make one small change though, and pull out $this->validationRules[$type]
into its own separate variable, for the sake of brevity:
function validateType($type,$data) {
$rules = $this->validationRules[$type];
if (isset($rules['requiredThing']) && !in_array($data['thing'],$rules['requiredThing']) {
return false;
}
// more here ...
}
I think this little change would make either solution a lot easier to read. However, you do have a small problem. Your code will cause an error if your data is missing a bit of data, for instance:
$data = [
'person' => [
'groups' => ['dancers','clowns']
]
];
Presumably you have code elsewhere that makes sure that $data
has also necessary bits of data. However, considering the connection between type
and the required keys is hidden away in your class configuration, I bet it will be easy to accidentally leave something out, resulting in an error. I think your best bet is to explicitly check the data array. You could approach that in a couple different ways. The safest is to simply throw an exception if something is missing. Or you could treat it as "not having a value" which means you would check out okay for the restricted
tests, but return a validation error for the required
tests. Doing this means that neither of your current solutions will work: you'll need more code and this method will turn larger and very repetitive. That's okay though, because personally I think a different approach will work better for you anyway.
Another approach
One option is to make each check its own function. You could do it with anonymous functions or with methods, but you could imagine it looking something like this:
private $validationRules = [
'person' => [
'restrictedGroups' => ['strangers','clowns','bankers'],
],
'place' => [
'restrictedGroups' => ['desert','tundra'],
'requiredThings' => ['house','lawn','chair']
]
];
protected $validationRuleCallables = [
'restrictedGroups' => function($type, $data, $groups){
if (!isset($data['groups'])){
throw new Exception('A restrictedGroup rule was set, but no groups were found!');
}
if (!is_array($groups)){
throw new Exception("Invalid configuration for validation rule $type: it should be an array of restricted groups");
}
return (bool) array_intersect($data['groups'], $groups);
},
// more here
];
public function validateType($type, $data){
if (!isset($this->validationRules[$type])){
throw new Exception("Cannot validate type '$type' because it has no rules");
}
foreach ($this->validationRules[$type] as $ruleName => $ruleConfig){
$ruleChecker = $this->validationRuleCallables[$ruleName] ?? false;
if (!$ruleChecker){
throw new Exception("Type '$type' specifies validation rule '$ruleName', but that validation rule does not exist");
}
return call_user_func($ruleChecker, $type, $data, $ruleConfig);
}
}
This has more code, but it has more code for two good reasons: better separation of concerns, but more importantly, better input checking. A big weakness in your current setup is that it assumes that the rules are properly configured, and the data is properly arranged. If these are something you use or modify on a semi-regular basis, getting the configuration wrong is going to happen fairly regularly too. For instance, with the current setup, if someone calls validateType
with a type not defined in $this->validationRules
, you'll end up with an index error from PHP, and figuring out that the cause of the problem is a forgotten validation rule, or an invalid type, is going to be very hard to do. Especially in cases like this where your validation rules and data aren't directly connected, it's really easy for the developer to misconfigure something, and even more important for the code to provide clear error messages. This will save a ton of development time later: so much so that I would simply consider it a necessity, even if it takes more lines of code.
The best approach
Personally I'm not a fan of using functions in cases like this. I started with that because it is less of a change from where you already are. It can work okay if you really only have four validation rules, but if you have to start adding more "types" of rules, it will get unmanageable very quickly. Instead, the solution is to use classes. Create a namespace for your validationTypes, get a nice interface for all the classes in that namespace, and then the $type
passed in to validateType
can basically just be the name of a class in the given namespace. Load up an object of that class (use a factory as desired, especially to cache objects, since they will be stateless), call its validate
method with the necessary data, and return true/false depending on what it returns.
This requires a bit more code for plumbing, but using classes and namespaces makes it very easy to manage your validation rules. If you need to add a new validation rule, you just drop in another class in that namespace. If you want to know how a validation rule works, you just find it in the namespace with a file named after its type. Super easy to manage. Again, I understand that if you literally just have four validation rules, this would probably be a bit of overkill (although it would still help with your organization). In my experience though, these things rarely stay simple.
I'm going to leave it at that for now, but if this is of interest to you I could throw up some example code of what such a thing would look like.
Last Note
Finally, I think it is worth pointing out that this class of yours is definitely doing too much. In particular, you have the configuration of your validation rules (private $validationRules
), and the actual validation private function validateType
happening in the same class. The problem with this is that it means this can only ever validate one type of thing. Another approach would be for this to be a general-purpose validator, that either accepts the list of validation rules in its constructor, or which is designed to be extended for particular uses-cases, and the $validationRules
are hard-coded in the extending child class. In both cases, this would make this function much more re-usable. Currently, you have this entire validation class setup up to only validate one "kind" of thing. That's a big waste. It's better to set this up so that it's explicit job is "I will validate things, but you have to tell me the rules". Again, I'm not going to delve into too much detail on that point, right now, but I'm happy to add more details if this is something of interest to you.
Edit for class-based solution example
There are obviously lots of ways to do this, but I'll outline a variation on a pattern that I have frequently used that I find works well. I'll simplify a few points for the sake of a coherent example. The quick overview is:
- You have a reusable
validator
class - You have reusable
validatorRule
classes which you put together to perform actual validation - The
validator
class is responsible for receiving configuration, generatingvalidatorRule
objects to match, and calling them to validate input - For simplicity, I'm going to go with the version where the
validator
configuration comes in through the constructor. I normally go with subclassing myself, which I might explain briefly at the end.
All that in mind, let's pick an example and a context to focus on. I'm going to keep it typical: we are validating user input from a form to create a car record. We've got a car
model, our validator
class, and our validatorRule
classes, which live in the validation\rules
namespace (which maps to the app/validation/rules
directory in our project root). Our car
model has a few pieces of data that come in from the form, and their validation rules:
- Make: required, must be one of ['Honda','Toyota','Tesla']
- Year: required, must be a number, must be between 1950 and 2020
- Miles: required, must be a number, must be greater than zero
- Color: required
- Vin: required, unique
We can see some similarities: lots of "required" checks, whitelisting values, must be a number, number must be in a range, etc. Each of these will become a validationRule
class. For simplicity, our car model will have a check
method that generates the validator
object, calls a check
method on it, and returns the results. Let's assume that we want to return an array of validation error messages. I'm going to ignore factories/DI and have the objects generate their necessary child objects directly, for the sake of simplicity. Here is an excerpt from your car
model:
class car extends modelBase{
public function check($user_input){
$validator = new validator([
'make' => [ 'required' => [], 'whiteList' => [ 'Honda', 'Toyota', 'Tesla' ] ],
'year' => [ 'required' => [], 'numeric' => [], 'minValue' => 1950, 'maxValue' => 2020 ],
'miles' => [ 'required' => [], 'numeric' => [], 'minValue' => 0 ],
'color' => [ 'required' => [] ],
'vin' => [ 'required' => [], 'unique' => [] ]
]);
return $validator->check($user_input);
}
}
To be clear, the calling sequence of the validator
class is to pass in a configuration array with validation rules (much like your $this->validationRules
property). The key in the array is the name of the field being checked, and the value is an array of validation rules with their configurations. The key in the validation rules array corresponds to a class name in the validation\rules
namespace, and the value is whatever configuration information is needed by the validationRule
class to do its job. This will vary from one validation rule to the next, so documentation (and clear errors), help a lot.
Naturally, we have 6 files in our validation\rules
namespace:
- app/validation/rules/required.php
- app/validation/rules/whiteList.php
- app/validation/rules/numeric.php
- app/validation/rules/minValue.php
- app/validation/rules/maxValue.php
Possibly also an abstract base class and an interface. I would keep these stateless, and each would have a method called check
which receives the user input, the rule configuration, and the name of the field they are checking. They return true
if there are no errors, or a string with an error message. Some don't need any configuration, and are nice and simple. For example:
<?php
namespace validation\rules;
class required extends base{
public function check($field, $userInput, $config){
if (empty($userInput[$field])){
return "Missing required field $field";
}
return true;
}
}
Or for another:
<?php
namespace validation\rules;
class minValue extends base{
public function check($field, $userInput, $config){
// don't check the value if we don't have a value (that is the job of `required`)
// however, don't be fooled by PHP's loose comparison: 0 should return an error
// if the minValue is 1.
$incoming = $userInput[$field] ?? '';
if (!$incoming && $incoming !== '0' && $incoming !== 0){
return true;
}
if ($incoming < $config){
return "$field should be at least $config";
}
return true;
}
}
Each individual rule is quite simple, and easy to follow. It is also easy to see what your validation rules are, because you can just look in your app/validation/rules
folder and see exactly what they are. You can open each one up, and (presuming you put in good doc strings), you can see exactly what configuration settings are available for the given configuration rule. Now you just need the validator
class to put it all together:
<?php
namespace validation;
class validator{
protected $rules;
public function __construct($rules){
// check the rules real quick
foreach ($rules as $fieldName => $fieldRules){
foreach ($fieldRules as $ruleName => $ruleConfig){
// better to do this with a factory than directly: keeping it easy for the example
if (!class_exists("validation\rules\$ruleName"))
throw new \Exception("Invalid validation rule: could not find rule named $ruleName");
}
}
$this->rules = $rules;
}
public function check($userInput){
$errors = [];
foreach ($this->rules as $fieldName => $fieldRules){
foreach ($fieldRules as $ruleName => $ruleConfig){
// again, a factory would be perfect here, especially one that caches objects
$class = "validation\rules\$ruleName";
$ruleValidator = new $class();
$result = $ruleValidator->check($fieldName, $userInput, $ruleConfig);
if ($result === true)
continue;
// one error per field
$errors[$fieldName] = $result;
break;
}
}
}
}
This is fairly simplified of course. A lot of times you want to see not just the user input, but also the current model data (for instance, a required
check can allow non-existent input for a field that is being updated when the model already has a value), possibly a database connection (needed for a unique
check), and other details (user-controllable error messages). However, a system like this gives you a lot of flexibility to re-use validation rules to build validators quickly and easily.
-
\$\begingroup\$ thanks Conor. that's a lot to digest. I should clarify a couple things here: (1) this class is extending another class as part of a large application and although I agree that it's doing too much, it's not going to be realistic to create the ideal structure you are suggesting. (2) the above code is an example and the existence of those indexes is indeed checked. I see how that would be impossible to know given my example - and I am also aware that "examples" are discouraged here on codereview - but for a variety of reasons, writing something the demonstrates the concept was my best option. \$\endgroup\$You Old Fool– You Old Fool2017年12月09日 18:07:41 +00:00Commented Dec 9, 2017 at 18:07
-
\$\begingroup\$ ... with that in mind, yes I am indeed interested in an example of the class and namespace structure you mentioned above! I'm sure there will be plenty of opportunities in the future to benefit from a different approach. thanks for the time and energy \$\endgroup\$You Old Fool– You Old Fool2017年12月09日 18:09:35 +00:00Commented Dec 9, 2017 at 18:09
-
\$\begingroup\$ @billynoah Indeed, it's always hard to extract a code-review out of large classes. I think you gave a perfect amount of context. I threw up a large update regarding how you might organize a simple class-based solution. I obviously wouldn't consider this a final-answer, but rather a decent starting point. \$\endgroup\$Conor Mancone– Conor Mancone2017年12月10日 00:48:49 +00:00Commented Dec 10, 2017 at 0:48
passedRestriction($data, $type, $restriction)
andpassedRequirement($data, $type, $requirement)
) and let it return true or false for your checks. If all those checks return true yourvalidateType
function should return true. Also please don't name your variables$data
and$type
, it can get confusing. I don't know how those changes would affect performance however. \$\endgroup\$