I have this code to remove duplicates (all occurrences, not just the repeats) from an array of associative arrays. Does PHP have built-in functions to do this, or is there a way to improve the code?
I looked at array_unique, array_search, array_map, array_reduce...
$articles = [
    [
        "id" => 0,
        "title" => "lorem",
        "reference" => "A"
    ],
    [
        "id" => 1,
        "title" => "ipsum",
        "reference" => "B"
    ],
    [
        "id" => 2,
        "title" => "dolor",
        "reference" => "C"
    ],
    [
        "id" => 3,
        "title" => "sit",
        "reference" => "A"
    ]
];
$references = array_column($articles, "reference");
$duplicates = array_values(array_unique(array_diff_assoc($references, array_unique($references))));
foreach ($duplicates as $duplicate) {
    foreach ($references as $index => $reference) {
        if ($duplicate === $reference) {
            unset($articles[$index]);
        }
    }
}
/**
 * $articles = [
 *     [
 *         "id" => 1,
 *         "title" => "ipsum",
 *         "reference" => "B"
 *     ],
 *     [
 *         "id" => 2,
 *         "title" => "dolor",
 *         "reference" => "C"
 *     ]
 * ]
 */
2 Answers
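This task can and should be done in a single loop, with no variables populated beforehand and no inefficient in_array() calls. In PHP, checking for an array key with isset() is a hash lookup, whereas searching values with in_array() scans the array, so key lookups are the more efficient choice here.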
Code:
$found = [];
foreach ($articles as $index => ['reference' => $ref]) {
    if (!isset($found[$ref])) {
        $found[$ref] = $index;
    } else {
        unset($articles[$index], $articles[$found[$ref]]);
    }
}
var_export($articles);
Output:
array (
  1 =>
  array (
    'id' => 1,
    'title' => 'ipsum',
    'reference' => 'B',
  ),
  2 =>
  array (
    'id' => 2,
    'title' => 'dolor',
    'reference' => 'C',
  ),
)
I am using array destructuring syntax in my foreach() for brevity and because I don't need the other column values.
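As a small illustration of that syntax on its own, here is a sketch with made-up rows (the $rows and $seen names are only for this example). Keyed destructuring in a foreach is available since PHP 7.1 and assigns only the named element of each row:

```php
<?php
// Hypothetical sample rows, not the asker's data.
$rows = [
    ['id' => 10, 'title' => 'foo', 'reference' => 'X'],
    ['id' => 11, 'title' => 'bar', 'reference' => 'Y'],
];

$seen = [];
// Only the 'reference' element of each row is assigned to $ref;
// 'id' and 'title' are never copied into variables.
foreach ($rows as $index => ['reference' => $ref]) {
    $seen[$index] = $ref;
}

var_export($seen);
```

The equivalent without destructuring would be `foreach ($rows as $index => $row)` followed by reading `$row['reference']` inside the loop.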
Finally, it doesn't matter if there are triplicates (or more instances of a reference value); the script handles them in the same fashion. unset() will not generate any notices, warnings, or errors if it is given a non-existent element as a parameter, which is why it is safe to unconditionally unset the first found reference, potentially multiple times.
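To check the triplicate case, the same loop can be wrapped in a function and run against data where one reference appears three times. This is only a sketch: removeRepeatedReferences() is a hypothetical name, and the sample rows are invented for the test.

```php
<?php
// Hypothetical wrapper around the single-loop approach.
function removeRepeatedReferences(array $articles): array
{
    $found = [];
    foreach ($articles as $index => ['reference' => $ref]) {
        if (!isset($found[$ref])) {
            // First time this reference is seen: remember where.
            $found[$ref] = $index;
        } else {
            // Remove the current row and the first row with this reference.
            // On a third or later hit the first row is already gone,
            // and unset() silently ignores the missing element.
            unset($articles[$index], $articles[$found[$ref]]);
        }
    }
    return $articles;
}

// Invented data: 'A' occurs three times, 'B' once.
$sample = [
    ['id' => 0, 'reference' => 'A'],
    ['id' => 1, 'reference' => 'B'],
    ['id' => 2, 'reference' => 'A'],
    ['id' => 3, 'reference' => 'A'],
];

$result = removeRepeatedReferences($sample);
var_export(array_keys($result));
```

Only the row with reference 'B' should survive, keeping its original key.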
"Is there a way to improve the code?"
Instead of having two foreach loops:
foreach ($duplicates as $duplicate) {
    foreach ($references as $index => $reference) {
        if ($duplicate === $reference) {
            unset($articles[$index]);
        }
    }
}
It can be simplified using in_array():
foreach ($references as $index => $reference) {
    if (in_array($reference, $duplicates, true)) {
        unset($articles[$index]);
    }
}
While it would still technically have the same complexity (in_array() is itself a loop over $duplicates), it would have one less indentation level and use a built-in function to check whether the reference is in the list of duplicate references.
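If the scan inside in_array() is a concern for larger inputs, one possible variation (a sketch, not something the question requires) is to flip the duplicates into array keys so that each check becomes an isset() lookup. The $isDuplicate name is hypothetical, and this assumes the references are strings or integers, because array_flip() only accepts those as values. Note that array keys treat "1" and 1 as the same key, unlike the strict in_array() check.

```php
<?php
$articles = [
    ['id' => 0, 'title' => 'lorem', 'reference' => 'A'],
    ['id' => 1, 'title' => 'ipsum', 'reference' => 'B'],
    ['id' => 2, 'title' => 'dolor', 'reference' => 'C'],
    ['id' => 3, 'title' => 'sit',   'reference' => 'A'],
];

$references = array_column($articles, 'reference');
// Same duplicate detection as in the question.
$duplicates = array_unique(array_diff_assoc($references, array_unique($references)));
// Hypothetical lookup table: each duplicate reference becomes a key.
$isDuplicate = array_flip($duplicates);

foreach ($references as $index => $reference) {
    // Key lookup instead of scanning $duplicates with in_array().
    if (isset($isDuplicate[$reference])) {
        unset($articles[$index]);
    }
}

var_export(array_keys($articles));
```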
Another solution would be to use array_flip() to map each reference to its last index, since array_flip() keeps the latest key when a value occurs more than once. Then loop through the articles: if the index of the current article does not match the last index for its reference (meaning it's a duplicate), remove both the article at the current index and the article at that last index.
$references = array_column($articles, "reference");
$lastIndexes = array_flip($references);
foreach ($articles as $index => $article) {
    if ($lastIndexes[$article['reference']] !== $index) {
        unset($articles[$index], $articles[$lastIndexes[$article['reference']]]);
    }
}
Or to make it more readable, the last index can be assigned to a variable:
foreach ($articles as $index => $article) {
    $lastIndex = $lastIndexes[$article['reference']];
    if ($lastIndex !== $index) {
        unset($articles[$index], $articles[$lastIndex]);
    }
}
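To see what array_flip() produces when a value repeats, and that the loop also copes with a triplicate, here is a quick check with invented data. It assumes $articles has sequential zero-based keys, because array_column() discards the original keys and the indexes in $lastIndexes would otherwise not line up.

```php
<?php
// Invented data: 'A' occurs three times, 'B' once.
$articles = [
    ['id' => 0, 'reference' => 'A'],
    ['id' => 1, 'reference' => 'B'],
    ['id' => 2, 'reference' => 'A'],
    ['id' => 3, 'reference' => 'A'],
];

// Later keys overwrite earlier ones, so each reference maps to its last index.
$lastIndexes = array_flip(array_column($articles, 'reference'));
var_export($lastIndexes);

// foreach iterates by value, so unsetting elements inside the loop
// does not disturb the iteration itself.
foreach ($articles as $index => $article) {
    $lastIndex = $lastIndexes[$article['reference']];
    if ($lastIndex !== $index) {
        unset($articles[$index], $articles[$lastIndex]);
    }
}

var_export(array_keys($articles));
```

Here 'A' maps to index 3 and 'B' to index 1, so every 'A' row is removed and only the 'B' row remains.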