Extract each occurrence from an array of objects

Question 1

I have a huge list of 4k+ software products.

My goal is simple. I have a list of products. Each of those products has a software value. I would like to extract each software occurrence and covert it into an object in the format {text, id}.

The code below works as I need, but I would like to know if you have a better way to do it.

softwareProducts.reduce((acc = [], {software_id, software_name}) => {
 if (acc.find((row) => row.id === software_id) === undefined) {
 acc.push({
 text: software_name,
 id: software_id
 })
 }
 return acc;
}, []);

Question 2

acc.find() is doing a linear search and going all the way to the end of the array for any item it doesn't find. If the list is long, keeping a separate Set object of row ids may be a faster way to just determine if you've already added this one to the list or not.

Question 3

Where does the variable row come from? Is it supposed to come from the softwareProducts array?

Question 4

Please share a sample input.

Question 5

This is tagged both React and Node. Does the code run in the browser or the server? If it doesn't matter, I'd remove both tags.

Question 6

Your explanation of what the code does is a little bit unclear. The algorithm you've shown dedupes an array of objects based on ids, choosing the first occurrence of each id. It's not so much that you're choosing "each occurrence" as much as "first occurrence".

You can improve on the linear lookup (.find() loops over the whole acc every time), making the quadratic algorithm a linear one using a Set, which offers constant time lookups. This essentially removes the inner loop in exchange for a bit of extra space and a bit of allocation overhead, likely well worth it on 4k+ items (always benchmark to be sure):

const softwareProducts = [
 {software_id: 1, software_name: "foo"},
 {software_id: 2, software_name: "bar"},
 {software_id: 1, software_name: "ignore this"},
 {software_id: 3, software_name: "baz"},
 {software_id: 1, software_name: "ignore this"},
 {software_id: 3, software_name: "ignore this"},
];
const resultIds = new Set();
const result = [];
for (const {software_id: id, software_name: text} of softwareProducts) {
 if (!resultIds.has(id)) {
 resultIds.add(id);
 result.push({id, text});
 }
}
console.log(result);

.reduce feels a little overworked acting as a .map and .filter. It's usually clearer as an explicit loop as shown above.

Normally, JS uses camelCase, but I assume software_name comes from an API you don't have control over. In that case, it's acceptable to use, but use softwareName if the code is under your control.

acc = [] is unnecessary; just use acc. The default argument is a no-op.

The return value of softwareProducts.reduce needs to be assigned to something. I assume you're doing so in your actual code.

See also Create array of unique objects by property for other approaches.

Question 7

An array of objects is being created from an array of objects. The obvious performance improvement would be to not create the new array of objects but simply remove the duplicates from the original list. All I'm saying is that without knowing the input data format, nothing definitive can be said about performance tweaking.

Question 8

@radarbob Intuitive as that may seem, don't think that'll help. Removing one element from a list is O(n) unless it's the back element, but you can't make that guarantee here, so that would be quadratic. Feel free to post your own answer, but I'm pretty sure this will outperform it. The only microoptimization I see here is avoiding destructuring and new allocations of each object, but I doubt that's worth it so I didn't bother mentioning it.

ggorlen ggorlen 4,1572 gold badges19 silver badges28 bronze badges · Accepted Answer · 2024-06-14 23:18:43Z

Your explanation of what the code does is a little bit unclear. The algorithm you've shown dedupes an array of objects based on ids, choosing the first occurrence of each id. It's not so much that you're choosing "each occurrence" as much as "first occurrence".

You can improve on the linear lookup (.find() loops over the whole acc every time), making the quadratic algorithm a linear one using a Set, which offers constant time lookups. This essentially removes the inner loop in exchange for a bit of extra space and a bit of allocation overhead, likely well worth it on 4k+ items (always benchmark to be sure):

const softwareProducts = [
 {software_id: 1, software_name: "foo"},
 {software_id: 2, software_name: "bar"},
 {software_id: 1, software_name: "ignore this"},
 {software_id: 3, software_name: "baz"},
 {software_id: 1, software_name: "ignore this"},
 {software_id: 3, software_name: "ignore this"},
];
const resultIds = new Set();
const result = [];
for (const {software_id: id, software_name: text} of softwareProducts) {
 if (!resultIds.has(id)) {
 resultIds.add(id);
 result.push({id, text});
 }
}
console.log(result);

.reduce feels a little overworked acting as a .map and .filter. It's usually clearer as an explicit loop as shown above.

Normally, JS uses camelCase, but I assume software_name comes from an API you don't have control over. In that case, it's acceptable to use, but use softwareName if the code is under your control.

acc = [] is unnecessary; just use acc. The default argument is a no-op.

The return value of softwareProducts.reduce needs to be assigned to something. I assume you're doing so in your actual code.

See also Create array of unique objects by property for other approaches.

An array of objects is being created from an array of objects. The obvious performance improvement would be to not create the new array of objects but simply remove the duplicates from the original list. All I'm saying is that without knowing the input data format, nothing definitive can be said about performance tweaking.
@radarbob Intuitive as that may seem, don't think that'll help. Removing one element from a list is O(n) unless it's the back element, but you can't make that guarantee here, so that would be quadratic. Feel free to post your own answer, but I'm pretty sure this will outperform it. The only microoptimization I see here is avoiding destructuring and new allocations of each object, but I doubt that's worth it so I didn't bother mentioning it.

Stack Exchange Network

Extract each occurrence from an array of objects

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Extract each occurrence from an array of objects

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions