I have an ArrayList of custom objects (DTOs). The structure of the DTO:
private String id;
private String text;
private String query;
private String locatorId;
private Collection<String> categories;
private Collection<String> triggers;
I have two tasks:
- Remove duplicates in the array (this seems OK; I should use a HashSet)
- Find objects in the ArrayList with the same id field and merge them into one object (merging the fields categories and triggers), then create a final List with the merged objects.
What is the most efficient approach for such a task? I'm also interested in using lambda expressions in my algorithm.
- How do you merge text, query and category? – m0skit0, Aug 3, 2015 at 9:18
- These fields won't be merged (they will be the same; the only difference is in the fields categories and triggers). – Evgeniy Kruglov, Aug 3, 2015 at 9:39
It's quite easy to merge objects by a specified key using the Stream API. First, define a merge method in your Entity class like this:
public Entity merge(Entity other) {
this.categories.addAll(other.categories);
this.triggers.addAll(other.triggers);
return this;
}
Then you can build a custom grouping collector:
import java.util.Collection;
import java.util.Optional;
import static java.util.stream.Collectors.*;
public static Collection<Entity> mergeAll(Collection<Entity> input) {
return input.stream()
.collect(groupingBy(Entity::getId,
collectingAndThen(reducing(Entity::merge), Optional::get)))
.values();
}
Here we group Entity elements by the result of the getId method, and the downstream collector just calls Entity.merge() when the same id is encountered (we additionally need to unwrap the Optional). No special hashCode() or equals() implementation is necessary for Entity in this solution.
Note that this solution modifies the existing unmerged Entity objects. If that's undesirable, create a new Entity in the merge() method and return it instead (as in @Marco13's answer).
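A self-contained sketch of the groupingBy/reducing approach above. The Entity class here is a hypothetical, trimmed-down version with only id, categories and triggers (text, query and locatorId are assumed identical for equal ids, as the OP stated). Copying the collections into LinkedHashSets in the constructor is also an assumption: merge() calls addAll, which throws UnsupportedOperationException on fixed-size lists such as those returned by Arrays.asList.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;

import static java.util.stream.Collectors.*;

// Hypothetical trimmed-down Entity: text, query and locatorId are left out
// because they are assumed identical for entities sharing an id.
class Entity {
    private final String id;
    private final Collection<String> categories;
    private final Collection<String> triggers;

    Entity(String id, Collection<String> categories, Collection<String> triggers) {
        this.id = id;
        // Copy into mutable sets so that merge() can safely call addAll.
        this.categories = new LinkedHashSet<>(categories);
        this.triggers = new LinkedHashSet<>(triggers);
    }

    String getId() { return id; }
    Collection<String> getCategories() { return categories; }
    Collection<String> getTriggers() { return triggers; }

    // Mutating merge, as in the answer: folds other's collections into this one.
    public Entity merge(Entity other) {
        this.categories.addAll(other.categories);
        this.triggers.addAll(other.triggers);
        return this;
    }
}

public class GroupingMergeDemo {
    public static Collection<Entity> mergeAll(Collection<Entity> input) {
        return input.stream()
                .collect(groupingBy(Entity::getId,
                        collectingAndThen(reducing(Entity::merge), Optional::get)))
                .values();
    }

    public static void main(String[] args) {
        List<Entity> input = Arrays.asList(
                new Entity("0", Arrays.asList("C0", "C1"), Arrays.asList("T0")),
                new Entity("0", Arrays.asList("C2"), Arrays.asList("T0", "T1")),
                new Entity("1", Arrays.asList("C0"), Arrays.asList("T2")));
        for (Entity e : mergeAll(input)) {
            System.out.println(e.getId() + " " + e.getCategories() + " " + e.getTriggers());
        }
    }
}
```

In a sequential stream, reducing applies merge in encounter order, so the first Entity seen for each id is the one that collects the others' categories and triggers.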
- or toMap(Entity::getId, e->e, Entity::merge) as an alternative to groupingBy-reducing – Misha, Aug 3, 2015 at 11:49
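A sketch of Misha's toMap alternative, assuming the same kind of hypothetical trimmed-down Entity with a mutating merge(Entity). The third argument of toMap resolves id collisions, so there is no Optional to unwrap; using the four-argument overload with LinkedHashMap::new is an extra choice made here to keep ids in first-encounter order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical entity reduced to the fields that take part in the merge.
class Entity {
    private final String id;
    private final Set<String> categories;
    private final Set<String> triggers;

    Entity(String id, Collection<String> categories, Collection<String> triggers) {
        this.id = id;
        this.categories = new LinkedHashSet<>(categories);
        this.triggers = new LinkedHashSet<>(triggers);
    }

    String getId() { return id; }
    Set<String> getCategories() { return categories; }
    Set<String> getTriggers() { return triggers; }

    // Mutating merge: folds other's collections into this entity.
    Entity merge(Entity other) {
        categories.addAll(other.categories);
        triggers.addAll(other.triggers);
        return this;
    }
}

public class ToMapMergeDemo {
    static List<Entity> mergeAll(List<Entity> input) {
        Map<String, Entity> byId = input.stream()
                .collect(Collectors.toMap(
                        Entity::getId,        // key: the id
                        Function.identity(),  // value: the entity itself (Misha's e->e)
                        Entity::merge,        // called only when an id is seen again
                        LinkedHashMap::new)); // keeps ids in first-encounter order
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Entity> input = Arrays.asList(
                new Entity("1", Arrays.asList("C0"), Arrays.asList("T0")),
                new Entity("0", Arrays.asList("C1"), Arrays.asList("T1")),
                new Entity("1", Arrays.asList("C2"), Arrays.asList("T0")));
        for (Entity e : mergeAll(input)) {
            System.out.println(e.getId() + " " + e.getCategories() + " " + e.getTriggers());
        }
    }
}
```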
- @Misha, thanks. Probably I have too much of a groupingBy way of thinking and always forget about the toMap alternative. – Tagir Valeev, Aug 3, 2015 at 11:55
- Note also that unless categories and triggers are Sets, the merge method may generate duplicate entries in said Collections, which may not be what the OP means by "merging". – Mick Mnemonic, Aug 3, 2015 at 11:57
- @MickMnemonic, the OP already agreed in the question that using HashSet is fine for him. – Tagir Valeev, Aug 3, 2015 at 11:58
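A small illustration of Mick Mnemonic's point (not taken from either answer): addAll on a List keeps every element, while addAll on a Set skips elements that are already present. This is why declaring categories and triggers as Sets, or copying them into one, makes the merge de-duplicating.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class AddAllDuplicatesDemo {
    // Merging into a List: elements present in both inputs end up twice.
    static List<String> mergeAsList(Collection<String> a, Collection<String> b) {
        List<String> result = new ArrayList<>(a);
        result.addAll(b);
        return result;
    }

    // Merging into a LinkedHashSet: shared elements are kept once, in first-seen order.
    static Set<String> mergeAsSet(Collection<String> a, Collection<String> b) {
        Set<String> result = new LinkedHashSet<>(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        List<String> existing = Arrays.asList("C0", "C1");
        List<String> incoming = Arrays.asList("C1", "C2");
        System.out.println(mergeAsList(existing, incoming)); // [C0, C1, C1, C2]
        System.out.println(mergeAsSet(existing, incoming));  // [C0, C1, C2]
    }
}
```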
- groupingBy wouldn't be so bad if unwrapping the Optional weren't so awkward. I wish we could do reducing(Entity::merge).andThen(Optional::get) via a default method on Collector, so it would read more naturally left to right instead of the collectingAndThen static. – Misha, Aug 3, 2015 at 12:00
Create a Map<String, DTO> and put your id as the key and the object as the value. Before putting into the map, check whether it already contains that key; if it does, take out the DTO object for that key and merge categories and triggers into the old object.
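A sketch of this check-then-merge idea using Map.merge (Java 8), which performs the "is the key already there?" test and the merge in one call and lets the merge step be written as a lambda. The DTO class is a hypothetical reduction to id, categories and triggers, and the map is keyed by the String id because that is the type of id in the question. Like the approach described above, it mutates the first DTO seen for each id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical DTO reduced to the fields involved in merging.
class DTO {
    final String id;
    final Set<String> categories;
    final Set<String> triggers;

    DTO(String id, Collection<String> categories, Collection<String> triggers) {
        this.id = id;
        this.categories = new LinkedHashSet<>(categories);
        this.triggers = new LinkedHashSet<>(triggers);
    }
}

public class MapMergeDemo {
    static List<DTO> mergeById(List<DTO> input) {
        Map<String, DTO> byId = new LinkedHashMap<>(); // keeps first-encounter order
        for (DTO dto : input) {
            // If the id is absent, dto is stored as is. Otherwise the lambda receives
            // the stored DTO and the new one, and its return value is stored.
            byId.merge(dto.id, dto, (existing, incoming) -> {
                existing.categories.addAll(incoming.categories);
                existing.triggers.addAll(incoming.triggers);
                return existing;
            });
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<DTO> input = Arrays.asList(
                new DTO("0", Arrays.asList("C0"), Arrays.asList("T0")),
                new DTO("0", Arrays.asList("C1"), Arrays.asList("T0", "T1")),
                new DTO("1", Arrays.asList("C0"), Arrays.asList("T2")));
        for (DTO d : mergeById(input)) {
            System.out.println(d.id + " " + d.categories + " " + d.triggers);
        }
    }
}
```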
One possible solution, as suggested in the answer by Naman Gala, is to use a Map from the IDs to the entities, and manually merge the entities when they have the same ID.
This is implemented here in the mergeById method, with some dummy/example input where
- two entities have to be merged (due to the same ID)
- two entities are equal (they will also be "merged", yielding the same result as one of the inputs).
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
public class MergeById
{
public static void main(String[] args)
{
List<Entity> entities = new ArrayList<Entity>();
entities.add(new Entity("0", "A", "X", "-1",
Arrays.asList("C0", "C1"), Arrays.asList("T0", "T1")));
entities.add(new Entity("0", "A", "X", "-1",
Arrays.asList("C2", "C3"), Arrays.asList("T2")));
entities.add(new Entity("1", "B", "Y", "-2",
Arrays.asList("C0"), Arrays.asList("T0", "T1")));
entities.add(new Entity("1", "B", "Y", "-2",
Arrays.asList("C0"), Arrays.asList("T0", "T1")));
entities.add(new Entity("2", "C", "Z", "-3",
Arrays.asList("C0", "C1"), Arrays.asList("T1")));
System.out.println("Before merge:");
for (Entity entity : entities)
{
System.out.println(entity);
}
List<Entity> merged = mergeById(entities);
System.out.println("After merge:");
for (Entity entity : merged)
{
System.out.println(entity);
}
}
private static List<Entity> mergeById(Iterable<? extends Entity> entities)
{
Map<String, Entity> merged = new HashMap<String, Entity>();
for (Entity entity : entities)
{
String id = entity.getId();
Entity present = merged.get(id);
if (present == null)
{
merged.put(id, entity);
}
else
{
merged.put(id, Entity.merge(present, entity));
}
}
return new ArrayList<Entity>(merged.values());
}
}
class Entity
{
private String id;
private String text;
private String query;
private String locatorId;
private Collection<String> categories;
private Collection<String> triggers;
Entity()
{
categories = new LinkedHashSet<String>();
triggers = new LinkedHashSet<String>();
}
Entity(String id, String text, String query, String locatorId,
Collection<String> categories, Collection<String> triggers)
{
this.id = id;
this.text = text;
this.query = query;
this.locatorId = locatorId;
this.categories = categories;
this.triggers = triggers;
}
String getId()
{
return id;
}
static Entity merge(Entity e0, Entity e1)
{
if (!Objects.equals(e0.id, e1.id))
{
throw new IllegalArgumentException("Different id");
}
if (!Objects.equals(e0.text, e1.text))
{
throw new IllegalArgumentException("Different text");
}
if (!Objects.equals(e0.query, e1.query))
{
throw new IllegalArgumentException("Different query");
}
if (!Objects.equals(e0.locatorId, e1.locatorId))
{
throw new IllegalArgumentException("Different locatorId");
}
Entity e = new Entity(e0.id, e0.text, e0.query, e0.locatorId,
new LinkedHashSet<String>(), new LinkedHashSet<String>());
e.categories.addAll(e0.categories);
e.categories.addAll(e1.categories);
e.triggers.addAll(e0.triggers);
e.triggers.addAll(e1.triggers);
return e;
}
@Override
public String toString()
{
return "Entity [id=" + id + ", text=" + text + ", query=" + query +
", locatorId=" + locatorId + ", categories=" + categories +
", triggers=" + triggers + "]";
}
}
The output is
Before merge:
Entity [id=0, text=A, query=X, locatorId=-1, categories=[C0, C1], triggers=[T0, T1]]
Entity [id=0, text=A, query=X, locatorId=-1, categories=[C2, C3], triggers=[T2]]
Entity [id=1, text=B, query=Y, locatorId=-2, categories=[C0], triggers=[T0, T1]]
Entity [id=1, text=B, query=Y, locatorId=-2, categories=[C0], triggers=[T0, T1]]
Entity [id=2, text=C, query=Z, locatorId=-3, categories=[C0, C1], triggers=[T1]]
After merge:
Entity [id=0, text=A, query=X, locatorId=-1, categories=[C0, C1, C2, C3], triggers=[T0, T1, T2]]
Entity [id=1, text=B, query=Y, locatorId=-2, categories=[C0], triggers=[T0, T1]]
Entity [id=2, text=C, query=Z, locatorId=-3, categories=[C0, C1], triggers=[T1]]
Regarding the request to do this with lambdas: it's probably possible to write some tricky entities.stream().collect(...) call. But since this was not the main goal of the question, I'll leave this part of the answer to someone else (but won't omit this small hint: just because you can does not mean that you have to. Sometimes, a loop is just fine).
Also note that this could easily be generalized, perhaps borrowing some vocabulary from databases. But I think this answers the main point of the question.
- Thanks @Marco13, I'll take this approach as the more efficient one for me (and later add some lambda stuff myself). – Evgeniy Kruglov, Aug 3, 2015 at 11:16
If you insist on using a lambda expression, you can do the following (where list is your original ArrayList<X>):
Set<X> set = new TreeSet<>((o1, o2) -> o1.getId().compareTo(o2.getId()));
set.addAll(list);
List<X> unique = new ArrayList<>(set);
This will create a set with unique objects according to their ids (the comparator treats two objects with the same id as equal). Next, for each object in unique, find the corresponding ones in the original list and merge the internal collections.
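A sketch of both steps of this answer, assuming a hypothetical X with a String id and Set-typed categories and triggers. Two properties are worth knowing: the TreeSet keeps the first object added for each id (add does not replace an element that compares equal), and the result comes out sorted by id. The second step rescans the whole list once per distinct id, so for n objects and k distinct ids it does about n·k comparisons; the Map-based answers need only one pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical DTO with only the fields needed for this sketch.
class X {
    final String id;
    final Set<String> categories;
    final Set<String> triggers;

    X(String id, Collection<String> categories, Collection<String> triggers) {
        this.id = id;
        this.categories = new LinkedHashSet<>(categories);
        this.triggers = new LinkedHashSet<>(triggers);
    }

    String getId() { return id; }
}

public class TreeSetMergeDemo {
    static List<X> mergeById(List<X> list) {
        // Step 1: one representative per id; objects with equal ids compare as 0,
        // so the TreeSet rejects every later object with an id it already holds.
        Set<X> set = new TreeSet<>((o1, o2) -> o1.getId().compareTo(o2.getId()));
        set.addAll(list);
        List<X> unique = new ArrayList<>(set);

        // Step 2: fold every other object with the same id into its representative.
        for (X u : unique) {
            list.stream()
                    .filter(o -> o != u && o.getId().equals(u.getId()))
                    .forEach(o -> {
                        u.categories.addAll(o.categories);
                        u.triggers.addAll(o.triggers);
                    });
        }
        return unique;
    }

    public static void main(String[] args) {
        List<X> input = Arrays.asList(
                new X("b", Arrays.asList("C0"), Arrays.asList("T0")),
                new X("a", Arrays.asList("C1"), Arrays.asList("T1")),
                new X("b", Arrays.asList("C2"), Arrays.asList("T0", "T2")));
        for (X x : mergeById(input)) {
            System.out.println(x.getId() + " " + x.categories + " " + x.triggers);
        }
    }
}
```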
Implement equals and hashCode based on the id field in the DTO and store the DTOs in a Set. This should fix both of your problems; given the way equality of your DTOs is now defined, no duplicates with the same id can exist in the Set.
EDIT:
As your requirement is to merge the categories and triggers of an existing DTO based on values from a new one, a better-suited data structure for storing DTOs would be Map<DTO, DTO> (because it's cumbersome to retrieve elements back from a Set once you've put them in). Also, I think the categories and triggers in your DTO should be defined as Sets, disallowing duplicates; this will make the merge operation much simpler:
private Set<String> categories;
private Set<String> triggers;
Assuming the DTO provides accessors (getCategories / getTriggers) for the above fields (and that the fields are never null), merging can now be implemented in the following way:
public static void mergeOrPut(Map<DTO,DTO> dtos, DTO dto) {
if (dtos.containsKey(dto)) {
DTO existing = dtos.get(dto);
existing.getCategories().addAll(dto.getCategories());
existing.getTriggers().addAll(dto.getTriggers());
} else {
dtos.put(dto, dto);
}
}
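A runnable sketch of mergeOrPut, assuming a hypothetical DTO whose equals and hashCode are based on id alone, as the answer proposes, and whose categories and triggers are Sets. The LinkedHashMap is used only to keep first-encounter order, and the forEach lambda is one possible way to feed a whole list through mergeOrPut.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical DTO: equality and hashing use only the id.
class DTO {
    private final String id;
    private final Set<String> categories;
    private final Set<String> triggers;

    DTO(String id, Collection<String> categories, Collection<String> triggers) {
        this.id = id;
        this.categories = new LinkedHashSet<>(categories);
        this.triggers = new LinkedHashSet<>(triggers);
    }

    String getId() { return id; }
    Set<String> getCategories() { return categories; }
    Set<String> getTriggers() { return triggers; }

    @Override
    public boolean equals(Object o) {
        return this == o || (o instanceof DTO && Objects.equals(id, ((DTO) o).id));
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}

public class MergeOrPutDemo {
    // mergeOrPut as written in the answer.
    public static void mergeOrPut(Map<DTO, DTO> dtos, DTO dto) {
        if (dtos.containsKey(dto)) {
            DTO existing = dtos.get(dto);
            existing.getCategories().addAll(dto.getCategories());
            existing.getTriggers().addAll(dto.getTriggers());
        } else {
            dtos.put(dto, dto);
        }
    }

    static List<DTO> mergeAll(List<DTO> input) {
        Map<DTO, DTO> dtos = new LinkedHashMap<>();
        input.forEach(dto -> mergeOrPut(dtos, dto));
        return new ArrayList<>(dtos.values());
    }

    public static void main(String[] args) {
        List<DTO> input = Arrays.asList(
                new DTO("1", Arrays.asList("C0"), Arrays.asList("T0")),
                new DTO("1", Arrays.asList("C0", "C1"), Arrays.asList("T1")),
                new DTO("2", Arrays.asList("C2"), Arrays.asList("T2")));
        for (DTO d : mergeAll(input)) {
            System.out.println(d.getId() + " " + d.getCategories() + " " + d.getTriggers());
        }
    }
}
```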
The above code can also be easily modified to work with a Map<String, DTO> keyed by id, in which case you don't need to override equals and hashCode in the DTO class.
- This won't merge the entities with the same IDs. It will just omit one of those whose ID would otherwise be duplicated. Apart from that, I doubt that a sensible implementation of hashCode and equals is possible when it is solely based on the ID. – Marco13, Aug 3, 2015 at 9:32
- Agree with @Marco13, I need not only to remove duplicates, but also to merge objects with the same id. – Evgeniy Kruglov, Aug 3, 2015 at 9:43
- @Marco13, an equals implementation based on an id alone makes a lot of sense, e.g. when the DTO models an entity that corresponds to a row in a database table. – Mick Mnemonic, Aug 3, 2015 at 9:50
- @Evgeniy, if you need to merge the entities, then you need to tell us how it should happen. – Mick Mnemonic, Aug 3, 2015 at 9:51
- @MickMnemonic One could argue about that, but intuitively, when two objects with the same "ID" have different properties, then this "ID" is not an "ID" but just an arbitrary value. (But I'm aware that this intuition may be wrong when referring to the roles of IDs in databases and this "primary key" stuff etc., which I'm not familiar with...) – Marco13, Aug 3, 2015 at 9:57