What's an efficient data structure to make a lot of queries between parents/childs?

Question 1

I have the following scenario for which I've been experiencing performance problems:

The context is a level editor of a new engine for an old video game (whose source is unavailable) in Unity. Basically I'm writing a level editor for Unity to shape old data structures into contemporary, more usable ones, i.e. parts/entities instead of a bunch of unrelated models/polygons.

Here's a tree of how things are organized in the old game:

Container (dozens)
- Models (hundreds)
  - Polygons (thousands)

Here's what I'd like to achieve:

Group(s): for grouping related objects
- Object(s): a tree, a ship etc ...
  - Part(s): an object's parts with specific material etc ...

Depending what the final object looks like on the screen, the initial data (models/polygons) is a mess, where there are merged or unrelated parts in a model and its polygons.

Other problems emerge due to this, for instance, what used to be a simple pointer update to update a texture or color is simply not possible from within Unity.

Here things I need and what I've done:

I need to be able to track implicit and explicit references to models and polygons. This is to ensure that valid objects are built and therefore prevent incorrect content.

Basically the user can create a part from either the model or or one of its polygons, the opposite being disallowed depending what's chosen first.

Here's a picture with explanations:

Scene9 is of type Container
M123_456_ABC... are of type Model
P123_456_ABC... are of type Polygon
A green icon is an explicit reference
A yellow icon is an implicit reference
A red icon is a non-referenced item

enter image description here

Also, the user interface is augmented by elements getting disabled whenever appropriate to help user picking the right actions when building the level, etc ...

Here, the issues I am experiencing:

Querying for explicit/implicit references between around 300 models and 7000 polygons becomes slow over time as relationships gets created.

I've been using a couple of List<T> and do LINQ queries between them:

everything
all models
all polygons
implicit models
implicit polygons
explicit models
explicit polygons

Minus the spaghetti logic, it does works but it's slow. Then I've upgraded to HashSet<T> with custom Comparer<T> which greatly improved speed but it's still not fast enough. All this results in freezes in the user interface.

Main consideration is that within Unity, loops are rather tight, calls at high rate, immediate mode UI and no asynchronous calls features (Mono is roughly .NET 3.5).

I am starting to consider that my approach using lists is flawed and was thinking that a better data structure could be used for performant queries.

Can you suggest a more appropriate data structure to use for such scenario ?

EDIT

Here is a diagram showing the new approach I'm trying after your suggestions:

enter image description here

Notes:

IScenePrimitive consists of 3 int and an Enum
ISceneReference.IsRef is simply IsRefExplicit || IsRefImplicit
Missing things here are a dictionary in Scene for quick access to a model/polygon, when user selects something I retrieve these using hashes without having to enumerate, etc ...

Interesting parts in code:

SceneModel:

 public override bool IsRefExplicit
 {
 get { return _isRefExplicit; }
 set
 {
 _isRefExplicit = value;
 if (value)
 {
 Debug.Assert(_polygons.None(s => s.IsRefExplicit));
 foreach (var polygon in _polygons)
 polygon.IsRefImplicit = true;
 }
 else
 {
 Debug.Assert(_polygons.All(s => s.IsRefImplicit));
 foreach (var polygon in _polygons)
 polygon.IsRefImplicit = false;
 }
 }
 }
 public override bool IsRefImplicit
 {
 get { return _isRefImplicit; }
 set
 {
 _isRefImplicit = value;
 if (value)
 Debug.Assert(_polygons.Any(s => s.IsRefExplicit));
 }
 }

ScenePolygon:

 public override bool IsRefExplicit
 {
 get { return _isRefExplicit; }
 set
 {
 _isRefExplicit = value;
 if (value)
 {
 Debug.Assert(!Model.IsRefExplicit);
 Model.IsRefImplicit = true;
 }
 else
 {
 Debug.Assert(Model.IsRefImplicit);
 Model.IsRefImplicit = false;
 }
 }
 }
 public override bool IsRefImplicit
 {
 get { return _isRefImplicit; }
 set
 {
 _isRefImplicit = value;
 Debug.Assert(Model.IsRefExplicit);
 }
 }

That's all that really happens, LINQ on a dozen items instead of many more (300 models querying each 7000 polygons X). I still have to test it to see how well it performs but it should perform much faster, to be continued.

Question 2

"Mono is roughly .NET 3.5" -> Theraot.Core will give you all async goodness even in .NET 2.0 #ShamelessPromotion - I'm having trouble wrapping my head around the queries you are doing, yet I think you need some Memoization scheme.

Question 3

Thanks for the links, taking a look at them ! Well, I too but considering there are about 40 levels to build using mouse, each with ~10K polys, whatever 'helper' that can alleviate this error-prone process is welcome :)

Question 4

Try a B-tree. You can use a binary search to get nodes, which means you'll have O(log n) efficiency.

Question 5

Having worked on many graphical models with higher complexity, I am having difficulty understanding why you need queries at all, and especially why you would use LINQ. The most compact, fastest form of a "reference" is a pointer (reference in C#), but that can be replaced with an integer indexing an object array if you need to be able to swap an object without affecting references.

Question 6

Additionally, all child objects should have a reference (integer or pointer) to its parent.

Question 7

Let's take a look at SceneModel's IsRefExplicit. I've added comments:

SceneModel

public override bool IsRefExplicit
{
 get { return _isRefExplicit; }
 set
 {
 _isRefExplicit = value;
 if (value)
 {
 // If the model has an explicit reference
 // then all its polygon have an implicit reference
 Debug.Assert(_polygons.None(s => s.IsRefExplicit));
 foreach (var polygon in _polygons)
 polygon.IsRefImplicit = true;
 }
 else
 {
 // If the model doesn't have an explicit reference
 // then all its polygon don't have an implicit reference
 Debug.Assert(_polygons.All(s => s.IsRefImplicit));
 foreach (var polygon in _polygons)
 polygon.IsRefImplicit = false;
 }
 }
}

This can be done by simply changing the implementation of ScenePolygon to query the Model:

SceneModel

public override bool IsRefExplicit
{
 get { return _isRefExplicit; }
 set { _isRefExplicit = value; }
}

ScenePolygon

public override bool IsRefImplicit
{
 get { return Model.IsRefExplicit; }
 set { throw new NotSupportedException(); }
}

No more loop.

Of course the above is under tha assumption that IsRefImplicit on ScenePolygon will only be set by SceneModel. If this assumption is not true, then perhaps you shouldn't do polygon.IsRefImplicit = false; on SceneModel because the IsRefImplicit may have been set to true by another code.

Now, let's take a look at ScenePolygon's IsRefExplicit:

ScenePolygon

public override bool IsRefExplicit
{
 get { return _isRefExplicit; }
 set
 {
 _isRefExplicit = value;
 if (value)
 {
 // If the polygon has an explicit reference
 // then the model has an implicit reference
 Debug.Assert(!Model.IsRefExplicit);
 Model.IsRefImplicit = true;
 }
 else
 {
 // If the polygon doens't have an explicit reference
 // then the model doens't have an implicit reference?
 // ...
 // Could there be another polygon of the model with an explicit reference?
 // If there is, then the model should remain with an implicit reference
 Debug.Assert(Model.IsRefImplicit);
 Model.IsRefImplicit = false;
 }
 }
}

In this case what you need is reference counting. Keep in the SceneModel how many ScenePolygon does it have with IsRefExplicit and let ScenePolygon increment it or decrement it as needed.

public override bool IsRefExplicit
{
 get { return _isRefExplicit; }
 set
 {
 if (_isRefExplicit == value)
 {
 return;
 }
 _isRefExplicit = value;
 if (value)
 {
 Model.IncrementImplicitRefCount();
 }
 else
 {
 Model.DecrementImplicitRefCount();
 }
 }
}

Then the model can implement IsRefImplicit by checking if the current reference count is greater than 0.

For abstract, we could write the rules like this:

Model.IsRefImplicit = contains at least one polygon with IsRefExplicit.
 Have a counter, and check if it is greater than 0.
 There is no setter.
Model.IsRefExplicit = all the polygons have IsRefImplicit.
 Store a bool backing field.
Polygon.IsRefImplicit = belongs to an model with IsRefExplicit.
 Read Model.IsRefExplicit
 There is no setter.
Polygon.IsRefExplicit = should mark the model IsRefImplicit.
 Store a bool backing field.
 The setter will increment the counter of the Model when set to true,
 and decrement it when set to false,
 do nothing if the value didn't change.

Then the concern is to annotate IsRefExplicit and let IsRefImplicit be populated by the code you have. So, you would be reading the source data and looking in some dictionary for the object it references, and annotating it. If the object is not in the dictionary you create it and add it.

If you change your code to use Interlocked (use Increment and Decrement for the reference count, and use CompareExchange and Exchange on an int set to 0 or 1 instead of the bool backing fields) then the resulting code will be thread-safe, and you will be able to have multiple threads writing IsRefExplicit.

If you change your lists/dictionaries to thread-safe solutions such as ConcurrentDictionary, then the structure can also be populated by multiple threads.

Question 8

Looks great. To make it clear, all primitives initial state is not referenced, I will only mark a primitive as explicit from the outside, I will never set implicit but just read it. First comes first served for explicit, between a model and one of its polygon (i.e. if model is used, polygons cannot; if polygon(s) is/are, model can't be; it's really a matter of preventing duplicates in the output). The counter thing is great and I'll use a dictionary so I can implement a select first non-referenced feature, etc... Going to improve from your answer and come back, thanks !

Question 9

@Aybe consider having a set (set, bag, list or dictionary) with all the elements that are created and non-referenced - then if you give the elements that set, they can remove themselves when they are marked as referenced, and finally you can iterate over all the non-referenced items using that set. Edit: it means that you won't need to iterate over the set of all elements and check if they are not referenced. Similar solution can be done for referenced elements.

Question 10

Yes, I'm on it, implementing counters, removing LINQ, adding global dictionaries ... I'll be back, thanks !

Theraot Theraot 9,2612 gold badges30 silver badges37 bronze badges · Accepted Answer · 2016-10-14 23:50:40Z

Let's take a look at SceneModel's IsRefExplicit. I've added comments:

SceneModel

public override bool IsRefExplicit
{
 get { return _isRefExplicit; }
 set
 {
 _isRefExplicit = value;
 if (value)
 {
 // If the model has an explicit reference
 // then all its polygon have an implicit reference
 Debug.Assert(_polygons.None(s => s.IsRefExplicit));
 foreach (var polygon in _polygons)
 polygon.IsRefImplicit = true;
 }
 else
 {
 // If the model doesn't have an explicit reference
 // then all its polygon don't have an implicit reference
 Debug.Assert(_polygons.All(s => s.IsRefImplicit));
 foreach (var polygon in _polygons)
 polygon.IsRefImplicit = false;
 }
 }
}

This can be done by simply changing the implementation of ScenePolygon to query the Model:

SceneModel

public override bool IsRefExplicit
{
 get { return _isRefExplicit; }
 set { _isRefExplicit = value; }
}

ScenePolygon

public override bool IsRefImplicit
{
 get { return Model.IsRefExplicit; }
 set { throw new NotSupportedException(); }
}

No more loop.

Of course the above is under tha assumption that IsRefImplicit on ScenePolygon will only be set by SceneModel. If this assumption is not true, then perhaps you shouldn't do polygon.IsRefImplicit = false; on SceneModel because the IsRefImplicit may have been set to true by another code.

Now, let's take a look at ScenePolygon's IsRefExplicit:

ScenePolygon

public override bool IsRefExplicit
{
 get { return _isRefExplicit; }
 set
 {
 _isRefExplicit = value;
 if (value)
 {
 // If the polygon has an explicit reference
 // then the model has an implicit reference
 Debug.Assert(!Model.IsRefExplicit);
 Model.IsRefImplicit = true;
 }
 else
 {
 // If the polygon doens't have an explicit reference
 // then the model doens't have an implicit reference?
 // ...
 // Could there be another polygon of the model with an explicit reference?
 // If there is, then the model should remain with an implicit reference
 Debug.Assert(Model.IsRefImplicit);
 Model.IsRefImplicit = false;
 }
 }
}

In this case what you need is reference counting. Keep in the SceneModel how many ScenePolygon does it have with IsRefExplicit and let ScenePolygon increment it or decrement it as needed.

public override bool IsRefExplicit
{
 get { return _isRefExplicit; }
 set
 {
 if (_isRefExplicit == value)
 {
 return;
 }
 _isRefExplicit = value;
 if (value)
 {
 Model.IncrementImplicitRefCount();
 }
 else
 {
 Model.DecrementImplicitRefCount();
 }
 }
}

Then the model can implement IsRefImplicit by checking if the current reference count is greater than 0.

For abstract, we could write the rules like this:

Model.IsRefImplicit = contains at least one polygon with IsRefExplicit.
 Have a counter, and check if it is greater than 0.
 There is no setter.
Model.IsRefExplicit = all the polygons have IsRefImplicit.
 Store a bool backing field.
Polygon.IsRefImplicit = belongs to an model with IsRefExplicit.
 Read Model.IsRefExplicit
 There is no setter.
Polygon.IsRefExplicit = should mark the model IsRefImplicit.
 Store a bool backing field.
 The setter will increment the counter of the Model when set to true,
 and decrement it when set to false,
 do nothing if the value didn't change.

Then the concern is to annotate IsRefExplicit and let IsRefImplicit be populated by the code you have. So, you would be reading the source data and looking in some dictionary for the object it references, and annotating it. If the object is not in the dictionary you create it and add it.

If you change your code to use Interlocked (use Increment and Decrement for the reference count, and use CompareExchange and Exchange on an int set to 0 or 1 instead of the bool backing fields) then the resulting code will be thread-safe, and you will be able to have multiple threads writing IsRefExplicit.

If you change your lists/dictionaries to thread-safe solutions such as ConcurrentDictionary, then the structure can also be populated by multiple threads.

Looks great. To make it clear, all primitives initial state is not referenced, I will only mark a primitive as explicit from the outside, I will never set implicit but just read it. First comes first served for explicit, between a model and one of its polygon (i.e. if model is used, polygons cannot; if polygon(s) is/are, model can't be; it's really a matter of preventing duplicates in the output). The counter thing is great and I'll use a dictionary so I can implement a select first non-referenced feature, etc... Going to improve from your answer and come back, thanks !
@Aybe consider having a set (set, bag, list or dictionary) with all the elements that are created and non-referenced - then if you give the elements that set, they can remove themselves when they are marked as referenced, and finally you can iterate over all the non-referenced items using that set. Edit: it means that you won't need to iterate over the set of all elements and check if they are not referenced. Similar solution can be done for referenced elements.
Yes, I'm on it, implementing counters, removing LINQ, adding global dictionaries ... I'll be back, thanks !

Stack Exchange Network

What's an efficient data structure to make a lot of queries between parents/childs?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

What's an efficient data structure to make a lot of queries between parents/childs?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions