When does it make sense to put data in Elasticsearch vs. creating secondary indexes on the primary datastore?
Elasticsearch with another primary store
Pros:
- The primary datastore can stay optimised for its core read/write use cases.
- Elasticsearch supports more than key/value matching: fuzzy matching, full-text search, etc. (see the sketch after this list).
Cons:
- The search index can drift out of sync with the primary datastore.
- Two more components to manage (ES itself, plus a pipeline to insert into ES).
- Requires some form of change data capture (CDC) from the primary datastore.
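For illustration, here is a minimal sketch of the kind of query that goes beyond key/value lookups: a fuzzy full-text match that tolerates typos. The index name, field name, sample query, and local endpoint are made-up assumptions, and it assumes the elasticsearch-py 8.x client.

```python
# Sketch only: a fuzzy full-text query against a hypothetical "products" index.
# Assumes a local Elasticsearch node and the 8.x Python client.
from elasticsearch import Elasticsearch  # pip install elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="products",
    query={
        "match": {
            "name": {
                "query": "blutooth speker",   # deliberately misspelled
                "fuzziness": "AUTO",          # edit distance chosen from term length
            }
        }
    },
)

for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("name"))
```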
Secondary indexes on Primary datastore
Pros:
- Fewer moving parts.
- Fewer consistency issues (at worst the secondary indexes are eventually consistent, rather than drifting out of sync with a separate store).
Cons:
- Not all datastores support secondary indexes.
- Secondary-index queries are often scatter-gather in distributed datastores; running them at high QPS will eat into the read/write capacity available to the primary access patterns (read/write by PK). A minimal single-node example follows this list.
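As a toy, single-node illustration of the secondary-index option (the schema here is invented, and SQLite stands in for whatever the real primary datastore is): the index makes a non-primary-key lookup cheap, while every write must now also maintain it.

```python
# Sketch only: a secondary index on a hypothetical "users" table in SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, country TEXT)")
conn.execute("CREATE INDEX idx_users_country ON users (country)")  # the secondary index

conn.executemany(
    "INSERT INTO users (email, country) VALUES (?, ?)",
    [("a@example.com", "NL"), ("b@example.com", "DE"), ("c@example.com", "NL")],
)

# Query by the secondary key instead of the primary key; the query plan
# confirms the index is used rather than a full table scan.
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT email FROM users WHERE country = ?", ("NL",)
).fetchall())
print(conn.execute("SELECT email FROM users WHERE country = ?", ("NL",)).fetchall())
```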
Are there other considerations when deciding between these two approaches?
2 Answers
I was given good advice by an architect early in my career: don't design in extra complexity to improve performance before you've discovered where the actual bottlenecks in the system are. Otherwise, you will likely end up optimizing the wrong area.
I’d recommend trying this out at full (possibly synthetic) volume on your primary data store with a secondary index, to determine whether that meets your performance requirements and whether it really is the bottleneck. A rough sketch of such a measurement follows.
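Something along these lines, again with SQLite and synthetic data purely for illustration; swap in your real datastore, schema, row counts, and latency budget.

```python
# Sketch only: load a realistic row count, then time the secondary-index query path.
import random
import sqlite3
import string
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT)")
conn.execute("CREATE INDEX idx_orders_status ON orders (status)")

rows = [
    ("".join(random.choices(string.ascii_lowercase, k=8)),
     random.choice(["NEW", "PAID", "SHIPPED"]))
    for _ in range(1_000_000)  # approximate your real volume here
]
conn.executemany("INSERT INTO orders (customer, status) VALUES (?, ?)", rows)

start = time.perf_counter()
paid = conn.execute("SELECT COUNT(*) FROM orders WHERE status = 'PAID'").fetchone()[0]
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"{paid} matching rows in {elapsed_ms:.1f} ms")  # compare with your latency budget
```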
It would depend on the scale of your problem.
If you have identified one new query in the business domain that will be stable and used regularly, but is inefficient with the current schema, add the supporting index. Each DML statement must now keep that index in sync with the base table, however, so the system overall has more work to do; latency on everything will increase ever so slightly (see the timing sketch below).
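A toy illustration of that extra work on writes: the same bulk insert timed into a table with and without the supporting index. The absolute numbers are meaningless; the comparison is the point.

```python
# Sketch only: measure the write overhead introduced by maintaining an extra index.
import sqlite3
import time

def timed_inserts(with_index: bool) -> float:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")
    if with_index:
        conn.execute("CREATE INDEX idx_events_kind ON events (kind)")  # the supporting index
    rows = [(i % 1000, f"kind_{i % 50}") for i in range(200_000)]
    start = time.perf_counter()
    conn.executemany("INSERT INTO events (user_id, kind) VALUES (?, ?)", rows)
    conn.commit()
    return time.perf_counter() - start

print(f"without index: {timed_inserts(False):.2f}s")
print(f"with index:    {timed_inserts(True):.2f}s")
```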
If instead the requirement is to support arbitrary ad hoc queries over all tables, something like Elasticsearch will be the answer. The cost is that of syncing the two stores and the latency of that process; a naive polling-based sync is sketched below.
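One deliberately simple way to do that sync is to poll the primary store for rows changed since the last run and bulk-index them into Elasticsearch. Production setups more often use a CDC tool that reads the database's change log; the table, column, and index names here are assumptions, as is the elasticsearch-py 8.x client.

```python
# Sketch only: a polling-based sync from a relational primary store into Elasticsearch.
import sqlite3
from elasticsearch import Elasticsearch, helpers  # pip install elasticsearch

conn = sqlite3.connect("primary.db")           # hypothetical primary store
es = Elasticsearch("http://localhost:9200")

def sync_since(last_sync_ts: str) -> str:
    """Re-index every row updated after last_sync_ts; returns the new watermark.

    Assumes an ISO-8601 updated_at column so string comparison orders correctly.
    """
    rows = conn.execute(
        "SELECT id, name, description, updated_at FROM products WHERE updated_at > ?",
        (last_sync_ts,),
    ).fetchall()
    actions = (
        {"_index": "products", "_id": row[0],
         "_source": {"name": row[1], "description": row[2]}}
        for row in rows
    )
    helpers.bulk(es, actions)  # idempotent: the same _id overwrites the existing document
    return max((row[3] for row in rows), default=last_sync_ts)
```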
At some point the cumulative cost of incrementally adding those secondary indexes will be more than that of replicating to another storage engine. If you envisage that future as most likely, you can design for it from the outset. Otherwise go with the secondary indexes, monitor, and be prepared to switch.
I have maintained several large RDBMS applications. Often tables on latency-sensitive paths will have several secondary indexes, sometimes many. And I have chosen not to add indexes for non-sensitive workloads (say, overnight batch reporting) to minimise the impact on those same tables. There is a balance to be found according to what is important to this application, and there are no free lunches.