RavenDB 5 Improves Distributed Time-Series, Document Compression, and Indexing
Aug 03, 2020 2 min read
RavenDB, a NoSQL document database with multi-document ACID transactions, adds smart document compression, distributed time-series support, and enhanced indexing in its version 5 release.
The compression of documents is rarely about single large documents. As explained by Oren Eini, CEO of RavenDB:
Documents in RavenDB can be of arbitrary size. Technically, they are limited to 2 GB in size, but if you get anywhere near that, you have other issues. The worst case I have seen is a 700+ MB file, but RavenDB will issue warnings if you have documents that exceed the 5 MB range. This is mostly because the cost of sending those MB range documents back and forth. RavenDB itself is doing quite fine with those documents. However, most documents tend to be much smaller. The typical data size of documents is in the order of few to low dozens of KBs.
Instead, the challenges are typically related to compressing values inside RavenDB. Users of RavenDB would complain about repeating the same JSON structure on every document due to RavenDB not having a schema, and running into storage issues with a large number of rarely touched documents.
Other databases, such as PostgreSQL and MySQL, have mechanisms for value compression. RavenDB 5 uses Zstd, which provides fast and efficient compression. Eini explains:
If we can train the algorithm on the documents, we can get great benefits from removing redundancies across documents. What ends up happening is that as you write documents into a compressed collection, RavenDB watches your data and learn how to best compress it. The more you write, the more information RavenDB has to find the optimal dictionary to compress your data. This way, we are able to individually compress and decompress documents, while still retaining great compression rates.
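The idea of compressing each document individually against a dictionary shared across documents can be sketched as follows. This is only an illustration under stated assumptions, not RavenDB's implementation: it uses the preset-dictionary feature of zlib from the Python standard library as a stand-in for Zstd's trained dictionaries, the sample documents are hypothetical, and the "training" step is simply a concatenation of a few sample documents.

```python
import json
import zlib
from typing import Optional

# Hypothetical documents that repeat the same JSON structure.
docs = [
    {"id": f"orders/{i}", "customer": f"customers/{i % 7}",
     "status": "shipped", "total": i * 10}
    for i in range(200)
]
encoded = [json.dumps(d, sort_keys=True).encode("utf-8") for d in docs]

# Naive stand-in for dictionary training: concatenate a few sample documents.
# (Zstd derives a dictionary from samples; zlib accepts raw bytes as-is.)
shared_dict = b"".join(encoded[:20])


def compress(doc: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress one document, optionally against a shared dictionary."""
    if zdict:
        c = zlib.compressobj(level=9, zdict=zdict)
    else:
        c = zlib.compressobj(level=9)
    return c.compress(doc) + c.flush()


def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Decompress one document; the same dictionary must be supplied."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()


if __name__ == "__main__":
    held_out = encoded[20:]  # documents not used to build the dictionary
    plain = sum(len(compress(d)) for d in held_out)
    with_dict = sum(len(compress(d, shared_dict)) for d in held_out)
    print(f"raw bytes:          {sum(len(d) for d in held_out)}")
    print(f"compressed alone:   {plain}")
    print(f"compressed w/ dict: {with_dict}")
```

Each document remains independently readable, since decompressing it needs only its own bytes and the shared dictionary. Small documents on their own carry too little redundancy to compress well, which is why the dictionary makes the difference here.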
RavenDB 5 also introduces time-series support for handling data points whose values are ordered by time. Integrated into the RavenDB document model and distributed environment, time series extend specific documents, which preserves context and keeps operations simple. Time-series data is stored separately from the documents it extends, so it can be modified rapidly without changing the document itself. Distributed clients and nodes can modify time series concurrently, and the modifications are merged without conflicts.
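To make the conflict-free merge concrete, here is a toy model of a time series kept apart from the document it extends. Everything in it is an assumption made for illustration: the article does not describe RavenDB's actual merge rules, so the sketch uses a simple union by timestamp and, when two nodes wrote the same timestamp, deterministically keeps the larger value tuple so that every node arrives at the same result.

```python
from typing import Dict, List, Tuple

# Hypothetical model: timestamp (epoch milliseconds) -> tuple of values.
Series = Dict[int, Tuple[float, ...]]


def merge(a: Series, b: Series) -> Series:
    """Union two replicas of a series without raising a conflict.

    Assumed rule: if both replicas hold an entry for the same timestamp,
    keep the larger tuple, so the result does not depend on merge order.
    """
    merged = dict(a)
    for ts, values in b.items():
        merged[ts] = max(merged[ts], values) if ts in merged else values
    return merged


def ordered(series: Series) -> List[Tuple[int, Tuple[float, ...]]]:
    """Return the entries sorted by time."""
    return sorted(series.items())


# The document itself is never touched by time-series writes.
document = {"id": "users/1", "name": "Example User"}

# Two nodes accept writes to the same series concurrently.
node_a: Series = {1000: (72.0,), 2000: (75.0,)}
node_b: Series = {2000: (76.0,), 3000: (71.0,)}

if __name__ == "__main__":
    print(ordered(merge(node_a, node_b)))
    print(merge(node_a, node_b) == merge(node_b, node_a))
```

Because the merge is commutative, replicas can exchange updates in any order and still converge, and the document stays unchanged however often the series is written.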
Time-series support in RavenDB 5 includes new APIs and GUI management, transactional guarantees, efficient querying and aggregation against large datasets, etc.
Indexing improves in RavenDB 5: static indexes can now be created for time-series and distributed counter values, and indexes can use compare-exchange values.
The RavenDB client API adds refinements for attachments, bulk insertion, compare-exchange, load balancing, patching, subscriptions, and serialization.
Further details on all RavenDB 5 improvements and changes are available in the RavenDB 5 changelog. Eini also demonstrates key features of RavenDB 5 in a video.
RavenDB is available on-premises or as a cloud service on AWS and Azure. RavenDB provides many open-source clients for various environments, including Node.js, Python, Java, Ruby, C++, and more.
The RavenDB client is open-source software available under the MIT license for communicating with the RavenDB application. All other RavenDB usage occurs under the AGPLv3 license. RavenDB commercial licenses, including a free option, are available for those who do not wish to follow the terms of the AGPLv3 license.
Contributions are welcome via the RavenDB contribution guidelines, which include a code of conduct.