[Question] Why we use relational/SQL based backend storage in GraphDB? · apache/hugegraph · Discussion #2531

Github-Polaris
Apr 16, 2024

Problem Type

None

Before submit

I have searched the existing issues, FAQ, and documentation and confirmed that there is no similar or duplicate question.

Environment

Server Version: 1.0.0 (Apache Release Version)
Backend: RocksDB x nodes, HDD or SSD
OS: xx CPUs, xx G RAM, Ubuntu 2x.x / CentOS 7.x
Data Size: xx vertices, xx edges

Your Question

HugeGraph's backends include MySQL, RocksDB, Cassandra, HBase, and other databases. If even the lowest storage layer has not moved away from relational databases, why can it be called a graph DB? My understanding is that a so-called graph DB amounts to a graphical layer on top of a relational database that presents the relational data as a graph, while the structure and form of the underlying storage remain relational. If so, how are the claimed advantages, such as query speed, actually realized?

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

Replies: 6 comments

dosubot[bot]
Bot Apr 16, 2024

HugeGraph counts as a graph database not because of the storage engine underneath it but because of how it models and queries data. It uses a graph data model, representing data as vertices (nodes) and edges (relationships), which differs fundamentally from the tabular model of relational databases. This model lets relationships and patterns be queried directly through graph-specific operations and algorithms.

HugeGraph supports backends such as MySQL, RocksDB, Cassandra, and HBase as interchangeable foundations for storing graph data, not as a commitment to the relational model. Of these, only MySQL is relational; RocksDB is an embedded key-value store, and Cassandra and HBase are wide-column stores. On top of these storage systems, HugeGraph implements graph-specific features such as schema management, indexing strategies for efficient query processing, and graph traversal optimizations. These features are what provide the usual graph database advantages, such as fast relationship lookups and efficient exploration of connected data, whichever storage system sits underneath.

Using existing storage systems lets HugeGraph build on their robustness, scalability, and persistence while hiding their complexity from the user. This abstraction is achieved through a unified graph processing interface that translates graph operations into backend-specific queries and updates, so the user works only with the graph model.
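The idea of one graph interface over different backends can be sketched roughly as follows. This is a minimal illustration, not HugeGraph's actual code: `EdgeStore`, `DictStore`, `SqlStore`, and `two_hop` are hypothetical names, a Python dict stands in for a KV store, and SQLite stands in for a relational backend such as MySQL.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, List, Set, Tuple


class EdgeStore(ABC):
    """Hypothetical storage interface: the graph layer only needs these two calls."""

    @abstractmethod
    def put_edge(self, src: str, label: str, dst: str) -> None: ...

    @abstractmethod
    def out_neighbors(self, src: str, label: str) -> List[str]: ...


class DictStore(EdgeStore):
    """KV-style backend: the key is (src, label), the value is the list of targets."""

    def __init__(self) -> None:
        self._kv: Dict[Tuple[str, str], List[str]] = {}

    def put_edge(self, src: str, label: str, dst: str) -> None:
        self._kv.setdefault((src, label), []).append(dst)

    def out_neighbors(self, src: str, label: str) -> List[str]:
        return sorted(self._kv.get((src, label), []))


class SqlStore(EdgeStore):
    """Relational backend: the same two operations translated into SQL statements."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE edges (src TEXT, label TEXT, dst TEXT)")
        self._db.execute("CREATE INDEX idx_out ON edges (src, label)")

    def put_edge(self, src: str, label: str, dst: str) -> None:
        self._db.execute("INSERT INTO edges VALUES (?, ?, ?)", (src, label, dst))

    def out_neighbors(self, src: str, label: str) -> List[str]:
        rows = self._db.execute(
            "SELECT dst FROM edges WHERE src = ? AND label = ? ORDER BY dst",
            (src, label),
        )
        return [row[0] for row in rows]


def two_hop(store: EdgeStore, start: str, label: str) -> Set[str]:
    """Graph-layer traversal written once, independent of the backend."""
    result: Set[str] = set()
    for mid in store.out_neighbors(start, label):
        result.update(store.out_neighbors(mid, label))
    return result


def demo() -> List[Set[str]]:
    """Run the same traversal against both backends and collect the results."""
    results = []
    for store in (DictStore(), SqlStore()):
        store.put_edge("alice", "knows", "bob")
        store.put_edge("bob", "knows", "carol")
        store.put_edge("bob", "knows", "dave")
        results.append(two_hop(store, "alice", "knows"))
    return results
```

In this sketch `two_hop` never sees SQL or KV details; swapping the backend changes how edges are persisted, not how the graph is queried.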

In essence, HugeGraph's ability to serve as a true graph database is derived from its implementation of graph-specific operations, optimizations, and its support for graph query languages, rather than the nature of its storage backend. This approach allows HugeGraph to offer the unique advantages of graph databases, including efficient handling of complex queries and relationships, which are central to the graph data model.

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

imbajin
Apr 16, 2024
Collaborator

Simply put, in addition to what @dosu-bot mentioned, HugeGraph's main storage design is a binary KV structure on disk plus a graph structure in memory. It is also compatible with relational databases such as MySQL, which makes it easier for ordinary users to understand graphs and migrate to them. (Note that HG does not depend on relational databases, and a relational database is not the same thing as KV storage.)

From an abstract perspective, the essence of a storage system is to provide KV/Map semantics, just as a graph data structure is ultimately built from arrays or linked lists. Building a graph system on stable, reliable KV storage is therefore a common, mainstream approach, and it also makes it easier to scale out to distributed environments.
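As a rough illustration of the KV point above, the sketch below stores edges in an ordered key-value map and reads a vertex's neighbors with a prefix scan. It is only a sketch under stated assumptions: `SortedKV` is a toy stand-in for an ordered store such as RocksDB, and the string key layout (`src`, `label`, `dst` joined by a separator) is hypothetical, not HugeGraph's actual binary format.

```python
import bisect
from typing import Dict, List, Tuple

SEP = "\x00"  # separator; assumes vertex ids and labels never contain it


class SortedKV:
    """Toy ordered key-value map standing in for a store such as RocksDB."""

    def __init__(self) -> None:
        self._keys: List[str] = []          # kept in sorted order
        self._values: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix: str) -> List[Tuple[str, str]]:
        """Return all (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)  # binary search for the first candidate
        out = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.append((self._keys[i], self._values[self._keys[i]]))
            i += 1
        return out


def edge_key(src: str, label: str, dst: str) -> str:
    # Hypothetical layout: owner vertex first, so its edges sort next to each other.
    return SEP.join((src, label, dst))


def add_edge(kv: SortedKV, src: str, label: str, dst: str, props: str = "") -> None:
    kv.put(edge_key(src, label, dst), props)


def out_neighbors(kv: SortedKV, src: str, label: str) -> List[str]:
    # One seek plus a contiguous read; no join against a separate table is needed.
    prefix = src + SEP + label + SEP
    return [key.split(SEP)[2] for key, _ in kv.scan_prefix(prefix)]
```

Because all edges of a vertex share a key prefix, looking up its neighbors is a binary search followed by a sequential read of adjacent keys. A traversal repeats that lookup for each hop, which is one way a KV-backed graph store avoids relational joins.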

dosubot[bot]
Bot Apr 16, 2024

The discussion of HugeGraph's backend storage systems (MySQL, RocksDB, Cassandra, and HBase, of which only MySQL is relational/SQL-based) raises important points about the nature and capabilities of graph databases. HugeGraph is classified as a graph database not because of the storage mechanism it employs but because of how it models, manages, and queries data. By representing data as vertices and edges, HugeGraph adopts a graph data model that differs fundamentally from the tabular model used in relational databases. This model is designed for efficiently querying relationships and patterns within the data, which is a core advantage of graph databases.

HugeGraph's support for various backend systems is a strategic choice that allows it to leverage the robustness, scalability, and persistence features of these established databases. On top of these systems, HugeGraph implements graph-specific functionalities such as schema management, indexing for efficient query processing, and optimizations for graph traversal. These capabilities enable HugeGraph to offer the key benefits associated with graph databases, such as enhanced search speeds and more effective data relationship exploration, irrespective of the underlying storage system.

The clarification from the maintainer, @imbajin, adds that HugeGraph's design is centered on a binary key-value (KV) structure on disk and a graph structure in memory. The design does not depend on relational databases but is compatible with them, which eases understanding and migration for users familiar with relational models. Note that a relational database is not equivalent to KV storage; it is simply one of the compatible backend options.

From a broader perspective, the choice to build a graph system on top of stable and reliable KV storage is a common and mainstream approach in the industry. This strategy not only ensures the reliability and efficiency of the storage system but also simplifies expansion into distributed environments. The underlying KV/Map semantics provided by the storage system serve as a foundational layer upon which the graph data model is implemented, allowing HugeGraph to fully leverage the unique advantages of graph databases for handling complex queries and relationships.

Github-Polaris
Apr 16, 2024
Author

Thank you! The part of your reply explaining that MySQL compatibility is there to make it easier for ordinary users to understand and migrate to graphs, and that HG does not rely on relational databases, answered my question.

dosubot[bot]
Bot Apr 16, 2024

I'm glad the explanation provided clarity on HugeGraph's approach to storage and its classification as a graph database. Understanding that HugeGraph's design is not strictly tied to relational databases but rather utilizes a binary KV structure for disk storage and a graph structure in memory helps highlight its flexibility and efficiency in handling graph data. This design choice facilitates easier migration for users familiar with relational databases while fully leveraging the advantages of graph databases for complex queries and relationship exploration. If you have any more questions or need further clarification on HugeGraph or graph databases in general, feel free to ask!

dosubot[bot]
Bot Apr 22, 2024

HugeGraph qualifies as a Graph DB through its design and architecture, which provide graph database functionalities regardless of the storage backend it uses, such as MySQL, RocksDB, Cassandra, or HBase. It's built on the Apache TinkerPop3 framework and supports the Gremlin query language, enabling it to perform complex graph operations and analyses characteristic of graph databases.

The system's architecture includes a Core Module for the graph engine, a Backend Module for physical data storage optimized for graph operations, and an API Module for user interaction through RESTful APIs and Gremlin queries. This modular design allows HugeGraph to offer graph-oriented computation, data abstraction, and a rich set of graph operations, making it suitable for applications requiring deep relationship analysis and knowledge graph construction.

HugeGraph's designation as a graph database stems from its graph-specific operations, optimizations, and support for graph query languages, rather than the nature of its storage backend. This approach enables it to leverage the robustness, scalability, and persistence capabilities of traditional databases while providing the unique advantages of graph databases, such as improved search speeds and efficient data relationship exploration.

For more details on HugeGraph's architecture and capabilities, you can refer to the architectural guide and introduction in the HugeGraph documentation, as well as the source code for insights on how it interacts with backend storage systems.
