Consistency model / ACID #2913

Answered by linas
alexandergunnarson asked this question in Q&A
Hey @linas et al. —

I'm wondering what the consistency model of an AtomSpace is (specifically, the ACI in ACID). I see mutexes used (IIRC, a per-atom mutex, but I may have seen a per-AtomSpace mutex floating around), and I believe I read somewhere about a QueryLink being executed in a "thread-safe" way. But does this mean e.g. QueryLinks are executed atomically, with all consequent atoms either being added or not added? Does it further mean that an AtomSpace is strictly serializable (that would be good news!) or merely guarantees a read-committed isolation level? I'd love AtomSpace to pass the Jepsen test!

This brings me to the durability question. It seems that RocksDB is the preferred persistence backend for AtomSpace, which is fine. But what about failure conditions? If a write to the AtomSpace succeeds, might a write to a connected persistence backend fail? (I can imagine Client A writing to an AtomSpace, but persistence failing and the connection to the OpenCog server terminating; then client B reading an unpersisted value from the AtomSpace.) If so, this introduces a consistency issue. If the AtomSpace process crashes, or its underlying hardware fails, what happens? I assume there's no notion of a write-ahead log, but I may be wrong. Just trying to plan for eventualities.

Hi Alex,

> consistency model of an AtomSpace is (specifically, the ACI in ACID)

I cannot give you a clear, easy answer, because the AtomSpace fits neither the conventional relational-DB model nor the key-value model (BASE). The following might illuminate the situation:

  • Atoms are immutable; you can add or delete them, you cannot modify them.
  • There is one specific type, the StateLink, which does provide a peculiar form of atomic update.
  • Attached to each Atom is something called Values (in retrospect a terrible name choice; it should have been "properties"). Values can, in a certain sense, be modified (they're still atomic and thread-safe, though). The prototypical Value is the TruthValue; in a certain sense, this is "really" where the computations happen. In a different sense, the Values provide an alternative to Atoms, with different performance properties, different mutability constraints, and different searchability. One can think of Atoms as pipes, and the Values as the fluid in the pipes (thus relatively immutable and fixed vs. endlessly changing). There are other ways to think of it, too.
  • The mutexes are all about thread safety and atomic access. Once a thread has a handle to an Atom or Value, it will not disappear from under that thread.
  • Temporary results during a query are held in temporary AtomSpaces that are invisible to the user. They don't "leak". Search results are placed in the AtomSpace if you use BindLink or GetLink, and are not if you use MeetLink or QueryLink. So the latter two do not alter the AtomSpace at all.
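
As a rough illustration of the first four points, here is a toy Python model. It is not the AtomSpace API: `Atom`, `ToySpace`, `update_value` and `get_value` are hypothetical names invented for this sketch, and the locking scheme is an assumption rather than a description of the real implementation. It only tries to show the shape of the guarantees described above: atoms cannot be modified after creation, each atom carries mutable values guarded by a per-atom lock, and no attempt is made at a transaction spanning several atoms.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Immutable: the type and name cannot change after creation."""
    atom_type: str
    name: str


class ToySpace:
    """Hypothetical stand-in for an AtomSpace; not the real API."""

    def __init__(self):
        self._space_lock = threading.Lock()  # guards adding atoms
        self._values = {}                    # Atom -> {key: value}
        self._atom_locks = {}                # Atom -> per-atom lock

    def add(self, atom):
        # Adding the same atom twice is harmless; it is stored once.
        with self._space_lock:
            self._values.setdefault(atom, {})
            self._atom_locks.setdefault(atom, threading.Lock())
        return atom

    def update_value(self, atom, key, fn, default=0):
        # The read-modify-write is atomic for this one atom only.
        with self._atom_locks[atom]:
            values = self._values[atom]
            values[key] = fn(values.get(key, default))

    def get_value(self, atom, key):
        with self._atom_locks[atom]:
            return self._values[atom].get(key)


space = ToySpace()
cat = space.add(Atom("ConceptNode", "cat"))


def bump():
    # Each thread increments the same value 1000 times.
    for _ in range(1000):
        space.update_value(cat, "count", lambda n: n + 1)


threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

count = space.get_value(cat, "count")  # no increments are lost
```

In this sketch, updating two different atoms takes two separate locks, so another thread can observe the state between the two updates. That is the sense in which per-atom thread safety is weaker than a multi-atom transaction.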

Jepsen Test

This seems to be about distributed datastores. The "raw" AtomSpace is not distributed, so this does not apply. However... there is an API called StorageNode. It allows any given AtomSpace to connect to ... storage! Currently, this is RocksDB, Postgres, flat files, or other remote AtomSpaces on the network. What is saved/restored, or exchanged on the network, is up to the user. The network storage node is just a peer-to-peer API: what Atoms the peers trade with one another is up to them. I don't know what the Jepsen test implies for this. If a remote peer crashes ... well, whatever Atoms you are holding are what they are. They're not damaged.

P.S. Creating new StorageNodes for other systems is really pretty easy. It'll take more than a day or two, but should take less than a week or two. So if you have some favorite system ... have at it! It's not hard.

RocksDB

This uses the StorageNode mechanism, so the only things saved there are what you explicitly write out. It's designed that way, because saving everything automatically, all the time would be a huge performance bottleneck, esp. for rapidly changing data. (If you're clever you can write a thread to do this, if you feel you really need to. But if you really need to, you are probably mis-designing your system.)
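
The save-only-what-you-ask-for design can be sketched with a toy model. Everything here is hypothetical: `ToyStorage`, `store_atom` and `load_all` are names invented for the illustration and are not the StorageNode API, and a plain dict stands in for the on-disk store.

```python
class ToyStorage:
    """Hypothetical backend; a dict stands in for RocksDB or any other store."""

    def __init__(self):
        self._disk = {}

    def store_atom(self, name, value):
        # Nothing reaches "disk" unless this is called explicitly.
        self._disk[name] = value

    def load_all(self):
        return dict(self._disk)


storage = ToyStorage()

# Working set held in RAM.
ram = {"cat": 0.9, "dog": 0.4}

# Only this one atom is explicitly written out.
storage.store_atom("cat", ram["cat"])

# Further in-RAM changes are not persisted automatically.
ram["cat"] = 0.1
ram["bird"] = 0.7

# Simulate a crash and restart: RAM is lost, storage is reloaded.
ram = storage.load_all()
```

After the simulated restart, only the explicitly stored atom is back, with the value it had when it was written. The later in-RAM update and the atoms that were never stored are gone.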

If you kill -9 the AtomSpace while it's writing to RocksDB, well ... Rocks does have write-ahead logs and all kinds of fancy anti-corruption/self-repair stuff in it. You'd have to read about Rocks. I'm not sure what would happen if the AtomSpace tries to work with a partially-written Atom in Rocks. It won't crash, but there are some strange corner cases where it might behave oddly in some subtle ways. Maybe. No one has explored this; there are no unit tests that intentionally try to write corrupt data, and then try to recover from that.

For me, the AtomSpace never crashes: I run multi-month-long processing jobs. What I do get are thunderstorms that knock out power, and the solution for that is a UPS. (I mean, sure, you can write code that crashes while it has the AtomSpace open. I think it's unlikely to corrupt your data, but it's kind of on you if you are working regularly in that scenario. The AtomSpace was not designed for banking applications.)

Answer selected by linas

I just converted this from a GitHub "issue" to a GitHub "discussion". I've never used discussions before. Hope this works.

Appreciate your detailed response, @linas! Sounds like it's a linearizable consistency model, with thread-safety on a per-atom basis (with the caveat of the possible case of the partially-written atom), but not across atoms. I'd imagine that if for whatever reason a BindLink fails mid-execution due to e.g. a machine shutdown on AWS, the state of the AtomSpace will reflect whatever atoms have been written to that point, but this may be inconsistent with respect to the overall system and will require manual repairing.

BindLink only "stores" its result in RAM, not on disk. So if you kill everything half-way through, nothing on disk will have been altered. There are no disk writes going on. Upon restart, what you would see is whatever was on disk.

For the atomspace, there is no distinction between "local disk" and "remote network node". Both look "the same" to it.
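
One way to picture "both look the same" is a single storage interface with interchangeable backends. The sketch below is a toy under stated assumptions: `ToyStorageNode`, `FileNode`, `PeerNode` and `save_and_reload` are hypothetical names, the JSON file format is made up for the example, and the "remote peer" is just a dict in the same process rather than a real network connection.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class ToyStorageNode(ABC):
    """Hypothetical common interface; not the real StorageNode API."""

    @abstractmethod
    def store(self, name, value): ...

    @abstractmethod
    def fetch(self, name): ...


class FileNode(ToyStorageNode):
    """Local-disk backend: keeps everything in one JSON file."""

    def __init__(self, path):
        self.path = path

    def _read(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def store(self, name, value):
        data = self._read()
        data[name] = value
        with open(self.path, "w") as f:
            json.dump(data, f)

    def fetch(self, name):
        return self._read().get(name)


class PeerNode(ToyStorageNode):
    """Stands in for a remote AtomSpace; here just an in-process dict."""

    def __init__(self, remote):
        self.remote = remote

    def store(self, name, value):
        self.remote[name] = value

    def fetch(self, name):
        return self.remote.get(name)


def save_and_reload(node, name, value):
    # The caller neither knows nor cares which backend it is talking to.
    node.store(name, value)
    return node.fetch(name)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "atoms.json")
    disk_result = save_and_reload(FileNode(path), "cat", 0.9)

peer_result = save_and_reload(PeerNode({}), "cat", 0.9)
```

The calling code in `save_and_reload` is identical for both backends, which is the point being made: from the caller's side, a disk and a network peer are just two things you can store to and fetch from.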

As to "linearizability", see the next note.

I'm skimming what Jepsen wrote there, and, umm...

I have no clue what he's talking about, it seems patently wrong; I don't understand the definitions and terms that he is using; and whatever the heck "linearizability" is supposed to be (I don't get it), the AtomSpace is totally and completely not linearizable. Jepsen seems to be ignoring basic concepts like threads and locks and fences. The stuff about network partitions and forward progress sounds like bunkum to me. I don't know what he's talking about.

This discussion was converted from issue #2912 on December 12, 2021 00:09.
