shifting or self extending hybrid memory? #16708

Unanswered
leok7v asked this question in Q&A

Hi folks,

I have a question about the "group attention" / SelfExtend / context-shift code and hybrid memory models.
I am experimenting with several hybrid, but not fully recurrent, models, for example LFM2-VL-450M-Q8_0.gguf and falcon-h1-0.5b-instruct-q8_0.gguf.
In the code (main.cpp / server.cpp / passkey.cpp) I see the same call sequences for shifting content and for group-attention context reduction: llama_memory_seq_rm() plus llama_memory_seq_add() for the shift, and llama_memory_seq_div() plus llama_memory_seq_add() for SelfExtend.
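
To make sure we are talking about the same pattern, here is a minimal sketch of the two call sequences as I understand them. It is paraphrased and simplified (the SelfExtend part covers only the first window, i.e. ga_i == 0) and wrapped in hypothetical helpers, not the actual example code:

#include "llama.h"

// minimal sketch, not the actual main.cpp code: one context-shift step on sequence 0
static void shift_context(llama_context * ctx, llama_pos n_keep, llama_pos n_discard, llama_pos n_past) {
    llama_memory_t mem = llama_get_memory(ctx);
    // drop [n_keep, n_keep + n_discard) and slide everything after it back
    llama_memory_seq_rm (mem, 0, n_keep,             n_keep + n_discard);
    llama_memory_seq_add(mem, 0, n_keep + n_discard, n_past, -n_discard);
}

// minimal sketch: SelfExtend / group attention, first window only
static void self_extend_first_window(llama_context * ctx, int32_t ga_n, int32_t ga_w, llama_pos n_past) {
    llama_memory_t mem = llama_get_memory(ctx);
    // compress positions [0, ga_w) by ga_n, then move the tail down so it
    // immediately follows the compressed window
    llama_memory_seq_div(mem, 0, 0,    ga_w,   ga_n);
    llama_memory_seq_add(mem, 0, ga_w, n_past, ga_w/ga_n - ga_w);
}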

The LFM2-VL-450M-Q8_0.gguf model returns:

is_recurrent: 0
is_hybrid: 1
can_shift: 1
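
(For reference, this is roughly how I print those flags: a sketch of my own logging, in which llama_model_is_hybrid() is the getter I assume exists alongside llama_model_is_recurrent(); if the real name differs, that is my assumption, not a claim about the API.)

#include "llama.h"
#include <cstdio>

// sketch of my own flag logging; llama_model_is_hybrid() is an assumed getter name
static void print_memory_flags(const llama_model * model, llama_context * ctx) {
    printf("is_recurrent: %d\n", llama_model_is_recurrent(model));
    printf("is_hybrid: %d\n",    llama_model_is_hybrid(model));   // assumed name
    printf("can_shift: %d\n",    llama_memory_can_shift(llama_get_memory(ctx)));
}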

Those values are what I would expect. Reading the code:

bool llama_memory_hybrid::get_can_shift() const {
    // Shifting is trivially supported for recurrent
    return mem_attn->get_can_shift();
}

one might expect that llama_memory_seq_rm() would work for shifting the KV cache content, but it does not, because of:

bool llama_memory_seq_rm(
        llama_memory_t mem,
        llama_seq_id   seq_id,
        llama_pos      p0,
        llama_pos      p1) {
    if (!mem) {
        return true;
    }

    return mem->seq_rm(seq_id, p0, p1);
}

which calls:

bool llama_memory_hybrid::seq_rm(llama_seq_id seq_id, llama_pos p0, llama_pos p1) {
    // Try removing from the recurrent cache first since it may fail. If it does
    // fail, the cache will not have been mutated.
    if (!mem_recr->seq_rm(seq_id, p0, p1)) {
        return false;
    }

    return mem_attn->seq_rm(seq_id, p0, p1);
}

and understandably fails in:

bool llama_memory_recurrent::seq_rm(llama_seq_id seq_id, llama_pos p0, llama_pos p1) {
    //printf("[DEBUG] calling `llama_memory_recurrent::seq_rm` with `seq_id=%d, p0=%d, p1=%d`\n", seq_id, p0, p1);
    uint32_t new_head = size;

    if (p0 < 0) {
        p0 = 0;
    }

    if (p1 < 0) {
        p1 = std::numeric_limits<llama_pos>::max();
    }

    // models like Mamba or RWKV can't have a state partially erased
    if (seq_id >= (int64_t) size) {
        // could be fatal
        return false;
    }

    if (0 <= seq_id) {
        int32_t & tail_id = cells[seq_id].tail;
        if (tail_id >= 0) {
            const auto & cell = cells[tail_id];
            // partial intersection is invalid
            if ((0 < p0 && p0 < cell.pos) || (0 < p1 && p1 <= cell.pos)) {
                //printf("[DEBUG] inside `llama_memory_recurrent::seq_rm`: partial intersection is invalid, so returning false\n");
                return false;
            }
            // invalidate tails which will be cleared
            if (p0 <= cell.pos && cell.pos < p1) {
                tail_id = -1;
            }
        }
    } else {
        ...
    }
}

tripping the "partial intersection is invalid" check: (0 < p0 && p0 < cell.pos) || (0 < p1 && p1 <= cell.pos)
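
To make that concrete with hypothetical numbers (not from my actual run): a shift with n_keep = 128 and n_discard = 256 calls seq_rm(seq_id = 0, p0 = 128, p1 = 384); if the recurrent tail cell sits at cell.pos = 511, then 0 < p0 && p0 < cell.pos holds, the function returns false, and llama_memory_hybrid::seq_rm() bails out before touching the attention cache, so the shift never happens.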

The SelfExtend code path with grp_attn_n and grp_attn_w silently succeeds, but it makes subsequent init_batch() calls fail because find_slot() cannot find a slot.
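
For what it's worth, at the public API level I notice this as a failed decode. A minimal sketch of the kind of guard I mean (my own hypothetical helper, assuming the documented llama_decode() convention that a return value of 1 means no KV slot could be found):

#include "llama.h"
#include <cstdio>

// hypothetical guard around decode; ret == 1 is the "could not find a KV slot"
// case that shows up after the SelfExtend seq_div/seq_add calls on the hybrid model
static bool decode_or_report(llama_context * ctx, llama_batch batch) {
    const int32_t ret = llama_decode(ctx, batch);
    if (ret == 1) {
        fprintf(stderr, "decode failed: could not find a KV slot (after seq_div/seq_add?)\n");
        return false;
    }
    if (ret != 0) {
        fprintf(stderr, "decode failed: error %d\n", ret);
        return false;
    }
    return true;
}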

Questions:

  1. Is this expected behavior?
  2. Can I detect a situation like this earlier, rather than by attempting the llama_memory_seq_rm() call and having it fail and return false?
  3. In the code in main.cpp, server.cpp and passkey.cpp I do not see the bool results of the llama_memory_seq_rm() calls being checked, nor any error reporting/recovery/mitigation on failure. That probably means the surrounding code path has guarantees that the call will succeed, but I have failed to find such guarantees. What am I missing?
  4. I did not investigate grp_attn_n and grp_attn_w deeply, because I believe the hybrid models in question may not need them at all, provided I correctly detect the "thou cannot/should not shift or self-extend" condition. If that is incorrect, any hints on what could go wrong for hybrid models?
  5. Should there be checks, and at least error logging, for all llama_memory_seq_rm() calls that return false on failure? (A sketch of what I mean follows right after this list.)
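
For (3) and (5), this is the kind of minimal guard I have in mind; checked_seq_rm() is my own hypothetical helper, not an existing llama.cpp function:

#include "llama.h"
#include <cstdio>

// hypothetical wrapper: forward to llama_memory_seq_rm() and at least log when it
// refuses to remove the range (as it does for the recurrent half of a hybrid cache)
static bool checked_seq_rm(llama_context * ctx, llama_seq_id seq_id, llama_pos p0, llama_pos p1) {
    llama_memory_t mem = llama_get_memory(ctx);
    if (!llama_memory_seq_rm(mem, seq_id, p0, p1)) {
        fprintf(stderr, "llama_memory_seq_rm(seq_id=%d, p0=%d, p1=%d) failed - skipping the shift\n",
                seq_id, p0, p1);
        return false;
    }
    return true;
}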

Any help and clarification would be greatly appreciated.
