Multishot recv with Bundle Recv + Incremental Buffers - are they compatible? #1566

Unanswered
stonebrakert6 asked this question in Q&A

[UPDATED - for better understanding of the issue]
Hi,
I have already gone through issues 1409 and 1423.
My setup (kernel version 6.17):

  1. The provided-buffer ring has the incremental flag set, i.e. io_uring_setup_buf_ring(ring_, nr_bufs, bgid, IOU_PBUF_RING_INC, &ret);
  2. The sqe is configured for bundle recv, i.e. sqe->ioprio |= IORING_RECVSEND_BUNDLE;
  3. Assume the buffer size is 124 bytes.

My question: can a single cqe span (use) 2 buffers in this setup, since bundle recv is enabled? Judging from the assertion in the test recv-inc-ooo.c, it seems that with incremental buffers a given cqe won't copy more than buffer_size bytes, but can it still use 2 buffers?

I.e., is the following sequence of cqes possible?

  1. First cqe has res = 64 bytes, buffer_id = 0, and flags has IORING_CQE_F_BUF_MORE set.
  2. Second cqe has res = 124 bytes, buffer_id = 0 (i.e. it picks 2 buffers since bundle recv is enabled).

My guess is no, since we have this comment:

/*
 * Limit incremental buffers to 1 segment. No point trying
 * to peek ahead and map more than we need, when the buffers
 * themselves should be large when setup with
 * IOU_PBUF_RING_INC.
 */

Questions

  1. If the above is possible, what would be the state of IORING_CQE_F_BUF_MORE in the second cqe?
  2. If it is not possible, can I say that bundle recv is incompatible (mutually exclusive) with incremental buffer consumption, i.e. either use big buffers for incremental consumption, or enable bundle recv when using smaller buffers?
  3. With incremental buffers, is it guaranteed that io_uring fully consumes a buffer before it uses the next one? If so, isn't IORING_CQE_F_BUF_MORE redundant, since the application tracks the offset/cursor and would know that the buffer has been exhausted and that the next bytes will be in the next buffer?

Replies: 5 comments 2 replies

I'm OOO for the next week, I'll take a look when I'm back.

To provide more context: it turns out that my guess (mentioned above) was wrong. When bundle recv is enabled, it is possible to get a res whose bytes span 2 buffers (and hence possibly more).

Is this expected behaviour? If yes, which buffer id is IORING_CQE_F_BUF_MORE associated with in this case: the buffer id present in cqe->flags, or the buffer id we reach after traversing res bytes from the cqe's buffer id?

@axboe I apologize for tagging you, Jens. I would be grateful if you could help answer the above questions or just point me in the right direction.

I think that there is something wrong with setting IORING_CQE_F_BUF_MORE when both bundle recv and incremental buffer consumption are used.

Here is an example where I use a buf ring with 4 buffers of 8 bytes each. I write 2 bytes 10 times in a loop, then 3 bytes 10 times, and so on. Each loop step is a write on one side of a TCP connection followed by a recv on the other side (no multishot recv).

Here is the buffer content after each loop step. I reset all buffers to 0xff and the data I write is 0, 1, 2, ..., so I can see which part of the buffer is used.

1. 00 01 ff ff ff ff ff ff | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff
cqe.flags: 0x11, IORING_CQE_F_BUF_MORE: true, buffers pos: 2 0 0 0 
2. ff ff 00 01 ff ff ff ff | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff
cqe.flags: 0x11, IORING_CQE_F_BUF_MORE: true, buffers pos: 4 0 0 0 
3. ff ff ff ff 00 01 ff ff | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff
cqe.flags: 0x11, IORING_CQE_F_BUF_MORE: true, buffers pos: 6 0 0 0 
4. ff ff ff ff ff ff 00 01 | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff
release buffer 0 
cqe.flags: 0x1, IORING_CQE_F_BUF_MORE: false, buffers pos: 0 0 0 0 
5. ff ff ff ff ff ff ff ff | 00 01 ff ff ff ff ff ff | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff
cqe.flags: 0x10011, IORING_CQE_F_BUF_MORE: true, buffers pos: 0 2 0 0 
6. ff ff ff ff ff ff ff ff | ff ff 00 01 ff ff ff ff | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff
cqe.flags: 0x10011, IORING_CQE_F_BUF_MORE: true, buffers pos: 0 4 0 0 
7. ff ff ff ff ff ff ff ff | ff ff ff ff 00 01 ff ff | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff
cqe.flags: 0x10011, IORING_CQE_F_BUF_MORE: true, buffers pos: 0 6 0 0 
8. ff ff ff ff ff ff ff ff | ff ff ff ff ff ff 00 01 | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff
release buffer 1
cqe.flags: 0x10001, IORING_CQE_F_BUF_MORE: false, buffers pos: 0 0 0 0 
9. ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff | 00 01 ff ff ff ff ff ff | ff ff ff ff ff ff ff ff
cqe.flags: 0x20011, IORING_CQE_F_BUF_MORE: true, buffers pos: 0 0 2 0 
10. ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff | ff ff 00 01 ff ff ff ff | ff ff ff ff ff ff ff ff
cqe.flags: 0x20011, IORING_CQE_F_BUF_MORE: true, buffers pos: 0 0 4 0 
11. ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff | ff ff ff ff 00 01 02 ff | ff ff ff ff ff ff ff ff
cqe.flags: 0x20011, IORING_CQE_F_BUF_MORE: true, buffers pos: 0 0 7 0 
12. ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff 00 | 01 02 ff ff ff ff ff ff
release buffer 2
release buffer 3
cqe.flags: 0x20001, IORING_CQE_F_BUF_MORE: false, buffers pos: 0 0 0 0 
13. ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff | 00 01 02 ff ff ff ff ff
cqe.flags: 0x30011, IORING_CQE_F_BUF_MORE: true, buffers pos: 0 0 0 3 
14. ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff | ff ff ff 00 01 02 ff ff
cqe.flags: 0x30011, IORING_CQE_F_BUF_MORE: true, buffers pos: 0 0 0 6 
15. ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff ff ff | ff ff ff ff ff ff 00 
release buffer 3
cqe.flags: 0x30005, IORING_CQE_F_BUF_MORE: false, buffers pos: 0 0 0 0 

After this, the state of the io_uring_buf structures is:

buf 0 .{ .addr = 140289904345112, .len = 8, .bid = 3, .resv = 9 }
buf 1 .{ .addr = 140289904345096, .len = 8, .bid = 1, .resv = 0 }
buf 2 .{ .addr = 140289904345104, .len = 8, .bid = 2, .resv = 0 }
buf 3 .{ .addr = 140289904345118, .len = 0, .bid = 3, .resv = 0 }

Buffer 3 is now in both slot 0 and slot 3, and slot 3 is unusable because its len is 0.

Look at loop step 12: 3 bytes are received across 2 buffers, and f_buf_more is false, suggesting that I should release both buffers to the kernel (although buffer 3 is only partially used). I call buf_ring_add for both (offsets 0 and 1) and then buf_ring_advance with nr_buffers 2.
This breaks sync with the kernel: the kernel uses buffer 3 for the following recv instead of buffer 0. The next release then puts buffer 3 into slot 0!

If I change the release logic to ignore f_buf_more and release the last buffer only if it is fully consumed, everything works. The kernel behaves as if f_buf_more had been true in loop step 12.

Kernel 7.0.3.

"I think that there is something wrong with setting IORING_CQE_F_BUF_MORE when both bundle recv and incremental buffer consumption are used."

I've observed the same.

"If I change the release logic to ignore f_buf_more and release the last buffer only if it is fully consumed, everything works."

Yes, I am using exactly the same workaround. That's why I asked question 3 above:
"With incremental buffers, is it guaranteed that io_uring fully consumes a buffer before it uses the next one?"

I think IORING_CQE_F_BUF_MORE can be made redundant if it is guaranteed that incremental buffers are fully consumed before the next buffer is used.

There has been more discussion of this recently, from a slightly different angle, here: #1433 (comment)

axboe: It's a wording issue, really. It says that when more completions should be expected for a given bid, IORING_CQE_F_BUF_MORE will be set. It's a hint. Applications should check if it's the same as the current bid, and if not, update their expected buffer address.

(the entire thread is worth reading for context)

(Sorry for the spam, I put this answer in the wrong text box, whoops!)

Thanks for sharing this Francis
