C++: vector of tables, optimization #7040

m5k8 started this conversation in General
Jan 27, 2022 · 5 comments · 7 replies
Hi, thanks for the great library! I'm successfully using it for my needs, due to low memory usage overhead design. But still, I'd like to optimize a little bit more. I have the following usage example:

table FooItem
{
	id:uint32;
	name:string;
}
table FooList
{
	items:[FooItem];
}
root_type FooList;

And C++ code:

struct foo_item
{
	uint32_t id;
	std::string name;
};
using foo_list = std::list<foo_item>;
void serialize(const foo_list& foos)
{
	FlatBufferBuilder fbb{};
	std::vector<Offset<FooItem>> fb_items;
	fb_items.reserve(foos.size());
	for (const auto& foo : foos)
	{
		auto fb_id = foo.id;
		auto fb_name = fbb.CreateString(foo.name);
		auto fb_item = CreateFooItem(fbb, fb_id, fb_name);
		fb_items.push_back(fb_item);
	}
	auto fb_items_vector = fbb.CreateVector(fb_items);
	auto fb_list = CreateFooList(fbb, fb_items_vector);
	fbb.Finish(fb_list);
	[...]
}

And that's mostly fine, but is it really required to use the fb_items temporary vector? It's a wasteful, unnecessary memory allocation and a loss of time. I experimented with CreateUninitializedVector(), but the documentation is lacking, and I've come to the conclusion that it's designed for POD scalar types, not tables.

How can I get rid of this temporary vector? Could you advise?

Well, you have to create all the "leaf" objects first before you can start making the vector, so someone has to store the offsets. It's either the app (i.e., you) or we store them internally. FlatBuffers only rarely allocates heap memory (about three cases I can recall), so we don't want to store a dynamically sized array. We prefer to pass the offsets back to the app, which can determine the best way to store them. There are some tricks we can play when serializing vectors of strings, where we use the 'scratch buffer' of the backing buffer to store the offsets, but this might cause a buffer resize and be just as expensive.

I wouldn't worry about the speed/size of this vector; it should be tiny and scale with the size of foo_list. If you are really concerned about the vector, just add a new Offset<FooItem> offset field to your struct foo_item, and you can store the result directly in foos.


Thanks for your attention and answer.
std::vector is acceptable on the PC side, but on the embedded side it's troublesome, to say the least. I have no malloc/new, so no STL. Adding an offset field to my foo_item is out of the question; that was a simplified example. In reality it's quite a complex struct with many subfields that are arrays, imported from other modules, and adding any foreign fields to them is out of the question. I could probably use some scratch area for that, but unfortunately some of these tables are dynamic in size and this becomes very complicated.

How about this CreateUninitializedVector() function? Let's say I already know the size of the table during the serialization call. First I create the uninitialized vector with the known size (but with offsets not set yet), then I serialize every table item, and after each item, when I get its offset, I put it into the previously allocated vector, setting the offsets one by one. This way the vector comes before the table items; is that a problem? As we are operating on offsets, isn't it possible? That would solve this whole issue.


That is something you can try (re: CreateUninitializedVector), though offsets are unsigned, so they only point forward (https://google.github.io/flatbuffers/flatbuffers_internals.html) in the buffer, so I'm thinking it probably won't work.

No malloc/new on the embedded side? Do you use a custom allocator for the FlatBufferBuilder then? The default one uses new/delete:

// DefaultAllocator uses new/delete to allocate memory regions
class DefaultAllocator : public Allocator {
 public:
  uint8_t *allocate(size_t size) FLATBUFFERS_OVERRIDE {
    return new uint8_t[size];
  }
  void deallocate(uint8_t *p, size_t) FLATBUFFERS_OVERRIDE { delete[] p; }
  static void dealloc(void *p, size_t) { delete[] static_cast<uint8_t *>(p); }
};
I started to use FlatBuffers on the PC side, and I like it a lot. I'm an embedded guy at heart, and it's almost perfect: very far from an allocation fest (unlike some other serialization protocols), except for that vector issue. But it's acceptable there.
But in the embedded scenario it looks like more trouble. I'm not using it there yet, just evaluating, based on the positive experience on the PC side. For now, let's say, bummer :) The allocator for FlatBufferBuilder doesn't pose a problem; I can supply my own allocator, or just modify the source code slightly to use some static buffer. That's a minor issue. I can even set a hardware guard watching for a write to the byte just after the supplied buffer, stopping the process with an exception (the Freescale ColdFire MCU has that), so I can be sure there's no overflow past the static buffer. But this vector of offsets in a nested loop, for tables within tables, is problematic. CreateUninitializedVector almost solves it...
Anyway, thanks for the support and otherwise great piece of code!


Here is another strategy that some embedded people use: cap the total number N of foos at some compile-time constant and build the flatbuffers with that cap. Then you can statically allocate an array of size N on the stack and use it to store the offsets. Once you reach N, build the buffer and send it off, then start building a new buffer with the remaining foos.

I'm also trying to avoid double-copy when building a flatbuffers vector. In my case it is a simple uint8 array:

table Row {
 id: uint16;
 mask: uint32;
 payload: [uint8];
}

So if I could access the flatbuffers-allocated payload vector and write to it directly, I could do a single memcpy to set the entire payload into the flatbuffer in one hit.

Any API calls to achieve this? Maybe accessing vector_downward directly?

Update: looks like CreateUninitializedVector is designed for exactly what I am describing 🤔


CreateVector is probably preferred, as it handles endianness for you.

Oof, good call, thank you; I forgot about endianness. Still, for my case I want max performance, so CreateUninitializedVector() might be worth the hassle. FlatBuffers expects little endian?

The wire format is little endian, which is the most common system we build for.

The CreateVector I posted above uses memcpy internally.

Gotcha: this one got me, and hopefully it helps someone...

 uint8_t* payload_buf = nullptr;
 auto payload_offset = fbb_.CreateUninitializedVector(length, &payload_buf);
 BOOST_ASSERT(payload_buf);
 std::memcpy(payload_buf, buffer, length);
 protocol::CreateRow(fbb_, row_id, pmask, payload_offset); // <--- This modifies the memory pointed to by payload_buf!
 BOOST_ASSERT(0 == std::memcmp(payload_buf, buffer, length)); // Occasionally fails.

So when the docs say "Write the data any time later to the returned buffer pointer buf" I think they really mean "any time before using the flatbuffer builder to do other stuff that might modify the underlying vector_downward".

Yeah, I'm not sure of the exact history of CreateUninitializedVector, so I guess the doc comment could be clearer. It seems more suited to an internal API than a public one, but the cat is out of the bag already.

I still think the vanilla CreateVector() is what you want. It uses memcpy internally and should be performant for your needs.

I'm using CreateUninitializedVector to allocate space for a vector and fill it later, but only for a raw byte stream, nothing sophisticated, and in particular not to optimize away a temporary offset vector (the main case in this thread). With such simple usage it certainly works without any data corruption, as far as my case goes. Tested, no problems. But I'm not using CreateRow().
