16 posts • Page 1 of 1
C_Payne
Posts: 5
Joined: Sun Nov 12, 2017 11:23 am

OMX_IndexParamBrcmNALSSeparate

Sun Nov 12, 2017 1:37 pm

In an attempt to further reduce latency in my streaming solution I am trying to implement slicing in raspivid.

I have successfully added a configuration option to set MMAL_PARAMETER_MB_ROWS_PER_SLICE.
So far it is working well: my frames are now split into a configurable number of slices.
However, the NALs making up a complete frame are still emitted together in one output buffer, which makes the effort useless, as I can't start transmitting before compression of the complete frame has finished.

I have found a parameter "OMX_IndexParamBrcmNALSSeparate" which seems useful, however it is not broken out in MMAL.

I have added an enum

Code:

MMAL_PARAMETER_VIDEO_ENCODE_NALSSeparate
in mmal_parameters_video.h

and

Code:

 MMALOMX_PARAM_BOOLEAN_PORTLESS(MMAL_PARAMETER_VIDEO_ENCODE_NALSSeparate,
 OMX_IndexParamBrcmNALSSeparate)
in mmalomx_util_params_video.c

However

Code:

mmal_port_parameter_set_boolean(encoder_output, MMAL_PARAMETER_VIDEO_ENCODE_NALSSeparate, 1)
returns MMAL_ENOSYS

I then tried modifying hello_encode and added

Code:

/* Request 12 macroblock rows per slice on the encoder output port (201). */
OMX_PARAM_U32TYPE testtype;
memset(&testtype, 0, sizeof(OMX_PARAM_U32TYPE));
testtype.nSize = sizeof(OMX_PARAM_U32TYPE);
testtype.nVersion.nVersion = OMX_VERSION;
testtype.nPortIndex = 201;
testtype.nU32 = 12;
printf("RETVAL: %i\n\n", OMX_SetParameter(ILC_GET_HANDLE(video_encode),
       OMX_IndexConfigBrcmVideoEncoderMBRowsPerSlice, &testtype));

/* Ask for each NAL to be delivered separately. */
OMX_CONFIG_BOOLEANTYPE testtype2;
memset(&testtype2, 0, sizeof(OMX_CONFIG_BOOLEANTYPE));
testtype2.nSize = sizeof(OMX_CONFIG_BOOLEANTYPE);
testtype2.nVersion.nVersion = OMX_VERSION;
testtype2.bEnabled = 1;
printf("RETVAL: %i\n\n", OMX_SetParameter(ILC_GET_HANDLE(video_encode),
       OMX_IndexParamBrcmNALSSeparate, &testtype2));
Slicing is working, but the buffers returned are still complete frames.
OMX_SetParameter returns 0 here.

What am I missing?
Is there another way to make the h264 encoder output slices into separate buffers?
Any help would be greatly appreciated!!

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 35174
Joined: Sat Jul 30, 2011 7:41 pm

Re: OMX_IndexParamBrcmNALSSeparate

Sun Nov 12, 2017 4:54 pm

Not sure why this isn't working, but perhaps using a smaller buffer size might force the system to split more frequently?

Note that not all OMX commands are implemented, even if they might be defined in a header somewhere.
Software guy, working in the applications team.

6by9
Raspberry Pi Engineer & Forum Moderator
Posts: 18476
Joined: Wed Dec 04, 2013 11:27 am

Re: OMX_IndexParamBrcmNALSSeparate

Sun Nov 12, 2017 9:59 pm

I haven't got source code to hand, but I wonder if this is related to the minimise fragmentation parameter (look in the headers - it should be obvious). It's on by default as too many apps objected to getting partial buffers (even though that is completely legit and the buffers are correctly signalled). It may be that the logic around that is retaining the nals until the frame is complete.
Software Engineer at Raspberry Pi Ltd. Views expressed are still personal views.
I'm not interested in doing contracts for bespoke functionality - please don't ask.

C_Payne
Posts: 5
Joined: Sun Nov 12, 2017 11:23 am

Re: OMX_IndexParamBrcmNALSSeparate

Sun Nov 12, 2017 10:03 pm

6by9 wrote:
Sun Nov 12, 2017 9:59 pm
I haven't got source code to hand, but I wonder if this is related to the minimise fragmentation parameter (look in the headers - it should be obvious). It's on by default as too many apps objected to getting partial buffers (even though that is completely legit and the buffers are correctly signalled). It may be that the logic around that is retaining the nals until the frame is complete.
Sounds like a good idea. I will try that out tomorrow. Thanks.

6by9
Raspberry Pi Engineer & Forum Moderator
Posts: 18476
Joined: Wed Dec 04, 2013 11:27 am

Re: OMX_IndexParamBrcmNALSSeparate

Mon Nov 13, 2017 12:58 pm

Having checked the firmware source, you are right that OMX_IndexParamBrcmNALSSeparate is the key parameter, and it isn't currently broken out in MMAL (I'm just fixing that one).
Plumbing now done, and I'm getting buffers out of raspivid along the lines of:

Code:

Buffer length 18, flags 1020,pts 0
Buffer length 9, flags 1024,pts 0
Buffer length 6524, flags 1008,pts 193960183
Buffer length 6047, flags 100C,pts 193960183
Buffer length 5925, flags 1000,pts 193994888
Buffer length 5190, flags 1004,pts 193994888
Buffer length 6172, flags 1000,pts 194029592
Buffer length 4793, flags 1004,pts 194029592
Buffer length 4552, flags 1000,pts 194064295
Buffer length 2495, flags 1004,pts 194064295
Buffer length 3963, flags 1000,pts 194098999
Buffer length 2715, flags 1004,pts 194098999
Buffer length 4158, flags 1000,pts 194133703
Buffer length 3098, flags 1004,pts 194133703
Buffer length 3306, flags 1000,pts 194168407
Buffer length 2564, flags 1004,pts 194168407
Buffer length 3793, flags 1000,pts 194203110
Buffer length 2570, flags 1004,pts 194203110
Buffer length 2924, flags 1000,pts 194237814
(640x480, and 15 MB rows/slice, so two slices per frame).
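The slice count in that note follows from the macroblock geometry. A minimal sketch of the arithmetic, assuming 16x16-pixel H.264 macroblocks and that the last slice of a frame may be shorter than the rest. The function names are hypothetical and are not part of MMAL or OMX:

```c
/* H.264 macroblocks are 16x16 pixels. */
#define MB_SIZE 16u

/* Number of macroblock rows needed to cover a frame of the given height,
 * rounded up so a partial row at the bottom still counts. */
static unsigned mb_rows(unsigned height_px)
{
    return (height_px + MB_SIZE - 1u) / MB_SIZE;
}

/* Number of slices per frame for a given rows-per-slice setting.
 * Treating 0 as "slicing disabled" (one slice) is an assumption of this
 * sketch, not documented encoder behaviour. */
static unsigned slices_per_frame(unsigned height_px, unsigned rows_per_slice)
{
    unsigned rows = mb_rows(height_px);

    if (rows_per_slice == 0u)
        return 1u;
    return (rows + rows_per_slice - 1u) / rows_per_slice;
}
```

With a height of 480 this gives 30 macroblock rows, so 15 rows per slice yields the two slices per frame seen in the log.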

We've gained a new define of MMAL_BUFFER_HEADER_FLAG_NAL_END (1<<12).
My only reservation is if people have been naughty and not treated the flags field as a bitfield then their app will break, but I can't think of a way to leave the OMX_BUFFERFLAG_ENDOFNAL flag on all the IL buffers without doing a nasty hack in mmal_il to only set it if OMX_BUFFERFLAG_ENDOFFRAME isn't set.
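To make the bitfield point concrete, here is a small sketch that decodes the flag values from the log above. Only the NAL_END value (1<<12) comes from this thread; the other bit positions are written from memory of mmal_buffer.h and should be checked against the header. The local names and helper functions are hypothetical:

```c
#include <stdint.h>
#include <stdio.h>

/* Local copies of the flag bits. NAL_END (1<<12) is quoted in this thread;
 * the others are assumptions to verify against mmal_buffer.h. */
#define FLAG_FRAME_END (1u << 2)
#define FLAG_KEYFRAME  (1u << 3)
#define FLAG_CONFIG    (1u << 5)
#define FLAG_NAL_END   (1u << 12)

/* Robust: tests one bit and ignores any other bits that may be set. */
static int is_frame_end(uint32_t flags)
{
    return (flags & FLAG_FRAME_END) != 0;
}

/* Fragile: compares the whole field, so it stops matching as soon as
 * another bit such as NAL_END is set alongside FRAME_END. */
static int is_frame_end_fragile(uint32_t flags)
{
    return flags == FLAG_FRAME_END;
}

/* Writes a space-separated list of the recognised flag names into out. */
static void describe_flags(uint32_t flags, char *out, size_t out_len)
{
    snprintf(out, out_len, "%s%s%s%s",
             (flags & FLAG_CONFIG)    ? "CONFIG "    : "",
             (flags & FLAG_KEYFRAME)  ? "KEYFRAME "  : "",
             (flags & FLAG_FRAME_END) ? "FRAME_END " : "",
             (flags & FLAG_NAL_END)   ? "NAL_END "   : "");
}
```

Under these assumed values, a buffer flagged 1004 is both the end of a NAL and the end of a frame, while 1000 is the end of a NAL only. An application using the equality test would have recognised 4 before the change but would miss 1004 afterwards.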

I'll submit my firmware patches, so they should come out in the next rpi-update release.

I will say that I'm not convinced how much you'll improve the latency. Yes the data is available earlier for transmission, and so the decoder can start a smidge earlier, but I suspect you'll be looking at maybe 10ms improvement assuming 1080P. (Encoding a whole 1080P frame takes about 40ms, but is pipelined so that the next frame is started before the previous one completed, hence being able to achieve 30fps).
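The pipelining remark can be checked with a little arithmetic. This is only a sketch of the reasoning using the approximate figures quoted above; the function names are hypothetical, and it models the encoder as a simple pipeline in which a new frame enters every frame period:

```c
/* Frame period in milliseconds for a given frame rate. */
static double frame_period_ms(double fps)
{
    return 1000.0 / fps;
}

/* Minimum number of frames that must be inside the encoder at once for the
 * output rate to keep up with the input rate: encode time divided by the
 * frame period, rounded up. */
static unsigned frames_in_flight(double encode_ms, double fps)
{
    double period = frame_period_ms(fps);
    unsigned n = (unsigned)(encode_ms / period);

    if ((double)n * period < encode_ms)
        n++;
    return n;
}
```

At 30fps the frame period is about 33.3ms, so a 40ms encode means two frames are in the encoder at once, overlapping by roughly 6.7ms.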
Software Engineer at Raspberry Pi Ltd. Views expressed are still personal views.
I'm not interested in doing contracts for bespoke functionality - please don't ask.

C_Payne
Posts: 5
Joined: Sun Nov 12, 2017 11:23 am

Re: OMX_IndexParamBrcmNALSSeparate

Mon Nov 13, 2017 4:13 pm

6by9 wrote:
Mon Nov 13, 2017 12:58 pm
I'll submit my firmware patches, so they should come out in the next rpi-update release.
Wow this is prompt. Thank you so much.
What is the release schedule for this?

In your example, did you also set MMAL_PARAMETER_MINIMISE_FRAGMENTATION to 0?
I am also wondering why it didn't work in my OMX example.
6by9 wrote:
Mon Nov 13, 2017 12:58 pm
I will say that I'm not convinced how much you'll improve the latency. Yes the data is available earlier for transmission, and so the decoder can start a smidge earlier, but I suspect you'll be looking at maybe 10ms improvement assuming 1080P. (Encoding a whole 1080P frame takes about 40ms, but is pipelined so that the next frame is started before the previous one completed, hence being able to achieve 30fps).
Well, that rather depends on when the encoder starts to spit out NALUs, doesn't it?
If it takes 40 ms for a complete frame, it would ideally output buffers at 10, 20, 30 and 40 ms, which would reduce transmission latency by a factor of nearly 4 on my highly bandwidth-limited radio channel.
The same goes for the decoder.

So if I had a latency of 40 (encoder) + 30 (transmission) + 30 (decoder) = 100 ms (this is about what I am seeing right now),
I could reduce this to 40 + 7.5 + 7.5 = 55 ms.
This is all theoretical, but it seems well worth a try.
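That estimate can be written down as a small model. It is a sketch of the idealised case only: it assumes transmission and decoding of a slice can begin as soon as that slice leaves the encoder, that slices are evenly sized, and that encoder time is unchanged by slicing. The function name is hypothetical:

```c
/* Idealised end-to-end latency in milliseconds when a frame is split into
 * equal slices. Only the last slice's share of transmission and decode time
 * is added after the encoder finishes; everything else is assumed to overlap
 * with encoding. A slice count of 0 is treated as 1 (no slicing). */
static double sliced_latency_ms(double encode_ms, double transmit_ms,
                                double decode_ms, unsigned slices)
{
    if (slices == 0u)
        slices = 1u;
    return encode_ms + transmit_ms / slices + decode_ms / slices;
}
```

Calling sliced_latency_ms(40, 30, 30, 1) reproduces the 100 ms figure, and four slices give 40 + 7.5 + 7.5 = 55 ms. Per-slice overhead is not modelled, so real gains would be smaller.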

Thanks for your prompt help!

6by9
Raspberry Pi Engineer & Forum Moderator
Posts: 18476
Joined: Wed Dec 04, 2013 11:27 am

Re: OMX_IndexParamBrcmNALSSeparate

Mon Nov 13, 2017 7:21 pm

C_Payne wrote:
Mon Nov 13, 2017 4:13 pm
Wow this is prompt. Thank you so much.
What is the release schedule for this?
Pass. rpi-updates happen when there is something worthwhile to release, either on the firmware or kernel sides. Generally they're less than a fortnight apart though.
C_Payne wrote:In your example, did you also set MMAL_PARAMETER_MINIMISE_FRAGMENTATION to 0?
I am also wondering why it didn't work in my OMX example.
Yes I did. I didn't try it without so can't comment on whether it was actually required or not.
OMX should have worked but I avoid working with it unless I really have to :D
C_Payne wrote:
6by9 wrote:
Mon Nov 13, 2017 12:58 pm
I will say that I'm not convinced how much you'll improve the latency. Yes the data is available earlier for transmission, and so the decoder can start a smidge earlier, but I suspect you'll be looking at maybe 10ms improvement assuming 1080P. (Encoding a whole 1080P frame takes about 40ms, but is pipelined so that the next frame is started before the previous one completed, hence being able to achieve 30fps).
Well, that rather depends on when the encoder starts to spit out NALUs, doesn't it?
If it takes 40 ms for a complete frame, it would ideally output buffers at 10, 20, 30 and 40 ms, which would reduce transmission latency by a factor of nearly 4 on my highly bandwidth-limited radio channel.
The same goes for the decoder.

So if I had a latency of 40 (encoder) + 30 (transmission) + 30 (decoder) = 100 ms (this is about what I am seeing right now),
I could reduce this to 40 + 7.5 + 7.5 = 55 ms.
This is all theoretical, but it seems well worth a try.
Worth a shot, yes, but I'm not sure how much of a saving you'll get in reality. Splitting the frame into slices also increases the setup overhead so it really isn't a linear thing.
As I remember it, motion estimation and the main encoding is all done within around 30ms, and it is CABAC that is the last stage and done in the overlapping time. I suspect slicing it will start CABAC earlier, but I really couldn't say for certain.
Software Engineer at Raspberry Pi Ltd. Views expressed are still personal views.
I'm not interested in doing contracts for bespoke functionality - please don't ask.

C_Payne
Posts: 5
Joined: Sun Nov 12, 2017 11:23 am

Re: OMX_IndexParamBrcmNALSSeparate

Tue Nov 14, 2017 10:02 am

I tried OMX again today and succeeded. It is indeed necessary to set both:

Code:

OMX_IndexConfigMinimiseFragmentation = 0
OMX_IndexParamBrcmNALSSeparate = 1
in order to get one buffer per NALU.

Can't wait for the release in MMAL.

Just wondering where the MMAL->OMX translation happens.
I thought MMAL just sits completely on top of OMX.

That's why I tried to add:

MMALOMX_PARAM_BOOLEAN_PORTLESS(MMAL_PARAMETER_VIDEO_ENCODE_NALSSeparate,
OMX_IndexParamBrcmNALSSeparate)
in /mmal/openmaxil/mmalomx_util_params_video.c

Is there any way to set OMX parameters directly in MMAL? Or are closed-source firmware modifications always necessary?

6by9
Raspberry Pi Engineer & Forum Moderator
Posts: 18476
Joined: Wed Dec 04, 2013 11:27 am

Re: OMX_IndexParamBrcmNALSSeparate

Tue Nov 14, 2017 10:45 am

C_Payne wrote:
Tue Nov 14, 2017 10:02 am
I tried OMX again today and succeeded. It is indeed necessary to set both:

Code:

OMX_IndexConfigMinimiseFragmentation = 0
OMX_IndexParamBrcmNALSSeparate = 1
in order to get one buffer per NALU.
Useful to know. Thanks for investigating.
C_Payne wrote:Can't wait for the release in MMAL.

Just wondering where the MMAL->OMX translation happens.
I thought MMAL just sits completely on top of OMX.

That's why I tried to add:

MMALOMX_PARAM_BOOLEAN_PORTLESS(MMAL_PARAMETER_VIDEO_ENCODE_NALSSeparate,
OMX_IndexParamBrcmNALSSeparate)
in /mmal/openmaxil/mmalomx_util_params_video.c

Is there any way to set OMX parameters directly in MMAL? Or are closed-source firmware modifications always necessary?
MMAL and IL sit on top of an API called RIL (for Reduced IL).
Within IL there is a huge amount of stuff related to state and buffer management that is common to all components, so there is a common core block that handles that stuff and talks RIL down to the components.
MMAL came in and reused that RIL API, but replaced IL common code. Within the MMAL adaptation code there is a set of conversion functions for parameters, and that is what I'm updating. No, there's no way to bypass it and send an IL parameter into RIL via MMAL. I'll raise an issue to go through and confirm that we have a mapping for all the parameters as these ones are a touch annoying.

The interface/mmal/openmaxil library was added later, actually with the intention of making MMAL look like IL as that is what Android used for the codec API. It never actually got deployed as plans changed. Other than a couple of demo apps I suspect it has never been used! If we can add a deprecated tag to it then it would make sense.
Software Engineer at Raspberry Pi Ltd. Views expressed are still personal views.
I'm not interested in doing contracts for bespoke functionality - please don't ask.

C_Payne
Posts: 5
Joined: Sun Nov 12, 2017 11:23 am

Re: OMX_IndexParamBrcmNALSSeparate

Tue Nov 14, 2017 10:16 pm

Thank you for clearing that up and for your prompt help. It is very much appreciated.

Chris

Consti10^100
Posts: 67
Joined: Mon Aug 31, 2015 6:54 pm

Re: OMX_IndexParamBrcmNALSSeparate

Wed Jan 20, 2021 5:47 pm

Hello,
Sliced encoding was recently added as a parameter to raspivid. However, I cannot find OMX_IndexParamBrcmNALSSeparate or its MMAL equivalent. Did this parameter get lost?

I am so keen on it because my testing shows that adding sliced encoding actually increases encoding latency quite a lot.
For 720p60 without sliced encoding, latency is ~20 ms at an ISP/H.264 core frequency of 500 MHz.
But for 720p60 with slices=4, for example, the encoding latency increases to 50-60 ms.

I think that sliced encoding should give lower encoding latencies in general, since the pipelining of the encoder should have less of an effect on encoding latency.

Consti10^100
Posts: 67
Joined: Mon Aug 31, 2015 6:54 pm

Re: OMX_IndexParamBrcmNALSSeparate

Wed Jan 20, 2021 6:09 pm

So the parameter is there in MMAL:
MMAL_PARAMETER_VIDEO_ENCODE_SEPARATE_NAL_BUFS
https://github.com/raspberrypi/userland ... deo.h#L105
It is just not set in raspivid. However, setting it has no effect on the encoding latency.

Consti10^100
Posts: 67
Joined: Mon Aug 31, 2015 6:54 pm

Re: OMX_IndexParamBrcmNALSSeparate

Wed Jan 20, 2021 6:44 pm

6by9 wrote:
Mon Nov 13, 2017 12:58 pm
Having checked the firmware source, you are right that OMX_IndexParamBrcmNALSSeparate is the key parameter, and it isn't currently broken out in MMAL (I'm just fixing that one).
Plumbing now done, and I'm getting buffers out of raspivid along the lines of:

Code:

Buffer length 18, flags 1020,pts 0
Buffer length 9, flags 1024,pts 0
Buffer length 6524, flags 1008,pts 193960183
Buffer length 6047, flags 100C,pts 193960183
Buffer length 5925, flags 1000,pts 193994888
Buffer length 5190, flags 1004,pts 193994888
Buffer length 6172, flags 1000,pts 194029592
Buffer length 4793, flags 1004,pts 194029592
Buffer length 4552, flags 1000,pts 194064295
Buffer length 2495, flags 1004,pts 194064295
Buffer length 3963, flags 1000,pts 194098999
Buffer length 2715, flags 1004,pts 194098999
Buffer length 4158, flags 1000,pts 194133703
Buffer length 3098, flags 1004,pts 194133703
Buffer length 3306, flags 1000,pts 194168407
Buffer length 2564, flags 1004,pts 194168407
Buffer length 3793, flags 1000,pts 194203110
Buffer length 2570, flags 1004,pts 194203110
Buffer length 2924, flags 1000,pts 194237814
(640x480, and 15 MB rows/slice, so two slices per frame).

We've gained a new define of MMAL_BUFFER_HEADER_FLAG_NAL_END (1<<12).
My only reservation is if people have been naughty and not treated the flags field as a bitfield then their app will break, but I can't think of a way to leave the OMX_BUFFERFLAG_ENDOFNAL flag on all the IL buffers without doing a nasty hack in mmal_il to only set it if OMX_BUFFERFLAG_ENDOFFRAME isn't set.

I'll submit my firmware patches, so they should come out in the next rpi-update release.

I will say that I'm not convinced how much you'll improve the latency. Yes the data is available earlier for transmission, and so the decoder can start a smidge earlier, but I suspect you'll be looking at maybe 10ms improvement assuming 1080P. (Encoding a whole 1080P frame takes about 40ms, but is pipelined so that the next frame is started before the previous one completed, hence being able to achieve 30fps).
Does the use of slices have an effect on the pipelining? In my testing, enabling it increases encoding latency rather than decreasing it.

Consti10^100
Posts: 67
Joined: Mon Aug 31, 2015 6:54 pm

Re: OMX_IndexParamBrcmNALSSeparate

Wed Jan 20, 2021 7:06 pm

Since all this is aimed at reducing latency: I have found an important workaround that greatly decreases 720p60 encoding latency:
https://github.com/OpenHD/Open.HD/issues/406

In stock Raspbian at 720p60, the ISP/H.264 encoder is clocked at 300 MHz. Perhaps this is the minimum frequency needed to do 720p60. However, by manually clocking the ISP/encoder higher (e.g. 400 MHz), the encoding latency for 720p60 is reduced from ~44 ms to ~22 ms.

Note that 400 MHz is well within the supported range for most Pi models but, as mentioned above, this frequency has to be set manually.

Consti10^100
Posts: 67
Joined: Mon Aug 31, 2015 6:54 pm

Re: OMX_IndexParamBrcmNALSSeparate

Mon Jan 25, 2021 1:36 pm

Hello,
I've added the "sliced" mode to gst-rpicamsrc:
https://github.com/Consti10/gst-rpicams ... re.c#L1440

However, when I enable sliced encoding, the encoding latency increases rather than decreases, making it kinda useless for our use case.

Am I missing something?
Also, since the theoretical latency improvement of sliced encoding should be higher for low-framerate modes, it would be great if the Pi could do sliced encoding for resolutions higher than 720p.

6by9
Raspberry Pi Engineer & Forum Moderator
Posts: 18476
Joined: Wed Dec 04, 2013 11:27 am

Re: OMX_IndexParamBrcmNALSSeparate

Mon Jan 25, 2021 2:14 pm

Clocks get boosted by what is required for the defined use case, not to the max possible for the platform. You may be able to boost it further if you wish to burn more power.

Something in the codec setup didn't like >1280 wide. There was a brief investigation previously but it wasn't obvious even which block was stalling. It is such a niche use case that it wasn't going to get a huge amount of time invested in it. The same is true for getting the last macroblock column working for 2048 wide encoding instead of a max of 2032.

As to the latency difference, there is no guarantee within the codec setup that the NALs get passed as soon as generated. Generally getting an output buffer would mean that the input frames are released for reuse which is not the case until the end of frame (not end of NAL). Again it's not one that warrants huge amounts of effort to be invested.
Software Engineer at Raspberry Pi Ltd. Views expressed are still personal views.
I'm not interested in doing contracts for bespoke functionality - please don't ask.
