H.264 image encoding using Media Foundation .NET

Question 1

We have some video analysis software written in C# .NET that uses OpenCV via the Emgu.CV wrappers. The video frames come from a GigE Vision camera (not a normal capture device); they are analysed, graphically annotated, and then encoded to a video file.

Previously we used the OpenCV VideoWriter class to encode the video. However, the VideoWriter class uses Video for Windows codecs and often corrupts the indexing of the output file.

After much searching I have yet to find another .NET implementation of encoding frames to H.264 video, so I decided to write my own. The code below is based on the Media Foundation C++ SinkWriter tutorial and implemented in .NET using the MediaFoundation.NET wrapper.

The main changes I have made are:

- Everything is in a single thread, due to problems accessing the WriteFrame method from other threads. I believe this is due to interacting with the underlying COM object but I've no experience with that.
- New frames are passed to the thread using a BlockingCollection.
- IDisposable was implemented to make sure Stop() is called.

Some questions:

- Is the thread implementation using CancellationTokenSource appropriate?
- Is BlockingCollection the best way to pass the frames in?
- Is it possible to reuse the IMFMediaBuffer and IMFSample objects? If so, should I do this? Will it improve efficiency?
- Is the implementation of IDisposable correct?

Code:

class MFVideoEncoder : IDisposable
{
    private int videoBitRate = 800000;
    const int VIDEO_FPS = 30;
    const int BYTES_PER_PIXEL = 3;
    const long TICKS_PER_SECOND = 10 * 1000 * 1000;
    const long VIDEO_FRAME_DURATION = TICKS_PER_SECOND / VIDEO_FPS;
    public bool HasStarted = false;
    private IMFSinkWriter sinkWriter;
    private int streamIndex = 0;
    private int frameSizeBytes = 0;
    private long frames = 0;
    private int videoWidth = 0;
    private int videoHeight = 0;
    private string outputFile = "//output.mp4";
    private CancellationTokenSource encodeTaskCTS;
    private Thread encodeThread;
    BlockingCollection<Emgu.CV.Mat> FrameQueue = new BlockingCollection<Emgu.CV.Mat>();

    public MFVideoEncoder()
    {
    }

    public void Start(String outputFile, int width, int height, int bitRate)
    {
        this.videoWidth = width;
        this.videoHeight = height;
        this.outputFile = outputFile;
        this.videoBitRate = bitRate;
        frames = 0;
        frameSizeBytes = BYTES_PER_PIXEL * videoWidth * videoHeight;
        HasStarted = false;
        encodeTaskCTS?.Dispose();
        encodeTaskCTS = new CancellationTokenSource();
        var token = encodeTaskCTS.Token;
        encodeThread = new Thread(() => EncodeTask(token));
        encodeThread.Priority = ThreadPriority.Highest;
        //encodeThread.SetApartmentState(ApartmentState.STA);
        encodeThread.Start();
    }

    public void Start(String outputFile, int width, int height, double compressionFactor)
    {
        int bitRate = (int) (VIDEO_FPS * width * height * BYTES_PER_PIXEL / compressionFactor);
        Console.WriteLine("# Bit rate: {0}", bitRate);
        Start(outputFile, width, height, bitRate);
    }

    public void Stop()
    {
        if (HasStarted)
        {
            encodeTaskCTS.Cancel();
        }
    }

    public void AddFrame(Mat frame)
    {
        Mat flippedFrame = new Mat(frame.Size, frame.Depth, frame.NumberOfChannels);
        CvInvoke.Flip(frame, flippedFrame, Emgu.CV.CvEnum.FlipType.Vertical);
        FrameQueue.TryAdd(flippedFrame);
    }

    private void EncodeTask(CancellationToken token)
    {
        Mat frame;
        // Start up
        int hr = MFExtern.MFStartup(0x00020070, MFStartup.Full);
        if (Succeeded(hr))
        {
            hr = InitializeSinkWriter(outputFile, videoWidth, videoHeight);
        }
        HasStarted = Succeeded(hr);
        // Check encoder running
        if (!HasStarted)
        {
            Console.WriteLine("! Encode thread didn't start");
            return;
        }
        //Write frames
        var exit = false;
        while (!exit)
        {
            try
            {
                token.ThrowIfCancellationRequested();
                if (FrameQueue.TryTake(out frame, 200))
                {
                    WriteFrame(frame);
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("! Thread exit: " + ex.Message);
                exit = true;
            }
        }
        //Clean up
        sinkWriter.Finalize_();
        COMBase.SafeRelease(sinkWriter);
        MFExtern.MFShutdown();
    }

    private int InitializeSinkWriter(String outputFile, int videoWidth, int videoHeight)
    {
        IMFMediaType mediaTypeIn = null;
        IMFMediaType mediaTypeOut = null;
        IMFAttributes attributes = null;
        int hr = 0;
        if (Succeeded(hr)) hr = MFExtern.MFCreateAttributes(out attributes, 1);
        if (Succeeded(hr)) hr = attributes.SetUINT32(MFAttributesClsid.MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, 1);
        //if (Succeeded(hr)) hr = attributes.SetUINT32(MFAttributesClsid.MF_SINK_WRITER_DISABLE_THROTTLING, 1);
        if (Succeeded(hr)) hr = attributes.SetUINT32(MFAttributesClsid.MF_LOW_LATENCY, 1);
        // Create the sink writer
        if (Succeeded(hr)) hr = MFExtern.MFCreateSinkWriterFromURL(outputFile, null, attributes, out sinkWriter);
        // Create the output type
        if (Succeeded(hr)) hr = MFExtern.MFCreateMediaType(out mediaTypeOut);
        if (Succeeded(hr)) hr = mediaTypeOut.SetGUID(MFAttributesClsid.MF_MT_MAJOR_TYPE, MFMediaType.Video);
        if (Succeeded(hr)) hr = mediaTypeOut.SetGUID(MFAttributesClsid.MF_MT_SUBTYPE, MFMediaType.H264);
        if (Succeeded(hr)) hr = mediaTypeOut.SetUINT32(MFAttributesClsid.MF_MT_AVG_BITRATE, videoBitRate);
        if (Succeeded(hr)) hr = mediaTypeOut.SetUINT32(MFAttributesClsid.MF_MT_INTERLACE_MODE, (int) MFVideoInterlaceMode.Progressive);
        if (Succeeded(hr)) hr = MFExtern.MFSetAttributeSize(mediaTypeOut, MFAttributesClsid.MF_MT_FRAME_SIZE, videoWidth, videoHeight);
        if (Succeeded(hr)) hr = MFExtern.MFSetAttributeRatio(mediaTypeOut, MFAttributesClsid.MF_MT_FRAME_RATE, VIDEO_FPS, 1);
        if (Succeeded(hr)) hr = MFExtern.MFSetAttributeRatio(mediaTypeOut, MFAttributesClsid.MF_MT_PIXEL_ASPECT_RATIO, 1, 1);
        if (Succeeded(hr)) hr = sinkWriter.AddStream(mediaTypeOut, out streamIndex);
        // Create the input type
        if (Succeeded(hr)) hr = MFExtern.MFCreateMediaType(out mediaTypeIn);
        if (Succeeded(hr)) hr = mediaTypeIn.SetGUID(MFAttributesClsid.MF_MT_MAJOR_TYPE, MFMediaType.Video);
        if (Succeeded(hr)) hr = mediaTypeIn.SetGUID(MFAttributesClsid.MF_MT_SUBTYPE, MFMediaType.RGB24);
        if (Succeeded(hr)) hr = mediaTypeIn.SetUINT32(MFAttributesClsid.MF_MT_INTERLACE_MODE, (int)MFVideoInterlaceMode.Progressive);
        if (Succeeded(hr)) hr = MFExtern.MFSetAttributeSize(mediaTypeIn, MFAttributesClsid.MF_MT_FRAME_SIZE, videoWidth, videoHeight);
        if (Succeeded(hr)) hr = MFExtern.MFSetAttributeRatio(mediaTypeIn, MFAttributesClsid.MF_MT_FRAME_RATE, VIDEO_FPS, 1);
        if (Succeeded(hr)) hr = MFExtern.MFSetAttributeRatio(mediaTypeIn, MFAttributesClsid.MF_MT_PIXEL_ASPECT_RATIO, 1, 1);
        if (Succeeded(hr)) hr = sinkWriter.SetInputMediaType(streamIndex, mediaTypeIn, null);
        // Start accepting data
        if (Succeeded(hr)) hr = sinkWriter.BeginWriting();
        COMBase.SafeRelease(mediaTypeIn);
        COMBase.SafeRelease(mediaTypeOut);
        return hr;
    }

    private int WriteFrame(Mat frame)
    {
        if (!HasStarted) return -1;
        IMFSample sample = null;
        IMFMediaBuffer buffer = null;
        IntPtr data = new IntPtr();
        int bufferMaxLength;
        int bufferCurrentLength;
        int hr = MFExtern.MFCreateMemoryBuffer(frameSizeBytes, out buffer);
        if (Succeeded(hr)) hr = buffer.Lock(out data, out bufferMaxLength, out bufferCurrentLength);
        if (Succeeded(hr))
        {
            using (AutoPinner ap = new AutoPinner(frame.Data))
            {
                hr = MFExtern.MFCopyImage(data, videoWidth * BYTES_PER_PIXEL, frame.DataPointer, videoWidth * BYTES_PER_PIXEL, videoWidth * BYTES_PER_PIXEL, videoHeight);
            }
        }
        if (Succeeded(hr)) hr = buffer.Unlock();
        if (Succeeded(hr)) hr = buffer.SetCurrentLength(frameSizeBytes);
        if (Succeeded(hr)) hr = MFExtern.MFCreateSample(out sample);
        if (Succeeded(hr)) hr = sample.AddBuffer(buffer);
        if (Succeeded(hr)) hr = sample.SetSampleTime(TICKS_PER_SECOND * frames / VIDEO_FPS);
        if (Succeeded(hr)) hr = sample.SetSampleDuration(VIDEO_FRAME_DURATION);
        if (Succeeded(hr)) hr = sinkWriter.WriteSample(streamIndex, sample);
        if (Succeeded(hr)) frames++;
        COMBase.SafeRelease(sample);
        COMBase.SafeRelease(buffer);
        return hr;
    }

    private bool Succeeded(int hr)
    {
        return hr >= 0;
    }

    #region IDisposable Support
    private bool disposedValue = false;

    protected virtual void Dispose(bool disposing)
    {
        if (!disposedValue)
        {
            if (disposing)
            {
                if (HasStarted)
                {
                    Stop();
                }
            }
            disposedValue = true;
        }
    }

    public void Dispose()
    {
        Dispose(true);
    }
    #endregion
}

Comment 1

What additional resources does this require to get working, besides MediaFoundation.NET and EmguCV? For example, what is AutoPinner?

Comment 2

@geometrikal, did you ever implement the changes suggested? If so, would you be willing to post your updated code or a link to a repository containing it?

Comment 3

@BradleyMoxon-Holt No, we ended up sticking with the OpenCV version. It was faster for the same bitrate, and we could control the speed by changing the compression in the codec settings, which was easier for the users. The problem with corrupted files was due to not waiting for all the frames to be written before disposing the VideoWriter. With the answer below, IIRC the first point was important.

Answer

Well I got this code working, so kudos for writing code that works. I don't have a lot of experience with Media Foundation or EmguCV, but I did want to share a few observations on your implementation.

It appears to be a feature of this implementation that Start and Stop can be called more than once, possibly to produce several output files over the lifecycle of a single MFVideoEncoder instance. However, this is weird and probably not completely implemented.
- What happens when you call Start a second time? Your encodeTaskCTS is disposed, which is nice, but not cancelled first, and you also handed that cancellation token to another thread which is probably still alive, so there's a resource that may be referenced after disposal.
- Several fields, such as HasStarted and frames, are written in both encodeThread and the calling thread. They're not set up for cross-thread access (e.g., safe order of operations with locking), so this is almost certainly not going to work correctly unless Start is called once and only once over the MFVideoEncoder lifecycle.
- encodeThread will be set to the reference to the newly created thread, so the previous thread will be GC'd at some unpredictable point in the future.
- FrameQueue doesn't get cleared, so your subsequent output files will probably contain frames from previous batches.
- Recommendation: It doesn't appear to be particularly expensive to instantiate new MFVideoEncoder instances, so I'd say refactor it to enforce one output file per lifecycle. The simplest diff would be to make Start and Stop private, and call Start from your constructor. Then everything will be a lot safer.
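To illustrate the one-file-per-instance idea, here is a minimal sketch. The class name SingleFileEncoder is hypothetical, and the body of EncodeTask is only a placeholder for the real Media Foundation work. The constructor starts the thread, and Dispose cancels, joins, and only then disposes the CancellationTokenSource, so the token is never touched after disposal:

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for MFVideoEncoder: one instance produces one output file.
sealed class SingleFileEncoder : IDisposable
{
    private readonly CancellationTokenSource cts = new CancellationTokenSource();
    private readonly Thread encodeThread;
    private bool disposed;

    public string OutputFile { get; }

    public SingleFileEncoder(string outputFile)
    {
        OutputFile = outputFile;
        CancellationToken token = cts.Token;
        encodeThread = new Thread(() => EncodeTask(token));
        encodeThread.Start();
    }

    private void EncodeTask(CancellationToken token)
    {
        // Placeholder for the real work (sink writer setup, frame loop, finalize).
        // Here the thread simply blocks until cancellation is requested.
        token.WaitHandle.WaitOne();
    }

    public void Dispose()
    {
        if (disposed) return;   // calling Dispose twice is harmless
        disposed = true;
        cts.Cancel();           // ask the thread to stop
        encodeThread.Join();    // wait until it has finished using the token
        cts.Dispose();          // only now is it safe to dispose the source
    }
}
```

With this shape there is no public Start or Stop to call twice, so the stale-thread and stale-queue problems above cannot arise.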
I think a BlockingCollection is perfectly fine to move frames onto the encoding thread.
- You can actually take advantage of it more than you are already. For example, you could use IsCompleted in your while loop in EncodeTask, instead of doing token.ThrowIfCancellationRequested(). This would have the extra benefit of draining out the frame queue when Stop is called, instead of instantaneously halting frame processing as it does now.
- It's possible to call AddFrame after Stop, currently. But if Stop calls FrameQueue.CompleteAdding(), then this will be prevented automatically.
- AddFrame returns void, but FrameQueue.TryAdd reports whether adding the frame was successful or not. So your API consumers don't know if their frames will be processed or not. If your scenario legitimately allows for frames to not be added for processing, then it sounds important enough to report back to the caller.
- Recommendation: Use more of BlockingCollection. Also, either change the signature of AddFrame to bool and return FrameQueue.TryAdd, or, use more of BlockingCollection & model add failures as exceptions thrown by BlockingCollection.
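A rough sketch of what leaning on BlockingCollection could look like. FramePump is a hypothetical name, byte[] stands in for Emgu.CV.Mat, and the bounded capacity of 64 is an arbitrary assumption. It uses GetConsumingEnumerable rather than polling IsCompleted directly; the enumeration ends under the same condition (adding completed and queue empty):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical frame pump; byte[] stands in for Emgu.CV.Mat.
sealed class FramePump
{
    private readonly BlockingCollection<byte[]> frameQueue =
        new BlockingCollection<byte[]>(boundedCapacity: 64);
    private readonly Thread consumer;
    private int written;

    // Read this after Stop() has returned.
    public int Written => written;

    public FramePump()
    {
        consumer = new Thread(Consume);
        consumer.Start();
    }

    // Returns false if the queue is full or Stop() has already been called.
    public bool AddFrame(byte[] frame)
    {
        try
        {
            return frameQueue.TryAdd(frame);
        }
        catch (InvalidOperationException)
        {
            return false; // CompleteAdding() was already called
        }
    }

    public void Stop()
    {
        frameQueue.CompleteAdding(); // no more frames are accepted
        consumer.Join();             // returns once the queue has been drained
    }

    private void Consume()
    {
        // Blocks while the queue is empty; the loop ends when adding is
        // complete and every queued frame has been taken.
        foreach (byte[] frame in frameQueue.GetConsumingEnumerable())
        {
            Interlocked.Increment(ref written); // placeholder for WriteFrame(frame)
        }
    }
}
```

Frames queued before Stop are still processed, and AddFrame tells the caller when a frame was rejected.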
In EncodeTask, what should happen if MFStartup doesn't succeed? Right now everything proceeds as if there were no error, except there's no output. That's kind of weird.
- Recommendation: since MFStartup is critical to success, I'd throw an exception if it can't succeed.
What happens if you call Stop instantly after Start? It's possible the thread won't have started up, called MFStartup, and set HasStarted in time, so Stop won't cancel the CTS. Once the thread does get going, your frame write loop will then run forever. Probably not what you want.
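One way to address both the startup failure and the Start/Stop race is sketched below, under some assumptions: StartupHandshake is a hypothetical class, and the initialize delegate stands in for MFStartup plus InitializeSinkWriter, returning an HRESULT. Start blocks until the worker reports its result, and Stop cancels unconditionally instead of consulting HasStarted:

```csharp
using System;
using System.Threading;

// Hypothetical sketch: Start blocks until the worker reports its initialization result.
sealed class StartupHandshake
{
    private readonly ManualResetEventSlim initDone = new ManualResetEventSlim(false);
    private readonly CancellationTokenSource cts = new CancellationTokenSource();
    private Thread thread;
    private int initHr;

    // 'initialize' stands in for MFStartup + InitializeSinkWriter and returns an HRESULT.
    public void Start(Func<int> initialize)
    {
        CancellationToken token = cts.Token;
        thread = new Thread(() =>
        {
            initHr = initialize();
            initDone.Set();              // publish the result to Start
            if (initHr < 0) return;      // nothing to run if startup failed
            token.WaitHandle.WaitOne();  // placeholder for the frame loop
        });
        thread.Start();

        initDone.Wait();                 // wait for success or failure
        if (initHr < 0)
        {
            thread.Join();
            throw new InvalidOperationException(
                $"Encoder failed to start, HRESULT 0x{initHr:X8}");
        }
    }

    public void Stop()
    {
        cts.Cancel();    // unconditional, so there is no HasStarted check to race against
        thread?.Join();
    }
}
```

The caller now learns about a failed startup from Start itself rather than from a console message on another thread.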
In EncodeTask, you're catching Exception in your loop. But the only exception you're expecting is OperationCanceledException, right?
- Recommendation: Only catch the exceptions you're expecting, and let the rest bubble.
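As a sketch of the narrower catch, assuming a hypothetical EncodeLoop helper where byte[] stands in for Mat and writeFrame stands in for WriteFrame. BlockingCollection.Take(CancellationToken) blocks until a frame arrives and throws OperationCanceledException on cancellation, which is the only exception handled; anything else propagates to the caller:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

static class EncodeLoop
{
    // Returns the number of frames handled before cancellation was observed.
    public static int Run(BlockingCollection<byte[]> queue,
                          Action<byte[]> writeFrame,
                          CancellationToken token)
    {
        int count = 0;
        try
        {
            while (true)
            {
                // Blocks until a frame is available; throws
                // OperationCanceledException once the token is cancelled.
                byte[] frame = queue.Take(token);
                writeFrame(frame);
                count++;
            }
        }
        catch (OperationCanceledException)
        {
            // Expected: this is the normal shutdown path.
        }
        return count;
    }
}
```

A failure inside writeFrame is no longer reported as a normal "thread exit"; it surfaces as the exception it is.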
You're pretty diligent about checking if return values succeeded for your Media Foundation calls, which is good. However, there are a couple of cases where you're expecting out vars to exist even if the call wasn't successful. For example, in WriteFrame, what happens if MFExtern.MFCreateSample(out sample); fails? The adjacent calls will be skipped due to all the if (Succeeded(hr))'s, but what will COMBase.SafeRelease(sample); do if sample is unassigned?
- Recommendation: ensure that either uses of out variables can survive working with unassigned vars, or, guard those uses more cautiously.
All the if (Succeeded(hr))'s sure are awkward. I get not wanting to throw exceptions in hot code paths, but if everything is working normally, won't these calls all succeed?
- Recommendation: investigate swapping if (Succeeded(hr))'s for something like ThrowUnlessOK(hr), which you could wrap your MF calls with, to reduce the amount of conditions and noise associated with checking each return value. I can't say for sure that this would end up being better, but worth investigating.
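A minimal sketch of such a helper, assuming it lives in a hypothetical static class named HResult. It relies on Marshal.ThrowExceptionForHR from System.Runtime.InteropServices, which turns a failure HRESULT into a .NET exception; success codes (zero or positive, matching the Succeeded check in the question) pass through silently:

```csharp
using System;
using System.Runtime.InteropServices;

static class HResult
{
    // Throws for failure HRESULTs (negative values); does nothing for
    // S_OK (0) and other success codes such as S_FALSE (1).
    public static void ThrowUnlessOK(int hr)
    {
        if (hr < 0)
        {
            Marshal.ThrowExceptionForHR(hr);
        }
    }
}

// Intended usage with the calls from the question (not compiled here,
// since it needs MediaFoundation.NET):
//
//   HResult.ThrowUnlessOK(MFExtern.MFCreateMediaType(out mediaTypeOut));
//   HResult.ThrowUnlessOK(mediaTypeOut.SetGUID(MFAttributesClsid.MF_MT_MAJOR_TYPE, MFMediaType.Video));
```

If this route is taken, the SafeRelease calls would need to move into finally blocks so COM objects are still released when a call throws.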

Factor Mystic · Accepted Answer · 2016-11-05 22:15:54Z
