
Whilst using "ffmpeg version n4.0.1", I've noticed that on a CentOS 6 host inside a VMware virtual machine, video transcoding takes almost twice as long as with "ffmpeg version 2.2.1".

Benchmarks are below. I ran 3 iterations of each and list only the fastest time.

The test file is the same 2.8 MB stock video in every case.

All VMs are running CentOS release 6.10.
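The "3 iterations, keep the fastest time" method can be sketched as a small Python timing harness. This is only a sketch under assumptions: the actual `benchmark.sh` used for the runs in this post is not shown, so the helper name `fastest_run` and the example command are hypothetical.

```python
import subprocess
import time


def fastest_run(cmd, runs=3):
    """Run cmd (a list of argv strings) `runs` times and return the fastest
    wall-clock time in seconds.

    Hypothetical helper: it mirrors the "3 iterations, fastest time" method
    described above, not the author's actual benchmark.sh.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails, so a
        # failed transcode is never reported as a (suspiciously fast) timing.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)
```

For example, `fastest_run(["ffmpeg", "-y", "-i", "/root/test/test.mov", "-c:v", "libx264", "/root/test/out.mp4"])`. The paths match the logs in this post, but the exact encoder options used in `benchmark.sh` are not shown, so substitute the real command. Keeping the minimum rather than the mean filters out runs that were slowed by other load on the host.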

| VM | FFmpeg version | Time |
|---|---|---|
| VirtualBox | 4.0.1 | 11 secs |
| VirtualBox | 2.2.1 | 18 secs |
| VMware | 2.2.1 | 29 secs |
| VMware | 4.0.1 | 1 minute |

I literally have no idea why this differs and cannot find any logical reason for it. Any FFmpeg / VMware boffins out there have a clue what might be going on?

4.0.1 is compiled from source; 2.2.1 is the EPEL package.

Just to add, the VMware CPU info is as follows:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
stepping : 4
microcode : 1064
cpu MHz : 2100.000
cache size : 15360 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm arat epb xsaveopt pln pts dtherm pti retpoline fsgsbase smep
bogomips : 4200.00
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

Versus the VirtualBox CPU info, reported as:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
stepping : 9
cpu MHz : 2903.925
cache size : 8192 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good xtopology nonstop_tsc pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx rdrand lahf_lm abm 3dnowprefetch fsgsbase avx2 invpcid rdseed
bogomips : 5807.85
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

The above shows that the newer version performs better on one architecture and worse on the other.
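Since the two guests run on different CPUs, one concrete way to compare them is to diff the `flags` lines from the two cpuinfo dumps above. A minimal Python sketch, with both lines copied verbatim from this post (the helper name `cpu_flags` is hypothetical):

```python
# Flags lines copied verbatim from the two cpuinfo dumps in this post.
VMWARE_FLAGS = (
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
    "pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc "
    "arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf "
    "unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt "
    "aes xsave avx f16c rdrand hypervisor lahf_lm arat epb xsaveopt pln pts "
    "dtherm pti retpoline fsgsbase smep"
)
VBOX_FLAGS = (
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat "
    "pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc "
    "rep_good xtopology nonstop_tsc pni pclmulqdq ssse3 cx16 pcid sse4_1 "
    "sse4_2 movbe popcnt aes xsave avx rdrand lahf_lm abm 3dnowprefetch "
    "fsgsbase avx2 invpcid rdseed"
)


def cpu_flags(line):
    """Return the set of flag names from a /proc/cpuinfo 'flags : ...' line."""
    _, _, value = line.partition(":")
    return set(value.split())


vmware = cpu_flags(VMWARE_FLAGS)
vbox = cpu_flags(VBOX_FLAGS)
only_vbox = sorted(vbox - vmware)      # exposed to the VirtualBox guest only
only_vmware = sorted(vmware - vbox)    # exposed to the VMware guest only
print("only VirtualBox:", " ".join(only_vbox))
print("only VMware:", " ".join(only_vmware))
```

Among other things, this shows that only the VirtualBox guest exposes `avx2`, while both expose `avx`. On a live guest the same helper can be fed the `flags` line from `/proc/cpuinfo`. It only shows which extensions each hypervisor passes through; on its own it does not explain the 4.0.1 slowdown.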

To be 100% clear, I've re-run some benchmarks below. These are different VMs in the same cloud with identical setups:

FFmpeg 4: 122.861 seconds

[root@proofing test]# ./benchmark.sh
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
stepping : 4
microcode : 1064
cpu MHz : 2100.000
cache size : 15360 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm arat epb xsaveopt pln pts dtherm pti retpoline fsgsbase smep
bogomips : 4200.00
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
stepping : 4
microcode : 1064
cpu MHz : 2100.000
cache size : 15360 KB
physical id : 2
siblings : 1
core id : 0
cpu cores : 1
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm arat epb xsaveopt pln pts dtherm pti retpoline fsgsbase smep
bogomips : 4200.00
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
ffmpeg version n4.0.1 Copyright (c) 2000-2018 the FFmpeg developers
 built with gcc 4.4.7 (GCC) 20120313 (Red Hat 4.4.7-23)
 configuration: --prefix=/root/ffmpeg_build --pkg-config-flags=--static --extra-cflags=-I/root/ffmpeg_build/include --extra-ldflags=-L/root/ffmpeg_build/lib --extra-libs=-lpthread --extra-libs=-lm --bindir=/usr/bin --enable-gpl --enable-libfdk_aac --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libtheora --enable-libvpx --enable-libx264 --enable-libx265 --enable-nonfree
 libavutil 56. 14.100 / 56. 14.100
 libavcodec 58. 18.100 / 58. 18.100
 libavformat 58. 12.100 / 58. 12.100
 libavdevice 58. 3.100 / 58. 3.100
 libavfilter 7. 16.100 / 7. 16.100
 libswscale 5. 1.100 / 5. 1.100
 libswresample 3. 1.100 / 3. 1.100
 libpostproc 55. 1.100 / 55. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/root/test/test.mov':
 Metadata:
 major_brand : isom
 minor_version : 1
 compatible_brands: isomavc1mp42
 creation_time : 2016-11-03T20:11:18.000000Z
 Duration: 00:00:09.33, start: 0.000000, bitrate: 20807 kb/s
 Stream #0:0(und): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p(tv, bt709), 1920x1080 [SAR 1:1 DAR 16:9], 20805 kb/s, 30 fps, 30 tbr, 30 tbn, 60 tbc (default)
 Metadata:
 creation_time : 2016-11-03T20:11:08.000000Z
Stream mapping:
 Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
Press [q] to stop, [?] for help
[libx264 @ 0x2d05c40] using SAR=1/1
[libx264 @ 0x2d05c40] frame MB size (120x68)> level limit (1620)
[libx264 @ 0x2d05c40] DPB size (1 frames, 3133440 bytes)> level limit (0 frames, 3110400 bytes)
[libx264 @ 0x2d05c40] MB rate (244800)> level limit (40500)
[libx264 @ 0x2d05c40] using cpu capabilities: none!
[libx264 @ 0x2d05c40] profile Constrained Baseline, level 3.0
[libx264 @ 0x2d05c40] 264 - core 120 r2151 a3f4407 - H.264/MPEG-4 AVC codec - Copyleft 2003-2011 - http://www.videolan.org/x264.html - options: cabac=0 ref=1 deblock=1:0:0 analyse=0x1:0x111 me=umh subme=8 psy=1 psy_rd=1.00:0.00 mixed_ref=0 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=3 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=0 weightp=0 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=50 rc=crf mbtree=1 crf=26.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 vbv_maxrate=1500 vbv_bufsize=3000 crf_max=0.0 nal_hrd=none ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to '/root/test/out.mp4':
 Metadata:
 major_brand : isom
 minor_version : 1
 compatible_brands: isomavc1mp42
 encoder : Lavf58.12.100
 Stream #0:0(und): Video: h264 (libx264) (avc1 / 0x31637661), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], q=-1--1, 30 fps, 15360 tbn, 30 tbc (default)
 Metadata:
 creation_time : 2016-11-03T20:11:08.000000Z
 encoder : Lavc58.18.100 libx264
 Side data:
 cpb: bitrate max/min/avg: 1500000/0/0 buffer size: 3000000 vbv_delay: -1
[mp4 @ 0x2d04680] Starting second pass: moving the moov atom to the beginning of the file
frame= 280 fps=2.3 q=-1.0 Lsize= 1865kB time=00:00:09.30 bitrate=1642.5kbits/s speed=0.0757x
video:1863kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.103491%
[libx264 @ 0x2d05c40] frame I:2 Avg QP:37.09 size: 36746
[libx264 @ 0x2d05c40] frame P:278 Avg QP:38.97 size: 6594
[libx264 @ 0x2d05c40] mb I I16..4: 79.3% 0.0% 20.7%
[libx264 @ 0x2d05c40] mb P I16..4: 0.6% 0.0% 0.2% P16..4: 17.3% 2.7% 1.5% 0.0% 0.0% skip:77.7%
[libx264 @ 0x2d05c40] coded y,uvDC,uvAC intra: 30.9% 23.0% 0.2% inter: 3.0% 1.5% 0.0%
[libx264 @ 0x2d05c40] i16 v,h,dc,p: 33% 28% 9% 31%
[libx264 @ 0x2d05c40] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 4% 4% 13% 16% 21% 14% 14% 8% 6%
[libx264 @ 0x2d05c40] i8c dc,h,v,p: 85% 8% 7% 1%
[libx264 @ 0x2d05c40] kb/s:1634.35
122.861 seconds to complete

FFmpeg 2: 32.378 seconds

[root@staging test]# ./benchmark.sh
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
stepping : 4
microcode : 1064
cpu MHz : 2100.000
cache size : 15360 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm arat epb xsaveopt pln pts dtherm pti retpoline fsgsbase smep
bogomips : 4200.00
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
stepping : 4
microcode : 1064
cpu MHz : 2100.000
cache size : 15360 KB
physical id : 2
siblings : 1
core id : 0
cpu cores : 1
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm arat epb xsaveopt pln pts dtherm pti retpoline fsgsbase smep
bogomips : 4200.00
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
ffmpeg version 2.2.1 Copyright (c) 2000-2014 the FFmpeg developers
 built on Apr 13 2014 13:00:18 with gcc 4.4.6 (GCC) 20120305 (Red Hat 4.4.6-4)
 configuration: --prefix=/usr --libdir=/usr/lib64 --shlibdir=/usr/lib64 --mandir=/usr/share/man --enable-shared --enable-runtime-cpudetect --enable-gpl --enable-version3 --enable-postproc --enable-avfilter --enable-pthreads --enable-x11grab --enable-vdpau --disable-avisynth --enable-frei0r --enable-libopencv --enable-libdc1394 --enable-libgsm --enable-libmp3lame --enable-libnut --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libxavs --enable-libxvid --extra-cflags='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIC' --disable-stripping
 libavutil 52. 66.100 / 52. 66.100
 libavcodec 55. 52.102 / 55. 52.102
 libavformat 55. 33.100 / 55. 33.100
 libavdevice 55. 10.100 / 55. 10.100
 libavfilter 4. 2.100 / 4. 2.100
 libswscale 2. 5.102 / 2. 5.102
 libswresample 0. 18.100 / 0. 18.100
 libpostproc 52. 3.100 / 52. 3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/root/test/test.mov':
 Metadata:
 major_brand : isom
 minor_version : 1
 compatible_brands: isomavc1mp42
 creation_time : 2016-11-03 20:11:18
 Duration: 00:00:09.33, start: 0.000000, bitrate: 20807 kb/s
 Stream #0:0(und): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p(tv, bt709), 1920x1080 [SAR 1:1 DAR 16:9], 20805 kb/s, 30 fps, 30 tbr, 30 tbn, 60 tbc (default)
 Metadata:
 creation_time : 2016-11-03 20:11:08
[libx264 @ 0x2139060] using SAR=1/1
[libx264 @ 0x2139060] frame MB size (120x68)> level limit (1620)
[libx264 @ 0x2139060] DPB size (5 frames, 40800 mbs)> level limit (0 frames, 8100 mbs)
[libx264 @ 0x2139060] MB rate (244800)> level limit (40500)
[libx264 @ 0x2139060] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
[libx264 @ 0x2139060] profile Constrained Baseline, level 3.0
[libx264 @ 0x2139060] 264 - core 142 - H.264/MPEG-4 AVC codec - Copyleft 2003-2014 - http://www.videolan.org/x264.html - options: cabac=0 ref=5 deblock=1:0:0 analyse=0x1:0x111 me=umh subme=8 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=3 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=0 weightp=0 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=50 rc=crf mbtree=1 crf=26.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 vbv_maxrate=1500 vbv_bufsize=3000 crf_max=0.0 nal_hrd=none filler=0 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to '/root/test/out.mp4':
 Metadata:
 major_brand : isom
 minor_version : 1
 compatible_brands: isomavc1mp42
 encoder : Lavf55.33.100
 Stream #0:0(und): Video: h264 (libx264) ([33][0][0][0] / 0x0021), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], q=-1--1, max. 1500 kb/s, 15360 tbn, 30 tbc (default)
 Metadata:
 creation_time : 2016-11-03 20:11:08
Stream mapping:
 Stream #0:0 -> #0:0 (h264 -> libx264)
Press [q] to stop, [?] for help
[mp4 @ 0x21357e0] Starting second pass: moving the moov atom to the beginning of the file
frame= 280 fps=8.7 q=-1.0 Lsize= 1861kB time=00:00:09.33 bitrate=1633.7kbits/s
video:1859kB audio:0kB subtitle:0 data:0 global headers:0kB muxing overhead 0.100889%
[libx264 @ 0x2139060] frame I:2 Avg QP:36.29 size: 46508
[libx264 @ 0x2139060] frame P:278 Avg QP:38.34 size: 6512
[libx264 @ 0x2139060] mb I I16..4: 75.4% 0.0% 24.6%
[libx264 @ 0x2139060] mb P I16..4: 0.5% 0.0% 0.2% P16..4: 18.6% 2.7% 1.8% 0.0% 0.0% skip:76.3%
[libx264 @ 0x2139060] coded y,uvDC,uvAC intra: 31.4% 22.8% 0.3% inter: 2.7% 1.1% 0.0%
[libx264 @ 0x2139060] i16 v,h,dc,p: 33% 28% 9% 30%
[libx264 @ 0x2139060] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 4% 4% 13% 17% 20% 14% 14% 8% 6%
[libx264 @ 0x2139060] i8c dc,h,v,p: 85% 7% 7% 1%
[libx264 @ 0x2139060] ref P L0: 77.6% 9.9% 8.2% 1.8% 2.5%
[libx264 @ 0x2139060] kb/s:1631.45
32.378 seconds to complete
asked Jul 16, 2018 at 15:34
  • Are you sure you're doing proper apples-to-apples comparisons? For example, there are two entirely different CPUs (Xeon vs i7), and one is clearly configured to use multiple cores whereas one isn't. Were the ffmpeg 4 vs ffmpeg 2 comparisons performed in the same VM on the same host? Commented Jul 18, 2018 at 8:14
  • The CPUs are different, yes, and that may be a factor. However, the comparisons are on the same host, and the ffmpeg command was set to use 1 core only. Comparatively, the tests on the same machines are proportionately slower on the Xeon/VMware setup than on the local i7/VirtualBox one. The core count is misleading and I don't think setting these up identically will make much difference, but I will check to be sure (i.e. as we're comparing one version to the other on the same machine). Testing again today. Commented Aug 3, 2018 at 9:19
  • I've added a test on identical VMs to demonstrate the point. See the latest two tests on VMware with full detail; when run on the alternative setup (VirtualBox), the opposite is experienced. Commented Aug 3, 2018 at 9:41
  • The full outputs are shown above already. The command was a simple transcode of a MOV to MP4 for web playback, and the same command was used in both tests. Commented Aug 14, 2018 at 10:28
  • using cpu capabilities: none! indicates that your linked libx264 may have been compiled with --disable-asm, or that some other issue is preventing it from using assembly optimizations. Commented Aug 15, 2018 at 0:54
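The last comment's diagnosis can be checked mechanically. A minimal sketch that scans ffmpeg's stderr for libx264's `using cpu capabilities:` line, assuming the line format shown in the two logs above (the function name `x264_cpu_capabilities` is hypothetical, and other x264 builds may word the line differently):

```python
import re

# Matches lines like "[libx264 @ 0x2d05c40] using cpu capabilities: none!"
CAPS_RE = re.compile(r"\[libx264 @ [^\]]+\] using cpu capabilities: (.+)")


def x264_cpu_capabilities(log_text):
    """Return the capability names libx264 reported in an ffmpeg log.

    Returns None if no capabilities line is found, and an empty list if
    libx264 reported "none!" (i.e. no assembly/SIMD code paths in use).
    """
    match = CAPS_RE.search(log_text)
    if match is None:
        return None
    caps = match.group(1).strip()
    if caps == "none!":
        return []
    return caps.split()
```

In practice the log text would come from something like `subprocess.run(cmd, capture_output=True, text=True).stderr`. An empty list matches the 4.0.1 log above, while the 2.2.1 build reports MMX2, SSE2Fast, SSSE3, SSE4.2 and AVX.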

1 Answer


Testing with 4.0.2, the issue has gone away. Whilst it's now fractionally faster, it's nowhere near as fast as on the local setup, but we can put that down to the CPU.

I can only conclude that whatever was causing the slowdown was limited to that particular version (4.0.1).

Whilst this doesn't explain what the cause was, a minor version update fixes the problem, so I don't see the mileage in trying to work it out.

answered Aug 14, 2018 at 10:29
