Whilst using "ffmpeg version n4.0.1", I've noticed that on a CentOS 6 host within a VMware container, video transcoding takes almost twice as long as with "ffmpeg version 2.2.1".
Benchmarks are below. I ran 3 iterations of each and report only the fastest time.
The test file is the same 2.8 MB stock video in every case.
All VMs are running CentOS release 6.10.
| VM | FFmpeg version | Time |
| --- | --- | --- |
| VirtualBox | 4.0.1 | 11 secs |
| VirtualBox | 2.2.1 | 18 secs |
| VMware | 2.2.1 | 29 secs |
| VMware | 4.0.1 | 1 minute |
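The best-of-three timing described above can be sketched roughly as follows. This is a minimal illustration, not the actual `benchmark.sh` (whose contents are not shown in the post); the `best_of` helper and the example command in the comment are assumptions.

```python
import subprocess
import sys
import time


def best_of(cmd, runs=3):
    """Run `cmd` `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the command's output so terminal I/O does not skew the timing.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Hypothetical usage (paths and encoder options are placeholders):
#   fastest = best_of(["ffmpeg", "-y", "-i", "test.mov", "-c:v", "libx264", "out.mp4"])
#   print(f"{fastest:.3f} seconds to complete")
```

Taking the minimum rather than the mean reduces the influence of noisy neighbours on a shared hypervisor, which is presumably why only the fastest run was reported.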
I literally have no idea why this differs and cannot find any logical reason for it. Do any FFmpeg / VMware boffins out there have a clue what might be going on?
4.0.1 is compiled from source; 2.2.1 is the EPEL package.
Just to add, the VMware CPU info is as follows:
processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz stepping : 4 microcode : 1064 cpu MHz : 2100.000 cache size : 15360 KB physical id : 0 siblings : 1 core id : 0 cpu cores : 1 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm arat epb xsaveopt pln pts dtherm pti retpoline fsgsbase smep bogomips : 4200.00 clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management:
Versus the VirtualBox CPU info, reported as:
processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 158 model name : Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz stepping : 9 cpu MHz : 2903.925 cache size : 8192 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good xtopology nonstop_tsc pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx rdrand lahf_lm abm 3dnowprefetch fsgsbase avx2 invpcid rdseed bogomips : 5807.85 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management:
The above shows that the newer version is faster than the older one on the VirtualBox host but slower on the VMware host.
To be 100% clear, I've re-run some benchmarks below. These are different VMs in the same cloud with identical setups:
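To see exactly which CPU features differ between the two guests, the `flags` lines can be compared as sets. The sketch below assumes the input is the text of `/proc/cpuinfo` with one field per line (as on the real system, not the flattened form pasted above); `cpu_flags` and `compare_flags` are hypothetical helper names.

```python
def cpu_flags(cpuinfo_text):
    """Return the flag set of the first processor in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def compare_flags(cpuinfo_a, cpuinfo_b):
    """Return (flags only in a, flags only in b), each as a sorted list."""
    flags_a, flags_b = cpu_flags(cpuinfo_a), cpu_flags(cpuinfo_b)
    return sorted(flags_a - flags_b), sorted(flags_b - flags_a)
```

Applied to the two listings above, this would show, for example, that `avx2` appears only on the VirtualBox guest and `f16c` only on the VMware guest. Whether either difference matters for the timings is not established by the flags alone.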
FFMPeg 4 - 122.861 seconds
[root@proofing test]# ./benchmark.sh processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz stepping : 4 microcode : 1064 cpu MHz : 2100.000 cache size : 15360 KB physical id : 0 siblings : 1 core id : 0 cpu cores : 1 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm arat epb xsaveopt pln pts dtherm pti retpoline fsgsbase smep bogomips : 4200.00 clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz stepping : 4 microcode : 1064 cpu MHz : 2100.000 cache size : 15360 KB physical id : 2 siblings : 1 core id : 0 cpu cores : 1 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm arat epb xsaveopt pln pts dtherm pti retpoline fsgsbase smep bogomips : 4200.00 clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ffmpeg version n4.0.1 Copyright (c) 2000-2018 the FFmpeg developers built with gcc 4.4.7 (GCC) 20120313 (Red Hat 4.4.7-23) configuration: --prefix=/root/ffmpeg_build --pkg-config-flags=--static --extra-cflags=-I/root/ffmpeg_build/include 
--extra-ldflags=-L/root/ffmpeg_build/lib --extra-libs=-lpthread --extra-libs=-lm --bindir=/usr/bin --enable-gpl --enable-libfdk_aac --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libtheora --enable-libvpx --enable-libx264 --enable-libx265 --enable-nonfree libavutil 56. 14.100 / 56. 14.100 libavcodec 58. 18.100 / 58. 18.100 libavformat 58. 12.100 / 58. 12.100 libavdevice 58. 3.100 / 58. 3.100 libavfilter 7. 16.100 / 7. 16.100 libswscale 5. 1.100 / 5. 1.100 libswresample 3. 1.100 / 3. 1.100 libpostproc 55. 1.100 / 55. 1.100 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/root/test/test.mov': Metadata: major_brand : isom minor_version : 1 compatible_brands: isomavc1mp42 creation_time : 2016年11月03日T20:11:18.000000Z Duration: 00:00:09.33, start: 0.000000, bitrate: 20807 kb/s Stream #0:0(und): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p(tv, bt709), 1920x1080 [SAR 1:1 DAR 16:9], 20805 kb/s, 30 fps, 30 tbr, 30 tbn, 60 tbc (default) Metadata: creation_time : 2016年11月03日T20:11:08.000000Z Stream mapping: Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264)) Press [q] to stop, [?] for help [libx264 @ 0x2d05c40] using SAR=1/1 [libx264 @ 0x2d05c40] frame MB size (120x68)> level limit (1620) [libx264 @ 0x2d05c40] DPB size (1 frames, 3133440 bytes)> level limit (0 frames, 3110400 bytes) [libx264 @ 0x2d05c40] MB rate (244800)> level limit (40500) [libx264 @ 0x2d05c40] using cpu capabilities: none! 
[libx264 @ 0x2d05c40] profile Constrained Baseline, level 3.0 [libx264 @ 0x2d05c40] 264 - core 120 r2151 a3f4407 - H.264/MPEG-4 AVC codec - Copyleft 2003-2011 - http://www.videolan.org/x264.html - options: cabac=0 ref=1 deblock=1:0:0 analyse=0x1:0x111 me=umh subme=8 psy=1 psy_rd=1.00:0.00 mixed_ref=0 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=3 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=0 weightp=0 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=50 rc=crf mbtree=1 crf=26.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 vbv_maxrate=1500 vbv_bufsize=3000 crf_max=0.0 nal_hrd=none ip_ratio=1.40 aq=1:1.00 Output #0, mp4, to '/root/test/out.mp4': Metadata: major_brand : isom minor_version : 1 compatible_brands: isomavc1mp42 encoder : Lavf58.12.100 Stream #0:0(und): Video: h264 (libx264) (avc1 / 0x31637661), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], q=-1--1, 30 fps, 15360 tbn, 30 tbc (default) Metadata: creation_time : 2016年11月03日T20:11:08.000000Z encoder : Lavc58.18.100 libx264 Side data: cpb: bitrate max/min/avg: 1500000/0/0 buffer size: 3000000 vbv_delay: -1 [mp4 @ 0x2d04680] Starting second pass: moving the moov atom to the beginning of the file.0726x frame= 280 fps=2.3 q=-1.0 Lsize= 1865kB time=00:00:09.30 bitrate=1642.5kbits/s speed=0.0757x video:1863kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.103491% [libx264 @ 0x2d05c40] frame I:2 Avg QP:37.09 size: 36746 [libx264 @ 0x2d05c40] frame P:278 Avg QP:38.97 size: 6594 [libx264 @ 0x2d05c40] mb I I16..4: 79.3% 0.0% 20.7% [libx264 @ 0x2d05c40] mb P I16..4: 0.6% 0.0% 0.2% P16..4: 17.3% 2.7% 1.5% 0.0% 0.0% skip:77.7% [libx264 @ 0x2d05c40] coded y,uvDC,uvAC intra: 30.9% 23.0% 0.2% inter: 3.0% 1.5% 0.0% [libx264 @ 0x2d05c40] i16 v,h,dc,p: 33% 28% 9% 31% [libx264 @ 0x2d05c40] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 4% 4% 13% 16% 21% 14% 14% 8% 6% [libx264 @ 0x2d05c40] i8c dc,h,v,p: 85% 
8% 7% 1% [libx264 @ 0x2d05c40] kb/s:1634.35 122.861 seconds to complete
FFMpeg 2 - 32.378 seconds
[root@staging test]# ./benchmark.sh processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz stepping : 4 microcode : 1064 cpu MHz : 2100.000 cache size : 15360 KB physical id : 0 siblings : 1 core id : 0 cpu cores : 1 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm arat epb xsaveopt pln pts dtherm pti retpoline fsgsbase smep bogomips : 4200.00 clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz stepping : 4 microcode : 1064 cpu MHz : 2100.000 cache size : 15360 KB physical id : 2 siblings : 1 core id : 0 cpu cores : 1 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts xtopology tsc_reliable nonstop_tsc aperfmperf unfair_spinlock pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand hypervisor lahf_lm arat epb xsaveopt pln pts dtherm pti retpoline fsgsbase smep bogomips : 4200.00 clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ffmpeg version 2.2.1 Copyright (c) 2000-2014 the FFmpeg developers built on Apr 13 2014 13:00:18 with gcc 4.4.6 (GCC) 20120305 (Red Hat 4.4.6-4) configuration: --prefix=/usr --libdir=/usr/lib64 --shlibdir=/usr/lib64 --mandir=/usr/share/man --enable-shared 
--enable-runtime-cpudetect --enable-gpl --enable-version3 --enable-postproc --enable-avfilter --enable-pthreads --enable-x11grab --enable-vdpau --disable-avisynth --enable-frei0r --enable-libopencv --enable-libdc1394 --enable-libgsm --enable-libmp3lame --enable-libnut --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libxavs --enable-libxvid --extra-cflags='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIC' --disable-stripping libavutil 52. 66.100 / 52. 66.100 libavcodec 55. 52.102 / 55. 52.102 libavformat 55. 33.100 / 55. 33.100 libavdevice 55. 10.100 / 55. 10.100 libavfilter 4. 2.100 / 4. 2.100 libswscale 2. 5.102 / 2. 5.102 libswresample 0. 18.100 / 0. 18.100 libpostproc 52. 3.100 / 52. 3.100 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/root/test/test.mov': Metadata: major_brand : isom minor_version : 1 compatible_brands: isomavc1mp42 creation_time : 2016年11月03日 20:11:18 Duration: 00:00:09.33, start: 0.000000, bitrate: 20807 kb/s Stream #0:0(und): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p(tv, bt709), 1920x1080 [SAR 1:1 DAR 16:9], 20805 kb/s, 30 fps, 30 tbr, 30 tbn, 60 tbc (default) Metadata: creation_time : 2016年11月03日 20:11:08 [libx264 @ 0x2139060] using SAR=1/1 [libx264 @ 0x2139060] frame MB size (120x68)> level limit (1620) [libx264 @ 0x2139060] DPB size (5 frames, 40800 mbs)> level limit (0 frames, 8100 mbs) [libx264 @ 0x2139060] MB rate (244800)> level limit (40500) [libx264 @ 0x2139060] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX [libx264 @ 0x2139060] profile Constrained Baseline, level 3.0 [libx264 @ 0x2139060] 264 - core 142 - H.264/MPEG-4 AVC codec - Copyleft 2003-2014 - http://www.videolan.org/x264.html - options: cabac=0 ref=5 deblock=1:0:0 analyse=0x1:0x111 me=umh subme=8 psy=1 psy_rd=1.00:0.00 mixed_ref=1 
me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=3 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=0 weightp=0 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=50 rc=crf mbtree=1 crf=26.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 vbv_maxrate=1500 vbv_bufsize=3000 crf_max=0.0 nal_hrd=none filler=0 ip_ratio=1.40 aq=1:1.00 Output #0, mp4, to '/root/test/out.mp4': Metadata: major_brand : isom minor_version : 1 compatible_brands: isomavc1mp42 encoder : Lavf55.33.100 Stream #0:0(und): Video: h264 (libx264) ([33][0][0][0] / 0x0021), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], q=-1--1, max. 1500 kb/s, 15360 tbn, 30 tbc (default) Metadata: creation_time : 2016年11月03日 20:11:08 Stream mapping: Stream #0:0 -> #0:0 (h264 -> libx264) Press [q] to stop, [?] for help [mp4 @ 0x21357e0] Starting second pass: moving the moov atom to the beginning of the file frame= 280 fps=8.7 q=-1.0 Lsize= 1861kB time=00:00:09.33 bitrate=1633.7kbits/s video:1859kB audio:0kB subtitle:0 data:0 global headers:0kB muxing overhead 0.100889% [libx264 @ 0x2139060] frame I:2 Avg QP:36.29 size: 46508 [libx264 @ 0x2139060] frame P:278 Avg QP:38.34 size: 6512 [libx264 @ 0x2139060] mb I I16..4: 75.4% 0.0% 24.6% [libx264 @ 0x2139060] mb P I16..4: 0.5% 0.0% 0.2% P16..4: 18.6% 2.7% 1.8% 0.0% 0.0% skip:76.3% [libx264 @ 0x2139060] coded y,uvDC,uvAC intra: 31.4% 22.8% 0.3% inter: 2.7% 1.1% 0.0% [libx264 @ 0x2139060] i16 v,h,dc,p: 33% 28% 9% 30% [libx264 @ 0x2139060] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 4% 4% 13% 17% 20% 14% 14% 8% 6% [libx264 @ 0x2139060] i8c dc,h,v,p: 85% 7% 7% 1% [libx264 @ 0x2139060] ref P L0: 77.6% 9.9% 8.2% 1.8% 2.5% [libx264 @ 0x2139060] kb/s:1631.45 32.378 seconds to complete
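Because the two builds link different x264 cores (120 versus 142), it may be worth checking whether the effective encoder settings are really identical. The following sketch extracts the `options:` line that libx264 prints and reports any keys whose values differ. It assumes each log message is on its own line, as in the real ffmpeg output; the function names are hypothetical.

```python
import re


def x264_options(log_text):
    """Extract key=value pairs from the first libx264 'options:' log line."""
    for line in log_text.splitlines():
        _, sep, tail = line.partition(" - options: ")
        if sep:
            return dict(re.findall(r"(\w+)=(\S+)", tail))
    return {}


def diff_options(log_a, log_b):
    """Map each differing option to a (value_in_a, value_in_b) tuple."""
    opts_a, opts_b = x264_options(log_a), x264_options(log_b)
    return {
        key: (opts_a.get(key), opts_b.get(key))
        for key in sorted(set(opts_a) | set(opts_b))
        if opts_a.get(key) != opts_b.get(key)
    }
```

On the two logs above, the differing keys would be `ref` (1 versus 5), `mixed_ref` (0 versus 1), and `lookahead_threads` and `filler`, which only the newer x264 core reports.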
1 Answer
Testing with 4.0.2, the issue has gone away. Whilst it's now fractionally faster than 2.2.1, it's nowhere near as fast as on the local setup, but that we can put down to the CPU.
I can only conclude that whatever was causing the slowdown was limited to that particular version (4.0.1).
This is not an answer as to what the cause was, but since a minor version update fixes the problem, I don't see the mileage in trying to work it out.
"using cpu capabilities: none!" indicates that your linked libx264 may have been compiled with --disable-asm, or that some other issue is preventing it from using its assembly optimizations.
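A quick way to catch this in future runs is to scan the ffmpeg log for that message. The sketch below is a hypothetical helper, not part of ffmpeg or x264; it assumes the log has one message per line and that the message text matches the form shown in the logs above.

```python
import re


def x264_cpu_capabilities(log_text):
    """Return the capabilities libx264 reports, [] for 'none!', None if absent."""
    match = re.search(r"using cpu capabilities:\s*(.*)", log_text)
    if match is None:
        return None  # libx264 did not log a capabilities line
    caps = match.group(1).strip()
    if caps.startswith("none"):
        return []  # no assembly optimizations in use
    return caps.split()
```

An empty list for the 4.0.1 log and `['MMX2', 'SSE2Fast', 'SSSE3', 'SSE4.2', 'AVX']` for the 2.2.1 log would match what is shown in the question.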