Intel Atom processors are appearing in a number of Android devices. For existing ARM apps that include native code (compiled C/C++ rather than Java), Android on these devices provides a compatibility layer, called Houdini, that translates ARM instructions into x86 instructions. This is known to produce poor performance, and there are questions over battery drain.
My existing Android benchmarks were produced on Ubuntu Linux based PCs, using Eclipse. Many use a Java front end, with C/C++ code called via the Java Native Interface (JNI). These projects can be downloaded from Android Benchmarks.zip, Android Graphics Benchmarks.zip, Android NEON Benchmarks.zip, and Android MP Benchmarks.zip.
The JNI directory contains the C/C++ code and an Application.mk file that tells the compiler which platforms to produce machine code for. For the original benchmarks, the mk file had the parameter APP_ABI := armeabi-v7a, for ARM V7 CPUs, or APP_ABI := armeabi armeabi-v7a, to include earlier technology, the appropriate one being selected at run time.
I was surprised to find that gcc 4.8 provided parameters to produce native Intel code, and others. Those currently available are arm64-v8a, armeabi, armeabi-v7a, mips, mips64, x86 and x86_64. I use APP_ABI := all, to at least run the programs on ARM and Intel CPUs. Although the Atom is a 64 bit CPU, the currently installed Android 4.4 will not run x86_64 compilations. Eclipse projects for the new compilations are in Android Intel-ARM Benchmarks.zip.
Initial comparisons provided are for tablets with Intel Atom, ARM Cortex-A9 and ARM Cortex-A15 CPUs, plus the BlueStacks emulator running under Windows 7 on a 3.0 GHz Phenom, and under Windows 8 on a 3.7 GHz Core i7. The results are for the original ARM only compilations and the latest with ARM and Intel native instructions.
These benchmarks should also run on 64 bit CPUs with 64 bit versions of Android. Some slight changes are being included in the programs to identify which section of the software is being used. They are being run on a Lenovo Tab 2 A8-50, 8 Inch Tablet, with a 1.3 GHz MediaTek mt8161 quad core processor (64 bit ARM Cortex-A53) and Android 5.0.2. Further details are in Android 64 Bit Benchmarks.htm and results are included below.
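One way a program built for several ABIs can report which compilation is actually running is to test the compiler's predefined architecture macros. The following is a minimal sketch under that assumption, not the code used in these benchmarks; the function names compiled_abi and pointer_bits are hypothetical, and the returned strings simply mirror the APP_ABI names listed earlier.

```c
#include <assert.h>

/* Hypothetical helper: name of the ABI this file was compiled for,
   chosen from macros that GCC and Clang predefine per target. */
const char *compiled_abi(void)
{
#if defined(__aarch64__)
    return "arm64-v8a";
#elif defined(__arm__)
  #if defined(__ARM_ARCH_7A__)
    return "armeabi-v7a";
  #else
    return "armeabi";
  #endif
#elif defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "x86";
#elif defined(__mips64)
    return "mips64";
#elif defined(__mips__)
    return "mips";
#else
    return "unknown";
#endif
}

/* Pointer width in bits for this build, expected to be 32 or 64. */
int pointer_bits(void)
{
    int bits = (int)(sizeof(void *) * 8);
    assert(bits == 32 || bits == 64);
    return bits;
}
```

A JNI function could return these values to the Java front end for display alongside the results, so that a saved log shows whether the ARM, Intel, 32 bit or 64 bit section was executed.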
All the benchmarks were run on an Asus MeMO Pad 7 ME176CX that has a quad core Intel Atom Z3745, rated as 1.33 GHz but mainly running at the Turbo Boost speed of 1.86 GHz. All benchmarks have an option to save results via Email, and this includes details of the system used. Following are example details provided for this Asus MeMO Pad 7.
Similar details of other Android devices are in Android Benchmarks.htm. Those provided later are a brief summary.
Intel CPU Code

Device Asus K013
Screen pixels w x h 800 x 1216
Android Build Version 4.4.2

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 55
model name      : Intel(R) Atom(TM) CPU Z3745 @ 1.33GHz
stepping        : 8
microcode       : 0x81b
cpu MHz         : 1862.000
cache size      : 1024 KB
physical id : 0, siblings : 4, core id : 3, cpu cores : 4, apicid : 6, initial apicid : 6
fdiv_bug : no, f00f_bug : no, coma_bug : no, fpu : yes, fpu_exception : yes
cpuid level : 11, wp : yes
flags           : fpu vme + numerous others including up to SSE4
bogomips        : 2666.77
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual

Linux version 3.10.20-g268162b (3.2.23.182) (gcc version 4.7 (GCC) ) #1 SMP PREEMPT Tue Sep 16 10:49:37 CST 2014

With ARM CPU Code

Screen pixels w x h 800 x 1216
Android Build Version 4.4.2

Processor       : ARMv7 processor rev 1 (v7l)
BogoMIPS        : 1500.0
Features        : neon vfp swp half thumb fastmult edsp vfpv3
CPU implementer : 0x69
CPU architecture: 7
CPU variant     : 0x1
CPU part        : 0x001
CPU revision    : 1
Hardware        : placeholder
Revision        : 0001
Serial          : 0000000000000001

Linux version 3.10.20-g268162b (3.2.23.182) (gcc version 4.7 (GCC) ) #1 SMP PREEMPT Tue Sep 16 10:49:37 CST 2014
The Whetstone benchmark provides an overall rating in MWIPS, plus separate results for the eight test procedures in MFLOPS (floating point) and MOPS (functions and integer). For full details and results via Windows, Linux, Android and different programming languages, see Whetstone Benchmark Results on PCs.
Native Intel code produced average performance gains of 1.93 times on the Atom based A1. The original version was slow on the Phenom based BlueStacks Android emulator (BS1), but this was not the case with the later BlueStacks version, running on the 3.7 GHz Core i7 (BS2). Both were much faster on the newer benchmark, apparently running native Intel instructions rather than converting ARM instructions. With the later ARM code, MWIPS was much lower on the Cortex CPUs, entirely due to the slow EXP functions test.
July 2015 - ARM/Intel version speeds are similar to the original on ARM CPUs reported here, except for the EXP tests on T7 and T11, which have a significant impact on the overall MWIPS rating.
August 2015 - T22 included with 64 bit CPU and 64 bit Android 5.0. Results at 32 and 64 bits were not that different.
System ARM        MHz  Android MWIPS  ------MFLOPS-------    ------------MOPS--------------
See    CPU             Build               1       2       3    COS   EXP   FIXPT       IF   EQUAL

Original ARM Version

A1     Z3745     1866  4.4.2  1075.4   373.8   311.5   284.5   21.9  14.2  1421.1   1839.2   797.0
T7     v7-A9     1200  4.1.2  1115.0   271.3   250.7   256.4   25.8  14.6  1190.0   1797.0  1198.7
T22    v8-A53    1300  5.0.2  1433.7   348.0   319.3   308.2   36.3  19.8  1551.4   1861.9   611.0
T11    v7-A15    1700  4.2.2  1477.7   363.9   220.6   307.5   39.7  18.0  1690.5   2527.9  1127.9
T21    QU-800    2150  4.4.3  2035.1   665.7   640.0   531.6   45.2  23.1  3535.2   3180.4  2120.0
BS1    Emul Phen 3000  2.3.4   103.6    36.9    32.6    37.7    1.8   1.4   130.2    414.0   374.1
BS2    Emul i7   3700  4.4.2   844.5   428.6   351.8   343.6   14.6  10.9  1909.1    533.5   478.8

ARM/Intel 32 Bit Version

A1     Z3745     1866  4.4.2  1888.4   665.8   504.4   492.0   35.7  27.5  3191.4   3585.8  2146.7
T7     v7-A9     1200  4.1.2   731.1   273.6   253.0   252.8   28.0   5.0  1185.2   2383.4  1192.1
T11    v7-A15    1700  4.2.2   907.4   363.3   327.1   303.1   33.6   6.3  1506.9   2476.5  1122.6
T21    QU-800    2150  4.4.3  1973.8   679.6   648.4   525.6   44.7  21.9  3516.7   3147.2  1567.7
T22    v8-A53    1300  5.0.2   834.7   348.9   312.7   310.9   36.7   5.4  1556.7   1867.2   570.5
BS1    Emul Phen 3000  2.3.4  2992.3   897.2   707.4   623.6   76.3  37.8  3705.9   4423.1  2281.5
BS2    Emul i7   3700  4.4.2  5086.9  1066.7  1120.0   963.2  166.4  56.4  6300.0  11436.5  3786.9

ARM/Intel 64 Bit Version

T22    v8-A53    1300  5.0.2  1494.2   347.1   307.0   305.9   37.5  20.6  1552.2   1863.7  1239.1
The Dhrystone integer benchmark produces a performance rating in Vax MIPS (AKA DMIPS). Further details of the Dhrystone benchmark, and results from Windows and Linux based PCs, can be found in Dhrystone Results.htm. The ratio MIPS/MHz is often quoted, but this depends on compiler optimisation (or over-optimisation).
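The two ratings can be reproduced with simple arithmetic. The sketch below assumes the commonly quoted VAX 11/780 reference score of 1757 Dhrystones per second for 1 Vax MIPS; the function names are hypothetical and are not taken from the benchmark source.

```c
#include <assert.h>

/* Commonly quoted reference: a VAX 11/780 scores 1757 Dhrystones
   per second, defined as 1 Vax MIPS (DMIPS). */
#define VAX_DHRYSTONES_PER_SEC 1757.0

/* Convert a raw Dhrystones per second score to Vax MIPS. */
double vax_mips(double dhrystones_per_sec)
{
    assert(dhrystones_per_sec >= 0.0);
    return dhrystones_per_sec / VAX_DHRYSTONES_PER_SEC;
}

/* MIPS/MHz ratio, as listed in the last column of the results. */
double mips_per_mhz(double mips, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return mips / cpu_mhz;
}
```

For example, mips_per_mhz(1840.0, 1866.0) gives approximately 0.99, the figure listed for A1 with the original ARM version.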
The new version, with native Intel code, produces a 33% gain in performance on the Atom based A1, with the Phenom based BlueStacks emulator (BS1) 9.2 times faster. ARM Cortex speeds are somewhat slower.
August 2015 - T22 included with 64 bit CPU and 64 bit Android 5.0. 64 bit operation produced a significant improvement (2569 versus 1423 Vax MIPS).
System ARM        MHz  Android    Vax   MIPS
See                              MIPS   /MHz

Original ARM Version

A1     Z3745     1866  4.4.2     1840   0.99
T7     v7-A9     1200  4.1.2     1610   1.34
T22    v8-A53    1300  5.0.2     1683   1.29
T11    v7-A15    1700  4.2.2     3189   1.88
T21    QU-800    2150  4.4.3     3854   1.79
BS1    Emul Phen 3000  2.3.4      484   0.16
BS2    Emul i7   3700  4.4.2      746   0.20

ARM/Intel 32 Bit Version

A1     Z3745     1866  4.4.2     2451   1.31
T7     v7-A9     1200  4.1.2     1317   1.10
T22    v8-A53    1300  5.0.2     1423   1.09
T11    v7-A15    1700  4.2.2     2551   1.50
T21    QU-800    2150  4.4.3     3319   1.54
BS1    Emul Phen 3000  2.3.4     4464   1.49
BS2    Emul i7   3700  4.4.2     8841   2.39

ARM/Intel 64 Bit Version

T22    v8-A53    1300  5.0.2     2569   1.98
The Linpack benchmark speed is measured in MFLOPS, officially for double precision floating point calculations. A version was produced using NEON functions, which only provide single precision operation. So, for comparison purposes, an available C code option, to define single precision data, was used to produce a new version and this has usually led to a higher MFLOPS speed. Results from various hardware and software platforms can be found in Linpack Results.htm.
Performance of the Linpack benchmark is almost entirely dependent on the calculation x[i]=x[i]+c*y[i]. Later ARM processors support vfpv4, which adds fused multiply-accumulate instructions, possibly doubling performance. Compilation of these seems to have appeared in compiler gcc 4.8. Tablet T11 has vfpv4 but T7 does not - See System Details. The result is that the T11 DP benchmark runs much faster on the recompiled code (same with T21). The Intel native code compilation, running on A1, was more than twice as fast as the original, produced by gcc 4.4. Some of the gain is due to using the new compiler, still with conversion from ARM instructions (the GCC 4.8 ARM Version result), and the rest is due to native Intel code.
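The critical calculation can be sketched as a short C loop. This is an illustration of the statement quoted above, not the benchmark's own source; the function names axpy and axpy_mflops are hypothetical. Each element needs one multiply and one add, which is why a fused multiply-accumulate instruction can, at best, halve the arithmetic instruction count.

```c
#include <assert.h>
#include <stddef.h>

/* x[i] = x[i] + c*y[i]: one multiply and one add per element. A
   compiler may emit a single fused multiply-accumulate for each
   element when the target supports it (for example vfpv4). */
void axpy(size_t n, double c, const double *y, double *x)
{
    size_t i;
    assert(x != NULL && y != NULL);
    for (i = 0; i < n; i++)
        x[i] = x[i] + c * y[i];
}

/* MFLOPS for `passes` calls of axpy over n elements that took
   `seconds`, counting two floating point operations per element. */
double axpy_mflops(size_t n, size_t passes, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)n * (double)passes / seconds / 1.0e6;
}
```

Changing double to float throughout gives the single precision form used for the SP comparisons.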
August 2015 - T22 included with 64 bit CPU and 64 bit Android 5.0. 64 bit operation increased speed by almost 2 times with double precision calculations and 2.7 times at single precision.
September 2015 - New best score from P33, with 2 GHz Qualcomm Snapdragon 810 (Cortex-A57) and Android 5.0.2, with SP speed of 1277 MFLOPS at 64 bits.
BlueStacks is particularly fast running with the native Intel version.
System ARM        MHz  Android LinpackDP LinpackSP
See                               MFLOPS    MFLOPS

Original ARM Version

A1     Z3745     1866  4.4.2      168.16    296.63
T7     v7-A9     1200  4.1.2      151.05    201.30
T22    v8-A53    1300  5.0.2      156.70    184.09
T11    v7-A15    1700  4.2.2      459.17    803.04
T21    QU-800    2150  4.4.3      389.52    751.95
BS1    Emul Ph   3000  2.3.4       16.61     26.53
BS2    Emul i7   3700  4.4.2      138.85    227.42

GCC 4.8 ARM Version

A1     Z3745     1866  4.4.2      282.29

ARM/Intel 32 Bit Version

A1     Z3745     1866  4.4.2      362.63    408.87
T7     v7-A9     1200  4.1.2      159.34    199.84
T22    v8-A53    1300  5.0.2      172.28    180.64
T11    v7-A15    1700  4.2.2      826.36    952.88
T21    QU-800    2150  4.4.3      629.92    790.83
BS1    Emul Ph   3000  2.3.4     1808.57   1474.70
BS2    Emul i7   3700  4.4.2     3390.95   1886.36

ARM/Intel 64 Bit Version

T22    v8-A53    1300  5.0.2      340.18    482.43
P33    QU-810    2000  5.0.2                1277.76
The Livermore Loops comprise 24 kernels from numerical applications, with speeds calculated in MFLOPS. A summary is also produced, with maximum, minimum and various mean values, geometric mean being the official average. As with the other benchmarks, details and results are provided, in this case in Livermore Loops Results.htm.
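The summary statistics can be illustrated with a short routine. This is a sketch under stated assumptions, not the benchmark's code: the type LoopSummary and the function summarise are hypothetical names, and the geometric mean is found by bisection purely to keep the example free of any maths library dependency.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double max, min, average, geomean, harmean;
} LoopSummary;

/* base raised to a whole-number power by repeated multiplication. */
static double int_power(double base, size_t n)
{
    double r = 1.0;
    while (n-- > 0)
        r *= base;
    return r;
}

/* Summarise n positive MFLOPS results (n is 24 for the Livermore Loops). */
LoopSummary summarise(const double *mflops, size_t n)
{
    LoopSummary s;
    double sum = 0.0, inv_sum = 0.0, product = 1.0, lo, hi;
    size_t i;
    int step;

    assert(mflops != NULL && n > 0);
    s.max = s.min = mflops[0];
    for (i = 0; i < n; i++) {
        double v = mflops[i];
        assert(v > 0.0);
        if (v > s.max) s.max = v;
        if (v < s.min) s.min = v;
        sum += v;
        inv_sum += 1.0 / v;
        product *= v;
    }
    s.average = sum / (double)n;      /* arithmetic mean */
    s.harmean = (double)n / inv_sum;  /* harmonic mean   */

    /* Geometric mean is the n-th root of the product. It always lies
       between min and max, so bisect that interval. */
    lo = s.min;
    hi = s.max;
    for (step = 0; step < 200; step++) {
        double mid = 0.5 * (lo + hi);
        if (int_power(mid, n) < product)
            lo = mid;
        else
            hi = mid;
    }
    s.geomean = 0.5 * (lo + hi);
    return s;
}
```

Production code would more usually compute the geometric mean as the exponential of the mean of the logarithms. For any set of positive results the harmonic mean is no greater than the geometric mean, which is no greater than the average, matching the ordering of the columns in the results.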
This time, the new compiler produces some slower results on tablet T11. The Atom based A1, running native code, is faster than T11 on average, and 2.56 times faster than when running via the Houdini ARM conversion layer. T21 MFLOPS can also be different.
August 2015 - T22 included with 64 bit CPU and 64 bit Android 5.0. Here, 64 bit/32 bit geometric mean performance ratio is 1.5.
System ARM        MHz  Android
See                                Max Average Geomean Harmean    Min

Original ARM Version

A1     Z3745     1866  4.4.2     535.8   201.9   172.4   146.7   48.8
T7     v7-A9     1200  4.1.2     391.9   202.1   181.3   160.9   68.1
T11    v7-A15    1700  4.2.2    1252.8   476.0   375.8   288.8   90.8
T21    QU-800    2150  4.4.3    1075.5   437.1   356.7   284.4  100.3
BS2    Emul i7   3700  4.4.2     321.7   134.4   118.1   101.8   29.3

ARM/Intel 32 Bit Version

A1     Z3745     1866  4.4.2    1031.2   480.0   429.8   378.6  154.7
T22    v8-A53    1300  5.0.2     393.4   188.3   158.3   124.6   27.1
T7     v7-A9     1200  4.1.2     396.6   207.6   175.6   136.1   26.8
T11    v7-A15    1700  4.2.2    1411.4   471.2   342.1   219.5   34.3
T21    QU-800    2150  4.4.3    1159.4   446.9   356.0   280.3  112.3
BS2    Emul i7   3700  4.4.2    5422.6  2232.1  1784.4  1372.7  350.5

ARM/Intel 64 Bit Version

T22    v8-A53    1300  5.0.2     772.2   265.9   232.5   206.3   97.8
The MemSpeed benchmark measures data reading speeds in MegaBytes per second, carrying out calculations on arrays of cache and RAM data, sized 2 x 8 KB to 2 x 32 MB. Calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m], using double and single precision floating point, and x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] with integers. Million Floating Point Operations Per Second (MFLOPS) speed can be calculated by dividing double precision MB/second by 8 and 16, for the two tests, and single precision speeds by 4 and 8. Assembly listings for integer tests show that Millions of Instructions Per Second (MIPS) can be found by multiplying MB/second by 0.78 with 2 adds and 0.66 for the other test. Cache sizes are indicated by varying performance as memory usage changes. For more details and further results see MemSpeed in Android Benchmarks.htm.
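The MFLOPS divisors follow from the bytes read per floating point operation: each element reads one x word and one y word, and performs two operations (multiply and add) in the first test or one (add) in the second. A minimal sketch of that arithmetic, with a hypothetical function name, is:

```c
#include <assert.h>

/* Convert a MemSpeed reading speed in MB/second to MFLOPS.
   bytes_per_word:    8 for double precision, 4 for single precision
   flops_per_element: 2 for x[m]=x[m]+s*y[m], 1 for x[m]=x[m]+y[m] */
double mflops_from_mbps(double mbytes_per_sec,
                        int bytes_per_word, int flops_per_element)
{
    double bytes_per_flop;
    assert(bytes_per_word > 0 && flops_per_element > 0);
    /* Two words (x and y) are read for every element processed. */
    bytes_per_flop = 2.0 * bytes_per_word / flops_per_element;
    return mbytes_per_sec / bytes_per_flop;
}
```

This gives divisors of 8 and 16 for double precision and 4 and 8 for single precision. As an example, 4635 MB/second at single precision with the multiply and add test converts to 1158.75 MFLOPS.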
The native ARM/Intel results, on the Intel Atom based A1, averaged 44% faster via L1 cache data, 27% using L2 and 14% from RAM. Results on tablets T7, T11 and T21 showed some gains and some losses. The benefit of native Intel code is particularly demonstrated by results using the BlueStacks App Player, running on an Intel Core i7 based PC.
August 2015 - Results provided for 64 bit T22. The 64 bit compilation was nearly twice as fast as the 32 bit version with double precision floating point calculations, using cached data, and provided a 33% increase from RAM. Corresponding single precision ratios were 2.6 and 2.0 times and integer ratios were 2.2 and 1.5.
#################### A1 Original #######################

A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

Android MemSpeed Benchmark 1.1 01-Feb-2015 10.06

        Reading Speed in MBytes/Second
Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
KBytes   Dble   Sngl    Int   Dble   Sngl    Int

    16   2773   1745   2821   5993   3274   3094 L1
    32   3088   1690   2451   4849   2769   2896
    64   3066   1694   2245   3883   2434   2568 L2
   128   3084   1695   2261   3886   2466   2524
   256   3158   1732   2285   3964   2264   2176
   512   2666   1721   2295   3959   2505   2561
  1024   2938   1659   2163   3567   2356   2443
  4096   2775   1653   2123   3055   2307   2395 RAM
 16384   2827   1659   2121   3208   2321   2411
 65536   2840   1661   2112   3248   2314   2406

Total Elapsed Time 10.8 seconds

#################### A1 ARM-Intel ######################

ARM/Intel MemSpeed Benchmark 1.1 23-Apr-2015 11.46

        Reading Speed in MBytes/Second
Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
KBytes   Dble   Sngl    Int   Dble   Sngl    Int

    16   3287   1859   4560   9789   4688   7316
    32   3233   1856   3807   6633   3990   4030
    64   3304   1860   2965   4457   2996   3894
   128   3303   1855   3006   4463   3113   3992
   256   3306   1860   2978   4463   3093   3946
   512   3307   1862   2964   4377   3097   3958
  1024   3031   1778   2766   3993   2867   3472
  4096   2863   1776   2692   3129   2763   3046
 16384   2857   1776   2702   3063   2768   3050
 65536   2865   1765   2702   3176   2782   3087

Total Elapsed Time 10.1 seconds

#################### T11 Original #####################

T11 Samsung EXYNOS 5250 2000 MHz Cortex-A15, Android 4.2.2
Measured 1700 MHz

Android MemSpeed Benchmark 1.1 09-Aug-2013 17.04

        Reading Speed in MBytes/Second
Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
KBytes   Dble   Sngl    Int   Dble   Sngl    Int

    16   7296   4159   3513   9375   5453   6211 L1
    32   7253   4540   3882   7364   4873   4839
    64   6902   4265   3878   7026   4373   4274 L2
   128   6735   4032   2480   4005   2797   3288
   256   5859   3775   2192   4527   3263   3676
   512   5795   3781   3568   6282   3819   3818
  1024   2609   1757   1754   2607   1805   1825
  4096   1614   1422   1471   1654   1342   1441 RAM
 16384   1624   1412   1474   1642   1336   1443
 65536   1617   1408   1479   1368   1321   1423

Total Elapsed Time 10.7 seconds

#################### T11 ARM-Intel ####################

ARM/Intel MemSpeed Benchmark 1.1 23-Apr-2015 12.26

        Reading Speed in MBytes/Second
Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
KBytes   Dble   Sngl    Int   Dble   Sngl    Int

    16   6540   4359   4580  10119   6292   6502
    32   8185   5132   4682   8729   4622   4465
    64   5770   3530   3473   5780   3447   3782
   128   5311   3386   3475   5225   3441   3451
   256   5667   3642   3678   5805   3643   3726
   512   5047   3318   3334   4869   3303   3337
  1024   2015   1469   1423   2050   1452   1386
  4096   1535   1322   1342   1598   1381   1385
 16384   1505   1379   1406   1584   1387   1384
 65536   1509   1306   1332   1585   1387   1382

Total Elapsed Time 10.8 seconds

#################### T21 Original #####################

T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4

Android MemSpeed Benchmark 1.1 02-Jun-2015 11.01

        Reading Speed in MBytes/Second
Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
KBytes   Dble   Sngl    Int   Dble   Sngl    Int

    16   8922   4635   3566  12412   5648   3774 L1
    32   5116   3542   2773   7594   4827   3657 L2
    64   5174   3393   2684   5652   3757   3130
   128   5286   3387   2648   5443   3758   3194
   256   4937   3446   2889   7469   4624   3449
   512   4941   3459   2915   7452   4566   3724
  1024   4837   3449   2848   7065   4455   3722
  4096   2840   2606   2343   2581   2458   2567 RAM
 16384   2606   2423   2232   2395   2238   2338
 65536   2653   2453   2257   2457   2312   2420

Total Elapsed Time 9.7 seconds

Maximum SP MFLOPS 1159
Integer MIPS 2802

#################### T21 ARM-Intel ####################

ARM/Intel MemSpeed Benchmark 1.1 02-Jun-2015 11.27

        Reading Speed in MBytes/Second
Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
KBytes   Dble   Sngl    Int   Dble   Sngl    Int

    16   8074   4831   2603  11252   5065   3892 L1
    32   5302   4138   3709   7252   4985   3693 L2
    64   4801   3510   2832   5739   3684   3015
   128   4502   3783   3577   5991   3914   3547
   256   4907   3913   3934   6876   4280   4056
   512   4686   3883   3921   6236   4215   4060
  1024   4716   3808   3823   6131   4185   3942
  4096   2691   2603   2679   2249   2634   2709 RAM
 16384   2227   2223   2420   1798   2191   2445
 65536   2099   2106   2306   1738   2040   2346

Total Elapsed Time 9.9 seconds

Maximum SP MFLOPS 1207
Integer MIPS 2898

###################### T22 32 Bit ######################

ARM/Intel MemSpeed Benchmark 1.2 05-Aug-2015 17.16
Compiled for 32 bit ARM v7a

        Reading Speed in MBytes/Second
Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
KBytes   Dble   Sngl    Int   Dble   Sngl    Int

    16   1940    971   1693   2470   1278   2084 L1
    32   1879    955   1676   2378   1255   1967
    64   1801    938   1615   2254   1218   1912 L2
   128   1706    941   1620   2279   1224   1872
   256   1818    935   1570   2291   1155   1875
   512   1633    884   1451   2008   1132   1704
  1024   1276    781   1181   1454    938   1324 RAM
  4096   1335    808   1260   1533   1010   1386
 16384   1342    813   1270   1487   1013   1419
 65536   1346    809   1274   1546   1031   1252

Total Elapsed Time 11.7 seconds

###################### T22 64 Bit ######################

ARM/Intel MemSpeed Benchmark 1.2 05-Aug-2015 17.29
Compiled for 64 bit ARM v8a

        Reading Speed in MBytes/Second
Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
KBytes   Dble   Sngl    Int   Dble   Sngl    Int

    16   4092   2198   3951   5293   3611   4408
    32   3753   2496   3630   4651   3300   3992
    64   3407   2388   3368   3715   3023   3677
   128   3496   2462   3521   4137   3139   3844
   256   3535   2481   3573   4199   3322   3911
   512   3054   2248   3126   3556   2548   3372
  1024   1714   1704   2029   2069   1854   2099
  4096   1832   1595   1841   1914   1780   1897
 16384   1844   1601   1850   1925   1798   1891
 65536   1859   1608   1837   1921   1795   1812

Total Elapsed Time 10.2 seconds

##################### T7 Original ######################

T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 1 GB DDR3 RAM
Measured 1200 MHz

Android MemSpeed Benchmark 17-Oct-2012 20.19

        Reading Speed in MBytes/Second
Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
KBytes   Dble   Sngl    Int   Dble   Sngl    Int

    16   1735    888   2456   2726   1364   2818 L1
    32   1448    760   1474   1700   1039   1648
    64   1318    719   1290   1468    952   1385 L2
   128   1279    715   1289   1443    944   1336
   256   1268    714   1279   1435    943   1313
   512   1158    691   1204   1321    892   1228
  1024    729    553    735    772    632    742
  4096    445    392    425    442    421    439 RAM
 16384    435    390    428    435    412    431
 65536    445    404    393    450    432    449

Total Elapsed Time 12.2 seconds

#################### T7 ARM-Intel #####################

ARM/Intel MemSpeed Benchmark 1.1 25-Apr-2015 12.24

        Reading Speed in MBytes/Second
Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
KBytes   Dble   Sngl    Int   Dble   Sngl    Int

    16   1856   1019   2537   2913   1459   2544
    32   1416    832   1327   1508    920   1345
    64   1286    779   1198   1418    908   1296
   128   1282    781   1195   1424    912   1305
   256   1278    774   1190   1433    878   1298
   512   1197    752   1122   1340    862   1216
  1024    833    626    822    903    695    857
  4096    463    420    456    463    440    459
 16384    459    426    453    455    435    458
 65536    463    430    411    462    443    452

Total Elapsed Time 11.5 seconds

#################### BS2 Original ######################

BS2 BlueStacks Emulator on 3.7 GHz Core i7 via Windows 8

Android MemSpeed Benchmark 1.1 25-Apr-2015 12.58

        Reading Speed in MBytes/Second
Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
KBytes   Dble   Sngl    Int   Dble   Sngl    Int

    16   1523   1777    731   1406   1939   1163
    32   1306   1641    787   1641   1939   1023
    64   1524   1230    511   1422   1662   1143
   128   1524   1707    787   1641   1641    948
   256   1456   1670    853   1525   1708   1094
   512   1527   1642    853   1642   1779    948
  1024   1528   1646    853   1646   1713   1094
  4096   1535   1809    853   1809   1945   1194
 16384   1638   1638    819   1774   1872   1170
 65536   1404   1747    819   1747   1820   1156

Total Elapsed Time 12.5 seconds

#################### BS2 ARM-Intel #####################

ARM/Intel MemSpeed Benchmark 1.1 25-Apr-2015 12.47

        Reading Speed in MBytes/Second
Memory  x[m]=x[m]+s*y[m] Int+   x[m]=x[m]+y[m]
KBytes   Dble   Sngl    Int   Dble   Sngl    Int

    16  35555   9309  14065  30476  19393  19394
    32  30476  19394  14222  35555  18518  17066
    64  26666  16623  17778  30476  18286  16410
   128  26667  17778  17778  29092  18286  19051
   256  25098  16675  16327  27354  19395  18825
   512  25100  13063  12190  26666  19395  17793
  1024  24631  17589  16415  24623  16415  16415
  4096  24638  17783  16644  24638  17093  17783
 16384  14745  12639  11000  14000  13611  12834
 65536  14043  11359  12336  15490  10649  10649

Total Elapsed Time 12.6 seconds
The BusSpeed benchmark is designed to identify reading data in bursts over buses. The program starts by reading a word (4 bytes) with an address increment of 32 words (128 bytes) before reading another word. The increment is reduced by half on successive tests, until all data is read. On reading data from RAM, 64 byte bursts are typically used. Then, measured reading speed reduces from a maximum, when all data is read, to a minimum on using 16 word increments (64 bytes). Potential maximum speed can be estimated by multiplying this minimum value by 16. With this burst rate, measured speeds at 32 word and 16 word increments are likely to be the same. Cache sizes are indicated by varying speed as memory use changes. Note that, with the smallest L1 cache demands, measured speed can be low due to overheads when reading little data. For more details and further results see BusSpeed in Android Benchmarks.htm.
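The access pattern described above can be sketched as a strided read loop. This is an illustration only, with hypothetical function names; the real benchmark's arithmetic and loop unrolling are not reproduced here. The second function applies the stated rule of multiplying the 16 word increment speed by 16, which assumes 64 byte bursts.

```c
#include <assert.h>
#include <stddef.h>

/* Read every `stride`-th 4 byte word of the array. The sum is
   returned so that the reads cannot be optimised away.
   stride = 32 is the Inc32 test, stride = 1 is Read All. */
unsigned int read_strided(const unsigned int *data,
                          size_t words, size_t stride)
{
    unsigned int sum = 0;
    size_t i;
    assert(data != NULL && stride > 0);
    for (i = 0; i < words; i += stride)
        sum += data[i];
    return sum;
}

/* Estimated potential maximum MB/second from RAM, assuming 64 byte
   bursts: only 1 word in 16 is used at 16 word increments, so the
   bus is moving 16 times the measured data. */
double estimated_max_mbps(double inc16_mbps)
{
    assert(inc16_mbps >= 0.0);
    return inc16_mbps * 16.0;
}
```

A measured speed would be obtained by timing repeated calls and dividing the bytes actually read (4 x words / stride per pass) by the elapsed time.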
The native code ARM/Intel version provided no real performance improvement on tablet A1, with the Atom Z3745 CPU. In ARM mode, there was also little difference on tablets T21, T11 and T7. The main reason for these similarities is that the long sequence of identical C arithmetic statements is easy to convert for efficient processing. BlueStacks speeds on the Intel CPU were again outstanding.
August 2015 - Results provided for 64 bit T22. Reading all data, 64/32 bit comparison ratios were up to 2.0 from L1 cache, 1.5 from L2 cache and 1.25 from RAM.
#################### A1 Original #######################

A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

Android BusSpeed Benchmark 1.1 v7 21-Dec-2014 16.06

   Reading Speed 4 Byte Words in MBytes/Second
Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
KBytes  Words  Words  Words  Words  Words    All

    16   4178   3473   6270   6713   6759   6869 L1
    32   1420   1529   2252   2686   3702   5108
    64   1385   1498   2276   2629   3657   5108 L2
   128   1394   1542   2278   2614   3640   5092
   256   1410   1576   2258   2607   3259   5110
   512   1417   1574   2274   2602   3700   5119
  1024    349    428    888   1431   2848   4306 RAM
  4096    215    265    593   1181   2289   3891
 16384    210    266    596   1181   2278   3897
 65536    220    272    600   1193   2346   3886

Total Elapsed Time 5.1 seconds

#################### A1 ARM-Intel ######################

   Reading Speed 4 Byte Words in MBytes/Second
Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
KBytes  Words  Words  Words  Words  Words    All

    16   4845   5705   6403   6926   7094   7167 L1
    32   1407   1716   2255   2646   3713   5094
    64   1395   1703   2257   2689   3754   4843 L2
   128   1283   1571   2108   2620   3671   5135
   256   1416   1753   2288   2679   3687   5178
   512   1439   1372   2251   2510   3679   5183
  1024    350    409    942   1696   2792   4403
  4096    213    253    564   1188   2173   3631 RAM
 16384    219    259    600   1189   2330   3920
 65536    218    259    599   1102   2323   3716

Total Elapsed Time 5.1 seconds

#################### T11 Original #####################

T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
Measured 1.7 GHz
2 GB DDR3-1600 RAM, dual channel, 12.8 GB/sec

Android BusSpeed Benchmark 1.1 v7 09-Aug-2013 17.07

   Reading Speed 4 Byte Words in MBytes/Second
Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
KBytes  Words  Words  Words  Words  Words    All

    16   3193   3451   4412   5272   5389   6191 L1
    32   1298   1558   1990   3478   4264   4420
    64    804    928   1209   2442   3263   3426 L2
   128    784    904   1175   2321   3148   3333
   256    780    908   1181   2336   3142   3327
   512    788    907   1165   2312   3120   3300
  1024    360    387    384    803   1348   1744
  4096    145    146    194    507    648   1378 RAM
 16384    141    136    190    507    638   1373
 65536    142    141    191    506    643   1371

Total Elapsed Time 5.3 seconds

#################### T11 ARM-Intel ####################

ARM/Intel BusSpeed Benchmark 1.1 v7 23-Apr-2015 12.15

   Reading Speed 4 Byte Words in MBytes/Second
Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
KBytes  Words  Words  Words  Words  Words    All

    16   2085   3208   4055   4553   5272   5758
    32   1282   1811   2498   4182   4867   5163
    64    600    864   1309   2974   3504   3841
   128    614    892   1310   3027   3500   3826
   256    614    892   1337   3050   3509   3828
   512    618    888   1319   3042   3382   3811
  1024    425    479    444   1244   1803   2291
  4096    146    146    191    590   1050   1751
 16384    141    139    186    585   1039   1725
 65536    139    139    187    585   1039   1721

Total Elapsed Time 5.3 seconds

#################### T21 Original #####################

T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4

Android BusSpeed Benchmark 1.1 v7 04-Jun-2015 17.00

   Reading Speed 4 Byte Words in MBytes/Second
Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
KBytes  Words  Words  Words  Words  Words    All

    16   1382   1350   3122   4300   4938   5283 L1
    32   1106   1118   2026   2637   3786   5210 L2
    64   1064   1118   2058   2679   3820   5251
   128   1123   1170   2081   2688   3669   4166
   256   1121   1196   2109   2623   3873   3429
   512    940   1127   2050   2684   3777   4795
  1024    951   1124   2038   2655   3759   4950
  4096    239    375    472    806   1486   2679 RAM
 16384    239    370    464    806   1476   2656
 65536    239    368    495    854   1537   2792

Total Elapsed Time 5.0 seconds

#################### T21 ARM-Intel ####################

ARM/Intel BusSpeed Benchmark 1.1 v7 04-Jun-2015 17.00

   Reading Speed 4 Byte Words in MBytes/Second
Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
KBytes  Words  Words  Words  Words  Words    All

    16   1328   1442   2797   4291   4699   5685 L1
    32   1165   1100   1933   2848   3603   5844 L2
    64   1147   1055   2007   2846   3586   5890
   128   1181   1136   2008   2711   3600   5878
   256   1185   1126   2018   2716   3568   5873
   512   1022   1026   1805   2525   3378   5611
  1024    796    843   1584   2202   3088   5053
  4096    199    294    431    657   1166   2409 RAM
 16384    200    299    430    659   1167   2408
 65536    205    301    436    668   1173   2380

Total Elapsed Time 5.2 seconds

###################### T22 32 Bit ######################

T22, ARM Cortex-A53 1300 MHz, Android 5.0.2

ARM/Intel BusSpeed Benchmark 1.2 06-Aug-2015 10.57
Compiled for 32 bit ARM v7a

   Reading Speed 4 Byte Words in MBytes/Second
Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
KBytes  Words  Words  Words  Words  Words    All

    16    874    932   1814   2302   2355   2263 L1
    32    758    803   1309   1820   2323   2386
    64    653    671   1203   1741   2206   2332 L2
   128    603    620   1107   1693   2222   2351
   256    574    589   1075   1711   2211   2327
   512    332    372    681   1075   1863   2120
  1024    137    193    371    578   1322   2129 RAM
  4096    172    179    351    567   1151   2126
 16384    172    178    351    504   1117   2136
 65536    172    177    349    478    882   2129

Total Elapsed Time 5.3 seconds

###################### T22 64 Bit ######################

T22, ARM Cortex-A53 1300 MHz, Android 5.0.2

ARM/Intel BusSpeed Benchmark 1.2 06-Aug-2015 11.02
Compiled for 64 bit ARM v8a

   Reading Speed 4 Byte Words in MBytes/Second
Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
KBytes  Words  Words  Words  Words  Words    All

    16   3188   3635   3937   4327   4372   4462
    32   1478   1607   2246   3382   3853   4144
    64    600    622   1163   2011   2972   3585
   128    558    575   1056   1889   2892   3525
   256    538    550   1028   1826   2837   3260
   512    371    425    813   1490   2403   3202
  1024    136    196    382    728   1423   2750
  4096    170    177    346    669   1340   2652
 16384    169    174    341    678   1352   2663
 65536    168    174    341    676   1347   2611

Total Elapsed Time 5.2 seconds

##################### T7 Original ######################

T7, ARM Cortex-A9 1200 MHz, Android 4.1.2, 1 GB DDR3 RAM

Android BusSpeed Benchmark 19-Oct-2012 17.29

   Reading Speed 4 Byte Words in MBytes/Second
Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
KBytes  Words  Words  Words  Words  Words    All

    16   2723   2420   3044   3364   3499   3500 L1
    32   1054   1087   1061   1382   1565   2145
    64    436    433    419    652    751   1160 L2
   128    345    337    337    542    633    943
   256    329    309    322    522    614    961
   512    339    299    311    506    574    937
  1024    170    168    180    269    349    629
  4096     59     55     84    127    176    338 RAM
 16384     56     56     83    125    173    335
 65536     56     56     82    125    174    334

Total Elapsed Time 5.7 seconds

#################### T7 ARM-Intel #####################

ARM/Intel BusSpeed Benchmark 1.1 v7 25-Apr-2015 12.30

   Reading Speed 4 Byte Words in MBytes/Second
Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
KBytes  Words  Words  Words  Words  Words    All

    16   2940   3344   3625   3866   3862   3893
    32    698    707    682   1071   1208   1826
    64    448    477    465    726    851   1357
   128    367    355    292    542    657   1070
   256    334    344    341    546    651   1059
   512    326    336    336    531    629   1025
  1024    169    175    197    309    411    749
  4096     58     58     83    131    191    395
 16384     56     57     83    129    189    392
 65536     56     48     82    129    187    388

Total Elapsed Time 5.6 seconds

#################### BS2 Original ######################

BS2 BlueStacks Emulator on 3.7 GHz Core i7 via Windows 8

Android BusSpeed Benchmark 1.1 v7 25-Apr-2015 12.57

   Reading Speed 4 Byte Words in MBytes/Second
Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
KBytes  Words  Words  Words  Words  Words    All

    16   1428   1280   1280   1422   1333   1489
    32   1428   1280   1280   1365   1706   1602
    64   1066   1481   1600   1463   1463   1707
   128   1666   1365   1489   1463   1463   1833
   256   1429   1706   1293   1425   1466   1823
   512   1333   1463   1603   1425   1468   1565
  1024   1280   1463   1710   1468   1565   1730
  4096   1282   1367   1475   1730   1310   1617
 16384    412    943    958   1258   1398   1677
 65536    449    958   1078   1304   1677   1677

Total Elapsed Time 6.8 seconds

#################### BS2 ARM-Intel #####################

ARM/Intel BusSpeed Benchmark 1.1 v7 25-Apr-2015 12.49

   Reading Speed 4 Byte Words in MBytes/Second
Memory  Inc32  Inc16   Inc8   Inc4   Inc2   Read
KBytes  Words  Words  Words  Words  Words    All

    16  13333  12800  22222  13675  18285  14224
    32  10666  10666  12190  21333  21367  21334
    64   6666   6666  10666  13333  21333  21337
   128   6826   6400  10240  17067  21335  18290
   256   4266   5120   8533  13654  18290  20483
   512   2667   2667   5335   9103  16386  20515
  1024   2560   2560   5692   9105  15608  22806
  4096   2673   2752   5470   9175  17126  21880
 16384    741    943   2070   4404   8808  14680
 65536    542    838   1572   3595   6710  11930

Total Elapsed Time 6.5 seconds
The RandMem benchmark carries out four tests at increasing data sizes to produce data transfer speeds in MBytes per second from caches and memory. Serial and random address selections are employed, using the same program structure, with read and read/write tests using 32 bit integers. The main purpose is to demonstrate how much slower performance can be through using random access. Here, speed can be considerably influenced by reading and writing in bursts, where much of the data is not used, and by the size of preceding caches. For more details and further results see RandMem in Android Benchmarks.htm.
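One way to keep the same program structure for serial and random tests is to read through an index array, where only the contents of the index differ. The sketch below assumes that approach; it is not the RandMem source, and the function names and the simple pseudo-random generator are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same loop for both tests: data is read via an index array. */
unsigned int read_indexed(const unsigned int *data,
                          const unsigned int *index, size_t words)
{
    unsigned int sum = 0;
    size_t i;
    assert(data != NULL && index != NULL);
    for (i = 0; i < words; i++)
        sum += data[index[i]];
    return sum;
}

/* Serial selection: index[i] = i. */
void fill_serial(unsigned int *index, size_t words)
{
    size_t i;
    assert(index != NULL);
    for (i = 0; i < words; i++)
        index[i] = (unsigned int)i;
}

/* Random selection: shuffle the serial index (Fisher-Yates) using a
   small linear congruential generator, so every word is still
   visited exactly once. */
void fill_random(unsigned int *index, size_t words, unsigned int seed)
{
    size_t i;
    fill_serial(index, words);
    if (words < 2)
        return;
    for (i = words - 1; i > 0; i--) {
        size_t j;
        unsigned int tmp;
        seed = seed * 1664525u + 1013904223u;
        j = (size_t)(seed >> 8) % (i + 1);
        tmp = index[i];
        index[i] = index[j];
        index[j] = tmp;
    }
}
```

Because both orders visit every word once, the two tests transfer the same volume of data and any difference in timing comes from the access pattern.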
On the Atom based A1 tablet, the native code ARM/Intel version results showed gains of around 25% on all reading tests, but no difference with writing and reading. The same benchmark, running on tablets T11 and T21, showed some improvement using cache based data, but there was variability in comparative performance on T7.
August 2015 - Results provided for 64 bit T22 showing 32 bit and 64 bit versions were not that different overall, each one slightly faster on some tests.
#################### A1 Original #######################

A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

Android RandMem Benchmark 1.1 01-Feb-2015 10.12

 MBytes/Second Transferring 4 Byte Words
Memory  Serial.......  Random.......
KBytes   Read  Rd/Wrt   Read  Rd/Wrt

    16   3434    5064   3462    5113 L1
    32   2833    4042   2652    3645
    64   2837    4058   2068    2561 L2
   128   2822    4041   1809    2205
   256   2828    4040   1435    1755
   512   2816    3997   1245    1456
  1024   2578    3256    379     445
  4096   2412    1946    209     268 RAM
 16384   2485    2039    179     217
 65536   2457    2041    140     170

Total Elapsed Time 11.8 seconds

#################### A1 ARM-Intel ######################

ARM/Intel RandMem Benchmark 1.1 23-Apr-2015 17.27

 MBytes/Second Transferring 4 Byte Words
Memory  Serial.......  Random.......
KBytes   Read  Rd/Wrt   Read  Rd/Wrt

    16   4291    5626   4584    5630
    32   3217    3792   3492    3783
    64   3677    4253   2629    2644
   128   3666    4241   2299    2289
   256   3688    3930   1829    1850
   512   3682    4189   1522    1592
  1024   3285    3558    562     667
  4096   2999    2007    272     274
 16384   3019    2065    210     220
 65536   2989    2068    141     186

Total Elapsed Time 8.8 seconds

#################### T11 Original #####################

T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
Measured 1.7 GHz

Android RandMem Benchmark 1.1 13-Aug-2013 17.29

 MBytes/Second Transferring 4 Byte Words
Memory  Serial.......  Random.......
KBytes   Read  Rd/Wrt   Read  Rd/Wrt

    16   2881    2478   3388    3650 L1
    32   4301    2968   3197    3249
    64   3669    2511   2201    2249 L2
   128   3566    2560   1571    1566
   256   3557    2461   1334    1256
   512   3524    2547   1136    1098
  1024   1933    1144    534     513
  4096   1993    1064    184     173 RAM
 16384   1970    1086    141     144
 65536   1973    1117    106     104

Total Elapsed Time 9.1 seconds

#################### T11 ARM-Intel ####################

ARM/Intel RandMem Benchmark 1.1 23-Apr-2015 20.42

 MBytes/Second Transferring 4 Byte Words
Memory  Serial.......  Random.......
KBytes   Read  Rd/Wrt   Read  Rd/Wrt

    16   3642    3102   5464    4114
    32   5462    3409   4096    3737
    64   4800    2785   2028    2064
   128   4308    2575   1572    1589
   256   4381    2574   1332    1260
   512   4311    2544   1215    1097
  1024   2033    1156    513     471
  4096   1891    1042    213     178
 16384   2028    1032    154     139
 65536   2033    1055    109     106

Total Elapsed Time 9.2 seconds

#################### T21 Original #####################

T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4

Android RandMem Benchmark 1.1 10-Jun-2015 12.43

 MBytes/Second Transferring 4 Byte Words
Memory  Serial.......  Random.......
KBytes   Read  Rd/Wrt   Read  Rd/Wrt

    16   4407    4704   3995    4900
    32   2611    3071   2207    2703
    64   2496    2797   1821    2139
   128   2080    3173   1668    1758
   256   2425    3183   1439    1520
   512   2359    3116   1193    1355
  1024   2366    3117    368     382
  4096   2293    2280    201     209
 16384   2293    2237    170     175
 65536   2299    2261    146     150

Total Elapsed Time 8.5 seconds

#################### T21 ARM-Intel ####################

ARM/Intel RandMem Benchmark 1.1 10-Jun-2015 12.45

 MBytes/Second Transferring 4 Byte Words
Memory  Serial.......  Random.......
KBytes   Read  Rd/Wrt   Read  Rd/Wrt

    16   5005    4626   4067    4863
    32   3253    2994   2246    2622
    64   3223    2855   1986    2072
   128   2861    3128   1912    1776
   256   3246    3174   1666    1523
   512   3195    3111   1469    1372
  1024   3190    3079    369     383
  4096   3027    2381    212     213
 16384   3065    2300    174     177
 65536   3080    2281    150     150

Total Elapsed Time 8.6 seconds

###################### T22 32 Bit ######################

T22, ARM Cortex-A53 1300 MHz, Android 5.0.2

ARM/Intel RandMem Benchmark 1.2 06-Aug-2015 12.29
Compiled for 32 bit ARM v7a

 MBytes/Second Transferring 4 Byte Words
Memory  Serial.......  Random.......
KBytes   Read  Rd/Wrt   Read  Rd/Wrt

    16   2807    3606   2753    3595 L1
    32   2719    3433   1429    1930
    64   2615    3266    914    1166 L2
   128   2592    3243    705     828
   256   2570    3223    637     720
   512   2367    2684    237     347
  1024   2137    1855    120     163 RAM
  4096   1918    1658     83      97
 16384   2152    1665     74      85
 65536   2104    1652     72      64

Total Elapsed Time 11.6 seconds

###################### T22 64 Bit ######################

T22, ARM Cortex-A53 1300 MHz, Android 5.0.2

ARM/Intel RandMem Benchmark 1.2 06-Aug-2015 12.32
Compiled for 64 bit ARM v8a

 MBytes/Second Transferring 4 Byte Words
Memory  Serial.......  Random.......
KBytes   Read  Rd/Wrt   Read  Rd/Wrt

    16   3865    3033   3798    3027
    32   3622    2760   3105    2734
    64   3094    2803   1011    1077
   128   3074    2740    776     801
   256   3050    2771    718     693
   512   2420    2463    270     371
  1024   1322    1853    131     164
  4096   1754    1598     87     100
 16384   1791    1586     75      91
 65536   1856    1609     57      68

Total Elapsed Time 14.6 seconds

##################### T7 Original ######################

T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, Measured 1200 MHz

Android RandMem Benchmark 20-Oct-2012 11.14

 MBytes/Second Transferring 4 Byte Words
Memory  Serial.......  Random.......
KBytes   Read  Rd/Wrt   Read  Rd/Wrt

    16   2788    3041   2795    3041 L1
    32   2769    3011   2767    3020
    64   1027    1038    839     911 L2
   128    916     918    616     649
   256    904     905    514     538
   512    899     907    475     499
  1024    712     699    345     354
  4096    323     284     92      88 RAM
 16384    316     282     73      70
 65536    314     281     65      62

Total Elapsed Time 10.9 seconds

#################### T7 ARM-Intel #####################

ARM/Intel RandMem Benchmark 1.1 25-Apr-2015 12.33

 MBytes/Second Transferring 4 Byte Words
Memory  Serial.......  Random.......
KBytes Read Rd/Wrt Read Rd/Wrt 16 2521 3175 2490 3038 32 1427 1451 1218 1446 64 1133 1052 853 907 128 1039 871 646 650 256 1028 909 543 518 512 1025 895 499 502 1024 700 489 242 236 4096 487 282 90 88 16384 483 281 71 70 65536 478 274 63 62 Total Elapsed Time 11.3 seconds #################### BS2 Original ###################### BS2 BlueStacks Emulator on 3.7 GHz Core i7 via Windows 8 Android RandMem Benchmark 1.1 25-Apr-2015 12.59 MBytes/Second Transferring 4 Byte Words Memory Serial....... Random....... KBytes Read Rd/Wrt Read Rd/Wrt 16 4069 5008 4069 2174 32 4439 5426 4069 1953 64 3974 5682 3552 1860 128 3721 5209 3758 1717 256 4342 5210 3157 1204 512 4167 5342 2845 1141 1024 4350 5208 2606 1000 4096 3475 5709 1938 867 16384 4343 5120 747 400 65536 3657 5818 533 256 Total Elapsed Time 14.2 seconds #################### BS2 ARM-Intel ##################### ARM/Intel RandMem Benchmark 1.1 25-Apr-2015 12.50 BlueStacks on 3.9 GHz Core i7 MBytes/Second Transferring 4 Byte Words Memory Serial....... Random....... KBytes Read Rd/Wrt Read Rd/Wrt 16 23252 24414 19148 29593 32 25432 27127 25432 24038 64 21552 23674 14533 9301 128 21702 20834 12020 8140 256 22727 19934 9470 6513 512 22321 17362 5953 5686 1024 20840 18945 5691 4815 4096 21053 16693 2291 2291 16384 12308 10057 1067 1018 65536 10667 10338 753 711 Total Elapsed Time 8.3 seconds
The benchmarks are recompilations of those in www.roylongbottom.org.uk/Android MultiThreading Benchmarks.htm. The arithmetic operations executed are of the form x[i] = (x[i] + a) * b - (x[i] + c) * d + (x[i] + e) * f, with 2 and 32 operations per input data word, using 1, 2, 4 and 8 threads. Three data sizes are used, to exercise L1 cache, L2 cache and RAM, at 12.8, 128 and 12800 KB (3200, 32000 and 3200000 single precision floating point words). Each thread carries out the same calculations, but accesses a different segment of the data. The program checks for consistent numeric results, primarily to show that the program runs and that all calculations are carried out. The numeric results start with values of 1.0, with subsequent calculations reducing the values, by an amount that depends on the number of calculations.
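The data sizes and the MFLOPS arithmetic described above can be sketched as follows. This is a minimal Python illustration, not the benchmark's C source. The helper names are hypothetical, the 2 operations per word case is assumed to be the first term of the expression, and the MFLOPS formula (words x operations per word x passes, divided by elapsed time) is an assumption about how the rating is derived.

```python
# Illustrative sketch only: the helper names and the MFLOPS formula are
# assumptions, not taken from the benchmark's C source.

BYTES_PER_WORD = 4  # single precision floating point word


def data_size_kb(words):
    """Array size in KB (1 KB = 1000 bytes, matching the 12.8 KB figure)."""
    return words * BYTES_PER_WORD / 1000.0


def two_ops_pass(x, a, b):
    """One pass at 2 operations per word: an add, then a multiply (assumed)."""
    return [(v + a) * b for v in x]


def mflops(words, ops_per_word, passes, seconds):
    """Millions of floating point operations per second for a timed run."""
    return words * ops_per_word * passes / seconds / 1.0e6


if __name__ == "__main__":
    for words in (3200, 32000, 3200000):
        print(words, "words =", data_size_kb(words), "KB")
    print(two_ops_pass([1.0, 2.0], 0.5, 2.0))
    # Hypothetical timing: 5000 passes over 3200 words in 0.5 seconds
    print(mflops(3200, 2, 5000, 0.5), "MFLOPS")
```

The three word counts reproduce the 12.8, 128 and 12800 KB sizes quoted in the text. The timing passed to mflops is invented purely to show the arithmetic.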
An example of results for MP-MFLOPSi, from the log file, is provided below, showing identical numeric results, independent of the number of threads used (as it should be). This original version became too fast for later technology, producing inconsistent MFLOPS performance ratios. Longer running versions were produced to avoid this problem, in this case MP-MFLOPS2i, with 50 times more calculations, producing the expected reduction in result values. The numeric results from ARM processors are slightly different, due to rounding effects (see Short and Long below).
Examination of disassembled code, using default compile parameters, showed that Intel SIMD and ARM NEON instructions were not being produced. These can execute, for example, four linked multiply and add operations simultaneously, providing MFLOPS speeds of up to eight times CPU MHz, per core. The types of instruction used are shown below, where the Intel varieties used only one word out of four in SSE registers (Single Instruction Single Data - SISD), and the ARM code employed single word scalar registers. The ARM instructions were of the vector floating point type, using three registers, and included floating-point multiply-accumulate single precision (fmacs).
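The "eight times CPU MHz" figure follows from simple arithmetic, sketched below in Python. The function name is hypothetical, and the model assumes one four word SIMD instruction issued per clock, each lane doing a linked multiply and add. Real processors may sustain more or less than this.

```python
# Illustrative sketch only: assumes one SIMD instruction issued per clock,
# with every lane performing one multiply and one add.

def peak_simd_mflops(cpu_mhz, lanes=4, ops_per_lane=2):
    """Theoretical single core MFLOPS under the stated assumptions."""
    return cpu_mhz * lanes * ops_per_lane


if __name__ == "__main__":
    # 1860 MHz is the quoted clock speed of tablet A1's Atom
    print(peak_simd_mflops(1860))            # four lanes, multiply and add
    print(peak_simd_mflops(1860, lanes=1))   # scalar case, one word per register
```

With one lane instead of four, the same model gives twice CPU MHz, which is the ceiling that scalar (SISD) code of the kind described above would be working under.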
The released versions were recompiled using the compile options shown below, but this made no difference to the type of code produced. The Intel compilations used more registers, which produced faster speeds at 32 operations per word. The ARM code was virtually identical, producing similar performance.
 Intel CPU Short - 5000 Repeat Passes

 ARM/Intel MP-MFLOPS v7 Benchmark V1.1 28-Apr-2015 17.24

    FPU Add & Multiply using 1, 2, 4 and 8 Threads

          2 Ops/Word               32 Ops/Word
 KB      12.8     128   12800     12.8     128   12800
 MFLOPS
 1T       642     717     658     1053    1026     987
 2T      1052    1366    1016     2018    2108    2063
 4T      1752    2483     956     3817    3676    3894
 8T      1436    2217     992     3213    3428    3289

 Results x 100000, 0 indicates ERRORS
 1T     86735   98519   99984    79894   97641   99975
 2T     86735   98519   99984    79894   97641   99975
 4T     86735   98519   99984    79894   97641   99975
 8T     86735   98519   99984    79894   97641   99975

          Total Elapsed Time    3.6 seconds

 Intel CPU Long - 100000 Repeat Passes
 1T-8T  40392   76406   99700    35296   66012   99521

 ######################################################

 ARM CPU Short
 1T-8T  86735   98519   99984    79897   97638   99975

 ARM CPU Long
 1T-8T  40392   76406   99700    35218   66014   99520

 ######################################################

 Android.mk LOCAL_CFLAGS

 ifeq ($(TARGET_ARCH_ABI),x86)
   LOCAL_CFLAGS += -ffast-math -mtune=atom -mssse3 -mfpmath=sse
 endif
 ifeq ($(TARGET_ARCH_ABI),x86_64)
   LOCAL_CFLAGS += -ffast-math -mtune=slm -msse4.2
 endif
 ifeq ($(TARGET_ARCH_ABI),armeabi-v7a)
   LOCAL_ARM_NEON := true
   LOCAL_CFLAGS += -mfpu=neon
 endif
 ifeq ($(TARGET_ARCH_ABI),arm64-v8a)
   LOCAL_CFLAGS += -DHAVE_NEON64=1
 endif

 ######################################################

 Intel SSE SISD Instructions - not SIMD

   mulss   36(%esp), %xmm2
   addss   %xmm1, %xmm2

 ARM Vector Instructions - not NEON

   fmuls   s15, s15, s10
   fmacs   s15, s14, s23
Below are MFLOPS results, mainly for the longer running versions, including those from the original ARM compilations. The first ones are for tablet A1, with the quad core Intel Atom CPU, where results for the shorter running version are also provided, showing some slower speeds. In this case, performance from the native Intel code was up to nearly twice as fast as the ARM converted test run. In both cases, with 2 operations per word, maximum MP gains were obtained using L2 cache based data, whereas RAM based data was limited by memory speed, with two threads being enough to reach maximum speed. With 32 operations per word, the quad cores provided performance gains of nearly four times.
Tablet T11 produced some slightly slower results with the ARM/Intel variety, with tablet T7 showing little variation. Except for RAM based data at 2 operations per word, performance gains were in line with the number of cores.
T21, with the Qualcomm Snapdragon 800, produced similar speeds using the old and ARM/Intel versions. Calculation speeds, with 1 and 2 threads, could be slower than those of T11 (Cortex-A15), but RAM speed was much faster. The opposite applied in comparison with the A1 Atom, using native code.
August 2015 - Results are provided for the 64 bit T22, showing that the 64 bit version was just over twice as fast at 32 operations per word, and up to 3.7 times faster at 2 operations per word, with cache based data. The reason is that the 64 bit compilation produced vector SIMD instructions, instead of scalar ones.
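The 64 bit to 32 bit ratios quoted for T22 can be checked against the single thread rows of the T22 logs in this section. The short Python sketch below does that arithmetic. The variable and function names are hypothetical, and only the 1T rows are used.

```python
# Single thread (1T) MFLOPS from the T22 logs, at 12.8, 128 and 12800 KB
T22_1T_32BIT = {"2 ops/word": [190, 190, 184], "32 ops/word": [670, 672, 664]}
T22_1T_64BIT = {"2 ops/word": [705, 701, 636], "32 ops/word": [1398, 1394, 1362]}


def speed_ratios(new, old):
    """Element by element ratio of two result rows, to two decimal places."""
    return [round(n / o, 2) for n, o in zip(new, old)]


if __name__ == "__main__":
    for key in T22_1T_32BIT:
        print(key, speed_ratios(T22_1T_64BIT[key], T22_1T_32BIT[key]))
```

The first two entries of each row are the cache based sizes, and the third is the RAM based size.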
#################### A1 Original ####################### A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s Android MP-MFLOPS2 Benchmark V2.1 04-Feb-2015 11.03 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 502 501 476 575 575 573 2T 1012 975 921 1133 1140 1115 4T 1571 1627 979 2238 2255 2258 8T 1550 1890 1007 2235 2239 2217 Total Elapsed Time 117.4 seconds #################### A1 ARM-Intel ###################### ARM/Intel MP-MFLOPS v7 Benchmark V1.1 28-Apr-2015 17.24 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 642 717 658 1053 1026 987 2T 1052 1366 1016 2018 2108 2063 4T 1752 2483 956 3817 3676 3894 8T 1436 2217 992 3213 3428 3289 V7 Short Version Total Elapsed Time 3.6 seconds ARM/Intel MP-MFLOPS2 Benchmark V2.1 28-Apr-2015 17.24 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 695 696 661 1061 1061 1055 2T 1335 1382 1058 2088 2086 2102 4T 1832 2635 979 3993 4125 4145 8T 2026 2557 1007 3842 4044 4110 Total Elapsed Time 65.8 seconds -- Single Thread MFLOPS No Extra Compile Options -- 704 713 675 773 779 774 #################### T11 Original ##################### T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2 Dual Core CPU Measured GHz = 1.7 Android MP-MFLOPS2 Benchmark V2.1 29-Apr-2015 10.22 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 845 817 544 1546 1539 1512 2T 1593 1668 648 3140 3067 2977 4T 1974 1775 645 2963 3093 2845 8T 1935 2059 652 3108 3147 2985 Total Elapsed Time 58.5 seconds #################### T11 ARM-Intel #################### ARM/Intel MP-MFLOPS2 Benchmark V2.1 28-Apr-2015 20.30 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 695 756 536 1537 1501 1476 2T 1319 1527 
645 3151 3077 3000 4T 1604 1567 657 3035 3095 2997 8T 1604 1639 658 3108 3125 2996 Total Elapsed Time 59.1 seconds #################### T21 Original ##################### T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4 Quad Core 2150 MHz Measured Android MP-MFLOPS2 Benchmark V2.1 05-Jul-2015 15.35 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 718 781 590 1214 1220 1228 2T 1572 1583 1118 2406 2436 2442 4T 2338 2959 1836 4867 4911 4859 8T 3148 3266 1866 4870 4916 4888 Total Elapsed Time 56.4 seconds #################### T21 ARM-Intel #################### ARM/Intel MP-MFLOPS2 Benchmark V2.1 05-Jul-2015 16.50 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 822 768 636 1232 1228 1231 2T 1662 1637 1184 2460 2463 2446 4T 2509 3216 1659 4519 4762 4900 8T 2965 3193 1881 4847 4925 4880 ###################### T22 32 Bit ###################### T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 ARM/Intel MP-MFLOPS2 Benchmark V2.2 09-Aug-2015 21.17 Compiled for 32 bit ARM v7a FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 190 190 184 670 672 664 2T 377 378 370 1343 1345 1329 4T 707 755 725 2657 2669 2621 8T 722 736 714 2640 2672 2631 Total Elapsed Time 113.0 seconds ###################### T22 64 Bit ###################### ARM/Intel MP-MFLOPS2 Benchmark V2.2 09-Aug-2015 21.24 Compiled for 64 bit ARM v8a FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 705 701 636 1398 1394 1362 2T 1376 1395 942 2794 2797 2757 4T 2063 2602 962 5491 5546 5336 8T 2474 2611 957 5367 5500 5417 Total Elapsed Time 51.6 seconds ##################### T7 Original ###################### T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, Quad Core CPU Measured MHz = 1200 Android MP-MFLOPS2 Benchmark V2.1 05-Feb-2015 11.37 FPU Add & Multiply using 1, 2, 
4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 182 156 114 598 578 572 2T 365 321 194 1194 1163 1141 4T 716 655 233 2367 2316 2240 8T 717 682 233 2347 2371 2246 Total Elapsed Time 135.5 seconds #################### T7 ARM-Intel ##################### ARM/Intel MP-MFLOPS2 Benchmark V2.1 28-Apr-2015 17.44 FPU Add & Multiply using 1, 2, 4 and 8 Threads 2 Ops/Word 32 Ops/Word KB 12.8 128 12800 12.8 128 12800 MFLOPS 1T 188 156 116 598 578 574 2T 365 319 197 1195 1161 1145 4T 682 709 237 2372 2345 2249 8T 678 731 237 2361 2381 2254 Total Elapsed Time 135.0 seconds
For more information on the Whetstone Benchmark, see the stand alone version, above. The multithreading version runs multiple copies of the same code, with separate variables. In this case, performance of each of the eight test functions, and the overall MWIPS rating, is nearly always proportional to the number of CPU cores available. The driving program checks that calculations on every thread produce consistent numeric results.
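The idea of running identical copies on separate threads, then checking that every copy returns the same numbers, can be sketched in Python as below. This is only an illustration of the consistency check. The stand-in calculation is invented, all names are hypothetical, and Python threads will not show the per core speed gains that the native benchmark measures.

```python
import threading


def stand_in_calculation(passes):
    """Deterministic floating point loop standing in for one Whetstone test."""
    x = 1.0
    for _ in range(passes):
        x = (x + 0.25) * 0.75
    return x


def run_copies(thread_count, passes):
    """Run the same calculation on each thread, each with its own result slot."""
    results = [None] * thread_count

    def worker(index):
        results[index] = stand_in_calculation(passes)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def results_consistent(results):
    """True when every thread produced exactly the same numeric result."""
    return all(r == results[0] for r in results)


if __name__ == "__main__":
    for count in (1, 2, 4, 8):
        print(count, "threads consistent:",
              results_consistent(run_copies(count, 1000)))
```

Each thread writes only to its own slot, mirroring the separate variables mentioned above, so no locking is needed for the check.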
The gcc 4.8 based ARM/Intel version, running on the Intel Atom tablet, is rated at twice the speed of the original, due to the use of native code. The fixed point results indicate over-optimisation, but this test uses little of the overall time, which depends mainly on the Cos, Exp and third MFLOPS tests.
The new native ARM version, running on tablets T11 and T7, produces a much slower overall MWIPS rating, mainly due to the Exp tests, but also influenced by other slower results (some the same as above). T21 also indicates slower floating point calculations.
August 2015 - Results are provided for the 64 bit T22, showing that, at 64 bits, the Fixpt test was clearly almost optimised out, but this makes little difference to the overall MWIPS rating, which is 2.25 times faster than that of the 32 bit benchmark.
#################### A1 Original ####################### A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s Android MP-Whetstone Benchmark V1.1 04-Feb-2015 11.39 Using 1, 2, 4 and 8 Threads MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal 1 2 3 MOPS MOPS MOPS MOPS MOPS 1T 953.7 363.0 382.4 267.8 21.0 13.2 413.1 1842.4 392.3 2T 1921.2 726.0 663.5 541.4 42.6 27.0 816.1 3662.6 793.3 4T 3820.6 1419.2 1514.6 1081.5 84.1 54.0 1543.8 6292.4 1588.5 8T 4003.8 1912.9 1872.4 1114.1 86.5 56.4 2053.1 8292.6 1599.7 Overall Seconds 4.88 1T, 4.87 2T, 4.96 4T, 10.05 8T #################### A1 ARM-Intel ###################### ARM/Intel MP-Whetstone Benchmark V1.1 30-Apr-2015 17.35 Using 1, 2, 4 and 8 Threads MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal 1 2 3 MOPS MOPS MOPS MOPS MOPS 1T 1916.9 691.4 691.3 497.2 35.3 27.6 10209.8 2787.3 1351.8 2T 3800.3 1377.6 1381.2 980.0 70.1 54.7 20248.0 5252.8 2748.7 4T 7604.9 2713.2 2711.8 1977.1 140.2 110.0 33906.3 9526.5 5550.8 8T 7798.1 3141.5 3627.2 2064.2 141.2 110.2 59590.6 12743.7 5711.5 Overall Seconds 4.94 1T, 5.00 2T, 5.06 4T, 10.11 8T #################### T11 Original ##################### T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2 Measured 1.7 GHz Android MP-Whetstone Benchmark V1.1 06-Sep-2013 12.49 Using 1, 2, 4 and 8 Threads MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal 1 2 3 MOPS MOPS MOPS MOPS MOPS 1T 1308.2 345.9 379.0 294.1 30.8 17.2 1351.4 1265.7 843.1 2T 2886.6 782.1 782.6 614.0 80.1 34.3 2775.2 2463.7 1667.5 4T 3086.0 998.6 788.1 610.6 79.2 44.5 3472.0 2526.4 2191.4 8T 2930.0 788.2 843.5 616.5 80.5 35.0 2846.0 2799.1 1686.2 Overall Seconds 3.54 1T, 3.30 2T, 6.62 4T, 13.16 8T #################### T11 ARM-Intel #################### ARM/Intel MP-Whetstone Benchmark V1.1 30-Apr-2015 21.23 Using 1, 2, 4 and 8 Threads MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal 1 2 3 MOPS MOPS MOPS MOPS MOPS 1T 837.2 340.1 341.7 191.2 39.1 6.2 1521.1 2532.8 629.3 2T 
1676.2 596.2 683.2 387.3 77.8 12.4 3056.9 5055.1 1263.6 4T 1697.7 687.5 869.4 394.5 78.1 12.4 2980.7 6518.4 1258.8 8T 1685.2 685.9 691.0 389.7 78.3 12.4 3086.3 5113.7 1262.0 Overall Seconds 4.06 1T, 4.07 2T, 8.12 4T, 16.19 8T #################### T21 Original ##################### T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4 Android MP-Whetstone Benchmark V1.1 06-Jul-2015 10.42 Using 1, 2, 4 and 8 Threads MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal 1 2 3 MOPS MOPS MOPS MOPS MOPS 1T 1877.1 645.2 642.6 524.1 44.0 22.3 1364.7 1572.1 898.9 2T 3668.6 1220.2 1262.4 1021.9 85.9 43.8 2663.5 3078.4 1753.4 4T 7426.9 2375.5 2474.7 2097.7 175.7 88.2 5052.6 6240.4 3555.0 8T 7706.6 2692.2 2746.2 2186.9 180.1 90.3 5822.5 6902.7 3681.3 Overall Seconds 4.44 1T, 4.62 2T, 4.64 4T, 9.00 8T Total Elapsed Time 24.1 seconds #################### T21 ARM-Intel #################### ARM/Intel MP-Whetstone Benchmark V1.1 22-Jul-2015 12.02 Using 1, 2, 4 and 8 Threads MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal 1 2 3 MOPS MOPS MOPS MOPS MOPS 1T 1598.0 512.1 508.7 311.7 43.6 22.1 1142.9 2123.3 598.4 2T 3161.2 960.0 996.7 614.2 86.7 43.8 2258.9 3820.9 1194.7 4T 6348.0 1593.5 2019.5 1231.5 174.2 88.5 4471.1 8139.4 2398.3 8T 6419.6 2058.2 2077.5 1252.6 175.0 88.7 4520.9 8875.0 2409.0 Overall Seconds 4.88 1T, 5.00 2T, 5.05 4T, 9.92 8T Total Elapsed Time 29.2 seconds ###################### T22 32 Bit ###################### T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 ARM/Intel MP-Whetstone Benchmark V1.2 10-Aug-2015 11.30 Compiled for 32 bit ARM v7a Using 1, 2, 4 and 8 Threads MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal 1 2 3 MOPS MOPS MOPS MOPS MOPS 1T 676.4 275.9 281.9 147.9 35.4 5.3 600.3 901.0 285.5 2T 1362.5 533.8 561.7 298.0 70.9 10.8 1203.1 1838.9 574.0 4T 2698.6 903.9 1071.7 594.4 141.2 21.5 2346.1 3305.5 1138.5 8T 2830.1 1463.2 1393.0 614.2 152.5 21.9 3243.9 4418.3 1171.4 Overall Seconds 4.95 1T, 4.94 2T, 5.11 4T, 10.09 8T ###################### T22 64 Bit 
###################### ARM/Intel MP-Whetstone Benchmark V1.2 10-Aug-2015 11.34 Compiled for 64 bit ARM v8a Using 1, 2, 4 and 8 Threads MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal 1 2 3 MOPS MOPS MOPS MOPS MOPS 1T 1524.8 328.6 348.8 297.6 37.3 19.9 1462579 1867.2 1238.0 2T 3062.5 688.8 697.9 596.0 75.5 39.8 2097113 3726.7 2481.3 4T 6085.4 1214.9 1360.5 1185.4 150.5 79.4 2449153 7055.0 4951.8 8T 6222.4 1495.2 1545.6 1204.2 152.2 80.6 3869846 9218.8 5154.1 Overall Seconds 4.92 1T, 4.90 2T, 5.05 4T, 9.97 8T ##################### T7 Original ###################### T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, Measured 1200 MHz Android MP-Whetstone Benchmark V1.0 17-Oct-2012 13.49 Using 1, 2, 4 and 8 Threads MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal 1 2 3 MOPS MOPS MOPS MOPS MOPS 1T 1033.7 247.4 235.4 266.0 25.3 15.0 448.4 630.9 513.5 2T 2058.1 456.3 473.0 532.4 50.0 30.1 898.1 1198.4 1026.6 4T 4122.8 831.9 944.7 1064.6 100.7 60.1 1797.0 2392.2 2053.4 8T 4163.2 1016.0 948.2 1069.5 101.8 60.9 1808.0 2414.2 2051.5 Overall Seconds 5.28 1T, 5.34 2T, 5.42 4T, 10.81 8T #################### T7 ARM-Intel ##################### ARM/Intel MP-Whetstone Benchmark V1.1 30-Apr-2015 21.32 Using 1, 2, 4 and 8 Threads MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal 1 2 3 MOPS MOPS MOPS MOPS MOPS 1T 602.2 242.3 242.3 140.2 27.2 4.9 482.8 1425.2 239.1 2T 1208.7 481.2 484.2 280.8 55.0 9.9 970.0 2869.6 478.7 4T 2398.7 805.4 966.7 562.5 109.5 19.5 1938.2 5722.5 957.1 8T 2429.1 974.6 1076.2 562.4 110.9 19.7 1981.5 5816.1 963.6 Overall Seconds 4.94 1T, 4.93 2T, 5.08 4T, 9.93 8T
For further details, see Dhrystone Benchmark above and, for further results, Android MultiThreading Benchmark Apps. This multithreading benchmark runs using 1, 2, 4 and 8 threads, executing multiple copies of the same program. An initial calibration, using a single thread, determines the number of passes needed for an overall execution time of 1 second. Then all threads are run using the same pass count, running time being extended when there are more threads than CPUs. The same calculations are carried out on each thread. Separate data arrays are used for each thread, but some variables can be used by all threads. The latter is probably responsible for the failure to increase throughput using multiple threads.
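The VAX MIPS ratings in the logs are consistent with the conventional definition, Dhrystones per second divided by 1757 (the score attributed to the VAX 11/780). The Python sketch below shows that conversion, plus a hypothetical calibration helper that scales a timed trial run up to the 1 second target. The function names and the linear scaling are assumptions, not the benchmark's own code.

```python
VAX_11_780_DHRYSTONES_PER_SECOND = 1757  # conventional reference score


def vax_mips(dhrystones_per_second):
    """Convert a Dhrystones per second score to a VAX MIPS rating."""
    return dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SECOND


def calibrated_passes(trial_passes, trial_seconds, target_seconds=1.0):
    """Hypothetical calibration: scale a trial run to the target run time."""
    return int(trial_passes * target_seconds / trial_seconds)


if __name__ == "__main__":
    # Dhrystones per second taken from the A1 Original log (1 and 8 threads)
    for score in (4147126, 2320745):
        print(score, "->", round(vax_mips(score)), "VAX MIPS")
    # Invented trial: 1000 passes measured at 0.25 seconds
    print(calibrated_passes(1000, 0.25), "passes for about 1 second")
```

Rounding to the nearest integer reproduces the ratings printed in the logs for the scores used here.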
The new ARM/Intel version demonstrated similar speeds on the systems tested. Unlike the other systems, the Intel Atom based tablet produced slower performance using multiple threads. Tests on a PC, via the BlueStacks emulator, appeared to demonstrate that native Intel instructions were being used.
T21, with the Qualcomm Snapdragon 800, sometimes crashed running this benchmark, and apparently did so every time with the ARM-Intel version. When it does run, the eight thread performance is also highly suspect.
August 2015 - Results are provided for the 64 bit T22, showing that the 64 bit version was much faster than the 32 bit variety.
 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

 Android MP-Dhrystone 2 Benchmark V1.1 04-May-2015 17.00

 Threads                        1         2         4         8
 Seconds                     0.96      3.27      6.83     13.79
 Dhrystones per Second    4147126   2449335   2343954   2320745
 VAX MIPS rating             2360      1394      1334      1321

 #################### A1 ARM-Intel ######################

 ARM/Intel MP-Dhrystone 2 Benchmark V1.1 04-May-2015 17.02

 Threads                        1         2         4         8
 Seconds                     0.96      3.44      6.88     13.80
 Dhrystones per Second    4154551   2323340   2324139   2318280
 VAX MIPS rating             2365      1322      1323      1319

 #################### T11 Original #####################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Measured 1.7 GHz

 Android MP-Dhrystone 2 Benchmark V1.1 10-Aug-2013 09.55

 Threads                        1         2         4         8
 Seconds                     0.50      0.53      1.05      2.18
 Dhrystones per Second    3990211   7522450   7600539   7328598
 VAX MIPS rating             2271      4281      4326      4171

 #################### T11 ARM-Intel ####################

 ARM/Intel MP-Dhrystone 2 Benchmark V1.1 04-May-2015 17.22

 Threads                        1         2         4         8
 Seconds                     0.99      1.12      2.33      4.45
 Dhrystones per Second    4031981   7127449   6856521   7196710
 VAX MIPS rating             2295      4057      3902      4096

 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4

 Android MP-Dhrystone 2 Benchmark V1.1 06-Jul-2015 11.22

 Threads                        1         2         4         8
 Seconds                     0.64      0.83      0.94      1.23
 Dhrystones per Second    5007132   7722435  13592474  20769050
 VAX MIPS rating             2850      4395      7736     11821

          Total Elapsed Time    4.4 seconds

 #################### T21 ARM-Intel ####################

 Failed to run

 ###################### T22 32 Bit ######################

 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2

 ARM/Intel MP-Dhrystone 2 Benchmark V1.2 10-Aug-2015 11.32
 Compiled for 32 bit ARM v7a

 Threads                        1         2         4         8
 Seconds                     0.64      0.71      0.90      1.70
 Dhrystones per Second    2481286   4495793   7094180   7540038
 VAX MIPS rating             1412      2559      4038      4291

 ###################### T22 64 Bit ######################

 ARM/Intel MP-Dhrystone 2 Benchmark V1.2 10-Aug-2015 11.36
 Compiled for 64 bit ARM v8a

 Threads                        1         2         4         8
 Seconds                     0.89      1.06      1.64      3.24
 Dhrystones per Second    4476736   7574470   9768350   9861922
 VAX MIPS rating             2548      4311      5560      5613

 ##################### T7 Original ######################

 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, Measured 1200 MHz

 Android MP-Dhrystone 2 Benchmark V1.0 17-Oct-2012 13.59

 Threads                        1         2         4         8
 Seconds                     0.72      0.83      1.19      2.55
 Dhrystones per Second    2782404   4829150   6740332   6271011
 VAX MIPS rating             1584      2749      3836      3569

 #################### T7 ARM-Intel #####################

 ARM/Intel MP-Dhrystone 2 Benchmark V1.1 04-May-2015 17.18

 Threads                        1         2         4         8
 Seconds                     0.78      0.95      1.27      2.44
 Dhrystones per Second    2572642   4214238   6280420   6565767
 VAX MIPS rating             1464      2399      3575      3737

 ################ BlueStacks Emulator ##################

 PC with 3 GHz Phenom x4, Windows 7

 Threads                        1         2         4         8
 VAX MIPS Original            474       465       453       449
 VAX MIPS ARM/Intel          4844      4670      4623      4724
This is a multithreading version of the BusSpeed Benchmark above. Here, single thread performance of the A1 Atom tablet was similar to that obtained unthreaded, with the ARM/Intel version again providing no improvement. Except for calculating bus speeds, the last column is the only one of real interest, where four cores produced gains of up to 3.7 times, using caches, and 1.9 times via RAM. The latter provided even better relative performance compared with ARM based systems. Only ARM/Intel version results are shown for tablets T11 and T7, as they were both essentially the same as those obtained using the original MP benchmark. For further details and more results see Android MultiThreading Benchmark Apps. Some ARM/Intel results for T21 are slower than the original, but this might be due to the short running time.
Results from the PC based BlueStacks emulator are also shown below, to confirm that native Intel instructions were being used in the revised benchmark.
Estimated maximum data transfer speeds, based on burst reading results (like 16 x 1018 for T21), can exceed the specification. This is caused by shared data in the L3 cache, and the way that the program is run.
MP-BusSpd2i.apk is a revised version for Android. Running time is longer and, rather than all threads reading data from the beginning, starting addresses are staggered. This can result in slower speeds, as there are fewer calculations in the inner loop, but the increased speed due to cached shared data appears no longer to apply, and burst results can be used to estimate maximum RAM throughput (as shown).
August 2015 - Results are included for T22, with a 64 bit CPU and 64 bit Android 5.0. Considering just the Read All data, the A53 64/32 bit performance ratios for L1 cache, L2 cache and RAM averaged 2.2, 1.8 and 1.0.
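The RAM throughput estimate used in the logs multiplies the best RAM based Inc16 result by 16. A minimal Python sketch of that arithmetic follows. The function name is hypothetical, and reading the multiplier as 16 four byte words fetched per burst, with only one of them counted by the test, is an assumption about why the factor is 16.

```python
# Assumption: the Inc16 test counts one word in every 16, while the memory
# system transfers the whole burst, hence the x 16 scaling used in the logs.
WORDS_PER_BURST = 16


def max_ram_mb_per_second(inc16_mb_per_second):
    """Estimated maximum RAM throughput from an Inc16 burst reading result."""
    return inc16_mb_per_second * WORDS_PER_BURST


if __name__ == "__main__":
    # Best 49152 KB Inc16 results from the long version logs
    for tablet, inc16 in (("A1", 511), ("T11", 223), ("T21", 605), ("T7", 68)):
        print(tablet, max_ram_mb_per_second(inc16), "MB/second")
```

The estimates can then be set against each tablet's quoted memory bandwidth, such as 17.1 GB/s for A1 and 14.9 GB/s for T21.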
#################### A1 Original ####################### A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s Android MP-BusSpd v7 Benchmark V1.1 05-May-2015 13.02 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 3990 4458 6123 6512 6438 6729 2T 3894 5699 8948 10299 11800 12555 4T 5046 7109 11952 14750 15533 23304 8T 4533 7464 13097 16970 21674 22225 122.9 1T 1304 1613 2291 2661 3667 5063 2T 2568 3145 4529 5365 7440 10147 4T 4117 4801 7963 7495 8239 18911 8T 3130 5016 7355 8543 11648 15845 12288 1T 190 265 601 1203 2316 3832 2T 244 448 995 1771 3599 6575 4T 427 584 860 1741 3439 7449 8T 395 510 855 1613 3547 6776 Total Elapsed Time 13.5 seconds #################### A1 ARM-Intel ###################### ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.28 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 5925 6494 6778 6979 7047 7026 2T 3966 7029 9689 11689 12856 13654 4T 4438 8698 16739 22057 23946 25729 8T 4455 8619 15787 19934 22576 20804 122.9 1T 1490 1975 2360 2802 3818 5330 2T 2881 3798 4647 5531 7536 10546 4T 4452 6338 5910 10217 14650 19903 8T 4096 5075 6264 9213 12610 15821 12288 1T 206 273 593 1198 2343 3935 2T 276 455 842 1821 3319 6591 4T 445 730 1401 2076 4457 7525 8T 424 539 954 1829 3688 7064 Total Elapsed Time 13.0 seconds ########## A1 New Long Version ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 15.50 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 5431 6110 6780 6262 6655 7313 2T 3550 4464 7375 9825 11777 12442 4T 2027 4442 4399 8841 17611 23509 8T 983 2477 5063 4433 8568 15867 122.9 1T 1499 1991 2357 2839 3818 5382 2T 2816 3808 4708 5592 7557 10677 4T 4316 6313 7991 9816 14335 19993 8T 4235 5610 7917 8791 12828 19661 49152 1T 215 275 611 1183 2328 3922 2T 276 435 787 1671 3323 6507 4T 398 455 884 1754 3490 6971 8T 376 511 867 1746 3512 7510 Total Elapsed Time 48.6 seconds 
Maximum RAM Speed Estimate = 511 x 16 = 8176 MB/second #################### T11 ARM-Intel #################### T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2 Measured 1.7 GHz ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.45 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 2165 3591 4256 5587 5998 6109 2T 4121 6469 9530 11381 11846 11936 4T 4106 6438 8827 6793 9802 12080 8T 4098 6390 9534 10141 10996 11603 122.9 1T 464 740 1173 2395 3276 3340 2T 579 989 1934 3994 5431 5792 4T 579 988 1930 3873 5469 5821 8T 580 985 1915 3999 5408 5812 12288 1T 134 172 211 462 602 1904 2T 269 343 387 934 1217 2685 4T 252 231 374 768 991 2625 8T 231 254 367 781 1104 2782 Total Elapsed Time 12.1 seconds ########## T11 New Long Version ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 17.07 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 3499 4539 5499 5505 6134 6045 2T 3775 7202 8377 10605 10457 11319 4T 3982 6676 7687 9326 9707 10807 8T 2546 3643 7891 8003 10725 11097 122.9 1T 672 901 1336 2784 3274 3334 2T 568 969 1931 3894 5427 5221 4T 574 971 1912 3831 5256 4811 8T 559 971 1917 3878 5387 5162 49152 1T 140 142 193 575 989 1499 2T 221 223 342 769 1379 2355 4T 228 223 344 783 1382 2376 8T 223 223 342 787 1385 2352 Total Elapsed Time 49.9 seconds Maximum RAM Speed Estimate = 223 x 16 = 3568 MB/second Initial Results 12.3 1T 693 936 1266 2522 3264 3329 2T 557 900 1539 3459 3317 3613 4T 551 903 1557 2902 3475 3616 #################### T21 Original ##################### T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4 Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s L1 caches 4 x 32 KB, L2 cache shared 2048 KB Android MP-BusSpd v7 Benchmark V1.1 29-Jun-2015 18.37 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 2580 2206 5048 5176 5679 5989 2T 4062 5175 9340 9868 10971 11281 4T 4688 10324 16552 17196 21714 23708 8T 8467 9834 16698 18183 21936 23693 
122.9 1T 1152 1052 2068 3035 3927 5723 2T 1710 1840 3094 5001 7963 11475 4T 2047 2002 5031 9267 14698 22920 8T 2235 2275 5223 9348 14234 21783 12288 1T 262 382 508 867 1466 2661 2T 464 766 1049 1754 3186 5735 4T 612 1018 1796 3149 5892 9095 8T 575 680 1277 2308 4987 7948 Total Elapsed Time 12.7 seconds #################### T21 ARM-Intel #################### ARM/Intel MP-BusSpd v7 Benchmark V1.1 23-May-2015 17.05 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 1840 2073 3512 3554 4829 5243 2T 3432 4591 7128 7651 9120 9821 4T 4398 7855 13752 15428 18530 20235 8T 6692 9507 13857 16110 18143 18796 122.9 1T 860 753 2011 2841 3205 5282 2T 1505 1609 3076 5038 8089 10421 4T 1924 1981 4299 7588 14614 20754 8T 1909 1988 4264 7980 13884 19027 12288 1T 270 379 538 856 1626 2859 2T 471 677 1098 1849 3304 5924 4T 549 787 1066 1874 6274 10781 8T 713 853 1649 2258 4664 8321 Total Elapsed Time 13.1 seconds ########## T21 New Long Version ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 15.39 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 2247 2616 4010 4443 4909 5614 2T 3558 4725 7241 9048 9747 10892 4T 6074 8303 13442 16937 18525 21068 8T 3998 5106 14314 13615 18200 20740 122.9 1T 874 1198 2024 2935 4529 5345 2T 1686 1702 3174 5357 7688 10545 4T 1988 2139 4465 8171 14969 21169 8T 1972 2139 4468 8195 15261 21132 49152 1T 292 406 516 899 1663 2929 2T 449 541 962 1569 2851 4776 4T 495 605 1109 2439 4161 8243 8T 530 564 1156 2149 4172 7907 Total Elapsed Time 48.0 seconds Maximum RAM Speed Estimate = 605 x 16 = 9680 MB/second ###################### T22 32 Bit ###################### T22, Tab 2 A8-50, 1.3 GHz quad core 64 bit ARM Cortex-A53 Single Channel RAM, LPDDR3 666 MHz, 5.3 GB/second ARM/Intel MP-BusSpd Benchmark V1.2 12-Aug-2015 16.13 Compiled for 32 bit ARM v7a MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 1849 2140 2079 2211 2270 2297 2T 3663 4252 4294 
4400 4370 4580 4T 4630 5574 5691 5893 6015 6083 8T 5331 5775 6033 6622 7968 8023 122.9 1T 597 621 1119 1815 2135 2237 2T 869 943 1644 2992 3740 4412 4T 949 951 1922 3736 6468 7779 8T 948 978 1911 3717 6464 7542 12288 1T 123 174 344 678 1215 1840 2T 243 310 672 1332 2383 3974 4T 302 285 594 1282 2271 4606 8T 279 295 654 1198 2749 4660 Total Elapsed Time 12.8 seconds ########## T22 Long Version ARM/Intel MP-BusSpd2 Benchmark V1.2 12-Aug-2015 16.14 Compiled for 32 bit ARM v7a 12.3 1T 1877 2124 2176 2266 2296 2343 2T 3625 4198 4341 4468 4536 4613 4T 5733 7541 8293 8830 8024 9042 8T 2985 3829 7438 6117 8108 8923 122.9 1T 604 625 1142 1846 2150 2284 2T 924 950 1793 3277 4270 4504 4T 962 989 1939 3765 6798 8862 8T 965 993 1933 3748 6651 8239 49152 1T 165 175 344 677 1285 1979 2T 234 238 482 961 1907 3547 4T 266 298 562 1224 2296 4478 8T 272 275 538 1098 2149 4282 Total Elapsed Time 48.8 seconds ###################### T22 64 Bit ###################### ARM/Intel MP-BusSpd2 Benchmark V1.2 12-Aug-2015 16.18 Compiled for 64 bit ARM v8a 12.3 1T 2610 2472 2586 2727 2748 5841 2T 4404 4681 4994 5369 5420 11297 4T 6546 8125 9105 10243 10319 20610 8T 3380 4023 7919 7146 9871 19852 122.9 1T 604 621 1110 1872 2446 5100 2T 919 948 1855 3433 4853 10037 4T 961 974 1984 3924 7491 14935 8T 963 942 1931 3915 7572 14689 49152 1T 173 177 340 692 1300 2653 2T 266 241 479 968 1883 3724 4T 304 277 556 1130 2126 4328 8T 279 278 544 1138 2179 4275 Total Elapsed Time 49.4 seconds #################### T7 ARM-Intel ##################### T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, Measured 1200 MHz ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.35 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 2853 3392 3376 3511 3551 3494 2T 2857 3389 3542 5540 5730 5595 4T 7257 10326 10289 10997 11373 11100 8T 6584 10325 10485 11175 11322 11189 122.9 1T 362 379 347 546 623 978 2T 516 530 508 726 1227 1840 4T 598 658 548 1181 1556 2657 8T 721 733 736 1181 1548 2653 
12288 1T 58 57 84 123 173 334 2T 111 111 182 248 348 664 4T 87 85 276 463 687 1290 8T 154 107 147 429 441 1242 Total Elapsed Time 12.7 seconds ########## T7 New Long Version ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 15.59 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 2166 2774 3181 3307 3377 3263 2T 3924 5188 5207 5754 5759 5805 4T 7570 10011 10252 11165 11375 11777 8T 3510 4786 9011 8318 11351 11544 122.9 1T 383 409 359 558 663 983 2T 525 541 520 741 1241 1814 4T 739 752 753 1219 1590 2776 8T 735 741 753 1218 1607 2737 49152 1T 56 51 81 126 172 330 2T 65 67 107 196 335 620 4T 70 68 108 215 426 835 8T 70 68 109 215 428 851 Total Elapsed Time 48.2 seconds Maiximum RAM Speed Estimate = 68 x 16 = 1088 MB/second ############### BlueStacks Original ############### Android MP-BusSpd v7 Benchmark V1.1 05-May-2015 17.44 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 1600 1538 1641 1706 1600 1687 2T 1600 1641 1745 1600 1687 1638 4T 1600 1745 1745 1567 1638 1575 8T 1476 1641 1602 1638 1575 1596 122.9 1T 1000 923 1477 1600 1600 1688 2T 1000 952 1477 1600 1567 1282 4T 872 1163 1422 1567 1602 1576 8T 1026 1164 1477 1527 1644 1580 12288 1T 307 403 537 1075 1396 1512 2T 302 409 708 1075 1417 1433 4T 307 355 614 1024 1433 1535 8T 307 384 661 1023 1404 1512 Total Elapsed Time 13.9 seconds ############### BlueStacks ARM/Intel ############## ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.25 MB/Second Reading Data, 1, 2, 4 and 8 Threads KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll 12.3 1T 9999 18461 20000 20512 19692 21942 2T 10909 17777 19999 19692 21942 20480 4T 9599 18461 19692 19591 20480 19692 8T 10666 17066 19948 20480 20480 19200 122.9 1T 1500 1476 2742 5485 11636 13128 2T 1428 1396 2792 5585 11170 13653 4T 1396 1428 2954 5486 10973 13654 8T 1280 1371 2744 5909 10974 14630 12288 1T 460 439 645 631 1105 1331 2T 230 268 480 806 1433 2234 4T 256 307 575 1126 2010 2764 8T 236 390 756 1105 1911 
3574 Total Elapsed Time 14.4 seconds
This is a conversion of the longer running MP-RndMem2.apk Benchmark, as the original short version produced inconsistent performance measurements. It is a multithreading variant of the RandMem Benchmark above. For further details and more results see Android MultiThreading Benchmark Apps. Log file details are provided below for the original version, which performed relatively badly on the Intel based tablet A1, and for the ARM/Intel version, where cache based speeds were up to 3.6 times faster on reading tests and 1.3 times faster on reading/writing. On ARM based tablets, the new version produced similar results to those from the original, with some slower.
Compared with early ARM based devices, tablet A1 ARM/Intel tests again demonstrated superior performance from RAM based data and from L2 cache on reading, but did not do as well using L1 cache.
August 2015 - Results provided for 64 bit T22 with Cortex-A53 CPU. Performance is not significantly faster at 64 bits, probably because it depends on the complex indexing used.
 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

 Android MP-RndMem2 Benchmark V2.1 06-May-2015 12.14

 MB/Second Using 1, 2, 4 and 8 Threads

     KB       SerRD SerRDWR   RndRD RndRDWR

  12.29 1T     1337    2505    1337    2509
        2T     2637    2513    2657    2521
        4T     3535    2420    3484    2454
        8T     3195    2403    3088    2406
  122.9 1T     1305    2280     963    1758
        2T     2581    2285    1945    1748
        4T     3588    2130    3125    1740
        8T     3211    2269    2949    1745
  12288 1T     1248    1962     101     215
        2T     2469    1940     191     214
        4T     3462    1954     323     214
        8T     3127    1926     318     212

 Total Elapsed Time 43.7 seconds

 #################### A1 ARM-Intel ######################

 ARM/Intel MP-RndMem Benchmark V1.1 06-May-2015 11.54

 MB/Second Using 1, 2, 4 and 8 Threads

     KB       SerRD SerRDWR   RndRD RndRDWR

  12.29 1T     4643    3593    4710    3641
        2T     8583    3552    8761    3564
        4T    12707    3450   12496    3384
        8T    10410    3389   10796    3408
  122.9 1T     3733    2874    2408    2150
        2T     7259    2871    4781    2165
        4T    11726    2897    7656    2133
        8T    11673    2853    7100    2113
  12288 1T     3153    2087     226     238
        2T     5782    2073     327     238
        4T     6451    1997     447     236
        8T     6471    2071     446     233

 Total Elapsed Time 41.5 seconds

 #################### T11 Original #####################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Measured 1.7 GHz

 Android MP-RndMem2 Benchmark V2.1 06-May-2015 12.13

 MB/Second Using 1, 2, 4 and 8 Threads

     KB       SerRD SerRDWR   RndRD RndRDWR

  12.29 1T     6696    4438    6594    4483
        2T    12338    3078   12263    3573
        4T    12419    2834   12166    2907
        8T    12314    2903   11991    2934
  122.9 1T     3371    2916    1639    1748
        2T     6409    1922    2052    1097
        4T     6155    1892    2027    1186
        8T     6045    2105    2015    1192
  12288 1T     1394    1048     153     133
        2T     2245     985     285     123
        4T     2277    1002     285     132
        8T     2165    1001     286     127

 Total Elapsed Time 44.0 seconds

 #################### T11 ARM-Intel ####################

 ARM/Intel MP-RndMem Benchmark V1.1 06-May-2015 12.07

 MB/Second Using 1, 2, 4 and 8 Threads

     KB       SerRD SerRDWR   RndRD RndRDWR

  12.29 1T     6315    4486    6345    4484
        2T    11837    2910   11846    3112
        4T    11864    2835   11553    2858
        8T    11821    3003   11805    3198
  122.9 1T     3963    2681    1670    1704
        2T     6672    1782    2040    1125
        4T     6493    1817    2033    1218
        8T     6673    1738    2038    1303
  12288 1T     1805    1081     177     145
        2T     2543    1066     279     137
        4T     2600    1065     276     136
        8T     2662    1073     281     138

 Total Elapsed Time 43.7 seconds

 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s

 Android MP-RndMem2 Benchmark V2.1 08-Jul-2015 16.33

 MB/Second Using 1, 2, 4 and 8 Threads

     KB       SerRD SerRDWR   RndRD RndRDWR

  12.29 1T     5088    5325    4262    4711
        2T     9752    4902    8895    4570
        4T    17379    4653   17434    4096
        8T    19771    4698   17358    4424
  122.9 1T     2714    2578    1923    2163
        2T     5614    2502    3483    2107
        4T    10859    2219    4835    1972
        8T    10654    2410    4904    1923
  12288 1T     1798     952     186     204
        2T     3489     974     341     195
        4T     6515     943     563     196
        8T     6218     922     563     187

 Total Elapsed Time 42.3 seconds

 #################### T21 ARM-Intel ####################

 ARM/Intel MP-RndMem Benchmark V1.1 09-Jul-2015 11.48

 MB/Second Using 1, 2, 4 and 8 Threads

     KB       SerRD SerRDWR   RndRD RndRDWR

  12.29 1T     4186    3777    4055    3933
        2T     9324    3541    7710    3619
        4T    16594    3350   15731    3142
        8T    18117    3291   16187    3262
  122.9 1T     2423    2043    1610    1683
        2T     5235    2029    3013    1641
        4T    10148    1935    4662    1565
        8T    10015    1834    4611    1474
  12288 1T     1363     886     171     186
        2T     2643     845     325     187
        4T     5197     823     534     184
        8T     4801     835     542     184

 Total Elapsed Time 42.6 seconds

 ###################### T22 32 Bit ######################

 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2

 ARM/Intel MP-RndMem Benchmark V1.2 12-Aug-2015 17.13
 Compiled for 32 bit ARM v7a

 MB/Second Using 1, 2, 4 and 8 Threads

     KB       SerRD SerRDWR   RndRD RndRDWR

  12.29 1T     2894    2438    2887    2433
        2T     5665    2402    5663    2403
        4T    10922    2369   11100    2310
        8T    10065    2293   10648    2265
  122.9 1T     2681    2368     757     758
        2T     5351    2360    1398     769
        4T    10056    2308    2121     772
        8T     8838    2351    1916     742
  12288 1T     2309    1662      80      78
        2T     3986    1683     164      73
        4T     5419    1684     283      82
        8T     4658    1694     279      82

 ###################### T22 64 Bit ######################

 ARM/Intel MP-RndMem Benchmark V1.2 12-Aug-2015 17.15
 Compiled for 64 bit ARM v8a

  12.29 1T     4445    3109    4455    3089
        2T     8010    3100    8072    3105
        4T    15909    3057   14711    3040
        8T    14764    3036   14570    3037
  122.9 1T     3457    2888     842     876
        2T     6537    2924    1524     876
        4T    11095    2892    2119     861
        8T    11729    2916    2080     874
  12288 1T     2475    1679      81      78
        2T     4155    1713     163      73
        4T     5503    1711     285      89
        8T     4519    1717     281      89

 Total Elapsed Time 48.1 seconds

 ##################### T7 Original ######################

 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, Measured 1200 MHz

 Android MP-RndMem2 Benchmark V2.1 06-May-2015 12.17

 MB/Second Using 1, 2, 4 and 8 Threads

     KB       SerRD SerRDWR   RndRD RndRDWR

  12.29 1T     3120    3060    3128    3078
        2T     6098    3003    6083    3004
        4T    11354    2948   11188    2942
        8T    11403    2857   10412    2872
  122.9 1T      996     983     661     699
        2T     1868     984    1012     697
        4T     2600     982    1483     699
        8T     2534     976    1459     694
  12288 1T      335     286      91      80
        2T      640     288     113      82
        4T      892     286     130      82
        8T      925     287     127      81

 Total Elapsed Time 44.7 seconds

 #################### T7 ARM-Intel #####################

 ARM/Intel MP-RndMem Benchmark V1.1 06-May-2015 11.59

 MB/Second Using 1, 2, 4 and 8 Threads

     KB       SerRD SerRDWR   RndRD RndRDWR

  12.29 1T     3060    2001    2867    1904
        2T     5459    1879    5463    1867
        4T    10797    1852   10537    1856
        8T    10090    1802   10608    1813
  122.9 1T      968     823     588     547
        2T     1749     785     902     618
        4T     2716     812    1328     672
        8T     2733     810    1407     673
  12288 1T      329     274      90      82
        2T      636     272     112      82
        4T      849     271     128      82
        8T      869     271     126      81

 Total Elapsed Time 45.4 seconds
Details of the benchmark can be found above and in android neon benchmarks.htm. The main point is that it was a complete surprise to discover that ARM NEON intrinsic functions could be converted to Intel SIMD SSE instructions, with significant performance improvement on an Atom based tablet. The use of NEON functions for ARM CPUs can be anticipated to produce similar performance ratings via the original and ARM/Intel versions, as reflected in the results below.
August 2015 - T22 results from 32 bit and 64 bit compilations were similar, as the programs use a limited number of identical intrinsic functions.
September 2015 - New best score from P33, with 2 GHz Qualcomm Snapdragon 810 (Cortex-A57) and Android 5.0.2, with a speed of 1446 MFLOPS at 32 bits.
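To make the idea of NEON intrinsic functions concrete, the following is a minimal sketch of a Linpack style inner loop, x[i] = x[i] + s * y[i], written with NEON intrinsics. It is not taken from the benchmark source: the function name saxpy_sketch, the HAVE_NEON macro and the plain C fallback path are assumptions made for illustration, so that the same file also compiles where arm_neon.h is not available.

```c
#include <assert.h>
#include <stddef.h>

/* Use NEON intrinsics only when the compiler targets a NEON capable CPU. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical example: x[i] = x[i] + s * y[i] for n single precision words. */
void saxpy_sketch(float *x, const float *y, float s, size_t n)
{
    size_t i = 0;
    assert(x != NULL && y != NULL);
#if HAVE_NEON
    /* Process four floats per step in 128 bit registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t xv = vld1q_f32(x + i);   /* load 4 words of x  */
        float32x4_t yv = vld1q_f32(y + i);   /* load 4 words of y  */
        xv = vmlaq_n_f32(xv, yv, s);         /* xv = xv + yv * s   */
        vst1q_f32(x + i, xv);                /* store 4 results    */
    }
#endif
    /* Scalar loop: handles the remainder, or everything without NEON. */
    for (; i < n; i++)
        x[i] = x[i] + s * y[i];
}
```

With four words handled per instruction, the speed of such a loop depends mainly on the SIMD unit and data flow rather than on the compiler version, which fits the similar Original and ARM/Intel scores reported for the ARM tablets.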
 NEON Single Precision Floating Point MFLOPS

 ########################################################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

            MFLOPS
 Original    443.4
 ARM-Intel   900.2

 ########################################################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Measured 1.7 GHz

            MFLOPS
 Original   1334.9
 ARM-Intel  1411.9

 ########################################################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s

            MFLOPS
 Original   1250.1
 ARM-Intel  1235.0

 ########################################################

 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2

            MFLOPS
 32 bit      407.1
 64 bit      505.2

 ########################################################

 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, Measured 1200 MHz

            MFLOPS
 Original    376.0
 ARM-Intel   346.8

 ########################################################

 P33, Snapdragon 810 2000 MHz, Android 5.0.2

            MFLOPS
 32 bit     1446.4
This benchmark carries out the same calculations as the MemSpeed Benchmark, measuring data reading speeds in megabytes per second, with functions accessing arrays of cache and RAM based data, sized 2 x 8 KB to 2 x 32 MB. Calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m] in single precision floating point, and x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] with integers. Million Floating Point Operations Per Second (MFLOPS) speed can be calculated by dividing single precision MB/second by 4 and 8, for the two tests. The first set of calculations uses normal functions, followed by some using NEON Intrinsic Functions. The last two columns are NEON only results. For further details and results see android neon benchmarks.htm.
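The MB/second to MFLOPS conversion can be sketched as below. The divisors 4 and 8 are those given in the text; the reasoning in the comments (two 4 byte words read per pass, with two or one floating point operations) is an interpretation, and the function and constant names are hypothetical.

```c
#include <assert.h>

/* Bytes of array data read per floating point operation (assumed reasoning):
   x[m] = x[m] + s * y[m]  reads 2 x 4 bytes, 2 operations -> 8 / 2 = 4
   x[m] = x[m] + y[m]      reads 2 x 4 bytes, 1 operation  -> 8 / 1 = 8 */
enum {
    BYTES_PER_FLOP_MULTIPLY_ADD = 4,
    BYTES_PER_FLOP_ADD = 8
};

/* Convert a reported MBytes/second figure to MFLOPS. */
double mbps_to_mflops(double mbytes_per_second, int bytes_per_flop)
{
    assert(bytes_per_flop > 0);
    return mbytes_per_second / bytes_per_flop;
}
```

For example, a result of 1816 MB/second on the v=v+s*v test corresponds to 1816 / 4 = 454 MFLOPS.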
The native Intel code produced some performance gains, mainly using L1 cache based data, but speed in other areas is probably limited by data flow. The later compiler produced some slower speeds on ARM based tablet T11 and better/worse variations on T21.
August 2015 - Results provided for 64 bit T22. As with NEON-Linpack, many results from 32 bit and 64 bit compilations, via NEON intrinsic functions, were similar. With normal code, the 64 bit compilations were up to nearly four times faster than those at 32 bits.
 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

 Android NeonSpeed Benchmark V1.1 02-Feb-2015 17.09

 Vector Reading Speed in MBytes/Second

  Memory  Float v=v+s*v  Int v=v+v+s    Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   1778   3940   2807   5474   4997   5062
      32   1781   3576   2636   4431   4316   4291
      64   1772   3589   2639   4480   4337   4332
     128   1784   3589   2641   4423   4320   4320
     256   1766   3592   2642   4400   4347   4358
     512   1784   3585   2633   4375   4350   4355
    1024   1705   3253   2448   3760   3789   3788
    4096   1673   3021   2366   3257   3245   3237
   16384   1672   2948   2349   3062   3157   3151
   65536   1675   2967   2345   3190   3168   3168

 Total Elapsed Time 10.8 seconds

 #################### A1 ARM-Intel ######################

 ARM/Intel NeonSpeed Benchmark V1.1 09-May-2015 16.54

 Vector Reading Speed in MBytes/Second

  Memory  Float v=v+s*v  Int v=v+v+s    Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   1816   5996   4916   6244   6882   6880
      32   1851   4703   3985   5200   5609   5711
      64   1862   3845   3121   4174   4441   4520
     128   1841   3929   3110   4179   4411   4487
     256   1863   3932   3092   4179   4412   4493
     512   1861   3938   3090   3894   4215   4415
    1024   1784   3475   2738   3130   3223   3443
    4096   1741   2376   2649   2998   3112   3139
   16384   1774   3086   2780   3116   3140   3145
   65536   1774   2987   2547   2328   3126   3072

 Total Elapsed Time 10.1 seconds

 #################### T11 Original #####################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Measured 1.7 GHz

 Android NeonSpeed Benchmark V1.1 09-Aug-2013 17.10

 Vector Reading Speed in MBytes/Second

  Memory  Float v=v+s*v  Int v=v+v+s    Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   3793   9641   4375  13023  13456  13562
      32   5777  11410   4993  11718  11365  11143
      64   4122   6692   3855   6539   6682   7210
     128   4017   6565   3849   6475   6520   6983
     256   4067   6562   3836   6459   6495   7038
     512   3900   6531   3820   6428   6490   7095
    1024   1821   2544   1774   2532   2554   2539
    4096   1141   1645   1536   1612   1615   1635
   16384   1437   1695   1490   1576   1694   1668
   65536   1424   1675   1475   1699   1687   1694

 Total Elapsed Time 11.2 seconds

 #################### T11 ARM-Intel ####################

 ARM/Intel NeonSpeed Benchmark V1.1 09-May-2015 18.17

 Vector Reading Speed in MBytes/Second

  Memory  Float v=v+s*v  Int v=v+v+s    Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   2252   4964   3321   6602   7304   7237
      32   4202   8364   4543   8366   8553   8101
      64   3710   6096   3860   6570   6348   6182
     128   3802   5581   3874   6044   5624   5877
     256   3654   5618   3501   6154   5655   5783
     512   3597   5688   3723   6130   5812   5684
    1024   1727   2466   1659   2481   2454   2472
    4096   1479   1718   1421   1714   1713   1706
   16384   1488   1704   1435   1576   1705   1694
   65536   1477   1755   1453   1754   1759   1752

 Total Elapsed Time 10.8 seconds

 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s

 Android NeonSpeed Benchmark V1.1 23-Jul-2015 13.00

 Vector Reading Speed in MBytes/Second

  Memory  Float v=v+s*v  Int v=v+v+s    Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   4324  13809   4498  14660  17501  18186
      32   3587   6845   2922   8073   6981   7035
      64   3347   6894   2912   8078   6964   6938
     128   3343   6651   2919   7922   6726   6999
     256   3511   6963   3002   8071   6902   6897
     512   3476   6628   3025   7827   6613   6818
    1024   3172   4627   2773   6424   4800   4806
    4096   2653   2051   2378   3613   2090   2054
   16384   2356   1891   2118   3165   1955   1962
   65536   2424   1923   2167   3368   1933   1925

 Total Elapsed Time 9.9 seconds

 #################### T21 ARM-Intel ####################

 ARM/Intel NeonSpeed Benchmark V1.1 23-Jul-2015 13.03

 Vector Reading Speed in MBytes/Second

  Memory  Float v=v+s*v  Int v=v+v+s    Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   3623  16704   4623  15187  17446  16719
      32   3455   9210   2997   8723   9280   9112
      64   3336   7721   3002   8544   8469   8581
     128   3415   7664   3111   8481   7549   7638
     256   3584   7526   3087   8500   7849   7805
     512   3538   7422   3154   8266   7567   7541
    1024   3513   7227   3067   7789   7294   7261
    4096   2302   1673   2413   3107   1693   1677
   16384   2286   1616   2323   3024   1620   1617
   65536   2322   1617   2271   2505   1634   1600

 Total Elapsed Time 9.9 seconds

 ###################### T22 32 Bit ######################

 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2

 ARM/Intel NeonSpeed Benchmark V1.2 13-Aug-2015 16.32
 Compiled for 32 bit ARM v7a

 Vector Reading Speed in MBytes/Second

  Memory  Float v=v+s*v  Int v=v+v+s    Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16    971   3853   1807   4059   3957   4397
      32    970   3812   1800   3983   3891   4323
      64    927   3228   1605   3038   3269   3521
     128    926   3321   1681   3343   3354   3596
     256    936   3386   1693   3449   3413   3667
     512    898   2889   1578   2996   2927   3118
    1024    794   1859   1345   2057   1996   1924
    4096    794   1796   1250   1788   1813   1835
   16384    792   1773   1270   1820   1829   1864
   65536    796   1811   1289   1852   1832   1880

 Total Elapsed Time 11.3 seconds

 ###################### T22 64 Bit ######################

 ARM/Intel NeonSpeed Benchmark V1.2 13-Aug-2015 16.37
 Compiled for 64 bit ARM v8a

 Vector Reading Speed in MBytes/Second

  Memory  Float v=v+s*v  Int v=v+v+s    Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16   3054   4055   3605   4376   4911   5094
      32   2922   3787   3435   4198   4546   4682
      64   2795   3514   3259   3658   4050   4116
     128   2886   3529   3373   3924   4148   3963
     256   2883   3641   3264   3942   4193   4276
     512   2454   3165   2985   3385   3586   3542
    1024   1633   2000   1835   2043   2114   2105
    4096   1738   1893   1899   1900   1956   1955
   16384   1757   1870   1886   1802   1921   1846
   65536   1755   1875   1870   1903   1936   1937

 Total Elapsed Time 10.2 seconds

 ##################### T7 Original ######################

 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, Measured 1200 MHz

 Android NeonSpeed Benchmark 15-Dec-2012 14.38

 Vector Reading Speed in MBytes/Second

  Memory  Float v=v+s*v  Int v=v+v+s    Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16    860   2575   2325   2918   3053   3245  L1
      32    950   2551   2400   2823   2944   3131
      64    744   1396   1329   1434   1465   1496  L2
     128    713   1342   1319   1365   1392   1417
     256    714   1339   1311   1357   1377   1400
     512    708   1323   1299   1348   1358   1383
    1024    608    875    869    917    930    952
    4096    460    493    492    481    488    504  RAM
   16384    460    498    487    507    506    504
   65536    459    495    469    251    503    505

 Total Elapsed Time 11.5 seconds

 #################### T7 ARM-Intel #####################

 ARM/Intel NeonSpeed Benchmark V1.1 09-May-2015 18.07

 Vector Reading Speed in MBytes/Second

  Memory  Float v=v+s*v  Int v=v+v+s    Neon v=v+v
  KBytes   Norm   Neon   Norm   Neon  Float    Int

      16    881   2440   2501   3334   3206   3465
      32    901   1868   1705   2260   2083   2186
      64    801   1395   1365   1573   1548   1581
     128    784   1282   1278   1405   1389   1411
     256    787   1279   1285   1420   1380   1409
     512    777   1266   1267   1409   1370   1394
    1024    604    786    762    769    770    828
    4096    458    479    477    463    486    488
   16384    436    447    448    469    470    469
   65536    450    472    469    240    482    483

 Total Elapsed Time 11.5 seconds
NEON-MFLOPS-MP carries out the same calculations as MP-MFLOPS Benchmarks above, but with NEON intrinsic functions used for all calculations. For further results see android neon benchmarks.htm.
Results for the original NEON version and a sample of MP-MFLOPS are provided below. NEON produced significant performance improvements across the board, including on the Atom based tablet, via the ARM to Intel conversion layer. As might be expected using intrinsics, compilation via a later version of gcc made little difference to the speed of ARM systems, but the Intel native code more than doubled performance on CPU speed limited tests.
Following the performance details are the numeric results of calculations from the fixed parameters used in the new version, for both ARM and Intel. It seems that tablet T11 has an intermittent fault, as it occasionally fails to calculate a correct answer or causes the tablet to crash and reboot. This now also appears to happen using the older version.
August 2015 - T22 NEON 64 bit compilation produced a small performance gain over 32 bit results, at 2 operations per word, but nearly double the speed at 32 operations, the 32 bit version suffering from fewer registers for the variables. Using one core, maximum speed was 2.77 GFLOPS, rising to 10.8 GFLOPS via four cores (best so far relative to CPU GHz). The one core speed equated to just over two floating point operations per clock cycle. This is disappointing, compared with Intel processors, such as the Core 2 onwards, at 6 per clock cycle out of a maximum of 8, with SSE SIMD code (see Linux results).
September 2015 - New best score from P33, with 2 GHz Qualcomm Snapdragon 810, (Cortex-A57) and Android 5.0.2, at 64 bits. Performance, with 8 threads, is up to 23.6 GFLOPS, and up to nearly 3.5 results per clock cycle, using one core.
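The per clock cycle figures quoted above follow from simple arithmetic on the logged MFLOPS and the CPU clock speed. A minimal sketch is given below; the function names are hypothetical and the MHz values are the nominal clock speeds stated for each device.

```c
#include <assert.h>

/* Floating point results per clock cycle for one core.
   mflops:  measured single thread speed in MFLOPS
   cpu_mhz: CPU clock speed in MHz */
double flops_per_clock(double mflops, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return mflops / cpu_mhz;
}

/* Gain from multithreading relative to the one thread result. */
double mp_speedup(double mflops_threads, double mflops_one_thread)
{
    assert(mflops_one_thread > 0.0);
    return mflops_threads / mflops_one_thread;
}
```

For T22 at 64 bits, 2774 MFLOPS at 1300 MHz gives about 2.13 results per clock cycle, and for P33, 6943 MFLOPS at 2000 MHz gives about 3.47. The T22 four thread result of 10780 MFLOPS is about 3.9 times the one thread figure of 2766 MFLOPS in the same column.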
 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

 Android NEON-MFLOPS-MP Benchmark V1.1 07-Feb-2015 18.37

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

              2 Ops/Word             32 Ops/Word
 KB      12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      1110    1319     878    1188    1139    1226
 2T      2470    2114     996    2406    2427    2390
 4T      3159    2211     988    4148    3487    4006
 8T      2066    2486    1003    4144    3944    4077

 Total Elapsed Time 3.6 seconds

 Not NEON
 4T      1571    1627     979    2238    2255    2258

 Android NEON-MFLOPS2-MP Benchmark V2.1 07-Feb-2015 18.38

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

              2 Ops/Word             32 Ops/Word
 KB      12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      1796    1520    1025    1231    1228    1227
 2T      3354    2959    1047    2427    2445    2445
 4T      4627    5508     978    4690    4791    4733
 8T      3861    6307    1030    4611    4869    4742

 Total Elapsed Time 88.3 seconds

 #################### A1 ARM-Intel ######################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.17

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

              2 Ops/Word             32 Ops/Word
 KB      12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      2151    1962    1064    2619    2694    2650
 2T      4421    3849    1048    5296    5463    5343
 4T      5886    6652     982    9592   10735   10362
 8T      3744    7284    1018    9085   10791    9493

 Total Elapsed Time 13.8 seconds

 ############### A1 ARM-Intel 1000 MHz #################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 16.04

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

              2 Ops/Word             32 Ops/Word
 KB      12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      1939    1266     674    2503    2388    2351
 2T      3670    2652     679    4919    4792    4640
 4T      3102    3051     676    4688    4678    4672
 8T      3189    3425     657    4813    4869    4639

 Total Elapsed Time 19.4 seconds

 #################### T11 Original #####################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Dual core, Measured 1.7 GHz

 Android NEON-MFLOPS-MP Benchmark V1.1 13-Sep-2013 13.44

              2 Ops/Word             32 Ops/Word
 KB      12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      1847    1415     597    3772    4096    3545
 2T      3649    3309     664    8065    7966    7505
 4T      3670    3922     658    7753    8148    7490
 8T      5664    5570     681    8092    8355    7672

 Total Elapsed Time 13.0 seconds

 Not NEON
 2T      1593    1668     648    3140    3067    2977

 #################### T11 ARM-Intel ####################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.07

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

              2 Ops/Word             32 Ops/Word
 KB      12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      1965    1630     582    3792    4077    3521
 2T      3789    2690     663    8497    8133    7297
 4T      5714    4883     654    8364    8192    7554
 8T      5414    6316     673    7976    8437    6635

 Total Elapsed Time 13.0 seconds

 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s

 Android NEON-MFLOPS2-MP Benchmark V2.1 25-Jul-2015 18.44

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

              2 Ops/Word             32 Ops/Word
 KB      12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      2757    2576     771    2808    2825    2800
 2T      5662    5525    1516    5631    5664    5570
 4T      6550    7846    1945   11167   11281   10939
 8T     10273   10928    1981   10851   11211   11350

 Total Elapsed Time 40.0 seconds

 Not NEON
 4T      2338    2959    1836    4867    4911    4859

 #################### T21 ARM-Intel ####################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 28-Jun-2015 16.32

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

              2 Ops/Word             32 Ops/Word
 KB      12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      3049    2857     622    2923    2874    2098
 2T      5508    4887    1009    5477    5736    4349
 4T      5643    5282    1410   11244   11601    8564
 8T      9294   11156    1681   11288   11605    8946

 Total Elapsed Time 14.0 seconds

 ###################### T22 32 Bit ######################

 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 13-Aug-2015 16.35
 Compiled for 32 bit ARM v7a

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

              2 Ops/Word             32 Ops/Word
 KB      12.8     128   12800    12.8     128   12800
 MFLOPS
 1T       619     613     575    1444    1446    1426
 2T      1174    1206     889    2894    2902    2839
 4T      1585    1616     901    5679    5726    5596
 8T      2075    2130     944    5400    5585    5519

 Total Elapsed Time 25.8 seconds

 ###################### T22 64 Bit ######################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 13-Aug-2015 16.38
 Compiled for 64 bit ARM v8a

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

              2 Ops/Word             32 Ops/Word
 KB      12.8     128   12800    12.8     128   12800
 MFLOPS
 1T       726     745     647    2766    2774    2639
 2T      1397    1402     903    5523    5552    5371
 4T      1871    1930     898   10780   10479   10439
 8T      2496    2876    1011    9736   10679    9900

 Total Elapsed Time 15.1 seconds

 ##################### P33 64 Bit #####################

 P33 Quad-core 2 GHz Qualcomm Snapdragon 810, Android 5.0.2

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 16-Sep-2015 17.59
 Compiled for 64 bit ARM v8a

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

              2 Ops/Word             32 Ops/Word
 KB      12.8     128   12800    12.8     128   12800
 MFLOPS
 1T      2811    3126    1089    6943    6589    6342
 2T      2488    4114    1541   12084   10559    8809
 4T      4759    5480    2038   16516   14826   11960
 8T      4840    8985    2452   22082   23563   12461

 Total Elapsed Time 7.6 seconds

 ##################### T7 Original ######################

 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2,
 Quad core, Measured 1200 MHz

 Android NEON-MFLOPS-MP Benchmark V1.0 20-Dec-2012 16.57

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

              2 Ops/Word             32 Ops/Word
 KB      12.8     128   12800    12.8     128   12800
 MFLOPS
 1T       532     402     124    1135    1044     960
 2T      1255     798     213    2041    1987    1916
 4T      2441    1553     229    4185    4034    3450
 8T      1922    2403     226    3774    3996    3346

 Total Elapsed Time 4.5 seconds

 Not NEON
 4T       716     655     233    2367    2316    2240

 #################### T7 ARM-Intel #####################

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.24

 FPU Add & Multiply using 1, 2, 4 and 8 Threads

              2 Ops/Word             32 Ops/Word
 KB      12.8     128   12800    12.8     128   12800
 MFLOPS
 1T       657     407     132    1077    1074    1053
 2T      1265     817     222    2147    2150    2078
 4T      2024    1695     234    4214    4276    3555
 8T      2435    2495     234    4196    4100    3523

 Total Elapsed Time 39.0 seconds

 ##################### New Results #####################

 Results x 100000, 12345 indicates ERRORS

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1

 1T     44934   86735   99850   36770   79897   99759
 2T     44934   86735   99850   36770   79897   99759
 4T     44934   86735   99850   36770   79897   99759
 8T     44934   86735   99850   36770   79897   99759

 T11    44934   12345   99850   36770   79897   99759

 Android NEON-MFLOPS-MP Benchmark V1.1

 1T     86735   98519   99984   79897   97638   99975
 2T     86735   98519   99984   79897   97638   99975
 4T     86735   98519   99984   79897   97638   99975
 8T     86735   98519   99984   79897   97638   99975

 Android NEON-MFLOPS2-MP Benchmark V2.1

 1T     40015   66980   99522   35216   54898   99234
 2T     40015   66980   99522   35216   54898   99234
 4T     40015   66980   99522   35216   54898   99234
 8T     40015   66980   99522   35216   54898   99234
This is a multithreading version of the NEON-Linpack Benchmark. Further details and results can be found in android neon benchmarks.htm. The benchmark is run on 100x100, 500x500 and 1000x1000 matrices using 0, 1, 2 and 4 separate threads, the programming code for zero threads being the same as the earlier example. Multithreading performance, using this standard linear equation solver, is severely degraded due to overheads, the zero thread results being the only ones of real use.
Performance, using native Intel compilation, is shown to be twice as fast, except at N = 1000, which is mainly dependent on calculations from data in RAM. Speed from ARM can also be somewhat faster (or slower). T21, with the Qualcomm Snapdragon 800, obtains by far the fastest results, at unthreaded N = 500.
The program checks that the same numeric results are produced, irrespective of the number of threads used, at each matrix size. However, due to rounding effects, the results differ slightly between ARM and Intel hardware, as shown below.
August 2015 - T22 results from 32 bit and 64 bit compilations were again similar, as the programs use a limited number of identical intrinsic functions.
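The consistency check described above, that every thread count produces the same numeric results, can be sketched as follows. This is an illustration only, not the benchmark's own code: the function name results_agree and the flat array layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every run produced exactly the same values as the first run,
   otherwise 0. results[r * n_values + v] holds value v from run r
   (hypothetical layout, e.g. one run per thread count). */
int results_agree(const float *results, size_t n_runs, size_t n_values)
{
    assert(results != NULL && n_runs > 0);
    for (size_t r = 1; r < n_runs; r++) {
        for (size_t v = 0; v < n_values; v++) {
            /* Exact comparison: the same code on the same CPU should
               reproduce identical bit patterns. */
            if (results[r * n_values + v] != results[v])
                return 0;
        }
    }
    return 1;
}
```

An exact comparison is suitable within one device, but not between ARM and Intel hardware, where rounding effects lead to slightly different residuals, so a cross platform check would need a tolerance instead.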
 MFLOPS 0 to 4 Threads, N 100, 500, 1000

 #################### A1 Original #######################

 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s

 Threads      None        1        2        4

 N  100     452.39    21.00    23.48    17.48
 N  500     663.38   275.56    88.66   312.71
 N 1000     617.04   380.60   191.26   195.61

 #################### A1 ARM-Intel ######################

 ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 13.58

 Threads      None        1        2        4

 N  100     971.71    37.72    36.36    39.66
 N  500    1311.37   488.73   487.85   488.98
 N 1000     945.97   727.85   737.95   742.34

 Total Elapsed Time 59.966 seconds

 #################### T11 Original #####################

 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Measured 1.7 GHz

 Threads      None        1        2        4

 N  100    1399.82    54.86    55.31    54.66
 N  500    1154.21   434.16   434.06   436.97
 N 1000     571.26   482.57   487.25   485.80

 #################### T11 ARM-Intel ####################

 ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 15.44

 Threads      None        1        2        4

 N  100    1497.90    61.13    63.13    61.87
 N  500    1399.10   491.49   489.29   494.69
 N 1000     586.14   499.00   504.97   497.49

 Total Elapsed Time 43.952 seconds

 #################### T21 Original #####################

 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s

 Android Linpack NEON SP MP Benchmark 26-Jul-2015 11.46

 Threads      None        1        2        4

 N  100    1311.08    12.38    12.93    15.05
 N  500    2271.56   344.04   419.52   381.73
 N 1000     837.30   540.99   523.52   564.87

 Total Elapsed Time 143.534 seconds

 #################### T21 ARM-Intel ####################

 ARM/Intel Linpack NEON SP MP Benchmark 26-Jul-2015 11.51

 Threads      None        1        2        4

 N  100    1308.07    14.89    11.77    11.63
 N  500    2341.17   407.96   481.02   415.12
 N 1000     901.21   551.80   566.77   564.31

 Total Elapsed Time 145.750 seconds

 ###################### T22 32 Bit ######################

 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2

 ARM/Intel Linpack NEON SP MP Benchmark 1.2 13-Aug-2015 12.52
 Compiled for 32 bit ARM v7a

 Threads      None        1        2        4

 N  100     460.74    22.35    23.16    23.82
 N  500     480.63   336.52   339.94   303.66
 N 1000     470.02   405.86   403.01   405.98

 ###################### T22 64 Bit ######################

 ARM/Intel Linpack NEON SP MP Benchmark 1.2 13-Aug-2015 12.57
 Compiled for 64 bit ARM v8a

 Threads      None        1        2        4

 N  100     548.67    27.70    33.93    37.00
 N  500     470.04   285.95   297.79   301.67
 N 1000     519.02   441.84   443.47   441.91

 ##################### T7 Original ######################

 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, Measured 1200 MHz

 Threads      None        1        2        4

 N  100     413.47    45.95    48.22    48.34
 N  500     253.08   187.51   189.69   189.94
 N 1000     148.76   135.49   136.08   136.17

 #################### T7 ARM-Intel #####################

 ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 15.40

 Threads      None        1        2        4

 N  100     385.49    28.79    29.06    29.25
 N  500     272.07   184.85   183.70   183.18
 N 1000     147.09   131.92   132.44   130.05

 Total Elapsed Time 64.318 seconds

 ################### Numeric Results ###################

 NR=norm resid RE=resid MA=machep X0=x[0]-1 XN=x[n-1]-1

 N                  100              500             1000

 ARM
 NR                1.60             3.96            11.32
 RE      3.80277634e-05   4.72068787e-04   2.70068645e-03
 MA      1.19209290e-07   1.19209290e-07   1.19209290e-07
 X0     -1.38282776e-05   5.26905060e-05   1.62243843e-04
 XN     -7.51018524e-06   3.26633453e-05  -6.65783882e-05

 Intel
 NR                1.68             3.96            11.39
 RE      4.00543213e-05   4.72545624e-04   2.71725655e-03
 MA      1.19209290e-07   1.19209290e-07   1.19209290e-07
 X0     -1.38282776e-05   5.26905060e-05   1.62243843e-04
 XN     -7.51018524e-06   3.26633453e-05  -6.65783882e-05
The benchmarks run code for single and double precision Fast Fourier Transforms of size 1024 to 1048576 (1K to 1024K), each one being run three times to identify variance. Results are displayed and saved in a log file (FFT-tests.txt), with FFT running time in milliseconds.
Besides Android, the benchmarks are also available for Windows and Linux.
Two versions are available: FFT1, the original version, and FFT3c, with optimised C code. Further details, results, and links for benchmarks and source code are in
FFTBenchmarks.htm.
Below is an example of results.
 Kindle Fire HDX 7, 2.2 GHz Quad Core Qualcomm Snapdragon 800

 ARM/Intel FFT Benchmark 3c.0 08-Sep-2015 23.15
 Compiled for 32 bit ARM v7a

 Size                       milliseconds
    K       Single Precision              Double Precision

    1     0.155     0.352     1.341     0.087     0.073     0.073
    2     0.812     0.814     0.750     0.201     0.187     0.251
    4     1.751     1.658     1.776     0.414     0.405     0.443
    8     3.712     1.083     1.065     0.930     0.899     0.890
   16     2.880     3.356     2.430     2.579     2.658     2.380
   32     6.124     6.541     5.605     5.907     6.070     5.681
   64    13.430    12.566    12.774    13.792    13.556    13.997
  128    30.737    27.408    27.132    33.318    33.088    33.071
  256    64.472    63.394    64.690    73.288    72.546    72.786
  512   153.609   150.383   156.046   155.788   156.304   163.178
 1024   315.283   306.323   307.409   369.426   337.074   336.684

 1024   Square Check   Maximum Noise   Average Noise
 SP     9.999520e-01    3.346482e-06    4.565234e-11
 DP     1.000000e+00    1.133294e-23    1.428110e-28

 Total Elapsed Time 6.5 seconds
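Because each FFT size is timed three times, the spread between runs can be read off the log directly. The sketch below assumes a results row has the layout shown above (size in K, then three single precision and three double precision times in milliseconds). The helper names parse_row and summarise are hypothetical and not part of the benchmark.

```python
from statistics import median


def parse_row(line):
    """Split 'K sp1 sp2 sp3 dp1 dp2 dp3' into (K, SP times, DP times)."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    size_k = int(fields[0])
    times = [float(f) for f in fields[1:]]
    return size_k, times[:3], times[3:]


def summarise(times):
    """Return (fastest run, median run, slowest/fastest ratio) in ms."""
    return min(times), median(times), max(times) / min(times)


if __name__ == "__main__":
    rows = (
        "1 0.155 0.352 1.341 0.087 0.073 0.073",
        "1024 315.283 306.323 307.409 369.426 337.074 336.684",
    )
    for row in rows:
        k, sp, dp = parse_row(row)
        print(k, "K  SP", summarise(sp), " DP", summarise(dp))
```

In the example above, the three 1K single precision runs differ by a factor of more than eight, whereas the 1024K runs agree to within a few percent, so the smallest sizes are the most sensitive to variance.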
 A1 Asus MemoPad 7 ME176CEX, 1.86 GHz Atom Intel Atom Z3745
    Screen pixels w x h 800 x 1216
    Android Build Version 4.4.2

    Processor        : ARMv7 processor rev 1 (v7l)
    BogoMIPS         : 1500.0
    Features         : neon vfp swp half thumb fastmult edsp vfpv3
    CPU implementer  : 0x69
    CPU architecture : 7
    CPU variant      : 0x1
    CPU part         : 0x001
    CPU revision     : 1
    Hardware         : placeholder
    Revision         : 0001

    Linux version 3.10.20
    Mainly runs at 1.86 GHz Turbo Boost

 T7 Device Google Nexus 7
    quad core CPU 1.3 GHz, 1.2 GHz > 1 core
    RAM 1 GB DDR3L-1333 Bandwidth 5.3 GB/sec
    Screen pixels w x h 1280 x 736
    Twelve-core Nvidia GeForce ULP graphics 416 MHz
    Android Build Version 4.1.2

    Processor        : ARMv7 Processor rev 9 (v7l)
    processor        : 0
    BogoMIPS         : 1993.93
    processor        : 1
    BogoMIPS         : 1993.93
    processor        : 2
    BogoMIPS         : 1993.93
    processor        : 3
    BogoMIPS         : 1993.93
    Features         : swp half thumb fastmult vfp edsp neon vfpv3 tls
    CPU implementer  : 0x41
    CPU architecture : 7
    CPU variant      : 0x2
    CPU part         : 0xc09 - Cortex-A9
    CPU revision     : 9
    Hardware         : grouper - nVidia Tegra 3 T30L
    Revision         : 0000

    Linux version 3.1.10
    Runs at 1.2 GHz

 T11 Voyo A15, Samsung EXYNOS 5250 Dual core 2.0 GHz Cortex-A15,
    Mali-T604 GPU, 2 GB DDR3-1600 RAM, dual channel, 12.8 GB/s
    Screen pixels w x h 1920 x 1032
    Android Build Version 4.2.2 - Jelly Bean

    Processor        : ARMv7 Processor rev 4 (v7l)
    processor        : 0
    BogoMIPS         : 992.87
    processor        : 1
    BogoMIPS         : 997.78
    Features         : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
    CPU implementer  : 0x41
    CPU architecture : 7
    CPU variant      : 0x0
    CPU part         : 0xc0f
    CPU revision     : 4
    Hardware         : SMDK5250

    Linux version 3.4.35Ut
    Runs at 1.7 GHz

 T21 Kindle Fire HDX 7, 2.2 GHz Quad Core Qualcomm Snapdragon 800 (Krait 400)
    2 x 32 Bit LPDDR3-1866 Memory, 14.9 GB/s, GPU Qualcomm Adreno 330, 578 MHz
    Device Amazon KFTHWI
    Screen pixels w x h 1200 x 1803
    Android Build Version 4.4.3

    Processor        : ARMv7 Processor rev 0 (v7l)
    processor        : 0, 1, 2, 3
    BogoMIPS         : 38.40
    Features         : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
    CPU implementer  : 0x51
    CPU architecture : 7
    CPU variant      : 0x2
    CPU part         : 0x06f
    CPU revision     : 0
    Hardware         : Qualcomm MSM8974
    Revision         : 0000

    Linux version 3.4.0-perf (gcc version 4.7)

 T22 Lenovo Tab 2 A8-50, 1.3 GHz quad core 64 bit MediaTek ARM Cortex-A53
    1 GB LPDDR3, GPU Mali T720 MP2
    Device LENOVO Lenovo TAB 2 A8-50F
    Screen pixels w x h 800 x 1216
    Android Build Version 5.0.2

    Processor        : AArch64 Processor rev 3 (aarch64)
    processor        : 0, 1, 2
    BogoMIPS         : 26.0
    Features         : fp asimd aes pmull sha1 sha2 crc32
    CPU implementer  : 0x41
    CPU architecture : AArch64
    CPU variant      : 0x0
    CPU part         : 0xd03
    CPU revision     : 3
    Hardware         : MT8161

    Linux version 3.10.65

 P33 Sony Xperia Z3+ E6533, Quad-core 1.5 GHz & Quad-core 2 GHz
    Qualcomm Snapdragon 810 64-bit CPU
    Screen pixels w x h 1080 x 1776
    Android Build Version 5.0.2

    Processor        : AArch64 Processor rev 1 (aarch64)
    processor        : 0 to 7
    Features         : fp asimd evtstrm aes pmull sha1 sha2 crc32
    CPU implementer  : 0x41
    CPU architecture : 8
    CPU variant      : 0x1
    CPU part         : 0xd07
    CPU revision     : 1
    Hardware         : Qualcomm Technologies, Inc MSM8994

    Linux version 3.10.49

 BS1 BlueStacks Emulator on 3 GHz Phenom via Windows 7
    Screen pixels w x h 1024 x 600
    Android Build Version 2.3.4

 BS2 BlueStacks Emulator on 3.7 GHz Core i7 via Windows 8
    Screen pixels w x h 1440 x 852
    Android Build Version 4.4.2
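The "CPU implementer" and "CPU part" fields in these listings identify the core, as in the T7 entry where 0xc09 is annotated as Cortex-A9. The sketch below decodes those two fields from text in /proc/cpuinfo format. The lookup tables are an assumption based on ARM's published identification codes, cover only the ARM-designed parts listed above, and the function names are hypothetical.

```python
# Assumed lookup tables, limited to codes that appear in the listings above.
IMPLEMENTERS = {0x41: "ARM", 0x51: "Qualcomm", 0x69: "Intel"}
ARM_PARTS = {
    0xC09: "Cortex-A9",
    0xC0F: "Cortex-A15",
    0xD03: "Cortex-A53",
    0xD07: "Cortex-A57",
}


def parse_cpuinfo(text):
    """Return the first value seen for each 'key : value' line."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    return fields


def describe(fields):
    """Return (implementer name, core name), or 'unknown' where not listed."""
    implementer = int(fields["CPU implementer"], 16)
    # Keep only the hex code, dropping any trailing note such as "- Cortex-A9".
    part = int(fields["CPU part"].split()[0], 16)
    name = IMPLEMENTERS.get(implementer, "unknown")
    core = ARM_PARTS.get(part, "unknown") if name == "ARM" else "unknown"
    return name, core


if __name__ == "__main__":
    sample = "CPU implementer : 0x41\nCPU part : 0xd03\n"
    print(describe(parse_cpuinfo(sample)))
```

On a device, the same functions could be applied to the contents of /proc/cpuinfo read as text.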