Roy Longbottom's Android Native ARM + Intel Benchmarks

For latest results see Android Benchmarks For 32 Bit and 64 Bit CPUs from ARM, Intel and MIPS.

Download Benchmark Apps


A Security option in Settings may need changing to allow installation of non-Market applications.

NativeWhetstone2.apk   First standard benchmark
Dhrystone2i.apk        First integer benchmark
LinpackDP2.apk         First computational benchmark
LinpackSP2.apk         Single precision Linpack
LivermoreLoops2.apk    First supercomputer benchmark
MemSpeedi.apk          Floating Point Cache and RAM Test
BusSpeedv7i.apk        Integer Bus, Cache and RAM Test
RandMemi.apk           Random/Serial Access Cache and RAM Test
MP-MFLOPSi.apk         CPU, Cache, RAM MFLOPS Test
MP-MFLOPS2i.apk        Long Running MP-MFLOPS
MP-WHETSi.apk          Whetstone Floating and Fixed Point Tests
MP-Dhryi.apk           Dhrystone Integer Benchmark
MP-BusSpdi.apk         Multithreaded BusSpeed Benchmark
MP-RndMemi.apk         Multithreaded RandMem Benchmark
NEON-Linpacki.apk      Linpack Benchmark using ARM NEON Intrinsic Functions
NeonSpeedi.apk         NEON Memory Speed Test Using Intrinsic Functions
NEON-MFLOPS2i-MP.apk   MP-MFLOPS using ARM NEON Intrinsic Functions
NEON-Linpacki-MP.apk   Linpack MP Benchmark using NEON Intrinsic Functions
MP-BusSpd2i.apk        Long running version with staggered start
fft1.apk               Original FFT Benchmark
fft3c.apk              Optimised FFT Benchmark




All the above were produced using gcc 4.8, via Eclipse, running under Linux Ubuntu 14.04

General

Intel Atom processors are appearing in a number of Android devices. To run existing ARM apps that were compiled to native code, rather than running via Java, Android on these devices has a compatibility layer, called Houdini, that maps ARM instructions into x86 instructions. This is known to produce poor performance, and raises questions about battery drain.

My existing Android benchmarks were produced on Linux Ubuntu based PCs, using Eclipse. Many use a Java front end, with C/C++ code compiled to native code and called through the Java Native Interface (JNI). These projects can be downloaded from Android Benchmarks.zip, Android Graphics Benchmarks.zip, Android NEON Benchmarks.zip, and Android MP Benchmarks.zip.

The JNI directory contains the C/C++ code and an Application.mk file that tells the compiler which platforms to produce machine code for. For the original benchmarks, the mk file had the parameter APP_ABI := armeabi-v7a, for ARM V7 CPUs, or APP_ABI := armeabi armeabi-v7a, to include earlier technology, the appropriate one being selected at run time.

I was surprised to find that gcc 4.8 provided parameters to produce native Intel code, and others. Those currently available are arm64-v8a, armeabi, armeabi-v7a, mips, mips64, x86 and x86_64. I use APP_ABI := all, so that the programs can at least run on both ARM and Intel CPUs. Although the Atom is a 64 bit CPU, the currently installed Android 4.4 will not run x86_64 compilations. Eclipse projects for the new compilations are in Android Intel-ARM Benchmarks.zip.

Initial comparisons provided are for tablets with Intel Atom, ARM Cortex-A9 and ARM Cortex-A15 CPUs, plus the BlueStacks Emulator running under Windows 7 on a 3.0 GHz Phenom and under Windows 8 on a 3.7 GHz Core i7. The results are for the original ARM-only compilations and the latest with both ARM and Intel native instructions.

These benchmarks should also run on 64 bit CPUs with 64 bit versions of Android. Some slight changes are being included in the programs to identify which section of the software is being used. They are being run on a Lenovo Tab 2 A8-50, 8 Inch Tablet, with a 1.3 GHz MediaTek mt8161 quad core processor (64 bit ARM Cortex-A53) and Android 5.0.2. Further details are in Android 64 Bit Benchmarks.htm and results are included below.



Logged Configuration

All the benchmarks were run on an Asus MeMO Pad 7 ME176CX that has a quad core Intel Atom Z3745, rated as 1.33 GHz but mainly running at the Turbo Boost speed of 1.86 GHz. All benchmarks have an option to save results via Email, and this includes details of the system used. Following are example details provided for this Asus MeMO Pad 7. Similar details of other Android devices are in Android Benchmarks.htm. Those provided later are a brief summary.

 Intel CPU Code
 Device Asus K013
 Screen pixels w x h 800 x 1216
 Android Build Version 4.4.2
 d : 0, siblings : 4, core id : 3, cpu cores : 4, apicid : 6, initial apicid : 6
 fdiv_bug : no, f00f_bug : no, coma_bug : no, fpu : yes, fpu_exception : yes
 cpuid level : 11, wp : yes
 flags : fpu vme + numerous others including up to SSE4
 bogomips : 2666.77
 clflush size : 64
 cache_alignment : 64
 address sizes : 36 bits physical, 48 bits virtual
 processor : 3
 vendor_id : GenuineIntel
 cpu family : 6
 model : 55
 model name : Intel(R) Atom(TM) CPU Z3745 @ 1.33GHz
 stepping : 8
 microcode : 0x81b
 cpu MHz : 1862.000
 cache size : 1024 KB
 physical i
 Linux version 3.10.20-g268162b (3.2.23.182) (gcc version 4.7 (GCC) ) #1 SMP
 PREEMPT Tue Sep 16 10:49:37 CST 2014
 With ARM CPU Code
 Screen pixels w x h 800 x 1216
 Android Build Version 4.4.2
 Processor : ARMv7 processor rev 1 (v7l)
 BogoMIPS : 1500.0
 Features : neon vfp swp half thumb fastmult edsp vfpv3
 CPU implementer : 0x69
 CPU architecture: 7
 CPU variant : 0x1
 CPU part : 0x001
 CPU revision : 1
 Hardware : placeholder
 Revision : 0001
 Serial : 0000000000000001
 Linux version 3.10.20-g268162b (3.2.23.182) (gcc version 4.7 (GCC) ) #1 SMP
 PREEMPT Tue Sep 16 10:49:37 CST 2014
 
 


Whetstone Benchmark - NativeWhetstone2.apk

This provides an overall rating in MWIPS, plus separate results for the eight test procedures in MFLOPS (floating point) and MOPS (functions and integer). For full details and results via Windows, Linux, Android and different programming languages, see Whetstone Benchmark Results on PCs.

Native Intel code produced average performance gains of 1.93 times on the Atom-based A1. The original version ran slowly on the Phenom-based BlueStacks Android emulator, which was not the case with the later BlueStacks version running on the 3.7 GHz Core i7. Both were much faster on the newer benchmark, apparently running native Intel instructions rather than converting ARM code. With the later ARM code, MWIPS was much lower on the Cortex CPUs, entirely due to the slow EXP functions test.

July 2015 - ARM/Intel version speeds are similar to the original on the ARM CPUs reported here, except for the COS tests on T7 and T11, which have a significant impact on the overall MWIPS rating.

August 2015 - T22 included with 64 bit CPU and 64 bit Android 5.0. Results at 32 and 64 bits were not that different.


 System ARM        MHz Android  MWIPS --------MFLOPS--------- ----------------MOPS-----------------
 See    CPU            Build                1       2       3    COS   EXP   FIXPT       IF   EQUAL

 Original ARM Version
 A1     Z3745     1866 4.4.2   1075.4   373.8   311.5   284.5   21.9  14.2  1421.1   1839.2   797.0
 T7     v7-A9     1200 4.1.2   1115.0   271.3   250.7   256.4   25.8  14.6  1190.0   1797.0  1198.7
 T22    v8-A53    1300 5.0.2   1433.7   348.0   319.3   308.2   36.3  19.8  1551.4   1861.9   611.0
 T11    v7-A15    1700 4.2.2   1477.7   363.9   220.6   307.5   39.7  18.0  1690.5   2527.9  1127.9
 T21    QU-800    2150 4.4.3   2035.1   665.7   640.0   531.6   45.2  23.1  3535.2   3180.4  2120.0
 BS1    Emul Phen 3000 2.3.4    103.6    36.9    32.6    37.7    1.8   1.4   130.2    414.0   374.1
 BS2    Emul i7   3700 4.4.2    844.5   428.6   351.8   343.6   14.6  10.9  1909.1    533.5   478.8
 ARM/Intel 32 Bit Version
 A1     Z3745     1866 4.4.2   1888.4   665.8   504.4   492.0   35.7  27.5  3191.4   3585.8  2146.7
 T7     v7-A9     1200 4.1.2    731.1   273.6   253.0   252.8   28.0   5.0  1185.2   2383.4  1192.1
 T11    v7-A15    1700 4.2.2    907.4   363.3   327.1   303.1   33.6   6.3  1506.9   2476.5  1122.6
 T21    QU-800    2150 4.4.3   1973.8   679.6   648.4   525.6   44.7  21.9  3516.7   3147.2  1567.7
 T22    v8-A53    1300 5.0.2    834.7   348.9   312.7   310.9   36.7   5.4  1556.7   1867.2   570.5
 BS1    Emul Phen 3000 2.3.4   2992.3   897.2   707.4   623.6   76.3  37.8  3705.9   4423.1  2281.5
 BS2    Emul i7   3700 4.4.2   5086.9  1066.7  1120.0   963.2  166.4  56.4  6300.0  11436.5  3786.9
 ARM/Intel 64 Bit Version
 T22    v8-A53    1300 5.0.2   1494.2   347.1   307.0   305.9   37.5  20.6  1552.2   1863.7  1239.1
 


Dhrystone Benchmark - Dhrystone2i.apk

The Dhrystone integer benchmark produces a performance rating in Vax MIPS (AKA DMIPS). Further details of the Dhrystone benchmark, and results from Windows and Linux based PCs, can be found in Dhrystone Results.htm. The ratio MIPS/MHz is often quoted, but this depends on compiler optimisation (or over-optimisation)
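The arithmetic behind the two result columns can be sketched as follows. This is an illustration only: the divisor of 1757 Dhrystones per second for the VAX 11/780 is the commonly quoted figure and is an assumption here, as this page does not state it, and the function names are made up for the example rather than taken from the benchmark source.

```python
# Commonly quoted VAX 11/780 score in Dhrystones per second
# (an assumption for this sketch, not stated on this page).
VAX_DHRYSTONES_PER_SECOND = 1757.0


def vax_mips(dhrystones_per_second):
    """Convert a raw Dhrystones/second score to Vax MIPS (DMIPS)."""
    return dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND


def mips_per_mhz(mips, cpu_mhz):
    """Ratio often quoted to compare CPUs running at different clock speeds."""
    return mips / cpu_mhz


# A1 (Atom Z3745 at 1866 MHz), ARM/Intel 32 bit version: 2451 Vax MIPS.
print(round(mips_per_mhz(2451, 1866), 2))  # 1.31, as in the MIPS/MHz column
```

Because the ratio divides by the measured clock speed, it reflects the compiler and the CPU design rather than clock frequency alone.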

On the Atom-based A1, the new version, with native Intel code, produces a 33% gain in performance, and the BlueStacks Emulator (BS1) is 9.2 times faster. ARM Cortex speeds are somewhat slower.

August 2015 - T22 included with 64 bit CPU and 64 bit Android 5.0. 64 bit operation produced a significant improvement.


 System ARM        MHz Android    Vax   MIPS
 See    CPU            Build     MIPS   /MHz

 Original ARM Version
 A1     Z3745     1866 4.4.2     1840   0.99
 T7     v7-A9     1200 4.1.2     1610   1.34
 T22    v8-A53    1300 5.0.2     1683   1.29
 T11    v7-A15    1700 4.2.2     3189   1.88
 T21    QU-800    2150 4.4.3     3854   1.79
 BS1    Emul Phen 3000 2.3.4      484   0.16
 BS2    Emul i7   3700 4.4.2      746   0.20
 ARM/Intel 32 Bit Version
 A1     Z3745     1866 4.4.2     2451   1.31
 T7     v7-A9     1200 4.1.2     1317   1.10
 T22    v8-A53    1300 5.0.2     1423   1.09
 T11    v7-A15    1700 4.2.2     2551   1.50
 T21    QU-800    2150 4.4.3     3319   1.54
 BS1    Emul Phen 3000 2.3.4     4464   1.49
 BS2    Emul i7   3700 4.4.2     8841   2.39
 ARM/Intel 64 Bit Version
 T22    v8-A53    1300 5.0.2     2569   1.98
 


Linpack Benchmark - LinpackDP2.apk, LinpackSP2.apk

The Linpack benchmark speed is measured in MFLOPS, officially for double precision floating point calculations. A version was produced using NEON functions, which only provide single precision operation. So, for comparison purposes, an available C code option, to define single precision data, was used to produce a new version, and this has usually led to a higher MFLOPS speed. Results from various hardware and software platforms can be found in Linpack Results.htm.

Performance of the Linpack benchmark is almost entirely dependent on the calculation x[i]=x[i]+c*y[i]. Later ARM processors include vfpv4, which provides fused multiply-accumulate instructions, possibly doubling performance. Compilation of these seems to have appeared in compiler gcc 4.8. Tablet T11 has vfpv4 but T7 does not - See System Details. The result is that the T11 DP benchmark runs much faster on the recompiled code (same with T21). The Intel native code compilation, running on A1, was more than twice as fast as the original, produced by gcc 4.4. Some of the gain is due to the new compiler, still with conversion from ARM instructions, and the rest is due to native Intel code.
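To make the operation count concrete, here is a minimal Python sketch of the x[i]=x[i]+c*y[i] loop and of the MFLOPS arithmetic. It is not the benchmark's C code, and the function names are invented for the example. Each element costs one multiply and one add, which is why a fused multiply-accumulate instruction, doing both in one step, can in principle double throughput.

```python
def daxpy(c, x, y):
    """Update x in place with x[i] = x[i] + c * y[i].

    Returns the number of floating point operations performed:
    one multiply and one add per element.
    """
    for i in range(len(x)):
        x[i] = x[i] + c * y[i]
    return 2 * len(x)


def mflops(flop_count, seconds):
    """Millions of floating point operations per second."""
    return flop_count / (seconds * 1.0e6)


x = [1.0, 2.0, 3.0]
y = [1.0, 1.0, 1.0]
flops = daxpy(2.0, x, y)
print(x, flops)  # [3.0, 4.0, 5.0] 6
```

For example, 2 million operations completed in half a second would be reported by `mflops(2_000_000, 0.5)` as 4.0 MFLOPS.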

August 2015 - T22 included with 64 bit CPU and 64 bit Android 5.0. 64 bit operation increased speed by almost 2 times with double precision calculations and 2.7 times at single precision.

September 2015 - New best score from P33, with 2 GHz Qualcomm Snapdragon 810, (Cortex-A57) and Android 5.0.2, with SP speed of 1277 MFLOPS at 64 bits.

BlueStacks is particularly fast running with the native Intel version.

 
 System ARM        MHz Android LinpackDP LinpackSP
 See    CPU            Build      MFLOPS    MFLOPS

 Original ARM Version
 A1     Z3745     1866 4.4.2      168.16    296.63
 T7     v7-A9     1200 4.1.2      151.05    201.30
 T22    v8-A53    1300 5.0.2      156.70    184.09
 T11    v7-A15    1700 4.2.2      459.17    803.04
 T21    QU-800    2150 4.4.3      389.52    751.95
 BS1    Emul Ph   3000 2.3.4       16.61     26.53
 BS2    Emul i7   3700 4.4.2      138.85    227.42
 GCC 4.8 ARM Version
 A1     Z3745     1866 4.4.2      282.29
 ARM/Intel 32 Bit Version
 A1     Z3745     1866 4.4.2      362.63    408.87
 T7     v7-A9     1200 4.1.2      159.34    199.84
 T22    v8-A53    1300 5.0.2      172.28    180.64
 T11    v7-A15    1700 4.2.2      826.36    952.88
 T21    QU-800    2150 4.4.3      629.92    790.83
 BS1    Emul Ph   3000 2.3.4     1808.57   1474.70
 BS2    Emul i7   3700 4.4.2     3390.95   1886.36
 ARM/Intel 64 Bit Version
 T22    v8-A53    1300 5.0.2      340.18    482.43
 P33    QU-810    2000 5.0.2                1277.76
 


Livermore Loops Benchmark - LivermoreLoops2.apk

The Livermore Loops comprise 24 kernels of numerical application, with speeds calculated in MFLOPS. A summary is also produced, with maximum, minimum and various mean values, geometric mean being the official average. As with the other benchmarks, details and results are provided, in this case in Livermore Loops Results.htm.
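The summary columns can be illustrated with a short Python sketch. It assumes plain, unweighted means over the per-kernel MFLOPS values, which may not match every detail of the benchmark's own summary, and the function name and input values are invented for the example.

```python
import math


def summarise(mflops_per_kernel):
    """Return max, arithmetic, geometric and harmonic means, and min."""
    n = len(mflops_per_kernel)
    return {
        "max": max(mflops_per_kernel),
        "average": sum(mflops_per_kernel) / n,
        "geomean": math.exp(sum(math.log(v) for v in mflops_per_kernel) / n),
        "harmean": n / sum(1.0 / v for v in mflops_per_kernel),
        "min": min(mflops_per_kernel),
    }


# Three made-up kernel speeds, not measured results.
print(summarise([100.0, 200.0, 400.0]))
```

For positive values the harmonic mean is never above the geometric mean, which is never above the arithmetic mean, so a few slow kernels pull Harmean and Geomean down further than Average.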

This time, the new compiler produces some slower results on Tablet T11, with the Atom, running native code, being faster on average, and 2.56 times faster than via the Houdini ARM conversion layer. T21 MFLOPS can also be different.

August 2015 - T22 included with 64 bit CPU and 64 bit Android 5.0. Here, 64 bit/32 bit geometric mean performance ratio is 1.5.


 System ARM        MHz Android
 See                               Max Average Geomean Harmean    Min

 Original ARM Version
 A1     Z3745     1866 4.4.2     535.8   201.9   172.4   146.7   48.8
 T7     v7-A9     1200 4.1.2     391.9   202.1   181.3   160.9   68.1
 T11    v7-A15    1700 4.2.2    1252.8   476.0   375.8   288.8   90.8
 T21    QU-800    2150 4.4.3    1075.5   437.1   356.7   284.4  100.3
 BS2    Emul i7   3700 4.4.2     321.7   134.4   118.1   101.8   29.3
 ARM/Intel 32 Bit Version
 A1     Z3745     1866 4.4.2    1031.2   480.0   429.8   378.6  154.7
 T22    v8-A53    1300 5.0.2     393.4   188.3   158.3   124.6   27.1
 T7     v7-A9     1200 4.1.2     396.6   207.6   175.6   136.1   26.8
 T11    v7-A15    1700 4.2.2    1411.4   471.2   342.1   219.5   34.3
 T21    QU-800    2150 4.4.3    1159.4   446.9   356.0   280.3  112.3
 BS2    Emul i7   3700 4.4.2    5422.6  2232.1  1784.4  1372.7  350.5
 ARM/Intel 64 Bit Version
 T22    v8-A53    1300 5.0.2     772.2   265.9   232.5   206.3   97.8
 


MemSpeed Benchmark - MemSpeedi.apk

This benchmark measures data reading speeds in MegaBytes per second carrying out calculations on arrays of cache and RAM data, sized 2 x 8 KB to 2 x 32 MB. Calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m], using double and single precision floating point and x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] with integers. Million Floating Point Operations Per Second (MFLOPS) speed can be calculated by dividing double precision MB/second by 8 and 16, for the two tests, and single precision speeds by 4 and 8. Assembly listings for integer tests show that Millions of Instructions Per Second (MIPS) can be found by multiplying MB/second by 0.78 with 2 adds and 0.66 for the other test. Cache sizes are indicated by varying performance as memory usage changes. For more details and further results see MemSpeed in Android Benchmarks.htm.
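The divisors quoted above follow from counting bytes read and operations per array element. The Python sketch below shows that arithmetic, on the assumption that each element reads one word from x and one from y; the function name is invented for the example.

```python
def mflops_from_mbytes(mb_per_second, bytes_per_word, flops_per_element):
    """Convert a MemSpeed reading speed to MFLOPS.

    Each element of the loop reads one word from x and one from y,
    so bytes read per element = 2 * bytes_per_word (an assumption
    consistent with the divisors 8, 16, 4 and 8 given in the text).
    """
    bytes_per_element = 2 * bytes_per_word
    return mb_per_second * flops_per_element / bytes_per_element


# x[m]=x[m]+s*y[m] has 2 operations per element, x[m]=x[m]+y[m] has 1.
print(mflops_from_mbytes(8000, 8, 2))  # double precision: 8000 / 8  = 1000.0
print(mflops_from_mbytes(8000, 8, 1))  # double precision: 8000 / 16 = 500.0
print(mflops_from_mbytes(4635, 4, 2))  # single precision: 4635 / 4  = 1158.75
```

The last line uses the T21 Original single precision speed at 16 KB, and agrees with the "Maximum SP MFLOPS 1159" reported in that log.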

The native ARM/Intel results, on Intel Atom based A1, averaged 44% faster via L1 cache data, 27% using L2 and 14% from RAM. Results on tablets T7, T11 and T21 showed some gains and some losses. The Intel native code is particularly demonstrated by results using the BlueStacks App Player, running on an Intel Core i7 based PC.

August 2015 - Results provided for 64 bit T22. The 64 bit compilation was nearly twice as fast as the 32 bit version with double precision floating point calculations, using cached data, and provided a 33% increase from RAM. Corresponding single precision ratios were 2.6 and 2.0 times and integer ratios of 2.2 and 1.5.


 #################### A1 Original #######################
 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
 Android MemSpeed Benchmark 1.1 01-Feb-2015 10.06
 Reading Speed in MBytes/Second
 Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
 KBytes Dble Sngl Int Dble Sngl Int

 16 2773 1745 2821 5993 3274 3094 L1
 32 3088 1690 2451 4849 2769 2896
 64 3066 1694 2245 3883 2434 2568 L2 
 128 3084 1695 2261 3886 2466 2524
 256 3158 1732 2285 3964 2264 2176
 512 2666 1721 2295 3959 2505 2561
 1024 2938 1659 2163 3567 2356 2443
 4096 2775 1653 2123 3055 2307 2395 RAM
 16384 2827 1659 2121 3208 2321 2411
 65536 2840 1661 2112 3248 2314 2406
 Total Elapsed Time 10.8 seconds
 
 #################### A1 ARM-Intel ######################
 ARM/Intel MemSpeed Benchmark 1.1 23-Apr-2015 11.46
 Reading Speed in MBytes/Second
 Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
 KBytes Dble Sngl Int Dble Sngl Int

 16 3287 1859 4560 9789 4688 7316
 32 3233 1856 3807 6633 3990 4030
 64 3304 1860 2965 4457 2996 3894
 128 3303 1855 3006 4463 3113 3992
 256 3306 1860 2978 4463 3093 3946
 512 3307 1862 2964 4377 3097 3958
 1024 3031 1778 2766 3993 2867 3472
 4096 2863 1776 2692 3129 2763 3046
 16384 2857 1776 2702 3063 2768 3050
 65536 2865 1765 2702 3176 2782 3087
 Total Elapsed Time 10.1 seconds
 
 #################### T11 Original #####################
 T11 Samsung EXYNOS 5250 2000 MHz Cortex-A15, Android 4.2.2
 Measured 1700 MHz
 Android MemSpeed Benchmark 1.1 09-Aug-2013 17.04
 Reading Speed in MBytes/Second
 Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
 KBytes Dble Sngl Int Dble Sngl Int

 16 7296 4159 3513 9375 5453 6211 L1
 32 7253 4540 3882 7364 4873 4839
 64 6902 4265 3878 7026 4373 4274 L2
 128 6735 4032 2480 4005 2797 3288
 256 5859 3775 2192 4527 3263 3676
 512 5795 3781 3568 6282 3819 3818
 1024 2609 1757 1754 2607 1805 1825
 4096 1614 1422 1471 1654 1342 1441 RAM
 16384 1624 1412 1474 1642 1336 1443
 65536 1617 1408 1479 1368 1321 1423
 Total Elapsed Time 10.7 seconds
 
 #################### T11 ARM-Intel ####################
 ARM/Intel MemSpeed Benchmark 1.1 23-Apr-2015 12.26
 Reading Speed in MBytes/Second
 Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
 KBytes Dble Sngl Int Dble Sngl Int

 16 6540 4359 4580 10119 6292 6502
 32 8185 5132 4682 8729 4622 4465
 64 5770 3530 3473 5780 3447 3782
 128 5311 3386 3475 5225 3441 3451
 256 5667 3642 3678 5805 3643 3726
 512 5047 3318 3334 4869 3303 3337
 1024 2015 1469 1423 2050 1452 1386
 4096 1535 1322 1342 1598 1381 1385
 16384 1505 1379 1406 1584 1387 1384
 65536 1509 1306 1332 1585 1387 1382
 Total Elapsed Time 10.8 seconds

 #################### T21 Original #####################
 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Android MemSpeed Benchmark 1.1 02-Jun-2015 11.01
 Reading Speed in MBytes/Second
 Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
 KBytes Dble Sngl Int Dble Sngl Int

 16 8922 4635 3566 12412 5648 3774 L1
 32 5116 3542 2773 7594 4827 3657 L2
 64 5174 3393 2684 5652 3757 3130
 128 5286 3387 2648 5443 3758 3194
 256 4937 3446 2889 7469 4624 3449
 512 4941 3459 2915 7452 4566 3724
 1024 4837 3449 2848 7065 4455 3722
 4096 2840 2606 2343 2581 2458 2567 RAM
 16384 2606 2423 2232 2395 2238 2338
 65536 2653 2453 2257 2457 2312 2420
 Total Elapsed Time 9.7 seconds
 Maximum SP MFLOPS 1159 Integer MIPS 2802

 #################### T21 ARM-Intel ####################
 ARM/Intel MemSpeed Benchmark 1.1 02-Jun-2015 11.27
 Reading Speed in MBytes/Second
 Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
 KBytes Dble Sngl Int Dble Sngl Int

 16 8074 4831 2603 11252 5065 3892 L1
 32 5302 4138 3709 7252 4985 3693 L2
 64 4801 3510 2832 5739 3684 3015 
 128 4502 3783 3577 5991 3914 3547
 256 4907 3913 3934 6876 4280 4056
 512 4686 3883 3921 6236 4215 4060
 1024 4716 3808 3823 6131 4185 3942 
 4096 2691 2603 2679 2249 2634 2709 RAM
 16384 2227 2223 2420 1798 2191 2445
 65536 2099 2106 2306 1738 2040 2346
 Total Elapsed Time 9.9 seconds
 Maximum SP MFLOPS 1207 Integer MIPS 2898
 
 ###################### T22 32 Bit ######################
 ARM/Intel MemSpeed Benchmark 1.2 05-Aug-2015 17.16
 Compiled for 32 bit ARM v7a
 Reading Speed in MBytes/Second
 Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
 KBytes Dble Sngl Int Dble Sngl Int

 16 1940 971 1693 2470 1278 2084 L1
 32 1879 955 1676 2378 1255 1967
 64 1801 938 1615 2254 1218 1912 L2
 128 1706 941 1620 2279 1224 1872
 256 1818 935 1570 2291 1155 1875
 512 1633 884 1451 2008 1132 1704
 1024 1276 781 1181 1454 938 1324 RAM
 4096 1335 808 1260 1533 1010 1386
 16384 1342 813 1270 1487 1013 1419
 65536 1346 809 1274 1546 1031 1252
 Total Elapsed Time 11.7 seconds

###################### T22 64 Bit ######################
 ARM/Intel MemSpeed Benchmark 1.2 05-Aug-2015 17.29
 Compiled for 64 bit ARM v8a
 Reading Speed in MBytes/Second
 Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
 KBytes Dble Sngl Int Dble Sngl Int

 16 4092 2198 3951 5293 3611 4408
 32 3753 2496 3630 4651 3300 3992
 64 3407 2388 3368 3715 3023 3677
 128 3496 2462 3521 4137 3139 3844
 256 3535 2481 3573 4199 3322 3911
 512 3054 2248 3126 3556 2548 3372
 1024 1714 1704 2029 2069 1854 2099
 4096 1832 1595 1841 1914 1780 1897
 16384 1844 1601 1850 1925 1798 1891
 65536 1859 1608 1837 1921 1795 1812
 Total Elapsed Time 10.2 seconds

 ##################### T7 Original ######################
 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 1 GB DDR3 RAM 
 Measured 1200 MHz
 Android MemSpeed Benchmark 17-Oct-2012 20.19
 Reading Speed in MBytes/Second
 Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
 KBytes Dble Sngl Int Dble Sngl Int

 16 1735 888 2456 2726 1364 2818 L1
 32 1448 760 1474 1700 1039 1648
 64 1318 719 1290 1468 952 1385 L2
 128 1279 715 1289 1443 944 1336
 256 1268 714 1279 1435 943 1313
 512 1158 691 1204 1321 892 1228
 1024 729 553 735 772 632 742
 4096 445 392 425 442 421 439 RAM
 16384 435 390 428 435 412 431
 65536 445 404 393 450 432 449
 Total Elapsed Time 12.2 seconds

 #################### T7 ARM-Intel #####################
 ARM/Intel MemSpeed Benchmark 1.1 25-Apr-2015 12.24
 Reading Speed in MBytes/Second
 Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
 KBytes Dble Sngl Int Dble Sngl Int

 16 1856 1019 2537 2913 1459 2544
 32 1416 832 1327 1508 920 1345
 64 1286 779 1198 1418 908 1296
 128 1282 781 1195 1424 912 1305
 256 1278 774 1190 1433 878 1298
 512 1197 752 1122 1340 862 1216
 1024 833 626 822 903 695 857
 4096 463 420 456 463 440 459
 16384 459 426 453 455 435 458
 65536 463 430 411 462 443 452
 Total Elapsed Time 11.5 seconds
 
 #################### BS2 Original ######################
 
 BS2 BlueStacks Emulator on 3.7 GHz Core i7 via Windows 8
 Android MemSpeed Benchmark 1.1 25-Apr-2015 12.58
 Reading Speed in MBytes/Second
 Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
 KBytes Dble Sngl Int Dble Sngl Int

 16 1523 1777 731 1406 1939 1163
 32 1306 1641 787 1641 1939 1023
 64 1524 1230 511 1422 1662 1143
 128 1524 1707 787 1641 1641 948
 256 1456 1670 853 1525 1708 1094
 512 1527 1642 853 1642 1779 948
 1024 1528 1646 853 1646 1713 1094
 4096 1535 1809 853 1809 1945 1194
 16384 1638 1638 819 1774 1872 1170
 65536 1404 1747 819 1747 1820 1156
 Total Elapsed Time 12.5 seconds

 #################### BS2 ARM-Intel #####################
 ARM/Intel MemSpeed Benchmark 1.1 25-Apr-2015 12.47
 Reading Speed in MBytes/Second
 Memory x[m]=x[m]+s*y[m] Int+ x[m]=x[m]+y[m]
 KBytes Dble Sngl Int Dble Sngl Int

 16 35555 9309 14065 30476 19393 19394
 32 30476 19394 14222 35555 18518 17066
 64 26666 16623 17778 30476 18286 16410
 128 26667 17778 17778 29092 18286 19051
 256 25098 16675 16327 27354 19395 18825
 512 25100 13063 12190 26666 19395 17793
 1024 24631 17589 16415 24623 16415 16415
 4096 24638 17783 16644 24638 17093 17783
 16384 14745 12639 11000 14000 13611 12834
 65536 14043 11359 12336 15490 10649 10649
 Total Elapsed Time 12.6 seconds
 


BusSpeed Benchmark - BusSpeedv7i.apk

This benchmark is designed to identify reading data in bursts over buses. The program starts by reading a word (4 bytes) with an address increment of 32 words (128 bytes) before reading another word. The increment is reduced by half on successive tests, until all data is read. On reading data from RAM, 64 byte bursts are typically used. Then, measured reading speed reduces from a maximum, when all data is read, to a minimum on using 16 word increments (64 bytes). Potential maximum speed can be estimated by multiplying this minimum value by 16. With this burst rate, measured speeds at 32 word and 16 word increments are likely to be the same. Cache sizes are indicated by varying speed as memory use changes. Note that, with the smallest L1 cache demands, measured speed can be low due to overheads when reading little data. For more details and further results see BusSpeed in Android Benchmarks.htm.
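The access pattern and the times-16 estimate can be sketched in Python as follows. This shows only the addressing scheme, not the benchmark's C code (which is timed and works on much larger arrays), and the function names are invented for the example.

```python
def strided_read(words, increment):
    """Read one word every `increment` words; return (checksum, words_read)."""
    checksum = 0
    words_read = 0
    for i in range(0, len(words), increment):
        checksum += words[i]
        words_read += 1
    return checksum, words_read


def estimated_burst_speed(min_mb_per_second, words_per_burst=16):
    """Estimate peak speed from the 16 word increment result.

    With 64 byte bursts, only 1 word in 16 of each burst is used at that
    increment, so the bus is assumed to move 16 times the measured data.
    """
    return min_mb_per_second * words_per_burst


data = list(range(64))
# Increments halve on successive passes, as described above.
for inc in (32, 16, 8, 4, 2, 1):
    print(inc, strided_read(data, inc))

print(estimated_burst_speed(265))  # 4240, from the A1 Original 4096 KB Inc16 result
```

For comparison, the measured Read All speed on that same A1 Original 4096 KB line is 3891 MB/second, so the estimate is an upper bound rather than an achieved figure.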

The native code ARM/Intel version provided no real performance improvement on tablet A1, with the Atom Z3745 CPU. In ARM mode, there was also little difference on Tablets T21, T11 and T7. The main reason for these similarities is that the long sequence of identical C arithmetic statements is easy to convert for efficient processing. BlueStacks speeds on the Intel CPU were again outstanding.

August 2015 - Results provided for 64 bit T22. Reading all data, 64/32 bit comparison ratios were up to 2.0 from L1 cache, 1.5 from L2 cache and 1.25 from RAM.


 #################### A1 Original #######################
 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
 Android BusSpeed Benchmark 1.1 v7 21-Dec-2014 16.06
 Reading Speed 4 Byte Words in MBytes/Second
 Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
 KBytes Words Words Words Words Words All

 16 4178 3473 6270 6713 6759 6869 L1
 32 1420 1529 2252 2686 3702 5108
 64 1385 1498 2276 2629 3657 5108 L2
 128 1394 1542 2278 2614 3640 5092
 256 1410 1576 2258 2607 3259 5110
 512 1417 1574 2274 2602 3700 5119
 1024 349 428 888 1431 2848 4306 RAM
 4096 215 265 593 1181 2289 3891
 16384 210 266 596 1181 2278 3897
 65536 220 272 600 1193 2346 3886
 Total Elapsed Time 5.1 seconds

 #################### A1 ARM-Intel ######################
 Reading Speed 4 Byte Words in MBytes/Second
 Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
 KBytes Words Words Words Words Words All

 16 4845 5705 6403 6926 7094 7167 L1
 32 1407 1716 2255 2646 3713 5094
 64 1395 1703 2257 2689 3754 4843 L2
 128 1283 1571 2108 2620 3671 5135
 256 1416 1753 2288 2679 3687 5178
 512 1439 1372 2251 2510 3679 5183
 1024 350 409 942 1696 2792 4403
 4096 213 253 564 1188 2173 3631 RAM
 16384 219 259 600 1189 2330 3920
 65536 218 259 599 1102 2323 3716
 Total Elapsed Time 5.1 seconds

 #################### T11 Original #####################
 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Measured 1.7 GHz
 2 GB DDR3-1600 RAM, dual channel, 12.8 GB/sec
 Android BusSpeed Benchmark 1.1 v7 09-Aug-2013 17.07
 Reading Speed 4 Byte Words in MBytes/Second
 Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
 KBytes Words Words Words Words Words All

 16 3193 3451 4412 5272 5389 6191 L1
 32 1298 1558 1990 3478 4264 4420
 64 804 928 1209 2442 3263 3426 L2
 128 784 904 1175 2321 3148 3333
 256 780 908 1181 2336 3142 3327
 512 788 907 1165 2312 3120 3300
 1024 360 387 384 803 1348 1744
 4096 145 146 194 507 648 1378 RAM
 16384 141 136 190 507 638 1373
 65536 142 141 191 506 643 1371
 Total Elapsed Time 5.3 seconds

 #################### T11 ARM-Intel ####################
 ARM/Intel BusSpeed Benchmark 1.1 v7 23-Apr-2015 12.15
 Reading Speed 4 Byte Words in MBytes/Second
 Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
 KBytes Words Words Words Words Words All

 16 2085 3208 4055 4553 5272 5758
 32 1282 1811 2498 4182 4867 5163
 64 600 864 1309 2974 3504 3841
 128 614 892 1310 3027 3500 3826
 256 614 892 1337 3050 3509 3828
 512 618 888 1319 3042 3382 3811
 1024 425 479 444 1244 1803 2291
 4096 146 146 191 590 1050 1751
 16384 141 139 186 585 1039 1725
 65536 139 139 187 585 1039 1721
 Total Elapsed Time 5.3 seconds

 #################### T21 Original #####################
 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Android BusSpeed Benchmark 1.1 v7 04-Jun-2015 17.00
 Reading Speed 4 Byte Words in MBytes/Second
 Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
 KBytes Words Words Words Words Words All

 16 1382 1350 3122 4300 4938 5283 L1
 32 1106 1118 2026 2637 3786 5210 L2
 64 1064 1118 2058 2679 3820 5251
 128 1123 1170 2081 2688 3669 4166
 256 1121 1196 2109 2623 3873 3429
 512 940 1127 2050 2684 3777 4795
 1024 951 1124 2038 2655 3759 4950
 4096 239 375 472 806 1486 2679 RAM
 16384 239 370 464 806 1476 2656
 65536 239 368 495 854 1537 2792
 Total Elapsed Time 5.0 seconds

 #################### T21 ARM-Intel ####################
 ARM/Intel BusSpeed Benchmark 1.1 v7 04-Jun-2015 17.00
 Reading Speed 4 Byte Words in MBytes/Second
 Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
 KBytes Words Words Words Words Words All

 16 1328 1442 2797 4291 4699 5685 L1
 32 1165 1100 1933 2848 3603 5844 L2
 64 1147 1055 2007 2846 3586 5890
 128 1181 1136 2008 2711 3600 5878
 256 1185 1126 2018 2716 3568 5873
 512 1022 1026 1805 2525 3378 5611
 1024 796 843 1584 2202 3088 5053
 4096 199 294 431 657 1166 2409 RAM
 16384 200 299 430 659 1167 2408
 65536 205 301 436 668 1173 2380
 Total Elapsed Time 5.2 seconds

 ###################### T22 32 Bit ######################
 T22, ARM Cortex-A53 1300 MHz, Android 5.0.2 
 ARM/Intel BusSpeed Benchmark 1.2 06-Aug-2015 10.57
 Compiled for 32 bit ARM v7a
 Reading Speed 4 Byte Words in MBytes/Second
 Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
 KBytes Words Words Words Words Words All

 16 874 932 1814 2302 2355 2263 L1
 32 758 803 1309 1820 2323 2386
 64 653 671 1203 1741 2206 2332 L2
 128 603 620 1107 1693 2222 2351
 256 574 589 1075 1711 2211 2327
 512 332 372 681 1075 1863 2120
 1024 137 193 371 578 1322 2129 RAM
 4096 172 179 351 567 1151 2126
 16384 172 178 351 504 1117 2136
 65536 172 177 349 478 882 2129
 Total Elapsed Time 5.3 seconds

 ###################### T22 64 Bit ######################
 T22, ARM Cortex-A53 1300 MHz, Android 5.0.2 
 ARM/Intel BusSpeed Benchmark 1.2 06-Aug-2015 11.02
 Compiled for 64 bit ARM v8a
 Reading Speed 4 Byte Words in MBytes/Second
 Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
 KBytes Words Words Words Words Words All

 16 3188 3635 3937 4327 4372 4462
 32 1478 1607 2246 3382 3853 4144
 64 600 622 1163 2011 2972 3585
 128 558 575 1056 1889 2892 3525
 256 538 550 1028 1826 2837 3260
 512 371 425 813 1490 2403 3202
 1024 136 196 382 728 1423 2750
 4096 170 177 346 669 1340 2652
 16384 169 174 341 678 1352 2663
 65536 168 174 341 676 1347 2611
 Total Elapsed Time 5.2 seconds

 ##################### T7 Original ######################
 T7, ARM Cortex-A9 1200 MHz, Android 4.1.2, 1 GB DDR3 RAM 
 Android BusSpeed Benchmark 19-Oct-2012 17.29
 Reading Speed 4 Byte Words in MBytes/Second
 Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
 KBytes Words Words Words Words Words All

 16 2723 2420 3044 3364 3499 3500 L1
 32 1054 1087 1061 1382 1565 2145
 64 436 433 419 652 751 1160 L2
 128 345 337 337 542 633 943
 256 329 309 322 522 614 961
 512 339 299 311 506 574 937
 1024 170 168 180 269 349 629
 4096 59 55 84 127 176 338 RAM
 16384 56 56 83 125 173 335
 65536 56 56 82 125 174 334
 Total Elapsed Time 5.7 seconds
 
 #################### T7 ARM-Intel #####################
 ARM/Intel BusSpeed Benchmark 1.1 v7 25-Apr-2015 12.30
 Reading Speed 4 Byte Words in MBytes/Second
 Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
 KBytes Words Words Words Words Words All

 16 2940 3344 3625 3866 3862 3893
 32 698 707 682 1071 1208 1826
 64 448 477 465 726 851 1357
 128 367 355 292 542 657 1070
 256 334 344 341 546 651 1059
 512 326 336 336 531 629 1025
 1024 169 175 197 309 411 749
 4096 58 58 83 131 191 395
 16384 56 57 83 129 189 392
 65536 56 48 82 129 187 388
 Total Elapsed Time 5.6 seconds

 #################### BS2 Original ######################
 BS2 BlueStacks Emulator on 3.7 GHz Core i7 via Windows 8
 Android BusSpeed Benchmark 1.1 v7 25-Apr-2015 12.57
 Reading Speed 4 Byte Words in MBytes/Second
 Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
 KBytes Words Words Words Words Words All

 16 1428 1280 1280 1422 1333 1489
 32 1428 1280 1280 1365 1706 1602
 64 1066 1481 1600 1463 1463 1707
 128 1666 1365 1489 1463 1463 1833
 256 1429 1706 1293 1425 1466 1823
 512 1333 1463 1603 1425 1468 1565
 1024 1280 1463 1710 1468 1565 1730
 4096 1282 1367 1475 1730 1310 1617
 16384 412 943 958 1258 1398 1677
 65536 449 958 1078 1304 1677 1677
 Total Elapsed Time 6.8 seconds

 #################### BS2 ARM-Intel #####################
 ARM/Intel BusSpeed Benchmark 1.1 v7 25-Apr-2015 12.49
 Reading Speed 4 Byte Words in MBytes/Second
 Memory Inc32 Inc16 Inc8 Inc4 Inc2 Read
 KBytes Words Words Words Words Words All

 16 13333 12800 22222 13675 18285 14224
 32 10666 10666 12190 21333 21367 21334
 64 6666 6666 10666 13333 21333 21337
 128 6826 6400 10240 17067 21335 18290
 256 4266 5120 8533 13654 18290 20483
 512 2667 2667 5335 9103 16386 20515
 1024 2560 2560 5692 9105 15608 22806
 4096 2673 2752 5470 9175 17126 21880
 16384 741 943 2070 4404 8808 14680
 65536 542 838 1572 3595 6710 11930
 Total Elapsed Time 6.5 seconds
 


RandMem Benchmark - RandMemi.apk

The RandMem benchmark carries out four tests at increasing data sizes, to produce data transfer speeds in MBytes per second from caches and memory. Serial and random address selections are employed, using the same program structure, with read and read/write tests using 32 bit integers. The main purpose is to demonstrate how much slower performance can be when using random access. Here, speed can be considerably influenced by reading and writing in bursts, where much of the data is not used, and by the size of preceding caches. For more details and further results see RandMem in Android Benchmarks.htm.
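The idea of using the same program structure for serial and random address selection can be sketched as follows. This is an illustration only, not the benchmark's native code: the function names, the use of an index list and the 1,000,000 bytes per MByte convention are all assumptions. In Python, interpreter overheads dominate, so the speeds printed will not show the cache effects seen in the logs.

```python
import random
import time

def make_index(n_words, randomise, seed=1):
    """Return the order in which words are visited: serial or shuffled."""
    order = list(range(n_words))
    if randomise:
        random.Random(seed).shuffle(order)
    return order

def read_test(data, order):
    """Read every word once, in the given order, and return a checksum."""
    total = 0
    for i in order:
        total ^= data[i]
    return total

def mbytes_per_second(n_words, seconds, bytes_per_word=4):
    """Convert a word count and elapsed time to MBytes/second (assumed 10^6 bytes)."""
    return n_words * bytes_per_word / seconds / 1e6

if __name__ == "__main__":
    n_words = 16 * 1024 // 4          # 16 KBytes of 4 byte words
    data = list(range(n_words))
    for label, randomise in (("Serial", False), ("Random", True)):
        order = make_index(n_words, randomise)
        start = time.perf_counter()
        checksum = read_test(data, order)
        elapsed = time.perf_counter() - start
        print(label, checksum, round(mbytes_per_second(n_words, elapsed), 1))
```

Both orders visit every word exactly once, so the checksums agree and only the access pattern differs.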

On the A1 Atom based tablet, the native code ARM/Intel version showed gains of around 25% on all reading tests, but no significant difference on the read/write tests. The same benchmark, running on tablets T11 and T21, showed some improvement using cache based data, but comparative performance on T7 was variable.
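The A1 gains can be checked from the serial columns of the two A1 logs below. The following is a small worked calculation, with helper names chosen here for illustration.

```python
# Serial speeds (MBytes/second) from the A1 logs, 16 KB to 65536 KB.
ORIGINAL_READ = [3434, 2833, 2837, 2822, 2828, 2816, 2578, 2412, 2485, 2457]
NATIVE_READ = [4291, 3217, 3677, 3666, 3688, 3682, 3285, 2999, 3019, 2989]
ORIGINAL_RDWR = [5064, 4042, 4058, 4041, 4040, 3997, 3256, 1946, 2039, 2041]
NATIVE_RDWR = [5626, 3792, 4253, 4241, 3930, 4189, 3558, 2007, 2065, 2068]

def percent_gain(original, new):
    """Percentage speed gain of new over original."""
    return (new / original - 1.0) * 100.0

def mean_gain(originals, news):
    """Average percentage gain over all data sizes."""
    gains = [percent_gain(o, n) for o, n in zip(originals, news)]
    return sum(gains) / len(gains)

if __name__ == "__main__":
    print("Serial read gain %", round(mean_gain(ORIGINAL_READ, NATIVE_READ), 1))
    print("Serial read/write gain %", round(mean_gain(ORIGINAL_RDWR, NATIVE_RDWR), 1))
```

The average serial read gain comes out at about 25%, while the read/write average is only a few percent.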

August 2015 - Results provided for 64 bit T22, showing that the 32 bit and 64 bit versions were not that different overall, each one being slightly faster on some tests.


 #################### A1 Original #######################
 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
 Android RandMem Benchmark 1.1 01-Feb-2015 10.12
 MBytes/Second Transferring 4 Byte Words
 Memory Serial....... Random.......
 KBytes Read Rd/Wrt Read Rd/Wrt

 16 3434 5064 3462 5113 L1
 32 2833 4042 2652 3645
 64 2837 4058 2068 2561 L2
 128 2822 4041 1809 2205
 256 2828 4040 1435 1755
 512 2816 3997 1245 1456
 1024 2578 3256 379 445
 4096 2412 1946 209 268 RAM
 16384 2485 2039 179 217
 65536 2457 2041 140 170
 Total Elapsed Time 11.8 seconds

 #################### A1 ARM-Intel ######################
 ARM/Intel RandMem Benchmark 1.1 23-Apr-2015 17.27
 MBytes/Second Transferring 4 Byte Words
 Memory Serial....... Random.......
 KBytes Read Rd/Wrt Read Rd/Wrt

 16 4291 5626 4584 5630
 32 3217 3792 3492 3783
 64 3677 4253 2629 2644
 128 3666 4241 2299 2289
 256 3688 3930 1829 1850
 512 3682 4189 1522 1592
 1024 3285 3558 562 667
 4096 2999 2007 272 274
 16384 3019 2065 210 220
 65536 2989 2068 141 186
 Total Elapsed Time 8.8 seconds

 #################### T11 Original #####################
 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Measured 1.7 GHz
 Android RandMem Benchmark 1.1 13-Aug-2013 17.29
 MBytes/Second Transferring 4 Byte Words 
 Memory Serial....... Random.......
 KBytes Read Rd/Wrt Read Rd/Wrt

 16 2881 2478 3388 3650 L1
 32 4301 2968 3197 3249
 64 3669 2511 2201 2249 L2
 128 3566 2560 1571 1566
 256 3557 2461 1334 1256
 512 3524 2547 1136 1098
 1024 1933 1144 534 513
 4096 1993 1064 184 173 RAM
 16384 1970 1086 141 144
 65536 1973 1117 106 104
 Total Elapsed Time 9.1 seconds

 #################### T11 ARM-Intel ####################
 ARM/Intel RandMem Benchmark 1.1 23-Apr-2015 20.42
 MBytes/Second Transferring 4 Byte Words
 Memory Serial....... Random.......
 KBytes Read Rd/Wrt Read Rd/Wrt

 16 3642 3102 5464 4114
 32 5462 3409 4096 3737
 64 4800 2785 2028 2064
 128 4308 2575 1572 1589
 256 4381 2574 1332 1260
 512 4311 2544 1215 1097
 1024 2033 1156 513 471
 4096 1891 1042 213 178
 16384 2028 1032 154 139
 65536 2033 1055 109 106
 Total Elapsed Time 9.2 seconds

 #################### T21 Original #####################
 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Android RandMem Benchmark 1.1 10-Jun-2015 12.43
 MBytes/Second Transferring 4 Byte Words 
 Memory Serial....... Random.......
 KBytes Read Rd/Wrt Read Rd/Wrt

 16 4407 4704 3995 4900
 32 2611 3071 2207 2703
 64 2496 2797 1821 2139
 128 2080 3173 1668 1758
 256 2425 3183 1439 1520
 512 2359 3116 1193 1355
 1024 2366 3117 368 382
 4096 2293 2280 201 209
 16384 2293 2237 170 175
 65536 2299 2261 146 150
 Total Elapsed Time 8.5 seconds

#################### T21 ARM-Intel ####################
 ARM/Intel RandMem Benchmark 1.1 10-Jun-2015 12.45
 MBytes/Second Transferring 4 Byte Words 
 Memory Serial....... Random.......
 KBytes Read Rd/Wrt Read Rd/Wrt

 16 5005 4626 4067 4863
 32 3253 2994 2246 2622
 64 3223 2855 1986 2072
 128 2861 3128 1912 1776
 256 3246 3174 1666 1523
 512 3195 3111 1469 1372
 1024 3190 3079 369 383
 4096 3027 2381 212 213
 16384 3065 2300 174 177
 65536 3080 2281 150 150
 Total Elapsed Time 8.6 seconds

 ###################### T22 32 Bit ######################
 T22, ARM Cortex-A53 1300 MHz, Android 5.0.2 
ARM/Intel RandMem Benchmark 1.2 06-Aug-2015 12.29
 Compiled for 32 bit ARM v7a
 MBytes/Second Transferring 4 Byte Words
 Memory Serial....... Random.......
 KBytes Read Rd/Wrt Read Rd/Wrt

 16 2807 3606 2753 3595 L1
 32 2719 3433 1429 1930
 64 2615 3266 914 1166 L2
 128 2592 3243 705 828
 256 2570 3223 637 720
 512 2367 2684 237 347
 1024 2137 1855 120 163 RAM
 4096 1918 1658 83 97
 16384 2152 1665 74 85
 65536 2104 1652 72 64
 Total Elapsed Time 11.6 seconds

###################### T22 64 Bit ######################
 T22, ARM Cortex-A53 1300 MHz, Android 5.0.2 
 ARM/Intel RandMem Benchmark 1.2 06-Aug-2015 12.32
 Compiled for 64 bit ARM v8a
 MBytes/Second Transferring 4 Byte Words
 Memory Serial....... Random.......
 KBytes Read Rd/Wrt Read Rd/Wrt

 16 3865 3033 3798 3027
 32 3622 2760 3105 2734
 64 3094 2803 1011 1077
 128 3074 2740 776 801
 256 3050 2771 718 693
 512 2420 2463 270 371
 1024 1322 1853 131 164
 4096 1754 1598 87 100
 16384 1791 1586 75 91
 65536 1856 1609 57 68
 Total Elapsed Time 14.6 seconds

 ##################### T7 Original ######################
 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 
 Measured 1200 MHz
 Android RandMem Benchmark 20-Oct-2012 11.14
 MBytes/Second Transferring 4 Byte Words 
 Memory Serial....... Random.......
 KBytes Read Rd/Wrt Read Rd/Wrt

 16 2788 3041 2795 3041 L1
 32 2769 3011 2767 3020
 64 1027 1038 839 911 L2
 128 916 918 616 649
 256 904 905 514 538
 512 899 907 475 499
 1024 712 699 345 354
 4096 323 284 92 88 RAM
 16384 316 282 73 70
 65536 314 281 65 62
 Total Elapsed Time 10.9 seconds

 #################### T7 ARM-Intel #####################
 ARM/Intel RandMem Benchmark 1.1 25-Apr-2015 12.33
 MBytes/Second Transferring 4 Byte Words
 Memory Serial....... Random.......
 KBytes Read Rd/Wrt Read Rd/Wrt

 16 2521 3175 2490 3038
 32 1427 1451 1218 1446
 64 1133 1052 853 907
 128 1039 871 646 650
 256 1028 909 543 518
 512 1025 895 499 502
 1024 700 489 242 236
 4096 487 282 90 88
 16384 483 281 71 70
 65536 478 274 63 62
 Total Elapsed Time 11.3 seconds

 #################### BS2 Original ######################
 BS2 BlueStacks Emulator on 3.7 GHz Core i7 via Windows 8
 Android RandMem Benchmark 1.1 25-Apr-2015 12.59
 MBytes/Second Transferring 4 Byte Words
 Memory Serial....... Random.......
 KBytes Read Rd/Wrt Read Rd/Wrt

 16 4069 5008 4069 2174
 32 4439 5426 4069 1953
 64 3974 5682 3552 1860
 128 3721 5209 3758 1717
 256 4342 5210 3157 1204
 512 4167 5342 2845 1141
 1024 4350 5208 2606 1000
 4096 3475 5709 1938 867
 16384 4343 5120 747 400
 65536 3657 5818 533 256
 Total Elapsed Time 14.2 seconds

 #################### BS2 ARM-Intel #####################
 ARM/Intel RandMem Benchmark 1.1 25-Apr-2015 12.50
 BlueStacks on 3.9 GHz Core i7
 MBytes/Second Transferring 4 Byte Words
 Memory Serial....... Random.......
 KBytes Read Rd/Wrt Read Rd/Wrt

 16 23252 24414 19148 29593
 32 25432 27127 25432 24038
 64 21552 23674 14533 9301
 128 21702 20834 12020 8140
 256 22727 19934 9470 6513
 512 22321 17362 5953 5686
 1024 20840 18945 5691 4815
 4096 21053 16693 2291 2291
 16384 12308 10057 1067 1018
 65536 10667 10338 753 711
 Total Elapsed Time 8.3 seconds
 


MP-MFLOPS Benchmarks - MP-MFLOPSi and MP-MFLOPS2i

The benchmarks are recompilations of those in www.roylongbottom.org.uk/Android MultiThreading Benchmarks.htm. The arithmetic operations executed are of the form x[i] = (x[i] + a) * b - (x[i] + c) * d + (x[i] + e) * f, with 2 and 32 operations per input data word, using 1, 2, 4 and 8 threads. Three data sizes are used, to exercise L1 cache, L2 cache and RAM, at 12.8, 128 and 12800 KB (3200, 32000 and 3200000 single precision floating point words). Each thread carries out the same calculations, but accesses a different segment of the data. The program checks for consistent numeric results, primarily to show that all calculations are actually carried out. The numeric results start with values of 1.0, with subsequent calculations reducing the values, the amount depending on the number of calculations.
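The structure described above can be sketched as below. This is not the benchmark source: the constants a to f are hypothetical values chosen so that results reduce from 1.0, the function names are invented here, and only the formula as quoted is used (how the 2 and 32 operation variants are built is not shown in this document). Python threads will not give the speed gains of native code; the sketch only shows the data segmentation and the consistency check.

```python
import threading

# Hypothetical constants; the values used by the benchmark are not given here.
A, B, C, D, E, F = 0.000020, 0.999980, 0.000011, 1.000011, 0.000012, 0.999992

def words_for_kb(kb, bytes_per_word=4):
    """Data size in KB (1000 bytes) to a count of single precision words."""
    return round(kb * 1000 / bytes_per_word)

def calculate(x, start, end, passes):
    """Apply the quoted formula to one segment of the data, repeatedly."""
    for _ in range(passes):
        for i in range(start, end):
            v = x[i]
            x[i] = (v + A) * B - (v + C) * D + (v + E) * F

def run(words, n_threads, passes):
    """Start all values at 1.0 and give each thread its own segment."""
    x = [1.0] * words
    segment = words // n_threads
    threads = []
    for t in range(n_threads):
        start = t * segment
        end = words if t == n_threads - 1 else start + segment
        threads.append(threading.Thread(target=calculate,
                                        args=(x, start, end, passes)))
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return x

if __name__ == "__main__":
    words = words_for_kb(12.8)
    reference = run(words, 1, 3)
    for n_threads in (2, 4, 8):
        same = run(words, n_threads, 3) == reference
        print(n_threads, "threads consistent:", same)
```

Because every word is updated independently, the results should be identical for any number of threads, which is what the benchmark's check relies on.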

An example of results for MP-MFLOPSi, from the log file, is provided below, showing identical numeric results, independent of the number of threads used (as they should be). This original version became too fast for later technology, producing inconsistent MFLOPS performance ratios. Longer running versions were produced to avoid this problem, in this case MP-MFLOPS2i with 50 times more calculations, producing the expected reduction in result values. The numeric results from ARM processors are slightly different, due to rounding effects (see Short and Long below).

Examination of disassembled code, using default compile parameters, showed that Intel SIMD and ARM NEON instructions were not being produced. These could execute, for example, four linked multiply and add operations simultaneously, providing MFLOPS speeds of up to eight times CPU MHz, per core. The types of instruction used are shown below, where the Intel varieties used only one word out of four in SSE registers (Single Instruction Single Data - SISD), and the ARM code employed single word scalar registers. The latter were vector type, using three registers, including instructions such as floating-point multiply-accumulate single precision (fmacs).
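The "eight times CPU MHz" figure follows from four data words per instruction and two operations (multiply and add) per word. The sketch below works this through, assuming one such linked instruction completes per clock cycle, which is an idealised assumption made here. The function names are illustrative.

```python
def peak_mflops(cpu_mhz, simd_words=4, ops_per_word=2):
    """Idealised per core peak: words per instruction x (multiply + add) x MHz."""
    return cpu_mhz * simd_words * ops_per_word

def fraction_of_peak(measured_mflops, cpu_mhz):
    """Measured single core MFLOPS as a fraction of the idealised SIMD peak."""
    return measured_mflops / peak_mflops(cpu_mhz)

if __name__ == "__main__":
    # A1 Atom at 1860 MHz, 1061 MFLOPS measured with one thread (log below).
    print(peak_mflops(1860))
    print(round(fraction_of_peak(1061, 1860), 3))
```

For the 1.86 GHz Atom this gives 14880 MFLOPS per core, against 1061 MFLOPS measured with one thread using the scalar code.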

The released versions were recompiled using the compile options shown below, but this made no difference to the type of code produced. The Intel compilations used more registers, which produced faster speeds at 32 operations per word. The ARM code was virtually identical, producing similar performance.


 Intel CPU Short - 5000 Repeat Passes
 ARM/Intel MP-MFLOPS v7 Benchmark V1.1 28-Apr-2015 17.24

 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 642 717 658 1053 1026 987
 2T 1052 1366 1016 2018 2108 2063
 4T 1752 2483 956 3817 3676 3894
 8T 1436 2217 992 3213 3428 3289
 Results x 100000, 0 indicates ERRORS
 1T 86735 98519 99984 79894 97641 99975
 2T 86735 98519 99984 79894 97641 99975
 4T 86735 98519 99984 79894 97641 99975
 8T 86735 98519 99984 79894 97641 99975
 Total Elapsed Time 3.6 seconds
 
 Intel CPU Long - 100000 Repeat Passes

 1T-8T 40392 76406 99700 35296 66012 99521
 ######################################################

 ARM CPU Short

 1T-8T 86735 98519 99984 79897 97638 99975

 ARM CPU Long

 1T-8T 40392 76406 99700 35218 66014 99520
 ######################################################

 Android.mk LOCAL_CFLAGS

 ifeq ($(TARGET_ARCH_ABI),x86)
 LOCAL_CFLAGS += -ffast-math -mtune=atom -mssse3 -mfpmath=sse
 endif
 ifeq ($(TARGET_ARCH_ABI),x86_64) 
 LOCAL_CFLAGS += -ffast-math -mtune=slm -msse4.2
 endif 
 ifeq ($(TARGET_ARCH_ABI),armeabi-v7a)
 LOCAL_ARM_NEON := true
 LOCAL_CFLAGS += -mfpu=neon
 endif
 ifeq ($(TARGET_ARCH_ABI),arm64-v8a)
 LOCAL_CFLAGS += -DHAVE_NEON64=1
 endif
 ######################################################
 Intel SSE SISD Instructions - not SIMD
 mulss 36(%esp), %xmm2
 addss %xmm1, %xmm2
 ARM Vector Instructions - not NEON
 fmuls s15, s15, s10
 fmacs s15, s14, s23
 


MP-MFLOPS Benchmark Results

Below are MFLOPS results, mainly for the longer running versions, including those from the original ARM compilations. The first ones are for tablet A1, with the quad core Intel Atom CPU, where results for the shorter running version are also provided, showing some slower speeds. In this case, performance from the native Intel code was up to nearly twice as fast as the ARM converted test run. In both cases, with 2 operations per word, maximum MP gains were obtained using L2 cache based data, with RAM based speed limited, needing only two threads to reach its maximum. With 32 operations per word, the four cores provided performance gains of nearly four times.
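These A1 comparisons can be reproduced from the MP-MFLOPS2 logs below. The following is a simple worked calculation using the 1, 2 and 4 thread rows, with names invented for the example.

```python
# A1 MP-MFLOPS2 results (MFLOPS) by thread count.
# Columns: 2 ops/word at 12.8, 128, 12800 KB, then 32 ops/word at the same sizes.
A1_ORIGINAL = {1: [502, 501, 476, 575, 575, 573],
               2: [1012, 975, 921, 1133, 1140, 1115],
               4: [1571, 1627, 979, 2238, 2255, 2258]}
A1_ARM_INTEL = {1: [695, 696, 661, 1061, 1061, 1055],
                2: [1335, 1382, 1058, 2088, 2086, 2102],
                4: [1832, 2635, 979, 3993, 4125, 4145]}

def ratios(numerators, denominators):
    """Column by column speed ratios, rounded to two decimal places."""
    return [round(n / d, 2) for n, d in zip(numerators, denominators)]

if __name__ == "__main__":
    print("Native / original, 1T:", ratios(A1_ARM_INTEL[1], A1_ORIGINAL[1]))
    print("Native 2T / 1T gain:  ", ratios(A1_ARM_INTEL[2], A1_ARM_INTEL[1]))
    print("Native 4T / 1T gain:  ", ratios(A1_ARM_INTEL[4], A1_ARM_INTEL[1]))
```

The largest native to original ratio is 1.85, the best 2 operations per word gain is with 128 KB (L2 cache) data, RAM based speed at 2 operations per word is already at its highest with two threads, and the 32 operations per word gains at four threads are between 3.7 and 4 times.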

Tablet T11 had some slightly slower results on the ARM/Intel variety, with tablet T7 providing little variation. Except for RAM based data, and 2 operations per word, appropriate performance gains were produced in line with the number of cores.

T21, with the Qualcomm Snapdragon 800, produced similar speeds using the old and ARM/Intel versions. Calculation speeds, with 1 and 2 threads, could be slower than T11, Cortex-A15, but RAM speed was much faster. The opposite applied, compared with A1 Atom, using native code.

August 2015 - Results provided for 64 bit T22, showing that, at 32 operations per word, it was just over twice as fast at 64 bits, and up to 3.7 times faster at 2 operations per word, with cache based data. The reason is that 64 bit vector SIMD instructions were produced, instead of scalars.
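The T22 ratios quoted above can be derived from the single thread rows of the 32 bit and 64 bit logs below, as in this short calculation (names are illustrative).

```python
# T22 single thread MFLOPS: 2 ops/word at 12.8, 128, 12800 KB, then 32 ops/word.
T22_32_BIT_1T = [190, 190, 184, 670, 672, 664]
T22_64_BIT_1T = [705, 701, 636, 1398, 1394, 1362]

def speedups(new, old):
    """64 bit speed divided by 32 bit speed for each column."""
    return [round(n / o, 2) for n, o in zip(new, old)]

if __name__ == "__main__":
    result = speedups(T22_64_BIT_1T, T22_32_BIT_1T)
    print("2 ops/word :", result[:3])
    print("32 ops/word:", result[3:])
```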


 #################### A1 Original #######################
 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
 Android MP-MFLOPS2 Benchmark V2.1 04-Feb-2015 11.03
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 502 501 476 575 575 573
 2T 1012 975 921 1133 1140 1115
 4T 1571 1627 979 2238 2255 2258
 8T 1550 1890 1007 2235 2239 2217
 Total Elapsed Time 117.4 seconds

 #################### A1 ARM-Intel ######################
 ARM/Intel MP-MFLOPS v7 Benchmark V1.1 28-Apr-2015 17.24
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 642 717 658 1053 1026 987
 2T 1052 1366 1016 2018 2108 2063
 4T 1752 2483 956 3817 3676 3894
 8T 1436 2217 992 3213 3428 3289

 V7 Short Version Total Elapsed Time 3.6 seconds

 ARM/Intel MP-MFLOPS2 Benchmark V2.1 28-Apr-2015 17.24
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 695 696 661 1061 1061 1055
 2T 1335 1382 1058 2088 2086 2102
 4T 1832 2635 979 3993 4125 4145
 8T 2026 2557 1007 3842 4044 4110
 Total Elapsed Time 65.8 seconds

 -- Single Thread MFLOPS No Extra Compile Options --

 704 713 675 773 779 774

 #################### T11 Original #####################
 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Dual Core CPU Measured GHz = 1.7
 Android MP-MFLOPS2 Benchmark V2.1 29-Apr-2015 10.22
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 845 817 544 1546 1539 1512
 2T 1593 1668 648 3140 3067 2977
 4T 1974 1775 645 2963 3093 2845
 8T 1935 2059 652 3108 3147 2985
 Total Elapsed Time 58.5 seconds

 #################### T11 ARM-Intel ####################
 ARM/Intel MP-MFLOPS2 Benchmark V2.1 28-Apr-2015 20.30
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 695 756 536 1537 1501 1476
 2T 1319 1527 645 3151 3077 3000
 4T 1604 1567 657 3035 3095 2997
 8T 1604 1639 658 3108 3125 2996
 
 Total Elapsed Time 59.1 seconds

 #################### T21 Original #####################
 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Quad Core 2150 MHz Measured
 Android MP-MFLOPS2 Benchmark V2.1 05-Jul-2015 15.35
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 718 781 590 1214 1220 1228
 2T 1572 1583 1118 2406 2436 2442
 4T 2338 2959 1836 4867 4911 4859
 8T 3148 3266 1866 4870 4916 4888
 Total Elapsed Time 56.4 seconds

 #################### T21 ARM-Intel #################### 
 ARM/Intel MP-MFLOPS2 Benchmark V2.1 05-Jul-2015 16.50
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 822 768 636 1232 1228 1231
 2T 1662 1637 1184 2460 2463 2446
 4T 2509 3216 1659 4519 4762 4900
 8T 2965 3193 1881 4847 4925 4880

 ###################### T22 32 Bit ######################
 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 
 ARM/Intel MP-MFLOPS2 Benchmark V2.2 09-Aug-2015 21.17
 Compiled for 32 bit ARM v7a
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 190 190 184 670 672 664
 2T 377 378 370 1343 1345 1329
 4T 707 755 725 2657 2669 2621
 8T 722 736 714 2640 2672 2631
 Total Elapsed Time 113.0 seconds

###################### T22 64 Bit ######################
 ARM/Intel MP-MFLOPS2 Benchmark V2.2 09-Aug-2015 21.24
 Compiled for 64 bit ARM v8a
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 705 701 636 1398 1394 1362
 2T 1376 1395 942 2794 2797 2757
 4T 2063 2602 962 5491 5546 5336
 8T 2474 2611 957 5367 5500 5417
 Total Elapsed Time 51.6 seconds

 ##################### T7 Original ######################
 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 
 Quad Core CPU Measured MHz = 1200
 Android MP-MFLOPS2 Benchmark V2.1 05-Feb-2015 11.37
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 182 156 114 598 578 572
 2T 365 321 194 1194 1163 1141
 4T 716 655 233 2367 2316 2240
 8T 717 682 233 2347 2371 2246
 Total Elapsed Time 135.5 seconds

 #################### T7 ARM-Intel #####################
 ARM/Intel MP-MFLOPS2 Benchmark V2.1 28-Apr-2015 17.44
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 188 156 116 598 578 574
 2T 365 319 197 1195 1161 1145
 4T 682 709 237 2372 2345 2249
 8T 678 731 237 2361 2381 2254
 Total Elapsed Time 135.0 seconds
 


MP-Whetstone Benchmark - MP-WHETSi

For more information on the Whetstone Benchmark, see the stand alone version above. The multithreading version runs multiple copies of the same code, with separate variables. In this case, performance of each of the eight test functions, and the overall MWIPS rating, is invariably (nearly) proportional to the number of CPU cores available. The driving program checks that calculations on every thread produce consistent numeric results.

The gcc 4.8 based ARM/Intel version, running on the Intel Atom tablet, is rated at twice the speed of the original, due to the use of native code. The fixed point results indicate overoptimisation, but the test uses little of the overall time, this being mainly dependent on the Cos, Exp and third MFLOPS tests.
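The per test ratios behind these observations can be calculated from the single thread rows of the two A1 logs below. This is a worked example, with names chosen here.

```python
TESTS = ["MWIPS", "MFLOPS 1", "MFLOPS 2", "MFLOPS 3",
         "Cos", "Exp", "Fixpt", "If", "Equal"]

# A1 single thread (1T) results from the logs below.
A1_ORIGINAL_1T = [953.7, 363.0, 382.4, 267.8, 21.0, 13.2, 413.1, 1842.4, 392.3]
A1_ARM_INTEL_1T = [1916.9, 691.4, 691.3, 497.2, 35.3, 27.6, 10209.8, 2787.3, 1351.8]

def speed_ratios(new, old):
    """Ratio of new to old speed for each Whetstone test."""
    return {name: n / o for name, n, o in zip(TESTS, new, old)}

if __name__ == "__main__":
    for name, ratio in speed_ratios(A1_ARM_INTEL_1T, A1_ORIGINAL_1T).items():
        print(f"{name:9s} {ratio:6.2f}")
```

The MWIPS ratio is close to 2, while the Fixpt ratio is more than 20, far outside the range of the other tests, which is the sign of overoptimisation.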

The new native ARM version, running on tablets T11 and T7, produces a much slower overall MWIPS rating, mainly due to the Exp tests, but also influenced by other slower results (some the same as above). T21 indicates slower floating point calculations.

August 2015 - Results provided for 64 bit T22, showing that, at 64 bits, the Fixpt test was clearly almost optimised out, but this makes little difference to the overall MWIPS rating, at 2.25 times faster than the 32 bit benchmark.


 #################### A1 Original #######################
 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
 Android MP-Whetstone Benchmark V1.1 04-Feb-2015 11.39
 Using 1, 2, 4 and 8 Threads
 MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
 1 2 3 MOPS MOPS MOPS MOPS MOPS

 1T 953.7 363.0 382.4 267.8 21.0 13.2 413.1 1842.4 392.3
 2T 1921.2 726.0 663.5 541.4 42.6 27.0 816.1 3662.6 793.3
 4T 3820.6 1419.2 1514.6 1081.5 84.1 54.0 1543.8 6292.4 1588.5
 8T 4003.8 1912.9 1872.4 1114.1 86.5 56.4 2053.1 8292.6 1599.7
 Overall Seconds 4.88 1T, 4.87 2T, 4.96 4T, 10.05 8T

 #################### A1 ARM-Intel ######################
 ARM/Intel MP-Whetstone Benchmark V1.1 30-Apr-2015 17.35
 
 Using 1, 2, 4 and 8 Threads
 MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
 1 2 3 MOPS MOPS MOPS MOPS MOPS

 1T 1916.9 691.4 691.3 497.2 35.3 27.6 10209.8 2787.3 1351.8
 2T 3800.3 1377.6 1381.2 980.0 70.1 54.7 20248.0 5252.8 2748.7
 4T 7604.9 2713.2 2711.8 1977.1 140.2 110.0 33906.3 9526.5 5550.8
 8T 7798.1 3141.5 3627.2 2064.2 141.2 110.2 59590.6 12743.7 5711.5
 Overall Seconds 4.94 1T, 5.00 2T, 5.06 4T, 10.11 8T

 #################### T11 Original #####################
 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Measured 1.7 GHz
 Android MP-Whetstone Benchmark V1.1 06-Sep-2013 12.49
 
 Using 1, 2, 4 and 8 Threads
 MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
 1 2 3 MOPS MOPS MOPS MOPS MOPS

 1T 1308.2 345.9 379.0 294.1 30.8 17.2 1351.4 1265.7 843.1
 2T 2886.6 782.1 782.6 614.0 80.1 34.3 2775.2 2463.7 1667.5
 4T 3086.0 998.6 788.1 610.6 79.2 44.5 3472.0 2526.4 2191.4
 8T 2930.0 788.2 843.5 616.5 80.5 35.0 2846.0 2799.1 1686.2
 Overall Seconds 3.54 1T, 3.30 2T, 6.62 4T, 13.16 8T

 #################### T11 ARM-Intel ####################
 ARM/Intel MP-Whetstone Benchmark V1.1 30-Apr-2015 21.23
 Using 1, 2, 4 and 8 Threads
 MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
 1 2 3 MOPS MOPS MOPS MOPS MOPS

 1T 837.2 340.1 341.7 191.2 39.1 6.2 1521.1 2532.8 629.3
 2T 1676.2 596.2 683.2 387.3 77.8 12.4 3056.9 5055.1 1263.6
 4T 1697.7 687.5 869.4 394.5 78.1 12.4 2980.7 6518.4 1258.8
 8T 1685.2 685.9 691.0 389.7 78.3 12.4 3086.3 5113.7 1262.0
 Overall Seconds 4.06 1T, 4.07 2T, 8.12 4T, 16.19 8T
 
 #################### T21 Original #####################
 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Android MP-Whetstone Benchmark V1.1 06-Jul-2015 10.42
 Using 1, 2, 4 and 8 Threads
 MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
 1 2 3 MOPS MOPS MOPS MOPS MOPS

 1T 1877.1 645.2 642.6 524.1 44.0 22.3 1364.7 1572.1 898.9
 2T 3668.6 1220.2 1262.4 1021.9 85.9 43.8 2663.5 3078.4 1753.4
 4T 7426.9 2375.5 2474.7 2097.7 175.7 88.2 5052.6 6240.4 3555.0
 8T 7706.6 2692.2 2746.2 2186.9 180.1 90.3 5822.5 6902.7 3681.3
 Overall Seconds 4.44 1T, 4.62 2T, 4.64 4T, 9.00 8T
 Total Elapsed Time 24.1 seconds

 #################### T21 ARM-Intel #################### 
 ARM/Intel MP-Whetstone Benchmark V1.1 22-Jul-2015 12.02
 
 Using 1, 2, 4 and 8 Threads
 MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
 1 2 3 MOPS MOPS MOPS MOPS MOPS

 1T 1598.0 512.1 508.7 311.7 43.6 22.1 1142.9 2123.3 598.4
 2T 3161.2 960.0 996.7 614.2 86.7 43.8 2258.9 3820.9 1194.7
 4T 6348.0 1593.5 2019.5 1231.5 174.2 88.5 4471.1 8139.4 2398.3
 8T 6419.6 2058.2 2077.5 1252.6 175.0 88.7 4520.9 8875.0 2409.0
 Overall Seconds 4.88 1T, 5.00 2T, 5.05 4T, 9.92 8T
 Total Elapsed Time 29.2 seconds

 ###################### T22 32 Bit ######################
 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 
 ARM/Intel MP-Whetstone Benchmark V1.2 10-Aug-2015 11.30
 Compiled for 32 bit ARM v7a
 Using 1, 2, 4 and 8 Threads
 MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
 1 2 3 MOPS MOPS MOPS MOPS MOPS

 1T 676.4 275.9 281.9 147.9 35.4 5.3 600.3 901.0 285.5
 2T 1362.5 533.8 561.7 298.0 70.9 10.8 1203.1 1838.9 574.0
 4T 2698.6 903.9 1071.7 594.4 141.2 21.5 2346.1 3305.5 1138.5
 8T 2830.1 1463.2 1393.0 614.2 152.5 21.9 3243.9 4418.3 1171.4
 Overall Seconds 4.95 1T, 4.94 2T, 5.11 4T, 10.09 8T

###################### T22 64 Bit ######################
 ARM/Intel MP-Whetstone Benchmark V1.2 10-Aug-2015 11.34
 Compiled for 64 bit ARM v8a
 Using 1, 2, 4 and 8 Threads
 MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
 1 2 3 MOPS MOPS MOPS MOPS MOPS

 1T 1524.8 328.6 348.8 297.6 37.3 19.9 1462579 1867.2 1238.0
 2T 3062.5 688.8 697.9 596.0 75.5 39.8 2097113 3726.7 2481.3
 4T 6085.4 1214.9 1360.5 1185.4 150.5 79.4 2449153 7055.0 4951.8
 8T 6222.4 1495.2 1545.6 1204.2 152.2 80.6 3869846 9218.8 5154.1
 Overall Seconds 4.92 1T, 4.90 2T, 5.05 4T, 9.97 8T

 ##################### T7 Original ######################
 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 
 Measured 1200 MHz
 Android MP-Whetstone Benchmark V1.0 17-Oct-2012 13.49
 Using 1, 2, 4 and 8 Threads
 MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
 1 2 3 MOPS MOPS MOPS MOPS MOPS

 1T 1033.7 247.4 235.4 266.0 25.3 15.0 448.4 630.9 513.5
 2T 2058.1 456.3 473.0 532.4 50.0 30.1 898.1 1198.4 1026.6
 4T 4122.8 831.9 944.7 1064.6 100.7 60.1 1797.0 2392.2 2053.4
 8T 4163.2 1016.0 948.2 1069.5 101.8 60.9 1808.0 2414.2 2051.5
 Overall Seconds 5.28 1T, 5.34 2T, 5.42 4T, 10.81 8T

 #################### T7 ARM-Intel #####################
 ARM/Intel MP-Whetstone Benchmark V1.1 30-Apr-2015 21.32
 Using 1, 2, 4 and 8 Threads
 MWIPS MFLOPS MFLOPS MFLOPS Cos Exp Fixpt If Equal
 1 2 3 MOPS MOPS MOPS MOPS MOPS

 1T 602.2 242.3 242.3 140.2 27.2 4.9 482.8 1425.2 239.1
 2T 1208.7 481.2 484.2 280.8 55.0 9.9 970.0 2869.6 478.7
 4T 2398.7 805.4 966.7 562.5 109.5 19.5 1938.2 5722.5 957.1
 8T 2429.1 974.6 1076.2 562.4 110.9 19.7 1981.5 5816.1 963.6
 Overall Seconds 4.94 1T, 4.93 2T, 5.08 4T, 9.93 8T
 


MP Dhrystone Benchmark - MP-Dhryi.apk

For further details, see Dhrystone Benchmark above and, for further results, Android MultiThreading Benchmark Apps. This multithreading benchmark runs using 1, 2, 4 and 8 threads, executing multiple copies of the same program. An initial calibration, using a single thread, determines the number of passes needed for an overall execution time of 1 second. Then all threads are run using the same pass count, with running time being extended when there are more threads than CPUs. The same calculations are carried out on each thread. Separate data arrays are used for each thread, but some variables can be used by all threads. The latter is probably responsible for the failure to increase throughput using multiple threads.
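The calibration step and the VAX MIPS rating can be sketched as follows. The calibration method shown (doubling the pass count, then scaling to the target time) and the names run_passes and calibrate are assumptions made for illustration, as the actual procedure is not given here. The divisor of 1757 Dhrystones per second, the usual VAX 11/780 reference for 1 MIPS, is not stated in this document but reproduces the ratings in the logs below.

```python
VAX_DHRYSTONES_PER_SECOND = 1757   # usual VAX 11/780 reference (1 MIPS)

def vax_mips(dhrystones_per_second):
    """Convert a Dhrystones per second score to a VAX MIPS rating."""
    return dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND

def calibrate(run_passes, target_seconds=1.0, start_passes=5000):
    """Find a pass count giving roughly target_seconds on one thread.

    run_passes(passes) is a hypothetical callable that runs the benchmark
    loop and returns the elapsed time in seconds.
    """
    passes = start_passes
    seconds = run_passes(passes)
    # Double the work until the run is long enough to time reliably.
    while seconds < target_seconds / 10:
        passes *= 2
        seconds = run_passes(passes)
    return round(passes * target_seconds / seconds)

if __name__ == "__main__":
    # Simulated CPU executing 4 million passes per second.
    simulated = lambda passes: passes / 4000000
    print("Calibrated passes:", calibrate(simulated))
    print("A1 Original 1T VAX MIPS:", round(vax_mips(4147126)))
```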

The new ARM/Intel version demonstrated similar speeds on the systems tested. Unlike other systems, the Intel Atom based tablet produced slower performance using multiple threads. Tests on a PC, via the BlueStacks emulator, appeared to demonstrate that native Intel instructions were being used.

T21, with the Qualcomm Snapdragon 800, sometimes crashed running this benchmark and apparently did so every time with the ARM-Intel version. When it does run, the eight thread performance is also highly suspect.

August 2015 - Results provided for 64 bit T22, showing that the 64 bit version was much faster than the 32 bit variety.


 #################### A1 Original #######################
 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
 Android MP-Dhrystone 2 Benchmark V1.1 04-May-2015 17.00

 Threads 1 2 4 8
 Seconds 0.96 3.27 6.83 13.79
 Dhrystones per Second 4147126 2449335 2343954 2320745
 VAX MIPS rating 2360 1394 1334 1321

 #################### A1 ARM-Intel ######################
 ARM/Intel MP-Dhrystone 2 Benchmark V1.1 04-May-2015 17.02

 Threads 1 2 4 8
 Seconds 0.96 3.44 6.88 13.80
 Dhrystones per Second 4154551 2323340 2324139 2318280
 VAX MIPS rating 2365 1322 1323 1319

 #################### T11 Original #####################
 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Measured 1.7 GHz
 Android MP-Dhrystone 2 Benchmark V1.1 10-Aug-2013 09.55

 Threads 1 2 4 8
 Seconds 0.50 0.53 1.05 2.18
 Dhrystones per Second 3990211 7522450 7600539 7328598
 VAX MIPS rating 2271 4281 4326 4171
 
 #################### T11 ARM-Intel ####################
 ARM/Intel MP-Dhrystone 2 Benchmark V1.1 04-May-2015 17.22

 Threads 1 2 4 8
 Seconds 0.99 1.12 2.33 4.45
 Dhrystones per Second 4031981 7127449 6856521 7196710
 VAX MIPS rating 2295 4057 3902 4096

 #################### T21 Original #####################
 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Android MP-Dhrystone 2 Benchmark V1.1 06-Jul-2015 11.22

 Threads 1 2 4 8
 Seconds 0.64 0.83 0.94 1.23
 Dhrystones per Second 5007132 7722435 13592474 20769050
 VAX MIPS rating 2850 4395 7736 11821
 Total Elapsed Time 4.4 seconds

 #################### T21 ARM-Intel #################### 
 Failed to run


 ###################### T22 32 Bit ######################
 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 
 ARM/Intel MP-Dhrystone 2 Benchmark V1.2 10-Aug-2015 11.32
 Compiled for 32 bit ARM v7a

 Threads 1 2 4 8
 Seconds 0.64 0.71 0.90 1.70
 Dhrystones per Second 2481286 4495793 7094180 7540038
 VAX MIPS rating 1412 2559 4038 4291

###################### T22 64 Bit ######################
 ARM/Intel MP-Dhrystone 2 Benchmark V1.2 10-Aug-2015 11.36
 Compiled for 64 bit ARM v8a

 Threads 1 2 4 8
 Seconds 0.89 1.06 1.64 3.24
 Dhrystones per Second 4476736 7574470 9768350 9861922
 VAX MIPS rating 2548 4311 5560 5613
 
 ##################### T7 Original ######################
 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 
 Measured 1200 MHz
 Android MP-Dhrystone 2 Benchmark V1.0 17-Oct-2012 13.59

 Threads 1 2 4 8
 Seconds 0.72 0.83 1.19 2.55
 Dhrystones per Second 2782404 4829150 6740332 6271011
 VAX MIPS rating 1584 2749 3836 3569

 #################### T7 ARM-Intel #####################

 ARM/Intel MP-Dhrystone 2 Benchmark V1.1 04-May-2015 17.18
 Threads 1 2 4 8
 Seconds 0.78 0.95 1.27 2.44
 Dhrystones per Second 2572642 4214238 6280420 6565767
 VAX MIPS rating 1464 2399 3575 3737

 ################ BlueStacks Emulator ##################
 PC with 3 GHz Phenom x4, windows 7

 VAX MIPS Original 474 465 453 449
 VAX MIPS ARM/Intel 4844 4670 4623 4724
 


MP-BusSpeed Benchmark - MP-BusSpdi.apk

This is a multithreading version of the BusSpeed Benchmark above. Here, single thread performance of the A1 Atom tablet was similar to that obtained unthreaded, with the ARM/Intel version again providing no improvement. Except for calculating bus speeds, the last column is the only one of real interest, where four cores produced gains of up to 3.7 times, using caches, and 1.9 times via RAM. The latter provided even better relative performance compared to ARM based systems. ARM/Intel version results are not shown for tablets T11 and T7, as they were both essentially the same as those obtained using the original MP benchmark. For further details and more results see Android MultiThreading Benchmark Apps. Some ARM/Intel results for T21 are slower than the original, but this might be due to the short running time.
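The multithreading gains quoted can be checked against the RdAll column of the A1 ARM-Intel log below, taking the best multiple thread speed relative to one thread. This is a worked example, with names chosen here, and the choice of the ARM-Intel log rather than the original is an assumption.

```python
# A1 ARM-Intel RdAll speeds (MB/second) for 1, 2, 4 and 8 threads.
A1_RDALL = {"12.3 KB": [7026, 13654, 25729, 20804],
            "122.9 KB": [5330, 10546, 19903, 15821],
            "12288 KB": [3935, 6591, 7525, 7064]}

def best_gain(speeds):
    """Highest multiple thread speed divided by the single thread speed."""
    return max(speeds[1:]) / speeds[0]

if __name__ == "__main__":
    for size, speeds in A1_RDALL.items():
        print(size, round(best_gain(speeds), 2))
```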

Results from the PC based BlueStacks emulator are also shown below, to confirm that native Intel instructions were being used in the revised benchmark.

Estimated maximum data transfer speeds, based on burst reading results (like 16 x 1018 for T21), can exceed the specification. This is caused by shared data in the L3 cache, and the way that the program is run.

MP-BusSpd2i.apk is a revised version for Android. Running time is longer and, rather than all threads reading data from the beginning, starting addresses are staggered. This can result in slower speed, as there are fewer calculations in the inner loop, but increased speed due to cached shared data appears to no longer apply, and burst results can be used to estimate maximum RAM throughput (as shown).
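The burst based estimate in the logs multiplies an Inc16 RAM result by 16. A likely interpretation, assumed here rather than stated in this document, is that reading one 4 byte word in every 16 touches each 64 byte burst once, so all 16 words of the burst are transferred for each word counted. The calculation is sketched below with illustrative names.

```python
WORD_BYTES = 4

def stride_bytes(increment_words):
    """Address step, in bytes, between successive words read."""
    return increment_words * WORD_BYTES

def max_ram_estimate(inc16_mb_per_second, words_per_burst=16):
    """Estimated maximum RAM speed, assuming a whole burst moves per word read."""
    return inc16_mb_per_second * words_per_burst

if __name__ == "__main__":
    print("Inc16 stride bytes:", stride_bytes(16))
    # 511 MB/second is the 8 thread Inc16 result at 49152 KB on A1.
    print("A1 estimate MB/second:", max_ram_estimate(511))
```

For A1 this reproduces the 8176 MB/second shown in the log, and the T21 example of 16 x 1018 gives 16288 MB/second.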

August 2015 - Results included for T22 with 64 bit CPU and 64 bit Android 5.0. Just considering the Read All data, A53 64/32 bit L1 cache, L2 cache and RAM performance ratios averaged 2.2, 1.8 and 1.0.


 #################### A1 Original #######################
 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
 Android MP-BusSpd v7 Benchmark V1.1 05-May-2015 13.02
 MB/Second Reading Data, 1, 2, 4 and 8 Threads
 KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll

 12.3 1T 3990 4458 6123 6512 6438 6729
 2T 3894 5699 8948 10299 11800 12555
 4T 5046 7109 11952 14750 15533 23304
 8T 4533 7464 13097 16970 21674 22225
 122.9 1T 1304 1613 2291 2661 3667 5063
 2T 2568 3145 4529 5365 7440 10147
 4T 4117 4801 7963 7495 8239 18911
 8T 3130 5016 7355 8543 11648 15845
 12288 1T 190 265 601 1203 2316 3832
 2T 244 448 995 1771 3599 6575
 4T 427 584 860 1741 3439 7449
 8T 395 510 855 1613 3547 6776
 Total Elapsed Time 13.5 seconds

 #################### A1 ARM-Intel ######################
 ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.28
 MB/Second Reading Data, 1, 2, 4 and 8 Threads
 KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll

 12.3 1T 5925 6494 6778 6979 7047 7026
 2T 3966 7029 9689 11689 12856 13654
 4T 4438 8698 16739 22057 23946 25729
 8T 4455 8619 15787 19934 22576 20804
 122.9 1T 1490 1975 2360 2802 3818 5330
 2T 2881 3798 4647 5531 7536 10546
 4T 4452 6338 5910 10217 14650 19903
 8T 4096 5075 6264 9213 12610 15821
 12288 1T 206 273 593 1198 2343 3935
 2T 276 455 842 1821 3319 6591
 4T 445 730 1401 2076 4457 7525
 8T 424 539 954 1829 3688 7064
 Total Elapsed Time 13.0 seconds

 ########## A1 New Long Version
 ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 15.50
 MB/Second Reading Data, 1, 2, 4 and 8 Threads
 KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll

 12.3 1T 5431 6110 6780 6262 6655 7313
 2T 3550 4464 7375 9825 11777 12442
 4T 2027 4442 4399 8841 17611 23509
 8T 983 2477 5063 4433 8568 15867
122.9 1T 1499 1991 2357 2839 3818 5382
 2T 2816 3808 4708 5592 7557 10677
 4T 4316 6313 7991 9816 14335 19993
 8T 4235 5610 7917 8791 12828 19661
49152 1T 215 275 611 1183 2328 3922
 2T 276 435 787 1671 3323 6507
 4T 398 455 884 1754 3490 6971
 8T 376 511 867 1746 3512 7510
 Total Elapsed Time 48.6 seconds

 Maximum RAM Speed Estimate = 511 x 16 = 8176 MB/second


 #################### T11 ARM-Intel ####################
 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Measured 1.7 GHz
 ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.45
 MB/Second Reading Data, 1, 2, 4 and 8 Threads
 KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll

 12.3 1T 2165 3591 4256 5587 5998 6109
 2T 4121 6469 9530 11381 11846 11936
 4T 4106 6438 8827 6793 9802 12080
 8T 4098 6390 9534 10141 10996 11603
 122.9 1T 464 740 1173 2395 3276 3340
 2T 579 989 1934 3994 5431 5792
 4T 579 988 1930 3873 5469 5821
 8T 580 985 1915 3999 5408 5812
 12288 1T 134 172 211 462 602 1904
 2T 269 343 387 934 1217 2685
 4T 252 231 374 768 991 2625
 8T 231 254 367 781 1104 2782
 Total Elapsed Time 12.1 seconds

 ########## T11 New Long Version
 ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 17.07
 MB/Second Reading Data, 1, 2, 4 and 8 Threads
 KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll

 12.3 1T 3499 4539 5499 5505 6134 6045
 2T 3775 7202 8377 10605 10457 11319
 4T 3982 6676 7687 9326 9707 10807
 8T 2546 3643 7891 8003 10725 11097
122.9 1T 672 901 1336 2784 3274 3334
 2T 568 969 1931 3894 5427 5221
 4T 574 971 1912 3831 5256 4811
 8T 559 971 1917 3878 5387 5162
49152 1T 140 142 193 575 989 1499
 2T 221 223 342 769 1379 2355
 4T 228 223 344 783 1382 2376
 8T 223 223 342 787 1385 2352
 Total Elapsed Time 49.9 seconds

 Maximum RAM Speed Estimate = 223 x 16 = 3568 MB/second
 Initial Results

 12.3 1T 693 936 1266 2522 3264 3329
 2T 557 900 1539 3459 3317 3613
 4T 551 903 1557 2902 3475 3616

 #################### T21 Original #####################
 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s
 L1 caches 4 x 32 KB, L2 cache shared 2048 KB
 Android MP-BusSpd v7 Benchmark V1.1 29-Jun-2015 18.37
 MB/Second Reading Data, 1, 2, 4 and 8 Threads
 KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll

 12.3 1T 2580 2206 5048 5176 5679 5989
 2T 4062 5175 9340 9868 10971 11281
 4T 4688 10324 16552 17196 21714 23708
 8T 8467 9834 16698 18183 21936 23693
122.9 1T 1152 1052 2068 3035 3927 5723
 2T 1710 1840 3094 5001 7963 11475
 4T 2047 2002 5031 9267 14698 22920
 8T 2235 2275 5223 9348 14234 21783
12288 1T 262 382 508 867 1466 2661
 2T 464 766 1049 1754 3186 5735
 4T 612 1018 1796 3149 5892 9095
 8T 575 680 1277 2308 4987 7948
 Total Elapsed Time 12.7 seconds

 #################### T21 ARM-Intel #################### 
 ARM/Intel MP-BusSpd v7 Benchmark V1.1 23-May-2015 17.05
 MB/Second Reading Data, 1, 2, 4 and 8 Threads
 KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll

 12.3 1T 1840 2073 3512 3554 4829 5243
 2T 3432 4591 7128 7651 9120 9821
 4T 4398 7855 13752 15428 18530 20235
 8T 6692 9507 13857 16110 18143 18796
122.9 1T 860 753 2011 2841 3205 5282
 2T 1505 1609 3076 5038 8089 10421
 4T 1924 1981 4299 7588 14614 20754
 8T 1909 1988 4264 7980 13884 19027
12288 1T 270 379 538 856 1626 2859
 2T 471 677 1098 1849 3304 5924
 4T 549 787 1066 1874 6274 10781
 8T 713 853 1649 2258 4664 8321
 Total Elapsed Time 13.1 seconds

 ########## T21 New Long Version
 ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 15.39
 MB/Second Reading Data, 1, 2, 4 and 8 Threads
 KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll

 12.3 1T 2247 2616 4010 4443 4909 5614
 2T 3558 4725 7241 9048 9747 10892
 4T 6074 8303 13442 16937 18525 21068
 8T 3998 5106 14314 13615 18200 20740
122.9 1T 874 1198 2024 2935 4529 5345
 2T 1686 1702 3174 5357 7688 10545
 4T 1988 2139 4465 8171 14969 21169
 8T 1972 2139 4468 8195 15261 21132
49152 1T 292 406 516 899 1663 2929
 2T 449 541 962 1569 2851 4776
 4T 495 605 1109 2439 4161 8243
 8T 530 564 1156 2149 4172 7907
 Total Elapsed Time 48.0 seconds

 Maximum RAM Speed Estimate = 605 x 16 = 9680 MB/second

 ###################### T22 32 Bit ######################
 T22, Tab 2 A8-50, 1.3 GHz quad core 64 bit ARM Cortex-A53
 Single Channel RAM, LPDDR3 666 MHz, 5.3 GB/second
 ARM/Intel MP-BusSpd Benchmark V1.2 12-Aug-2015 16.13
 Compiled for 32 bit ARM v7a
 MB/Second Reading Data, 1, 2, 4 and 8 Threads
 KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll

 12.3 1T 1849 2140 2079 2211 2270 2297
 2T 3663 4252 4294 4400 4370 4580
 4T 4630 5574 5691 5893 6015 6083
 8T 5331 5775 6033 6622 7968 8023
122.9 1T 597 621 1119 1815 2135 2237
 2T 869 943 1644 2992 3740 4412
 4T 949 951 1922 3736 6468 7779
 8T 948 978 1911 3717 6464 7542
12288 1T 123 174 344 678 1215 1840
 2T 243 310 672 1332 2383 3974
 4T 302 285 594 1282 2271 4606
 8T 279 295 654 1198 2749 4660
 Total Elapsed Time 12.8 seconds

 ########## T22 Long Version
 ARM/Intel MP-BusSpd2 Benchmark V1.2 12-Aug-2015 16.14
 Compiled for 32 bit ARM v7a

 12.3 1T 1877 2124 2176 2266 2296 2343
 2T 3625 4198 4341 4468 4536 4613
 4T 5733 7541 8293 8830 8024 9042
 8T 2985 3829 7438 6117 8108 8923
122.9 1T 604 625 1142 1846 2150 2284
 2T 924 950 1793 3277 4270 4504
 4T 962 989 1939 3765 6798 8862
 8T 965 993 1933 3748 6651 8239
49152 1T 165 175 344 677 1285 1979
 2T 234 238 482 961 1907 3547
 4T 266 298 562 1224 2296 4478
 8T 272 275 538 1098 2149 4282
 Total Elapsed Time 48.8 seconds

 ###################### T22 64 Bit ######################
 ARM/Intel MP-BusSpd2 Benchmark V1.2 12-Aug-2015 16.18
 Compiled for 64 bit ARM v8a

 12.3 1T 2610 2472 2586 2727 2748 5841
 2T 4404 4681 4994 5369 5420 11297
 4T 6546 8125 9105 10243 10319 20610
 8T 3380 4023 7919 7146 9871 19852
122.9 1T 604 621 1110 1872 2446 5100
 2T 919 948 1855 3433 4853 10037
 4T 961 974 1984 3924 7491 14935
 8T 963 942 1931 3915 7572 14689
49152 1T 173 177 340 692 1300 2653
 2T 266 241 479 968 1883 3724
 4T 304 277 556 1130 2126 4328
 8T 279 278 544 1138 2179 4275
 Total Elapsed Time 49.4 seconds

 #################### T7 ARM-Intel #####################
 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 
 Measured 1200 MHz
 ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.35
 MB/Second Reading Data, 1, 2, 4 and 8 Threads
 KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll

 12.3 1T 2853 3392 3376 3511 3551 3494
 2T 2857 3389 3542 5540 5730 5595
 4T 7257 10326 10289 10997 11373 11100
 8T 6584 10325 10485 11175 11322 11189
 122.9 1T 362 379 347 546 623 978
 2T 516 530 508 726 1227 1840
 4T 598 658 548 1181 1556 2657
 8T 721 733 736 1181 1548 2653
 12288 1T 58 57 84 123 173 334
 2T 111 111 182 248 348 664
 4T 87 85 276 463 687 1290
 8T 154 107 147 429 441 1242
 Total Elapsed Time 12.7 seconds

 ########## T7 New Long Version
 ARM/Intel MP-BusSpd2 Benchmark V1.0 24-Jul-2015 15.59
 MB/Second Reading Data, 1, 2, 4 and 8 Threads
 KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll

 12.3 1T 2166 2774 3181 3307 3377 3263
 2T 3924 5188 5207 5754 5759 5805
 4T 7570 10011 10252 11165 11375 11777
 8T 3510 4786 9011 8318 11351 11544
122.9 1T 383 409 359 558 663 983
 2T 525 541 520 741 1241 1814
 4T 739 752 753 1219 1590 2776
 8T 735 741 753 1218 1607 2737
49152 1T 56 51 81 126 172 330
 2T 65 67 107 196 335 620
 4T 70 68 108 215 426 835
 8T 70 68 109 215 428 851
 Total Elapsed Time 48.2 seconds

 Maximum RAM Speed Estimate = 68 x 16 = 1088 MB/second

 ############### BlueStacks Original ###############
 Android MP-BusSpd v7 Benchmark V1.1 05-May-2015 17.44
 MB/Second Reading Data, 1, 2, 4 and 8 Threads
 KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll

 12.3 1T 1600 1538 1641 1706 1600 1687
 2T 1600 1641 1745 1600 1687 1638
 4T 1600 1745 1745 1567 1638 1575
 8T 1476 1641 1602 1638 1575 1596
 122.9 1T 1000 923 1477 1600 1600 1688
 2T 1000 952 1477 1600 1567 1282
 4T 872 1163 1422 1567 1602 1576
 8T 1026 1164 1477 1527 1644 1580
 12288 1T 307 403 537 1075 1396 1512
 2T 302 409 708 1075 1417 1433
 4T 307 355 614 1024 1433 1535
 8T 307 384 661 1023 1404 1512
 Total Elapsed Time 13.9 seconds

 ############### BlueStacks ARM/Intel ##############
 ARM/Intel MP-BusSpd v7 Benchmark V1.1 05-May-2015 14.25
 MB/Second Reading Data, 1, 2, 4 and 8 Threads
 KB Inc32 Inc16 Inc8 Inc4 Inc2 RdAll

 12.3 1T 9999 18461 20000 20512 19692 21942
 2T 10909 17777 19999 19692 21942 20480
 4T 9599 18461 19692 19591 20480 19692
 8T 10666 17066 19948 20480 20480 19200
 122.9 1T 1500 1476 2742 5485 11636 13128
 2T 1428 1396 2792 5585 11170 13653
 4T 1396 1428 2954 5486 10973 13654
 8T 1280 1371 2744 5909 10974 14630
 12288 1T 460 439 645 631 1105 1331
 2T 230 268 480 806 1433 2234
 4T 256 307 575 1126 2010 2764
 8T 236 390 756 1105 1911 3574
 Total Elapsed Time 14.4 seconds
 


MP-RandMem Benchmark - MP-RndMemi.apk

This is a conversion of the longer running MP-RndMem2.apk Benchmark, as the original, short version produced inconsistent performance measurements. It is a multithreading variant of the RandMem Benchmark above. For further details and more results see Android MultiThreading Benchmark Apps. Log file details are provided below for the original version, which performed relatively badly on the Intel based tablet A1, and for the ARM/Intel version, with cache based speeds up to 3.6 times faster with reading tests and 1.3 times with reading/writing. The new version, running on ARM based tablets, produced similar results to those from the original, with some slower.

Compared with early ARM based devices, tablet A1 ARM/Intel tests again demonstrated superior performance from RAM based data and from L2 cache on reading, but did not do as well using L1 cache.

August 2015 - Results provided for 64 bit T22 with Cortex-A53 CPU. Performance is not significantly faster at 64 bits, probably because it depends on the complex indexing used.
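
The difference between the serial and random tests can be pictured with the sketch below. It assumes that data is read through an index array, held in order for the serial tests and shuffled for the random ones. The actual indexing used by the benchmark is not reproduced here, and all names are hypothetical.

```python
import random

def make_index(n_words, random_order, seed=1):
    """Index array: 0..n-1 in order (serial) or shuffled (random)."""
    index = list(range(n_words))
    if random_order:
        random.Random(seed).shuffle(index)
    return index

def read_via_index(data, index):
    """Read every word of data in the order given by the index array."""
    total = 0
    for i in index:
        total += data[i]
    return total

n_words = 1024
data = list(range(n_words))
serial_index = make_index(n_words, random_order=False)
random_index = make_index(n_words, random_order=True)

# Both orders read the same words, so the totals agree; only the access pattern differs.
print(read_via_index(data, serial_index) == read_via_index(data, random_index))
```

In this picture every read involves an extra index lookup and address calculation, which is the kind of overhead that a wider word size would not be expected to remove.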


 #################### A1 Original #######################
 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
 Android MP-RndMem2 Benchmark V2.1 06-May-2015 12.14
 MB/Second Using 1, 2, 4 and 8 Threads
 KB SerRD SerRDWR RndRD RndRDWR

 12.29 1T 1337 2505 1337 2509
 2T 2637 2513 2657 2521
 4T 3535 2420 3484 2454
 8T 3195 2403 3088 2406
 122.9 1T 1305 2280 963 1758
 2T 2581 2285 1945 1748
 4T 3588 2130 3125 1740
 8T 3211 2269 2949 1745
 12288 1T 1248 1962 101 215
 2T 2469 1940 191 214
 4T 3462 1954 323 214
 8T 3127 1926 318 212
 Total Elapsed Time 43.7 seconds

 #################### A1 ARM-Intel ######################
 ARM/Intel MP-RndMem Benchmark V1.1 06-May-2015 11.54
 MB/Second Using 1, 2, 4 and 8 Threads
 KB SerRD SerRDWR RndRD RndRDWR

 12.29 1T 4643 3593 4710 3641
 2T 8583 3552 8761 3564
 4T 12707 3450 12496 3384
 8T 10410 3389 10796 3408
 122.9 1T 3733 2874 2408 2150
 2T 7259 2871 4781 2165
 4T 11726 2897 7656 2133
 8T 11673 2853 7100 2113
 12288 1T 3153 2087 226 238
 2T 5782 2073 327 238
 4T 6451 1997 447 236
 8T 6471 2071 446 233
 Total Elapsed Time 41.5 seconds

 #################### T11 Original #####################
 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Measured 1.7 GHz
 Android MP-RndMem2 Benchmark V2.1 06-May-2015 12.13
 MB/Second Using 1, 2, 4 and 8 Threads
 KB SerRD SerRDWR RndRD RndRDWR

 12.29 1T 6696 4438 6594 4483
 2T 12338 3078 12263 3573
 4T 12419 2834 12166 2907
 8T 12314 2903 11991 2934
 122.9 1T 3371 2916 1639 1748
 2T 6409 1922 2052 1097
 4T 6155 1892 2027 1186
 8T 6045 2105 2015 1192
 12288 1T 1394 1048 153 133
 2T 2245 985 285 123
 4T 2277 1002 285 132
 8T 2165 1001 286 127
 Total Elapsed Time 44.0 seconds

 #################### T11 ARM-Intel ####################
 ARM/Intel MP-RndMem Benchmark V1.1 06-May-2015 12.07
 MB/Second Using 1, 2, 4 and 8 Threads
 KB SerRD SerRDWR RndRD RndRDWR

 12.29 1T 6315 4486 6345 4484
 2T 11837 2910 11846 3112
 4T 11864 2835 11553 2858
 8T 11821 3003 11805 3198
 122.9 1T 3963 2681 1670 1704
 2T 6672 1782 2040 1125
 4T 6493 1817 2033 1218
 8T 6673 1738 2038 1303
 12288 1T 1805 1081 177 145
 2T 2543 1066 279 137
 4T 2600 1065 276 136
 8T 2662 1073 281 138
 Total Elapsed Time 43.7 seconds

 #################### T21 Original #####################
 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s
 Android MP-RndMem2 Benchmark V2.1 08-Jul-2015 16.33
 MB/Second Using 1, 2, 4 and 8 Threads
 KB SerRD SerRDWR RndRD RndRDWR

12.29 1T 5088 5325 4262 4711
 2T 9752 4902 8895 4570
 4T 17379 4653 17434 4096
 8T 19771 4698 17358 4424
122.9 1T 2714 2578 1923 2163
 2T 5614 2502 3483 2107
 4T 10859 2219 4835 1972
 8T 10654 2410 4904 1923
12288 1T 1798 952 186 204
 2T 3489 974 341 195
 4T 6515 943 563 196
 8T 6218 922 563 187
 Total Elapsed Time 42.3 seconds

 #################### T21 ARM-Intel #################### 
 ARM/Intel MP-RndMem Benchmark V1.1 09-Jul-2015 11.48
 MB/Second Using 1, 2, 4 and 8 Threads
 KB SerRD SerRDWR RndRD RndRDWR

12.29 1T 4186 3777 4055 3933
 2T 9324 3541 7710 3619
 4T 16594 3350 15731 3142
 8T 18117 3291 16187 3262
122.9 1T 2423 2043 1610 1683
 2T 5235 2029 3013 1641
 4T 10148 1935 4662 1565
 8T 10015 1834 4611 1474
12288 1T 1363 886 171 186
 2T 2643 845 325 187
 4T 5197 823 534 184
 8T 4801 835 542 184
 Total Elapsed Time 42.6 seconds

 ###################### T22 32 Bit ######################
 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 
 ARM/Intel MP-RndMem Benchmark V1.2 12-Aug-2015 17.13
 Compiled for 32 bit ARM v7a
 MB/Second Using 1, 2, 4 and 8 Threads
 KB SerRD SerRDWR RndRD RndRDWR

 12.29 1T 2894 2438 2887 2433
 2T 5665 2402 5663 2403
 4T 10922 2369 11100 2310
 8T 10065 2293 10648 2265
 122.9 1T 2681 2368 757 758
 2T 5351 2360 1398 769
 4T 10056 2308 2121 772
 8T 8838 2351 1916 742
 12288 1T 2309 1662 80 78
 2T 3986 1683 164 73
 4T 5419 1684 283 82
 8T 4658 1694 279 82

###################### T22 64 Bit ######################
 ARM/Intel MP-RndMem Benchmark V1.2 12-Aug-2015 17.15
 Compiled for 64 bit ARM v8a

 12.29 1T 4445 3109 4455 3089
 2T 8010 3100 8072 3105
 4T 15909 3057 14711 3040
 8T 14764 3036 14570 3037
 122.9 1T 3457 2888 842 876
 2T 6537 2924 1524 876
 4T 11095 2892 2119 861
 8T 11729 2916 2080 874
 12288 1T 2475 1679 81 78
 2T 4155 1713 163 73
 4T 5503 1711 285 89
 8T 4519 1717 281 89
 Total Elapsed Time 48.1 seconds

 ##################### T7 Original ######################
 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 
 Measured 1200 MHz
 Android MP-RndMem2 Benchmark V2.1 06-May-2015 12.17
 MB/Second Using 1, 2, 4 and 8 Threads
 KB SerRD SerRDWR RndRD RndRDWR

 12.29 1T 3120 3060 3128 3078
 2T 6098 3003 6083 3004
 4T 11354 2948 11188 2942
 8T 11403 2857 10412 2872
 122.9 1T 996 983 661 699
 2T 1868 984 1012 697
 4T 2600 982 1483 699
 8T 2534 976 1459 694
 12288 1T 335 286 91 80
 2T 640 288 113 82
 4T 892 286 130 82
 8T 925 287 127 81
 Total Elapsed Time 44.7 seconds

 #################### T7 ARM-Intel #####################
 ARM/Intel MP-RndMem Benchmark V1.1 06-May-2015 11.59
 MB/Second Using 1, 2, 4 and 8 Threads
 KB SerRD SerRDWR RndRD RndRDWR

 12.29 1T 3060 2001 2867 1904
 2T 5459 1879 5463 1867
 4T 10797 1852 10537 1856
 8T 10090 1802 10608 1813
 122.9 1T 968 823 588 547
 2T 1749 785 902 618
 4T 2716 812 1328 672
 8T 2733 810 1407 673
 12288 1T 329 274 90 82
 2T 636 272 112 82
 4T 849 271 128 82
 8T 869 271 126 81
 Total Elapsed Time 45.4 seconds
 


NEON-Linpack Benchmark - NEON-Linpacki.apk

Details of the benchmark can be found above and in android neon benchmarks.htm. The main point is that it was a complete surprise to discover that ARM NEON intrinsic functions could be converted to Intel SIMD SSE instructions, with significant performance improvement on an Atom based tablet. The use of NEON functions for ARM CPUs can be anticipated to produce similar performance ratings via the original and ARM/Intel versions, as reflected in the results below.

August 2015 - T22 results from 32 bit and 64 bit compilations were similar, as the programs use a limited number of identical intrinsic functions.

September 2015 - New best score from P33, with 2 GHz Qualcomm Snapdragon 810 (Cortex-A57) and Android 5.0.2, with speed of 1446 MFLOPS at 32 bits.


 NEON Single Precision Floating Point MFLOPS

 ########################################################
 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
 
 MFLOPS Original 443.4 ARM-Intel 900.2
 ########################################################
 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Measured 1.7 GHz
 MFLOPS Original 1334.9 ARM-Intel 1411.9
 ########################################################
 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s
 MFLOPS Original 1250.1 ARM-Intel 1235.0
 ########################################################
 
 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 
 MFLOPS 32 bit 407.1 64 bit 505.2
 ########################################################
 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 
 Measured 1200 MHz
 MFLOPS Original 376.0 ARM-Intel 346.8
 ########################################################
 P33, Snapdragon 810 2000 MHz, Android 5.0.2
 MFLOPS 32 bit 1446.4
 


NeonSpeed Benchmark - NeonSpeedi.apk

This benchmark carries out the same calculations as the MemSpeed Benchmark, measuring data reading speeds in MegaBytes per second, with functions accessing arrays of cache and RAM based data, sized 2 x 8 KB to 2 x 32 MB. Calculations are x[m]=x[m]+s*y[m] and x[m]=x[m]+y[m] using single precision floating point, with x[m]=x[m]+s+y[m] and x[m]=x[m]+y[m] using integers. Million Floating Point Operations Per Second (MFLOPS) speed can be calculated by dividing single precision MB/second by 4 and 8, for the two tests. The first set of calculations use normal functions, followed by some using NEON Intrinsic Functions. The last two columns are NEON only results. For further details and results see android neon benchmarks.htm.
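
The conversion to MFLOPS described above can be sketched as follows, using single precision figures for 16 KB from the A1 ARM-Intel log below. The function name is illustrative. The divisors are those given in the text; presumably they reflect two operations per 8 bytes read for the multiply and add test, and one operation per 8 bytes for the add only test.

```python
def mflops(mb_per_second, divisor):
    """Convert single precision MB/second to MFLOPS using the divisor from the text."""
    return mb_per_second / divisor

# x[m]=x[m]+s*y[m]: divide by 4
norm_mult_add = mflops(1816, 4)   # Float Norm column
neon_mult_add = mflops(5996, 4)   # Float Neon column

# x[m]=x[m]+y[m]: divide by 8
neon_add = mflops(6882, 8)        # Neon Float v=v+v column

print(norm_mult_add, neon_mult_add, neon_add)
```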

The native Intel code produced some performance gains, mainly using L1 cache based data, but speed in other areas is probably limited by data flow. The later compiler produced some slower speeds on ARM based tablet T11 and better/worse variations on T21.

August 2015 - Results provided for 64 bit T22. As with NEON-Linpack, many results from 32 bit and 64 bit compilations, via NEON intrinsic functions, were similar. With normal code, the 64 bit compilations were up to nearly four times faster than those at 32 bits.


 #################### A1 Original #######################
 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
 Android NeonSpeed Benchmark V1.1 02-Feb-2015 17.09
 Vector Reading Speed in MBytes/Second
 Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
 KBytes Norm Neon Norm Neon Float Int

 16 1778 3940 2807 5474 4997 5062
 32 1781 3576 2636 4431 4316 4291
 64 1772 3589 2639 4480 4337 4332
 128 1784 3589 2641 4423 4320 4320
 256 1766 3592 2642 4400 4347 4358
 512 1784 3585 2633 4375 4350 4355
 1024 1705 3253 2448 3760 3789 3788
 4096 1673 3021 2366 3257 3245 3237
 16384 1672 2948 2349 3062 3157 3151
 65536 1675 2967 2345 3190 3168 3168
 Total Elapsed Time 10.8 seconds

 #################### A1 ARM-Intel ######################
 ARM/Intel NeonSpeed Benchmark V1.1 09-May-2015 16.54
 Vector Reading Speed in MBytes/Second
 Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
 KBytes Norm Neon Norm Neon Float Int

 16 1816 5996 4916 6244 6882 6880
 32 1851 4703 3985 5200 5609 5711
 64 1862 3845 3121 4174 4441 4520
 128 1841 3929 3110 4179 4411 4487
 256 1863 3932 3092 4179 4412 4493
 512 1861 3938 3090 3894 4215 4415
 1024 1784 3475 2738 3130 3223 3443
 4096 1741 2376 2649 2998 3112 3139
 16384 1774 3086 2780 3116 3140 3145
 65536 1774 2987 2547 2328 3126 3072
 Total Elapsed Time 10.1 seconds

 #################### T11 Original #####################
 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Measured 1.7 GHz
 Android NeonSpeed Benchmark V1.1 09-Aug-2013 17.10
 Vector Reading Speed in MBytes/Second
 Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
 KBytes Norm Neon Norm Neon Float Int

 16 3793 9641 4375 13023 13456 13562
 32 5777 11410 4993 11718 11365 11143
 64 4122 6692 3855 6539 6682 7210
 128 4017 6565 3849 6475 6520 6983
 256 4067 6562 3836 6459 6495 7038
 512 3900 6531 3820 6428 6490 7095
 1024 1821 2544 1774 2532 2554 2539
 4096 1141 1645 1536 1612 1615 1635
 16384 1437 1695 1490 1576 1694 1668
 65536 1424 1675 1475 1699 1687 1694
 Total Elapsed Time 11.2 seconds

 #################### T11 ARM-Intel ####################
 ARM/Intel NeonSpeed Benchmark V1.1 09-May-2015 18.17
 Vector Reading Speed in MBytes/Second
 Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
 KBytes Norm Neon Norm Neon Float Int

 16 2252 4964 3321 6602 7304 7237
 32 4202 8364 4543 8366 8553 8101
 64 3710 6096 3860 6570 6348 6182
 128 3802 5581 3874 6044 5624 5877
 256 3654 5618 3501 6154 5655 5783
 512 3597 5688 3723 6130 5812 5684
 1024 1727 2466 1659 2481 2454 2472
 4096 1479 1718 1421 1714 1713 1706
 16384 1488 1704 1435 1576 1705 1694
 65536 1477 1755 1453 1754 1759 1752
 Total Elapsed Time 10.8 seconds

 #################### T21 Original #####################
 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s
 Android NeonSpeed Benchmark V1.1 23-Jul-2015 13.00
 Vector Reading Speed in MBytes/Second
 Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
 KBytes Norm Neon Norm Neon Float Int

 16 4324 13809 4498 14660 17501 18186
 32 3587 6845 2922 8073 6981 7035
 64 3347 6894 2912 8078 6964 6938
 128 3343 6651 2919 7922 6726 6999
 256 3511 6963 3002 8071 6902 6897
 512 3476 6628 3025 7827 6613 6818
 1024 3172 4627 2773 6424 4800 4806
 4096 2653 2051 2378 3613 2090 2054
 16384 2356 1891 2118 3165 1955 1962
 65536 2424 1923 2167 3368 1933 1925
 Total Elapsed Time 9.9 seconds

 #################### T21 ARM-Intel #################### 
 ARM/Intel NeonSpeed Benchmark V1.1 23-Jul-2015 13.03
 Vector Reading Speed in MBytes/Second
 Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
 KBytes Norm Neon Norm Neon Float Int

 16 3623 16704 4623 15187 17446 16719
 32 3455 9210 2997 8723 9280 9112
 64 3336 7721 3002 8544 8469 8581
 128 3415 7664 3111 8481 7549 7638
 256 3584 7526 3087 8500 7849 7805
 512 3538 7422 3154 8266 7567 7541
 1024 3513 7227 3067 7789 7294 7261
 4096 2302 1673 2413 3107 1693 1677
 16384 2286 1616 2323 3024 1620 1617
 65536 2322 1617 2271 2505 1634 1600
 Total Elapsed Time 9.9 seconds

 ###################### T22 32 Bit ######################
 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 
 ARM/Intel NeonSpeed Benchmark V1.2 13-Aug-2015 16.32
 Compiled for 32 bit ARM v7a
 Vector Reading Speed in MBytes/Second
 Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
 KBytes Norm Neon Norm Neon Float Int

 16 971 3853 1807 4059 3957 4397
 32 970 3812 1800 3983 3891 4323
 64 927 3228 1605 3038 3269 3521
 128 926 3321 1681 3343 3354 3596
 256 936 3386 1693 3449 3413 3667
 512 898 2889 1578 2996 2927 3118
 1024 794 1859 1345 2057 1996 1924
 4096 794 1796 1250 1788 1813 1835
 16384 792 1773 1270 1820 1829 1864
 65536 796 1811 1289 1852 1832 1880
 Total Elapsed Time 11.3 seconds

 ###################### T22 64 Bit ######################
 ARM/Intel NeonSpeed Benchmark V1.2 13-Aug-2015 16.37
 Compiled for 64 bit ARM v8a
 Vector Reading Speed in MBytes/Second
 Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
 KBytes Norm Neon Norm Neon Float Int

 16 3054 4055 3605 4376 4911 5094
 32 2922 3787 3435 4198 4546 4682
 64 2795 3514 3259 3658 4050 4116
 128 2886 3529 3373 3924 4148 3963
 256 2883 3641 3264 3942 4193 4276
 512 2454 3165 2985 3385 3586 3542
 1024 1633 2000 1835 2043 2114 2105
 4096 1738 1893 1899 1900 1956 1955
 16384 1757 1870 1886 1802 1921 1846
 65536 1755 1875 1870 1903 1936 1937
 Total Elapsed Time 10.2 seconds

 ##################### T7 Original ######################
 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 
 Measured 1200 MHz
 Android NeonSpeed Benchmark 15-Dec-2012 14.38
 Vector Reading Speed in MBytes/Second
 Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
 KBytes Norm Neon Norm Neon Float Int

 16 860 2575 2325 2918 3053 3245 L1
 32 950 2551 2400 2823 2944 3131
 64 744 1396 1329 1434 1465 1496 L2
 128 713 1342 1319 1365 1392 1417
 256 714 1339 1311 1357 1377 1400
 512 708 1323 1299 1348 1358 1383
 1024 608 875 869 917 930 952
 4096 460 493 492 481 488 504 RAM
 16384 460 498 487 507 506 504
 65536 459 495 469 251 503 505
 Total Elapsed Time 11.5 seconds

#################### T7 ARM-Intel #####################
 ARM/Intel NeonSpeed Benchmark V1.1 09-May-2015 18.07
 Vector Reading Speed in MBytes/Second
 Memory Float v=v+s*v Int v=v+v+s Neon v=v+v
 KBytes Norm Neon Norm Neon Float Int

 16 881 2440 2501 3334 3206 3465
 32 901 1868 1705 2260 2083 2186
 64 801 1395 1365 1573 1548 1581
 128 784 1282 1278 1405 1389 1411
 256 787 1279 1285 1420 1380 1409
 512 777 1266 1267 1409 1370 1394
 1024 604 786 762 769 770 828
 4096 458 479 477 463 486 488
 16384 436 447 448 469 470 469
 65536 450 472 469 240 482 483
 Total Elapsed Time 11.5 seconds
 


NEON-MFLOPS-MP Benchmark - NEON-MFLOPS2i-MP.apk

NEON-MFLOPS-MP carries out the same calculations as MP-MFLOPS Benchmarks above, but with NEON intrinsic functions used for all calculations. For further results see android neon benchmarks.htm.

Results for the original NEON version and a sample of MP-MFLOPS are provided below. NEON produced significant performance improvements across the board, including the Atom based tablet, via the ARM to Intel conversion layer. As might be expected using intrinsics, compilation via a later version of gcc made little difference in speed of ARM systems, but the Intel native code more than doubled performance on CPU speed limited tests.

Following the performance details are the numeric results of calculations from the fixed parameters used in the new version, for both ARM and Intel. It seems that Tablet T11 has an intermittent fault, as it occasionally fails to calculate a correct answer or causes the Tablet to crash and reboot. Now, this also appears to happen using the older version.
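
The kind of check implied by the error marker can be sketched as below, using the values listed under New Results at the end of this section. The comparison logic and names are assumptions, not the benchmark's own code.

```python
ERROR_MARK = 12345  # value shown in the log when a calculation was wrong

# Results x 100000 for ARM/Intel NEON-MFLOPS2-MP V2.1, identical for 1, 2, 4 and 8 threads
expected = [44934, 86735, 99850, 36770, 79897, 99759]
# Row logged for tablet T11
t11 = [44934, 12345, 99850, 36770, 79897, 99759]

def failed_columns(row, reference):
    """Column numbers (from 0) showing the error mark or differing from the reference."""
    return [i for i, (got, want) in enumerate(zip(row, reference))
            if got == ERROR_MARK or got != want]

print(failed_columns(t11, expected))
```

For the T11 row this flags only the second column (2 Ops/Word at 128 KB).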

August 2015 - T22 NEON 64 bit compilation produced a small performance gain over 32 bit results at 2 operations per word, but nearly double the speed at 32 operations per word, where the 32 bit version suffers from having fewer registers for the variables. Using one core, maximum speed was 2.77 GFLOPS, rising to 10.8 GFLOPS via four cores (best so far relative to CPU GHz). The one core speed equated to just over two floating point operations per clock cycle. This is disappointing, compared with Intel processors, such as the Core 2 onwards, at 6 per clock cycle out of a maximum of 8, with SSE SIMD code (See Linux results).

September 2015 - New best score from P33, with 2 GHz Qualcomm Snapdragon 810, (Cortex-A57) and Android 5.0.2, at 64 bits. Performance, with 8 threads, is up to 23.6 GFLOPS, and up to nearly 3.5 results per clock cycle, using one core.
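
The results per clock cycle quoted for T22 and P33 can be reproduced with the short sketch below, which divides the best one thread MFLOPS in the 64 bit logs by the CPU MHz. Using the nominal clock speeds, rather than measured ones, is an assumption, and the function name is illustrative.

```python
def flops_per_clock(mflops, cpu_mhz):
    """Floating point results per clock cycle for one core."""
    return mflops / cpu_mhz

# Best 1 thread results at 32 Ops/Word from the 64 bit logs below
t22 = flops_per_clock(2774, 1300)   # T22 Cortex-A53, nominal 1300 MHz
p33 = flops_per_clock(6943, 2000)   # P33 Snapdragon 810, nominal 2000 MHz

print(round(t22, 2), round(p33, 2))
```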


 #################### A1 Original #######################
 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
 Android NEON-MFLOPS-MP Benchmark V1.1 07-Feb-2015 18.37
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 1110 1319 878 1188 1139 1226
 2T 2470 2114 996 2406 2427 2390
 4T 3159 2211 988 4148 3487 4006
 8T 2066 2486 1003 4144 3944 4077
 Total Elapsed Time 3.6 seconds
 Not NEON
 4T 1571 1627 979 2238 2255 2258

 Android NEON-MFLOPS2-MP Benchmark V2.1 07-Feb-2015 18.38
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 1796 1520 1025 1231 1228 1227
 2T 3354 2959 1047 2427 2445 2445
 4T 4627 5508 978 4690 4791 4733
 8T 3861 6307 1030 4611 4869 4742
 Total Elapsed Time 88.3 seconds

 #################### A1 ARM-Intel ######################
 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.17
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 2151 1962 1064 2619 2694 2650
 2T 4421 3849 1048 5296 5463 5343
 4T 5886 6652 982 9592 10735 10362
 8T 3744 7284 1018 9085 10791 9493
 Total Elapsed Time 13.8 seconds

 ############### A1 ARM-Intel 1000 MHz #################
 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 16.04
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 1939 1266 674 2503 2388 2351
 2T 3670 2652 679 4919 4792 4640
 4T 3102 3051 676 4688 4678 4672
 8T 3189 3425 657 4813 4869 4639
 Total Elapsed Time 19.4 seconds

 #################### T11 Original #####################
 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Dual core, Measured 1.7 GHz
 Android NEON-MFLOPS-MP Benchmark V1.1 13-Sep-2013 13.44
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 1847 1415 597 3772 4096 3545
 2T 3649 3309 664 8065 7966 7505
 4T 3670 3922 658 7753 8148 7490
 8T 5664 5570 681 8092 8355 7672
 
 Total Elapsed Time 13.0 seconds
 Not NEON
 2T 1593 1668 648 3140 3067 2977

 #################### T11 ARM-Intel ####################
 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.07
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 1965 1630 582 3792 4077 3521
 2T 3789 2690 663 8497 8133 7297
 4T 5714 4883 654 8364 8192 7554
 8T 5414 6316 673 7976 8437 6635
 Total Elapsed Time 13.0 seconds

 #################### T21 Original #####################
 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s
 Android NEON-MFLOPS2-MP Benchmark V2.1 25-Jul-2015 18.44
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 2757 2576 771 2808 2825 2800
 2T 5662 5525 1516 5631 5664 5570
 4T 6550 7846 1945 11167 11281 10939
 8T 10273 10928 1981 10851 11211 11350
 Total Elapsed Time 40.0 seconds
 Not NEON
 4T 2338 2959 1836 4867 4911 4859

 #################### T21 ARM-Intel #################### 
 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 28-Jun-2015 16.32
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 3049 2857 622 2923 2874 2098
 2T 5508 4887 1009 5477 5736 4349
 4T 5643 5282 1410 11244 11601 8564
 8T 9294 11156 1681 11288 11605 8946
 Total Elapsed Time 14.0 seconds

 ###################### T22 32 Bit ######################
 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 
 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 13-Aug-2015 16.35
 Compiled for 32 bit ARM v7a
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 619 613 575 1444 1446 1426
 2T 1174 1206 889 2894 2902 2839
 4T 1585 1616 901 5679 5726 5596
 8T 2075 2130 944 5400 5585 5519
 Total Elapsed Time 25.8 seconds

 ###################### T22 64 Bit ######################
 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 13-Aug-2015 16.38
 Compiled for 64 bit ARM v8a
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 726 745 647 2766 2774 2639
 2T 1397 1402 903 5523 5552 5371
 4T 1871 1930 898 10780 10479 10439
 8T 2496 2876 1011 9736 10679 9900
 Total Elapsed Time 15.1 seconds

##################### P33 64 Bit ##################### 
 P33 Quad-core 2 GHz Qualcomm Snapdragon 810, Android 5.0.2 
 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.2 16-Sep-2015 17.59
 Compiled for 64 bit ARM v8a
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 2811 3126 1089 6943 6589 6342
 2T 2488 4114 1541 12084 10559 8809
 4T 4759 5480 2038 16516 14826 11960
 8T 4840 8985 2452 22082 23563 12461
 Total Elapsed Time 7.6 seconds

 ##################### T7 Original ######################
 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 
 Quad core, Measured 1200 MHz
 Android NEON-MFLOPS-MP Benchmark V1.0 20-Dec-2012 16.57
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 532 402 124 1135 1044 960
 2T 1255 798 213 2041 1987 1916
 4T 2441 1553 229 4185 4034 3450
 8T 1922 2403 226 3774 3996 3346
 Total Elapsed Time 4.5 seconds
 Not NEON
 4T 716 655 233 2367 2316 2240

 #################### T7 ARM-Intel #####################
 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1 13-May-2015 12.24
 FPU Add & Multiply using 1, 2, 4 and 8 Threads
 2 Ops/Word 32 Ops/Word
 KB 12.8 128 12800 12.8 128 12800
 MFLOPS
 1T 657 407 132 1077 1074 1053
 2T 1265 817 222 2147 2150 2078
 4T 2024 1695 234 4214 4276 3555
 8T 2435 2495 234 4196 4100 3523
 Total Elapsed Time 39.0 seconds

 ##################### New Results #####################
 Results x 100000, 12345 indicates ERRORS

 ARM/Intel NEON-MFLOPS2-MP Benchmark V2.1
 1T 44934 86735 99850 36770 79897 99759
 2T 44934 86735 99850 36770 79897 99759
 4T 44934 86735 99850 36770 79897 99759
 8T 44934 86735 99850 36770 79897 99759
 
 T11 44934 12345 99850 36770 79897 99759
 
 Android NEON-MFLOPS-MP Benchmark V1.1
 1T 86735 98519 99984 79897 97638 99975
 2T 86735 98519 99984 79897 97638 99975
 4T 86735 98519 99984 79897 97638 99975
 8T 86735 98519 99984 79897 97638 99975
 
 Android NEON-MFLOPS2-MP Benchmark V2.1 
 1T 40015 66980 99522 35216 54898 99234
 2T 40015 66980 99522 35216 54898 99234
 4T 40015 66980 99522 35216 54898 99234
 8T 40015 66980 99522 35216 54898 99234
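
The header of the block above states that the numeric results are multiplied by 100000 and that a value of 12345 flags an error. As a minimal sketch, assuming rows from the log have been copied into Python lists (the function name find_error_flags is hypothetical and not part of the benchmark), such a log could be scanned as follows:

```python
ERROR_FLAG = 12345  # sentinel from the log header: "12345 indicates ERRORS"
SCALE = 100000      # from the log header: "Results x 100000"

def find_error_flags(rows):
    """Return (label, column_index) pairs where the error sentinel appears."""
    flagged = []
    for label, values in rows.items():
        for col, value in enumerate(values):
            if value == ERROR_FLAG:
                flagged.append((label, col))
    return flagged

# Rows copied from the V2.1 block above: the 1T row and the T11 row.
rows = {
    "1T": [44934, 86735, 99850, 36770, 79897, 99759],
    "T11": [44934, 12345, 99850, 36770, 79897, 99759],
}

flags = find_error_flags(rows)
# Undo the x 100000 scaling to recover the values as calculated.
unscaled = [v / SCALE for v in rows["1T"]]

print(flags)
print(unscaled)
```

Here only the T11 row is reported, at the second column (128 KB at 2 operations per word).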
 


NEON-Linpack-MP Benchmark - NEON-Linpacki-MP.apk

This is a multithreading version of the NEON-Linpack Benchmark. Further details and results can be found in android neon benchmarks.htm. The benchmark is run on 100x100, 500x500 and 1000x1000 matrices using 0, 1, 2 and 4 separate threads, the programming code for zero threads being the same as the earlier example. Multithreading performance, using this standard linear equation solver, is severely degraded, due to overheads, the zero thread results being the only ones of real use.

On the Intel Atom (A1), performance using native Intel compilation is shown to be around twice as fast as the original ARM code, except at N = 1000, where speed is mainly dependent on calculations using data in RAM. On ARM hardware, the ARM/Intel version can also be somewhat faster (or slower) than the original. T21, with the Qualcomm Snapdragon 800, obtains by far the fastest results, at unthreaded N = 500.
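
The comparison for the Intel Atom can be checked with a little arithmetic. The following is a minimal sketch, using the unthreaded MFLOPS figures from the two A1 tables in this section (the function name speed_ratios is illustrative only):

```python
# Unthreaded ("None") MFLOPS copied from the A1 tables in this section.
a1_original = {100: 452.39, 500: 663.38, 1000: 617.04}   # ARM code on Intel
a1_native = {100: 971.71, 500: 1311.37, 1000: 945.97}    # native Intel code

def speed_ratios(new, old):
    """Ratio of new to old MFLOPS for each matrix size N."""
    return {n: new[n] / old[n] for n in old}

ratios = speed_ratios(a1_native, a1_original)
for n, r in ratios.items():
    print(f"N {n:5d}  native / original = {r:.2f}")
```

The ratios work out at roughly 2.1 and 2.0 for N = 100 and 500, falling to about 1.5 at N = 1000.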

The program checks that the same numeric results are produced, irrespective of the number of threads used, at each matrix size. Due to rounding effects, the results from ARM and Intel hardware are slightly different, as shown in the Numeric Results below.

August 2015 - T22 results from 32 bit and 64 bit compilations were again similar, because the programs use a limited number of identical intrinsic functions.


 MFLOPS 0 to 4 Threads, N 100, 500, 1000
 #################### A1 Original #######################
 A1 Quad Core 1.86 GHz Intel Atom Z3745, Android 4.4
 Dual Channel LPDDR3-1066 Bandwidth 17.1 GB/s
 Threads None 1 2 4

 N 100 452.39 21.00 23.48 17.48
 N 500 663.38 275.56 88.66 312.71
 N 1000 617.04 380.60 191.26 195.61

 #################### A1 ARM-Intel ######################
 ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 13.58
 Threads None 1 2 4

 N 100 971.71 37.72 36.36 39.66
 N 500 1311.37 488.73 487.85 488.98
 N 1000 945.97 727.85 737.95 742.34
 Total Elapsed Time 59.966 seconds

 #################### T11 Original #####################
 T11 Samsung EXYNOS 5250 2.0 GHz Cortex-A15, Android 4.2.2
 Measured 1.7 GHz
 Threads None 1 2 4

 N 100 1399.82 54.86 55.31 54.66
 N 500 1154.21 434.16 434.06 436.97
 N 1000 571.26 482.57 487.25 485.80

 #################### T11 ARM-Intel ####################
 ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 15.44
 Threads None 1 2 4

 N 100 1497.90 61.13 63.13 61.87
 N 500 1399.10 491.49 489.29 494.69
 N 1000 586.14 499.00 504.97 497.49
 Total Elapsed Time 43.952 seconds

 #################### T21 Original #####################
 T21 Qualcomm Snapdragon 800 2150 MHz, Android 4.4.4
 Dual Channel 32 Bit LPDDR3-1866 RAM 14.9 GB/s
 Android Linpack NEON SP MP Benchmark 26-Jul-2015 11.46
 Threads None 1 2 4

 N 100 1311.08 12.38 12.93 15.05
 N 500 2271.56 344.04 419.52 381.73
 N 1000 837.30 540.99 523.52 564.87
 Total Elapsed Time 143.534 seconds

 #################### T21 ARM-Intel #################### 
 ARM/Intel Linpack NEON SP MP Benchmark 26-Jul-2015 11.51
 Threads None 1 2 4

 N 100 1308.07 14.89 11.77 11.63
 N 500 2341.17 407.96 481.02 415.12
 N 1000 901.21 551.80 566.77 564.31
 Total Elapsed Time 145.750 seconds

 ###################### T22 32 Bit ######################
 T22, Quad Core ARM Cortex-A53 1300 MHz, Android 5.0.2 
 ARM/Intel Linpack NEON SP MP Benchmark 1.2 13-Aug-2015 12.52
 Compiled for 32 bit ARM v7a
 Threads None 1 2 4

 N 100 460.74 22.35 23.16 23.82
 N 500 480.63 336.52 339.94 303.66
 N 1000 470.02 405.86 403.01 405.98

 ###################### T22 64 Bit ######################
 ARM/Intel Linpack NEON SP MP Benchmark 1.2 13-Aug-2015 12.57
 Compiled for 64 bit ARM v8a
 Threads None 1 2 4

 N 100 548.67 27.70 33.93 37.00
 N 500 470.04 285.95 297.79 301.67
 N 1000 519.02 441.84 443.47 441.91

 ##################### T7 Original ######################
 T7, ARM Cortex-A9 1300 MHz, Android 4.1.2, 
 Measured 1200 MHz
 Threads None 1 2 4

 N 100 413.47 45.95 48.22 48.34
 N 500 253.08 187.51 189.69 189.94
 N 1000 148.76 135.49 136.08 136.17

#################### T7 ARM-Intel #####################
 ARM/Intel Linpack NEON SP MP Benchmark 14-May-2015 15.40
 Threads None 1 2 4

 N 100 385.49 28.79 29.06 29.25
 N 500 272.07 184.85 183.70 183.18
 N 1000 147.09 131.92 132.44 130.05
 Total Elapsed Time 64.318 seconds

################### Numeric Results ###################
 NR=norm resid RE=resid MA=machep X0=x[0]-1 XN=x[n-1]-1

 N                 100              500             1000

 ARM
 NR               1.60             3.96            11.32
 RE     3.80277634e-05   4.72068787e-04   2.70068645e-03
 MA     1.19209290e-07   1.19209290e-07   1.19209290e-07
 X0    -1.38282776e-05   5.26905060e-05   1.62243843e-04
 XN    -7.51018524e-06   3.26633453e-05  -6.65783882e-05

 Intel
 NR               1.68             3.96            11.39
 RE     4.00543213e-05   4.72545624e-04   2.71725655e-03
 MA     1.19209290e-07   1.19209290e-07   1.19209290e-07
 X0    -1.38282776e-05   5.26905060e-05   1.62243843e-04
 XN    -7.51018524e-06   3.26633453e-05  -6.65783882e-05
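
The two kinds of check mentioned in this section, identical results across thread counts on one device and slightly different results between ARM and Intel, can be illustrated with a minimal sketch. The helper names are hypothetical and this is not the benchmark's own code; the RE values are copied from the Numeric Results above.

```python
# RE (resid) values copied from the Numeric Results above.
arm_re = {100: 3.80277634e-05, 500: 4.72068787e-04, 1000: 2.70068645e-03}
intel_re = {100: 4.00543213e-05, 500: 4.72545624e-04, 1000: 2.71725655e-03}

def same_across_threads(results):
    """True when every thread count produced exactly the same value."""
    return all(r == results[0] for r in results)

def relative_difference(a, b):
    """Difference between two values as a fraction of the larger magnitude."""
    return abs(a - b) / max(abs(a), abs(b))

# Relative ARM/Intel difference in RE at each matrix size.
diffs = {n: relative_difference(arm_re[n], intel_re[n]) for n in arm_re}
for n, d in diffs.items():
    print(f"N {n:5d}  relative difference {d:.4f}")
```

For the RE values the differences are about 5% at N = 100, 0.1% at N = 500 and 0.6% at N = 1000, whereas the X0 and XN values listed are identical on both architectures.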
 

FFT Benchmarks - fft1.apk, fft3c.apk

The benchmarks run code for single and double precision Fast Fourier Transforms of size 1024 to 1048576 (1K to 1024K), each one being run three times to identify variance. Results are displayed and saved in a log file (FFT-tests.txt), with FFT running time in milliseconds. Besides Android, the benchmarks are available to run via Windows and Linux. Two versions are available: FFT1, the original version, and FFT3c, with optimised C code. Further details, results, and links for benchmarks and source code are in FFTBenchmarks.htm. Below is an example of results.

 Kindle Fire HDX 7, 2.2 GHz Quad Core Qualcomm Snapdragon 800
 ARM/Intel FFT Benchmark 3c.0 08-Sep-2015 23.15
 Compiled for 32 bit ARM v7a
 Size                          milliseconds
    K         Single Precision              Double Precision
    1     0.155     0.352     1.341     0.087     0.073     0.073
    2     0.812     0.814     0.750     0.201     0.187     0.251
    4     1.751     1.658     1.776     0.414     0.405     0.443
    8     3.712     1.083     1.065     0.930     0.899     0.890
   16     2.880     3.356     2.430     2.579     2.658     2.380
   32     6.124     6.541     5.605     5.907     6.070     5.681
   64    13.430    12.566    12.774    13.792    13.556    13.997
  128    30.737    27.408    27.132    33.318    33.088    33.071
  256    64.472    63.394    64.690    73.288    72.546    72.786
  512   153.609   150.383   156.046   155.788   156.304   163.178
 1024   315.283   306.323   307.409   369.426   337.074   336.684

 1024   Square Check   Maximum Noise   Average Noise
 SP     9.999520e-01    3.346482e-06    4.565234e-11
 DP     1.000000e+00    1.133294e-23    1.428110e-28
 Total Elapsed Time 6.5 seconds
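
The measurement procedure described for these benchmarks, each FFT size being run three times with the time reported in milliseconds, can be sketched as below. This is an illustration only: it uses a textbook recursive radix-2 FFT in Python, not the C code of fft1 or fft3c, the names fft and time_fft are hypothetical, and only small sizes are run to keep it quick.

```python
import cmath
import time

def fft(x):
    """Textbook recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd-indexed half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def time_fft(size, repeats=3):
    """Return running times in milliseconds, one entry per repeat."""
    data = [complex(i % 7, 0.0) for i in range(size)]  # arbitrary test data
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fft(data)
        times.append((time.perf_counter() - start) * 1000.0)
    return times

# Sizes in K, as in the first column of the results table.
for k in (1, 2, 4):
    ms = time_fft(k * 1024)
    print(f"{k:5d}", " ".join(f"{t:10.3f}" for t in ms))
```

Repeating each size shows the kind of run to run variation visible in the table above, such as the first 1K single precision run differing from the next two.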
 

System Details



 A1 Asus MemoPad 7 ME176CEX, 1.86 GHz Intel Atom Z3745 
 Screen pixels w x h 800 x 1216
 Android Build Version 4.4.2
 Processor : ARMv7 processor rev 1 (v7l)
 BogoMIPS : 1500.0
 Features : neon vfp swp half thumb fastmult edsp vfpv3
 CPU implementer : 0x69
 CPU architecture: 7
 CPU variant : 0x1
 CPU part : 0x001
 CPU revision : 1
 Hardware : placeholder
 Revision : 0001
 Linux version 3.10.20
 Mainly runs at 1.86 GHz Turbo Boost
 T7 Device Google Nexus 7 quad core CPU 1.3 GHz, 1.2 GHz > 1 core
 RAM 1 GB DDR3L-1333 Bandwidth 5.3 GB/sec
 Screen pixels w x h 1280 x 736
 Twelve-core Nvidia GeForce ULP graphics 416 MHz
 Android Build Version 4.1.2
 Processor : ARMv7 Processor rev 9 (v7l)
 processor : 0 BogoMIPS : 1993.93
 processor : 1 BogoMIPS : 1993.93
 processor : 2 BogoMIPS : 1993.93
 processor : 3 BogoMIPS : 1993.93
 Features : swp half thumb fastmult vfp edsp neon vfpv3 tls 
 CPU implementer : 0x41
 CPU architecture: 7
 CPU variant : 0x2
 CPU part : 0xc09 - Cortex-A9
 CPU revision : 9
 Hardware : grouper - nVidia Tegra 3 T30L
 Revision : 0000
 Linux version 3.1.10
 Runs at 1.2 GHz
 T11 Voyo A15, Samsung EXYNOS 5250 Dual core 2.0 GHz Cortex-A15, 
 Mali-T604 GPU, 2 GB DDR3-1600 RAM, dual channel, 12.8 GB/s
 Screen pixels w x h 1920 x 1032 
 Android Build Version 4.2.2 - Jelly Bean
 Processor : ARMv7 Processor rev 4 (v7l)
 processor : 0
 BogoMIPS : 992.87
 processor : 1
 BogoMIPS : 997.78
 Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4
 idiva idivt 
 CPU implementer : 0x41
 CPU architecture: 7
 CPU variant : 0x0
 CPU part : 0xc0f
 CPU revision : 4
 Hardware : SMDK5250
 Linux version 3.4.35Ut
 Runs at 1.7 GHz
 T21 Kindle Fire HDX 7, 2.2 GHz Quad Core Qualcomm Snapdragon 800 (Krait 400) 
 2 x 32 Bit LPDDR3-1866 Memory, 14.9 GB/s, GPU Qualcomm Adreno 330, 578 MHz
 Device Amazon KFTHWI
 Screen pixels w x h 1200 x 1803 
 Android Build Version 4.4.3
 Processor : ARMv7 Processor rev 0 (v7l)
 processor : 0, 1, 2, 3
 BogoMIPS : 38.40
 Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt 
 CPU implementer : 0x51
 CPU architecture: 7
 CPU variant : 0x2
 CPU part : 0x06f
 CPU revision : 0
 Hardware : Qualcomm MSM8974
 Revision : 0000
 Linux version 3.4.0-perf (gcc version 4.7) 
 T22 Lenovo Tab 2 A8-50, 1.3 GHz quad core 64 bit MediaTek ARM Cortex-A53 
 1 GB LPDDR3, GPU Mali T720 MP2
 Device LENOVO Lenovo TAB 2 A8-50F
 Screen pixels w x h 800 x 1216
 Android Build Version 5.0.2
 Processor : AArch64 Processor rev 3 (aarch64)
 processor : 0, 1, 2
 BogoMIPS : 26.0
 Features : fp asimd aes pmull sha1 sha2 crc32
 CPU implementer : 0x41
 CPU architecture: AArch64
 CPU variant : 0x0
 CPU part : 0xd03
 CPU revision : 3
 Hardware : MT8161
 Linux version 3.10.65 
 P33 Sony Xperia Z3+ E6533, Quad-core 1.5 GHz & Quad-core 2 GHz Qualcomm
 Snapdragon 810 64-bit CPU
 Screen pixels w x h 1080 x 1776
 Android Build Version 5.0.2
 Processor : AArch64 Processor rev 1 (aarch64)
 processor : 0 to 7
 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32
 CPU implementer : 0x41
 CPU architecture: 8
 CPU variant : 0x1
 CPU part : 0xd07
 CPU revision : 1
 Hardware : Qualcomm Technologies, Inc MSM8994
 Linux version 3.10.49
 BS1 BlueStacks Emulator on 3 GHz Phenom via Windows 7
 Screen pixels w x h 1024 x 600
 Android Build Version 2.3.4
 BS2 BlueStacks Emulator on 3.7 GHz Core i7 via Windows 8
 Screen pixels w x h 1440 x 852
 Android Build Version 4.4.2 
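
The CPU implementer and CPU part fields in the listings above identify the processor. A minimal sketch of decoding them follows, assuming text in the same "key : value" layout; the lookup tables contain only the codes that can be matched to devices named above, and the function names are hypothetical.

```python
# Codes matched to the device descriptions in the listings above.
IMPLEMENTERS = {0x41: "ARM", 0x51: "Qualcomm", 0x69: "Intel"}
ARM_PARTS = {0xc09: "Cortex-A9", 0xc0f: "Cortex-A15", 0xd03: "Cortex-A53"}

def parse_cpuinfo(text):
    """Split 'key : value' lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

def describe(fields):
    """Return a short name from the implementer and part codes."""
    impl = int(fields["CPU implementer"], 16)
    # Take the first token only, as a line may carry a trailing comment.
    part = int(fields["CPU part"].split()[0], 16)
    vendor = IMPLEMENTERS.get(impl, "unknown implementer")
    if impl == 0x41:
        return f"{vendor} {ARM_PARTS.get(part, 'unknown part')}"
    return f"{vendor} part {part:#05x}"

# Lines copied from the T7 listing above.
sample = """CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09 - Cortex-A9
CPU revision : 9"""

name = describe(parse_cpuinfo(sample))
print(name)
```

Note that the A1 listing reports an ARMv7 processor with implementer 0x69, although the device has an Intel Atom CPU, so these fields describe what the system reports rather than necessarily the physical hardware.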
 


Roy Longbottom at Linkedin Roy Longbottom January 2016



The Official Internet Home for my Benchmarks is via the link
Roy Longbottom's PC Benchmark Collection

