SourceForge logo
SourceForge logo
Menu

math-atlas-devel — ATLAS developers' list

You can subscribe to this list here.

2001 Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
(8)
Oct
(17)
Nov
(29)
Dec
(30)
2002 Jan
(19)
Feb
(19)
Mar
(29)
Apr
(3)
May
(38)
Jun
(14)
Jul
(6)
Aug
(7)
Sep
(12)
Oct
(6)
Nov
(9)
Dec
2003 Jan
(6)
Feb
(5)
Mar
(8)
Apr
(10)
May
(4)
Jun
(11)
Jul
(5)
Aug
(3)
Sep
(12)
Oct
(1)
Nov
(9)
Dec
(45)
2004 Jan
(7)
Feb
(6)
Mar
(4)
Apr
(7)
May
(7)
Jun
(30)
Jul
(7)
Aug
(6)
Sep
(1)
Oct
(4)
Nov
(18)
Dec
(25)
2005 Jan
(11)
Feb
(10)
Mar
(3)
Apr
(7)
May
Jun
Jul
(1)
Aug
(29)
Sep
(6)
Oct
(8)
Nov
(2)
Dec
(5)
2006 Jan
Feb
(16)
Mar
(2)
Apr
(9)
May
(15)
Jun
(24)
Jul
(10)
Aug
(39)
Sep
(20)
Oct
(8)
Nov
(30)
Dec
(28)
2007 Jan
(1)
Feb
(19)
Mar
(11)
Apr
(3)
May
(12)
Jun
(7)
Jul
(20)
Aug
(9)
Sep
(7)
Oct
(7)
Nov
(8)
Dec
(6)
2008 Jan
(3)
Feb
(8)
Mar
Apr
May
(7)
Jun
(16)
Jul
(38)
Aug
(11)
Sep
(6)
Oct
(2)
Nov
Dec
(4)
2009 Jan
(6)
Feb
(25)
Mar
(13)
Apr
(5)
May
Jun
Jul
(1)
Aug
(8)
Sep
(16)
Oct
(17)
Nov
(2)
Dec
(1)
2010 Jan
(3)
Feb
(3)
Mar
(2)
Apr
(5)
May
Jun
(2)
Jul
Aug
Sep
Oct
(16)
Nov
(53)
Dec
(7)
2011 Jan
(10)
Feb
(37)
Mar
(30)
Apr
(12)
May
(5)
Jun
(14)
Jul
(7)
Aug
(8)
Sep
(37)
Oct
(3)
Nov
(5)
Dec
(60)
2012 Jan
(25)
Feb
(5)
Mar
(4)
Apr
(7)
May
(12)
Jun
(28)
Jul
(28)
Aug
(2)
Sep
(5)
Oct
(6)
Nov
Dec
(17)
2013 Jan
(18)
Feb
(10)
Mar
(30)
Apr
(21)
May
Jun
(10)
Jul
(8)
Aug
Sep
(39)
Oct
(54)
Nov
(8)
Dec
(6)
2014 Jan
(17)
Feb
(14)
Mar
(16)
Apr
(67)
May
(2)
Jun
(8)
Jul
(7)
Aug
(9)
Sep
(6)
Oct
(9)
Nov
(12)
Dec
2015 Jan
(5)
Feb
(9)
Mar
(1)
Apr
(2)
May
Jun
(1)
Jul
(2)
Aug
(6)
Sep
(1)
Oct
(1)
Nov
Dec
(3)
2016 Jan
Feb
Mar
Apr
May
Jun
(3)
Jul
(22)
Aug
Sep
(1)
Oct
Nov
(21)
Dec
2017 Jan
(20)
Feb
Mar
(2)
Apr
May
Jun
(8)
Jul
Aug
(1)
Sep
Oct
Nov
Dec
2018 Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
(3)
Nov
Dec

Showing results of 2001

1 2 3 .. 81 > >> (Page 1 of 81)
From: Fulton, B. <bef...@iu...> - 2018年10月10日 13:30:53
Attachments: smime.p7s
I've built this for IU's Carbonate cluster. I'll test it more later, but the
"make install" appeared to want a recursive copy when installing the include
files, so I added that flag. I also tried to run "make time" on a couple of
nodes with slightly different configurations, but it appeared to return the
exact same values - is there a "make timeclean" or some equivalent I could
run?
--
Ben Fulton
Research Technologies
Scientific Applications and Performance Tuning
Indiana University
E-Mail: bef...@iu...
-----Original Message-----
From: R. Clint Whaley <rcw...@iu...> 
Sent: Friday, October 5, 2018 3:30 AM
To: List for developer discussion, NOT SUPPORT.
<mat...@li...>
Subject: [atlas-devel] 3.11.41
I have released 3.11.41. It is a bugfix release, fixing rotmg, assembly
errors on POWER, and a performance regression in small triangle TRMM.
Cheers,
Clint
ATLAS 3.11.41 released 10/05/18, highlights of changes from 3.11.40:
 * Fixed bug in drotmg: https://sourceforge.net/p/math-atlas/bugs/256/
 * Fixed assembly errors for POWER9 (failure to save correct regs)
 * Fixed performance regression for small triangle TRMM
--
******************************************
** R. Clint Whaley, PhD, Assoc Prof, IU
** http://homes.soic.indiana.edu/rcwhaley/
******************************************
_______________________________________________
Math-atlas-devel mailing list
Mat...@li...
https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
From: R. C. W. <rcw...@iu...> - 2018年10月05日 07:30:36
I have released 3.11.41. It is a bugfix release, fixing rotmg, assembly 
errors on POWER, and a performance regression in small triangle TRMM.
Cheers,
Clint
ATLAS 3.11.41 released 10/05/18, highlights of changes from 3.11.40:
 * Fixed bug in drotmg: https://sourceforge.net/p/math-atlas/bugs/256/
 * Fixed assembly errors for POWER9 (failure to save correct regs)
 * Fixed performance regression for small triangle TRMM
-- 
******************************************
** R. Clint Whaley, PhD, Assoc Prof, IU
** http://homes.soic.indiana.edu/rcwhaley/
******************************************
From: R. C. W. <rcw...@iu...> - 2018年10月03日 00:53:40
Guys,
Sorry to spam both lists and any dups that causes, but since it has 
looked like I've retired, I'm sending this to atlas-devel & announce.
3.11.40 has finally been released. I have actually been working on it 
for most of this time, but, with the move to Indiana factored in, it has 
taken me this long to get the framework working again!
The reason is that we have essentially rewritten the entire way 
microkernels are tuned and accessed in the library. Therefore, the 
majority of tuning code has been touched or rewritten, and since this 
includes all the generation, etc, it took a long while to get things at 
all reliable.
The end goal is that increased microkernel specialization should greatly 
increase our weird-shape and parallel scaling performance.
Right now, you will hopefully see much better serial non-GEMM BLAS 
performance (eg., small-triangle TRSM or TRMM, for instance). Very 
large problems aren't likely to have a huge difference, if prior 
releases supported your architecture well (eg., we've added AVX-512 to 
the code generators, which obviously will hugely improve SkylakeX 
asymptotic performance).
The installs have gone from long to endless, unfortunately. I will fix 
this before stable, but right now searches are all brute-force and 
ignorance while we concentrate on getting the last of the microkernel 
handling solidified. I will attempt to speed up search later, and allow 
for a "no-timing" install from archdefs, so that people on 
already-supported platforms can skip most or all of the tuning (a 
feature many maintainers have long wanted).
For now, terrible install times will just be a feature until we finish 
debugging and publish the new BLAS approach.
The major weakness in the install when ran on arbitrary machines right 
now (other than time) is in some new cache detection code that creates a 
file called atlas_cache.h. This code dies on several machines, and I 
haven't had time to track down details. However, if it fails for you, 
open up a tracker item and I can tell you how to proceed beyond it even 
before fixing the code in question.
Hopefully, this release should be purely faster than any other that came 
before, but if you spot performance regressions, please let us know. We 
are not yet always using the correct microkernel (even when the library 
has built it), because our selection algorithm work is awaiting the 
finishing of the new tuning strategy.
Eventually, ATLAS will be able to not only tune microkernels to make the 
BLAS/LAPACK, but specialized operations for people wanting to avoid BLAS 
overheads (at cost of calling messy microkernels; think of things like 
tensor algebra with very small shapes that need to scale, perhaps 
machine learning, etc.). This allows you to have detailed cache control 
necessary to scale when the problem size isn't large enough to dominate 
low-order terms, and thus make BLAS API OK.
ChangeLog (which has almost no detail on massive changes) is below.
Cheers,
Clint
ATLAS 3.11.40 released 10/02/18, highlights of changes from 3.11.39:
 * Basically a rewrite of all L3BLAS and LAPACK tuning framework:
 + Complete rewrite of all searches to allow different "views" of 
kernels
 for maximum performance for all-BLAS usage; present 
implementation very
 slow even with archdefs, will need to be speed up before stable
 + Complete rewrite of gemm kernel choice mechanism
 + Complete rewrite of all BLAS handling for much improved 
small/medium perf
 via greater use of microkernels
 * Addition of core count to archdefs, because this usually increases 
block
 factors when maximizing performance
 * Addition of -ansi flag to avoid C changes borking include files
 * Archdef support for host of modern Intel/AMD + POWER9:
 - Corei264AVXp16, Corei3EP64AVXMACp36, Corei4X64AVXZp18,
 - AMD64K10h64SSE3p32, AMDRyzen64AVXMACp[8,16,64]
 - ARM64xgene164p8, ARM64thund64p48
 - POWER964LEVSXp8
 * Addition of cpuid-based cache detection for Intel & AMD x86 machines
 - Presently gets wrong answer on some machines, where shared caches
 are either multiplied or divided by P inappropriately
 * Beginning of rewrite of generic cache detection
 * Fixed bug where names like "c99-gcc" preferred over "gcc"
 * Added -Si indthr 1 option to autoprobe for aliased thread IDs
 + Presently, only supported on ARM64 & x86 with at least SSE2
 * Complete rewrite of gemm kernel indexing to compact data structures
 and minimize cache pollution
From: R. C. W. <rcw...@iu...> - 2017年08月17日 13:15:37
Guys,
I am now at Indiana University, having just completed my move, and am 
presently preparing to teach next week. This is a reason for the delay 
in responding to the several 3.10 patches/questions just recently. I am 
keeping your e-mails, and will respond as soon as I get on top of the 
new place and its processes.
The recent delay in developer releases is because I rewrote my 
microkernel handling for greater efficiency, and it has taken a *long* 
time to get it working again. We are presently working on greatly 
improving our non-GEMM small-case performance, which I think is going to 
be worth the wait when I get it out.
Anyway, I'm still working on both stable & developer, and will respond 
as soon as I can.
Cheers,
Clint
-- 
******************************************
** R. Clint Whaley, PhD, Assoc Prof, IU
** http://homes.soic.indiana.edu/rcwhaley/
******************************************
From: R. C. W. <wh...@my...> - 2017年06月30日 00:14:15
>
> The implementation of HT has improved over the years, so please don't 
> assume results obtained on older processors are applicable to the 
> current ones. I used to be a HT skeptic but almost everything runs 
> faster with them on Haswell and later, particularly the client parts 
> (i.e. Core series as opposed to Xeon).
Unless they have changed the definition of what HT does, I do not see a 
theoretical way to avoid the cache problem.
>
> You might try running an actual application, where you get a mix
> of kernels. This tends to stress the cache more, and can
> sometimes expose the downside of HT.
>
>
> On the other hand, idle HTs help with OS interrupts and other stuff 
> that happens quite a bit in an HPC environment once one starts using 
> MPI etc. This is one of the reasons I encourage everyone to enable HT 
> in the BIOS even if their applications don't use them.
If the OS interrupts, its interrupting all threads, so I don't think I'm 
following this line of thought. Maybe you mean that if you have a huge 
stack of threads to be run, using HT you have 2 or 4 slots to round 
robin into once interrupted?
>
> I remember finding slight speedup in some case leading me to think
> HT was helpful, but then I had performance collapses other places,
> which led to me to recommend turning it off (or using affinity to
> avoid it, like MKL is doing, if you can't turn it off) to maximize
> performance.
>
>
> If nothing else, HT doubles the number of threads, which hurts any 
> part of a code that scales poorly, and it makes it harder to manage 
> affinity. I had to spend quite a bit of time helping users with SMT 
> (2-4 HW threads per core) on Blue Gene/Q in my old job.
>
> So, for instance, take LAPACK or ATLAS LU or QR (or your own
> version) and hook them up to the two BLAS. Does the non-MKL
> HT-liking kernel get anywhere close to MKL performance despite
> it's gemm looking as good with HT, or does it collapse its
> performance while MKL maintains?
>
>
> I don't have test driver for those already so I'm afraid I'm not going 
> to punt on those experiments. However, if somebody else posts the 
> code, I'll certainly run it and post results for generally available 
> hardware.
ATLAS comes with timers for any or all of these. They are built to time 
other's libs too.
For instance, set BLASlib to MKL, set FLAPACKlib to your f77 lapack, and 
"make xdtlatime_fl_sb" will time using MKL + LAPACK. Switch BLASlib to 
bliss now, remake, voila.
> My guess is the MKL group got the same "HT not-reliable, non-HT
> is" results, and that's why its behaving in this way.
>
>
> Maybe. In any case, it simplifies the design space to not have to 
> think about >1 threads sharing an L1.
L1 is not the problem on modern machines. As you scale like with Xeon-E 
series you need to use every scrap of cache, including shared. If you 
use the full scale of something like 12-cores per shared cache, I 
believe you will see substantial slowdowns from HT.
Cheers,
Clint
>
> Jeff
>
> Thanks for results!
> Clint
>
> On 06/29/2017 05:56 PM, Hammond, Jeff R wrote:
>
> Good catch. strace shows only 35 calls to clone in both cases
> with MKL. I didn’t know that MKL was doing these tricks.
>
> However, I tested another DGEMM implementation that supports
> AVX2 and it uses all of the HTs and it performs on par with
> MKL, but only when HT is used.
>
> Jeff
>
>
> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=72
> KMP_AFFINITY=compact,granularity=fine strace ../test_libblis.x
> 2>&1 | head -n5000 | grep -c clone
> 71
> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=36
> KMP_AFFINITY=scatter,granularity=fine strace ../test_libblis.x
> 2>&1 | head -n5000 | grep -c clone
> 35
>
> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=72
> KMP_AFFINITY=compact,granularity=fine ../test_libblis.x | grep
> -v "%"
> blis_dgemm_nn_rrr 384 384 384 204.027 
> 8.27e-18 PASS
> blis_dgemm_nn_rrr 768 768 768 650.820 
> 5.36e-18 PASS
> blis_dgemm_nn_rrr 1152 1152 1152 816.355 
> 4.40e-18 PASS
> blis_dgemm_nn_rrr 1536 1536 1536 835.650 
> 7.02e-18 PASS
> blis_dgemm_nn_rrr 1920 1920 1920 832.179 
> 9.96e-18 PASS
> blis_dgemm_nn_rrr 2304 2304 2304 863.123 
> 6.28e-18 PASS
> blis_dgemm_nn_rrr 2688 2688 2688 844.502 
> 8.28e-18 PASS
> blis_dgemm_nn_rrr 3072 3072 3072 860.262 
> 9.92e-18 PASS
> blis_dgemm_nn_rrr 3456 3456 3456 851.694 
> 5.80e-18 PASS
> blis_dgemm_nn_rrr 3840 3840 3840 856.526 
> 6.79e-18 PASS
>
> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=36
> KMP_AFFINITY=scatter,granularity=fine ../test_libblis.x | grep
> -v "%"
> blis_dgemm_nn_rrr 384 384 384 161.331 
> 8.27e-18 PASS
> blis_dgemm_nn_rrr 768 768 768 437.967 
> 5.36e-18 PASS
> blis_dgemm_nn_rrr 1152 1152 1152 545.498 
> 4.40e-18 PASS
> blis_dgemm_nn_rrr 1536 1536 1536 616.338 
> 7.02e-18 PASS
> blis_dgemm_nn_rrr 1920 1920 1920 606.650 
> 9.96e-18 PASS
> blis_dgemm_nn_rrr 2304 2304 2304 611.153 
> 6.28e-18 PASS
> blis_dgemm_nn_rrr 2688 2688 2688 603.314 
> 8.28e-18 PASS
> blis_dgemm_nn_rrr 3072 3072 3072 631.292 
> 9.92e-18 PASS
> blis_dgemm_nn_rrr 3456 3456 3456 625.833 
> 5.80e-18 PASS
>
> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=72
> KMP_AFFINITY=scatter,granularity=fine ../test_libblis.x | grep
> -v "%"
> blis_dgemm_nn_rrr 384 384 384 159.789 
> 8.27e-18 PASS
> blis_dgemm_nn_rrr 768 768 768 443.810 
> 5.36e-18 PASS
> blis_dgemm_nn_rrr 1152 1152 1152 536.077 
> 4.40e-18 PASS
> blis_dgemm_nn_rrr 1536 1536 1536 596.069 
> 7.02e-18 PASS
> blis_dgemm_nn_rrr 1920 1920 1920 595.763 
> 9.96e-18 PASS
> blis_dgemm_nn_rrr 2304 2304 2304 616.531 
> 6.28e-18 PASS
> blis_dgemm_nn_rrr 2688 2688 2688 591.823 
> 8.28e-18 PASS
> blis_dgemm_nn_rrr 3072 3072 3072 615.153 
> 9.92e-18 PASS
> blis_dgemm_nn_rrr 3456 3456 3456 621.714 
> 5.80e-18 PASS
>
> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=36
> KMP_AFFINITY=compact,granularity=fine ../test_libblis.x | grep
> -v "%"
> blis_dgemm_nn_rrr 384 384 384 189.615 
> 8.27e-18 PASS
> blis_dgemm_nn_rrr 768 768 768 423.504 
> 5.36e-18 PASS
> blis_dgemm_nn_rrr 1152 1152 1152 445.424 
> 4.40e-18 PASS
> blis_dgemm_nn_rrr 1536 1536 1536 444.830 
> 7.02e-18 PASS
> blis_dgemm_nn_rrr 1920 1920 1920 442.893 
> 9.96e-18 PASS
> blis_dgemm_nn_rrr 2304 2304 2304 445.979 
> 6.28e-18 PASS
> blis_dgemm_nn_rrr 2688 2688 2688 445.694 
> 8.28e-18 PASS
> blis_dgemm_nn_rrr 3072 3072 3072 451.026 
> 9.92e-18 PASS
> blis_dgemm_nn_rrr 3456 3456 3456 454.909 
> 5.80e-18 PASS
>
>
> On Thu, Jun 29, 2017 at 3:22 PM, R. Clint Whaley
> <rcw...@ls...
> <mailto:rcw...@ls...><mailto:rcw...@ls...
> <mailto:rcw...@ls...>>> wrote:
> Jeff,
>
> Have you run a thread monitor to see if MKL is simply not
> using the hyperthreading regardless of whether it is on or off
> in BIOS?
>
> You also may want to try something like LU.
>
> Cheers,
> Clint
>
>
> On 06/29/2017 05:15 PM, Jeff Hammond wrote:
> I don't see any negative impact from using HT relative to not
> using HT, at
> least with MKL DGEMM on E5-2699v3 (Haswell). The 0.1-0.5%
> gain here is
> irrelevant and may be due to thermal effects (this box is in
> my cubicle,
> not an air-conditioned machine room).
>
> $ OMP_NUM_THREADS=36 KMP_AFFINITY=scatter,granularity=fine
> ./dgemm_perf_PMKL.x $((384*40)) $((384*40)) $((384*4))
> BLAS_NAME dim1 dim2 dim3 seconds Gflop/s
> Intel MKL (parallel) 15360 15360 1536 0.8582699 844.4612765
> Intel MKL (parallel) 15360 15360 1536 0.8627163 840.1089930
>
> HT on
>
> $ OMP_NUM_THREADS=72 KMP_AFFINITY=scatter,granularity=fine
> ./dgemm_perf_PMKL.x $((384*40)) $((384*40)) $((384*4))
> BLAS_NAME dim1 dim2 dim3 seconds Gflop/s
> Intel MKL (parallel) 15360 15360 1536 0.8636520 839.1988073
> Intel MKL (parallel) 15360 15360 1536 0.8644268 838.4465853
>
> I would be interested to see folks post data to support the
> argument
> against HT.
>
> Jeff
>
> On Thu, Jun 29, 2017 at 7:57 AM, lixin chu via Math-atlas-devel <
> mat...@li...
> <mailto:mat...@li...><mailto:mat...@li...
> <mailto:mat...@li...>>> wrote:
>
> Thank you very much for quick response. Just to check if my
> understanding
> is correct :
>
> 1. By turning off cpuid in bios, I only need to use -t N to
> build Atlas
> right?
>
> 2. The N in -t N is the total number of threads on the
> machine, not per
> Cpu right ?
>
> 3. One more question I have is, how to set the correct -t N
> for mpi based
> application.
> Let's say on the 2-cpu machine with 4 cores per CPU,
> should I use -t
> 4 or -t 8 if I rum my application with 2 mpi processes :
> mpirun -n 2 myprogram
>
> Many thanks !
>
> Sent from Yahoo Mail on Android
>
> On Thu, Jun 29, 2017 at 22:20, R. Clint Whaley
> <wh...@my...
> <mailto:wh...@my...><mailto:wh...@my...
> <mailto:wh...@my...>>> wrote:
> Hyperthreading is an optimization aimed at addressing poorly
> optimized
> code. The idea is that most codes cannot drive the backend
> hardware
> (ALU/FPU, etc) at the maximal rate, so if you duplicate
> registers you
> can, amongst several threads, find enough work to keep the
> backend busy.
>
> ATLAS (or any optimized linear algebra library) already runs
> the FPU at
> its maximal rate supported by the cache architecture after
> cache blocking.
>
> If you can already drive the backend at >90% of peak, then
> hyperthreading can actually *lose* you performance, as the
> threads bring
> conflicting data in the cache.
>
> It's usually not a night and day difference, but I haven't
> measured it
> in the huge blocking era used by recent developer releases (it
> may be
> worse there).
>
> My general recommendation is turn off hyperthreading for highly
> optimized codes, and turn it on for relatively unoptimized codes.
>
> As to which core IDs correspond to the physical cores, that
> varies by
> machine. On x86, you can use CPUID to determine that if you are
> super-knowledgeable. I usually just turn it off in the BIOS,
> because I
> don't like something that may thrash my cache running, even if
> it might
> occasionally help :)
>
> Cheers,
> Clint
>
> On 06/28/2017 10:32 PM, lixin chu via Math-atlas-devel wrote:
> Hello,Would like go check if my understanding is correct for
> compiling
> Atlas on a machine that has multiple CPUs and hyperthreading.
> I have two types of machine:
> - 2 CPU, each with 4 Core, hyperthreaded, 2 threads per core-
> 2 CPU,
> each with 8 Cores, hyperthreaded, 2 threads per core
> So when I compile Atlas, is it correct that I should use:
> -tl 8 0,1,2,3,4,5,6,7 and -tl 16 0,1,....15 (assuming the
> affinity ID
> is from 0-7 and 0-15).
> That means the number 8 or 16 is the total cores on the
> machine, not
> number of cores per CPU. Am I correct ?
> I also read somewhere saying that Atlas supports
> Hyperthreading. What
> does this mean ?
> Does this mean:1. I do not need to disable hyperthreading in
> BIOS (no
> performance difference whether it is enabled or disabled, as
> long as the
> number of threads and affinity IDs are set correctly when
> compiling
> Atlas)2. Or I can make use of the hyperthread, that is, -tl 16
> and -tl 32 ?
> Thank you very much,
> lixin
>
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>
>
>
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> <mailto:Mat...@li...><mailto:Mat...@li...
> <mailto:Mat...@li...>>
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
> <https://lists.sourceforge.net/lists/listinfo/math-atlas-devel>
>
>
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> <mailto:Mat...@li...><mailto:Mat...@li...
> <mailto:Mat...@li...>>
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
> <https://lists.sourceforge.net/lists/listinfo/math-atlas-devel>
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> <mailto:Mat...@li...><mailto:Mat...@li...
> <mailto:Mat...@li...>>
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
> <https://lists.sourceforge.net/lists/listinfo/math-atlas-devel>
>
>
>
> --
> Jeff Hammond
> jef...@gm...
> <mailto:jef...@gm...><mailto:jef...@gm...
> <mailto:jef...@gm...>>
> http://jeffhammond.github.io/
>
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>
>
>
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> <mailto:Mat...@li...><mailto:Mat...@li...
> <mailto:Mat...@li...>>
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
> <https://lists.sourceforge.net/lists/listinfo/math-atlas-devel>
>
> --
> **********************************************************************
> ** R. Clint Whaley, PhD * Assoc Prof, LSU *
> www.csc.lsu.edu/~whaley
> <http://www.csc.lsu.edu/%7Ewhaley><http://www.csc.lsu.edu/~whaley
> <http://www.csc.lsu.edu/%7Ewhaley>> **
> **********************************************************************
>
>
>
>
> --
> Jeff Hammond
> jef...@gm...
> <mailto:jef...@gm...><mailto:jef...@gm...
> <mailto:jef...@gm...>>
> http://jeffhammond.github.io/
>
>
>
> -- 
> **********************************************************************
> ** R. Clint Whaley, PhD * Assoc Prof, LSU *
> www.csc.lsu.edu/~whaley <http://www.csc.lsu.edu/%7Ewhaley> **
> **********************************************************************
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> <mailto:Mat...@li...>
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
> <https://lists.sourceforge.net/lists/listinfo/math-atlas-devel>
>
>
>
>
> -- 
> Jeff Hammond
> jef...@gm... <mailto:jef...@gm...>
> http://jeffhammond.github.io/
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>
>
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
From: Jeff H. <jef...@gm...> - 2017年06月29日 23:33:40
On Thu, Jun 29, 2017 at 4:10 PM, R. Clint Whaley <rcw...@ls...> wrote:
> Yeah, if it can't get that perf w/o hyperthreading, its not fully tuned.
>
>
Agreed. BLIS is just a framework and I'm using the default blocking
parameters. I know from discussions with Greg Henry that scaling all the
way out on the high-core-count Xeon processors requires some algorithm
changes. I expect that if I play around with the knobs of BLIS, it will
perform optimally with 1 HT per core.
> Back in day when I investigated HT, the problem really is in cache
> stomping, as two threads compete for the same cache. This makes the
> effects unpredictable (if the cache wasn't being fully utilized, maybe no
> effect, if you get lucky on the replacement, maybe tiny effect, and if you
> get unlucky, an truly bad dropoff).
>
>
The implementation of HT has improved over the years, so please don't
assume results obtained on older processors are applicable to the current
ones. I used to be a HT skeptic but almost everything runs faster with
them on Haswell and later, particularly the client parts (i.e. Core series
as opposed to Xeon).
> You might try running an actual application, where you get a mix of
> kernels. This tends to stress the cache more, and can sometimes expose the
> downside of HT.
>
>
On the other hand, idle HTs help with OS interrupts and other stuff that
happens quite a bit in an HPC environment once one starts using MPI etc.
This is one of the reasons I encourage everyone to enable HT in the BIOS
even if their applications don't use them.
> I remember finding slight speedup in some case leading me to think HT was
> helpful, but then I had performance collapses other places, which led to me
> to recommend turning it off (or using affinity to avoid it, like MKL is
> doing, if you can't turn it off) to maximize performance.
>
>
If nothing else, HT doubles the number of threads, which hurts any part of
a code that scales poorly, and it makes it harder to manage affinity. I
had to spend quite a bit of time helping users with SMT (2-4 HW threads per
core) on Blue Gene/Q in my old job.
> So, for instance, take LAPACK or ATLAS LU or QR (or your own version) and
> hook them up to the two BLAS. Does the non-MKL HT-liking kernel get
> anywhere close to MKL performance despite it's gemm looking as good with
> HT, or does it collapse its performance while MKL maintains?
>
>
I don't have test driver for those already so I'm afraid I'm not going to
punt on those experiments. However, if somebody else posts the code, I'll
certainly run it and post results for generally available hardware.
> My guess is the MKL group got the same "HT not-reliable, non-HT is"
> results, and that's why its behaving in this way.
>
>
Maybe. In any case, it simplifies the design space to not have to think
about >1 threads sharing an L1.
Jeff
> Thanks for results!
> Clint
>
> On 06/29/2017 05:56 PM, Hammond, Jeff R wrote:
>
>> Good catch. strace shows only 35 calls to clone in both cases with MKL.
>> I didn’t know that MKL was doing these tricks.
>>
>> However, I tested another DGEMM implementation that supports AVX2 and it
>> uses all of the HTs and it performs on par with MKL, but only when HT is
>> used.
>>
>> Jeff
>>
>>
>> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=72
>> KMP_AFFINITY=compact,granularity=fine strace ../test_libblis.x 2>&1 |
>> head -n5000 | grep -c clone
>> 71
>> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=36
>> KMP_AFFINITY=scatter,granularity=fine strace ../test_libblis.x 2>&1 |
>> head -n5000 | grep -c clone
>> 35
>>
>> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=72
>> KMP_AFFINITY=compact,granularity=fine ../test_libblis.x | grep -v "%"
>> blis_dgemm_nn_rrr 384 384 384 204.027 8.27e-18
>> PASS
>> blis_dgemm_nn_rrr 768 768 768 650.820 5.36e-18
>> PASS
>> blis_dgemm_nn_rrr 1152 1152 1152 816.355 4.40e-18
>> PASS
>> blis_dgemm_nn_rrr 1536 1536 1536 835.650 7.02e-18
>> PASS
>> blis_dgemm_nn_rrr 1920 1920 1920 832.179 9.96e-18
>> PASS
>> blis_dgemm_nn_rrr 2304 2304 2304 863.123 6.28e-18
>> PASS
>> blis_dgemm_nn_rrr 2688 2688 2688 844.502 8.28e-18
>> PASS
>> blis_dgemm_nn_rrr 3072 3072 3072 860.262 9.92e-18
>> PASS
>> blis_dgemm_nn_rrr 3456 3456 3456 851.694 5.80e-18
>> PASS
>> blis_dgemm_nn_rrr 3840 3840 3840 856.526 6.79e-18
>> PASS
>>
>> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=36
>> KMP_AFFINITY=scatter,granularity=fine ../test_libblis.x | grep -v "%"
>> blis_dgemm_nn_rrr 384 384 384 161.331 8.27e-18
>> PASS
>> blis_dgemm_nn_rrr 768 768 768 437.967 5.36e-18
>> PASS
>> blis_dgemm_nn_rrr 1152 1152 1152 545.498 4.40e-18
>> PASS
>> blis_dgemm_nn_rrr 1536 1536 1536 616.338 7.02e-18
>> PASS
>> blis_dgemm_nn_rrr 1920 1920 1920 606.650 9.96e-18
>> PASS
>> blis_dgemm_nn_rrr 2304 2304 2304 611.153 6.28e-18
>> PASS
>> blis_dgemm_nn_rrr 2688 2688 2688 603.314 8.28e-18
>> PASS
>> blis_dgemm_nn_rrr 3072 3072 3072 631.292 9.92e-18
>> PASS
>> blis_dgemm_nn_rrr 3456 3456 3456 625.833 5.80e-18
>> PASS
>>
>> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=72
>> KMP_AFFINITY=scatter,granularity=fine ../test_libblis.x | grep -v "%"
>> blis_dgemm_nn_rrr 384 384 384 159.789 8.27e-18
>> PASS
>> blis_dgemm_nn_rrr 768 768 768 443.810 5.36e-18
>> PASS
>> blis_dgemm_nn_rrr 1152 1152 1152 536.077 4.40e-18
>> PASS
>> blis_dgemm_nn_rrr 1536 1536 1536 596.069 7.02e-18
>> PASS
>> blis_dgemm_nn_rrr 1920 1920 1920 595.763 9.96e-18
>> PASS
>> blis_dgemm_nn_rrr 2304 2304 2304 616.531 6.28e-18
>> PASS
>> blis_dgemm_nn_rrr 2688 2688 2688 591.823 8.28e-18
>> PASS
>> blis_dgemm_nn_rrr 3072 3072 3072 615.153 9.92e-18
>> PASS
>> blis_dgemm_nn_rrr 3456 3456 3456 621.714 5.80e-18
>> PASS
>>
>> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=36
>> KMP_AFFINITY=compact,granularity=fine ../test_libblis.x | grep -v "%"
>> blis_dgemm_nn_rrr 384 384 384 189.615 8.27e-18
>> PASS
>> blis_dgemm_nn_rrr 768 768 768 423.504 5.36e-18
>> PASS
>> blis_dgemm_nn_rrr 1152 1152 1152 445.424 4.40e-18
>> PASS
>> blis_dgemm_nn_rrr 1536 1536 1536 444.830 7.02e-18
>> PASS
>> blis_dgemm_nn_rrr 1920 1920 1920 442.893 9.96e-18
>> PASS
>> blis_dgemm_nn_rrr 2304 2304 2304 445.979 6.28e-18
>> PASS
>> blis_dgemm_nn_rrr 2688 2688 2688 445.694 8.28e-18
>> PASS
>> blis_dgemm_nn_rrr 3072 3072 3072 451.026 9.92e-18
>> PASS
>> blis_dgemm_nn_rrr 3456 3456 3456 454.909 5.80e-18
>> PASS
>>
>>
>> On Thu, Jun 29, 2017 at 3:22 PM, R. Clint Whaley <rcw...@ls...
>> <mailto:rcw...@ls...>> wrote:
>> Jeff,
>>
>> Have you run a thread monitor to see if MKL is simply not using the
>> hyperthreading regardless of whether it is on or off in BIOS?
>>
>> You also may want to try something like LU.
>>
>> Cheers,
>> Clint
>>
>>
>> On 06/29/2017 05:15 PM, Jeff Hammond wrote:
>> I don't see any negative impact from using HT relative to not using HT, at
>> least with MKL DGEMM on E5-2699v3 (Haswell). The 0.1-0.5% gain here is
>> irrelevant and may be due to thermal effects (this box is in my cubicle,
>> not an air-conditioned machine room).
>>
>> $ OMP_NUM_THREADS=36 KMP_AFFINITY=scatter,granularity=fine
>> ./dgemm_perf_PMKL.x $((384*40)) $((384*40)) $((384*4))
>> BLAS_NAME dim1 dim2 dim3 seconds Gflop/s
>> Intel MKL (parallel) 15360 15360 1536 0.8582699 844.4612765
>> Intel MKL (parallel) 15360 15360 1536 0.8627163 840.1089930
>>
>> HT on
>>
>> $ OMP_NUM_THREADS=72 KMP_AFFINITY=scatter,granularity=fine
>> ./dgemm_perf_PMKL.x $((384*40)) $((384*40)) $((384*4))
>> BLAS_NAME dim1 dim2 dim3 seconds Gflop/s
>> Intel MKL (parallel) 15360 15360 1536 0.8636520 839.1988073
>> Intel MKL (parallel) 15360 15360 1536 0.8644268 838.4465853
>>
>> I would be interested to see folks post data to support the argument
>> against HT.
>>
>> Jeff
>>
>> On Thu, Jun 29, 2017 at 7:57 AM, lixin chu via Math-atlas-devel <
>> mat...@li...<mailto:math-atlas-
>> de...@li...>> wrote:
>>
>> Thank you very much for quick response. Just to check if my understanding
>> is correct :
>>
>> 1. By turning off cpuid in bios, I only need to use -t N to build Atlas
>> right?
>>
>> 2. The N in -t N is the total number of threads on the machine, not per
>> Cpu right ?
>>
>> 3. One more question I have is, how to set the correct -t N for mpi based
>> application.
>> Let's say on the 2-cpu machine with 4 cores per CPU, should I use
>> -t
>> 4 or -t 8 if I rum my application with 2 mpi processes :
>> mpirun -n 2 myprogram
>>
>> Many thanks !
>>
>> Sent from Yahoo Mail on Android
>>
>> On Thu, Jun 29, 2017 at 22:20, R. Clint Whaley
>> <wh...@my...<mailto:wh...@my...>> wrote:
>> Hyperthreading is an optimization aimed at addressing poorly optimized
>> code. The idea is that most codes cannot drive the backend hardware
>> (ALU/FPU, etc) at the maximal rate, so if you duplicate registers you
>> can, amongst several threads, find enough work to keep the backend busy.
>>
>> ATLAS (or any optimized linear algebra library) already runs the FPU at
>> its maximal rate supported by the cache architecture after cache blocking.
>>
>> If you can already drive the backend at >90% of peak, then
>> hyperthreading can actually *lose* you performance, as the threads bring
>> conflicting data in the cache.
>>
>> It's usually not a night and day difference, but I haven't measured it
>> in the huge blocking era used by recent developer releases (it may be
>> worse there).
>>
>> My general recommendation is turn off hyperthreading for highly
>> optimized codes, and turn it on for relatively unoptimized codes.
>>
>> As to which core IDs correspond to the physical cores, that varies by
>> machine. On x86, you can use CPUID to determine that if you are
>> super-knowledgeable. I usually just turn it off in the BIOS, because I
>> don't like something that may thrash my cache running, even if it might
>> occasionally help :)
>>
>> Cheers,
>> Clint
>>
>> On 06/28/2017 10:32 PM, lixin chu via Math-atlas-devel wrote:
>> Hello,Would like go check if my understanding is correct for compiling
>> Atlas on a machine that has multiple CPUs and hyperthreading.
>> I have two types of machine:
>> - 2 CPU, each with 4 Core, hyperthreaded, 2 threads per core- 2 CPU,
>> each with 8 Cores, hyperthreaded, 2 threads per core
>> So when I compile Atlas, is it correct that I should use:
>> -tl 8 0,1,2,3,4,5,6,7 and -tl 16 0,1,....15 (assuming the affinity ID
>> is from 0-7 and 0-15).
>> That means the number 8 or 16 is the total cores on the machine, not
>> number of cores per CPU. Am I correct ?
>> I also read somewhere saying that Atlas supports Hyperthreading. What
>> does this mean ?
>> Does this mean:1. I do not need to disable hyperthreading in BIOS (no
>> performance difference whether it is enabled or disabled, as long as the
>> number of threads and affinity IDs are set correctly when compiling
>> Atlas)2. Or I can make use of the hyperthread, that is, -tl 16 and -tl 32
>> ?
>> Thank you very much,
>> lixin
>>
>>
>>
>> ------------------------------------------------------------
>> ------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>
>>
>>
>> _______________________________________________
>> Math-atlas-devel mailing list
>> Mat...@li...<mailto:Math-atlas-
>> de...@li...>
>> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>>
>>
>>
>>
>> ------------------------------------------------------------
>> ------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> Math-atlas-devel mailing list
>> Mat...@li...<mailto:Math-atlas-
>> de...@li...>
>> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>>
>>
>> ------------------------------------------------------------
>> ------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> Math-atlas-devel mailing list
>> Mat...@li...<mailto:Math-atlas-
>> de...@li...>
>> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>>
>>
>>
>> --
>> Jeff Hammond
>> jef...@gm...<mailto:jef...@gm...>
>> http://jeffhammond.github.io/
>>
>>
>>
>> ------------------------------------------------------------
>> ------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>
>>
>>
>> _______________________________________________
>> Math-atlas-devel mailing list
>> Mat...@li...<mailto:Math-atlas-
>> de...@li...>
>> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>>
>> --
>> **********************************************************************
>> ** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley<
>> http://www.csc.lsu.edu/~whaley> **
>> **********************************************************************
>>
>>
>>
>>
>> --
>> Jeff Hammond
>> jef...@gm...<mailto:jef...@gm...>
>> http://jeffhammond.github.io/
>>
>>
>>
> --
> **********************************************************************
> ** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley **
> **********************************************************************
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>
-- 
Jeff Hammond
jef...@gm...
http://jeffhammond.github.io/
From: R. C. W. <rcw...@ls...> - 2017年06月29日 23:14:53
just realized my reply only went to Jeff.
-------- Forwarded Message --------
Subject: Re: [atlas-devel] Compiling Atlas with hyperthreading
Date: 2017年6月29日 17:22:05 -0500
From: R. Clint Whaley <rcw...@ls...>
To: Jeff Hammond <jef...@gm...>
Jeff,
Have you run a thread monitor to see if MKL is simply not using the 
hyperthreading regardless of whether it is on or off in BIOS?
You also may want to try something like LU.
Cheers,
Clint
On 06/29/2017 05:15 PM, Jeff Hammond wrote:
> I don't see any negative impact from using HT relative to not using HT, at
> least with MKL DGEMM on E5-2699v3 (Haswell). The 0.1-0.5% gain here is
> irrelevant and may be due to thermal effects (this box is in my cubicle,
> not an air-conditioned machine room).
> 
> $ OMP_NUM_THREADS=36 KMP_AFFINITY=scatter,granularity=fine
> ./dgemm_perf_PMKL.x $((384*40)) $((384*40)) $((384*4))
> BLAS_NAME dim1 dim2 dim3 seconds Gflop/s
> Intel MKL (parallel) 15360 15360 1536 0.8582699 844.4612765
> Intel MKL (parallel) 15360 15360 1536 0.8627163 840.1089930
> 
> HT on
> 
> $ OMP_NUM_THREADS=72 KMP_AFFINITY=scatter,granularity=fine
> ./dgemm_perf_PMKL.x $((384*40)) $((384*40)) $((384*4))
> BLAS_NAME dim1 dim2 dim3 seconds Gflop/s
> Intel MKL (parallel) 15360 15360 1536 0.8636520 839.1988073
> Intel MKL (parallel) 15360 15360 1536 0.8644268 838.4465853
> 
> I would be interested to see folks post data to support the argument
> against HT.
> 
> Jeff
> 
> On Thu, Jun 29, 2017 at 7:57 AM, lixin chu via Math-atlas-devel <
> mat...@li...> wrote:
>>
>> Thank you very much for quick response. Just to check if my understanding
> is correct :
>>
>> 1. By turning off cpuid in bios, I only need to use -t N to build Atlas
> right?
>>
>> 2. The N in -t N is the total number of threads on the machine, not per
> Cpu right ?
>>
>> 3. One more question I have is, how to set the correct -t N for mpi based
> application.
>> Let's say on the 2-cpu machine with 4 cores per CPU, should I use -t
> 4 or -t 8 if I rum my application with 2 mpi processes :
>> mpirun -n 2 myprogram
>>
>> Many thanks !
>>
>> Sent from Yahoo Mail on Android
>>
>> On Thu, Jun 29, 2017 at 22:20, R. Clint Whaley
>> <wh...@my...> wrote:
>> Hyperthreading is an optimization aimed at addressing poorly optimized
>> code. The idea is that most codes cannot drive the backend hardware
>> (ALU/FPU, etc) at the maximal rate, so if you duplicate registers you
>> can, amongst several threads, find enough work to keep the backend busy.
>>
>> ATLAS (or any optimized linear algebra library) already runs the FPU at
>> its maximal rate supported by the cache architecture after cache blocking.
>>
>> If you can already drive the backend at >90% of peak, then
>> hyperthreading can actually *lose* you performance, as the threads bring
>> conflicting data in the cache.
>>
>> It's usually not a night and day difference, but I haven't measured it
>> in the huge blocking era used by recent developer releases (it may be
>> worse there).
>>
>> My general recommendation is turn off hyperthreading for highly
>> optimized codes, and turn it on for relatively unoptimized codes.
>>
>> As to which core IDs correspond to the physical cores, that varies by
>> machine. On x86, you can use CPUID to determine that if you are
>> super-knowledgeable. I usually just turn it off in the BIOS, because I
>> don't like something that may thrash my cache running, even if it might
>> occasionally help :)
>>
>> Cheers,
>> Clint
>>
>> On 06/28/2017 10:32 PM, lixin chu via Math-atlas-devel wrote:
>>> Hello,Would like go check if my understanding is correct for compiling
> Atlas on a machine that has multiple CPUs and hyperthreading.
>>> I have two types of machine:
>>> - 2 CPU, each with 4 Core, hyperthreaded, 2 threads per core- 2 CPU,
> each with 8 Cores, hyperthreaded, 2 threads per core
>>> So when I compile Atlas, is it correct that I should use:
>>> -tl 8 0,1,2,3,4,5,6,7 and -tl 16 0,1,....15 (assuming the affinity ID
> is from 0-7 and 0-15).
>>> That means the number 8 or 16 is the total cores on the machine, not
> number of cores per CPU. Am I correct ?
>>> I also read somewhere saying that Atlas supports Hyperthreading. What
> does this mean ?
>>> Does this mean:1. I do not need to disable hyperthreading in BIOS (no
> performance difference whether it is enabled or disabled, as long as the
> number of threads and affinity IDs are set correctly when compiling
> Atlas)2. Or I can make use of the hyperthread, that is, -tl 16 and -tl 32 ?
>>> Thank you very much,
>>> lixin
>>>
>>>
>>>
>>>
> ------------------------------------------------------------------------------
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>>
>>>
>>>
>>> _______________________________________________
>>> Math-atlas-devel mailing list
>>> Mat...@li...
>>> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>>
>>>
>>
>>
>>
> ------------------------------------------------------------------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> Math-atlas-devel mailing list
>> Mat...@li...
>> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>>
>>
>>
> ------------------------------------------------------------------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> _______________________________________________
>> Math-atlas-devel mailing list
>> Mat...@li...
>> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>>
> 
> 
> 
> --
> Jeff Hammond
> jef...@gm...
> http://jeffhammond.github.io/
> 
> 
> 
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> 
> 
> 
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
> 
-- 
**********************************************************************
** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley **
**********************************************************************
From: R. C. W. <rcw...@ls...> - 2017年06月29日 23:10:56
Yeah, if it can't get that perf w/o hyperthreading, its not fully tuned.
Back in day when I investigated HT, the problem really is in cache 
stomping, as two threads compete for the same cache. This makes the 
effects unpredictable (if the cache wasn't being fully utilized, maybe 
no effect, if you get lucky on the replacement, maybe tiny effect, and 
if you get unlucky, an truly bad dropoff).
You might try running an actual application, where you get a mix of 
kernels. This tends to stress the cache more, and can sometimes expose 
the downside of HT.
I remember finding slight speedup in some case leading me to think HT 
was helpful, but then I had performance collapses other places, which 
led to me to recommend turning it off (or using affinity to avoid it, 
like MKL is doing, if you can't turn it off) to maximize performance.
So, for instance, take LAPACK or ATLAS LU or QR (or your own version) 
and hook them up to the two BLAS. Does the non-MKL HT-liking kernel get 
anywhere close to MKL performance despite it's gemm looking as good with 
HT, or does it collapse its performance while MKL maintains?
My guess is the MKL group got the same "HT not-reliable, non-HT is" 
results, and that's why its behaving in this way.
Thanks for results!
Clint
On 06/29/2017 05:56 PM, Hammond, Jeff R wrote:
> Good catch. strace shows only 35 calls to clone in both cases with MKL. I didn’t know that MKL was doing these tricks.
> 
> However, I tested another DGEMM implementation that supports AVX2 and it uses all of the HTs and it performs on par with MKL, but only when HT is used.
> 
> Jeff
> 
> 
> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=72 KMP_AFFINITY=compact,granularity=fine strace ../test_libblis.x 2>&1 | head -n5000 | grep -c clone
> 71
> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=36 KMP_AFFINITY=scatter,granularity=fine strace ../test_libblis.x 2>&1 | head -n5000 | grep -c clone
> 35
> 
> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=72 KMP_AFFINITY=compact,granularity=fine ../test_libblis.x | grep -v "%"
> blis_dgemm_nn_rrr 384 384 384 204.027 8.27e-18 PASS
> blis_dgemm_nn_rrr 768 768 768 650.820 5.36e-18 PASS
> blis_dgemm_nn_rrr 1152 1152 1152 816.355 4.40e-18 PASS
> blis_dgemm_nn_rrr 1536 1536 1536 835.650 7.02e-18 PASS
> blis_dgemm_nn_rrr 1920 1920 1920 832.179 9.96e-18 PASS
> blis_dgemm_nn_rrr 2304 2304 2304 863.123 6.28e-18 PASS
> blis_dgemm_nn_rrr 2688 2688 2688 844.502 8.28e-18 PASS
> blis_dgemm_nn_rrr 3072 3072 3072 860.262 9.92e-18 PASS
> blis_dgemm_nn_rrr 3456 3456 3456 851.694 5.80e-18 PASS
> blis_dgemm_nn_rrr 3840 3840 3840 856.526 6.79e-18 PASS
> 
> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=36 KMP_AFFINITY=scatter,granularity=fine ../test_libblis.x | grep -v "%"
> blis_dgemm_nn_rrr 384 384 384 161.331 8.27e-18 PASS
> blis_dgemm_nn_rrr 768 768 768 437.967 5.36e-18 PASS
> blis_dgemm_nn_rrr 1152 1152 1152 545.498 4.40e-18 PASS
> blis_dgemm_nn_rrr 1536 1536 1536 616.338 7.02e-18 PASS
> blis_dgemm_nn_rrr 1920 1920 1920 606.650 9.96e-18 PASS
> blis_dgemm_nn_rrr 2304 2304 2304 611.153 6.28e-18 PASS
> blis_dgemm_nn_rrr 2688 2688 2688 603.314 8.28e-18 PASS
> blis_dgemm_nn_rrr 3072 3072 3072 631.292 9.92e-18 PASS
> blis_dgemm_nn_rrr 3456 3456 3456 625.833 5.80e-18 PASS
> 
> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=72 KMP_AFFINITY=scatter,granularity=fine ../test_libblis.x | grep -v "%"
> blis_dgemm_nn_rrr 384 384 384 159.789 8.27e-18 PASS
> blis_dgemm_nn_rrr 768 768 768 443.810 5.36e-18 PASS
> blis_dgemm_nn_rrr 1152 1152 1152 536.077 4.40e-18 PASS
> blis_dgemm_nn_rrr 1536 1536 1536 596.069 7.02e-18 PASS
> blis_dgemm_nn_rrr 1920 1920 1920 595.763 9.96e-18 PASS
> blis_dgemm_nn_rrr 2304 2304 2304 616.531 6.28e-18 PASS
> blis_dgemm_nn_rrr 2688 2688 2688 591.823 8.28e-18 PASS
> blis_dgemm_nn_rrr 3072 3072 3072 615.153 9.92e-18 PASS
> blis_dgemm_nn_rrr 3456 3456 3456 621.714 5.80e-18 PASS
> 
> [jrhammon@esgmonster testsuite]$ OMP_NUM_THREADS=36 KMP_AFFINITY=compact,granularity=fine ../test_libblis.x | grep -v "%"
> blis_dgemm_nn_rrr 384 384 384 189.615 8.27e-18 PASS
> blis_dgemm_nn_rrr 768 768 768 423.504 5.36e-18 PASS
> blis_dgemm_nn_rrr 1152 1152 1152 445.424 4.40e-18 PASS
> blis_dgemm_nn_rrr 1536 1536 1536 444.830 7.02e-18 PASS
> blis_dgemm_nn_rrr 1920 1920 1920 442.893 9.96e-18 PASS
> blis_dgemm_nn_rrr 2304 2304 2304 445.979 6.28e-18 PASS
> blis_dgemm_nn_rrr 2688 2688 2688 445.694 8.28e-18 PASS
> blis_dgemm_nn_rrr 3072 3072 3072 451.026 9.92e-18 PASS
> blis_dgemm_nn_rrr 3456 3456 3456 454.909 5.80e-18 PASS
> 
> On Thu, Jun 29, 2017 at 3:22 PM, R. Clint Whaley <rcw...@ls...<mailto:rcw...@ls...>> wrote:
> Jeff,
> 
> Have you run a thread monitor to see if MKL is simply not using the hyperthreading regardless of whether it is on or off in BIOS?
> 
> You also may want to try something like LU.
> 
> Cheers,
> Clint
> 
> 
> On 06/29/2017 05:15 PM, Jeff Hammond wrote:
> I don't see any negative impact from using HT relative to not using HT, at
> least with MKL DGEMM on E5-2699v3 (Haswell). The 0.1-0.5% gain here is
> irrelevant and may be due to thermal effects (this box is in my cubicle,
> not an air-conditioned machine room).
> 
> $ OMP_NUM_THREADS=36 KMP_AFFINITY=scatter,granularity=fine
> ./dgemm_perf_PMKL.x $((384*40)) $((384*40)) $((384*4))
> BLAS_NAME dim1 dim2 dim3 seconds Gflop/s
> Intel MKL (parallel) 15360 15360 1536 0.8582699 844.4612765
> Intel MKL (parallel) 15360 15360 1536 0.8627163 840.1089930
> 
> HT on
> 
> $ OMP_NUM_THREADS=72 KMP_AFFINITY=scatter,granularity=fine
> ./dgemm_perf_PMKL.x $((384*40)) $((384*40)) $((384*4))
> BLAS_NAME dim1 dim2 dim3 seconds Gflop/s
> Intel MKL (parallel) 15360 15360 1536 0.8636520 839.1988073
> Intel MKL (parallel) 15360 15360 1536 0.8644268 838.4465853
> 
> I would be interested to see folks post data to support the argument
> against HT.
> 
> Jeff
> 
> On Thu, Jun 29, 2017 at 7:57 AM, lixin chu via Math-atlas-devel <
> mat...@li...<mailto:mat...@li...>> wrote:
> 
> Thank you very much for quick response. Just to check if my understanding
> is correct :
> 
> 1. By turning off cpuid in bios, I only need to use -t N to build Atlas
> right?
> 
> 2. The N in -t N is the total number of threads on the machine, not per
> Cpu right ?
> 
> 3. One more question I have is, how to set the correct -t N for mpi based
> application.
> Let's say on the 2-cpu machine with 4 cores per CPU, should I use -t
> 4 or -t 8 if I rum my application with 2 mpi processes :
> mpirun -n 2 myprogram
> 
> Many thanks !
> 
> Sent from Yahoo Mail on Android
> 
> On Thu, Jun 29, 2017 at 22:20, R. Clint Whaley
> <wh...@my...<mailto:wh...@my...>> wrote:
> Hyperthreading is an optimization aimed at addressing poorly optimized
> code. The idea is that most codes cannot drive the backend hardware
> (ALU/FPU, etc) at the maximal rate, so if you duplicate registers you
> can, amongst several threads, find enough work to keep the backend busy.
> 
> ATLAS (or any optimized linear algebra library) already runs the FPU at
> its maximal rate supported by the cache architecture after cache blocking.
> 
> If you can already drive the backend at >90% of peak, then
> hyperthreading can actually *lose* you performance, as the threads bring
> conflicting data in the cache.
> 
> It's usually not a night and day difference, but I haven't measured it
> in the huge blocking era used by recent developer releases (it may be
> worse there).
> 
> My general recommendation is turn off hyperthreading for highly
> optimized codes, and turn it on for relatively unoptimized codes.
> 
> As to which core IDs correspond to the physical cores, that varies by
> machine. On x86, you can use CPUID to determine that if you are
> super-knowledgeable. I usually just turn it off in the BIOS, because I
> don't like something that may thrash my cache running, even if it might
> occasionally help :)
> 
> Cheers,
> Clint
> 
> On 06/28/2017 10:32 PM, lixin chu via Math-atlas-devel wrote:
> Hello,Would like go check if my understanding is correct for compiling
> Atlas on a machine that has multiple CPUs and hyperthreading.
> I have two types of machine:
> - 2 CPU, each with 4 Core, hyperthreaded, 2 threads per core- 2 CPU,
> each with 8 Cores, hyperthreaded, 2 threads per core
> So when I compile Atlas, is it correct that I should use:
> -tl 8 0,1,2,3,4,5,6,7 and -tl 16 0,1,....15 (assuming the affinity ID
> is from 0-7 and 0-15).
> That means the number 8 or 16 is the total cores on the machine, not
> number of cores per CPU. Am I correct ?
> I also read somewhere saying that Atlas supports Hyperthreading. What
> does this mean ?
> Does this mean:1. I do not need to disable hyperthreading in BIOS (no
> performance difference whether it is enabled or disabled, as long as the
> number of threads and affinity IDs are set correctly when compiling
> Atlas)2. Or I can make use of the hyperthread, that is, -tl 16 and -tl 32 ?
> Thank you very much,
> lixin
> 
> 
> 
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> 
> 
> 
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...<mailto:Mat...@li...>
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
> 
> 
> 
> 
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...<mailto:Mat...@li...>
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
> 
> 
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...<mailto:Mat...@li...>
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
> 
> 
> 
> --
> Jeff Hammond
> jef...@gm...<mailto:jef...@gm...>
> http://jeffhammond.github.io/
> 
> 
> 
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> 
> 
> 
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...<mailto:Mat...@li...>
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
> 
> --
> **********************************************************************
> ** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley<http://www.csc.lsu.edu/~whaley> **
> **********************************************************************
> 
> 
> 
> 
> --
> Jeff Hammond
> jef...@gm...<mailto:jef...@gm...>
> http://jeffhammond.github.io/
> 
> 
-- 
**********************************************************************
** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley **
**********************************************************************
From: Jeff H. <jef...@gm...> - 2017年06月29日 22:16:03
I don't see any negative impact from using HT relative to not using HT, at
least with MKL DGEMM on E5-2699v3 (Haswell). The 0.1-0.5% gain here is
irrelevant and may be due to thermal effects (this box is in my cubicle,
not an air-conditioned machine room).
$ OMP_NUM_THREADS=36 KMP_AFFINITY=scatter,granularity=fine
./dgemm_perf_PMKL.x $((384*40)) $((384*40)) $((384*4))
 BLAS_NAME dim1 dim2 dim3 seconds Gflop/s
Intel MKL (parallel) 15360 15360 1536 0.8582699 844.4612765
Intel MKL (parallel) 15360 15360 1536 0.8627163 840.1089930
HT on
$ OMP_NUM_THREADS=72 KMP_AFFINITY=scatter,granularity=fine
./dgemm_perf_PMKL.x $((384*40)) $((384*40)) $((384*4))
 BLAS_NAME dim1 dim2 dim3 seconds Gflop/s
Intel MKL (parallel) 15360 15360 1536 0.8636520 839.1988073
Intel MKL (parallel) 15360 15360 1536 0.8644268 838.4465853
I would be interested to see folks post data to support the argument
against HT.
Jeff
On Thu, Jun 29, 2017 at 7:57 AM, lixin chu via Math-atlas-devel <
mat...@li...> wrote:
>
> Thank you very much for quick response. Just to check if my understanding
is correct :
>
> 1. By turning off cpuid in bios, I only need to use -t N to build Atlas
right?
>
> 2. The N in -t N is the total number of threads on the machine, not per
Cpu right ?
>
> 3. One more question I have is, how to set the correct -t N for mpi based
application.
> Let's say on the 2-cpu machine with 4 cores per CPU, should I use -t
4 or -t 8 if I rum my application with 2 mpi processes :
> mpirun -n 2 myprogram
>
> Many thanks !
>
> Sent from Yahoo Mail on Android
>
> On Thu, Jun 29, 2017 at 22:20, R. Clint Whaley
> <wh...@my...> wrote:
> Hyperthreading is an optimization aimed at addressing poorly optimized
> code. The idea is that most codes cannot drive the backend hardware
> (ALU/FPU, etc) at the maximal rate, so if you duplicate registers you
> can, amongst several threads, find enough work to keep the backend busy.
>
> ATLAS (or any optimized linear algebra library) already runs the FPU at
> its maximal rate supported by the cache architecture after cache blocking.
>
> If you can already drive the backend at >90% of peak, then
> hyperthreading can actually *lose* you performance, as the threads bring
> conflicting data in the cache.
>
> It's usually not a night and day difference, but I haven't measured it
> in the huge blocking era used by recent developer releases (it may be
> worse there).
>
> My general recommendation is turn off hyperthreading for highly
> optimized codes, and turn it on for relatively unoptimized codes.
>
> As to which core IDs correspond to the physical cores, that varies by
> machine. On x86, you can use CPUID to determine that if you are
> super-knowledgeable. I usually just turn it off in the BIOS, because I
> don't like something that may thrash my cache running, even if it might
> occasionally help :)
>
> Cheers,
> Clint
>
> On 06/28/2017 10:32 PM, lixin chu via Math-atlas-devel wrote:
> > Hello,Would like go check if my understanding is correct for compiling
Atlas on a machine that has multiple CPUs and hyperthreading.
> > I have two types of machine:
> > - 2 CPU, each with 4 Core, hyperthreaded, 2 threads per core- 2 CPU,
each with 8 Cores, hyperthreaded, 2 threads per core
> > So when I compile Atlas, is it correct that I should use:
> > -tl 8 0,1,2,3,4,5,6,7 and -tl 16 0,1,....15 (assuming the affinity ID
is from 0-7 and 0-15).
> > That means the number 8 or 16 is the total cores on the machine, not
number of cores per CPU. Am I correct ?
> > I also read somewhere saying that Atlas supports Hyperthreading. What
does this mean ?
> > Does this mean:1. I do not need to disable hyperthreading in BIOS (no
performance difference whether it is enabled or disabled, as long as the
number of threads and affinity IDs are set correctly when compiling
Atlas)2. Or I can make use of the hyperthread, that is, -tl 16 and -tl 32 ?
> > Thank you very much,
> > lixin
> >
> >
> >
> >
------------------------------------------------------------------------------
> > Check out the vibrant tech community on one of the world's most
> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> >
> >
> >
> > _______________________________________________
> > Math-atlas-devel mailing list
> > Mat...@li...
> > https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>
> >
>
>
>
------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>
>
>
------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>
--
Jeff Hammond
jef...@gm...
http://jeffhammond.github.io/
From: lixin c. <lix...@ya...> - 2017年06月29日 15:12:02
Thank you very much for quick response. Just to check if my understanding is correct :
1. By turning off cpuid in bios, I only need to use -t N to build Atlas right?
2. The N in -t N is the total number of threads on the machine, not per Cpu right ?
3. One more question I have is, how to set the correct -t N for mpi based application.   Let's say on the 2-cpu machine with 4 cores per CPU, should I use -t 4 or -t 8 if I rum my application with 2 mpi processes :   mpirun -n 2 myprogram 
Many thanks !
Sent from Yahoo Mail on Android 
 
 On Thu, Jun 29, 2017 at 22:20, R. Clint Whaley<wh...@my...> wrote: Hyperthreading is an optimization aimed at addressing poorly optimized 
code. The idea is that most codes cannot drive the backend hardware 
(ALU/FPU, etc) at the maximal rate, so if you duplicate registers you 
can, amongst several threads, find enough work to keep the backend busy.
ATLAS (or any optimized linear algebra library) already runs the FPU at 
its maximal rate supported by the cache architecture after cache blocking.
If you can already drive the backend at >90% of peak, then 
hyperthreading can actually *lose* you performance, as the threads bring 
conflicting data in the cache.
It's usually not a night and day difference, but I haven't measured it 
in the huge blocking era used by recent developer releases (it may be 
worse there).
My general recommendation is turn off hyperthreading for highly 
optimized codes, and turn it on for relatively unoptimized codes.
As to which core IDs correspond to the physical cores, that varies by 
machine. On x86, you can use CPUID to determine that if you are 
super-knowledgeable. I usually just turn it off in the BIOS, because I 
don't like something that may thrash my cache running, even if it might 
occasionally help :)
Cheers,
Clint
On 06/28/2017 10:32 PM, lixin chu via Math-atlas-devel wrote:
> Hello,Would like go check if my understanding is correct for compiling Atlas on a machine that has multiple CPUs and hyperthreading.
> I have two types of machine:
> - 2 CPU, each with 4 Core, hyperthreaded, 2 threads per core- 2 CPU, each with 8 Cores, hyperthreaded, 2 threads per core
> So when I compile Atlas, is it correct that I should use:
> -tl 8 0,1,2,3,4,5,6,7 and -tl 16 0,1,....15 (assuming the affinity ID is from 0-7 and 0-15).
> That means the number 8 or 16 is the total cores on the machine, not number of cores per CPU. Am I correct ?
> I also read somewhere saying that Atlas supports Hyperthreading. What does this mean ?
> Does this mean:1. I do not need to disable hyperthreading in BIOS (no performance difference whether it is enabled or disabled, as long as the number of threads and affinity IDs are set correctly when compiling Atlas)2. Or I can make use of the hyperthread, that is, -tl 16 and -tl 32 ?
> Thank you very much,
> lixin
> 
> 
> 
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> 
> 
> 
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
> 
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Math-atlas-devel mailing list
Mat...@li...
https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
 
From: R. C. W. <wh...@my...> - 2017年06月29日 14:20:34
Hyperthreading is an optimization aimed at addressing poorly optimized 
code. The idea is that most codes cannot drive the backend hardware 
(ALU/FPU, etc) at the maximal rate, so if you duplicate registers you 
can, amongst several threads, find enough work to keep the backend busy.
ATLAS (or any optimized linear algebra library) already runs the FPU at 
its maximal rate supported by the cache architecture after cache blocking.
If you can already drive the backend at >90% of peak, then 
hyperthreading can actually *lose* you performance, as the threads bring 
conflicting data in the cache.
It's usually not a night and day difference, but I haven't measured it 
in the huge blocking era used by recent developer releases (it may be 
worse there).
My general recommendation is turn off hyperthreading for highly 
optimized codes, and turn it on for relatively unoptimized codes.
As to which core IDs correspond to the physical cores, that varies by 
machine. On x86, you can use CPUID to determine that if you are 
super-knowledgeable. I usually just turn it off in the BIOS, because I 
don't like something that may thrash my cache running, even if it might 
occasionally help :)
Cheers,
Clint
On 06/28/2017 10:32 PM, lixin chu via Math-atlas-devel wrote:
> Hello,Would like go check if my understanding is correct for compiling Atlas on a machine that has multiple CPUs and hyperthreading.
> I have two types of machine:
> - 2 CPU, each with 4 Core, hyperthreaded, 2 threads per core- 2 CPU, each with 8 Cores, hyperthreaded, 2 threads per core
> So when I compile Atlas, is it correct that I should use:
> -tl 8 0,1,2,3,4,5,6,7 and -tl 16 0,1,....15 (assuming the affinity ID is from 0-7 and 0-15).
> That means the number 8 or 16 is the total cores on the machine, not number of cores per CPU. Am I correct ?
> I also read somewhere saying that Atlas supports Hyperthreading. What does this mean ?
> Does this mean:1. I do not need to disable hyperthreading in BIOS (no performance difference whether it is enabled or disabled, as long as the number of threads and affinity IDs are set correctly when compiling Atlas)2. Or I can make use of the hyperthread, that is, -tl 16 and -tl 32 ?
> Thank you very much,
> lixin
> 
> 
> 
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> 
> 
> 
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
> 
From: lixin c. <lix...@ya...> - 2017年06月29日 03:32:37
Hello,Would like go check if my understanding is correct for compiling Atlas on a machine that has multiple CPUs and hyperthreading.
I have two types of machine:
- 2 CPU, each with 4 Core, hyperthreaded, 2 threads per core- 2 CPU, each with 8 Cores, hyperthreaded, 2 threads per core
So when I compile Atlas, is it correct that I should use:
-tl 8 0,1,2,3,4,5,6,7 and -tl 16 0,1,....15 (assuming the affinity ID is from 0-7 and 0-15).
That means the number 8 or 16 is the total cores on the machine, not number of cores per CPU. Am I correct ?
I also read somewhere saying that Atlas supports Hyperthreading. What does this mean ?
Does this mean:1. I do not need to disable hyperthreading in BIOS (no performance difference whether it is enabled or disabled, as long as the number of threads and affinity IDs are set correctly when compiling Atlas)2. Or I can make use of the hyperthread, that is, -tl 16 and -tl 32 ?
Thank you very much,
lixin
From: R. C. W. <rcw...@ls...> - 2017年03月20日 13:47:19
So far, it still must be compile-time chosen. We need it for affinity, 
which is necessary when OS does a poor job of managing the threads.
Eventually I may be able to support run-time choice for the OpenMP 
implementation, which has its own scheduler (though in the cases where 
ATLAS used affinity in past it got horrible performance). Right now, I 
have not yet gotten time to look at that part of the threading package, 
as I'm in the middle of big kernel redesign still.
Regards,
Clint
On 03/19/2017 12:20 PM, José Luis García Pallero wrote:
> Hello:
>
> I've not used ATLAS for a while and I would like to ask if the library
> has yet the ability to select the number of execution thread at
> execution time instead of at compilation time. I remember that this
> feature was discussed in the past, but I'm not sure if finally it was
> considered for the future
>
> Thanks
>
-- 
**********************************************************************
** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley **
**********************************************************************
From: José L. G. P. <jgp...@gm...> - 2017年03月19日 17:21:06
Hello:
I've not used ATLAS for a while and I would like to ask if the library
has yet the ability to select the number of execution thread at
execution time instead of at compilation time. I remember that this
feature was discussed in the past, but I'm not sure if finally it was
considered for the future
Thanks
-- 
*****************************************
José Luis García Pallero
jgp...@gm...
(o<
/ / \
V_/_
Use Debian GNU/Linux and enjoy!
*****************************************
From: Jeff H. <jef...@gm...> - 2017年01月18日 19:30:22
I have no idea why this email is full of formatting puke but if it is my
fault, I sincerely apologize. Gmail has been going downhill for a while.
Jeff
On Wed, Jan 18, 2017 at 11:26 AM, Jeff Hammond <jef...@gm...>
wrote:
>
> <div dir="ltr"><br><div class="gmail_extra"><br><div
> class="gmail_quote">On Wed, Jan 18, 2017 at 4:31 AM, john skaller
> <span dir="ltr">&lt;<a href="mailto:sk...@us..."
> target="_blank">sk...@us...</a>&gt;</span>
> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0
> .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
> class="">&gt;<br>
> &gt; Who would demand this?&nbsp; No one in the Windows world cares
> about C99.&nbsp; The only folks I know who want MSVC to support C99
> are HPC developers who still think Windows support matters.<br>
> &gt;<br>
> <br>
> </span><a href="
https://stackoverflow.com/questions/9610747/which-c99-features-are-available-in-the-ms-visual-studio-compiler
"
> data-saferedirecturl="
https://www.google.com/url?hl=en&amp;q=http://stackoverflow.com/questions/9610747/which-c99-features-are-available-in-the-ms-visual-studio-compiler&amp;source=gmail&amp;ust=1484852481953000&amp;usg=AFQjCNE3-KZfmfFadCsdnsu8tEzyE-juBA
"
> rel="noreferrer"
> target="_blank">http://stackoverflow.com/
<wbr>questions/9610747/which-c99-<wbr>features-are-available-in-the-<wbr>ms-visual-studio-compiler</a><br>
> <span class=""><br></span></blockquote><div><br></div><div>That has
> useful information on it, but doesn't answer the question of who would
> demand C99 support from MSVC.</div><div>&nbsp;</div><blockquote
> class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc
> solid;padding-left:1ex"><span class="">
> &gt; I have no idea why anyone would want long long anyhow.<br>
> &gt; Use intptr_t instead.<br>
> &gt;<br>
> &gt;<br>
> &gt; "long long" must be at least 64-bits, regardless of how wide
> pointers are.&nbsp; On a 32-bit OS, you would see sizeof(long
> long)=2*sizeof(intptr_t), no?<br>
> <br>
> </span>Sure. Is there a HPC computing platform that isn’t 64 bit?<br>
> <span class="im
> HOEnZb"><br></span></blockquote><div><br></div><div>From what I've
> seen on this list, ATLAS is popular with folks that want to run BLAS
> on 32-bit platforms, perhaps in an embedded context. &nbsp;These are
> not supercomputers but performance
> matters.</div><div><br></div><div>Jeff</div><div>&nbsp;</div><blockquote
> class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc
> solid;padding-left:1ex"><span class="im HOEnZb">
> —<br>
> john skaller<br>
> <a href="mailto:sk...@us...">
sk...@us...</a><br>
> <a href="http://felix-lang.org"
> data-saferedirecturl="
https://www.google.com/url?hl=en&amp;q=http://felix-lang.org&amp;source=gmail&amp;ust=1484852481953000&amp;usg=AFQjCNG9D6Tlg8YMMKRW7KWKwmKZSJKOrg
"
> rel="noreferrer" target="_blank">http://felix-lang.org</a><br>
> <br>
> <br>
>
------------------------------<wbr>------------------------------<wbr>------------------<br>
> </span><span class="im HOEnZb">Check out the vibrant tech community on
> one of the world's most<br>
> engaging tech sites, SlashDot.org! <a href="http://sdm.link/slashdot"
> data-saferedirecturl="
https://www.google.com/url?hl=en&amp;q=http://sdm.link/slashdot&amp;source=gmail&amp;ust=1484852481953000&amp;usg=AFQjCNEdgxepWifaye3YEOSX8vHxzwBZKg
"
> rel="noreferrer" target="_blank">http://sdm.link/slashdot</a><br>
> </span><div class="HOEnZb"><div
> class="h5">______________________________<wbr>_________________<br>
> Math-atlas-devel mailing list<br>
> <a href="mailto:Mat...@li...
">Math-atlas-devel@lists.<wbr>sourceforge.net</a><br>
> <a href="https://lists.sourceforge.net/lists/listinfo/math-atlas-devel"
> data-saferedirecturl="
https://www.google.com/url?hl=en&amp;q=https://lists.sourceforge.net/lists/listinfo/math-atlas-devel&amp;source=gmail&amp;ust=1484852481953000&amp;usg=AFQjCNHcMGmclmITvH_FWV-G52_ODGYDZw
"
> rel="noreferrer"
> target="_blank">https://lists.sourceforge.net/
<wbr>lists/listinfo/math-atlas-<wbr>devel</a><br>
> </div></div></blockquote></div><br><br clear="all"><div><br></div>--
> <br><div class="gmail_signature" data-smartmail="gmail_signature">Jeff
> Hammond<br><a href="mailto:jef...@gm..."
> target="_blank">jef...@gm...</a><br><a
> href="https://jeffhammond.github.io/"
> target="_blank">http://jeffhammond.github.io/</a></div>
> </div></div>
--
Jeff Hammond
jef...@gm...
http://jeffhammond.github.io/
From: Jeff H. <jef...@gm...> - 2017年01月18日 19:27:22
<div dir="ltr"><br><div class="gmail_extra"><br><div
class="gmail_quote">On Wed, Jan 18, 2017 at 4:31 AM, john skaller
<span dir="ltr">&lt;<a href="mailto:sk...@us..."
target="_blank">sk...@us...</a>&gt;</span>
wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class="">&gt;<br>
&gt; Who would demand this?&nbsp; No one in the Windows world cares
about C99.&nbsp; The only folks I know who want MSVC to support C99
are HPC developers who still think Windows support matters.<br>
&gt;<br>
<br>
</span><a href="https://stackoverflow.com/questions/9610747/which-c99-features-are-available-in-the-ms-visual-studio-compiler"
data-saferedirecturl="https://www.google.com/url?hl=en&amp;q=http://stackoverflow.com/questions/9610747/which-c99-features-are-available-in-the-ms-visual-studio-compiler&amp;source=gmail&amp;ust=1484852481953000&amp;usg=AFQjCNE3-KZfmfFadCsdnsu8tEzyE-juBA"
rel="noreferrer"
target="_blank">http://stackoverflow.com/<wbr>questions/9610747/which-c99-<wbr>features-are-available-in-the-<wbr>ms-visual-studio-compiler</a><br>
<span class=""><br></span></blockquote><div><br></div><div>That has
useful information on it, but doesn't answer the question of who would
demand C99 support from MSVC.</div><div>&nbsp;</div><blockquote
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex"><span class="">
&gt; I have no idea why anyone would want long long anyhow.<br>
&gt; Use intptr_t instead.<br>
&gt;<br>
&gt;<br>
&gt; "long long" must be at least 64-bits, regardless of how wide
pointers are.&nbsp; On a 32-bit OS, you would see sizeof(long
long)=2*sizeof(intptr_t), no?<br>
<br>
</span>Sure. Is there a HPC computing platform that isn’t 64 bit?<br>
<span class="im
HOEnZb"><br></span></blockquote><div><br></div><div>From what I've
seen on this list, ATLAS is popular with folks that want to run BLAS
on 32-bit platforms, perhaps in an embedded context. &nbsp;These are
not supercomputers but performance
matters.</div><div><br></div><div>Jeff</div><div>&nbsp;</div><blockquote
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex"><span class="im HOEnZb">
—<br>
john skaller<br>
<a href="mailto:sk...@us...">sk...@us...</a><br>
<a href="http://felix-lang.org"
data-saferedirecturl="https://www.google.com/url?hl=en&amp;q=http://felix-lang.org&amp;source=gmail&amp;ust=1484852481953000&amp;usg=AFQjCNG9D6Tlg8YMMKRW7KWKwmKZSJKOrg"
rel="noreferrer" target="_blank">http://felix-lang.org</a><br>
<br>
<br>
------------------------------<wbr>------------------------------<wbr>------------------<br>
</span><span class="im HOEnZb">Check out the vibrant tech community on
one of the world's most<br>
engaging tech sites, SlashDot.org! <a href="http://sdm.link/slashdot"
data-saferedirecturl="https://www.google.com/url?hl=en&amp;q=http://sdm.link/slashdot&amp;source=gmail&amp;ust=1484852481953000&amp;usg=AFQjCNEdgxepWifaye3YEOSX8vHxzwBZKg"
rel="noreferrer" target="_blank">http://sdm.link/slashdot</a><br>
</span><div class="HOEnZb"><div
class="h5">______________________________<wbr>_________________<br>
Math-atlas-devel mailing list<br>
<a href="mailto:Mat...@li...">Math-atlas-devel@lists.<wbr>sourceforge.net</a><br>
<a href="https://lists.sourceforge.net/lists/listinfo/math-atlas-devel"
data-saferedirecturl="https://www.google.com/url?hl=en&amp;q=https://lists.sourceforge.net/lists/listinfo/math-atlas-devel&amp;source=gmail&amp;ust=1484852481953000&amp;usg=AFQjCNHcMGmclmITvH_FWV-G52_ODGYDZw"
rel="noreferrer"
target="_blank">https://lists.sourceforge.net/<wbr>lists/listinfo/math-atlas-<wbr>devel</a><br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>--
<br><div class="gmail_signature" data-smartmail="gmail_signature">Jeff
Hammond<br><a href="mailto:jef...@gm..."
target="_blank">jef...@gm...</a><br><a
href="https://jeffhammond.github.io/"
target="_blank">http://jeffhammond.github.io/</a></div>
</div></div>
From: R. C. W. <rcw...@ls...> - 2017年01月18日 15:43:22
Thanks to everyone on the C99 stuff. After the *very* helpful comments, 
it seems only // is safe, and I've not yet found the courage to start 
using even that. The aesthete in me really wants //, but the engineer 
says "you are planning to break standards compliance for something that 
doesn't appear in the compiled code, and making aesthetic arguments in 
code where you use shifts rather than division & multiplication?" :)
On 01/18/2017 06:31 AM, john skaller wrote:
> Sure. Is there a HPC computing platform that isn’t 64 bit?
For me, at least, ATLAS is not aimed just at HPC computing platforms, 
which are usually adequately served by vendor-supplied BLAS. ATLAS was 
created because I couldn't get BLAS for some platforms I wanted to work 
on. While I don't concentrate on 32-bit, I definitely want everything 
to work there, and design for it. For x86, 32-bit has code size 
implications that may be important if Intel keeps pouring most of their 
engineering into power rather than performance.
Historically, I have tried to support any machine with a pipelined FPU, 
and I think ATLAS has been used (mainly for blocking) on even a few w/o 
an pipelined FPU :)
Cheers,
Clint
-- 
**********************************************************************
** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley **
**********************************************************************
From: john s. <sk...@us...> - 2017年01月18日 12:31:33
> 
> Who would demand this? No one in the Windows world cares about C99. The only folks I know who want MSVC to support C99 are HPC developers who still think Windows support matters.
> 
http://stackoverflow.com/questions/9610747/which-c99-features-are-available-in-the-ms-visual-studio-compiler
> I have no idea why anyone would want long long anyhow.
> Use intptr_t instead.
> 
> 
> "long long" must be at least 64-bits, regardless of how wide pointers are. On a 32-bit OS, you would see sizeof(long long)=2*sizeof(intptr_t), no?
Sure. Is there a HPC computing platform that isn’t 64 bit?
—
john skaller
sk...@us...
http://felix-lang.org
From: Jeff H. <jef...@gm...> - 2017年01月17日 22:21:09
On Sat, Jan 14, 2017 at 1:11 PM, Andrew Reilly <ar...@bi...>
wrote:
>
> Hi Clint,
>
> The two compilers with least support for c99 features that I'm aware of
are MSVC and TI CodeComposer. Both have most of the support for C99
library features, but both (being primarily C++ compilers) don't have good
support for the C99 language features that aren't in C++.
>
> So: you'll find // comments everywhere.
> You'll need a macro to define inline _inline on some systems.
+1 to ATLAS_INLINE macro.
>
> You'll need a macro to define ATLAS_RESTRICT _restrict on at least MSVC.
Alas you can't actually use or redefine the keyword "restrict", because
that is already a magic keyword used in the Windows header files, and some
other Windows magic compilation directives.
+1 to ATLAS_RESTRICT macro.
> I'm fairly sure that modern versions of MSVC support long long int and
%llu, although you might have to spell the former as __int64 on some
versions.
Fixed-width integer types are part of C++11 (
http://en.cppreference.com/w/cpp/types/integer) so I would expect that MSVC
supports them, but I have made no attempts to verify this.
>
> You will need a macro to define snprintf to _snprintf on MSVC, and you'll
need to define _CRT_SECURE_NO_WARNINGS before including any of the standard
headers to turn off the deprecation warnings.
>
> I haven't tried to use _Complex or _Thread_local myself. I have a memory
of _Atomic being supported in many places though. I expect that the others
are too.
>
*_Complex* - This is a C99 feature and I don't know of a compiler that
doesn't support it. However, just to be safe, you should typedef
atlas_complex_{float,double} and follow e.g.
https://stackoverflow.com/questions/1063406/c99-complex-support-with-visual-studio
if C99 support isn't available.
I can't remember what ISO C and Fortran say about the interoperability of
their respective complex types but I doubt it is an issue in practice.
Clint probably knows what works (and doesn't) already anyways.
*_Atomic* - This is a C11 feature and it is a bad one. It is also
completely optional (see __STDC_NO_ATOMICS__). You should use the explicit
types like atomic_int rather than "_Atomic int" and the explicit API (e.g.
atomic_load) rather than relying on operator overloading (ding ding ding -
this is why _Atomic is evil and totally un-C-like).
The Intel compiler supports the explicit C11 atomics API but not _Atomic
and it correctly reports the lack of complete support for C11 atomics via
__STDC_NO_ATOMICS__, so you have to explicitly test for the explicit API or
query the compiler version macro.
https://github.com/jeffhammond/HPCInfo/blob/master/atomics/ping-pong/c11-ping-pong.c
demonstrates the latter (it also notes a show-stopper GCC bug if you use
mix with OpenMP).
GCC and Clang support both C11 atomics APIs. I have not tested Cray C11
support exhaustively, but they have at least the explicit API.
*_Thread_local* - This is a C11 feature and it is also strictly option (see
__STDC_NO_THREADS__).
I recommend you have a macro ATLAS_THREAD_LOCAL for the C11 _Thread_local,
GCC __thread, MSVC __declspec(thread), and any other implementation-defined
equivalents.
One must be careful when mixing TLS (thread-local storage) specifiers with
different threading models. I don't think one can guarentee that the TLS
attribute associated with C11, GCC and OpenMP are *guaranteed* to work
across C11, POSIX and OpenMP threads.
Best,
Jeff
>
> Cheers,
>
> Andrew Reilly
> M: 0409-824-272
> ar...@bi...
>
>
>
> > On 15 Jan 2017, at 04:35 , R. Clint Whaley <rcw...@ls...> wrote:
> >
> > Guys,
> >
> > In the developer release, I am considering relaxing ATLAS's present
> > strict adherence to ANSI/ISO 9899-1990 standard, so that I can assume
> > stuff from C99. Frankly, the lack // is slowly killing me.
> >
> > Right now, any C99 features are enabled only by macros that can be shut
> > off.
> >
> > There is little benefit aside from aesthetics to this (though safe
> > string ops would be *so* nice), so I don't want to do it if anybody
> > reports using a compiler that doesn't support these features, but I'm
> > thinking that while their might still be some compilers w/o full C99
> > support, they'll all have the features I most want to add.
> >
> > Here's the list of things I'd definitely like to assume support for that
> > I think all compilers support (even likely obscure ones on embedded
> > systems):
> > // style comments
> > inline
> > restrict
> > long long int, %llu
> > Safe string operations, like snprintf (this lack is painful)
> >
> > In addition there are more advanced features that might be useful, but
> > I'm not sure if I can count on them being universally available:
> > _Complex support
> > _Atomic
> > _Thread_local
> >
> > Does anyone have comments on this idea?
> >
> > Thanks,
> > Clint
> >
> > --
> > **********************************************************************
> > ** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley **
> > **********************************************************************
> >
> >
------------------------------------------------------------------------------
> > Developer Access Program for Intel Xeon Phi Processors
> > Access to Intel Xeon Phi processor-based developer platforms.
> > With one year of Intel Parallel Studio XE.
> > Training and support from Colfax.
> > Order your platform today. http://sdm.link/xeonphi
> > _______________________________________________
> > Math-atlas-devel mailing list
> > Mat...@li...
> > https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>
>
>
------------------------------------------------------------------------------
> Developer Access Program for Intel Xeon Phi Processors
> Access to Intel Xeon Phi processor-based developer platforms.
> With one year of Intel Parallel Studio XE.
> Training and support from Colfax.
> Order your platform today. http://sdm.link/xeonphi
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
--
Jeff Hammond
jef...@gm...
http://jeffhammond.github.io/
From: Jeff H. <jef...@gm...> - 2017年01月17日 22:00:36
On Sat, Jan 14, 2017 at 6:33 PM, john skaller <sk...@us...
> wrote:
>
> > On 15 Jan. 2017, at 08:11, Andrew Reilly <ar...@bi...> wrote:
> >
> > Hi Clint,
> >
> > The two compilers with least support for c99 features that I'm aware of
> are MSVC and TI CodeComposer. Both have most of the support for C99
> library features, but both (being primarily C++ compilers) don’t have good
> support for the C99 language features that aren't in C++.
>
> Doesn’t modern MSVC provide full C99 support?
>
>From what I've heard, there has been no progress on this except the cases
where C99 features were added to C++11.
> I though MS caved in to demands?
>
>
Who would demand this? No one in the Windows world cares about C99. The
only folks I know who want MSVC to support C99 are HPC developers who still
think Windows support matters.
>
> > I’m fairly sure that modern versions of MSVC support long long int and
> %llu, although you might have to spell the former as __int64 on some
> versions.
>
>
%llu works for "long long unsigned". For int64_t, you need the PRId64
macro. Since __int64 isn't standard, one does whatever the compiler docs
specify.
> I have no idea why anyone would want long long anyhow.
> Use intptr_t instead.
>
>
"long long" must be at least 64-bits, regardless of how wide pointers are.
On a 32-bit OS, you would see sizeof(long long)=2*sizeof(intptr_t), no?
Jeff, speaking in a strictly personal capacity
>
> —
> john skaller
> sk...@us...
> http://felix-lang.org
>
>
> ------------------------------------------------------------
> ------------------
> Developer Access Program for Intel Xeon Phi Processors
> Access to Intel Xeon Phi processor-based developer platforms.
> With one year of Intel Parallel Studio XE.
> Training and support from Colfax.
> Order your platform today. http://sdm.link/xeonphi
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>
-- 
Jeff Hammond
jef...@gm...
http://jeffhammond.github.io/
From: James C. <cl...@jh...> - 2017年01月17日 21:26:05
>>>>> "RCW" == R Clint Whaley <rcw...@ls...> writes:
RCW> Unfortunately, I can't just macro my way around this lack: supporting
RCW> both snprintf and sprintf doubles all my string handling code, which
RCW> I'm unwilling to do from a code maintenance perspective, so I'll just
RCW> continue with my present C89 behavior there :(
You can always include an snprintf(3) implementation.
The one from musl is small and is licensed MIT.
-JimC
-- 
James Cloos <cl...@jh...> OpenPGP: 0x997A9F17ED7DAEA6
From: J. R. J. <J.R...@ba...> - 2017年01月16日 12:32:32
This is fine for me. You could make use of the feature test macros for C99 to
produce a helpful error if the support you need isn't there.
Jess
On 2017年1月14日, R. Clint Whaley wrote:
> Guys,
>
> In the developer release, I am considering relaxing ATLAS's present
> strict adherence to ANSI/ISO 9899-1990 standard, so that I can assume
> stuff from C99. Frankly, the lack // is slowly killing me.
>
> Right now, any C99 features are enabled only by macros that can be shut
> off.
>
> There is little benefit aside from aesthetics to this (though safe
> string ops would be *so* nice), so I don't want to do it if anybody
> reports using a compiler that doesn't support these features, but I'm
> thinking that while their might still be some compilers w/o full C99
> support, they'll all have the features I most want to add.
>
> Here's the list of things I'd definitely like to assume support for that
> I think all compilers support (even likely obscure ones on embedded
> systems):
> // style comments
> inline
> restrict
> long long int, %llu
> Safe string operations, like snprintf (this lack is painful)
>
> In addition there are more advanced features that might be useful, but
> I'm not sure if I can count on them being universally available:
> _Complex support
> _Atomic
> _Thread_local
>
> Does anyone have comments on this idea?
>
> Thanks,
> Clint
>
>
From: R. C. W. <rcw...@ls...> - 2017年01月15日 16:58:54
Andrew,
On 01/14/2017 03:11 PM, Andrew Reilly wrote:
> Hi Clint,
>
> The two compilers with least support for c99 features that I'm aware of are MSVC and TI CodeComposer. Both have most of the support for C99 library features, but both (being primarily C++ compilers) don't have good support for the C99 language features that aren't in C++.
>
> So: you'll find // comments everywhere.
> You'll need a macro to define inline _inline on some systems.
> You'll need a macro to define ATLAS_RESTRICT _restrict on at least MSVC. Alas you can't actually use or redefine the keyword "restrict", because that is already a magic keyword used in the Windows header files, and some other Windows magic compilation directives.
> I'm fairly sure that modern versions of MSVC support long long int and %llu, although you might have to spell the former as __int64 on some versions.
> You will need a macro to define snprintf to _snprintf on MSVC, and you'll need to define _CRT_SECURE_NO_WARNINGS before including any of the standard headers to turn off the deprecation warnings.
>
> I haven't tried to use _Complex or _Thread_local myself. I have a memory of _Atomic being supported in many places though. I expect that the others are too.
>
Thank you very much for this! It looks like I can't really change much 
about ATLAS's C use, other than allowing myself to use // then. Of my 
proposed list, only this and the safe string functions were things that 
would immediately make my life a lot better, so lack of snprintf support 
is the only real disappointment.
Unfortunately, I can't just macro my way around this lack: supporting 
both snprintf and sprintf doubles all my string handling code, which I'm 
unwilling to do from a code maintenance perspective, so I'll just 
continue with my present C89 behavior there :(
Since I have a soft dependence on gcc for the install, I could still 
switch to snprintf, but the fact that MSVC doesn't support it still 
means I'm less confident that no embedded system has only 1 compiler 
that doesn't support snprintf, and for which modern gcc is not ported, 
so I'll not switch to snprint.
For anyone concerned about security here: The string handling doesn't 
wind up in the ATLAS library, so it's not really a matter for user 
security. I do a lot of string handling during the tuning and generation 
stages, which would either be vastly simplified or made less likely to 
segfault using snprints, which is why I would have liked to make the change.
On the long long type name, I'm already using a macro that can be 
changed to another name, but I had no way to print out such values in 
C89, even though many compilers supported the type, so the fact that llu 
will work is good.
Many thanks,
Clint
> Cheers,
>
> Andrew Reilly
> M: 0409-824-272
> ar...@bi...
>
>
>
>> On 15 Jan 2017, at 04:35 , R. Clint Whaley <rcw...@ls...> wrote:
>>
>> Guys,
>>
>> In the developer release, I am considering relaxing ATLAS's present
>> strict adherence to ANSI/ISO 9899-1990 standard, so that I can assume
>> stuff from C99. Frankly, the lack // is slowly killing me.
>>
>> Right now, any C99 features are enabled only by macros that can be shut
>> off.
>>
>> There is little benefit aside from aesthetics to this (though safe
>> string ops would be *so* nice), so I don't want to do it if anybody
>> reports using a compiler that doesn't support these features, but I'm
>> thinking that while their might still be some compilers w/o full C99
>> support, they'll all have the features I most want to add.
>>
>> Here's the list of things I'd definitely like to assume support for that
>> I think all compilers support (even likely obscure ones on embedded
>> systems):
>> // style comments
>> inline
>> restrict
>> long long int, %llu
>> Safe string operations, like snprintf (this lack is painful)
>>
>> In addition there are more advanced features that might be useful, but
>> I'm not sure if I can count on them being universally available:
>> _Complex support
>> _Atomic
>> _Thread_local
>>
>> Does anyone have comments on this idea?
>>
>> Thanks,
>> Clint
>>
>> --
>> **********************************************************************
>> ** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley **
>> **********************************************************************
>>
>> ------------------------------------------------------------------------------
>> Developer Access Program for Intel Xeon Phi Processors
>> Access to Intel Xeon Phi processor-based developer platforms.
>> With one year of Intel Parallel Studio XE.
>> Training and support from Colfax.
>> Order your platform today. http://sdm.link/xeonphi
>> _______________________________________________
>> Math-atlas-devel mailing list
>> Mat...@li...
>> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>
>
> ------------------------------------------------------------------------------
> Developer Access Program for Intel Xeon Phi Processors
> Access to Intel Xeon Phi processor-based developer platforms.
> With one year of Intel Parallel Studio XE.
> Training and support from Colfax.
> Order your platform today. http://sdm.link/xeonphi
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
>
-- 
**********************************************************************
** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley **
**********************************************************************
From: john s. <sk...@us...> - 2017年01月15日 02:49:43
> On 15 Jan. 2017, at 08:11, Andrew Reilly <ar...@bi...> wrote:
> 
> Hi Clint,
> 
> The two compilers with least support for c99 features that I'm aware of are MSVC and TI CodeComposer. Both have most of the support for C99 library features, but both (being primarily C++ compilers) don’t have good support for the C99 language features that aren't in C++.
Doesn’t modern MSVC provide full C99 support?
I though MS caved in to demands?
> I’m fairly sure that modern versions of MSVC support long long int and %llu, although you might have to spell the former as __int64 on some versions.
I have no idea why anyone would want long long anyhow.
Use intptr_t instead.
—
john skaller
sk...@us...
http://felix-lang.org
From: Andrew R. <ar...@bi...> - 2017年01月14日 21:41:00
Hi Clint,
The two compilers with least support for c99 features that I'm aware of are MSVC and TI CodeComposer. Both have most of the support for C99 library features, but both (being primarily C++ compilers) don't have good support for the C99 language features that aren't in C++.
So: you'll find // comments everywhere.
You'll need a macro to define inline _inline on some systems.
You'll need a macro to define ATLAS_RESTRICT _restrict on at least MSVC. Alas you can't actually use or redefine the keyword "restrict", because that is already a magic keyword used in the Windows header files, and some other Windows magic compilation directives.
I'm fairly sure that modern versions of MSVC support long long int and %llu, although you might have to spell the former as __int64 on some versions.
You will need a macro to define snprintf to _snprintf on MSVC, and you'll need to define _CRT_SECURE_NO_WARNINGS before including any of the standard headers to turn off the deprecation warnings.
I haven't tried to use _Complex or _Thread_local myself. I have a memory of _Atomic being supported in many places though. I expect that the others are too.
Cheers,
Andrew Reilly
M: 0409-824-272
ar...@bi...
> On 15 Jan 2017, at 04:35 , R. Clint Whaley <rcw...@ls...> wrote:
> 
> Guys,
> 
> In the developer release, I am considering relaxing ATLAS's present 
> strict adherence to ANSI/ISO 9899-1990 standard, so that I can assume 
> stuff from C99. Frankly, the lack // is slowly killing me.
> 
> Right now, any C99 features are enabled only by macros that can be shut 
> off.
> 
> There is little benefit aside from aesthetics to this (though safe 
> string ops would be *so* nice), so I don't want to do it if anybody 
> reports using a compiler that doesn't support these features, but I'm 
> thinking that while their might still be some compilers w/o full C99 
> support, they'll all have the features I most want to add.
> 
> Here's the list of things I'd definitely like to assume support for that 
> I think all compilers support (even likely obscure ones on embedded 
> systems):
> // style comments
> inline
> restrict
> long long int, %llu
> Safe string operations, like snprintf (this lack is painful)
> 
> In addition there are more advanced features that might be useful, but 
> I'm not sure if I can count on them being universally available:
> _Complex support
> _Atomic
> _Thread_local
> 
> Does anyone have comments on this idea?
> 
> Thanks,
> Clint
> 
> -- 
> **********************************************************************
> ** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley **
> **********************************************************************
> 
> ------------------------------------------------------------------------------
> Developer Access Program for Intel Xeon Phi Processors
> Access to Intel Xeon Phi processor-based developer platforms.
> With one year of Intel Parallel Studio XE.
> Training and support from Colfax.
> Order your platform today. http://sdm.link/xeonphi
> _______________________________________________
> Math-atlas-devel mailing list
> Mat...@li...
> https://lists.sourceforge.net/lists/listinfo/math-atlas-devel
9 messages has been excluded from this view by a project administrator.

Showing results of 2001

1 2 3 .. 81 > >> (Page 1 of 81)
Thanks for helping keep SourceForge clean.
X





Briefly describe the problem (required):
Upload screenshot of ad (required):
Select a file, or drag & drop file here.
Screenshot instructions:

Click URL instructions:
Right-click on the ad, choose "Copy Link", then paste here →
(This may not be possible with some types of ads)

More information about our ad policies

Ad destination/click URL:

AltStyle によって変換されたページ (->オリジナル) /