math-atlas-devel — ATLAS developers' list

Showing 8 results of 8

From: R. C. W. <rcw...@ls...> - 2013-11-26 22:23:21
Guys,
I've released 3.11.22. Other than bugfixes, its main extension is that 
the max complex blocking factor is now tuned independently of the real. 
 The performance difference on most machines is minuscule, but it could 
be key on machines that lack an L3.
Cheers,
Clint
ATLAS 3.11.22 released 11/26/13, highlights of changes from 3.11.21:
 * Changed it so complex block-major gemm installed for non-default installs
 * Changed it so ARM block-major gemm kernels default to HARDFP ABI
 * Added NB tuning for complex access-major gemm
 * Uglied up atlas_install to avoid gcc's unalterable BS warnings
 * Updated archdefs for Corei364AVXMAC
 * Plugged several one-time mem leaks in lanbsrch
 * Added basic config support for cross-compilation
 * Updated complex cmat2blk to correct prototype & type def for complex
 * Rakib wrote cmat2blk complex
 * Changed emit_uamm to handle multiple installs
 * Boatload of TI_C99_BM accelerator patches from Tony Castaldo
From: R. C. W. <rcw...@ls...> - 2013-11-16 19:48:33
Guys,
I have released 3.11.21, which fixes some widespread K-cleanup bugs. I 
have also gotten the new access-major gemm working with archdefs, though 
I have added only archdefs for one machine. Right now, even using 
archdefs does a bunch of unnecessary timings; some of these can be 
eliminated later once I have finalized the ammm tuning process.
Cheers,
Clint
ATLAS 3.11.21 released 11/16/13, highlights of changes from 3.11.20:
 * Made it so AMMM result files are included in archdefs
 * Added ammm archdefs for Corei264AVX
 * Fixed error in ammm (all precisions) for K-cleanup
-- 
**********************************************************************
** R. Clint Whaley, PhD * Assoc Prof, LSU * www.csc.lsu.edu/~whaley **
**********************************************************************
From: R. C. W. <rcw...@ls...> - 2013-11-09 01:20:24
OK, here are the numbers for 3.5 GHz Haswell. This machine is just 
ridiculous. A kernel I wrote for the AMD gets > 90% peak, and the peak 
is eye-poppingly high.
The block major does so poorly because I don't have an fma3 kernel for it.
Cheers,
Clint
 ARCH = Corei364AVXMAC
 ARCHDEFS = -DATL_OS_Linux -DATL_ARCH_Corei3 -DATL_CPUMHZ=3500 -DATL_AVXMAC -DATL_AVX -DATL_SSE3 -DATL_SSE2 -DATL_SSE1 -DATL_USE64BITS -DATL_GAS_x8664
drteeth>./xdmmtst_amm2 -N 2000 8000 2000 ; ./xzmmtst_amm2 -N 2000 8000 2000 ; ./xsmmtst_amm2 -N 2000 8000 2000 ; ./xcmmtst_amm2 -N 2000 8000 2000
TEST TA TB    M    N    K alpha  beta   Time    Mflop SpUp PASS
==== == == ==== ==== ==== ===== ===== ====== ======== ==== ====
   1  N  N 2000 2000 2000   1.0   1.0   0.63  25378.8 1.00  ---
   1  N  N 2000 2000 2000   1.0   1.0   0.33  48748.4 1.92  YES
   2  N  N 4000 4000 4000   1.0   1.0   4.96  25813.8 1.00  ---
   2  N  N 4000 4000 4000   1.0   1.0   2.56  50022.4 1.94  YES
   3  N  N 6000 6000 6000   1.0   1.0  16.66  25924.6 1.00  ---
   3  N  N 6000 6000 6000   1.0   1.0   8.68  49771.5 1.92  YES
   4  N  N 8000 8000 8000   1.0   1.0  39.44  25964.5 1.00  ---
   4  N  N 8000 8000 8000   1.0   1.0  20.26  50533.5 1.95  YES
NTEST=4, NUMBER PASSED=4, NUMBER FAILURES=0
99.096u 0.612s 1:39.90 99.7%	0+0k 0+0io 0pf+0w
TEST TA TB    M    N    K ralph ialph rbeta ibeta   Time    Mflop SpUp PASS
==== == == ==== ==== ==== ===== ===== ===== ===== ====== ======== ==== ====
   1  N  N 2000 2000 2000   1.0   0.0   1.0   0.0   2.56  25012.4 1.00  ---
   1  N  N 2000 2000 2000   1.0   0.0   1.0   0.0   1.28  50087.6 2.00  YES
   2  N  N 4000 4000 4000   1.0   0.0   1.0   0.0  23.01  22255.7 1.00  ---
   2  N  N 4000 4000 4000   1.0   0.0   1.0   0.0  10.22  50086.2 2.25  YES
   3  N  N 6000 6000 6000   1.0   0.0   1.0   0.0  79.11  21842.5 1.00  ---
   3  N  N 6000 6000 6000   1.0   0.0   1.0   0.0  34.33  50332.4 2.30  YES
   4  N  N 8000 8000 8000   1.0   0.0   1.0   0.0 189.00  21671.9 1.00  ---
   4  N  N 8000 8000 8000   1.0   0.0   1.0   0.0  81.78  50088.6 2.31  YES
NTEST=4, NUMBER PASSED=4, NUMBER FAILURES=0
429.152u 2.068s 7:12.07 99.8%	0+0k 0+0io 0pf+0w
TEST TA TB    M    N    K alpha  beta   Time    Mflop SpUp PASS
==== == == ==== ==== ==== ===== ===== ====== ======== ==== ====
   1  N  N 2000 2000 2000   1.0   1.0   0.33  48341.8 1.00  ---
   1  N  N 2000 2000 2000   1.0   1.0   0.16  97504.0 2.02  YES
   2  N  N 4000 4000 4000   1.0   1.0   2.59  49328.7 1.00  ---
   2  N  N 4000 4000 4000   1.0   1.0   1.26 101739.0 2.06  YES
   3  N  N 6000 6000 6000   1.0   1.0   8.75  49384.9 1.00  ---
   3  N  N 6000 6000 6000   1.0   1.0   4.21 102530.8 2.08  YES
   4  N  N 8000 8000 8000   1.0   1.0  20.65  49579.9 1.00  ---
   4  N  N 8000 8000 8000   1.0   1.0   9.90 103476.6 2.09  YES
NTEST=4, NUMBER PASSED=4, NUMBER FAILURES=0
TEST TA TB    M    N    K ralph ialph rbeta ibeta   Time    Mflop SpUp PASS
==== == == ==== ==== ==== ===== ===== ===== ===== ====== ======== ==== ====
   1  N  N 2000 2000 2000   1.0   0.0   1.0   0.0   1.33  48084.7 1.00  ---
   1  N  N 2000 2000 2000   1.0   0.0   1.0   0.0   0.64 100767.1 2.10  YES
   2  N  N 4000 4000 4000   1.0   0.0   1.0   0.0  10.74  47658.0 1.00  ---
   2  N  N 4000 4000 4000   1.0   0.0   1.0   0.0   4.97 103009.8 2.16  YES
   3  N  N 6000 6000 6000   1.0   0.0   1.0   0.0  40.99  42160.7 1.00  ---
   3  N  N 6000 6000 6000   1.0   0.0   1.0   0.0  16.86 102480.4 2.43  YES
   4  N  N 8000 8000 8000   1.0   0.0   1.0   0.0  97.79  41887.5 1.00  ---
   4  N  N 8000 8000 8000   1.0   0.0   1.0   0.0  39.11 104731.5 2.50  YES
NTEST=4, NUMBER PASSED=4, NUMBER FAILURES=0
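For readers who want to check the "> 90% peak" remark against the tables, here is a rough sketch of the arithmetic in Python. It assumes the usual flop counts for GEMM (2*N^3 for real, 8*N^3 for complex), a single core running at the 3500 MHz given by ATL_CPUMHZ, and a Haswell peak of 16 double-precision or 32 single-precision flops per cycle (two 256-bit FMA units). The function names are made up for illustration and are not part of ATLAS.

```python
def gemm_mflops(n, seconds, is_complex=False):
    """Mflop rate of an n x n x n GEMM, assuming 2*n^3 flops for real
    data and 8*n^3 for complex data."""
    flops = (8 if is_complex else 2) * n ** 3
    return flops / seconds / 1e6


def percent_of_peak(mflops, mhz, flops_per_cycle):
    """Percentage of theoretical peak, where peak = MHz * flops per cycle.
    The flops-per-cycle figure is an assumption about the hardware."""
    return 100.0 * mflops / (mhz * flops_per_cycle)


if __name__ == "__main__":
    # N=8000 rows of the second (faster) line of each real table above.
    # Recomputing from the rounded Time column gives values close to,
    # but not exactly, the reported Mflop column.
    print("double, recomputed Mflop:", gemm_mflops(8000, 20.26))
    print("double, % of peak:", percent_of_peak(50533.5, 3500, 16))
    print("single, % of peak:", percent_of_peak(103476.6, 3500, 32))
```

Under those assumptions the N=8000 real results come out at roughly 90% of peak in double precision and 92% in single precision, which is consistent with the claim in the message.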
From: R. C. W. <rcw...@ls...> - 2013-11-02 18:30:33
Guys,
I have released 3.11.20. I plugged a memory leak in threaded QR, and 
modernized lanbsrch to work with the new framework. After this, I could 
rerun the archdefs for the lapack header files, and all this together 
seemed to fix the threaded QR performance problems I was seeing.
Unfortunately, to get the improved performance you need to redo the 
archdefs for lapack if you aren't on a Corei264AVX (my desktop arch), 
which is a bit of a pain. The quickest approach is to untar your 
archdef tarfile in ATLAS/CONFIG/ARCHS, delete the files in the 
lapack/gcc directory, and then remake the tar. It's probably not worth 
messing with for most folks, but if QR is very important it might help.
Cheers,
Clint
ATLAS 3.11.20 released 11/02/13, highlights of changes from 3.11.19:
 * Fixed possible memory leak in threaded QR
 * Updated lanbsrch to work with ammm
 * Updated Corei264AVX lapack header archdefs to work with ammm
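The archdef workaround described in this message could be scripted along the following lines. This is only a sketch: it assumes the archdef is a bzip2-compressed tar whose members look like `<ARCH>/lapack/gcc/<file>`, and instead of untarring and retarring on disk it copies the archive member by member and skips the lapack/gcc files, which should have the same effect. The function name and the file names in the usage note are hypothetical.

```python
import os
import tarfile


def strip_lapack_archdefs(src_tar, dst_tar, subdir="lapack/gcc", mode="w:bz2"):
    """Copy src_tar to dst_tar, leaving out regular files under */<subdir>/.

    Directory entries are kept, so the (now empty) directory survives.
    Returns the list of member names that were dropped.
    """
    needle = "/" + subdir.strip("/") + "/"
    dropped = []
    with tarfile.open(src_tar, "r:*") as src, tarfile.open(dst_tar, mode) as dst:
        for member in src.getmembers():
            # Prefix with "/" so a member named "lapack/gcc/x" also matches.
            if member.isfile() and needle in "/" + member.name:
                dropped.append(member.name)
                continue
            # Regular files need their data stream; other entries do not.
            fileobj = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, fileobj)
    return dropped
```

A call such as `strip_lapack_archdefs("MyArch.tar.bz2", "MyArch.new.tar.bz2")`, run inside ATLAS/CONFIG/ARCHS, would write a new tarfile next to the old one; replacing the original with it is left as a manual step so that nothing is overwritten by accident.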
From: R. C. W. <rcw...@ls...> - 2013-11-02 04:17:35
Guys,
I've released 3.11.19. The main work is in reducing the amount of 
workspace the new framework allocates. I had first removed the 
dependencies on the block-major stuff in the parallel BLAS, which left 
them working in a simplified way. Then, I noticed that I had a parallel 
performance regression in QR, which is probably related to not re-tuning 
NB for the new framework. I started a big tuning job, and had to hard 
reset the machine due to swapping making it impossible to type.
This was my big clue I needed to reduce workspace being used in the new 
GEMM. I have still not ensured that the parallel stuff is as fast as it 
should be, will return to that later.
Cheers,
Clint
ATLAS 3.11.19 released 11/01/13, highlights of changes from 3.11.18:
 * Removed block-major GEMM dep from all threading code
 * Performed recursion for K > 3000 in order to put a ceiling on workspace in ammm
 * Added ammm MNK loop order to save workspace for non-square GEMM
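To illustrate the K-recursion item above, here is a toy Python sketch of the general idea, not ATLAS's actual code. It assumes that per-call workspace grows with K, so splitting the K dimension in half until each piece is at most `kmax` bounds the workspace while still accumulating the same result into C. The function names and the plain-list matrix representation are illustrative only; the 3000 default is the threshold mentioned in the release notes.

```python
def gemm_acc(A, B, C, k0, k1):
    """C += A[:, k0:k1] * B[k0:k1, :] with plain loops (lists of lists)."""
    for i in range(len(C)):
        for j in range(len(C[0])):
            s = 0.0
            for k in range(k0, k1):
                s += A[i][k] * B[k][j]
            C[i][j] += s


def gemm_k_recursive(A, B, C, k0=0, k1=None, kmax=3000, chunks=None):
    """Accumulate A*B into C, halving the K range until each piece
    is at most kmax wide. Returns the list of (k0, k1) pieces used."""
    if k1 is None:
        k1 = len(B)
    if chunks is None:
        chunks = []
    if k1 - k0 > kmax:
        mid = k0 + (k1 - k0) // 2
        gemm_k_recursive(A, B, C, k0, mid, kmax, chunks)
        gemm_k_recursive(A, B, C, mid, k1, kmax, chunks)
    else:
        # In a real library this is where the K-sized workspace
        # would be allocated and the tuned kernel called.
        gemm_acc(A, B, C, k0, k1)
        chunks.append((k0, k1))
    return chunks
```

Because each piece only adds its partial product into C, the pieces can be processed one after another and the workspace for one piece can be reused for the next.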
From: Dmitri A. S. <das...@gm...> - 2013-11-02 03:39:44
On Fri, Nov 1, 2013 at 8:00 PM, <mic...@th...> wrote:
> Here is my cpuinfo:
>
> cat /proc/cpuinfo
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 15
> model : 65
> model name : Dual-Core AMD Opteron(tm) Processor 8220
> stepping : 3
> cpu MHz : 1000.000
>
So, here is the problem: the Opteron 8220 should run at 2.8 GHz, not at 1 GHz.
That suggests that throttling is in effect.
Dmitri.
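The comparison made above (reported "cpu MHz" against the rated clock) can be sketched in a few lines of Python. This is a rough illustration, not what ATLAS's configure does: the function names and the 5% tolerance are assumptions, and the rated 2800 MHz is the figure quoted in this message. Note that a low reading on an idle machine may simply reflect frequency scaling.

```python
def parse_cpu_mhz(cpuinfo_text):
    """Return [(processor_number, mhz), ...] from /proc/cpuinfo-style text."""
    result = []
    proc = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without "key : value"
        key, value = key.strip(), value.strip()
        if key == "processor":
            proc = int(value)
        elif key == "cpu MHz" and proc is not None:
            result.append((proc, float(value)))
    return result


def slow_processors(cpuinfo_text, rated_mhz, tolerance=0.05):
    """Processors whose reported MHz is more than `tolerance` below rated."""
    floor = rated_mhz * (1.0 - tolerance)
    return [p for p, mhz in parse_cpu_mhz(cpuinfo_text) if mhz < floor]


if __name__ == "__main__":
    sample = (
        "processor : 0\n"
        "model name : Dual-Core AMD Opteron(tm) Processor 8220\n"
        "cpu MHz : 1000.000\n"
    )
    print(slow_processors(sample, rated_mhz=2800))
```

On a live Linux system the text would come from reading /proc/cpuinfo; the sample string here just mirrors the fields posted in the thread.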
From: <mic...@th...> - 2013-11-02 01:13:53
Here is my cpuinfo:
cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8220
stepping : 3
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 2009.31
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8220
stepping : 3
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 1
siblings : 2
core id : 0
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 2009.31
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
processor : 2
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8220
stepping : 3
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 2
siblings : 2
core id : 0
cpu cores : 2
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 2009.31
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
processor : 3
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8220
stepping : 3
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 2
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 2009.31
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
processor : 4
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8220
stepping : 3
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 2009.31
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
processor : 5
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8220
stepping : 3
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 1
siblings : 2
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 2009.31
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
processor : 6
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8220
stepping : 3
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 2
siblings : 2
core id : 1
cpu cores : 2
apicid : 5
initial apicid : 5
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 2009.31
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
processor : 7
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8220
stepping : 3
cpu MHz : 1000.000
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 1
cpu cores : 2
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 2009.31
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
Thanks,
Mike
From: "Dmitri A. Sergatskov" <das...@gm...>
Reply-To: "List for developer discussion, NOT SUPPORT." <mat...@li...>
Date: Wednesday, October 30, 2013 4:47 PM
To: "List for developer discussion, NOT SUPPORT." <mat...@li...>
Subject: Re: [atlas-devel] Atlas 3.1.10 not even trying to build on Suse
Please post some hardware info...
At least results of "cat /proc/cpuinfo"
("cpupower frequency-info" would be nice too)
Most likely you have a M/B that enables throttling no matter what.
(I have one of those.)
Dmitri.
--
On Wed, Oct 30, 2013 at 3:15 PM, <mic...@th...> wrote:
CPU throttling on Suse linux enterprise server 11
We have this on the server:
cat /proc/acpi/processor/*/info | grep throttling
throttling control: no
throttling control: no
throttling control: no
throttling control: no
throttling control: no
throttling control: no
throttling control: no
throttling control: no
But I get from Configure on Atlas 3.10.0:
CPU Throttling apparently enabled!
Aborting...
Not sure how to resolve. Any ideas appreciated.
Thanks,
Mike