Commonmark migration

edited Jun 10, 2020 at 13:24

Profile

The first step in improving performance is to profile the code. I recommend running it with Xcode's Profile action (which launches Instruments) and seeing where the time is spent. I suspect, but don't know for sure, that most of it will be in the calls to sin() and cos().
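Instruments tells you where the time goes. Once you start changing the code, a rough timer is handy for checking whether a change actually helped. This is a minimal sketch assuming Swift. `bestTime` is a hypothetical helper, not part of any Apple API, and it only measures wall-clock time, so it complements the profiler rather than replacing it.

```swift
import Dispatch

/// Hypothetical helper: runs `body` `iterations` times and returns the
/// fastest wall-clock time in seconds. Taking the minimum reduces noise
/// from other processes, but it says nothing about *where* time is spent.
func bestTime(iterations: Int = 5, _ body: () -> Void) -> Double {
    var best = Double.infinity
    for _ in 0..<iterations {
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        let end = DispatchTime.now().uptimeNanoseconds  // monotonic clock, so end >= start
        best = min(best, Double(end - start) / 1e9)
    }
    return best
}
```

Time an optimized (Release) build. Timings from a Debug build can be misleading.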

Avoid Casts

One thing that can slow down calculations is frequent casting between types. You're using k, n, and N as integers in most of the code, but you need to cast them to Double to calculate q. You could keep parallel variables dk, dn, and dN that are floating-point copies of k, n, and N, so the casts disappear from the inner loop. You'll need to increment them manually in the loops, though.
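Here is a minimal sketch of the idea, assuming Swift and a DFT-style phase q = 2πkn/N. The question's actual loop isn't reproduced here, so both functions are hypothetical. The second version converts N once and keeps dk and dn in step with k and n. The Int counters remain in use for indexing.

```swift
/// Hypothetical "before": converts Int -> Double on every inner iteration.
func phasesWithCasts(N: Int) -> [Double] {
    var result = [Double](repeating: 0, count: N * N)
    for k in 0..<N {
        for n in 0..<N {
            result[k * N + n] = 2.0 * Double.pi * Double(k) * Double(n) / Double(N)
        }
    }
    return result
}

/// Hypothetical "after": dN is converted once; dk and dn are floating-point
/// copies of k and n that are incremented by hand alongside them.
func phasesWithoutCasts(N: Int) -> [Double] {
    var result = [Double](repeating: 0, count: N * N)
    let dN = Double(N)              // one conversion, outside the loops
    var dk = 0.0
    for k in 0..<N {
        var dn = 0.0
        for n in 0..<N {
            result[k * N + n] = 2.0 * Double.pi * dk * dn / dN
            dn += 1.0               // keep dn == Double(n)
        }
        dk += 1.0                   // keep dk == Double(k)
    }
    return result
}
```

dk and dn only ever hold whole numbers, which a Double represents exactly far beyond any practical N, so both versions produce identical results. Whether the second is faster depends on what the optimizer already does, so confirm it with the profiler.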

Do more at once

If you look in <Accelerate/vfp.h>, you'll find vsinf() and vcosf() and, more importantly, vsincosf(). These calculate the sine, the cosine, and both at once, respectively, for a vector of Floats. Their precision is lower than Double's, so I don't know whether it meets your needs, but I'd look into it. Each vFloat holds four Floats, so this should let you compute four values per call instead of only one.
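Here is a sketch of the batching idea in Swift. It swaps in vForce's vvsincos, a different Accelerate function that works on whole arrays of Double, instead of the vfp.h vsincosf described above. That sidesteps the Float precision question. The `sinCos(of:)` wrapper is hypothetical, and the scalar fallback exists only so the sketch also builds where Accelerate is unavailable.

```swift
import Foundation
#if canImport(Accelerate)
import Accelerate
#endif

/// Hypothetical wrapper: sine and cosine of every element of `angles`.
/// On Apple platforms a single vvsincos call handles the whole array;
/// elsewhere it falls back to one sin() and one cos() per element.
func sinCos(of angles: [Double]) -> (sines: [Double], cosines: [Double]) {
    var sines = [Double](repeating: 0, count: angles.count)
    var cosines = [Double](repeating: 0, count: angles.count)
    #if canImport(Accelerate)
    var n = Int32(angles.count)
    vvsincos(&sines, &cosines, angles, &n)   // outputs: sines, cosines; input: angles
    #else
    for i in angles.indices {
        sines[i] = sin(angles[i])
        cosines[i] = cos(angles[i])
    }
    #endif
    return (sines, cosines)
}
```

One way to use it would be to fill an array with all the angles for one row of the loop and then make a single call per row, instead of calling sin() and cos() once per element.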

answered Jan 31, 2017 at 4:37

user1118321
