Add cholesky and gradients #1492
Conversation
googlebot
commented
Jan 10, 2019
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
📝 Please visit https://cla.developers.google.com/ to sign.
Once you've signed (or fixed any issues), please reply here (e.g. "I signed it!") and we'll verify it.
What to do if you already signed the CLA
Individual signers
- It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.
Corporate signers
- Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (Public version).
- The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
- The email used to register you as an authorized contributor must also be attached to your GitHub account.
@ethanluoyc Have you looked at PR #1356? It already contains a Cholesky decomposition. Your implementation is a lot cleaner and uses TFJS methods as building blocks (roughly along the lines of the sketch below), which is a big upside. But as of yet, PR #1356 seems to be significantly faster. The benchmark is done on the CPU backend, since PR #1492 runs faster there than on WebGL on my machine. Feel free to double-check the benchmark, since I'm known to make mistakes and mix things up.
P.S.: I hope it's okay that I included your code in the benchmark. I specifically mentioned the source and exempted it from the license.
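For reference, a building-block version along those lines might look roughly like this. It is a minimal right-looking sketch, not the exact code from either PR, and it assumes tf.slice and tf.concat tolerate zero-size tensors on the last step:

```ts
import * as tf from '@tensorflow/tfjs';

// A right-looking (outer-product) Cholesky built from tfjs ops.
// Each step computes one column of L and applies a rank-1 update
// to the trailing submatrix.
function cholesky(a: tf.Tensor2D): tf.Tensor2D {
  return tf.tidy(() => {
    const n = a.shape[0];
    const cols: tf.Tensor2D[] = [];
    let rem = a; // trailing (n-j) x (n-j) submatrix
    for (let j = 0; j < n; j++) {
      const m = n - j - 1;
      const pivot = rem.slice([0, 0], [1, 1]).sqrt();     // l_jj = sqrt(a_jj)
      const below = rem.slice([1, 0], [m, 1]).div(pivot); // l_ij = a_ij / l_jj
      // Rank-1 update of the trailing block: A' = A - l·lᵀ
      rem = rem.slice([1, 1], [m, m])
               .sub(tf.matMul(below, below, false, true)) as tf.Tensor2D;
      // Column j of L, zero above the diagonal.
      cols.push(tf.concat([tf.zeros([j, 1]), pivot, below], 0) as tf.Tensor2D);
    }
    return tf.concat(cols, 1) as tf.Tensor2D;
  });
}
```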
Hi @DirkToewe,
Thanks for letting me know about your PR! I checked the issues page but didn't see any mention of Cholesky, so I thought no one had tackled this yet.
The version I currently have here is WIP. One downside is indeed the memory issue; mine has not been optimized for that yet. Maybe we can work together on implementing Cholesky and merge it into the codebase.
BTW, do you have the gradients implemented? I have a preliminary version which I think is correct (more testing is still needed). Another thing I haven't implemented is the blocked version of the algorithm, and the same goes for the gradients. I need the differentiable version because I use it as part of my models (Gaussian processes).
Updates
@DirkToewe I took a closer look at your benchmarks. You appear to be using the algorithm from Wikipedia? I am using the one described in https://homepages.inf.ed.ac.uk/imurray2/pub/16choldiff/choldiff.pdf, which can theoretically be made faster by blocking and in-place updates. I currently don't know the best way to implement them idiomatically in tfjs, but maybe someone from the tfjs team can advise?
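For concreteness, the blocked recursion itself is structurally simple; the open question is which ops to build it from. In this sketch, `triangularSolve(l, b)` is a hypothetical op solving X·Lᵀ = B, and `unblockedCholesky` is any base-case factorization (e.g. the column-by-column version above):

```ts
import * as tf from '@tensorflow/tfjs';

// Assumed helpers (not yet in tfjs): a base-case Cholesky and a
// triangular solve computing X such that X · lᵀ = b.
declare function unblockedCholesky(a: tf.Tensor2D): tf.Tensor2D;
declare function triangularSolve(l: tf.Tensor2D, b: tf.Tensor2D): tf.Tensor2D;

// Blocked Cholesky: factor the leading nb×nb block, solve for the
// panel below it, downdate the trailing block, and recurse.
function blockedCholesky(a: tf.Tensor2D, nb = 32): tf.Tensor2D {
  return tf.tidy(() => {
    const n = a.shape[0];
    if (n <= nb) return unblockedCholesky(a);
    const a11 = a.slice([0, 0], [nb, nb]) as tf.Tensor2D;
    const a21 = a.slice([nb, 0], [n - nb, nb]) as tf.Tensor2D;
    const a22 = a.slice([nb, nb], [n - nb, n - nb]);
    const l11 = unblockedCholesky(a11);
    const l21 = triangularSolve(l11, a21); // L21·L11ᵀ = A21
    const downdated = a22.sub(tf.matMul(l21, l21, false, true));
    const l22 = blockedCholesky(downdated as tf.Tensor2D, nb);
    // Reassemble [[L11, 0], [L21, L22]].
    return tf.concat([
      tf.concat([l11, tf.zeros([nb, n - nb])], 1),
      tf.concat([l21, l22], 1),
    ], 0) as tf.Tensor2D;
  });
}
```

The payoff is that O(n²) scalar ops are replaced by O((n/nb)²) matMul/solve calls on blocks, which is where a WebGL backend could recover some speed.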
googlebot
commented
Jan 10, 2019
CLAs look good, thanks!
Yes, gradients are implemented symbolically, see here. But it requires quite a few new operations (bandPart, triangularSolve, setDiag, ...) to be introduced, which is why the PR became so massive.
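For readers following along, a symbolic gradient combines exactly those ops; the formulation below follows the paper linked above (and is essentially what TensorFlow's Python gradient does). It is a sketch: the tfjs signatures of the new ops are assumptions here, so they are declared rather than imported.

```ts
import * as tf from '@tensorflow/tfjs';

// The new ops referenced above; their tfjs signatures are assumed.
declare function bandPart(x: tf.Tensor2D, numLower: number, numUpper: number): tf.Tensor2D;
declare function triangularSolve(l: tf.Tensor2D, b: tf.Tensor2D): tf.Tensor2D; // solves L·X = B

// Given L = chol(A) and dL (gradient w.r.t. L), return dA.
// Formula: dA = 0.5 · (G + Gᵀ) with G = L⁻ᵀ · Φ(Lᵀ·dL) · L⁻¹,
// where Φ keeps the lower triangle and halves the diagonal.
function choleskyGrad(l: tf.Tensor2D, dl: tf.Tensor2D): tf.Tensor2D {
  return tf.tidy(() => {
    const n = l.shape[0];
    const lInv = triangularSolve(l, tf.eye(n));        // L⁻¹
    const middle = tf.matMul(l, dl, true, false);      // Lᵀ·dL
    // Φ(·): lower triangle with the diagonal scaled by 0.5
    // (setDiag could be used here instead of the second bandPart).
    const phi = bandPart(middle, -1, 0)
        .sub(bandPart(middle, 0, 0).mul(0.5));
    const g = tf.matMul(tf.matMul(lInv, phi, true, false), lInv); // L⁻ᵀ·Φ·L⁻¹
    // Symmetrize, since A is symmetric.
    return g.add(g.transpose()).mul(0.5) as tf.Tensor2D;
  });
}
```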
> Updates
> @DirkToewe I took a closer look at your benchmarks. You appear to be using the algorithm from Wikipedia? I am using the one described in https://homepages.inf.ed.ac.uk/imurray2/pub/16choldiff/choldiff.pdf, which can theoretically be made faster by blocking and in-place updates. I currently don't know the best way to implement them idiomatically in tfjs, but maybe someone from the tfjs team can advise?
That's correct. It was the implementation of choice for me because:
- it is simple, and
- it calculates elements in 64-bit before rounding to float32.

(In NDJS, Kahan summation was used to potentially improve the float64 precision as well.)
But there's no doubt there is room for performance improvements. If you want to do blocked operations, you're going to have to work with nested loops; as an example, take a look at the CPU MatMul implementation (see the sketch below).
A slight word of warning, though: in my experience, micro-optimizations will not yield as much performance as you may hope for.
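To make the two points above concrete, here is a sketch of such a nested-loop kernel (an illustration of the idea, not the code from either PR): an in-place Cholesky–Banachiewicz that accumulates each element in 64-bit with Kahan summation before it is rounded back to float32 on store.

```ts
// In-place Cholesky–Banachiewicz on a row-major n×n Float32Array.
// JS numbers are float64, so `sum` accumulates in 64-bit; Kahan
// summation compensates the rounding error of the dot product.
function choleskyKernel(a: Float32Array, n: number): void {
  for (let i = 0; i < n; i++) {
    for (let j = 0; j <= i; j++) {
      // sum = a[i][j] - Σ_{k<j} L[i][k]·L[j][k]
      let sum = a[i * n + j];
      let comp = 0; // Kahan compensation term
      for (let k = 0; k < j; k++) {
        const y = -a[i * n + k] * a[j * n + k] - comp;
        const t = sum + y;
        comp = (t - sum) - y;
        sum = t;
      }
      // Overwrite the lower triangle with L; rounds to float32 on store.
      a[i * n + j] = i === j ? Math.sqrt(sum) : sum / a[j * n + j];
    }
  }
}
```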