This repository was archived by the owner on Aug 15, 2019. It is now read-only.

Add cholesky and gradients #1492

Open
ethanluoyc wants to merge 2 commits into tensorflow:master from ethanluoyc:cholesky

Conversation

@ethanluoyc commented Jan 10, 2019 (edited by dsmilkov)

This change is Reviewable

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

Contributor

DirkToewe commented Jan 10, 2019 (edited)

@ethanluoyc Have you looked at PR #1356? It already contains a Cholesky decomposition. Your implementation is a lot cleaner and uses TFJS methods as building blocks, which is a big upside, but as of yet PR #1356 seems to be significantly faster. The benchmark was run on the CPU backend, since PR #1492 runs faster there than on WebGL on my machine. Feel free to double-check the benchmark, since I'm known to make mistakes and mix things up.

P.S.: I hope it's okay that I included your code in the benchmark. I specifically mentioned the source and exempted it from the license.
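For readers comparing the two approaches, here is a minimal sketch of the unblocked Cholesky–Banachiewicz recurrence (the Wikipedia algorithm discussed later in this thread) in plain TypeScript. It is illustrative only and is not the code from either PR:

```ts
// Unblocked Cholesky–Banachiewicz: given a symmetric positive-definite
// matrix A (row-major, n×n), produce the lower-triangular L with A = L·Lᵀ.
// Illustrative sketch only; not the code from PR #1356 or PR #1492.
function choleskyUnblocked(A: Float64Array, n: number): Float64Array {
  const L = new Float64Array(n * n);
  for (let i = 0; i < n; i++) {
    for (let j = 0; j <= i; j++) {
      // Dot product of the first j entries of rows i and j of L,
      // accumulated in 64-bit before the final store.
      let sum = 0;
      for (let k = 0; k < j; k++) {
        sum += L[i * n + k] * L[j * n + k];
      }
      L[i * n + j] = i === j
        ? Math.sqrt(A[i * n + i] - sum)
        : (A[i * n + j] - sum) / L[j * n + j];
    }
  }
  return L;
}
```

Almost all of the work is in the inner dot product, which is where the accumulation-precision and blocking questions below come in.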

Author

ethanluoyc commented Jan 10, 2019 (edited)

Hi @DirkToewe,

Thanks for letting me know about your PR! I checked the issues page but didn't see any mention of Cholesky, so I thought no one had tackled this yet.

The version I currently have here is a WIP. One downside is indeed the memory issue; mine has not been optimized for that yet. I guess maybe we can work together on implementing Cholesky and merging it into the codebase.

BTW, do you have the gradients implemented? I have a preliminary version which I think is correct (though more testing is still needed). Another thing I haven't implemented is the blocked version of the algorithm, and the same goes for the gradients. I need the differentiable version because I use it as part of my models (Gaussian processes).

Updates:
@DirkToewe I took a closer look at your benchmarks. You appear to be using the algorithm from Wikipedia? I am using the one described in https://homepages.inf.ed.ac.uk/imurray2/pub/16choldiff/choldiff.pdf, which can theoretically be made faster by blocking and in-place updates. I currently do not know the best way to implement them idiomatically in tfjs, but maybe someone from tfjs can advise?
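For context, the paper linked above derives a symbolic reverse-mode rule for the Cholesky factor: with A = L·Lᵀ and upstream gradient L̄, the input gradient is Ā = ½·L⁻ᵀ·(P + Pᵀ)·L⁻¹, where P = Φ(Lᵀ·L̄) and Φ keeps the lower triangle and halves the diagonal. Below is a hedged plain-TypeScript sketch of that formula; the helper names are mine, not this PR's code:

```ts
// Reverse-mode Cholesky gradient per the linked paper:
//   Ā = ½·(G + Gᵀ),  G = L⁻ᵀ·Φ(Lᵀ·L̄)·L⁻¹,
// which equals ½·L⁻ᵀ·(P + Pᵀ)·L⁻¹ with P = Φ(Lᵀ·L̄).
// Sketch only; helper names are illustrative.
type Mat = number[][];

function matmul(a: Mat, b: Mat): Mat {
  const n = a.length;
  const c: Mat = Array.from({ length: n }, () => new Array(n).fill(0));
  for (let i = 0; i < n; i++)
    for (let k = 0; k < n; k++)
      for (let j = 0; j < n; j++) c[i][j] += a[i][k] * b[k][j];
  return c;
}

const transpose = (m: Mat): Mat =>
  m.map((row, i) => row.map((_, j) => m[j][i]));

// Φ: zero the strict upper triangle and halve the diagonal.
const phi = (x: Mat): Mat =>
  x.map((row, i) => row.map((v, j) => (j > i ? 0 : j === i ? 0.5 * v : v)));

// Solve L·X = B by forward substitution (L lower triangular).
function solveLower(l: Mat, b: Mat): Mat {
  const n = l.length;
  const x: Mat = b.map(row => row.slice());
  for (let i = 0; i < n; i++) {
    for (let k = 0; k < i; k++)
      for (let j = 0; j < n; j++) x[i][j] -= l[i][k] * x[k][j];
    for (let j = 0; j < n; j++) x[i][j] /= l[i][i];
  }
  return x;
}

// Given L (with A = L·Lᵀ) and the upstream gradient L̄, return Ā.
function choleskyGrad(l: Mat, lBar: Mat): Mat {
  const n = l.length;
  const eye: Mat = Array.from({ length: n }, (_, i) =>
    Array.from({ length: n }, (_, j) => (i === j ? 1 : 0)));
  const lInv = solveLower(l, eye);                     // L⁻¹
  const p = phi(matmul(transpose(l), lBar));           // Φ(Lᵀ·L̄)
  const g = matmul(matmul(transpose(lInv), p), lInv);  // L⁻ᵀ·P·L⁻¹
  const gT = transpose(g);
  return g.map((row, i) => row.map((v, j) => 0.5 * (v + gT[i][j])));
}
```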

CLAs look good, thanks!

Contributor

Yes, gradients are implemented symbolically, see here. But it requires quite a few new operations (bandPart, triangularSolve, setDiag, ...) to be introduced, which is why the PR became so massive.
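For readers unfamiliar with the ops named above, two of them are easy to picture. The sketch below shows their semantics in plain TypeScript; the names mirror the equivalent TensorFlow Python ops, the signatures are illustrative, and these were not tfjs APIs at the time:

```ts
// Illustrative semantics only; not tfjs APIs from the time of this PR.
// bandPart keeps the entries within a diagonal band and zeroes the rest;
// a negative bound means "keep everything on that side of the diagonal".
function bandPart(x: number[][], numLower: number, numUpper: number): number[][] {
  return x.map((row, i) => row.map((v, j) => {
    const inLower = numLower < 0 || i - j <= numLower;
    const inUpper = numUpper < 0 || j - i <= numUpper;
    return inLower && inUpper ? v : 0;
  }));
}

// setDiag returns a copy of x with the main diagonal replaced by diag.
function setDiag(x: number[][], diag: number[]): number[][] {
  return x.map((row, i) => row.map((v, j) => (i === j ? diag[i] : v)));
}
```

For example, the "lower triangle with halved diagonal" step of the symbolic gradient can be expressed as `setDiag(bandPart(m, -1, 0), halvedDiagonal)`, which is one reason these ops keep reappearing together.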

Contributor

> Updates:
> @DirkToewe I took a closer look at your benchmarks. You appear to be using the algorithm from Wikipedia? I am using the one described in https://homepages.inf.ed.ac.uk/imurray2/pub/16choldiff/choldiff.pdf, which can theoretically be made faster by blocking and in-place updates. I currently do not know the best way to implement them idiomatically in tfjs, but maybe someone from tfjs can advise?

That's correct. It was the implementation of choice for me because:

  • it is simple, and
  • it calculates elements in 64-bit before rounding to float32
    (in NDJS, Kahan summation was used to potentially improve float64 precision as well; a minimal sketch follows below).
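Here is a minimal sketch of that compensated inner loop, applied to the Cholesky dot product (illustrative only; not NDJS's actual code):

```ts
// Kahan (compensated) summation for the inner dot product of the Cholesky
// recurrence. Illustrative sketch; not NDJS's actual code. The `comp` term
// captures the low-order bits lost each time `sum` absorbs a new product.
function kahanDot(a: Float64Array, b: Float64Array, len: number): number {
  let sum = 0;
  let comp = 0; // running compensation for lost low-order bits
  for (let k = 0; k < len; k++) {
    const y = a[k] * b[k] - comp;
    const t = sum + y;     // big + small: low-order bits of y are lost...
    comp = (t - sum) - y;  // ...and recovered here for the next iteration
    sum = t;
  }
  return sum;
}
```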

But there's no doubt that there's room for performance improvements. If you want to do blocked operations, you're going to have to work with nested loops. As an example of that, take a look at the CPU MatMul implementation.

But a slight word of warning: in my experience, micro-optimizations will not yield as much of a performance gain as you may hope for.
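To make the nested-loop suggestion concrete, here is a generic cache-blocked matrix multiply in plain TypeScript. It is a sketch of the tiling idea only, not the actual tfjs CPU MatMul kernel:

```ts
// Generic cache-blocked (tiled) matrix multiply, C = A·B for n×n row-major
// matrices. Sketch only; not the actual tfjs CPU MatMul kernel. Working on
// one blockSize×blockSize tile at a time keeps the operands in cache.
function matmulBlocked(a: Float64Array, b: Float64Array, n: number,
                       blockSize = 48): Float64Array {
  const c = new Float64Array(n * n);
  for (let i0 = 0; i0 < n; i0 += blockSize) {
    const iMax = Math.min(i0 + blockSize, n);
    for (let k0 = 0; k0 < n; k0 += blockSize) {
      const kMax = Math.min(k0 + blockSize, n);
      for (let j0 = 0; j0 < n; j0 += blockSize) {
        const jMax = Math.min(j0 + blockSize, n);
        // Multiply one tile pair, accumulating into the output tile.
        for (let i = i0; i < iMax; i++) {
          for (let k = k0; k < kMax; k++) {
            const aik = a[i * n + k];
            for (let j = j0; j < jMax; j++) {
              c[i * n + j] += aik * b[k * n + j];
            }
          }
        }
      }
    }
  }
  return c;
}
```

The same tiling idea is what a blocked Cholesky would use for its trailing-matrix updates.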
