Zero-knowledge code hosting? [closed]

Question 1

In light of recent revelations about widespread government monitoring of data stored by online service providers, zero-knowledge services are all the rage now.

A zero-knowledge service is one where all data is stored encrypted with a key that is not stored on the server. Encryption and decryption happen entirely on the client side, and the server never sees either the plaintext data or the key. As a result, the service provider is unable to decrypt the data and provide it to a third party, even if it wanted to.
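
To make the definition concrete, here is a minimal Python sketch of the idea. Everything in it is an assumption made for illustration: the `Server` class, the `upload`/`download` helpers, and the passphrase are hypothetical, and `keystream_xor` is a toy HMAC-SHA256 keystream standing in for a real cipher (it is unauthenticated and not meant for production; a vetted AEAD from a maintained crypto library would take its place). The only point is that the key is derived and used on the client, while the server holds nothing but opaque bytes.

```python
import hashlib
import hmac
import os


def derive_key(passphrase: str, salt: bytes) -> bytes:
    # Key derivation runs on the client; the key itself is never uploaded.
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, 200_000)


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Toy stream cipher (assumption, illustration only): XOR the data with
    # HMAC-SHA256(key, iv || counter) blocks. Applying it twice decrypts.
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hmac.new(key, iv + counter, hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class Server:
    """Hypothetical storage provider: it only ever sees opaque blobs."""

    def __init__(self):
        self.blobs = {}

    def put(self, name: str, blob: bytes) -> None:
        self.blobs[name] = blob

    def get(self, name: str) -> bytes:
        return self.blobs[name]


def upload(server: Server, key: bytes, name: str, plaintext: bytes) -> None:
    iv = os.urandom(16)  # fresh random IV per upload, stored next to the ciphertext
    server.put(name, iv + keystream_xor(key, iv, plaintext))


def download(server: Server, key: bytes, name: str) -> bytes:
    blob = server.get(name)
    return keystream_xor(key, blob[:16], blob[16:])


salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)
source = b"int main(void) { return 0; }\n"

server = Server()
upload(server, key, "main.c", source)
roundtrip = download(server, key, "main.c")
```

Note that in this sketch the file name is still visible to the server; only the contents are hidden.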

To give an example: SpiderOak can be viewed as a zero-knowledge version of Dropbox.

As programmers, we rely heavily on code hosting providers (like Bitbucket, Assembla, and so on) and trust them with some of our most sensitive data - our code. I am of course talking about private repositories here - the concept of zero-knowledge does not make sense for public repositories.

My questions are:

Are there any technological barriers to creating a zero-knowledge code hosting service? For example, is there something about the network protocols used by popular version control systems like SVN, Mercurial, or Git that would make it difficult (or impossible) to implement a scheme where the data being communicated between the client and the server is encrypted with a key the server does not know?
Are there any zero-knowledge code hosting services in existence today?

Question 2

Without homomorphic encryption, I don't see how a zero-knowledge code hosting site could provide any sort of benefit over a zero-knowledge version of Dropbox. I don't believe anyone has yet come up with a homomorphic encryption scheme that is both secure (i.e., secure enough that the experts trust it) and fast enough to be usable.

Question 3

@AndresF. I can only assume SpiderOak means that diff-generation occurs on the client, the server stores encrypted diffs, and then diff-to-base application occurs again on the client when the diff and base are encrypted. I agree that their language is very unclear.

Question 4

@apsillers: Or you could deliberately stuff such content into a file and use it to identify the file itself (e.g., if someone was trying to use encryption to hide piracy).

Question 5

It's not something I have any experience in, but I can imagine one possible technological barrier to having a zero-knowledge code hosting service: won't all users need to know and use the exact same key? And if that's the case, what authentication mechanism will enforce different levels of user access?

Question 6

@gnat: I'm not asking for a recommendation. I'm merely asking whether a service of the sort I described exists. The existence of such a service would provide evidence that the technological barriers I ask about earlier in the question are surmountable.

Question 8

Couldn't you use Git's smudge/clean filters for this?

Janus Troelsen · Answer 1 · 2013-08-22 12:02:06Z

You can encrypt each line separately. If you can afford to leak your file names, approximate line lengths, and the line numbers on which changes occur, you can use something like this:

https://github.com/ysangkok/line-encryptor

As each line is encrypted separately (but with the same key), the uploaded changes will (as usual) only involve the relevant lines.

If that is not convenient enough, you could make two Git repositories, one with plaintext and one with ciphertext. When you commit in the plaintext repository (which is local), a commit hook could take the diff and run it through the line encryptor referenced above, which would apply it to the ciphertext repository. The ciphertext repository changes would then be committed and uploaded.

The line encryptor above is SCM-agnostic, but it can read unified diff files (of plaintext), encrypt the changes, and apply them to the ciphertext. This makes it usable with any SCM that can generate a unified diff (like Git).

@svick: You could, but that way, I don't see a clean way to avoid re-encrypting the whole file. Of course, it wouldn't matter much for code, since the files are small. But then there is no need for a "line encryptor"; you can just use any encryption tool.
Wouldn't lots of text samples (with a known structure) make it easier to attack the key? Every blank line would encrypt the same. Every start and end of a javadoc would be the same. Now you know the clear text and the cipher text for some segments of the code, which can be used in an attack. This scheme would likely only hold up against hobbyists (anyone with trained cryptographers or sufficient computing power could break it with enough effort).
@MichaelT: No, because of IVs. Try it out yourself :) Using the linked implementation, lines encrypt to <IV>,<ciphertext>.
@svick: Lines are encrypted individually. If you change a line, the whole line gets re-encrypted, but with a new IV (as always). The rest of the file won't be touched! Encryption is deterministic, but the IVs are inputs too, and they are pseudo-randomly chosen.
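
A rough Python sketch of the per-line idea, to show why two blank lines do not encrypt to the same record and why an edit only touches the edited line. This is not the code of the linked line-encryptor: the function names, the hex encoding of the `<IV>,<ciphertext>` record, and the toy HMAC-SHA256 keystream (an unauthenticated stand-in for a real cipher, for illustration only) are all assumptions.

```python
import difflib
import hashlib
import hmac
import os


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Toy stream cipher (assumption, illustration only): XOR the data with
    # HMAC-SHA256(key, iv || counter) blocks. Applying it twice decrypts.
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hmac.new(key, iv + counter, hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def encrypt_line(key: bytes, line: str) -> str:
    iv = os.urandom(16)  # a fresh random IV for every encrypted line
    ciphertext = keystream_xor(key, iv, line.encode("utf-8"))
    return f"{iv.hex()},{ciphertext.hex()}"


def decrypt_line(key: bytes, record: str) -> str:
    iv_hex, ciphertext_hex = record.split(",")
    plain = keystream_xor(key, bytes.fromhex(iv_hex), bytes.fromhex(ciphertext_hex))
    return plain.decode("utf-8")


def reencrypt_changed(key, old_plain, new_plain, old_cipher):
    """Reuse existing records for unchanged lines; encrypt only new or edited lines."""
    matcher = difflib.SequenceMatcher(a=old_plain, b=new_plain, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(old_cipher[i1:i2])
        else:  # replace / insert / delete: encrypt whatever the new side contains
            result.extend(encrypt_line(key, line) for line in new_plain[j1:j2])
    return result


key = os.urandom(32)
v1 = ["def f():", "", "    return 1", ""]
c1 = [encrypt_line(key, line) for line in v1]

v2 = ["def f():", "", "    return 2", ""]
c2 = reencrypt_changed(key, v1, v2, c1)
```

Here `c1[1]` and `c1[3]` are both blank lines yet carry different IVs, and `c2` differs from `c1` only in the third record. The record length still reveals the approximate line length (a blank line has an empty ciphertext part), which is the leak the answer already accepts.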

gbjbaanb · Answer 2 · 2013-08-21 10:37:54Z

I don't think there are any barriers. Consider SVN: what gets sent to the server for storage is the delta between the previous and current versions of your code, so if you change one line, just that line gets sent to the server. The server then 'blindly' stores it without doing any inspection of the data itself. If you encrypted the delta and sent that instead, there would be no impact on the server; in fact, you wouldn't even need to modify the server at all.
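
The encrypt-the-delta idea can be sketched roughly as follows. This is not how SVN actually encodes or transmits deltas; `BlindServer`, `make_delta`, `apply_delta`, `seal`, and `unseal` are hypothetical names, and the toy HMAC-SHA256 keystream is an unauthenticated stand-in for a real cipher, used only to keep the example self-contained.

```python
import difflib
import hashlib
import hmac
import json
import os


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Toy stream cipher (assumption, illustration only): XOR the data with
    # HMAC-SHA256(key, iv || counter) blocks. Applying it twice decrypts.
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hmac.new(key, iv + counter, hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def make_delta(old, new):
    """Keep only the changed ranges: (start, end, replacement lines)."""
    ops = difflib.SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_delta(old, delta):
    result = list(old)
    # Apply from the end so that earlier indices stay valid.
    for i1, i2, lines in reversed(delta):
        result[i1:i2] = lines
    return result


def seal(key: bytes, delta) -> bytes:
    iv = os.urandom(16)
    return iv + keystream_xor(key, iv, json.dumps(delta).encode("utf-8"))


def unseal(key: bytes, blob: bytes):
    return json.loads(keystream_xor(key, blob[:16], blob[16:]).decode("utf-8"))


class BlindServer:
    """Hypothetical server: stores each revision's blob without inspecting it."""

    def __init__(self):
        self.revisions = []

    def commit(self, blob: bytes) -> int:
        self.revisions.append(blob)
        return len(self.revisions)

    def fetch(self, since: int):
        return self.revisions[since:]


key = os.urandom(32)
server = BlindServer()

r0 = []
r1 = ["a = 1", "b = 2", "c = 3"]
r2 = ["a = 1", "b = 20", "c = 3", "d = 4"]
server.commit(seal(key, make_delta(r0, r1)))
server.commit(seal(key, make_delta(r1, r2)))

# A fresh checkout replays the decrypted deltas on the client.
checkout = []
for blob in server.fetch(0):
    checkout = apply_delta(checkout, unseal(key, blob))
```

The server in this sketch still learns the number of revisions and the size of each delta, even though it cannot read them.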

There are other bits that might matter, such as metadata properties that are not easily encryptable (such as MIME type), but others could be encrypted, e.g. comments in the history log, as long as you know you have to decrypt them on the client to view them. I'm not sure if the directory structure would be visible; I think it would not be, due to the way SVN stores directories, but it's possible I'm wrong. This might not matter to you if the contents are secure, however.

This would mean you couldn't have a web site with the various code view features, no server-side repository browser or log viewer. No code diffs, no online code review tools.

Something like this already exists, to a point: Mozy stores your data encrypted with your private key (you can use their own, and they make noises about "if you lose your own key, too bad, we can't restore your data for you", but that's more targeted at the common user). Mozy also stores a history of your files, so you can retrieve previous versions. Where it falls down is that uploads happen on a regular schedule, not at check-in when you want, and I believe it discards old versions when you run out of storage space. But the concept is there; they could modify it to provide secure source control using their existing system.

Re: "This would mean you couldn't have a web site with the various code view features, no server-side repository browser or log viewer. No code diffs, no online code review tools." - You could still have these if the application logic was in client-side JS and it made you enter your password/key (but not send it to the server), right?
Yes, it could... Anything would, as long as it knew it was receiving encrypted data over the network. It's just an obvious limitation of the server that it cannot decrypt the data.

Rubber Duck · Answer 3 · 2013-08-20 20:44:34Z

I hate to do one of those 'this isn't quite going to answer your question' answers.. but..

I can think of two ready solutions which should address these worries.

Host a private Git server on your own. Then put that server on a VPN to which you give your team members access. All communication to and from the server would be encrypted, and you could of course encrypt the server at the OS level.
BitSync should do the trick as well. Everything would be encrypted and stored in a huge network that would be available from anywhere. Might actually be a really good application of all this BitCoin/BitMessage/BitSync technology..

Lastly, the folks over at https://security.stackexchange.com/ might have some more insight.

Regarding BitSync: are you suggesting that it be used as a replacement for a version control system, or somehow used together with a version control system? If the former, then sure, but that's not very interesting. I could just as well share the files over SpiderOak and it would be centralized, but still zero-knowledge. If the latter, then how?
@HighCommander4 Haven't tried it, but there shouldn't be any reason for it not to work.. Couldn't you set up sync to share your initialized git folder, then just do a normal 'git push ./syncedFolderActingAsServer/MyAwesomeProject/src/'? You could also do git level permissions, etc.. someone should try this!

svick · Answer 4 · 2013-08-22 20:06:20Z

As I understand it, the way git pull works is that the server sends you a pack file that contains all the objects that you want, but don't have currently. And vice versa for git push.

I think you couldn't do it like this directly (because this means the server has to understand the objects). What you could do instead is to let the server work just with a series of encrypted pack files.

To do pull, you download all the pack files that were added since your last pull, decrypt them and apply to your git repo. To do push, you first have to do pull, so that you know the state of the server. If there are no conflicts, you create a pack file with your changes, encrypt it and upload it.

With this approach, you would end up with a large number of tiny pack files, which would be quite inefficient. To fix that, you could download a series of pack files, decrypt them, combine them into one pack file, encrypt it, and upload it to the server, marking it as a replacement for that series.
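
The scheme above can be sketched roughly as below. Plain dictionaries stand in for Git pack files (real packs are a binary format that this does not attempt to reproduce), `PackServer` and `Client` are hypothetical, races between concurrent pushes are ignored, and the toy HMAC-SHA256 keystream is an unauthenticated stand-in for a real cipher.

```python
import hashlib
import hmac
import json
import os


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Toy stream cipher (assumption, illustration only); applying it twice decrypts.
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hmac.new(key, iv + counter, hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def seal(key: bytes, objects: dict) -> bytes:
    iv = os.urandom(16)
    return iv + keystream_xor(key, iv, json.dumps(objects).encode("utf-8"))


def unseal(key: bytes, blob: bytes) -> dict:
    return json.loads(keystream_xor(key, blob[:16], blob[16:]).decode("utf-8"))


def obj(content: str) -> dict:
    """Fake Git object: content addressed by its SHA-1, as a one-entry dict."""
    return {hashlib.sha1(content.encode("utf-8")).hexdigest(): content}


class PackServer:
    """Hypothetical dumb server: opaque packs plus 'replaces' bookkeeping."""

    def __init__(self):
        self.next_id = 0
        self.packs = {}  # pack id -> (encrypted blob, ids this pack replaces)

    def upload(self, blob: bytes, replaces=()) -> int:
        pack_id = self.next_id
        self.next_id += 1
        for old_id in replaces:
            self.packs.pop(old_id, None)  # drop the series being replaced
        self.packs[pack_id] = (blob, tuple(replaces))
        return pack_id

    def listing(self) -> dict:
        return {pack_id: replaces for pack_id, (_, replaces) in self.packs.items()}

    def download(self, pack_id: int) -> bytes:
        return self.packs[pack_id][0]


class Client:
    def __init__(self, key: bytes):
        self.key = key
        self.objects = {}
        self.seen = set()

    def pull(self, server: PackServer) -> None:
        for pack_id, replaces in sorted(server.listing().items()):
            if pack_id in self.seen:
                continue
            # A combined pack adds nothing if every pack it replaces was applied.
            if not (replaces and set(replaces) <= self.seen):
                self.objects.update(unseal(self.key, server.download(pack_id)))
            self.seen.add(pack_id)

    def push(self, server: PackServer, new_objects: dict) -> None:
        self.pull(server)  # learn the server state first
        self.seen.add(server.upload(seal(self.key, new_objects)))
        self.objects.update(new_objects)

    def compact(self, server: PackServer) -> None:
        self.pull(server)
        old_ids = list(server.listing())
        self.seen.add(server.upload(seal(self.key, self.objects), replaces=old_ids))


key = os.urandom(32)
server = PackServer()
alice, bob, carol = Client(key), Client(key), Client(key)

alice.push(server, obj("commit 1"))
bob.push(server, obj("commit 2"))
alice.push(server, obj("commit 3"))
alice.compact(server)  # three tiny packs become one

carol.pull(server)  # new client: downloads only the combined pack
bob.pull(server)    # existing client: catches up via the combined pack
```

Because the fake objects are content-addressed, re-applying a combined pack on a client that already holds some of its objects is harmless, which is what lets the replacement marker stay this simple.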
