86

I'm starting up a Git repository for a group project. Does it make sense to store documents in the same Git repository as code? It seems like this conflicts with the nature of the Git revision flow.

Here is a summary of my question(s):

  • Is the Git revisioning style going to be confusing if both code and documents are checked into the same repository? Experiences with this?

  • Is Git a good fit for documentation revision control?

  • I am NOT asking if a Revision Control System in general should or shouldn't be used for documentation - it should.

Thanks for the feedback so far!

asked Jun 17, 2011 at 6:36
5
  • Ah, okay... thanks for the clarification. I don't see why it would be a problem, but I don't have any personal experience with Git (just a theoretical understanding), so I'll let someone with more direct experience answer that question. Commented Jun 17, 2011 at 7:24
  • 1
    I don't quite see how this is on topic. You're talking about software documentation and committing with a DVCS Commented Jun 17, 2011 at 9:36
  • Probably depends on the documentation and your needs. Do you need diffs and is it in a format that can handle it? If git gives the required services sure. Beats having a separate document management system... Commented Mar 15, 2012 at 11:33
  • 1
    If your documentation is in plain text - fine. If it is a binary format, you essentially need a version control system that understands the binary format - this is vendor lock-in in its purest form. Commented Sep 8, 2012 at 9:20
  • related: What Part of Your Project Should be in Source Code Control? Commented Mar 4, 2014 at 15:40

9 Answers

57

We store documentation in SVN all the time. In fact, our entire user manual is written in LaTeX, and stored in SVN. We chose LaTeX specifically because it is a text-based language, and easy to show line-by-line diffs.

We also store some non-text files, like Microsoft Office .doc files, spreadsheets, .zip files, etc., when necessary... but some of the benefit of an RCS is lost when you can't see the incremental diffs.

The key is really to make sure your documentation is well organized, so that people can find (and update) the documentation (and the source) when they need it.

answered Jun 17, 2011 at 6:46
10
  • 13
    If you're a Microsoft shop, TortoiseSVN supports MS Office line-by-line diffs. Commented Jun 17, 2011 at 13:46
  • 4
    Dropping binary doc formats would make the world a better place. So given that docs are plain-text, there should be no real problem with a DVCS either. Commented Jun 17, 2011 at 18:28
  • Oh, and first time I heard about TortoiseSVN and doc files, so +1 for that. Wonder if that'll end up on Tortoise[AnyDVCS] anytime in the future. Commented Jun 17, 2011 at 18:28
  • @Phil: How does TortoiseSVN accomplish this? Is the doc-diff viewer integrated with the SVN client, or can it be used independently? Commented Jun 19, 2011 at 6:13
  • 2
    A cool option would be to use Pandoc so that most of your documentation is in Markdown, but the crucial bits can still use TeX. Since it compiles the Markdown to LaTeX, the results look the same. However, this would also let you export it to different formats and would make the source easier to read. Commented Sep 8, 2012 at 17:23
22

Well, it depends on what format you use for the documentation. If it is something text-based, it is all good.

Git can also store binary content and you can track revisions, but the diff output will not make sense.
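For what it's worth, Git's textconv mechanism can make binary documents diffable by converting them to text before diffing. Below is a minimal sketch of such a converter for .docx files, assuming the usual layout (a zip archive whose main text lives in word/document.xml). The function name docx_to_text and the script name mentioned afterwards are made up for illustration.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the plain text of a .docx file, one line per paragraph.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    lines = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        lines.append("".join(runs))
    return "\n".join(lines)


# When run as a script with one argument, print the converted text,
# which is the shape of command a textconv driver expects.
if __name__ == "__main__" and len(sys.argv) == 2:
    print(docx_to_text(sys.argv[1]))
```

Saved as, say, docx_to_text.py (a hypothetical name), it could be wired up with a `*.docx diff=docx` line in .gitattributes plus `git config diff.docx.textconv "python docx_to_text.py"`, with the script path adjusted to your setup. Formatting, images, and tracked changes are ignored, so the resulting diff only shows changes to the text itself.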

It is also possible to store documentation in the code itself, like Perl's POD (read with perldoc); Java also has a comment format for this (Javadoc).
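To illustrate the "documentation in the code itself" idea in one more language: Python's docstrings play a similar role. This is only a sketch, and the function apply_discount is a made-up example, not something from the question.

```python
import inspect


def apply_discount(price, percent):
    """Return price reduced by percent.

    The text in this docstring lives in the same file as the code,
    so it is versioned, diffed, and reviewed together with it.
    """
    return price * (1 - percent / 100)


# The documentation can be pulled back out of the code at any time.
print(inspect.getdoc(apply_discount))
```

Because the docstring is plain text inside a source file, changes to it show up in ordinary line-by-line diffs like any other code change.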

answered Jun 17, 2011 at 6:49
5
  • I agree, while it's possible to store non-text documentation, git will do a lot better if you store text instead. There's been talk of a diff driver that knows how to diff word (or similar) documents, but I'm not sure if it was implemented or not Commented Jun 17, 2011 at 9:27
  • I thought Word moved their format away from binary to XML. Commented Jun 17, 2011 at 13:48
  • 4
    @karategeek6 Word's 'XML' format is not human readable. And one line of text does not correspond to one line of Word's XML, even in approximation. So it might as well be binary. Commented Jun 17, 2011 at 13:56
  • You can instruct Word to save your output in uncompressed XML. Choose Save As, then select Word XML Document (*.xml) instead of the default Word Document (*.docx). The XML is pretty complex, so this is no guarantee the changes will be easily readable, but at least it won't be binary. Commented Aug 17, 2011 at 17:06
  • "but the diff output will not make sense." In the case of a diff, we could open two revisions of a document side by side and compare them by eye :) Commented Jun 7, 2018 at 2:44
18

It is clear that using some kind of version control system for storing docs is a no-brainer. The more interesting part of the question is whether it is a good idea to store documents in the SAME location as the source code. The possible problem here is that it might be hard to set different access privileges for code and documentation in that case. And in many business cases, people such as the marketing or BA departments will need access to the docs but not the source code.

answered Jun 17, 2011 at 12:25
4
  • 4
    Yes, the "same location" aspect is one of the key parts of this question! Commented Jun 17, 2011 at 18:13
  • Same location is good if you can manage it, because it avoids the need to either have tribal knowledge (knowing where to look), or the need to go searching for where the stuff is. Commented Jun 22, 2011 at 8:06
  • 1
    They may not need access to the code but it shouldn't hurt for them to have that access. They don't have to look at it. Secrets generally shouldn't be in version control anyway. Commented Nov 8, 2016 at 21:17
  • Maybe the best balance here would be storing in a version control system but using some kind of website to pull the docs in for viewing for non-technical users. Commented Dec 11, 2020 at 15:45
14

Just like source code, documentation should have a full history and the ability to revert to an earlier version if that becomes necessary. A version control system is perfect for this.

answered Jun 17, 2011 at 7:31
4
  • 6
    Only if the documentation is in a text form. Binary blobs do not fully benefit from version control. Commented Jun 22, 2011 at 8:00
  • 2
    @ThorbjørnRavnAndersen: Even so, unless you have a binary-specific versioning system, it's probably better to keep even binary files in Git rather than on their own. Commented Sep 8, 2012 at 17:24
  • @TikhonJelvis I did not question whether it is a good idea to put binary files in git - if they are the original artifacts, it is. Try, however, to run "git diff" on Word documents. Commented Sep 8, 2012 at 17:46
  • @user1249 : you could "export" two revisions to the desktop, say my_docs_rev15.docx and my_docs_rev14.docx, then open them side by side and compare with your eyes and brain. It's not that hard :) Commented Jun 7, 2018 at 2:48
13
  • Having more than just source code in a repository is a very good thing.

    It groups all of your resources together and turns the project into a cohesive, centralized entity rather than a scattered collection of files. Contributors/employees know where to find everything, rather than sending "Where do I change the documentation for feature x?" emails.

    You'll want to keep things organized. Have a system for separating the src from the images from the docs. You can always add a .gitignore to a directory to keep the repository and history clean. Because Git commits are file-based,* you can decouple source changes from documentation changes as strongly as you like.

  • As others have said, Git is great for documentation versioning as long as it's text-based.

  • I completely agree; documentation should be versioned right alongside the code.

My credibility comes from being a GitHub user and contributing to one project and exploring many others. In my experience, a complete, unified project is easy to tell from a half-missing one. I try to contain all of my projects within single directories whenever possible.


* This isn't quite accurate, because there are ways to specify parts of a file to be committed (here's one example).

answered Sep 7, 2012 at 22:47
9

At the company where I work, we put documentation in SVN. However, after a few conflicts and the need to share it, we decided to move it to MediaWiki.

At first it was Trac; after that we moved to MediaWiki because it was easier to use...

The main problem with SVN was sharing, because we had an authorization system for SVN.

answered Jun 17, 2011 at 11:04
4
  • 2
    Don't you mean Mediawiki, the wiki engine that Wikipedia uses? Commented Jun 17, 2011 at 18:49
  • @Martijn, I'd assume so Commented Jun 18, 2011 at 0:57
  • @Martijn: Yes, edited Commented Jun 22, 2011 at 7:55
  • I would rather stick with a wiki than send a lot of files that are not source to an SCM, but that's more to do with personal preference. There is much more you can do with it. I especially like Foswiki and their website-based/project-based template. Glad someone pointed out deciding to use a wiki due to problems :) +1. Commented Mar 15, 2012 at 12:49
4

I came here with a similar question. We come from an SVN environment, where it basically is a no-brainer to keep all materials related to a project in the same repository. Due to SVN's nature, you can easily check out parts of the repository, so if you just need the source code (for example, for a website deployment), that's no problem.

With Git, things are different. A checkout is always at the root level, so if you want to put everything in the same repository, you will always end up with the same directory structure. One approach I have come across is to put everything in separate branches, i.e. you have code-branches (which would typically be your normal master, develop, etc. branches) and a doc branch, which has its own, separate directory structure. I'm not certain yet that's the best idea, but it is a suggestion which circumvents the problem which I imagine is at the base of your question.

answered Mar 15, 2012 at 11:17
2
  • Different branches with radically different directory structures have a very bad code smell to me. I would leave it all in one repo, making it easy for contributors to add a mix of code and documentation. In fact, literate programming (Google that!) demands it. Commented Nov 7, 2015 at 21:07
  • When distributing packages, I'm partial to the .deb style that allows me to download executables to all servers, while my development box also has the documentation packages. Commented Nov 7, 2015 at 21:07
1

I use a wiki for internal docs... you get revisions PLUS prominent access and easy editing. When documentation is out of sync, update it right then and there. For end-user documentation, consider a professional tool like Madcap Flare. They use an XML dialect for sharing, composing, and transforming documentation.

answered Jun 17, 2011 at 17:52
-1

In code, thoughts are typically separated line-by-line. I tend to write documentation with soft line wraps. When I commit those files, lines are a whole paragraph long. That's not very useful to read in git diff. That's the problem I was trying to solve when I Googled and found this page. Thanks to Arne Hartherz for introducing me to git diff --word-diff. You might like git diff --color-words even better.
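To show why word-level diffing helps with paragraph-long lines, here is a rough sketch of the idea using Python's difflib. It is not how Git implements --word-diff; the markers are only modeled on Git's plain word-diff style, and the function name word_diff is made up for illustration. It also collapses all whitespace, so line breaks are not preserved.

```python
import difflib


def word_diff(old, new):
    """Mark removed words as [-word-] and added words as {+word+}."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged words are copied through as-is.
            pieces.extend(old_words[i1:i2])
            continue
        if tag in ("delete", "replace"):
            pieces.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            pieces.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(pieces)


old = "Git stores snapshots of the whole tree"
new = "Git stores snapshots of the entire tree"
print(word_diff(old, new))
# prints: Git stores snapshots of the [-whole-] {+entire+} tree
```

A line-based diff would report the whole paragraph as removed and re-added, while the word-level view points at the one word that changed.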

answered Nov 7, 2015 at 20:51
