Carnegie Mellon
Computer Science Department

15-410 Git Quickstart


This document is a work in progress. It may not be complete. To the best of our knowledge, the information that is here is correct. If you have issues following the instructions in this document, or you have suggestions to make this document clearer, please send e-mail to staff-410 at the CS domain.

To make developing your projects easier, we've written this quick-start guide to a modern and popular source control system: Git. This document serves first as a user's reference, second as an explanation of concepts (although you need not understand all of the concepts to use Git), and third as evangelism for Git and other distributed version control systems (although you need not drink my kool-aid to use Git). In theory, each part stands alone: you need not know the concepts to use the reference, and you need not know the reference to be evangelized to. In practice, you may find it useful to read all three parts to get a deeper understanding of what Git is doing while you aren't looking.

Should you use Git, or something simpler? On the one hand, other things might be simpler and faster to learn right now. On the other hand, time spent learning Git will pay off if you join a project that already uses Git. Because there are so many revision-control systems currently in use, there is no guarantee you won't have to learn something else, but Git is among the more popular systems, so it's a plausible investment.

Quick-start

Obtaining/installing Git

Telling Git about you
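
This step usually amounts to recording the name and e-mail address that Git stamps on each of your commits. Here is a minimal sketch, assuming a hypothetical identity ("Pat Example") and a scratch repository so that you can try it without touching your real settings:

```shell
# Scratch repository so the experiment is harmless (hypothetical location).
scratch=$(mktemp -d)
git init -q "$scratch"

# Placeholder identity; substitute your own name and e-mail address.
# Adding --global to "git config" would store these in ~/.gitconfig instead,
# so that they apply to every repository you use.
git -C "$scratch" config user.name "Pat Example"
git -C "$scratch" config user.email "pat@example.edu"

# Read the values back to confirm what Git will record in each commit.
git -C "$scratch" config user.name
git -C "$scratch" config user.email
```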

Getting your project set up

Traditionally, git's default branch was called "master". Some people prefer to use a different name for the main branch, e.g., "main" or "mainline". If you prefer the traditional name, "master", skip the commands below marked with "## main". If you would prefer a name other than "master" or "main", change the two commands marked "## main" to use your preferred name instead of "main".

 ############################################################
 # ONE PARTNER executes these commands exactly once
 ############################################################
 $ cd ~/410/usr/$USER/mygroup/REPOSITORY
 $ git init --bare p2
 $ cd p2 && git symbolic-ref HEAD refs/heads/main ## main
 ############################################################
 # BOTH YOU AND YOUR PARTNER do these
 ############################################################
 $ cd ~/410/usr/$USER/scratch
 $ git clone file://$HOME/410/usr/$USER/mygroup/REPOSITORY/p2
 $ cd p2
 ############################################################
 # ONE OF YOU adds the .gitignore via your personal repository
 ############################################################
 $ cd ~/410/usr/$USER/scratch/p2
 $ git checkout -b main ## main
 $ cp ~/410/pub/gitignore .gitignore
 $ git add .gitignore
 $ git commit -m "Initial commit"
 $ git push origin `git symbolic-ref --short HEAD` -u
 ############################################################
 # BOTH YOU AND YOUR PARTNER do these
 ############################################################
 $ git pull
 $ git checkout main ## main
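
If you would like to rehearse the sequence above before doing it for real, the following sketch plays both partners inside a throwaway directory. The names "alice" and "bob", the sandbox location, and the one-line .gitignore are all stand-ins we made up for illustration; the second partner uses fetch plus checkout rather than a bare "git pull" so that the rehearsal behaves the same across Git versions.

```shell
set -e
# Throwaway sandbox standing in for ~/410/usr/$USER (hypothetical layout).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/REPOSITORY"

# ONE PARTNER: create the shared bare repository with "main" as its branch.
git init -q --bare "$sandbox/REPOSITORY/p2"
git -C "$sandbox/REPOSITORY/p2" symbolic-ref HEAD refs/heads/main

# BOTH PARTNERS: clone it (a warning about an empty repository is expected).
git clone -q "file://$sandbox/REPOSITORY/p2" "$sandbox/alice"
git clone -q "file://$sandbox/REPOSITORY/p2" "$sandbox/bob"

# ONE OF YOU (here "alice"): add .gitignore and publish the first commit.
cd "$sandbox/alice"
git config user.name "Alice Example"
git config user.email "alice@example.edu"
git checkout -q -b main
echo '*.o' > .gitignore          # stand-in for ~/410/pub/gitignore
git add .gitignore
git commit -q -m "Initial commit"
git push -q -u origin main

# THE OTHER PARTNER (here "bob"): pick up the newly published branch.
cd "$sandbox/bob"
git fetch -q origin
git checkout -q main
git log --oneline
```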

Working with Git on a day-to-day basis

These operations will become your new best friends. You will use them many times per day; it will pay off to become familiar with their operation and their quirks.

It behooves you to try to make "good" commits, both for yourself and for your partner. To see what a "good" commit looks like, we should probably first look at what a "bad" commit is. Here's a transcript of one of your TAs making quite a few mistakes, all in one short command:

joshua@escape:~/school/15-410-ta/p3-s09p4$ git commit -a -m "Whee!"
[master 6946f54] Whee!
 2 files changed, 2 insertions(+), 2 deletions(-)
joshua@escape:~/school/15-410-ta/p3-s09p4$

What's so wrong about this? Well, the most obvious problem is the message: "Whee!" conveys absolutely no information to your TA's partner (well, maybe it tells my partner that I was excited about this, but not much more than that). But there are more substantial issues here. Let's go back and do this again and see what your TA missed.

joshua@escape:~/school/15-410-ta/p3-s09p4$ git status
# On branch master
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: kern/mutex.c
# modified: user/progs/vm_explode.c
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# user/progs/mytest.c
no changes added to commit (use "git add" and/or "git commit -a")
joshua@escape:~/school/15-410-ta/p3-s09p4$ git diff
diff --git a/kern/mutex.c b/kern/mutex.c
index 4a13af1..55f4569 100644
--- a/kern/mutex.c
+++ b/kern/mutex.c
@@ -39,7 +39,7 @@ void mutex_init(mutex_t *mp)
 void mutex_lock(mutex_t *mp)
 {
- make_mutexes_work();
+ make_mutexes_not_work(); // XXX changed briefly to test my demo program
 mutex_level++;
diff --git a/user/progs/vm_explode.c b/user/progs/vm_explode.c
index ee8c2b9..94c8c94 100644
--- a/user/progs/vm_explode.c
+++ b/user/progs/vm_explode.c
@@ -72,7 +72,7 @@ int main() {
 vanish();
 }
 }
- printf("parent: all balls accounted for!\n");
+ printf("parent: all children accounted for!\n");
 set_status(0);
 vanish();
 }
joshua@escape:~/school/15-410-ta/p3-s09p4$ git add user/progs/vm_explode.c user/progs/mytest.c
joshua@escape:~/school/15-410-ta/p3-s09p4$ git commit
in your TA's editor...
Modify vm_explode to more accurately describe what it's doing instead of punning
on the P2 test 'juggle', and create a spinoff, mytest.
mytest makes sure that the frubulator is frobbed in the mutexes; you can
make it fail by commenting out the call to make_mutexes_work() in kern/mutex.c.
But make sure not to commit that change! Otherwise we'll both be sad.
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# Committer: Joshua Wise <joshua@escape.joshuawise.com>
#
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: user/progs/mytest.c
# modified: user/progs/vm_explode.c
#
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: kern/mutex.c
and back at the shell...
".git/COMMIT_EDITMSG" 30L, 1112C written
[master 6ef906e] Modify vm_explode to more accurately describe what it's doing instead
of punning on the P2 test 'juggle', and create a spinoff, mytest.
 1 files changed, 1 insertions(+), 1 deletions(-)
 create mode 100644 user/progs/mytest.c
joshua@escape:~/school/15-410-ta/p3-s09p4$

Much better! This time, your TA checked to see what he was changing before he committed it, added only the files he wanted to commit, and then wrote a descriptive commit message so that his partner could test this for himself. Importantly, your TA did not commit the change that would break his kernel's mutexes while doing this, and hence did not get strangled in his sleep by his partner.

Strive to emulate this workflow. You may find that you don't need quite such verbose messages, and git commit -m will work fine for you. That's OK; but try to make your commit messages at least somewhat useful.
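
The workflow in the transcript boils down to: look, stage selectively, look again, commit. A condensed sketch, assuming a scratch repository and two made-up files (mutex.c and test.c here are placeholders, not your real kernel sources):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Pat Example"
git config user.email "pat@example.edu"

# Hypothetical starting point: two tracked files.
echo "stable code" > mutex.c
echo "old message" > test.c
git add mutex.c test.c
git commit -q -m "Add mutex.c and test.c"

# Make one change worth keeping and one temporary debugging hack.
echo "new message" > test.c
echo "debugging hack" >> mutex.c

git status --short          # which files changed?
git diff                    # what exactly changed?
git add test.c              # stage only the change that should be kept
git diff --cached --stat    # last look at what is about to be committed
git commit -q -m "Reword the message printed by test.c"
```

After this, mutex.c still shows up as modified in "git status", which is exactly what you want: the hack stays in your working directory and out of your partner's tree.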

Time travel with Git

In an ideal world, we would make no errors while writing code. Sadly, we do, and sometimes we wish to travel back to the past and determine what broke. It is generally considered inadvisable to modify history; if you do, you run the risk of killing one or more of your parents, and being in a paradoxical state of existence. If you wish to modify history, you might wish to create an alternate universe; in Git, we call these alternate universes "branches". Luckily, branches aren't needed to just go back and look. You may use these commands somewhat less frequently, but they are no less important.
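
As a rough illustration of "just going back to look", here is a sketch that assumes a scratch repository with a single made-up file, notes.txt. It reads an old version without changing anything, then briefly visits the old commit and comes back:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Pat Example"
git config user.email "pat@example.edu"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "First version"
first=$(git rev-parse HEAD)       # remember the hash of the old commit

echo "version 2" > notes.txt
git commit -q -a -m "Second version"

git log --oneline                 # list the history, newest first
git show "$first:notes.txt"       # print the old file; working tree untouched
git checkout -q "$first"          # visit the past (a "detached HEAD")
cat notes.txt                     # the working tree now holds the old file
git checkout -q main              # return to the present
```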

Splitting reality with Git

At some point, you may wish that you could make a change on a previous version of your tree without affecting the current version (yet); or you may wish to split reality in half, and work on an experimental side-project without disrupting main development of your project. Branches in Git are designed to allow you to do just those things; split away from the main view of reality from some point in time (be that time now or the past).
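
A sketch of splitting off from a point in the past, assuming a scratch repository; the branch name "experiment" and the file names are invented for the example:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Pat Example"
git config user.email "pat@example.edu"

echo "v1" > kernel.c
git add kernel.c
git commit -q -m "Kernel, first version"
past=$(git rev-parse HEAD)

echo "v2" > kernel.c
git commit -q -a -m "Kernel, second version"

# Split reality at the earlier commit and work there.
git checkout -q -b experiment "$past"
echo "risky idea" > idea.c
git add idea.c
git commit -q -m "Try a risky idea"

# Back on main, nothing from the experiment is visible.
git checkout -q main
ls
git log --oneline --graph --all
```

When the experiment has proven itself, running "git merge experiment" while on main would bring the two realities back together.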

Don't lose your data!

Git is meant to track versions of files, but that doesn't mean that you can't lose data when working with git. There are multiple kinds of data that might get lost if something goes wrong with git. For more information about what data you might lose, see How to use git to lose data.

You can protect yourself against some git-related data loss by adding these settings to the config file of your shared central repository (e.g., ~/410/usr/$USER/mygroup/REPOSITORY/p2/config).

[receive]
 fsckObjects = true
 denyDeletes = true
 denyNonFastForwards = true
[gc]
 reflogExpire = never
 reflogExpireUnreachable = never
 pruneExpire = never
 rerereresolved = never
 rerereunresolved = never
[core]
 logAllRefUpdates = true

You can do this by editing the config file directly, or by using these commands:

############################################################
# One person does these, once.
############################################################
$ cd ~/410/usr/$USER/mygroup/REPOSITORY/p2
$ git config receive.fsckObjects true
$ git config receive.denyDeletes true
$ git config receive.denyNonFastforwards true
$ git config gc.reflogExpire never
$ git config gc.reflogExpireUnreachable never
$ git config gc.pruneExpire never
$ git config gc.rerereresolved never
$ git config gc.rerereunresolved never
$ git config core.logAllRefUpdates true

You may also wish to apply some or all of these settings to your personal repository, though this is less important because in theory you are frequently pushing your work to a well-configured central repository.
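
To convince yourself that a setting is doing its job, you can try it against a disposable repository. The sketch below, which assumes a sandbox directory and a made-up file, turns on receive.denyNonFastForwards in a bare repository and then attempts a forced push of rewritten history:

```shell
set -e
sandbox=$(mktemp -d)

# Disposable "central" repository with one of the protective settings.
git init -q --bare "$sandbox/central"
git -C "$sandbox/central" symbolic-ref HEAD refs/heads/main
git -C "$sandbox/central" config receive.denyNonFastForwards true

# A working clone that publishes one commit.
git clone -q "file://$sandbox/central" "$sandbox/work"
cd "$sandbox/work"
git config user.name "Pat Example"
git config user.email "pat@example.edu"
git checkout -q -b main
echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
git push -q -u origin main

# Rewrite the commit that was already pushed, then try to force it through.
git commit -q --amend -m "First commit, rewritten"
if git push -q --force origin main 2>/dev/null; then
    result="accepted"
else
    result="rejected"
fi
echo "forced push was $result"
```

The central repository should refuse the push and keep the original commit, even though --force was given.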

Special last-minute warning!

Git is a powerful system which includes the ability for other people to do many things that you, personally, should not do.

One particular thing you should not do is experiment with certain dangerous, exciting, or fancy commands frantically in the last few hours before an assignment is due. Unless you already are completely expert in what these commands do, the last day is the wrong time to find out. These commands include:

  1. git reset
  2. git revert
  3. git rebase

"The last minute" is also not a good time to use any --hard or --force parameters. Basically each one is designed to throw away data in some situation, and you are now in a situation in which you want to avoid throwing away data!

What should you do if you are in turmoil?

  1. First, store a copy of your personal repository somewhere safe, e.g., a tarball. Do not skip this step.
  2. If your repository is not in a clean state, commit. Do not skip this step.
  3. If your personal work is not yet pushed to your group's central repository, push it. If you think you can't, but you carefully made a copy of your repository as indicated above, you may be able to skip this step.
  4. It should be safe for you to git checkout a previous commit. You will most likely want to specify a branch name with the -b parameter, e.g.,
    % git checkout -b turmoil 3435c3f792
    
  5. If your commits are fine-grained, you may well be able to use git cherry-pick to hoist particular commits from one branch onto another. Regardless, whatever you do on this branch should be unable to corrupt your group's central repository, and you should be able to go back to the saved copy of your personal repository.
  6. If you end up with something you like, you can push it to your common repository and create the remote branch with
    % git push -u origin turmoil
    
  7. Submit from this "turmoil" branch--of course, only after having pushed it to your central repository first! Then you can get help from a git expert at leisure, after submission, on how to merge this branch onto the trunk or how to replace the old trunk with this new branch.
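
The recovery steps above can be rehearsed ahead of time, when nothing is at stake. This sketch assumes a scratch repository named p2 with three invented commits: a known-good one, a questionable one, and a useful one that we want to rescue.

```shell
set -e
sandbox=$(mktemp -d)
repo="$sandbox/p2"
mkdir "$repo"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Pat Example"
git config user.email "pat@example.edu"

echo "base" > kernel.c
git add kernel.c
git commit -q -m "Known-good kernel"
good=$(git rev-parse HEAD)

echo "broken" >> kernel.c
git commit -q -a -m "Questionable change"

echo "useful" > helper.c
git add helper.c
git commit -q -m "Add helper"
wanted=$(git rev-parse HEAD)

# Step 1: store a copy of the whole repository before touching anything.
tar -czf "$sandbox/p2-backup.tar.gz" -C "$sandbox" p2

# Step 4: start a new branch at the known-good commit.
git checkout -q -b turmoil "$good"

# Step 5: hoist just the commit we want onto the new branch.
git cherry-pick "$wanted"
git log --oneline
```

The turmoil branch ends up with the known-good kernel.c plus helper.c, while main and the tarball still hold everything that was there before.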

Explanation of Concepts

The above involved some simplifications of the underlying concepts of Git for the purposes of readability and for the purposes of understandability of an introduction. The simplifications are not disastrous in terms of your comprehension of what Git is doing behind your back, but you may find it helpful to know how Git stores data to better work with Git. Tommi Virtanen's excellent page Git for Computer Scientists may provide some insight as well, for those who like to talk about DAGs and are big fans of arrows pointing every which way.

Commits

The basic unit of a point in time stored in Git is a commit. Each time we spoke of recording changes earlier, it would have been more correct to say "creating a commit"; I used the words "recording changes" to distinguish the operation from pushing and publishing your changes to your partner. A commit, by its nature, consists of a few pieces of information.

A commit is identified by the SHA1 hash of all of the information that it contains. This hash is one common form of a refspec -- that is to say, it is one common way to specify a single commit. Recall that when you did a checkout to go back in time, you specified a SHA1 hash; in that case, you were using the SHA1 hash as a refspec.
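
You can check this claim for yourself. The sketch below assumes a scratch repository with one made-up commit; it prints what the commit contains and then asks Git to hash that same content again, which should reproduce the commit's identifier.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Pat Example"
git config user.email "pat@example.edu"

echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Say hello"

git rev-parse HEAD          # the hash that identifies the commit
git cat-file -p HEAD        # the information the commit contains

# Hash the commit's raw content again, as a commit object.
recomputed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)
echo "$recomputed"
```

Change any byte of that content (the message, the author, the tree it points to) and the hash, and therefore the commit's identity, changes with it.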

You may have inferred by now that commits exist in a sort of a tree. Each commit may have zero or more parent commits (the very first commit has none, and a commit with more than one parent is called a merge commit), and each commit may have zero or more child commits. You can view the commit tree using gitk, as we saw above; each commit was identified by a dot, and gitk drew lines for us between each commit to explicitly show the branches of the tree.
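
If gitk is not handy, the same parent links can be inspected from the command line. A sketch, assuming a scratch repository with an invented branch called "side" that gets merged back into main:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Pat Example"
git config user.email "pat@example.edu"

echo "base" > base.txt
git add base.txt
git commit -q -m "Root commit"
root=$(git rev-parse HEAD)

git checkout -q -b side
echo "side work" > side.txt
git add side.txt
git commit -q -m "Work on side"

git checkout -q main
echo "main work" > main.txt
git add main.txt
git commit -q -m "Work on main"

# The two lines of development have diverged, so this creates a merge commit.
git merge -q --no-edit side
merge=$(git rev-parse HEAD)

# Each line printed is a commit hash followed by the hashes of its parents.
git rev-list --parents -n 1 "$root"     # the first commit: no parents
git rev-list --parents -n 1 "$merge"    # the merge commit: two parents
git log --oneline --graph               # a text-mode picture of the tree
```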

This tree of cryptographic hashes gives Git a few very useful properties. Git can assure you that nobody has changed the tree that you have based your work on, because every element in the tree, down to the blobs, is identified by its cryptographic hash (its SHA1). If a parent object has changed, either by malicious intent or by disk corruption, Git simply will not be able to find the parent object, instead of giving you the incorrect data. This makes Git relatively immune to AFS corrupting its metadata.
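
To watch Git notice damage, you can vandalize a disposable repository on purpose. This sketch assumes a scratch repository whose objects are still stored "loose" (one file per object under .git/objects), which is the usual state of a brand-new repository; never try this on a repository you care about.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Pat Example"
git config user.email "pat@example.edu"

echo "precious data" > data.txt
git add data.txt
git commit -q -m "Store some data"

# Locate the file holding the blob: objects/<first 2 hex digits>/<the rest>.
blob=$(git rev-parse HEAD:data.txt)
path="$repo/.git/objects/$(echo "$blob" | cut -c1-2)/$(echo "$blob" | cut -c3-)"

git fsck                    # a healthy repository passes the check

# Simulate disk corruption by overwriting the stored object with junk.
chmod u+w "$path"
echo "junk" > "$path"

if git fsck >/dev/null 2>&1; then
    verdict="unnoticed"
else
    verdict="detected"
fi
echo "corruption was $verdict"
```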

Further, it makes it hard to throw away history by accident. Some version control systems that we discussed in lecture have versions per file, so deleting a file may delete its version history, or otherwise create a discontinuity in how the file is linked in terms of time. In Git, deleting a file just produces a new commit whose contents no longer include that file; the earlier commits are untouched. Similarly, renaming a file is not disastrous (although somewhat quirky); the only changes happen locally in the commit object. If a delete required a change of history, then the cryptographic hashes would change, and the hash of every later commit would have to change as well. The cryptographic hash system, then, makes Git resistant to inadvertent deletion of history.

Branches, tags, and refspecs -- oh my!

In this section, until now, you've seen only one kind of refspec -- a SHA1 hash of a commit. But in the quick-start above, you've worked with more types of refspecs; when you checked out a branch, you used the refspec that refers to the branch.
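
Whichever kind of name you hand to Git, it ends up resolved to a commit hash. A small sketch, assuming a scratch repository and a hypothetical tag named v1:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Pat Example"
git config user.email "pat@example.edu"

echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Say hello"
git tag v1                          # hypothetical tag name

full=$(git rev-parse HEAD)          # the full hash of the current commit
short=$(git rev-parse --short HEAD) # an abbreviated form of the same hash

# Four different spellings, one commit.
git rev-parse HEAD
git rev-parse main
git rev-parse v1
git rev-parse "$short"
```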

Further Reading

Here are some sources you might consult.


[Last modified Friday February 09, 2024]
