Source code control

Experimental html version of downloadable textbook, see https://www.tacc.utexas.edu/~eijkhout/istc/istc.html
\[ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% %%%% This text file is part of the source of %%%% `Introduction to High-Performance Scientific Computing' %%%% by Victor Eijkhout, copyright 2012-2020 %%%% %%%% mathjax.tex : macros to facility mathjax use in html version %%%% %%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \newcommand\inv{^{-1}}\newcommand\invt{^{-t}} \newcommand\bbP{\mathbb{P}} \newcommand\bbR{\mathbb{R}} \newcommand\defined{ \mathrel{\lower 5pt \hbox{${\equiv\atop\mathrm{\scriptstyle D}}$}}} \newcommand\macro[1]{$\langle$#1$\rangle$} \newcommand\dtdxx{\frac{\alpha\Delta t}{\Delta x^2}} \] 26.1 : Workflow in source code control systems
26.2 : Mercurial (hg) and Git
26.3 : Create and populate a repository
26.4 : New files
26.5 : Editing files
26.6 : Oops! Undo!
26.7 : Conflicts
26.8 : Inspecting the history
26.9 : Branching
26.10 : Pull requests and forks
26.11 : Other issues
26.11.1 : Transport
Back to Table of Contents

26 Source code control

Source code control

systems, also called revision control systems|see{source code control}} or version control systems, are a way of storing software, where not only the current version is stored, but also all previous versions. This is done by maintaining a repository for all versions, while one or more users work on a `checked out' copy of the latest version. Those of the users that are developers can then commit their changes to the repository. Other users then update their local copy. The repository typically resides on a remote machine that is reliably backup up.

There are various reasons for keeping your source in a repository.

  • If you work in a team, it is the best way to synchronize your work with your colleagues. It is also a way to document what changes were made, by whom, and why.
  • It will allow you to roll back a defective code to a version that worked.
  • It allows you to have branches, for instance for customizations that need to be kept out of the main development line. If you are working in a team, a branch is a way to develop a major feature, stay up to date with changes your colleagues make, and only add your feature to the main development when it is sufficiently tested.
  • If you work alone, it is a way to synchronize between more than one machine. (You could even imagine traveling without all your files, and installing them from the repository onto a borrowed machine as the need arises.)
  • Having a source code repository is one way to backup your work.

    There are various source code control systems; in this tutorial you can learn the basics of Subversion (also called svn source code control systems, and Mercurial (or~ hg ), which is an example of the new generation of code control} systems.

    26.1 Workflow in source code control systems

    crumb trail: > svn > Workflow in source code control systems

    Source code control systems are built around the notion of repository : a central store of the files of a project, together with their whole history. Thus, a repository allows you to share files with multiple people, but also to roll back changes, apply patches to old version, et cetera.

    The basic actions on a repository are:

  • Creating the repository; this requires you to have space and write permissions on some server. Maybe your sysadmin has to do it for you.
  • Checking out the repository, that is, making a local copy of its contents in your own space.
  • Adding your changes to the repository, and
  • Updating your local copy with someone else's changes.

    Adding your own changes is not always possible: there are many projects where the developer allows you to check out the repository, but not to incorporate changes. Such a repository is said to be read-only.

    Figure~ 26.1 illustrates these actions for the Subversion system.

    FIGURE 26.1: Workflow in traditional source code control systems such as Subversion

    Users who have checked out the repository can edit files, and check in the new versions with the commit command; to get the changes committed by other users you use update .

    One of the uses of committing is that you can roll your code back to an earlier version if you realize you made a mistake or introduced a bug. It also allows you to see the difference between different code version. However, committing many small changes may be confusing to other developers, for instance if they come to rely on something you introduce which you later remove again. For this reason, distributed source code control systems use two levels of repositories.

    There is still a top level that is authoritative, but now there is a lower level, typically of local copies, where you can commit your changes and accumulate them until you finally add them to the central repository.

    FIGURE 26.2: Workflow in distributed source code control systems such as Mercurial

    This also makes it easier to contribute to a read-only repository: you make your local changes, and when you are finished you tell the developer to inspect your changes and pull them into the top level repository. This structure is illustrated in figure~ 26.2 .

    26.2 Mercurial (hg) and Git

    crumb trail: > svn > Mercurial (hg) and Git

    Mercurial

    of distributed source code control systems. Many commands are the same as for subversion, but there are some new ones, corresponding to the new level of sophistication. Mercurial and git share some commands, but there are also differences. Git is ultimately more powerful, but mercurial is easier to use at first.

    Here is a translation between the two systems: https://github.com/sympy/sympy/wiki/Git-hg-rosetta-stone .

    Minimal introduction to git: http://rogerdudler.github.io/git-guide/

    This lab should be done two people, to simulate a group of programmers working on a joint project. You can also do this on your own by using two clones of the repository, preferably opening two windows on your computer.

    Best practices for distributed version control: https://homes.cs.washington.edu/~mernst/advice/version-control.html

    26.3 Create and populate a repository

    crumb trail: > svn > Create and populate a repository

    In this section you will create a repository and make a local copy to work on.

    Clone an existing repository
    mercurialgit
    hg clone [ localdir ] git clone [ localdir ]
    Create a completely new repository
    git init \# current directory or add explicit
    git remote add origin git@github.com:Name/Repo.git \# or other URL
    git push --set-upstream origin master \# when you first push

    First we need to have a repository. In practice, you will often use one that has been previously set up, but there are several ways to set up a repository yourself. There are commercial and free hosting services such as http://bitbucket.org . (Academic users can have more private repositories.)

    Let's assume that one student has created a repository your-project on Bitbucket. Both students can then clone it:

    %% hg clone https://YourName@bitbucket.org/YourName/your-project updating to branch default 0 files updated, 0 files merged, 0 files removed, 0 files unresolved

    or

    %% git clone git@bitbucket.org:YourName/yourproject.git Cloning into 'yourproject'... warning: You appear to have cloned an empty repository.

    You now have an empty directory your-project .

    \practical{Go into the project directory and see if it is really empty.}{There is a hidden directory .hg or .git }{}

    26.4 New files

    crumb trail: > svn > New files

    Creating an untracked file

    In this section you will make some simple changes: creating a new file and editing an existing file

    mercurialgit
    hg status [ path] git status [ path ]
    hg add [ files ] git add [ files ]
    once for each fileevery time the file is changed

    One student now makes a file to add to the repository:

    %% cat > firstfile a b c d e f ^D

    (where ^D stands for control-D, which terminates the input.) This file is unknown to hg :

    %% hg status ? firstfile

    Git is a little more verbose:

    git status On branch master Initial commit Untracked files: (use "git add <file>..." to include in what will be committed) firstfile nothing added to commit but untracked files present (use "git add" to track)

    Adding the file to the repository

    FIGURE 26.3: Add local changes to the remote repository

    We need to declare the file as belonging to the repository; a subsequent hg commit command then copies it into the repository.

    %% hg add firstfile %% hg status A firstfile %% hg commit -m "made a first file"

    or

    %% git add firstfile %% git status On branch master Initial commit Changes to be committed: (use "git rm --cached <file>..." to unstage) new file: firstfile %% git commit -a -m "adding a first file" [master (root-commit) f4b738c] adding a first file 1 file changed, 5 insertions(+) create mode 100644 firstfile

    mercurialgit
    hg commit -m lt;messagegt; git commit -m lt;messagegt;
    hg push git push origin master

    Unlike with Subversion, the file has now only been copied into the local repository, so that you can, for instance, roll back your changes. If you want this file added to the main repository, you need the hg push command:

    %% hg push https://YourName@bitbucket.org/YourName/your-project pushing to https://YourName@bitbucket.org/YourName/your-project searching for changes remote: adding changesets remote: adding manifests remote: adding file changes remote: added 1 changesets with 1 changes to 1 files remote: bb/acl: YourName is allowed. accepted payload.

    In the push step you were probably asked for your password. You can prevent that by having some lines in your $HOME/.hgrc file: {\footnotesize

    [paths] projectrepo = https://YourName:yourpassword@bitbucket.org/YourName/my-project [ui] username=Your Name <you@somewhere.youruniversity.edu>

    } Now the command hg push projectrepo will push the local changes to the global repository without asking for your password. Of course, now you have a file with a cleartext password, so you should set the permissions of this file correctly.

    With git you need to be more explicit, since the ties between your local copy and the `upstream' repository can be more fluid.

    git remote add origin git@bitbucket.org:YourName/yourrepo.git git push origin master

    FIGURE 26.4: Get changes that were made to the remote repository

    The second student now needs to update their repository. Just like the upload took two commands, this pass also takes two. First you do \n{hg pull} to update your local repository. This does not update the local files you have: for that you need to do hg update .

    \practical{Do this and check that the contents of the file are correct.}{In order to do the update command, you have to be in a checked-out copy of the repository.}

    \practical{Let both students create a new directory with a few files. Declare the directory and commit it. Pull and update to obtain the changes the other mde.}{You can do hg add on the directory, this will also add the files contained in it.}{}

    Since you will mostly be doing an update immediately after a pull, you can combine them:

    hg pull -u

    Git will report what files are updated; for Hg you need to take the changeset number and query:

    hg status --change 82ffb99c79fd

    In order for Mercurial to keep track of your files, you should never do the shell commands cp or mv on files that are in the repository. Instead, do hg cp or hg mv . Likewise, there is a command hg rm .

    26.5 Editing files

    crumb trail: > svn > Editing files

    Mercurial and Git differ in their behavior regarding edited files. Both show them as modified, but

  • Mercurial will automatically include them in a commit, and push them upstream.
  • Git needs an explicit git add each time the file is edited, otherwise it will not become part of a commit. Do not git add -A since this will probably add too much to your repo, such as auto-save and binary files. You can simplify your life by doing git add -u which adds all `updated' files to a commit.

    26.6 Oops! Undo!

    crumb trail: > svn > Oops! Undo!

    One of the reasons for having source code control is to be able to revert changes. The easiest undo is to go back to the last stored version in the repository.

    mercurialgit
    hg revert lt;yourfilegt; git checkout -- lt;yourfilegt;

    26.7 Conflicts

    crumb trail: > svn > Conflicts

    In this section you will learn about how do deal with conflicting edits by two users of the same repository.

    Now let's see what happens when two people edit the same file. Let both students make an edit to firstfile , but one to the top, the other to the bottom. After one student commits the edit, the other can commit changes, after all, these only affect the local repository. However, trying to push that change gives an error:

    %% emacs firstfile # make some change %% hg commit -m ``first again'' %% hg push test abort: push creates new remote head b0d31ea209b3! (you should pull and merge or use push -f to force)

    The solution is to get the other edit, and commit again. This takes a couple of commands: {\small

    %% hg pull myproject searching for changes adding changesets adding manifests adding file changes added 1 changesets with 1 changes to 1 files (+1 heads) (run 'hg heads' to see heads, 'hg merge' to merge) %% hg merge merging firstfile 0 files updated, 1 files merged, 0 files removed, 0 files unresolved (branch merge, don't forget to commit) %% hg status M firstfile %% hg commit -m ``my edit again'' %% hg push test pushing to https://VictorEijkhout:***@bitbucket.org/ VictorEijkhout/my-project searching for changes remote: adding changesets remote: adding manifests remote: adding file changes remote: added 2 changesets with 2 changes to 1 files remote: bb/acl: VictorEijkhout is allowed. accepted payload.

    }

    This may seem complicated, but you see that Mercurial prompts you for what commands to execute, and the workflow is clear, if you refer to figure  26.2 .

    \practical{Do a cat on the file that both of you have been editing. You should find that both edits are incorporated. That is the `merge' that Mercurial referred to.}{}{}

    If both students make edits on the same part of the file, version control can no longer resolve the conflicts. For instance, let one student insert a line between the first and the second, and let the second student edit the second line. Whoever tries to push second, will get messages like this: {\small

    %% hg pull test added 3 changesets with 3 changes to 1 files (+1 heads) (run 'hg heads' to see heads, 'hg merge' to merge) %% hg merge merging firstfile warning: conflicts during merge. merging firstfile incomplete! (edit conflicts, then use 'hg resolve --mark') 0 files updated, 0 files merged, 0 files removed, 1 files unresolved use 'hg resolve' to retry unresolved file merges or 'hg update -C .' to abandon

    }

    For git:

    CONFLICT (content): Merge conflict in <name of some file> Automatic merge failed; fix conflicts and then commit the result. [solutions-mpi-c:955] emacs <name of some file> [solutions-mpi-c:956] git add !$ && git commit -m "fix conflict" && git pull && git push

    There are now the following options:

  • There is usually a way to indicate whether to use the local or the remote version.
  • There are graphical programs to resolve conflicts. They will typically show you 3 colums, for the two versions, and for your resolution. You can then indicate `take this from the local version, and this from the remote'.
  • You can also edit the file to resolve the conflicts yourself. We will discuss that shortly.

    Both will give you several options. It is easiest to resolve the conflict with a text editor. If you open the file that has the conflict you'll see something like:

    <<<<<<< local aa bbbb ======= aaa a2 b >>>>>>> other c

    indicating the difference between the local version (`mine') and the other, that is the version that you pulled and tried to merge. You need to edit the file to resolve the conflict.

    After this, you tell hg that the conflict was resolved:

    hg resolve --mark %% hg status M firstfile ? firstfile.orig

    or

    git add <name of that file> git commit -m "fixed conflict in <name of that file>"

    After this you can commit and push again. The other student then needs to do another update to get the correction.

    For conflicts in binary files, git has a flag where you can indicate which version to use:

    git checkout --ours path/to/file git checkout --theirs path/to/file

    Not all files can be merged: for binary files Mercurial will ask you:

    %% hg merge merging proposal.tex merging summary.tex merking references.tex no tool found to merge proposal.pdf keep (l)ocal or take (o)ther? o

    This means that the only choices are to keep your local version (type  l and hit return) or take the other version (type  o and hit return). In the case of a binary file that was obvious generated automatically, some people argue that they should not be in the repository to begin with.

    26.8 Inspecting the history

    crumb trail: > svn > Inspecting the history

    In this section, you will learn how to view and compare files in the repository.

    If you want to know where you cloned a repository from, look in the file .hg/hgrc .

    The main sources of information about the repository are hg log and hg id . The latter gives you global information, depending on what option you use. For instance, hg id -n gives the local revision number.

  • [ hg log ] gives you a list of all changesets so far, with the comments you entered.
  • [ hg log -v ] tells you what files were affected in each changeset.
  • [ hg log -r 5 ] or hg log -r 6:8 gives information on one or more changesets.

    To see differences in various revisions of individual files, use hg diff . First make sure you are up to date. Now do hg diff firstfile . No output, right? Now make an edit in firstfile and do \n{hg diff firstfile} again. This gives you the difference between the last commited version and the working copy.

    mercurialgit
    hg diff lt;filegt; git diff HEAD lt;filegt;
    hg diff -r A -r B lt;filegt; git diff A..B lt;filegt;

    For example:

    [src:1151] git pull [...] From github.com:Username/Reponame c5aaa43..2f5ce0b main -> origin/main Updating c5aaa43..2f5ce0b Fast-forward src/net.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) [src:1156] git diff c5aaa43..2f5ce0b net.cpp
  • The pull command tells you what the previous and current commit identifiers are; and
  • With diff , specifying these identifiers, you get the changes in diff format.

    Check for yourself what happens when you do a commit but no push, and you issue the above diff command.

    You can also ask for differences between committed versions with hg diff -r 4:6 firstfile .

    The output of this diff command is a bit cryptic, but you can understand it without too much trouble. There are also fancy GUI implementations of hg for every platform that show you differences in a much nicer way.

    If you simply want to see what a file used to look like, do \n{hg cat -r 2 firstfile}. To get a copy of a certain revision of the repository, do hg export -r 3 . ../rev3 , which exports the repository at the current directory (`dot') to the directory ../rev3 .

    If you save the output of hg diff , it is possible to apply it with the Unix patch command. This is a quick way to send patches to someone without them needing to check out the repository.

    26.9 Branching

    crumb trail: > svn > Branching

    Create branch:

    git checkout -b newbranch

    List available:

    git branch

    After that switch:

    git checkout branchname

    git push --set-upstream origin newbranch

    See what branches there are: hg brances

    See what branch you are working on: hg branch

    Switch to a branch (this undoes local changes): hg update -C branchname

    26.10 Pull requests and forks

    crumb trail: > svn > Pull requests and forks

    Suppose you want to contribute changes to a repository, but you don't have write permissions on a repository. (Of course, you need to have read permissions to the repository in order to make the changes.) Lack of write permission means that you can not create a branch, and push and merge it. A pull request is then a way to communicate your changes so that they can be pulled, rather than you pushing them.

  • In github, creating a fork is made with a press of the `Fork' button.
  • Clone your fork.

    git clone <original-url> repo-fork cd repo-fork
  • Make your edits in the cloned fork.
  • You need to keep your fork up to date with the original repo. First you connect the original repo as `upstream' to your fork:

    git remote add upstream <original-url> git remote -v # check on the state of things

    (the name `upstream' is the conventional name; it is not a keyword.)

  • Optionally, check out a branch. If you git status , you should see

    On branch YourBranch Your branch is up to date with 'origin/YourBranch.
  • Keep your fork updated:

    git fetch upstream git merge upstream/master master # or specific branch

    To make pull request:

  • Create a new branch

    git checkout -b YourChangesBranch
  • Make your changes and commit them:

    git add file1 file2 git commit -m "changes summary"
  • Push your changes to the origin, that is, your repo on github:

    git push --set-upstream origin YourChangesBranch

    (remember that you didn't have write permission to the original repo!)

  • You probably now get a message with URL to visit for creating a pull request.

    26.11 Other issues

    crumb trail: > svn > Other issues

    26.11.1 Transport

    crumb trail: > svn > Other issues > Transport

    Mercurial and git can use either ssh or http as transport . With Git you may need to redefine the transport for a push :

    git remote rm origin git remote add origin git@github.com:TACC/pylauncher.git

    You can change the transport with git remote set-url :

    git remote -v # check current transport git remote set-url origin git@hostname:USERNAME/REPOSITORY.git git remote -v # verify changes

  • Back to Table of Contents