A brief introduction to version control systems
Tim Staley
Astronomy Group Monday Seminar
Southampton, November 2013
WWW: timstaley.co.uk
The problem No backup Manual copies Centralised VCS Distributed VCS
Aims
Help identify problems that can be solved.
Introduce basic concepts of version control.
Explain why various technologies exist, and which you should choose.
When you need version control
Complex documents, built up over time.
Multiple collaborators (or even just multiple machines).
Multiple versions which ‘co-evolve’.
Reproducibility (‘snapshots’).
Four Evolutionary Stages
Stage 0: Not backing up
DON’T DO THIS
Stage 1: Manual copies
Flaws:
Manual = fallible.
Backup: Copies of copies.
Labelling.
We need metadata - datestamps, annotations, attribution.
And tools - make this stuff quick and easy!
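As a rough illustration of what "metadata" means here, the sketch below attaches a datestamp, an annotation and an author to a saved copy, and labels it with a content hash instead of a hand-picked filename. The `Snapshot` class and `take_snapshot` helper are hypothetical names invented for this example; they do not belong to any particular tool.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Snapshot:
    """One saved copy of a document plus its metadata (hypothetical structure)."""
    content: str         # the saved text itself
    author: str          # attribution
    message: str         # annotation
    timestamp: datetime  # datestamp

    @property
    def digest(self) -> str:
        # A hash of the content gives every snapshot an unambiguous label,
        # so nobody has to invent filenames by hand.
        return hashlib.sha1(self.content.encode("utf-8")).hexdigest()


def take_snapshot(content: str, author: str, message: str) -> Snapshot:
    # Record the current UTC time automatically rather than relying on memory.
    return Snapshot(content, author, message, datetime.now(timezone.utc))


snap = take_snapshot("Draft abstract.\n", "A. Student", "First draft of abstract")
print(snap.digest[:8], snap.author, snap.message)
```

Identical content always produces the same digest, so two copies can be compared without opening either file.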
Aside: ‘Cloudy’ technologies
Trade off — convenience vs control.
Good for:
Small docs, frequently updated across multiple locations (e.g. to-do list).
Basic backups of items unlikely to evolve (photos, etc).
Aside: ‘Cloudy’ technologies
Problems:
Versioning is all automated - can’t choose sensible ‘checkpoints’ to mark out.
Collaboration is still broken, unless you’re working on very simple docs.
NEED MORE METADATA
Stage 2: Centralised version control
e.g.
‘Concurrent Versions System’ (CVS, now defunct).
‘Subversion’ (SVN).
Basic concepts, 1
Record an annotated history of change sets.
Trunk, branch
Parents, ancestors
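The terms above can be pictured as a small graph: each change set points back at its parent(s), and the ancestors of a change set are everything reachable by following those links. The sketch below is a toy model only; the `Commit` class, the `ancestors` function and the identifiers `r1` to `r3` are made up for illustration and are not how any real system stores its history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """A change set with an annotation and links to its parent change sets."""
    ident: str
    message: str
    parents: tuple = ()  # tuple of Commit objects; empty for the first commit


def ancestors(commit):
    """Return the identifiers of every commit reachable through parent links."""
    seen = set()
    stack = list(commit.parents)
    while stack:
        current = stack.pop()
        if current.ident not in seen:
            seen.add(current.ident)
            stack.extend(current.parents)
    return seen


root = Commit("r1", "Start paper")
trunk = Commit("r2", "Add introduction", (root,))             # continues the trunk
branch = Commit("r3", "Try alternative analysis", (root,))    # branches from r1

# The shared ancestor is where trunk and branch diverged.
common = ancestors(trunk) & ancestors(branch)
print(sorted(common))  # ['r1']
```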
Basic concepts, 2
Centralised ⇔ Master copy
Repository
Checkout
Commit / Revision
Basic concepts, 3
Merging
In simple cases, merges are automatic!
Tree records allow us to build the new combined version.
Manual merging: When conflicts exist, we have the info and tools to manually resolve them.
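Why simple merges can be automatic: with the common ancestor on record, a tool can tell which side changed what. The sketch below shows the idea at whole-file granularity, which is a deliberate simplification (real tools usually compare line by line). The function name `three_way_merge` and the file names are hypothetical.

```python
def three_way_merge(base, ours, theirs):
    """Merge two {filename: content} versions against their common ancestor.

    Returns (merged, conflicts): the automatically merged files, and the
    names of files that both sides changed in different ways.
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            result = o              # both sides agree
        elif o == b:
            result = t              # only their side changed
        elif t == b:
            result = o              # only our side changed
        else:
            conflicts.append(name)  # both changed differently: needs a human
            continue
        if result is not None:      # None means the file was deleted
            merged[name] = result
    return merged, conflicts


base = {"intro.tex": "v1", "method.tex": "v1"}
ours = {"intro.tex": "v2", "method.tex": "v1"}
theirs = {"intro.tex": "v1", "method.tex": "v1", "results.tex": "new"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # our intro edit and their new results file are both kept
print(conflicts)  # []
```

If both sides had edited `intro.tex` differently, it would appear in `conflicts` instead of `merged`, which is the point where manual merging takes over.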
Distributed VCS
1986 – early 2000s: Why would you make this any more complex? This works.
INTERWEBS
(See e.g. visualised history of Python, https://www.youtube.com/watch?v=cNBtDstOTmA)
Centralised doesn’t scale
Many collaborators.
Cannot check in half-finished work to master.
Cannot keep track of a branch for every collaborator.
Resort to a hybrid: a central copy under version control, with many local, manual backups for intermediate work.
The distributed model
Stage 3: Distribute!
Everyone has their own mirror, or clone, of the repository.
Changes are distributed via pushes and pulls.
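A toy model of clone, push and pull, treating a repository as nothing more than a set of commit identifiers. The `Repo` class and the names `shared`, `alice` and `bob` are invented for this sketch, and it ignores merging and commit ordering entirely; it only shows that every clone holds the full history and that commits move between copies explicitly.

```python
class Repo:
    """A toy repository: just a set of commit identifiers."""

    def __init__(self, commits=()):
        self.commits = set(commits)

    def clone(self):
        # A clone is a full, independent copy of the history.
        return Repo(self.commits)

    def commit(self, ident):
        # Committing only touches the local copy, so it works offline.
        self.commits.add(ident)

    def push(self, other):
        # Send the other repository any commits it lacks.
        other.commits |= self.commits

    def pull(self, other):
        # Fetch any commits this repository lacks.
        self.commits |= other.commits


shared = Repo({"c1"})
alice = shared.clone()
bob = shared.clone()

alice.commit("c2")   # local work, no connection needed
bob.commit("c3")
alice.push(shared)   # sync later
bob.pull(shared)
bob.push(shared)

print(sorted(shared.commits))  # ['c1', 'c2', 'c3']
```

Alice's copy still lacks `c3` until she pulls, which is the sense in which each clone is independent.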
Distribute!
Benefits for you:
More flexible. Allows different workflows and collaborative behaviour etc.
Can commit offline, sync later.
Talk to me later if you want the details.
So which should I use?
At this stage, Git and Mercurial are functionally equivalent, but Git has won the majority mindshare. That means better support, a better chance of collaborators using the same system, etc.
Summary
Version control helps with:
Backups
Reproducibility
Comparing arbitrary historical versions.
Maintaining multiple live versions.
Lots of free services and material online to help you out.
Bit of a learning curve at first - but the payoff is large in the long run. (And now you have a head start!)
Advanced Reading
To start, google ‘git intro’, etc. Then...
Git for Computer Scientists
http://eagain.net/articles/git-for-computer-scientists/
Understanding Git Conceptually
http://www.sbf5.com/~cduan/technical/git/
Understanding the Git Workflow
https://sandofsky.com/blog/git-workflow.html
