Intro to Git

What is Git?

Git is…

  • A distributed version control system
  • Used to allow multiple people to collaborate on the same code
  • Also useful for managing your own code and larger projects

Applications for language researchers?

  • Coauthoring papers
  • Working on R scripts or other code
  • Sharing code with others (via GitHub or Bitbucket)
  • Managing large projects like a dissertation, thesis, or article

What does Git do?

  • A Git repository stores the history of the files it contains
  • Changes to files are tracked and can be undone
  • Git also supports branches, which hold multiple versions of the same files (production vs. development versions, experimental approaches, or different drafts)
  • Git stores this history locally but can also keep a copy on GitHub or other remote platforms
  • Git is mainly used through the terminal (command line)

How do we use Git?

The Basic Local Git Workflow

  1. Create Git repository
  2. Add files so that Git tracks them (staging files)
  3. When you are done editing, commit the changes (committing files)

Git config

Basic Git configuration, if you haven’t done so already

# configure the user which will be used by Git
# use your full name here, not an acronym or initials
git config --global user.name "Firstname Lastname"

# configure the email address
git config --global user.email "your.email@example.org"
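To check what Git has recorded, you can ask git config to print a value back. The sketch below runs in a throwaway folder made with mktemp and leaves out --global, so the settings apply only to that demo repository (they are stored in its .git/config) and your real global settings are not touched. The name and email are the same placeholders as above.

```shell
# Throwaway repository so this demo leaves your real settings alone
cd "$(mktemp -d)"
git init -q

# Without --global, the setting is stored for this repository only (.git/config)
git config user.name "Firstname Lastname"
git config user.email "your.email@example.org"

# Print the values back
git config user.name
git config user.email

# List every setting stored for this repository
git config --local --list
```

Running the same read-back commands with --global (for example git config --global user.name) shows your real global values instead.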

Creating a Git Repository

  • Create a folder for the project (or use an existing one)
  • Navigate via terminal/command line to that folder
  • Then run git init
mkdir -p ~/Desktop/sample_git # create the folder (does nothing if it already exists)
cd ~/Desktop/sample_git
git init
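A self-contained version of these steps, assuming only that Git is installed. It uses a throwaway folder from mktemp -d instead of ~/Desktop/sample_git, and shows that git init creates a hidden .git folder where the history is kept.

```shell
# Create a throwaway project folder and move into it
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Turn the folder into a Git repository
git init

# The history is kept in the hidden .git folder (ls -a shows hidden files)
ls -a

# Ask Git whether we are inside a repository (prints: true)
git rev-parse --is-inside-work-tree
```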

Check Git Status

  • Now that the repository is created, we can check its status
  • You can run this at any point too
cd ~/Desktop/sample_git
git status
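A minimal sketch of what git status reports for a brand-new file, run in a throwaway repository; the file sample.R and its contents are made up for the demo. git status --short prints the same information in a compact form, where ?? marks a file Git is not tracking yet.

```shell
# Throwaway repository with one new, untracked file
cd "$(mktemp -d)"
git init -q
echo 'x <- c(1, 2, 3)' > sample.R

# Full report: sample.R appears under "Untracked files"
git status

# Compact report: "?? sample.R" means untracked
git status --short
```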

Staging files

  • We have to tell Git which files to track
  • We add them with git add FILENAME
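  • For example, add sample.R (the file must already exist in the folder, otherwise git add reports that the pathspec did not match any files)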
cd ~/Desktop/sample_git
git add sample.R
# to add all R files in the directory
# use 'git add *.R'
git status
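A sketch of staging in a throwaway repository (the file names and contents are hypothetical). In the short status, A in the first column means the file is staged as a new file, while ?? still marks files Git is not tracking.

```shell
cd "$(mktemp -d)"
git init -q
echo 'x <- c(1, 2, 3)' > sample.R
echo 'y <- mean(x)' > sample2.R
echo 'to do' > notes.txt

# Stage a single file
git add sample.R
git status --short   # "A  sample.R" is staged; the other two show "??"

# Stage every .R file in this folder (the shell expands *.R)
git add *.R
git status --short   # sample.R and sample2.R staged, notes.txt still untracked
```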

Commit files

  • Staging a file only tells Git which changes to include in the next commit; it does not record them yet
  • Committing tells Git to record how the staged files look at that moment (it commits all staged files)
  • Use git commit -m "Commit message"
  • -m supplies the commit message directly on the command line
cd ~/Desktop/sample_git
git commit -m "First commit"
# the message added should explain the changes
git status
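A self-contained sketch of a first commit. The two git config lines set a placeholder identity for this throwaway repository only, because git commit refuses to run until a name and email are configured; git log --oneline then lists the new commit.

```shell
cd "$(mktemp -d)"
git init -q
# Placeholder identity for this repository only (commits need a name and email)
git config user.name "Demo User"
git config user.email "demo@example.org"

echo 'x <- c(1, 2, 3)' > sample.R
git add sample.R
git commit -m "First commit"

git status          # reports that there is nothing left to commit
git log --oneline   # one line: short commit ID and "First commit"
```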

Modifying a file

  • Let’s change the file to see what happens
  • Make some changes to sample.R then check the git status
cd ~/Desktop/sample_git
# changed the sample.R file
git status
  • What does the git status say?
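One possible answer, sketched in a throwaway repository with a hypothetical sample.R and a placeholder identity: after an edit, the short status shows M in the second column (changed but not yet staged), and git diff shows the changed lines.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"
echo 'x <- c(1, 2, 3)' > sample.R
git add sample.R
git commit -q -m "First commit"

# Change the committed file
echo 'y <- mean(x)' >> sample.R

git status          # sample.R is listed under "Changes not staged for commit"
git status --short  # " M sample.R": modified, not yet staged
git diff            # the new line is shown with a leading +
```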

Commit changes

  • After you modify a file, you can commit it again by first running git add FILENAME and then git commit -m "message", or
  • You can use git commit -a -m "message"
  • -a automatically stages every modified or deleted file that Git already tracks; brand-new files still need git add first
cd ~/Desktop/sample_git
git commit -a -m "made some changes"
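A sketch of what -a does and does not pick up, in a throwaway repository with hypothetical files and a placeholder identity: the edit to the tracked sample.R is committed, but the brand-new new_script.R stays untracked until it is added with git add.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"
echo 'x <- c(1, 2, 3)' > sample.R
git add sample.R
git commit -q -m "First commit"

echo 'y <- mean(x)' >> sample.R   # edit a file Git already tracks
echo 'z <- 1' > new_script.R      # brand-new file Git has never seen

git commit -a -m "made some changes"
git status --short                # "?? new_script.R": -a skipped the new file
```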

Summary (so far)

cd ~/Desktop/sample_git # go to the right folder
git init # create repository (only run first time)
git add *.R # add all .R files (staging)
git commit -a -m "made some changes" # commit the files

Branching

What is branching?

  • Branches are different versions of the same code
  • Git allows us to create branches instead of creating a completely new folder for changes
  • Git manages the changes between versions for us and even changes which files appear depending on which branch we are in
  • Git can even merge branches back together if needed

Why is branching helpful?

  • Branching allows us to keep multiple versions of the code, each with a separate history, so we can add or undo features on one branch without affecting the others
  • This is useful if we want to experiment with a new analysis while keeping code that we know works
  • Branching also works for non-code files such as text documents
  • Can have multiple versions of papers or dissertations (especially if using LaTeX or R markdown)
    • Modify an article for different purposes
    • Manage comments from committee members or reviewers

Creating new branch

  • To create a new branch use git branch NAME
  • Using git branch will give you a list of all branches
cd ~/Desktop/sample_git
git branch experimental
git branch
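A self-contained sketch of creating and listing a branch, with a placeholder identity and a hypothetical sample.R. Two assumptions: git init -b master (Git 2.28 or newer) names the first branch master to match this handout, since newer setups may default to main; and a commit is made first, because git branch needs at least one commit to start the new branch from (otherwise it fails with an error like "Not a valid object name: 'master'").

```shell
cd "$(mktemp -d)"
git init -q -b master                     # -b needs Git 2.28+
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"
echo 'x <- c(1, 2, 3)' > sample.R
git add sample.R
git commit -q -m "First commit"           # a branch needs a commit to start from

git branch experimental   # create the branch (you stay on master)
git branch                # lists experimental and master; * marks the current branch
```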

Switching between branches

  • To switch between branches use git checkout NAME
  • Notice that after switching branches, the files in the folder change (you can see this even in Finder on macOS)
  • Note: in this example, a file sample3.R was added and committed on experimental
cd ~/Desktop/sample_git
git checkout master
# see file list before switching branches
ls 
git checkout experimental
# see file list afterward
ls
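A sketch of the sample3.R example in a throwaway repository, assuming a placeholder identity and git init -b master (Git 2.28 or newer) so the first branch is called master. The file is committed on experimental only, so it disappears from the folder when you check out master.

```shell
cd "$(mktemp -d)"
git init -q -b master                     # -b needs Git 2.28+
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"
echo 'x <- c(1, 2, 3)' > sample.R
git add sample.R
git commit -q -m "First commit"
git branch experimental

# Add and commit sample3.R on experimental only
git checkout experimental
echo 'z <- 10' > sample3.R
git add sample3.R
git commit -q -m "Add sample3.R"
ls                        # sample.R and sample3.R

git checkout master
ls                        # only sample.R: sample3.R belongs to experimental
```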

Merging branches

  • Sometimes it is helpful to merge branches together
  • First check out the branch you want to keep (master)
  • Then use git merge NAME, where NAME is the branch you want to merge into the current branch
  • If the merge has conflicts, Git stops without completing it. Resolve the conflicts (for example with git mergetool), then finish with git commit -a -m "message"
cd ~/Desktop/sample_git
git checkout master # switch to the branch you want to keep
git merge experimental # merge experimental into master
git mergetool # only if the merge stopped with conflicts
git commit -a -m "merged experimental into master" # only if you resolved conflicts
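A self-contained sketch of a merge conflict, assuming a placeholder identity and git init -b master (Git 2.28 or newer): both branches change the same line of a hypothetical sample.R, so git merge stops. Instead of git mergetool, whose behavior depends on which merge tool you have installed, the conflict is resolved here by writing the wanted version of the file, staging it, and committing. git diff --name-only --diff-filter=U lists the files that are still in conflict.

```shell
cd "$(mktemp -d)"
git init -q -b master                     # -b needs Git 2.28+
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"
echo 'x <- 1' > sample.R
git add sample.R
git commit -q -m "First commit"

# Change the same line differently on each branch
git checkout -q -b experimental           # create and switch in one step
echo 'x <- 2' > sample.R
git commit -q -a -m "Try x <- 2"
git checkout -q master
echo 'x <- 3' > sample.R
git commit -q -a -m "Use x <- 3"

# Git stops and writes conflict markers (<<<<<<<, =======, >>>>>>>) into sample.R
git merge experimental || echo "Merge stopped: resolve the conflicts first"
git diff --name-only --diff-filter=U      # lists files still in conflict: sample.R

# Resolve by hand: keep the version you want, then stage and commit
echo 'x <- 2' > sample.R
git add sample.R
git commit -q -m "Merge experimental into master"
```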

Deleting Merged Branches

  • After a branch is successfully merged into another branch, you can delete it with git branch -d NAME
  • -d will only delete the branch if it has been merged (this is a safe delete)
  • If you want to delete a branch without merging it (for example, an experiment you don’t like), use git branch -D NAME
cd ~/Desktop/sample_git
git branch -d experimental
git branch crazy-idea # create this branch
git branch -D crazy-idea # never mind, too crazy, delete it
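A sketch of the difference between -d and -D in a throwaway repository, assuming a placeholder identity and git init -b master (Git 2.28 or newer): crazy-idea has a commit that master does not, so -d refuses to delete it and -D deletes it anyway.

```shell
cd "$(mktemp -d)"
git init -q -b master                     # -b needs Git 2.28+
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"
echo 'x <- 1' > sample.R
git add sample.R
git commit -q -m "First commit"

# A branch with a commit that master does not have
git checkout -q -b crazy-idea
echo 'y <- x * 1000' >> sample.R
git commit -q -a -m "Crazy idea"
git checkout -q master

# Safe delete refuses: crazy-idea is not fully merged
git branch -d crazy-idea || echo "Safe delete refused: crazy-idea is not fully merged"

# Force delete works even though the branch was never merged
git branch -D crazy-idea
```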

Using Git with GitHub

Creating a GitHub repository

  • If you are starting from your own code, the first step is to make a local Git repository, which we did already
  • Then make a new, empty repository on GitHub
  • After creating it, GitHub shows directions for pushing an existing repository; run them in the terminal (cd to your local Git repository first). You may be asked for your GitHub username and a personal access token, since GitHub no longer accepts account passwords for Git over HTTPS

Pushing to GitHub

  • After you link your local repository to GitHub, you can send your commits there with git push
  • The push is rejected if GitHub has commits you don’t have locally (for example, changes pushed by a collaborator)
cd ~/Desktop/sample_git
git push # send commits to the linked GitHub repository
# or if you want to specify branches
git push origin experimental # pushes experimental branch
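A sketch of both forms of git push, assuming a local bare repository as a stand-in for GitHub, a placeholder identity, and git init -b master (Git 2.28 or newer): once the -u link is set up, a plain git push sends master, and git push origin experimental publishes the experimental branch.

```shell
work="$(mktemp -d)"
git init -q --bare -b master "$work/remote.git"   # stand-in for GitHub
git init -q -b master "$work/sample_git"
cd "$work/sample_git"
git config user.name "Demo User"                 # placeholder identity
git config user.email "demo@example.org"
echo 'x <- c(1, 2, 3)' > sample.R
git add sample.R
git commit -q -m "First commit"
git remote add origin "$work/remote.git"
git push -q -u origin master                      # link master once

# Later commits on master only need a plain git push
echo 'y <- mean(x)' >> sample.R
git commit -q -a -m "made some changes"
git push

# Publish the experimental branch too
git branch experimental
git push origin experimental

git ls-remote --heads origin   # lists refs/heads/experimental and refs/heads/master
```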

Pulling files

  • To get changes from GitHub that are not yet in your local repository, use git pull (it fetches the changes and merges them into your branch)
  • Git won’t let you push something if there are conflicts, so good to pull first, then deal with conflicts, then push.
  • If conflicts arise use git diff to see differences and git mergetool to merge
cd ~/Desktop/sample_git
git pull # get commits from the linked GitHub repository
git diff # use if conflicts
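A sketch of the typical coauthoring situation, assuming a local bare repository as a stand-in for GitHub, two clones playing you and a hypothetical coauthor (with placeholder identities), and git init -b master (Git 2.28 or newer). Your push is rejected because the coauthor pushed first; git pull brings in their commit and merges it, and then the push goes through. The --no-rebase and --no-edit flags are there because some newer Git versions refuse to pull diverged histories until you choose between merging and rebasing, and a merge may otherwise open an editor for the commit message.

```shell
work="$(mktemp -d)"
git init -q --bare -b master "$work/remote.git"   # stand-in for GitHub

# Your repository, linked and pushed
git init -q -b master "$work/me"
cd "$work/me"
git config user.name "Me"                         # placeholder identity
git config user.email "me@example.org"
echo 'x <- 1' > sample.R
git add sample.R
git commit -q -m "First commit"
git remote add origin "$work/remote.git"
git push -q -u origin master

# A coauthor clones the repository and pushes a new file
git clone -q "$work/remote.git" "$work/coauthor"
cd "$work/coauthor"
git config user.name "Coauthor"                   # placeholder identity
git config user.email "coauthor@example.org"
echo 'Notes from the coauthor' > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git push -q

# Meanwhile you commit locally; your push is now rejected
cd "$work/me"
echo 'y <- 2' >> sample.R
git commit -q -a -m "Edit sample.R"
git push || echo "Push rejected: pull first"

# Pull (fetch + merge), then push again
git pull --no-rebase --no-edit
git push
```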

Summary

cd ~/Desktop/sample_git # go to the right folder
git init # create repository (only run first time)
git branch experimental # creates new branch called experimental
git checkout experimental # switches branches
git add *.R # add all .R files (staging)
git commit -a -m "made some changes" # commit the files
git pull # get current remote version
git mergetool # use if need to resolve conflicts
git push # saves to remote repo

Other Git Things

Git Misc

  • git log gives you the history of changes
  • git clone URL NAME copies an online Git repository into a new folder called NAME on your computer, already set up as a local Git repository
  • git grep "STRING" searches the tracked files for that string (add a commit or branch name to search that version instead)
  • .gitignore is a file listing files and file patterns for Git to ignore (you can find good templates online for each language); save it in the top folder of each Git repository
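A small sketch of these commands in a throwaway repository with hypothetical R files and a placeholder identity: git log --oneline lists the history, git grep finds a string in the tracked files, and a .gitignore listing .Rhistory and .RData (files R commonly leaves behind) keeps them out of git status.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.org"
echo 'x <- c(1, 2, 3)' > sample.R
echo 'm <- mean(x)' > stats.R
git add sample.R stats.R
git commit -q -m "Add scripts"

git log --oneline    # history: short commit ID and message per line
git grep "mean"      # stats.R:m <- mean(x)

# R can leave a .Rhistory file behind; list it (and .RData) in .gitignore
echo 'old commands' > .Rhistory
printf '%s\n' '.Rhistory' '.RData' > .gitignore
git status --short   # only "?? .gitignore"; .Rhistory is ignored
```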
