What is Git?
Git is…
- A distributed version control system
- Used to allow multiple people to collaborate on the same code
- Also useful for managing your own code and larger projects
Applications for language researchers?
- Coauthoring papers
- Working on R scripts or other code
- Sharing code with others (via GitHub or Bitbucket)
- Managing large projects like a dissertation, thesis, or article
What does Git do?
- A Git repository stores the history of the files it contains
- Changes to files are tracked and can also be undone
- Git also allows for multiple branches, which store multiple versions of the same file (production versions vs development versions vs experimental approaches, or various draft versions)
- Git stores this information locally but can also save it remotely on GitHub or other platforms
- Mainly accessed through terminal or command line
How do we use Git?
The Basic Local Git Workflow
- Create Git repository
- Add files so that Git tracks them (staging files)
- When you are done editing, commit the changes (committing files)
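The three steps above can be sketched end to end in a throwaway folder. The temporary directory and the file contents are assumptions made for illustration; in practice you would run the same commands inside your own project folder.

```shell
set -eu
workdir="$(mktemp -d)"                          # hypothetical project folder
cd "$workdir"

git init -q                                     # 1. create the Git repository
git config user.name "Firstname Lastname"       # identity for this repository only
git config user.email "your.email@example.org"

echo 'x <- 1' > sample.R                        # a file worth tracking
git add sample.R                                # 2. stage the file
git commit -q -m "First commit"                 # 3. commit the staged file

commit_count="$(git rev-list --count HEAD)"     # number of commits on this branch
echo "commits so far: $commit_count"
```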
Git config
Basic Git configuration, if you haven’t done so already
# configure the user which will be used by Git
# this should be your full name, not an acronym
git config --global user.name "Firstname Lastname"
# configure the email address
git config --global user.email "your.email@example.org"
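To check what Git has stored, you can read the values back with git config --get. The sketch below sets the identity without --global inside a scratch repository (an assumption, so that it does not touch your real settings); with --global the same read-back works from any folder.

```shell
set -eu
scratch="$(mktemp -d)"                          # scratch repository for the demo
cd "$scratch"
git init -q

# Without --global the values are stored in this repository's .git/config only
git config user.name "Firstname Lastname"
git config user.email "your.email@example.org"

stored_name="$(git config --get user.name)"
stored_email="$(git config --get user.email)"
echo "$stored_name <$stored_email>"
```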
Creating a Git Repository
- Create a folder for the project (or use an existing one)
- Navigate via terminal/command line to that folder
- Then run
git init
cd ~/Desktop/sample_git
git init
## sh: line 0: cd: /Users/brad_rentz/Desktop/sample_git: No such file or directory
## Reinitialized existing Git repository in /Users/brad_rentz/Documents/UH/rentzb.github.io/content/post/.git/
Check Git Status
- Now that the repository is created we can check the status of the repository
- You can run this at any point too
cd ~/Desktop/sample_git
git status
## sh: line 0: cd: /Users/brad_rentz/Desktop/sample_git: No such file or directory
## On branch master
##
## No commits yet
##
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .Rhistory
## 12.jpg
## 2018-02-08-cluster-analysis.Rmd
## 2018-02-08-cluster-analysis.html
## 2018-02-08-cluster-analysis_files/
## _index.md
## ca.Rmd
## ca.html
## ca_files/
## cluster.Rmd
## cluster.html
## cluster_files/
## clustering.html
## demos.csv
## domains.csv
## explore.png
## git.Rmd
## git.html
## intro_bayesian_pdf.Rmd
## intro_bayesian_pdf.html
## intro_bayesian_pdf_cache/
## intro_bayesian_pdf_files/
## messy_data.csv
## r_markdown_intro.Rmd
## r_markdown_intro.html
## r_markdown_intro_files/
## t_d_vot.csv
## tidy_data_handout.Rmd
## tidy_data_handout.html
## waterfall.png
##
## nothing added to commit but untracked files present (use "git add" to track)
Staging files
- We have to tell Git which files to track
- We add them with
git add FILENAME
- Add
sample.R
cd ~/Desktop/sample_git
git add sample.R
# to add all R files in the directory
# use 'git add *.R'
git status
## sh: line 0: cd: /Users/brad_rentz/Desktop/sample_git: No such file or directory
## fatal: pathspec 'sample.R' did not match any files
## On branch master
##
## No commits yet
##
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .Rhistory
## 12.jpg
## 2018-02-08-cluster-analysis.Rmd
## 2018-02-08-cluster-analysis.html
## 2018-02-08-cluster-analysis_files/
## _index.md
## ca.Rmd
## ca.html
## ca_files/
## cluster.Rmd
## cluster.html
## cluster_files/
## clustering.html
## demos.csv
## domains.csv
## explore.png
## git.Rmd
## git.html
## intro_bayesian_pdf.Rmd
## intro_bayesian_pdf.html
## intro_bayesian_pdf_cache/
## intro_bayesian_pdf_files/
## messy_data.csv
## r_markdown_intro.Rmd
## r_markdown_intro.html
## r_markdown_intro_files/
## t_d_vot.csv
## tidy_data_handout.Rmd
## tidy_data_handout.html
## waterfall.png
##
## nothing added to commit but untracked files present (use "git add" to track)
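For comparison, here is a minimal sketch of what staging looks like when it is run inside a repository that actually contains sample.R. The folder and the second file, notes.txt, are hypothetical; git status --porcelain is the script-friendly form of the short status listing.

```shell
set -eu
repo="$(mktemp -d)"                 # hypothetical stand-in for ~/Desktop/sample_git
cd "$repo"
git init -q

echo 'x <- 1' > sample.R
echo 'notes'  > notes.txt           # hypothetical second file, left unstaged

git add sample.R                    # stage only sample.R

# "A " marks a staged new file, "??" marks an untracked file
status_short="$(git status --porcelain)"
echo "$status_short"
```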
Commit files
- Staging only tells Git which changes to include in the next commit; it does not record them in the history yet
- Committing tells Git to remember how the files are at that moment (it commits all staged files)
- Use
git commit -m "Commit message"
-m
adds a message argument
cd ~/Desktop/sample_git
git commit -m "First commit"
# the message added should explain the changes
git status
## sh: line 0: cd: /Users/brad_rentz/Desktop/sample_git: No such file or directory
## On branch master
##
## Initial commit
##
## Untracked files:
## .Rhistory
## 12.jpg
## 2018-02-08-cluster-analysis.Rmd
## 2018-02-08-cluster-analysis.html
## 2018-02-08-cluster-analysis_files/
## _index.md
## ca.Rmd
## ca.html
## ca_files/
## cluster.Rmd
## cluster.html
## cluster_files/
## clustering.html
## demos.csv
## domains.csv
## explore.png
## git.Rmd
## git.html
## intro_bayesian_pdf.Rmd
## intro_bayesian_pdf.html
## intro_bayesian_pdf_cache/
## intro_bayesian_pdf_files/
## messy_data.csv
## r_markdown_intro.Rmd
## r_markdown_intro.html
## r_markdown_intro_files/
## t_d_vot.csv
## tidy_data_handout.Rmd
## tidy_data_handout.html
## waterfall.png
##
## nothing added to commit but untracked files present
## On branch master
##
## No commits yet
##
## Untracked files:
## (use "git add <file>..." to include in what will be committed)
##
## .Rhistory
## 12.jpg
## 2018-02-08-cluster-analysis.Rmd
## 2018-02-08-cluster-analysis.html
## 2018-02-08-cluster-analysis_files/
## _index.md
## ca.Rmd
## ca.html
## ca_files/
## cluster.Rmd
## cluster.html
## cluster_files/
## clustering.html
## demos.csv
## domains.csv
## explore.png
## git.Rmd
## git.html
## intro_bayesian_pdf.Rmd
## intro_bayesian_pdf.html
## intro_bayesian_pdf_cache/
## intro_bayesian_pdf_files/
## messy_data.csv
## r_markdown_intro.Rmd
## r_markdown_intro.html
## r_markdown_intro_files/
## t_d_vot.csv
## tidy_data_handout.Rmd
## tidy_data_handout.html
## waterfall.png
##
## nothing added to commit but untracked files present (use "git add" to track)
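The difference between staging and committing can be seen by editing a file after it has been staged. This is a sketch in a throwaway repository with made-up file contents: the commit records the staged version, while the later edit stays in the working folder.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Firstname Lastname"
git config user.email "your.email@example.org"

echo 'x <- 1' > sample.R
git add sample.R                    # the staged snapshot contains "x <- 1"
echo 'x <- 2' > sample.R            # edited again, but not re-staged

git commit -q -m "First commit"     # records only what was staged

committed="$(git show HEAD:sample.R)"   # file content stored in the commit
working="$(cat sample.R)"               # file content on disk
echo "committed: $committed"
echo "working:   $working"
```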
Modifying a file
- Let’s change the file to see what happens
- Make some changes to
sample.R
then check the
git status
cd ~/Desktop/sample_git
# changed the sample.R file
git status
- What does the
git status
say?
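As a rough guide to what to expect, the sketch below commits sample.R in a scratch repository, changes it, and asks for the status in its short, script-friendly form. The file contents are placeholders.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Firstname Lastname"
git config user.email "your.email@example.org"

echo 'x <- 1' > sample.R
git add sample.R
git commit -q -m "First commit"

echo 'y <- 2' >> sample.R                          # change the tracked file

# " M" (space, M) means: modified in the working folder, not yet staged
status_line="$(git status --porcelain sample.R)"
echo "$status_line"
git diff --stat                                    # summary of the unstaged change
```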
Commit changes
- After you modify a file, you can commit it again by first doing
git add filename
then
git commit -m "message"
- Or you can use
git commit -a -m "message"
-a
tells Git to stage and commit all changes to files it already tracks (new, untracked files are not included)
cd ~/Desktop/sample_git
git commit -a -m "made some changes"
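One thing to watch with -a is that it only picks up files Git already tracks. A small sketch, using a hypothetical new_script.R that was never added:

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Firstname Lastname"
git config user.email "your.email@example.org"

echo 'x <- 1' > sample.R
git add sample.R
git commit -q -m "First commit"

echo 'y <- 2' >> sample.R               # change to a tracked file
echo 'z <- 3' > new_script.R            # brand-new file, never added

git commit -q -a -m "made some changes" # commits sample.R only

leftover="$(git status --porcelain)"    # what is still uncommitted
echo "$leftover"
```

The new file still needs an explicit git add before it can be committed.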
Summary (so far)
cd ~/Desktop/sample_git # go to the right folder
git init # create repository (only run first time)
git add *.R # add all .R files (staging)
git commit -a -m "made some changes" # commit the files
Branching
What is branching?
- Branches are different versions of the same code
- Git allows us to create branches instead of creating a completely new folder for changes
- Git manages the changes between versions for us and even changes which files appear depending on which branch we are in
- Git can even merge branches back together if needed
Why is branching helpful?
- Branching allows us to have multiple versions of the code, each with a separate history, so we can undo certain features
- This is useful if we want to experiment with a new analysis while keeping the code that we know works
- Can even use this with non-code like text documents
- Can have multiple versions of papers or dissertations (especially if using LaTeX or R Markdown)
- Modify an article for different purposes
- Manage comments from committee members or reviewers
Creating new branch
- To create a new branch use
git branch NAME
- Using
git branch
will give you a list of all branches
cd ~/Desktop/sample_git
git branch experimental
git branch
## sh: line 0: cd: /Users/brad_rentz/Desktop/sample_git: No such file or directory
## fatal: Not a valid object name: 'master'.
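The "Not a valid object name" message appears when a repository has no commits yet, because a branch needs a commit to point at. A minimal sketch in a scratch repository, showing the branch being refused before the first commit and created after it:

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Firstname Lastname"
git config user.email "your.email@example.org"

# Before any commit there is nothing for the new branch to point at
if git branch experimental 2>/dev/null; then
  early="created"
else
  early="refused"
fi

echo 'x <- 1' > sample.R
git add sample.R
git commit -q -m "First commit"

git branch experimental             # works now that a commit exists
git branch                          # lists all branches, * marks the current one
```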
Switching between branches
- To switch between branches use
git checkout NAME
- Notice that after switching branches the files in the folder change (even in Finder on macOS)
- Note: I added a
sample3.R
to experimental
cd ~/Desktop/sample_git
git checkout master
# see file list before switching branches
ls
git checkout experimental
# see file list afterward
ls
## sh: line 0: cd: /Users/brad_rentz/Desktop/sample_git: No such file or directory
## error: pathspec 'master' did not match any file(s) known to git.
## 12.jpg
## 2018-02-08-cluster-analysis.Rmd
## 2018-02-08-cluster-analysis.html
## 2018-02-08-cluster-analysis_files
## _index.md
## ca.Rmd
## ca.html
## ca_files
## cluster.Rmd
## cluster.html
## cluster_files
## clustering.html
## demos.csv
## domains.csv
## explore.png
## git.Rmd
## git.html
## intro_bayesian_pdf.Rmd
## intro_bayesian_pdf.html
## intro_bayesian_pdf_cache
## intro_bayesian_pdf_files
## messy_data.csv
## r_markdown_intro.Rmd
## r_markdown_intro.html
## r_markdown_intro_files
## t_d_vot.csv
## tidy_data_handout.Rmd
## tidy_data_handout.html
## waterfall.png
## error: pathspec 'experimental' did not match any file(s) known to git.
## 12.jpg
## 2018-02-08-cluster-analysis.Rmd
## 2018-02-08-cluster-analysis.html
## 2018-02-08-cluster-analysis_files
## _index.md
## ca.Rmd
## ca.html
## ca_files
## cluster.Rmd
## cluster.html
## cluster_files
## clustering.html
## demos.csv
## domains.csv
## explore.png
## git.Rmd
## git.html
## intro_bayesian_pdf.Rmd
## intro_bayesian_pdf.html
## intro_bayesian_pdf_cache
## intro_bayesian_pdf_files
## messy_data.csv
## r_markdown_intro.Rmd
## r_markdown_intro.html
## r_markdown_intro_files
## t_d_vot.csv
## tidy_data_handout.Rmd
## tidy_data_handout.html
## waterfall.png
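In a repository that has both branches, the file list does change on checkout. The following is a sketch under that assumption, recreating the sample3.R situation in a scratch folder; the branch is renamed to master only so that the name matches the text.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Firstname Lastname"
git config user.email "your.email@example.org"

echo 'x <- 1' > sample.R
git add sample.R
git commit -q -m "First commit"
git branch -M master                 # make sure the branch is called master

git branch experimental
git checkout -q experimental
echo 'z <- 3' > sample3.R            # file that exists only on experimental
git add sample3.R
git commit -q -m "add sample3.R"

git checkout -q master
on_master="$(ls)"                    # file list while on master
git checkout -q experimental
on_experimental="$(ls)"              # file list while on experimental

echo "master:";       echo "$on_master"
echo "experimental:"; echo "$on_experimental"
```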
Merging branches
- Sometimes it is helpful to merge branches together
- First check out the branch you want to keep (master)
- Then use
git merge NAME
where NAME is the branch you want to merge into the current branch
- If there are conflicting changes, Git won’t complete the merge. If so, use
git mergetool
then
git commit -a -m "message"
cd ~/Desktop/sample_git
git checkout master # switch to the branch you want to keep
git merge experimental # merge experimental into master
git mergetool # only if it won't merge
git commit -a -m "merged experimental into master" # only if you used the merge tool
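To see what a merge that "won't complete" looks like, the sketch below creates a conflict on purpose in a scratch repository. Because git mergetool is interactive, the conflict here is resolved by overwriting the file by hand and committing, which is a substitute for the merge tool rather than the workflow shown above. The file contents are made up.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Firstname Lastname"
git config user.email "your.email@example.org"

echo 'x <- 1' > sample.R
git add sample.R
git commit -q -m "First commit"
git branch -M master
git branch experimental

echo 'x <- 20' > sample.R                     # change the line on master
git commit -q -a -m "master: x is 20"

git checkout -q experimental
echo 'x <- 300' > sample.R                    # change the same line differently
git commit -q -a -m "experimental: x is 300"

git checkout -q master
if git merge experimental >/dev/null 2>&1; then
  merge_result="clean"
else
  merge_result="conflict"                     # Git stops and leaves the merge open
fi
conflicted="$(git diff --name-only --diff-filter=U)"   # files with conflicts
echo "merge: $merge_result, conflicted file(s): $conflicted"

echo 'x <- 300' > sample.R                    # resolve by hand: keep experimental's value
git add sample.R                              # mark the conflict as resolved
git commit -q -m "merged experimental into master"
```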
Deleting Merged Branches
- After a branch is successfully merged into another branch, you can delete it with
git branch -d NAME
-d
will only delete the branch if it has been merged (this is a safe delete)
- If you created a branch and you don’t like it and want to delete it without merging, use
git branch -D NAME
cd ~/Desktop/sample_git
git branch -d experimental
git branch crazy-idea # create this branch
git branch -D crazy-idea # nevermind, too crazy, delete it
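The difference between -d and -D shows up once the branch has a commit of its own. A sketch in a scratch repository, with a hypothetical wild.R committed on crazy-idea and never merged:

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Firstname Lastname"
git config user.email "your.email@example.org"

echo 'x <- 1' > sample.R
git add sample.R
git commit -q -m "First commit"
git branch -M master

git branch crazy-idea
git checkout -q crazy-idea
echo 'wild <- TRUE' > wild.R            # work that exists only on crazy-idea
git add wild.R
git commit -q -m "try something"
git checkout -q master

# -d refuses because crazy-idea has unmerged work
if git branch -d crazy-idea 2>/dev/null; then
  safe_delete="deleted"
else
  safe_delete="refused"
fi

git branch -D crazy-idea >/dev/null     # -D deletes it anyway
remaining="$(git branch --list crazy-idea)"
echo "safe delete: $safe_delete"
```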
Using Git with GitHub
Creating a GitHub repository
- If you are using your own code, the first step is to make a local Git repository, which we did already
- Then on GitHub make a new repository, but an empty one
- After creating it, look for the directions on the new repository's page for pushing an existing repository and run them in a terminal (you need to
cd
to your local Git repository first. You may need to enter your GitHub username and password or access token)
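The linking step usually comes down to two commands: git remote add to register the remote under the name origin, and git push -u to send the branch and remember where it goes. The sketch below is an assumption-laden stand-in: it uses a local bare repository in place of the empty GitHub repository so that it can run offline. With GitHub, the path after origin would be the URL shown on the repository page.

```shell
set -eu
sandbox="$(mktemp -d)"

# Local bare repository standing in for the empty GitHub repository
git init -q --bare "$sandbox/remote.git"

mkdir "$sandbox/sample_git"
cd "$sandbox/sample_git"
git init -q
git config user.name "Firstname Lastname"
git config user.email "your.email@example.org"
echo 'x <- 1' > sample.R
git add sample.R
git commit -q -m "First commit"
git branch -M master

git remote add origin "$sandbox/remote.git"   # link local repo to the remote
git push -q -u origin master                  # send master, remember the link

remote_head="$(git ls-remote origin refs/heads/master | cut -f1)"
local_head="$(git rev-parse HEAD)"
echo "remote: $remote_head"
echo "local:  $local_head"
```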
Pushing to Github
- After you link your local repository to GitHub, you can send your changes directly by using
git push
- This will only work if nobody else has pushed changes to that branch since you last pulled
cd ~/Desktop/sample_git
git push # send code to linked github repository
# or if you want to specify branches
git push origin experimental # pushes experimental branch
Pulling files
- To get the current version of the files on GitHub when they differ from your local repository, use
git pull
- Git won’t let you
push
if the remote has changes you don’t have yet, so it is good to
pull
first, then deal with any conflicts, then
push
- If conflicts arise, use
git diff
to see the differences and
git mergetool
to merge
cd ~/Desktop/sample_git
git pull # get code from linked github repository
git diff # use if conflicts
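The pull-then-push order can be tried out without GitHub. The sketch below is an illustration under several assumptions: a local bare repository plays the remote, a second clone plays a collaborator ("Co Author" is a made-up identity), and the pull is run with --no-rebase --no-edit so that it merges without opening an editor.

```shell
set -eu
sandbox="$(mktemp -d)"

# Bare repository standing in for the GitHub remote
git init -q --bare "$sandbox/remote.git"
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/master

# Your copy: first commit, link, push
mkdir "$sandbox/mine"
cd "$sandbox/mine"
git init -q
git config user.name "Firstname Lastname"
git config user.email "your.email@example.org"
echo 'x <- 1' > sample.R
git add sample.R
git commit -q -m "First commit"
git branch -M master
git remote add origin "$sandbox/remote.git"
git push -q -u origin master

# A collaborator clones, commits, and pushes before you do
git clone -q "$sandbox/remote.git" "$sandbox/theirs"
cd "$sandbox/theirs"
git config user.name "Co Author"
git config user.email "co.author@example.org"
echo 'y <- 2' > other.R
git add other.R
git commit -q -m "collaborator adds other.R"
git push -q origin master

# Back in your copy: commit locally, then try to push
cd "$sandbox/mine"
echo 'z <- 3' > mine.R
git add mine.R
git commit -q -m "add mine.R"
if git push -q origin master 2>/dev/null; then
  first_push="accepted"
else
  first_push="rejected"              # the remote has a commit you lack
fi

git pull -q --no-rebase --no-edit origin master   # fetch and merge their work
if git push -q origin master 2>/dev/null; then
  second_push="accepted"
else
  second_push="rejected"
fi
echo "first push: $first_push, second push: $second_push"
```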
Summary
cd ~/Desktop/sample_git # go to the right folder
git init # create repository (only run first time)
git branch experimental # creates new branch called experimental
git checkout experimental # switches branches
git add *.R # add all .R files (staging)
git commit -a -m "made some changes" # commit the files
git pull # get current remote version
git mergetool # use if need to resolve conflicts
git push # saves to remote repo
Other Git Things
Git Misc
git log
gives you the history of changes
git clone URL name
clones an online Git repo to your computer into a folder called
name
and sets up a Git repo for it
git grep "STRING"
searches the tracked files for that string
.gitignore
is a file that lists the file types to ignore (you can find good ones online for each language) [save it in the top folder of each Git repo]
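A short sketch of three of these in a scratch repository. The ignore patterns (.Rhistory and *.html) and the file contents are illustrative choices, not a recommended .gitignore; git clone is left out because it needs a repository URL.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Firstname Lastname"
git config user.email "your.email@example.org"

printf '.Rhistory\n*.html\n' > .gitignore   # example patterns to ignore
echo 'model <- lm(y ~ x)' > sample.R
echo '<html></html>' > report.html          # matches *.html, so Git ignores it

git add .gitignore sample.R
git commit -q -m "First commit"

git log --oneline                           # one line per commit
last_message="$(git log -1 --format=%s)"    # subject of the latest commit
pending="$(git status --porcelain)"         # empty: report.html is ignored
matches="$(git grep -l -F 'lm(')"           # tracked files containing "lm("
echo "files mentioning lm(: $matches"
```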
Other Git Resources
Git Tutorials
More Git Resources
- Free book on Git https://git-scm.com/book/en/v2
- YouTube video series https://www.youtube.com/GitHubGuides
- Git command reference guide http://gitref.org/
- Git cheatsheet https://services.github.com/on-demand/downloads/github-git-cheat-sheet.pdf