How to use Git and GitHub

I took the course How to Use Git and GitHub sometime last year. This post is an amalgamation of the course notes and other tutorials I completed while learning Git. I will talk about the most frequently used commands. If you are already confident of your Git skills and want a more practical tutorial, you should head to this post: git and github for data scientists. Let us get started.

git init

Initialises an empty Git repository in the current directory by creating a hidden folder named .git. If you open the .git folder, you will see many sub-directories and files inside it. You will hardly ever need to know what these files are, but this folder is the guts of Git, where all the magic happens.
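Here is a minimal sketch of what git init does. It assumes a throwaway directory made with mktemp so that nothing existing gets touched; the variable name repo is just my choice for illustration.

```shell
# Work in a throwaway directory so no existing project is affected.
repo=$(mktemp -d)
cd "$repo"

# Create the repository; -q hides the informational output.
git init -q

# The hidden .git folder now sits inside the (otherwise empty) directory.
ls -a
```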

git status

Gives you the status of your repository. When you make changes on your local disk, those changes are not automatically recorded by Git. git status tells you what has been modified since the last commit. It’s healthy to run git status often. Sometimes things change and you don’t notice it. A file in your repository can be in one of the following states:

staged: Files are ready to be committed.

unstaged: Files with changes that have not been prepared to be committed.

untracked: Files aren’t tracked by Git yet. This usually indicates a newly created file.

deleted: File has been deleted and is waiting to be removed from Git.

In a nutshell, you run git status to see if anything has been modified and/or staged since your last commit so you can decide if you want to commit a new snapshot and what will be recorded in it.
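The states above are easy to see for yourself. The sketch below walks one file through untracked, staged and modified, using git status --short for compact output. The file name notes.txt and the user name and email are placeholders I made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "first draft" > notes.txt
git status --short      # ?? notes.txt   (untracked)

git add notes.txt
git status --short      # A  notes.txt   (staged)

git commit -q -m "add notes"
echo "second draft" >> notes.txt
git status --short      # " M notes.txt" (changed, but not staged)

status_after_edit=$(git status --short)
```

The first column of the short format describes the staging area and the second column describes your working directory, which is why the last line starts with a space.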

Staging Area: A place where we can group files together before we “commit” them to Git. Once you add files using git add, they move to the staging area. Staged files are files we have told Git are ready to be committed. They are in the staging area, but they are not in our repository yet.

To store our staged changes, we run the commit command with a message describing what we’ve changed. After using git add, run git status to see which files are in the staging area and make sure these are the only files you want to commit.

You can unstage files by using the git reset command.

git reset filename.txt

git reset merely unstages the files; their contents stay as they are on your local disk. If you also want to throw away the changes you made to a tracked file, use git checkout -- filename.txt, which restores the file to its last committed state.
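A small sketch of unstaging, assuming a scratch repository with one committed file. It also shows one way to discard the edit afterwards, git checkout -- filename.txt, which restores a tracked file to its last committed state. The identity values are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "v1" > filename.txt
git add filename.txt
git commit -q -m "add filename.txt"

echo "v2" >> filename.txt
git add filename.txt            # the edit is now staged
git reset -q filename.txt       # unstaged again, but the edit is still on disk
git status --short              # " M filename.txt"

git checkout -- filename.txt    # discard the edit itself
content=$(cat filename.txt)
```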

git --version

Gives you the version of Git you are using.

Commits

Commits are the part of Git, a version control system (VCS), that lets you save meaningful changes to your code. It is because of commits that you are able to go back to any previous version of your code.

A commit is a snapshot of every file in your repository at the time of commit. Commits are Git’s way of saving versions, so to save two different versions, you would create two commits.

Commits with multiple files

You will often work with multiple files rather than just one, and these files may or may not be directly related. A collection of such files tracked together is called a repository. Git keeps track of the changes made to each of these files, and a commit covers all of them, i.e. when you save a version with a commit, you save a version of every tracked file in that repository.

Git does not rename files when you save a new commit. Instead, Git uses the commit IDs to refer to different versions of the files, and you can use git checkout to access old versions of your files.

git commit -m "added new feature RMSE to the function"
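To make the "two versions, two commits" idea concrete, here is a rough sketch in a scratch repository. The file metrics.py, its contents and the identity values are invented for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

# First version of a hypothetical file.
echo "# metrics go here" > metrics.py
git add metrics.py
git commit -q -m "add empty metrics module"

# Second version: stage the change, then commit it with a message.
echo "# RMSE goes here" >> metrics.py
git add metrics.py
git commit -q -m "added new feature RMSE to the function"

# Two saved versions means two commits in the history.
commit_count=$(git rev-list --count HEAD)
echo "$commit_count"
```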

git checkout

It is sort of like restoring a previous version: it resets the files in your working directory to their state at the given commit.

Say, the commit ID of the most recent commit is 3884eab839af1e82c44267484cf2945a766081f3. You can use this commit ID to return to the latest commit after checking out an older commit.

Format of git checkout

git checkout 3884eab839af1e82c44267484cf2945a766081f3
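The sketch below shows the round trip: check out an older commit, look at the file, then come back to the latest commit using its ID. Your commit IDs will differ from the one in the text, so the example stores them in variables (first and latest, names I picked) with git rev-parse.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "version 1" > report.txt
git add report.txt
git commit -q -m "first version"
first=$(git rev-parse HEAD)      # commit ID of the older version

echo "version 2" > report.txt
git commit -q -am "second version"
latest=$(git rev-parse HEAD)     # commit ID of the newest version

git checkout -q "$first"         # files go back to the first commit
old_content=$(cat report.txt)

git checkout -q "$latest"        # and forward again to the latest one
new_content=$(cat report.txt)
```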

How often to commit

Since you can choose when to make a commit, you might be wondering how often to commit your changes. It’s usually a good idea to keep commits small. As the diff between two versions gets bigger, it gets harder to understand and less useful. However, you don’t want to make your commits too small either. If you always save a commit every time you change a line of code, your history will be harder to read since it will have a huge number of commits over a short time period.

A good rule of thumb is to make one commit per logical change. For example, if you fixed a typo, then fixed a bug in a separate part of the file, you should use one commit for each change since they are logically separate. If you do this, each commit will have one purpose that can be easily understood. Git allows you to write a short message explaining what was changed in each commit, and that message will be more useful if each commit has a single logical change.

Reflect: Manual Commits

What do you think are the pros and cons of manually choosing when to create a commit, like you do in Git, vs having versions automatically saved, like Google Docs does?

Manually choosing when to create a commit like in Git:

Pros: You don’t have to wait for a timer or a certain number of changed lines before a commit is made. It is up to you to decide whether a code change deserves its own commit.

Cons: Deciding when to commit is somewhat subjective, so different people will answer differently and commit sizes will vary from user to user. If you are not used to someone’s style, understanding the differences between their commits can be difficult.

Having versions automatically saved like Google Docs does:

Pros: You don’t need to worry about committing manually. You know your work gets saved automatically.

Cons: Too many or too few versions will be generated, depending on the settings for automatic saving.

Branching

When developers are working on a feature or bug, they’ll often create a copy (aka a branch) of their code that they can make separate commits to. Then, when they’re done, they can merge this branch back into their main branch, master.

Branches are what naturally happens when you want to work on multiple features at the same time. You wouldn’t want to end up with a master branch which has Feature A half done and Feature B half done.

Rather you’d separate the code base into two “snapshots” (branches) and work on and commit to them separately. As soon as one was ready, you might merge this branch back into the master branch and push it to the remote server.
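Here is a minimal sketch of that workflow on a scratch repository. The branch name feature-a and the file app.txt are made up, and the git symbolic-ref line is only there to make sure the first branch is called master no matter how your Git is configured.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "base" > app.txt
git add app.txt
git commit -q -m "initial commit"

# Start a separate line of work for the feature.
git checkout -q -b feature-a
echo "feature A" >> app.txt
git commit -q -am "add feature A"

# master is untouched until the feature is merged back.
git checkout -q master
git merge -q feature-a
merged=$(cat app.txt)
```

Since master had no new commits of its own here, the merge is a simple fast-forward. Pushing the result to a remote server would be the next step.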

Remove all the things!

The git rm command not only removes the actual files from disk, but also stages the removal of the files for us. Once you’ve removed all the required files, you’ll need to commit your changes. Feel free to run git status to check the changes you’re about to commit.

Removing one file is great and all, but what if you want to remove an entire folder? You can use the recursive option (-r) of git rm. This will recursively remove all folders and files from the given directory.

git rm -r folder_of_cats
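A quick sketch of the whole removal, from a committed folder to a committed deletion. The two file names inside folder_of_cats are invented for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

mkdir folder_of_cats
echo "meow" > folder_of_cats/tabby.txt
echo "purr" > folder_of_cats/siamese.txt
git add folder_of_cats
git commit -q -m "add cats"

# Remove the folder from disk and stage the removal in one step.
git rm -r -q folder_of_cats
git status --short       # both files are listed with D (staged deletion)

git commit -q -m "remove all the cats"
tracked=$(git ls-files)  # files Git still tracks; should be none
```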

git log

Think of Git’s log as a journal that remembers all the changes we’ve committed so far, in the order we committed them. It gives you the list of all commits that have been made to the code, showing the commit ID and the message that was added to each commit.

Exiting git log: To stop viewing git log output, press q (which stands for quit).

Getting Colored Output: To get colored diff output, run git config --global color.ui auto

git log --stat

Gives you statistics for each commit that has been made, such as which files changed and how many lines were added or deleted.
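To see both forms of the log side by side, here is a small sketch with two commits. The file model.py is hypothetical. I use --no-pager so the output prints directly instead of opening the viewer you would leave with q.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "a = 1" > model.py
git add model.py
git commit -q -m "add model"

echo "b = 2" >> model.py
git commit -q -am "add second parameter"

# One entry per commit, newest first, with ID, author, date and message.
git --no-pager log

# Same entries, plus the files touched and lines added or removed in each.
git --no-pager log --stat
stat_output=$(git --no-pager log --stat)
```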

git push

The push command tells Git where to put our commits when we’re ready.

git push -u origin master

git push

The -u option tells Git to remember the parameters (the remote origin and the branch master), so that next time we can simply run git push and Git will know what to do.
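You can try this without a GitHub account. The sketch below assumes a local bare repository standing in for the GitHub remote; the paths under work are made up, and on a real project origin would be your GitHub URL instead.

```shell
work=$(mktemp -d)

# A bare repository plays the role of the GitHub remote in this sketch.
git init -q --bare "$work/remote.git"

git init -q "$work/project"
cd "$work/project"
git symbolic-ref HEAD refs/heads/master    # name the first branch master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
git remote add origin "$work/remote.git"

echo "hello" > README
git add README
git commit -q -m "first commit"
git push -q -u origin master    # first push: say where to push, and remember it

echo "more" >> README
git commit -q -am "second commit"
git push -q                     # later pushes: Git already knows the destination

local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$work/remote.git" rev-parse master)
```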

Pulling Remotely

Let’s pretend some time has passed. We’ve invited other people to our GitHub project, and they have pulled our changes, made their own commits, and pushed them. We can check for changes on our GitHub repository and pull down any new changes by running this command.

git pull origin master
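Here is a rough simulation of that scenario on one machine. It assumes a local bare repository in place of GitHub and a second clone playing the teammate; all paths, file names and identities are invented for the example.

```shell
work=$(mktemp -d)

# Stand-in for the shared GitHub repository.
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/master

# Our own copy: make a first commit and publish it.
git init -q "$work/ours"
cd "$work/ours"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
git remote add origin "$work/remote.git"
echo "project" > README
git add README
git commit -q -m "first commit"
git push -q -u origin master

# A teammate clones the project, commits, and pushes.
git clone -q "$work/remote.git" "$work/teammate"
cd "$work/teammate"
git config user.name "Teammate"            # placeholder identity
git config user.email "mate@example.com"   # placeholder identity
echo "teammate was here" > CONTRIBUTORS
git add CONTRIBUTORS
git commit -q -m "add contributors file"
git push -q origin master

# Back in our copy, pull down the teammate's commit.
cd "$work/ours"
git pull -q origin master
```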

HEAD

The HEAD is a pointer that holds your position within all your different commits. By default HEAD points to your most recent commit.

git diff

Gives you the difference between commits. If you want to see the differences between two commits (say comNo.1 and comNo.2), run:

git diff comNo.1 comNo.2

git diff

Without any extra arguments, a simple git diff displays, in unified diff format (a patch), the code or content you’ve changed in your project since the last commit that is not yet staged for the next commit snapshot.

So where git status will show you what files have changed and/or been staged since your last commit, git diff will show you what those changes actually are, line by line. It’s generally a good follow-up command to git status.
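The sketch below tries both uses of git diff: comparing two commits by ID, and checking unstaged changes with no arguments. It also shows git diff --staged, which is not covered above but is handy for seeing what you have already staged. The file story.txt and the variable names are my own.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "line one" > story.txt
git add story.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)

echo "line two" >> story.txt
git commit -q -am "second"
second=$(git rev-parse HEAD)

# Difference between two commits: shows "+line two".
git --no-pager diff "$first" "$second"

# An edit that is not staged yet shows up in a plain git diff.
echo "line three" >> story.txt
git --no-pager diff

# Once staged, plain git diff is empty and --staged shows the change.
git add story.txt
unstaged=$(git --no-pager diff)
staged=$(git --no-pager diff --staged)
```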

What is a README?

Many projects contain a file named “README” that gives a general description of what the project does and how to use it. It’s often a good idea to read this file before doing anything with the project, so the file is given this name to make users more likely to read it.

Cloning a Repository

There is a difference between downloading and cloning a repository. When you clone a repository, you don’t just download the files as they are in the latest commit, but the entire commit history as well. To clone a repository, run git clone followed by a space and the repository URL.

git clone

Example: Use the following url to clone the Asteroids repository: https://github.com/udacity/asteroids.git

git clone https://github.com/udacity/asteroids.git
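If you want to check that the history really comes along, here is a sketch that runs offline. Instead of a GitHub URL it assumes a local repository as the source (git clone accepts a path as well); the directory names original and copy are made up.

```shell
work=$(mktemp -d)

# Build a small repository with two commits to act as the source.
git init -q "$work/original"
cd "$work/original"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
echo "v1" > game.txt
git add game.txt
git commit -q -m "first commit"
echo "v2" >> game.txt
git commit -q -am "second commit"

# Clone it; with a real project the first argument would be the URL.
git clone -q "$work/original" "$work/copy"

# The clone has the files and the full commit history.
cd "$work/copy"
git --no-pager log --oneline
cloned_commits=$(git rev-list --count HEAD)
```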

Git Errors and Warnings Solution

Should not be doing an octopus

Octopus is a strategy Git uses to combine many different versions of code together. This message can appear if you try to use this strategy in an inappropriate situation.

You are in ‘detached HEAD’ state

HEAD is what Git calls the commit you are currently on. You can “detach” the HEAD by switching to a previous commit. Despite what it sounds like, it’s actually not a bad thing to detach the HEAD. Git just warns you so that you’ll realize you’re doing it.
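Here is one way to watch HEAD detach and re-attach, as a sketch on a scratch repository. It relies on git symbolic-ref, which succeeds only when HEAD points at a branch; the variable names are my own.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "one" > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)

echo "two" >> file.txt
git commit -q -am "second"
branch=$(git symbolic-ref --short HEAD)    # remember which branch we are on

# Switching to an older commit by ID detaches HEAD.
git checkout -q "$first"
if git symbolic-ref -q HEAD > /dev/null; then
  state="attached"
else
  state="detached"
fi
echo "$state"

# Checking out the branch by name attaches HEAD again.
git checkout -q "$branch"
reattached=$(git symbolic-ref --short HEAD)
```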

Panic! (the ‘impossible’ happened)

This is a real error message, but it’s not output by Git. Instead it’s output by GHC, the compiler for a programming language called Haskell. It’s reserved for particularly surprising errors!

Takeaway

I hope these errors and warnings amused you as much as they amused me! Now you know what kind of errors Git can throw.

Git command review

Compare two commits, printing each line that is present in one commit but not the other.

git diff will do this. It takes two arguments — the two commit ids to compare.

Make a copy of an entire Git repository, including the history, onto your own computer.

git clone will do this. It takes one argument — the url of the repository to copy.

Temporarily reset all files in a directory to their state at the time of a specific commit.

git checkout will do this. It takes one argument — the commit ID to restore.

Show the commits made in this repository, starting with the most recent.

git log will do this. It doesn’t need any arguments.

Behavior of git clone

If someone else gives you the location of their directory or repository, you can copy or clone it to your own computer. This is true for both copying a directory and cloning a repository.

If you have a URL to a repository, you can copy it to your computer using git clone. For copying a directory, you weren’t expected to know this, but it is possible to copy a directory from one computer to another using the command scp, which stands for “secure copy”. The name was chosen because the scp command lets you securely copy a directory from one computer to another.

The history of changes to the directory or repository is copied.

This is true for cloning a repository, but not for copying a directory. The main reason to use git clone rather than copying the directory is because git clone will also copy the commit history of the repository. However, copying can be done on any directory, whereas git clone only works on a Git repository.

If you make changes to the copied directory or cloned repository, the original will not change.

This is true for both copying a directory and cloning a repository. In both cases, you’re making a copy that you can alter without changing the original.

The state of every file in the directory or repository is copied.

This is true for both copying a directory and cloning a repository. In both cases, all the files are copied.


Manish Barnwal

...just another human