Git is Awesome

It's been a while since I've written anything here, and I feel it's time to start up again. With work taking up a lot of my time, I've been much less concerned with this, to be honest, but I feel like I really should use this more.

As anyone reading this probably knows, I love my git. This site gets rebuilt whenever I upload it to my server, which is really nice. I also have a small repo with a script that, if you enable it, automatically tests the changes you are about to commit before letting you commit them. It's these kinds of things, so simple and not even really that special, that make me love using git every day.

Things git does that are good

Git does pretty much everything. A lot of people will tell you that git is not an SCM, but instead the backend for an SCM. I would be inclined to agree, but the frontend that has been built on top of that amazing backend is also great. I'm going to go through a few things that I use a lot and think are essential to using git:

  • Branches. My workflow for a lot of things works really well with git's concept of branches. Whether that was caused by git pushing me into that direction of work, or if it was how I would have worked without git, I don't know, but git does branching Right®. More on this huge thing later on.
  • The staging area/index. This right here is a lifesaver. It helps me clean up commits before they are pushed, and lets me do things like staging only parts of a file and then changing what I have staged if I don't like it. While some people think it's annoying to have to stage and then commit, I think it makes your repo cleaner in the long run and helps you write good commits and avoid mistakes.
  • The stash. This kind of goes along the lines of branches, but is very special. If you look at what I did in the system-status script you can see how I used this to clean up my working copy in preparation for tests and such. It's really great when you have made some changes that you don't want to commit yet but need to work on other features, or when you want to get your changes out of the way while still saving them for some time.
  • The front end. A lot of people complain about this, but I would argue that it is better than a lot of other frontends. It's simpler to work with something else, maybe, but you're not going to be able to do nearly as much as you can with git, even if you don't use the tools in git every day.
  • filter-branch. I don't know of many situations where this is needed, and I have only used it myself in a trivial manner, but it makes things so nice if you want to split up a repository into smaller pieces or submodules.
  • Being able to rewrite history. Kind of mentioned above, but having the ability to change history without deleting things (thank god for the reflog) is amazing. Because of how git is structured, doing 'destructive' actions will not actually delete things, but lets you change history. Even if you're only going back 5 commits to clean things up before you push and request a pull, it's great to have the ability to do this.
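
To make the reflog point a bit more concrete, here is a rough Python sketch of the idea. It is a toy model, not git's real data structures: the ToyRepo class and its method names are made up for illustration. The only thing it tries to show is that "going back" moves a pointer and records the old value, while the commits themselves stay in storage. (Real git does eventually garbage-collect unreachable objects once their reflog entries expire.)

```python
class ToyRepo:
    """A toy branch plus reflog. Hypothetical model, not git's real format."""

    def __init__(self):
        self.objects = {}  # commit id -> parent id; entries are only ever added
        self.head = None   # the commit the branch currently points at
        self.reflog = []   # every value the branch has had, oldest first

    def _move(self, commit_id):
        # Moving the branch records the new value in the reflog.
        self.head = commit_id
        self.reflog.append(commit_id)

    def commit(self, commit_id):
        self.objects[commit_id] = self.head
        self._move(commit_id)

    def reset_back(self, n):
        """Move the branch back n commits, roughly like a hard reset."""
        target = self.head
        for _ in range(n):
            target = self.objects[target]
        self._move(target)

    def history(self):
        """Walk parents from the branch tip, newest first."""
        out, cur = [], self.head
        while cur is not None:
            out.append(cur)
            cur = self.objects[cur]
        return out


repo = ToyRepo()
for name in ["c1", "c2", "c3"]:
    repo.commit(name)

repo.reset_back(2)                    # the 'destructive' action
visible = repo.history()              # what the branch shows now
still_stored = sorted(repo.objects)   # nothing was deleted from storage
recovered = repo.reflog[-2]           # where the branch pointed before the reset
```

After the reset only the first commit is visible from the branch, but all three are still stored, and the reflog still remembers where the branch used to point.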

History and Branching

In the last few weeks I was doing some messing around with a remote helper for git, git-remote-hg. It's actually really nice, and after learning about remote helpers I was really happy because it lets me use git even when someone else doesn't see the greatness of it. However, this prompted me to take a deeper look at Mercurial branches and how Mercurial's internals work.

From what I have read and experienced, Mercurial is generally seen as the easier VCS when compared to git. I would disagree, based on how Mercurial handles branches and history. First let's discuss history and changing history, and how git outdoes the other VCSs out there.

History

Git has some beautiful internals. You can read about them elsewhere in detail, but suffice to say that the way things are set up is very file-based. Not that there's a file that contains the history, but that there's a file for each 'object,' objects being commits, trees and blobs. This allows changing the history safely without actually deleting things unless you really try.
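
As a small illustration of the "one file per object" idea, here is a Python sketch of how a blob gets its name. The header format ("blob", a space, the size in bytes, a NUL byte) and the two-character directory split are how git stores loose objects; the function names git_blob_id and loose_object_path are my own. This only covers blobs and skips the zlib compression and packfiles that real git also uses.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id git assigns to a blob with this content."""
    # Git hashes a small header followed by the raw bytes of the file.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Where a loose (unpacked) object with this id lives inside .git."""
    # The first two hex characters become a directory, the rest the file name.
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


empty_id = git_blob_id(b"")
empty_path = loose_object_path(empty_id)
test_id = git_blob_id(b"test content\n")
```

Because the name is derived from the content, changing a file never overwrites an old object. It just produces a new object with a different name, which is a big part of why rewriting history in git does not destroy anything by itself.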

Mercurial, on the other hand, uses a very different history model. Instead of having each commit get its own object, it appends to a file containing all of the commit data for the repository. In terms of speed, this is probably faster on systems that don't have really fast file creation (I'm looking at you, Windows) but have fast file appending.

Having a history like this makes changing it really hard without reading a large file and having to modify the whole thing. This also makes it a requirement to back up everything before doing things to that history, which I am not sure Mercurial does by default. This means that if you do manage to go change history, you're being extremely destructive.

Having this 'immutable' history adds a false sense of security. In both systems it's impossible to silently change the repository history, but in git you can change it on purpose, for good reasons. (Note that you can also do terrible things in git, don't get me wrong there.)

Another small thing that makes Mercurial weird to me is how it refers to commits. In git, everything is identified by a SHA-1, which isn't the best to type, but branches and tags make it really easy to have named places in the history. Mercurial uses local numbers to refer to commits, which in a branching DVCS is not really the greatest thing to do, as everyone is going to have different numbers (they do use checksums as well, but primarily it is numbers).
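
Here is a quick Python sketch of why local numbers bother me. It is a toy model with made-up names (Clone, commit_id, local_number), and it assumes that a local number is simply the order in which a commit arrived in that particular clone. The hash-based id is the same everywhere, while the number depends on who pulled what first.

```python
import hashlib


def commit_id(message: str) -> str:
    """A stand-in for a content-derived commit id (toy: hashes the message only)."""
    return hashlib.sha1(message.encode("utf-8")).hexdigest()[:12]


class Clone:
    """A toy clone that numbers commits in the order they arrive."""

    def __init__(self):
        self.order = []  # commit ids, in order of arrival in this clone

    def add(self, cid):
        if cid not in self.order:
            self.order.append(cid)

    def local_number(self, cid):
        return self.order.index(cid)


base, feature, bugfix = (commit_id(m) for m in ["base", "feature", "bugfix"])

alice, bob = Clone(), Clone()
for cid in (base, feature, bugfix):   # Alice got the feature first
    alice.add(cid)
for cid in (base, bugfix, feature):   # Bob got the bugfix first
    bob.add(cid)

alice_number = alice.local_number(feature)
bob_number = bob.local_number(feature)
```

Both clones agree on the id of the feature commit, but Alice sees it as number 1 and Bob sees it as number 2, so a number is only meaningful inside one clone.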

Branching

In git, branches are just pointers to commits. This, to me, makes a whole lot of sense. When you branch something, you're just saying "Instead of showing me commit A, show me commit B." The branch is not part of the commit object, and can be moved around with ease if needed. Looking at branches from a remote repository is also simple, as they are just named remote/branchname.
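
The pointer idea is small enough to sketch in a few lines of Python. This is a toy model with made-up helper names (create_branch, commit_on, log), not how git is implemented, but it shows the important part: creating, advancing, or moving a branch only changes an entry in the branches table and never touches an existing commit.

```python
# Toy model: commit id -> parent id. Existing entries are never edited below.
commits = {"A": None, "B": "A", "C": "B"}

# A branch is just a name that maps to a commit id.
branches = {"master": "C"}


def create_branch(name, start):
    """Creating a branch copies a pointer; no commit is created or changed."""
    branches[name] = branches[start]


def commit_on(branch, new_id):
    """A new commit records its parent, then the branch pointer moves to it."""
    commits[new_id] = branches[branch]
    branches[branch] = new_id


def log(branch):
    """Walk parents from the branch tip, newest first."""
    out, cur = [], branches[branch]
    while cur is not None:
        out.append(cur)
        cur = commits[cur]
    return out


create_branch("feature", "master")
commit_on("feature", "D")
branches["master"] = "B"   # moving a branch is just reassigning the pointer

feature_log = log("feature")
master_log = log("master")
```

After this, feature shows D, C, B, A and master shows B, A, yet commit C is exactly what it was before any branch moved.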

This is completely different in Mercurial. Because the history is a single file, commits don't just point to other files to show relationships; instead, the relationships are written into the commit itself. This is also where the branch name is stored: in the commit. This means that branches in Mercurial don't move. Instead, a branch in Mercurial is basically like a commit in git, just harder and weirder to change. Looking over the docs for Mercurial, the word branch only refers to the act of having more than one commit with the same parent.
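
To show why a branch name stored in the commit is harder to change, here is a toy Python sketch. It is not Mercurial's actual format: toy_commit_id and its field layout are made up, and the only assumption is the one described above, that the branch name is part of the data the commit id is computed from.

```python
import hashlib


def toy_commit_id(parent, message, branch=None):
    """Hash the commit's fields; the branch name is included only if given."""
    fields = [parent or "", message]
    if branch is not None:
        # Branch name recorded inside the commit, so it affects the id.
        fields.append("branch=" + branch)
    return hashlib.sha1("\n".join(fields).encode("utf-8")).hexdigest()[:12]


# Pointer-style branch: the name lives outside the commit, so the id ignores it.
pointer_style = toy_commit_id(None, "add feature")

# Name-in-commit branch: same change, two different branch names.
named_old = toy_commit_id(None, "add feature", branch="feature-x")
named_new = toy_commit_id(None, "add feature", branch="feature-y")

renaming_changes_commit = named_old != named_new
```

In the pointer model, renaming a branch leaves every commit alone. In the name-in-commit model, the "same" commit on a differently named branch is a different commit, so renaming means rewriting history.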

It just really seems weird to me to do something like that in an ecosystem where branches are the main way people develop new features and request pulls.

Note that Mercurial does have something called bookmarks, which are similar to git branches and let you do similar things.

Things that git does poorly

Git is not perfect. It's really good at a lot of things, but there are flaws that are just inherent to the way git is set up. I recently started reading the git mailing list and a few people have asked how git works with really large files and really large histories. Needless to say, git is going to be slow and unwieldy if you start putting 20 GB binary files into a repository. Why you would be doing this is a mystery to me, but to each their own. Granted, these files are going to be massive wherever you store them, but it may be easier to work with small parts of the history in other VCSs (shallow clones can help, but only a tiny bit).

I don't know everything

This post is just me pointing out things that I see during my time working with git and learning about Mercurial, and I don't know everything so I may be very, very wrong. If so, feel free to tell me!
