Friday, 30 September 2011

Setting up and using a Data Repository

In a world dominated by programming and engineering, team members often need to work on the same project simultaneously. Data repositories (data banks) make this possible: every member of a repository can see the data in it, work on it and sync their changes.

In a repository, a developer who wants to improve the latest code only needs to:

  1. Use the "update" command to fetch the latest version of the files from the repository.
  2. Work on the code.
  3. Use the "diff" command to see every change made since the last update, which helps if you get lost in the middle of writing code.
  4. If satisfied, "commit" the changes back to the repository, making that version the most recent one.
  5. Otherwise, "revert" to the older version of the file, or simply do not sync the changes at all.
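To make these five steps concrete, here is a toy model in Python. It is only a sketch under our own simplifying assumptions, not how SVN works internally: `Repository` and `WorkingCopy` are hypothetical names, the repository is just a list of full snapshots, and `update` overwrites the working copy instead of merging incoming changes with local edits.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical in-memory stand-in for a central repository."""
    # revisions[n] is a full snapshot {filename: text}; revision 0 is empty.
    revisions: list = field(default_factory=lambda: [{}])

    @property
    def head(self) -> int:
        return len(self.revisions) - 1

    def commit(self, base_rev: int, files: dict) -> int:
        # Simplification: reject the commit if anyone committed after base_rev.
        # (Real SVN performs this "out of date" check file by file.)
        if base_rev != self.head:
            raise RuntimeError("working copy is out of date; update first")
        self.revisions.append(dict(files))
        return self.head


@dataclass
class WorkingCopy:
    repo: Repository
    base_rev: int = 0
    base: dict = field(default_factory=dict)   # files as of the last update
    files: dict = field(default_factory=dict)  # files as currently edited

    def update(self) -> None:
        # Step 1: fetch the latest revision. This toy overwrites local files;
        # a real client merges incoming changes into your edits.
        self.base_rev = self.repo.head
        self.base = dict(self.repo.revisions[self.base_rev])
        self.files = dict(self.base)

    def diff(self) -> str:
        # Step 3: list every change since the last update as a unified diff.
        out = []
        for name in sorted(self.base.keys() | self.files.keys()):
            old = self.base.get(name, "").splitlines(keepends=True)
            new = self.files.get(name, "").splitlines(keepends=True)
            out.extend(difflib.unified_diff(old, new, f"a/{name}", f"b/{name}"))
        return "".join(out)

    def commit(self) -> int:
        # Step 4: send the edited files back as the newest revision.
        self.base_rev = self.repo.commit(self.base_rev, self.files)
        self.base = dict(self.files)
        return self.base_rev

    def revert(self) -> None:
        # Step 5: discard local edits and return to the last updated version.
        self.files = dict(self.base)
```

For example, after `wc.update()` and an edit to `wc.files`, `wc.diff()` shows the added lines with a leading `+`, `wc.commit()` stores them as a new revision, and `wc.revert()` restores the last updated version.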

Without a repository, every file would have to be emailed back and forth and each version tracked by hand, which quickly becomes a mess.

What is a VCS?

Version control systems (VCS) are tools that manage multiple versions (revisions) of the same program or information. Combined with an online repository, they let developers anywhere in the world work together on the same piece of code.
The two types of VCS that this blog covers are:
  1. Centralised VCS - All the data is stored on a server, from which users fetch data, work on it and then upload their changes back. The centralised VCS we cover is Subversion (SVN).
  2. Distributed (decentralised) VCS - Every developer has a full copy of the repository, and any copy can sync directly with any other in a peer-to-peer fashion (loosely like torrents!). Clients make changes in their own local repositories and sync with other users later. The distributed VCS we talk about is Git.
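The practical difference between the two models can be sketched in a few lines of Python. This is a toy under our own assumptions (`CentralServer`, `Clone` and the `online` flag are hypothetical names, and history is just a list of commit messages), not how SVN or Git actually store data. It shows that a central server must be reachable for every commit, while each clone carries the full history, can commit offline and syncs with any peer later.

```python
from dataclasses import dataclass, field


@dataclass
class CentralServer:
    """Hypothetical central model: one server holds the only history."""
    history: list = field(default_factory=list)
    online: bool = True

    def commit(self, message: str) -> None:
        # Every commit has to reach the server, so no network means no commit.
        if not self.online:
            raise ConnectionError("server unreachable; cannot commit")
        self.history.append(message)


@dataclass
class Clone:
    """Hypothetical distributed model: every clone holds the full history."""
    name: str
    history: list = field(default_factory=list)  # commit messages, oldest first

    def clone(self, new_name: str) -> "Clone":
        # Cloning copies the entire history, so every clone is also a backup.
        return Clone(new_name, list(self.history))

    def commit(self, message: str) -> None:
        # Works offline: only the local history changes.
        self.history.append(message)

    def pull(self, peer: "Clone") -> None:
        # Sync with any peer. This toy only fast-forwards; real tools can
        # also merge histories that have diverged.
        if peer.history[: len(self.history)] == self.history:
            self.history = list(peer.history)  # peer has all we have, maybe more
        elif self.history[: len(peer.history)] != peer.history:
            raise RuntimeError("histories have diverged; a merge is needed")
        # Otherwise we already contain the peer's history: nothing to do.
```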

Advantages of SVN
  1. SVN is much simpler to learn. If you know how to check out, update and commit, you are ready to work with a repository. SVN's help is also more organised and to the point.
  2. Git doesn't allow users to check out a sub-directory; the user has to clone the whole repository. SVN allows checkouts at the sub-directory level.
  3. “Distributed Version Control System” sounds like some alternative to BitTorrent, and I don’t support piracy!
  4. Even when errors occur in SVN, they are easy to debug thanks to its friendly interface and descriptive error messages.
  5. You can lock files in SVN to stop other users from changing a file you are working on.
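SVN's file locking follows a lock-modify-unlock pattern. Below is a hypothetical in-memory sketch of that rule, not SVN's implementation: `LockingRepository` and its methods are our own names. While one user holds a lock on a path, a commit to that path by anyone else is rejected.

```python
class LockingRepository:
    """Hypothetical toy of lock-modify-unlock, not SVN's implementation."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}
        self.locks: dict[str, str] = {}  # path -> user who holds the lock

    def _check(self, path: str, user: str) -> None:
        # Anyone may act on an unlocked path; a locked one is reserved.
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise PermissionError(f"{path} is locked by {holder}")

    def lock(self, path: str, user: str) -> None:
        self._check(path, user)
        self.locks[path] = user

    def unlock(self, path: str, user: str) -> None:
        self._check(path, user)
        self.locks.pop(path, None)

    def commit(self, path: str, text: str, user: str) -> None:
        # Commits to a path locked by someone else are rejected.
        self._check(path, user)
        self.files[path] = text
```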

Advantages of Git
  1. With SVN, users must be connected to the central repository to commit (save) their work, so they cannot commit offline. Git, on the other hand, lets users work and commit to a local repository, then sync later.
  2. Git is also much faster than SVN for most operations, because a full copy of the data is stored on the user's own machine and there is no network response time.
  3. Branching and merging support is better. A branch is like a separate copy of the code that you can work on independently: if it works, you merge it back; otherwise you simply delete the branch.
  4. There is less chance of losing data, because complete copies exist on multiple workstations: every user's clone is a full backup. In SVN, losing the central server's data without a backup means losing the project's history.
  5. Git tracks content rather than files: every version of a file is stored as an object named by a hash of its contents, so even the smallest change is recorded. SVN instead tracks changes to each file by its path, along with its metadata.
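Content tracking can be seen directly in how Git names stored file contents. The function below reproduces Git's blob naming scheme: SHA-1 (Git's default hash) over the header `blob <size>` plus a NUL byte, followed by the file's bytes. `git_blob_id` is our own helper name, not part of any Git API.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the id Git assigns to a file's contents (a "blob").

    The id depends only on the bytes, not on the file's name or location,
    so identical content always gets the same id.
    """
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()
```

`git_blob_id(b"")` returns `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the well-known id of an empty file, and changing even one byte of the content gives a completely different id.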

How to set up a Data Repository

A data repository needs space to be hosted, which Google Code provides free of charge.

1) Go to Google Code, log in with your Google Account (for some reason, IIIT accounts do not work) and click on "Create a project".

2) Give the project a name and a description, and choose the version control system you wish to use (we chose SVN). Also give the project appropriate labels, then click on "Create project".

Creating a Project

3) Congratulations! Your repository has been set up, and you'll be directed to your project home. Before we are done here, we need to retrieve the auto-generated password that we will use to log in: go to Settings, and you'll see your password there.

You can view your code files by going to the "Source" tab.

Viewing the files


We have now set up our repository. But we will be coding locally on our own computers, right? So we need tools that sync our code with the repository we have just set up. TortoiseSVN is a popular SVN client for Windows, and RabbitVCS is its Ubuntu counterpart.


TortoiseSVN acts as a shell extension: you can right-click in any directory to access all SVN-related functions. You can commit, update, diff, branch, merge and so on, all from a context menu, which makes it really simple to use. First, you have to check out, which specifies the local directory to be synced.

Check out to sync all files for the first time:

Checkout in TortoiseSVN
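Checking out copies files from the repository into a local working directory. The sketch below models a repository snapshot as a plain Python dictionary of paths to file contents (our simplification) and shows that checking out only a sub-directory, such as `trunk`, amounts to selecting one path prefix; this is the sub-directory checkout that SVN supports. `checkout` is a hypothetical helper, not code from SVN or TortoiseSVN.

```python
def checkout(snapshot: dict[str, str], subdir: str = "") -> dict[str, str]:
    """Copy the files under `subdir` into a new working copy.

    `snapshot` maps repository paths such as "trunk/main.c" to file contents.
    Paths in the result are relative to `subdir`; an empty `subdir` checks
    out the whole repository.
    """
    cleaned = subdir.strip("/")
    prefix = cleaned + "/" if cleaned else ""
    return {
        # Strip the prefix so the working copy starts at the sub-directory.
        path[len(prefix):]: text
        for path, text in snapshot.items()
        if path.startswith(prefix)
    }
```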

Once you have checked out, you can now see all the options in the context menu:

Context Menu Options

I have edited some files, so now I'll commit my changes to the repository. Right-click, select Commit, and a dialog box pops up. Here, choose the files you wish to commit:

SVN Commit Options

Click OK. To verify your identity, it will prompt you for a username and password: enter your Google username and the password you got from Google Code. The commit will soon complete and, voila, your files are now on the server for other developers to view (and edit)!

Commit Successful

Similarly, you can use the diff feature to check for differences between your local version and the version in the repository, and the branch feature to create another branch (a copy of the directory).

Branching Successful
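In SVN a branch is simply a copy of a directory, conventionally from `trunk/` into `branches/<name>/`. Here is a toy sketch of creating a branch and merging it back, assuming a repository snapshot modelled as a dictionary of paths to contents; `make_branch` and `merge_branch` are hypothetical names, and this merge works on whole files, whereas real merges combine individual changed lines.

```python
def make_branch(repo: dict[str, str], name: str, source: str = "trunk") -> dict[str, str]:
    """Return a new snapshot where branches/<name>/ is a copy of <source>/."""
    branched = dict(repo)
    src = source + "/"
    for path, text in repo.items():
        if path.startswith(src):
            # Reuse the same string objects instead of duplicating data,
            # loosely mirroring SVN's cheap copies.
            branched[f"branches/{name}/{path[len(src):]}"] = text
    return branched


def merge_branch(repo: dict[str, str], name: str, base: dict[str, str],
                 target: str = "trunk") -> dict[str, str]:
    """Copy files changed on branches/<name>/ back into <target>/.

    `base` is the snapshot taken when the branch was made. A file changed on
    both the branch and the target since then is reported as a conflict.
    """
    merged = dict(repo)
    prefix = f"branches/{name}/"
    for path, branch_text in repo.items():
        if not path.startswith(prefix):
            continue
        dest = f"{target}/{path[len(prefix):]}"
        base_text = base.get(dest)      # target's version at branch time
        target_text = repo.get(dest)    # target's version now
        if branch_text == base_text:
            continue                    # unchanged on the branch
        if target_text not in (base_text, branch_text):
            raise ValueError(f"conflict in {dest}: changed on both sides")
        merged[dest] = branch_text
    return merged
```

If the branch turns out not to work, discarding it is just a matter of dropping its `branches/<name>/` paths.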


Although Ubuntu has a command-line SVN client, many people find text-based interfaces intimidating. RabbitVCS is an easy-to-use VCS client for Ubuntu that, like TortoiseSVN, offers shell integration (for both Nautilus and Thunar).

You can create a repository in your folder by choosing "Create repository here" from the context menu:

You can also perform other basic functions like checkout:

Checkout settings

Checkout Successful

and use diff to check for modifications:

Checking for modifications

or use update to download new files from the repository:

Using Update function to get files from the Repository

Personally, we preferred TortoiseSVN on Windows, since RabbitVCS was a bit unstable and crashed often. TortoiseSVN was fast, efficient and very easy to use.

Learning Experience

Learning about data repositories was fun, because we kept thinking about how much simpler our group projects would have been had we known about them beforehand. Learning something new on your own is a challenge, but once you are done with it, you feel awesome.

Thank you, and we hope you enjoyed this blog post. Feel free to ask any questions in the comments section below.

-Romil Bhardwaj (2011092) and Prakhar Gupta (2011074)
