Sunday, 2 October 2011

Setting Up A Data Repository

Our project was setting up a data repository in Linux and Windows.

So what is a data repository?

Data Repository

Data Repository refers to a central place where data is stored and maintained. A repository can be a place where multiple databases or files are located for distribution over a network, or a repository can be a location that is directly accessible to the user without having to travel across a network.

For example, suppose several programmers are working on the same code. One of the programmers can upload the code to the repository; the others can download it, make changes, and commit the latest version of the code back to the repository.

Advantages Of A Data Repository

1. Helps data modelers work on the same data model consistently and collaboratively, and merge all work activities into that data model.

2. Helps in creating different versions of the data model to keep track of changes.

3. Helps in applying security to the data model.

4. Helps in backup and recovery of the data models.

Features Of A Data Repository


Checkout

Checkout is used to obtain a working copy from the repository for local use.

Commit

Committing basically means sending the changes you make in your working copy to the server.

Diff

One of the most common requirements in project development is to see what has changed. You might want to look at the differences between two revisions of the same file, or the differences between two separate files. For this we have the diff option.

Update

Periodically, you should ensure that changes made by others get incorporated into your local working copy. The process of getting changes from the server into your local copy is known as updating. Updating may be done on single files, a set of selected files, or recursively on entire directory hierarchies.
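The four operations above can be sketched with a toy, in-memory model. This is only an illustration under simplifying assumptions, not how Subversion is implemented: the class names Repository and WorkingCopy are made up for this example, files are plain strings, conflicts are ignored (a local edit always wins), and a commit is refused unless the whole working copy is up to date.

```python
class Repository:
    """Toy central repository: every commit creates a new numbered revision."""

    def __init__(self):
        # Revision 0 is the empty repository.
        self.revisions = [{}]

    @property
    def head(self):
        """Number of the latest revision."""
        return len(self.revisions) - 1

    def checkout(self):
        """Hand out a working copy based on the latest revision."""
        return WorkingCopy(self)


class WorkingCopy:
    """Toy working copy: a private, editable snapshot of one revision."""

    def __init__(self, repo):
        self.repo = repo
        self.base = repo.head
        self.files = dict(repo.revisions[self.base])

    def diff(self):
        """Names of files that differ from the revision this copy is based on."""
        base = self.repo.revisions[self.base]
        names = set(base) | set(self.files)
        return sorted(n for n in names if base.get(n) != self.files.get(n))

    def update(self):
        """Bring in changes from the server while keeping local edits."""
        base = self.repo.revisions[self.base]
        latest = self.repo.revisions[self.repo.head]
        locally_changed = set(self.diff())
        for name in set(base) | set(latest):
            if name in locally_changed:
                continue  # simplification: the local edit wins, no merge
            if name in latest:
                self.files[name] = latest[name]
            else:
                self.files.pop(name, None)
        self.base = self.repo.head

    def commit(self):
        """Send the working copy to the server as a new revision."""
        if self.base != self.repo.head:
            raise RuntimeError("working copy is out of date; update first")
        self.repo.revisions.append(dict(self.files))
        self.base = self.repo.head
        return self.base


repo = Repository()
alice = repo.checkout()
bob = repo.checkout()

alice.files["main.c"] = "int main() { return 0; }"
print(alice.diff())    # ['main.c']
print(alice.commit())  # 1

bob.files["notes.txt"] = "todo"
bob.update()           # bob receives main.c and keeps notes.txt
print(sorted(bob.files))  # ['main.c', 'notes.txt']
print(bob.commit())       # 2
```

Note how bob has to update before his commit is accepted: that is the reason for the advice to update periodically.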

Difference between SVN and GIT

Before we discuss the difference between SVN and GIT, let's see what exactly they are.

SVN stands for Subversion. It is a free/open source version control system initiated in 2000 by CollabNet Inc.

It is used to maintain current and historical versions of files such as source code, web pages and documentation.

GIT is a distributed revision control system with an emphasis on speed.

GIT is distributed, SVN is not:

This is by far the *core* difference between GIT and other non-distributed version control systems like SVN, CVS etc. If you can catch this concept well, then you have crossed half the bridge.

GIT, like SVN, can have a centralized repository or server. But GIT is intended more for use in distributed mode, which means every developer checking out code from the central repository/server has their own cloned repository on their machine.

GIT stores content as metadata, SVN stores just files:

Every source control system stores the metadata of files in hidden folders like .svn, .cvs etc., whereas GIT stores the entire content inside the .git folder. If you compare the size of the .git folder with .svn, you will notice a big difference. The .git folder is the cloned repository on your machine; it has everything that the central repository has, such as tags, branches and version histories.

GIT branches are not the same as SVN branches:

Branches in SVN are just another folder in the repository, whereas working with branches is much easier and more fun in GIT. You can quickly switch between branches from the same working directory.

GIT does not have a global revision number like SVN does:

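SVN numbers every commit with a single, increasing revision number that is easy to read and refer to, whereas GIT identifies commits by SHA-1 hashes. This is one of the biggest advantages of SVN over GIT.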
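To make the contrast concrete, here is a small sketch of the two naming schemes. The GIT part reproduces the way a file ("blob") ID is derived from its content with SHA-1, which is GIT's default hash; commit IDs are produced in a similar way from the commit's data. The SVN part is just a counter, which is our simplification of a revision number. The function names are made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """ID GIT assigns to a file: SHA-1 over a small header plus the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def next_svn_revision(head: int) -> int:
    """SVN-style numbering (simplified): each commit is the previous number plus one."""
    return head + 1


print(git_blob_id(b""))      # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(next_svn_revision(41))  # 42
```

The hash depends only on the content, so two developers who never talk to a central server still arrive at the same ID for the same file, which is what a distributed system needs. The price is that the ID no longer tells you which change came first.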

Setting up Data Repository In Linux

We set up a data repository in Linux based on the client-server model, using Google Code as the server and the svn command line as the client utility.

First of all, we created a project on Google Code, which was to serve as the server.

We created a project named sm-dr. Link for the project is

To create a project, log on and fill in the details.

The following screenshots show how to create the project and the created project.

After creating the project on Google Code, the next step was installing Subversion and uploading our code to the Google Code project via the Subversion command line.

First of all, we installed Subversion using the command sudo apt-get install subversion libapache2-svn.

Now go to the Source tab of the Google Code project, where there are two checkout commands. Use the command for project members and type it in the terminal:

svn checkout sm-dr --username

The output says: Checked out revision 1.

The checkout command creates a local copy for us to edit.

With this command, project members authenticate over HTTPS, which allows them to commit changes.

Now create a directory in which the code to be uploaded will be stored.

The directory we created was sm-drp.

Now use the command

svn checkout sm-dr --username

This created the project's directories within the directory we had created.

Now to commit files to the repository.

Create the file to be uploaded in the trunk directory, which is inside sm-drp.

Use the command svn add filename, which adds the file locally; then, to commit the change to the main project, use svn ci filename.
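The sequence above (checkout, add, commit) can be written down as a small script. This is only a sketch: it builds the command lines and prints them rather than running them, and REPO_URL, USERNAME, the file name hello.c and the commit message are placeholders we made up, so substitute the values shown on the Source tab of your own project.

```python
import shlex

# Placeholders (assumptions): use the URL and user name from your project's Source tab.
REPO_URL = "https://example.invalid/svn/trunk/"
USERNAME = "alice"


def svn_checkout(url, target, username):
    """Command that creates a local working copy in the folder `target`."""
    return ["svn", "checkout", url, target, "--username", username]


def svn_add(path):
    """Command that schedules a new file for addition (local only)."""
    return ["svn", "add", path]


def svn_commit(path, message):
    """Command that sends the change to the server; -m supplies the log message."""
    return ["svn", "ci", path, "-m", message]


steps = [
    svn_checkout(REPO_URL, "sm-dr", USERNAME),
    svn_add("hello.c"),
    svn_commit("hello.c", "Add hello.c"),
]

for command in steps:
    print(shlex.join(command))
```

To actually execute a step, each list could be passed to subprocess.run(command, check=True) on a machine where the svn client is installed.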

The following screenshots show the process.

Now, if some other user commits a change to the code in the repository and you want that change in your local copy as well, you can use the update command:

svn update filename

The repository had some changes in the file, which were updated in our local copy.

Now we want to see how our local copy differs from the version we last received from the repository.

We can use the diff command:

svn diff filename

The diff command can also be used on the repository to see how the current version differs from the previous one.

Original file

Modified File

Diff function output
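svn diff prints its result in the unified diff format. As a rough illustration of what such output looks like, the sketch below produces a unified diff with Python's difflib module. The file contents and the two labels are made up for this example; they are not the files from our project.

```python
import difflib

# Made-up contents standing in for the original and the modified file.
original = "line one\nline two\nline three\n".splitlines(keepends=True)
modified = "line one\nline 2\nline three\nline four\n".splitlines(keepends=True)

diff = list(
    difflib.unified_diff(
        original,
        modified,
        fromfile="filename (revision 1)",    # illustrative label
        tofile="filename (working copy)",    # illustrative label
    )
)

# Lines starting with "-" were removed, lines starting with "+" were added.
print("".join(diff), end="")
```

Unchanged lines are printed with a leading space, so the change can be read in context.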

Setting up a Data Repository in Windows:

In this project we used the data repository set up on Google Code as the server and, using Tortoise SVN (for Windows), set up a client on our system. The software Tortoise SVN can be downloaded from the link

Once this software is downloaded and installed, you can start working.

Creating a data repository:

To create a data repository on your PC, just right click in any empty folder and choose 'Create Repository Here'.

A repository will be created in that folder.

Also, you have to update some files in the conf folder. Open the “authz” file in Notepad and update the access information for the authors. For example:


[aliases]
joe = /C=XZ/ST=Dessert/L=Snake City/O=Snake Oil, Ltd./OU=Research Institute/CN=Joe Average

[groups]
harry_and_sally = harry,sally
harry_sally_and_joe = harry,sally,&joe

[/foo/bar]
harry = rw
&joe = r
* =

[repository:/baz/fuz]
@harry_and_sally = rw
* = r

And in the “passwd” file, you have to update the usernames and passwords of the authors. For example:

[users]
harry = harryssecret
sally = sallyssecret
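Both files use an INI-like layout, so they can be checked with a few lines of Python before the repository is shared. The sketch below assumes the section headers from Subversion's stock templates ([groups], path sections such as [/foo/bar], and [users] in passwd) and uses a shortened copy of the sample entries. It only lists the rules and checks that every group member has a password entry; it does not reproduce how Subversion itself decides access.

```python
import configparser

# Shortened sample files (assumption: stock template section names).
AUTHZ = """\
[groups]
harry_and_sally = harry,sally

[/foo/bar]
harry = rw
* =

[repository:/baz/fuz]
@harry_and_sally = rw
* = r
"""

PASSWD = """\
[users]
harry = harryssecret
sally = sallyssecret
"""


def parse(text):
    """Parse INI-style text, keeping names exactly as written."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # do not lower-case user and group names
    parser.read_string(text)
    return parser


authz = parse(AUTHZ)
passwd = parse(PASSWD)

# Group name -> list of members.
groups = {
    name: [member.strip() for member in members.split(",")]
    for name, members in authz["groups"].items()
}
known_users = set(passwd["users"])

# Plain user names in groups ("&alias" and "@group" entries are skipped)
# that have no entry in the passwd file.
missing = sorted(
    member
    for members in groups.values()
    for member in members
    if not member.startswith(("&", "@")) and member not in known_users
)
print("users without a password entry:", missing)

# List every access rule: an empty permission means no access.
for section in authz.sections():
    if section == "groups":
        continue
    for who, permission in authz[section].items():
        print(section, who, permission or "(no access)")
```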


Next, we want to bring the files of the Google Code project made earlier (sm-dr) into our local folder. This is done by using the SVN Checkout utility. Just right click in the repository folder and choose the SVN Checkout option. The following dialogue box appears:

In the first field, enter the URL of the repository on Google Code. The URL can be seen on the Source tab of the project. The next field takes the path of the folder into which the files will be checked out. After this comes the checkout depth:

Checkout Depth:

This option allows us to choose how deep the checkout recurses into child folders. For example, “Fully Recursive” checks out the entire tree, including all child folders and sub-folders.
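The effect of the depth setting can be sketched on a made-up folder tree. The depth names used here (empty, files, immediates, infinity) are the values accepted by the --depth option of the svn command line, with infinity corresponding to “Fully Recursive”; the tree and the function name are our own, and the function only models which entries would be fetched.

```python
def checkout_paths(tree, depth, prefix=""):
    """List what a checkout of `tree` would fetch at the given depth.

    `tree` maps a name to None (a file) or to another dict (a folder).
    """
    paths = []
    if depth == "empty":
        return paths  # only the top folder itself, nothing inside it
    for name in sorted(tree):
        child = tree[name]
        path = prefix + name
        if child is None:
            paths.append(path)  # files are fetched at every other depth
        elif depth == "immediates":
            paths.append(path + "/")  # the sub-folder appears, but stays empty
        elif depth == "infinity":
            paths.append(path + "/")
            paths.extend(checkout_paths(child, depth, path + "/"))
        # depth == "files": sub-folders are skipped entirely
    return paths


# Made-up project layout.
TREE = {"README": None, "main.c": None, "lib": {"util.c": None}}

for depth in ("empty", "files", "immediates", "infinity"):
    print(depth, checkout_paths(TREE, depth))
```

A shallow checkout is useful when only a few files of a large project are needed; the working copy can be deepened later.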


When a file or folder in the working copy has been modified since the last update, one can see a red exclamation mark on its icon:

Updating is easy. Just right click on the icon and select the SVN Update option. The update starts automatically. A window will pop up displaying the progress of the update as it runs. Changes made by others will be merged into your files, keeping any changes you may have made to the same files. The repository is not affected by an update.


A file that has not yet been added and committed has a question mark on its icon:

If your working copy is up to date and there are no conflicts, you are ready to commit your changes. Just right click on the file/folder you wish to commit and click on the SVN Commit option. A dialogue box opens in which you just have to tick the file/folder you want to commit.

Commit Progress

After pressing OK, a dialog box appears displaying the progress of the commit.

The progress dialog uses colour coding to highlight different commit actions:

Blue

Committing a modification.

Purple

Committing a new addition.

Dark red

Committing a deletion or a replacement.

Black

All other items.




When we first chose the project, we were completely clueless as to what a data repository is, but once we started working on it we found it very interesting and realized how useful data repositories are. Altogether, working on this project was fun :)

Submitted By

Apoorva Mittal


Janhavi Agrawal



  1. although this isn't my topic, I'm pretty sure that its spelled Git instead of GIT