Saturday, 1 October 2011

Data Repositories

Data Repository

In school, we undertook many group projects, and we all faced one common problem: synchronizing the different versions of our work.
Every member updated the project (programs) at home, and it was difficult to compare and combine the changes. This is an issue with basically every project undertaken as a group effort.

To our relief....
Along came Data Repositories!!!!

Data repositories are basically version control systems located on a centralized server, so that different users can work with the same up-to-date files, programs, etc. The logical question that comes to mind is: "What is a version control system?"

Version Control System

By version control, we mean keeping track of the successive versions of the same code as different users change it. This makes a single, up-to-date copy of the code available to all users. Data repositories also involve logically or physically partitioning data, so as to deal with the problem of ever-increasing information in such databases. They are used by financial organizations, the government, university libraries, Google Spreadsheets, even the National Stock Exchange; you name it. Their main advantages are:

  • Easy accessibility for multiple, remote users.
  • User friendly, because no physical handling is involved, and various GUI-based utilities have been developed.
  • Digitisation of libraries, satellite images, etc. adds value to these objects, making them easier to preserve and update indefinitely.
  • Most importantly: control of data redundancy! :)

SVN v/s Git

SVN and Git are two different types of data repositories.

SVN

  • Stores data on a central server.
  • Users commit their changes directly to the repository on the server.
  • Has one centralized repository; hence, if various users update the same file simultaneously, their changes can conflict and, if not reconciled carefully, be lost.
  • Provides an easy walk through different versions by using sequential revision numbers.
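The sequential revision numbers and the risk of simultaneous updates can be pictured with a toy model. The sketch below is not SVN's implementation; the class and method names (`CentralRepo`, `checkout`, `commit`, `OutOfDateError`) are made up for illustration. It assumes a server that holds a single file and refuses a commit that is based on an older revision, which is one way a central server can avoid silently overwriting somebody else's work.

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on an older revision than the latest one."""


class CentralRepo:
    """Toy central repository holding the history of one file (not real SVN)."""

    def __init__(self, initial_text=""):
        # Revision 0 is the initial state; every commit appends one revision.
        self._revisions = [initial_text]

    @property
    def head(self):
        """Number of the latest revision."""
        return len(self._revisions) - 1

    def checkout(self, revision=None):
        """Return (revision number, text); defaults to the latest revision."""
        if revision is None:
            revision = self.head
        return revision, self._revisions[revision]

    def commit(self, base_revision, new_text):
        """Store new_text as the next revision if the client is up to date."""
        if base_revision != self.head:
            raise OutOfDateError(
                f"working copy is at r{base_revision}, server is at r{self.head}"
            )
        self._revisions.append(new_text)
        return self.head


repo = CentralRepo("print('hello')\n")

# Two users check out the same revision at the same time.
alice_rev, alice_text = repo.checkout()
bob_rev, bob_text = repo.checkout()

# Alice commits first and creates revision 1.
print("alice committed revision", repo.commit(alice_rev, alice_text + "print('alice')\n"))

# Bob's commit is still based on revision 0, so the toy server refuses it.
try:
    repo.commit(bob_rev, bob_text + "print('bob')\n")
except OutOfDateError as err:
    print("bob's commit rejected:", err)
```

In this model Bob has to fetch revision 1 and combine it with his own change before he can commit, so every accepted change gets the next number in the sequence.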

Git

  • With Git, the user clones the repository from the server and works on a local copy. It is entirely up to the user whether to push the changes back to the server. Once pushed, however, the newer version becomes the latest one on the server and is then available to other users.
  • Git repositories are smaller and faster, and each copy carries the entire history. Having no sequential revision numbers is useful when several users push data to the same Git repository.
  • Without sequential revision numbers, data is neither lost nor immediately merged with other changes.
  • It uses the concept of branching by default, and every clone is a complete copy of the repository. Hence, multiple backup copies exist.
  • The data file format is also compressed in Git.
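Instead of sequential revision numbers, Git names every stored object by a hash of its content, and it stores loose objects zlib-compressed. The sketch below reproduces the id Git gives to a file (a "blob") under its default SHA-1 object format. The function names `git_blob_id` and `compressed_blob` are chosen here for illustration and are not part of Git.

```python
import hashlib
import zlib


def _blob_bytes(content: bytes) -> bytes:
    """Prefix the content with Git's blob header: b'blob <size>' plus a NUL byte."""
    return b"blob " + str(len(content)).encode("ascii") + b"\0" + content


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 id Git assigns to a file with exactly this content."""
    return hashlib.sha1(_blob_bytes(content)).hexdigest()


def compressed_blob(content: bytes) -> bytes:
    """Return the zlib-compressed bytes, as used for a loose object on disk."""
    return zlib.compress(_blob_bytes(content))


source = b"print('hello')\n" * 50

print("id of the file:      ", git_blob_id(source))
print("id after one change: ", git_blob_id(source + b"# edited\n"))
print("size before/after compression:", len(source), len(compressed_blob(source)))
```

Because the id depends only on the content, two users who independently create the same file get the same id, and no central server is needed to hand out the next number.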

Creating the server

Using VisualSVN Server
VisualSVN Server can be downloaded and installed from

We then create a repository using VisualSVN Server.

For the server to work, some files in the conf folder of the repository directory need to be updated.

The "svnserve.conf" file is updated by adding the following lines under its [general] section, using Notepad:

anon-access = read
auth-access = write
password-db = passwd
authz-db = authz

Users are granted access by updating the "passwd" file (same folder as above), adding one line per user under its [users] section:
username = password
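To see how these two files fit together, here is a small sketch that reads them with Python's `configparser`. It only illustrates the meaning of the settings and is not how svnserve itself checks access. It assumes the settings sit under a `[general]` section and the users under a `[users]` section, and the user `alice` with password `s3cret` is invented for the example.

```python
import configparser

# Contents as they might appear in the repository's svnserve.conf file.
SVNSERVE_CONF = """\
[general]
anon-access = read
auth-access = write
password-db = passwd
authz-db = authz
"""

# Contents of the passwd file; the user below is purely hypothetical.
PASSWD = """\
[users]
alice = s3cret
"""


def parse(text):
    """Parse INI-style text and return the resulting ConfigParser."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


general = parse(SVNSERVE_CONF)["general"]
users = parse(PASSWD)["users"]


def access_level(username=None, password=None):
    """Return the access granted: auth-access for a valid login, else anon-access."""
    if username is not None and username in users and users[username] == password:
        return general["auth-access"]
    return general["anon-access"]


print("anonymous user:", access_level())
print("alice, right password:", access_level("alice", "s3cret"))
print("alice, wrong password:", access_level("alice", "oops"))
```

With the settings above, anyone may read the repository, but only users listed in the passwd file may commit to it.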

To test our server, we start a daemon from the command prompt:
svnserve --daemon --root "Directory path of the repository"
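A quick way to confirm that the daemon is really running is to try connecting to it. The sketch below builds the same command as a list of arguments and checks whether something is listening on svnserve's default port, 3690. The helper names `svnserve_command` and `is_listening` and the repository path are placeholders invented for this example.

```python
import socket


def svnserve_command(repo_root):
    """Return the argument list for starting svnserve as a daemon."""
    return ["svnserve", "--daemon", "--root", repo_root]


def is_listening(host, port=3690, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical repository path; replace it with your own.
    print(" ".join(svnserve_command(r"C:\Repositories\demo")))
    print("svnserve reachable on localhost:", is_listening("localhost"))
```

If the second line prints False while the daemon is supposed to be running, a firewall rule or a different port setting is a likely cause.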

The server has now been created and is functioning properly.

Using Google Code
To create a server using Google Code, go to
Click on "Create a new project"; the project creation page will open.

To find your checkout URL, go to Source.
That page shows the checkout URL, along with your username and a link to the password generator.


TortoiseSVN is a client-side GUI utility for accessing SVN servers. It can be downloaded and installed from:

SVN Checkout: a local working copy is created and synced with the already existing repository on the server; this is called an SVN Checkout.
This is done by right-clicking on a suitable directory and selecting SVN Checkout.
A dialog box appears, asking for the repository URL and the checkout directory.

The same dialog is used to check out from a server set up using Google Code.

The checkout then downloads the files.
Congratulations! Your data repository is up to date.
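The same checkout can also be done without the GUI, using the `svn` command-line client. The sketch below only assembles and (optionally) runs the command; the URL, folder and user name are placeholders, and the helper names `checkout_command` and `run_checkout` are invented for this example.

```python
import shutil
import subprocess


def checkout_command(repo_url, target_dir, username=None):
    """Return the argument list for 'svn checkout URL DIR [--username NAME]'."""
    command = ["svn", "checkout", repo_url, target_dir]
    if username:
        command += ["--username", username]
    return command


def run_checkout(repo_url, target_dir, username=None):
    """Run the checkout, failing early if the svn client is not installed."""
    if shutil.which("svn") is None:
        raise RuntimeError("the svn command-line client was not found on PATH")
    subprocess.run(checkout_command(repo_url, target_dir, username), check=True)


# Placeholder values; substitute the checkout URL shown for your own project.
print(" ".join(checkout_command("https://example.com/svn/demo/trunk", "demo", "alice")))
```

Calling `run_checkout` with a real URL would create the working copy in the given folder, just as the dialog box does.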

Using SVN
Right-clicking inside the checked-out folder and choosing TortoiseSVN lets us explore the features of SVN.
TortoiseSVN does not require the command prompt.

Features in SVN

Commit: sends local changes (modified, added, or deleted files) to the server. Done by right-clicking on the file and choosing SVN Commit.

Update: fetches the newest version of the file from the repository. Done by right-clicking on the file and choosing SVN Update.

Diff: a comparison utility that outputs the differences between two files.

Merge: bringing the changes made in branches back into the trunk.

Branch: used to try new features without disturbing the main development with errors.
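To give a feel for what a diff looks like, here is a minimal sketch using Python's standard `difflib` module. It is not the tool TortoiseSVN uses internally; the two versions of the file and the labels `r1` and `r2` are made up for the example.

```python
import difflib

# Two hypothetical versions of the same small program.
old_version = """\
def greet(name):
    print("Hello " + name)
"""

new_version = """\
def greet(name):
    print(f"Hello, {name}!")

greet("world")
"""


def show_diff(old_text, new_text, old_label="r1", new_label="r2"):
    """Return a unified diff between two versions of a file as one string."""
    diff_lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    )
    return "".join(diff_lines)


print(show_diff(old_version, new_version))
```

Lines starting with "-" exist only in the old version and lines starting with "+" only in the new one, which is exactly the information needed to compare and combine two people's changes.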

A hell of a lot of work was required. Constant error corrections were needed to get the server finally working! :) (Thank god it finally ran!)
We learned a nice way to share things.
I would like to thank Professor Amarjeet for giving us the opportunity to work on this topic.
Also, thanks to Romil Bharadwaj for the support.

PS: if movies, exes, mp3s, etc. are on a repository, they can be downloaded from it even with firewalls in place! ;)

Charupriya Sharma
Shikhar Singhal


  1. How is a data repository different from FTP?

  2. Would you first please go read about FTP! :/

  3. I've been setting up FTP servers since before you even heard about them.