Delivery-Date: Wed Nov 19 22:30:18 2008
Message-Id: <200811200627.mAK6RUKO015854@bourbon.usc.edu>
To: cs551@merlot.usc.edu
Subject: Re: CS551_Final2_index files
Date: Wed, 19 Nov 2008 22:27:30 -0800
From: Bill Cheng <william@bourbon.usc.edu>

Someone wrote:

  > Can we simply use structures to store the names,SHA1 values and bitvectors
  > of all the files present on a node.
  > Although it is given in the Spec to use linear list and binary search tree.
  > Can't we create object of a structure whenever a file gets stored on a node
  > and when a search is to be performed, we can search through all the objects
  > to find a match. This won't be efficient but atleast correct, Right ?

The spec explicitly asks you to implement these structures.
To make it even more clear, I've just updated the spec to
include the following:

    [BC: Added 11/19/2008]
    The "name_index" and "sha1_index" must be sorted.

I've also updated the grading guidelines to add the sorting
requirement to name_index and sha1_index.

  > It is given in the spec that 'The file names must be "kwrd_index", "
  > name_index",  "sha1_index". These files are disk images of the corresponding
  > memory index structures.'
  > Is there any fixed format of how these files should look like ? Or can we
  > simply use any format

Correct.  There is no fixed format; you can use any format.

  > Like for kwrd_index file, it contains
  > bitvector1
  > bitvector2
  > .
  > ..
  > ...
  > bitvectorN

Well, they need to be "correct".  In your example, you
have to include at least the corresponding file number.  So,
it can look like the following:

    bitvector1 fn1
    bitvector2 fn2
    .
    ..
    ...
    bitvectorN fnN

where fn# refers to a file number in the HomeDir/files
directory.
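
As a rough illustration only (not a required format), here is a
minimal Python sketch of writing and reading such a file.  The
function names, the hex-string encoding of the bit-vector, and the
single-space separator are all assumptions made for this example;
the spec does not mandate any of them.

```python
def format_kwrd_index(entries):
    """Serialize (bitvector_hex, file_number) pairs, one per line.

    Assumption: each bit-vector is already a hex string without spaces.
    """
    return "".join(f"{bv} {fn}\n" for bv, fn in entries)


def parse_kwrd_index(text):
    """Rebuild the list of (bitvector_hex, file_number) pairs."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        bv, fn = line.split()
        entries.append((bv, int(fn)))
    return entries
```

Writing the index out and reading it back should give the same
list, which is an easy way to check that the disk image is "correct".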

Similarly, for name_index, you can use an ASCII format
for it.  For example:

    filename1,fn11,fn12
    filename2,fn21,fn22,fn23,fn24
    .
    ..
    ...
    filenameM,fnM1

There can be multiple file numbers per filename since
multiple files may have the same original FileName.
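
One possible way to keep such a list sorted in memory and dump it
in the ASCII format above is sketched below in Python.  The names
add_file and format_name_index are hypothetical, and the sketch
assumes a filename never contains a comma (otherwise you would need
a different separator or some escaping).

```python
import bisect


def add_file(index, filename, file_number):
    """Insert into a list of [filename, [file numbers]] sorted by filename.

    Probing with [filename] works because a one-element list sorts just
    before any [filename, ...] entry that has the same first element.
    """
    pos = bisect.bisect_left(index, [filename])
    if pos < len(index) and index[pos][0] == filename:
        index[pos][1].append(file_number)  # same FileName, another file
    else:
        index.insert(pos, [filename, [file_number]])


def format_name_index(index):
    """Serialize as 'filename,fn1,fn2,...' with one entry per line."""
    return "".join(
        ",".join([name] + [str(fn) for fn in fns]) + "\n"
        for name, fns in index
    )
```

Because every insertion goes to the position found by the binary
search, the list stays sorted and the file can be written out in
order without a separate sorting pass.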

You can also do the same thing for sha1_index.  So, they
are actually quite easy to implement, especially because
they are not required to be trees!
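
To illustrate the sha1_index case, here is a hedged Python sketch of
loading the file back into a sorted list and looking up a SHA1 value
with binary search.  It assumes the same comma-separated layout as
the name_index example, with the SHA1 stored as a hex string and the
lines already in sorted order; parse_sha1_index and lookup are
made-up names for this example.

```python
import bisect


def parse_sha1_index(text):
    """Parse 'sha1hex,fn1,fn2,...' lines into (sha1hex, [file numbers]).

    Assumption: the lines were written in sorted order by sha1hex.
    """
    index = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        sha1_hex, *fns = line.split(",")
        index.append((sha1_hex, [int(fn) for fn in fns]))
    return index


def lookup(index, sha1_hex):
    """Binary-search the sorted list; return file numbers or []."""
    # The 1-tuple (sha1_hex,) sorts just before (sha1_hex, [...]).
    pos = bisect.bisect_left(index, (sha1_hex,))
    if pos < len(index) and index[pos][0] == sha1_hex:
        return index[pos][1]
    return []
```

Note that the binary search only gives the right answer if the list
really is sorted, which is one reason the sorting requirement matters.
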
--
Bill Cheng // bill.cheng@usc.edu <URL:http://merlot.usc.edu/william/usc/>