Message-Id: <200811200627.mAK6RUKO015854@bourbon.usc.edu>
To: cs551@merlot.usc.edu
Subject: Re: CS551_Final2_index files
Date: Wed, 19 Nov 2008 22:27:30 -0800
From: Bill Cheng

Someone wrote:

  > Can we simply use structures to store the names,SHA1 values and bitvectors
  > of all the files present on a node.
  > Although it is given in the Spec to use linear list and binary search tree.
  > Can't we create object of a structure whenever a file gets stored on a node
  > and when a search is to be performed, we can search through all the objects
  > to find a match. This won't be efficient but atleast correct, Right ?

The spec explicitly asks you to implement these structures.  To make
it even more clear, I've just updated the spec to include the following:

    [BC: Added 11/19/2008] The "name_index" and "sha1_index" must
    be sorted.

I've also updated the grading guidelines to add the sorting
requirement to name_index and sha1_index.

  > It is given in the spec that 'The file names must be "kwrd_index", "
  > name_index", "sha1_index". These files are disk images of the corresponding
  > memory index structures.'
  > Is there any fixed format of how these files should look like ? Or can we
  > simply use any format

Correct.

  > Like for kwrd_index file, it contains
  > bitvector1
  > bitvector2
  > .
  > ..
  > ...
  > bitvectorN

Well, they need to be "correct".
In your example, you have to include at least the corresponding
file number.  So, it can look like the following:

    bitvector1 fn1
    bitvector2 fn2
    .
    ..
    ...
    bitvectorN fnN

where fn# refers to a file number in the HomeDir/files directory.

Similarly, for name_index, you can use an ASCII format for it.
For example:

    filename1,fn11,fn12
    filename2,fn21,fn22,fn23,fn24
    .
    ..
    ...
    filenameM,fnM1

There can be multiple file numbers per filename since multiple
files may have the same original FileName.  You can also do the
same thing for sha1_index.

So, they are actually quite easy to implement, especially
because they are not required to be trees!
--
Bill Cheng // bill.cheng@usc.edu
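
To illustrate the ASCII name_index layout described above, here is a
minimal Python sketch. It is only one possible reading of the format, not
the spec's required implementation. The function names
(build_name_index, dump_name_index, load_name_index) are hypothetical,
and the sketch assumes that file names contain no commas or newlines,
that file numbers are integers, and that "sorted" means plain string
ordering on the file name.

```python
from collections import defaultdict


def build_name_index(files):
    """Group file numbers by original file name.

    `files` is an iterable of (file_name, file_number) pairs.
    Several file numbers may share one name.
    """
    index = defaultdict(list)
    for name, fn in files:
        index[name].append(fn)
    # Keep each name's file numbers in ascending order (an assumption).
    return {name: sorted(fns) for name, fns in index.items()}


def dump_name_index(index):
    """Serialize as 'filename,fn1,fn2,...' lines, sorted by file name."""
    lines = []
    for name in sorted(index):
        lines.append(",".join([name] + [str(fn) for fn in index[name]]))
    return "\n".join(lines) + ("\n" if lines else "")


def load_name_index(text):
    """Parse the text produced by dump_name_index back into a dict."""
    index = {}
    for line in text.splitlines():
        if not line:
            continue
        # Assumes no commas inside file names.
        name, *fns = line.split(",")
        index[name] = [int(fn) for fn in fns]
    return index


sample = [("b.txt", 3), ("a.txt", 7), ("b.txt", 1)]
print(dump_name_index(build_name_index(sample)), end="")
# prints:
# a.txt,7
# b.txt,1,3
```

The same line-per-key layout could be reused for sha1_index by keying on
the SHA1 string instead of the file name.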