Return-Path: william@bourbon.usc.edu Delivery-Date: Thu Nov 20 08:23:35 2008 X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on merlot.usc.edu X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.2.3 Received: from bourbon.usc.edu (bourbon.usc.edu [128.125.9.75]) by merlot.usc.edu (8.14.1/8.14.1) with ESMTP id mAKGNYb5011215 for ; Thu, 20 Nov 2008 08:23:34 -0800 Received: from bourbon.usc.edu (localhost.localdomain [127.0.0.1]) by bourbon.usc.edu (8.14.2/8.14.1) with ESMTP id mAKGKr1e020236 for ; Thu, 20 Nov 2008 08:20:53 -0800 Message-Id: <200811201620.mAKGKr1e020236@bourbon.usc.edu> To: cs551@merlot.usc.edu Subject: Re: CS551_Final2_index files Date: Thu, 20 Nov 2008 08:20:53 -0800 From: Bill Cheng Someone wrote: > [BC: Added 11/19/2008] > The "name_index" and "sha1_index" must be sorted. > > I've also updated the grading guidelines to add the sorting > requirement to name_index and sha1_index. > > For sort, I sort name_index by key which is filename, and it looks like > > 1 > blondie1.mp3 > 4 > blondie1.mp3 > 2 > chess.jpg > 3 > usctommy.gif > 5 > usctommy.gif > > and same manner for sha1_index > 3 > 11CDDF2A0378D6A892CF43198847CE379E85D149 > 5 > 11CDDF2A0378D6A892CF43198847CE379E85D149 > 2 > F7917FE4976D2C24F225BC4B6C2334E554B91C28 > 1 > FFD3B197E1C0F0C27E7BDC219F553A3E6B139DFB > 4 > FFD3B197E1C0F0C27E7BDC219F553A3E6B139DFB > > I guess this should be ok, right? You need to have a "conflict resolution chain" in your index. Therefore, I would have: blondie1.mp3,1,4 chess.jpg,2 usctommy.gif,3,5 and: 11CDDF2A0378D6A892CF43198847CE379E85D149,3,5 F7917FE4976D2C24F225BC4B6C2334E554B91C28,2 FFD3B197E1C0F0C27E7BDC219F553A3E6B139DFB,1,4 -- Bill Cheng // bill.cheng@usc.edu > On Wed, Nov 19, 2008 at 10:27 PM, Bill Cheng wrote: > > > Someone wrote: > > > > > Can we simply use structures to store the names,SHA1 values and > > bitvectors > > > of all the files present on a node. > > > Although it is given in the Spec to use linear list and binary search > > tree. > > > Can't we create object of a structure whenever a file gets stored on a > > node > > > and when a search is to be performed, we can search through all the > > objects > > > to find a match. This won't be efficient but atleast correct, Right ? > > > > The spec explicitly asks you to implement these structures. > > To make it even more clear, I've just updated the spec to > > include the following: > > > > [BC: Added 11/19/2008] > > The "name_index" and "sha1_index" must be sorted. > > > > I've also updated the grading guidelines to add the sorting > > requirement to name_index and sha1_index. > > > > > It is given in the spec that 'The file names must be "kwrd_index", " > > > name_index", "sha1_index". These files are disk images of the > > corresponding > > > memory index structures.' > > > Is there any fixed format of how these files should look like ? Or can > > we > > > simply use any format > > > > Correct. > > > > > Like for kwrd_index file, it contains > > > bitvector1 > > > bitvector2 > > > . > > > .. > > > ... > > > bitvectorN > > > > Well, they need to be "correct". In your example, you > > have include at least the corresponding file number. So, > > it can look like the following: > > > > bitvector1 fn1 > > bitvector2 fn2 > > . > > .. > > ... > > bitvectorN fnN > > > > where fn# refers to a file number in the HomeDir/files > > directory. > > > > Similarly, for name_index, you can use an ASCII format > > for it. For example: > > > > filename1,fn11,fn12 > > filename2,fn21,fn22,fn23,fn24 > > . > > .. > > ... > > filenameM,fnM1 > > > > There can be multiple file numbers per filename since > > multiple files may have the same original FileName. > > > > You can also do the same thing for sha1_index. So, they > > are actually quite easy to implement, especially because > > they are not required to be trees! > > -- > > Bill Cheng // bill.cheng@usc.edu