Message-Id: <200811131550.mADFoit3031654@bourbon.usc.edu>
To: cs551@merlot.usc.edu
Subject: Re: search and get
Date: Thu, 13 Nov 2008 07:50:44 -0800
From: Bill Cheng

Someone wrote:

  > "If the existing file is in the permanent space, then you are
  > done".... we can check if the file we are GET'ting is already in our
  > file system just BEFORE we actually send out the get request (since we
  > have sha1,nonce,filename with us from the search responses)
  > in this case why request the file from some node just to have it
  > discarded at the end. only copying the file to current directory is
  > good enough. Doing a GET in this case will only make other nodes
  > update their lru's, maybe cache the file etc etc.

Originally, what you said in the last sentence is what I thought
needed to be done. But come to think of it, this would allow one
node to make a file more popular artificially. So, you are correct
that this should not be done. I've added another bullet in the
section you mentioned:

    [BC: Bullet added 11/13/2008]
    Before you flood a GET message, you should use the FileID to
    determine if the current node has the file. If it does, you
    should not flood a GET message. In this case, if the existing
    file is in the cache space, its status should be changed so
    that it is "moved" to the permanent space (you should also do
    whatever adjustments are necessary).
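The bullet above can be sketched roughly as follows. This is only an illustration, not the required implementation: `MiniFS`, `StoredFile`, `handle_get`, and the `flood_get` callback are hypothetical names, and treating "adjustments" as a cache-size update is an assumption (your node may also need to update its LRU list and index files, and copy the file to the current working directory).

```python
from dataclasses import dataclass, field
from enum import Enum


class Space(Enum):
    CACHE = "cache"
    PERMANENT = "permanent"


@dataclass
class StoredFile:
    file_id: bytes
    size: int
    space: Space


@dataclass
class MiniFS:
    """Hypothetical in-memory stand-in for a node's mini filesystem."""
    files: dict = field(default_factory=dict)  # file_id -> StoredFile
    cache_bytes_used: int = 0

    def add(self, f):
        self.files[f.file_id] = f
        if f.space is Space.CACHE:
            self.cache_bytes_used += f.size

    def promote(self, f):
        """'Move' a cached file to permanent space by changing its status."""
        if f.space is Space.CACHE:
            f.space = Space.PERMANENT
            # Assumed adjustment: the file no longer counts against the cache.
            self.cache_bytes_used -= f.size


def handle_get(fs, file_id, flood_get):
    """Return True if a GET was flooded, False if the file was already local."""
    local = fs.files.get(file_id)
    if local is None:
        flood_get(file_id)  # not stored here, so ask the network
        return True
    fs.promote(local)  # no-op if the file is already in permanent space
    return False
```

Because the local check happens before anything is sent, other nodes never see the GET, so their LRU order and caches are left alone.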
--
Bill Cheng // bill.cheng@usc.edu

On Wed, Nov 12, 2008 at 2:26 PM, Bill Cheng wrote:
> Someone wrote:
>
> > scene:
> >
> > nodes A,B,C
> >
> > node A gives store command to store file X, which also gets flooded
> > and stored (storeprob +ve) at B and C (and obviously A) so now we have
> > 3 copies of same file (same name, sha1 and nonce)
> > node A does a search for X and gets back 3 results from A,B and C
> > (different fileID's)
> > node A does a get for file X stored at node B
> >
> > now,
> > according to spec - If the user at node A attempts to retrieve file X
> > and file X was successfully retrieved, node A must serve file X (i.e.,
> > respond properly to future search messages). File X should be stored
> > in the permanent area and not stored in its cache
> >
> > here, node A already has the file X in its permanent storage, then
> > should it save that file again or check if it already has the file.
> > would this involve opening all files to compare name,sha1 and nonce?
>
> FileName, SHA1, and Nonce together uniquely identify a file.
> So, if everything matches, a 2nd copy of the file should not
> be saved.
>
> Once you have received a file, you can use either the filename
> index (or the sha1 index) to find all file numbers having the
> same filename (or sha1 value). For every one of these files,
> you should open the corresponding metadata file and see if
> it has the same nonce and sha1 value (or filename).
>
> > further, in the " additional notes " in the spec it says
> > When a node performs a get and the file happens to be in the mini
> > filesystem of this node, the following should happen. If the file is
> > in the permanent space, a copy of the file should be placed in the
> > node's current working directory.
>
> I think this paragraph was not phrased clearly. Sorry! I've
> just rewritten it. Hopefully it's more clear now.
> Please see:
>
>     http://merlot.usc.edu/cs551-f08/projects/final.html#nospace
>
> > continuing above example, a get only follows a search/get and in
> > search we only display different fileID. so if I found 3 copies of
> > file X and 'luckily' do a get for the file on my own node, only then
> > will the above happen. in all other cases i will have multiple copies
> > of same file on permanent storage. If i do 50 gets for X stored on B i
> > will end up with 51 copies of X on node A.
>
> If an action will cause an identical file (defined above) to
> be saved, you should not save a copy of this file.
>
> For the copy you saved in the current working directory, if
> there is already a file with the same filename (as in the
> metadata), you should probably first prompt the user to see
> if it's okay to overwrite the file.
> --
> Bill Cheng // bill.cheng@usc.edu
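The identical-file check described in the quoted reply (look up candidates in the filename index, then compare the sha1 and nonce in each metadata file) might look roughly like this. It is a minimal sketch under assumptions: `Metadata`, `is_duplicate`, the dict-based `filename_index`, and the `load_metadata` callback are hypothetical stand-ins for however your node stores its index and metadata files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    filename: str
    sha1: bytes
    nonce: bytes


def is_duplicate(incoming, filename_index, load_metadata):
    """Return True if a file with the same FileName, SHA1, and Nonce is stored.

    filename_index: dict mapping filename -> list of file numbers (assumed layout)
    load_metadata:  callable mapping a file number -> Metadata (assumed helper)
    """
    for file_no in filename_index.get(incoming.filename, []):
        stored = load_metadata(file_no)
        # The filename already matches via the index; compare the other two fields.
        if stored.sha1 == incoming.sha1 and stored.nonce == incoming.nonce:
            return True
    return False
```

Only the metadata files for candidates with a matching filename are opened, so the node does not have to scan every stored file. If `is_duplicate` returns True, the received copy would not be saved a second time.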