WEBVTT 1 00:00:05.009 --> 00:00:14.070 William Cheng: Okay, welcome to the second part of lecture 24 so we were talking about try to improving the performance of the file system. 2 00:00:14.460 --> 00:00:21.570 William Cheng: And we mentioned that if you use a buffer cache, you can improve the performance of the 3 00:00:22.260 --> 00:00:28.620 William Cheng: The file system for reading you can, you know, you could make it as fast as you want. If you have, you know, a large enough buffer cache. 4 00:00:29.250 --> 00:00:39.660 William Cheng: But writing is a little different because writing you know you. There's no way for you to write ahead so you. So what happened is that we mentioned that there are two ways to go. One is to 5 00:00:40.050 --> 00:00:44.280 William Cheng: Write through and the other one is right back right back there's a risk. We're going to address that a little later. 6 00:00:44.970 --> 00:00:54.300 William Cheng: So so so the question right now is that, you know, given that what you tried it before my read operation, you can achieve more than 100% of the transfer capacity of the disk. 7 00:00:55.080 --> 00:01:02.130 William Cheng: For read for football writing, how can you get close to 100% of the transfer capacity, right, we can only get to 50% of transfer capacity. 8 00:01:02.790 --> 00:01:15.090 William Cheng: But we want to get to 100 yeah so so the solution over here is that you know you need a different file system. So the file system will have a different organization over here it says the organization of the, this is a very, very long log 9 00:01:16.500 --> 00:01:24.870 William Cheng: So, so, so, so, so what is a log log is like a, you know, like, like if you have a system, you try to record what happened to a system you have a log 10 00:01:25.410 --> 00:01:45.030 William Cheng: So the log the basic characteristic of a log is that you can only append to the log. He never delete anything from it. Okay, so, so, so, so, so we're going to try to go in that direction and see if you can actually can achieve near 100% of the disk transfer capacity. Yeah. 11 00:01:46.410 --> 00:01:53.160 William Cheng: Alright, so the idea here is that at the bottom, show you what the, what the law structure of our system is get the wrong target file system has 12 00:01:53.940 --> 00:02:04.170 William Cheng: The block zero here. Still, the blue block, block one over here is a super blah and the rest of the disk look like a pen only log. Okay, so it's kind of weird. This is a really weird. 13 00:02:05.430 --> 00:02:10.770 William Cheng: Process and when you try to add stuff to the law. You can only add at the end you can never go back in the middle, or we can change something 14 00:02:11.190 --> 00:02:18.330 William Cheng: Okay, so this is a complete different process them nothing we ever seen before. So when people first proposed this, it was really, really strange. 15 00:02:18.750 --> 00:02:27.750 William Cheng: It wasn't really clear they can actually build a file system out of this. But eventually, you know, people sort of figure out a way. Yeah, so the major principle of a lot of your file system is that it's a pen only 16 00:02:28.020 --> 00:02:31.410 William Cheng: And it's never delete. So if you create a file somewhere inside this the 17 00:02:32.040 --> 00:02:38.610 William Cheng: Inside the log over here, we create a log you a pentacle log. But once you have created. And there's no way to, there's no way to delete it. 18 00:02:39.180 --> 00:02:43.380 William Cheng: Okay, you cannot even modify data, you can I be modified data in the law. 19 00:02:44.220 --> 00:02:50.070 William Cheng: Okay, so why would this achieve, you know, near 100% of that this capacity that the this transfer capacity. 20 00:02:50.310 --> 00:02:58.440 William Cheng: Because when you try to perform I write the dis what you will do is that you will gather enough information. So for example, you can actually wait until you have, you know, sort of a 21 00:02:59.280 --> 00:03:04.230 William Cheng: One cylinder full of information and then you write the entire cylinder full of information in one shot. 22 00:03:04.890 --> 00:03:10.050 William Cheng: Okay, or if that's not fast enough. You know, if that doesn't get you to 100% of the dish transfer capacity. 23 00:03:10.260 --> 00:03:18.870 William Cheng: You can wait for two to cylinder full, full, full of data and then what you do is that you perform one seat waiver wire rotational latency and you keep right. Whereas, remember that right you know 24 00:03:19.680 --> 00:03:32.220 William Cheng: He said one chart, there's seven 750 sectors. So instead of, you know, writing a 750 times now you writing one time, right. So, therefore, all the 750 seek and 750 rotational latency. 25 00:03:32.880 --> 00:03:47.400 William Cheng: The cost basically disappear. Okay. And also, if you have a surfaces eight times amount of 50, you know, then, then you're writing a lot of data. Right. So in that case, when you perform the averages, the average it will become very, very close to 100% of the distress or capacity. 26 00:03:48.600 --> 00:03:57.690 William Cheng: Alright, so, so basically, I'll be here. The, the, sort of, the, the, the key to getting close to 100% of this capacity is that you pay for 27 00:03:58.110 --> 00:04:05.970 William Cheng: A seat a rotational that is the only wise and then you write all you want. Okay, so this way, the average performance will be near 100% of this transfer capacity. 28 00:04:06.600 --> 00:04:13.890 William Cheng: That so so so that that idea is very simple. So the question is, how can you actually build a file system out of a pen only and never update or never. And then 29 00:04:14.370 --> 00:04:22.680 William Cheng: Never modify. So this way. How can you change a file. Okay, so therefore we sort of talked about, you know, one of the system that would actually build to demonstrate that this can be done. 30 00:04:23.730 --> 00:04:33.120 William Cheng: Yeah. So the basic idea here is that we create a file right away, create about the I know goes to the end of the log and the data that belongs to the file also get get get get a penalty and a lot 31 00:04:34.860 --> 00:04:43.800 William Cheng: To append at the end log and next time. If you want to create another file. So you just keep adding to it right so you don't really write to the disk until you have enough data, right, and then you write everything one shot. 32 00:04:44.460 --> 00:04:52.950 William Cheng: Okay, so, so in this case will also need to address the issue. What if you want to change the file when you want to make a file bigger. What do you want to delete a file, you know, it's all that kind of stuff still needs to be done. Yeah. 33 00:04:53.970 --> 00:05:03.720 William Cheng: Alright, so, so, so I guess there's one research group, actually, you know, sort of figured out how to do it. So they actually had a demonstration system called a sprite file system. 34 00:05:04.200 --> 00:05:10.500 William Cheng: It was done by the Berkeley UC Berkeley researcher from UC Berkeley resource or they actually pretty famous with your file system group. 35 00:05:11.130 --> 00:05:23.820 William Cheng: So so so in order for them to demonstrate that you can actually achieve near 100% of the disk right capacity. Again, this is transfer capacity they build a lot truck to assist them and they and they should have this can be done. 36 00:05:24.930 --> 00:05:33.300 William Cheng: But in the end, this is really not a very interesting file system. Right. What, why is that right. So if you look at this file system over here. Once I finished writing all the way to the end of the desk. 37 00:05:34.050 --> 00:05:43.980 William Cheng: What can I do. Well, there's nothing you can do. Right. So, because you know this file system is a pen only never delete ever update. So once you get to the end of the file system over here. Then you become a read only file system. 38 00:05:45.120 --> 00:05:50.850 William Cheng: Okay, so, so this is not very useful for us as them. But again, the Berkeley people, they just try to say, Well, I want to demonstrate that this can be done. 39 00:05:51.360 --> 00:06:00.210 William Cheng: So even though it's not very useful if it's not very useful. Why do we talk about it. And the reason we talked about is that later on somebody else actually use this idea to do something more useful. 40 00:06:00.630 --> 00:06:04.410 William Cheng: Okay, so, so, so we also got to see that pretty soon. Yeah. Alright. 41 00:06:05.220 --> 00:06:17.280 William Cheng: So let's take a look at this graph houses them. How do you actually, you know, change the data. So, for example, so l l FS over here is the the larger houses them. Right. Well, we try to create a file that I know is here data blocks over here. 42 00:06:17.610 --> 00:06:23.400 William Cheng: So what we'll do is that we can append that to the lock. Right. But again, you know, when you try to plan things out. You just write something, you know, you 43 00:06:23.760 --> 00:06:33.180 William Cheng: You read your software to plan on the desk and once you gather about one cylinder worth of data, then you perform the right now as we can play all these things out. So I'm going to write a date. I hear onto the 44 00:06:33.510 --> 00:06:36.690 William Cheng: Onto the at the end of the law first and then we're going to write the I know 45 00:06:36.840 --> 00:06:42.390 William Cheng: So you can actually sort of figure out, you know, what is the block nominees that I know. So every Monday I know at the end there is a this map. 46 00:06:42.540 --> 00:06:50.850 William Cheng: Is that this map to these pointer to all these places on the days when this case all these data right here have block numbers. All you do is sore sore those block number inside you. I know. 47 00:06:51.210 --> 00:06:59.580 William Cheng: That you can actually find all this data that so that's pretty straightforward. So we're going to add a fine to it and then we can add another file into. So let's say this fall is a and this one has been 48 00:07:00.480 --> 00:07:06.180 William Cheng: How do you actually find this file am beat right so so you basically need that data show to the keep track of all your notes. 49 00:07:06.900 --> 00:07:11.790 William Cheng: So in the spread fast. Is that what they decided to do is to build a data structure called the map data structure. Again, you 50 00:07:12.390 --> 00:07:16.890 William Cheng: Should be familiar with a map data structure, it just look up the infrastructure if you're given. I know they know 51 00:07:17.370 --> 00:07:27.900 William Cheng: How to Find it then. So the ability to start to call it. I know map, starting with the I know, Matt, you are able, you should be able to find all the iOS inside your system. OK. So again, this is a gigantic data structure. 52 00:07:28.500 --> 00:07:34.350 William Cheng: They pointed the ruler. They want every father ever created. So this way, starting, but I don't map, you will find all your files. Yeah. 53 00:07:35.130 --> 00:07:42.210 William Cheng: Right, so, so let's. So let's say that we have this data structure. We don't know where it's stored yet, but it has a little whoo, whoo. 54 00:07:42.750 --> 00:07:49.650 William Cheng: We're gonna see where this data structure goes. So now this I know Matt's been keeping track of our A and B, right. So let's say that I want to make a little bigger. 55 00:07:50.010 --> 00:07:56.940 William Cheng: Okay, so how do I make a little bigger. So a over here. Here is a over here. And these are the data blocks over here. If you try to make it a bigger 56 00:07:57.240 --> 00:08:03.360 William Cheng: So let's say that I want to make the state of block over here, extend a little bit. But again, I mean this is part of the log. So therefore, you're not allowed to modify it. 57 00:08:03.870 --> 00:08:10.860 William Cheng: Okay, so what I have to do is that when I modify thought hey I need to append to the end of the log. Okay, so therefore it will look like this. Right. 58 00:08:11.190 --> 00:08:22.230 William Cheng: This data over here will be gone. I need to move it over here. But again, I can't really modify this old data. So all I have to do is to add a new blog at the end of the log. And then I also need to point this pointer to point to point out right 59 00:08:22.890 --> 00:08:25.950 William Cheng: But how can I really point there. I really not allowed to do that. So, so over here again. 60 00:08:26.250 --> 00:08:34.770 William Cheng: If you want to change this pointer that will be modifying you know data inside a law. You're not allowed to do that. So, therefore, this particular I know he needs to be moved at the end of this 61 00:08:35.460 --> 00:08:42.990 William Cheng: Again, at the end of the law, then. So the analog over here. He's will go to the end. So in that case, my I know map over here will point to where the new folders. 62 00:08:43.350 --> 00:08:50.370 William Cheng: Right, so they will look like this over here, this will be the new new file eight. OK. And then from, you know, 63 00:08:51.300 --> 00:08:57.390 William Cheng: And then from there, it will point to all these data blocks over here, I get the data back has been moved. So therefore, they will do those things. 64 00:08:57.540 --> 00:09:09.750 William Cheng: Right. And instead, I know map over here. I'm going to say that, you know, for a was here before it's no longer here I need to go to a different place. Okay, so now what I need to do is I need to actually, you know, I know. Matt change. I need to append that I know map to the end of the log 65 00:09:10.770 --> 00:09:19.170 William Cheng: But I don't have. It's a very, very big data structure. Okay, so what the spray people decided to do is that they will take the I know Matt and they chop them down into little pieces. 66 00:09:20.040 --> 00:09:31.410 William Cheng: There so basically is a two level data structure. They were charged. I know maddening to many, many pieces and then the piece that contain foul a that will be the one that has changed, and that will actually go go go go to the envelope. 67 00:09:31.800 --> 00:09:38.820 William Cheng: Okay, so therefore it will look like this. The yellow part over here, this will be the I know Matt piece that can contain four MB. So therefore, 68 00:09:39.000 --> 00:09:46.920 William Cheng: You know, he was here, he is still stay where they are and this I know, Matt. So where was this I know my piece before what this item as piece before. Was that a lot 69 00:09:47.580 --> 00:09:52.620 William Cheng: OK, so now you know since 4am be changed. So therefore, again I anything that changes append only 70 00:09:52.830 --> 00:10:02.670 William Cheng: I need a pen. At the end of the log over here. So now we're going to talk over, you know, a and b. And now the top level data structure. They keep track of all the I know Matt pieces that also needs that also has changed. 71 00:10:03.240 --> 00:10:09.270 William Cheng: Okay, so again, that data structure is very, very large. If I keep a pending to the Father to the end, the log over here, then it's going to be a big mess. 72 00:10:09.840 --> 00:10:17.010 William Cheng: So what the stripes black people decide to do is that the top level data structure that keep track of all the I know Matt pieces that data structure. It doesn't go into the law. 73 00:10:17.850 --> 00:10:23.700 William Cheng: Okay, so that data shows as you go into something called a checkpoint file inside a checkpoint file which is a separate part of the desk. 74 00:10:23.940 --> 00:10:33.420 William Cheng: That's not part of the luxury houses them. So, so, so, so maybe what you would do is that you will you take your desk you partition them into two parts. One is the checkpoint file that keep track of, you know, 75 00:10:33.690 --> 00:10:42.990 William Cheng: The, the luxury of us there. And then the, the other part is the law structure processor, guys, so therefore the picture look like this. I have a check one file that keep track of all the I know Matt pieces. 76 00:10:43.830 --> 00:10:53.100 William Cheng: Okay, so then when this guy is what it was before is that the check my father was in the I know Matt piece of content and be there were here before. Now all I have to do is to to to make this change over here. So they will point to a new place. 77 00:10:53.880 --> 00:10:55.470 William Cheng: That. So this way when you try to 78 00:10:55.890 --> 00:11:06.780 William Cheng: What is the SharePoint about what a TripAdvisor also going to be stored in the superblock so the supermarket at your final, final, final word as right so this way. If you go to the checkpoint file, you will be able to find the entire file system. 79 00:11:07.170 --> 00:11:10.980 William Cheng: Right, because if you can find all the I know that I know will point to all data Plaza in this way you can find it. 80 00:11:12.720 --> 00:11:15.090 William Cheng: Alright, so, so, so, so 81 00:11:16.140 --> 00:11:22.980 William Cheng: So anyways, so when you try to you know make a lot of changes over here you can append a lot of stuff you know into the end of the 82 00:11:24.120 --> 00:11:31.410 William Cheng: Launch of houses. Then, and then you need to update a check my father point to, you know, all these kind of places right so so again this is how the logical system work. 83 00:11:32.280 --> 00:11:37.800 William Cheng: There's only one thing that we need to sort of address right so because you know when you try to wait for a lot of data to be written out into the log 84 00:11:38.190 --> 00:11:40.950 William Cheng: We need to wait a cylinder full of data and that can take a long time. 85 00:11:41.460 --> 00:11:49.110 William Cheng: Okay, so what's a long time. Right, long time, maybe five seconds. Maybe 10 second maybe even 30 seconds. So if you need to wait that long. Well, chances are you're going to get a crash. 86 00:11:49.410 --> 00:11:54.750 William Cheng: Okay. I mean, the longer you wait, the, the, sort of, the bigger the risk that you will lose power, you don't get a system crash. 87 00:11:55.260 --> 00:11:59.130 William Cheng: So in that case, what do you do, right. So, so you know all the things that you're trying to write to the log. Over here we 88 00:11:59.520 --> 00:12:05.640 William Cheng: Get a crash while they are going to be in big trouble because the data structure is that a checkpoint file will be inconsistent with a large virtual file system. 89 00:12:06.180 --> 00:12:13.410 William Cheng: OK, so the solution that come up with is basically they borrow the idea from graphics is them. So I guess if you have taken the graphics class, you know, 90 00:12:13.830 --> 00:12:19.320 William Cheng: You, you know that when they tried. We tried to write data to the computer screen. They use something called double buffering. 91 00:12:19.710 --> 00:12:29.550 William Cheng: Right, so, so, so let's say that this is your, this is your computer screen over here. What happened is that we can use one buffer over here to to to send data to the screen. Right. So in this case, you know, the other 92 00:12:31.290 --> 00:12:35.280 William Cheng: So what a Hora would use to take all the data from the buffer over here displayed on the screen. 93 00:12:35.700 --> 00:12:45.150 William Cheng: Okay, if you only use a single buffer. And then what you will do is it will change this, you know, we, when we want to change the screen you will make changes over here. If you do that, you'll see the screen flicker. 94 00:12:45.990 --> 00:12:50.640 William Cheng: Okay, so what they do is that they actually use to buffer. So you start with this buffer. And then you make a copy of it. 95 00:12:50.880 --> 00:13:00.300 William Cheng: And then you change the new buffer over here, only the place that you would change over here, you change in a different buffer. And when you finish changing the new buffer over here, what you do is that you switch the pointer to have the other frame have 96 00:13:00.630 --> 00:13:02.310 William Cheng: To have the other buffer over here, see the screen. 97 00:13:03.060 --> 00:13:04.500 William Cheng: Okay, so this way when you switch 98 00:13:04.830 --> 00:13:11.070 William Cheng: Over to the other, the screens are going to flicker and and from this point on, what you will do is that next time, will you, will you need to 99 00:13:11.190 --> 00:13:15.750 William Cheng: Change data. Okay, you make a copy from this buffer into the other buffer and you start changing this buffer. 100 00:13:15.900 --> 00:13:22.020 William Cheng: And we finished over here okay you switch the point are you so you keep switching to ponder. So in the end over here. You're gonna, you're not going to end up with any kind of flickering 101 00:13:22.770 --> 00:13:32.310 William Cheng: That's all we can do the same thing with the checkpoint file that we can actually have to checkpoint file one, remember the previous version of the file system and what it's going to be the new version of the file system. 102 00:13:32.670 --> 00:13:36.000 William Cheng: Guys, so there was going to look like this, we have to check one file one 103 00:13:36.180 --> 00:13:46.590 William Cheng: So so so so happens that you're going to wait for five seconds 10 seconds 30 seconds to wait for all your changes, right. So, in that case, to check one file over here will point to the old version of the fastest and where the end of the log is right here. 104 00:13:46.950 --> 00:13:48.810 William Cheng: Yeah, so now you are ready to write and 105 00:13:49.800 --> 00:14:00.000 William Cheng: You're ready to write a cylinder full of data into the law. So what you would do at that time is that you will you will figure out what the new check my father looks like by copying the old check off into a new checkpoint file. 106 00:14:00.210 --> 00:14:01.860 William Cheng: And then you update a new check, Papa. 107 00:14:02.130 --> 00:14:12.900 William Cheng: And then you can start performing this long log, which was a long right to the law and again when you try to write a lot of data into the law all into the same cylinder, you're going to achieve near 100% of the dis right capacity. 108 00:14:13.260 --> 00:14:17.580 William Cheng: That but you'll also get. You can also get a crash. So why don't we get a crash. 109 00:14:18.180 --> 00:14:26.100 William Cheng: But when you get a crash. Next time you boo, the operating system. What happened is that in the in the super block the super blog. Was that your current file system is in check pump out a 110 00:14:26.880 --> 00:14:38.100 William Cheng: Okay. It will also remember what the end of the law charge of our system and you'll be right here. So this way you know you might last the last few seconds of changes. But again, when you reboot the file system, you're gonna find the file system and the file system, the 111 00:14:38.370 --> 00:14:48.090 William Cheng: System is going to be in a consistent state. Okay, so this way. If you have a crash, there's gonna be no problem. You don't lose them. You lose a few seconds of work but you always go back to a consistent process though. 112 00:14:49.170 --> 00:14:54.450 William Cheng: Okay, if it turns out that you don't get a crash. So what you would do is I you will write the entire cylinder, you know, 113 00:14:54.900 --> 00:15:04.110 William Cheng: As achieving nearly 100% or right capacity, while you're finishing with that, then the children will be over here will point to the new version of the process. And so there's one more thing you need to do. 114 00:15:04.320 --> 00:15:08.310 William Cheng: You need to modify the super blog to say that now the file system is checked off, I'll be 115 00:15:09.630 --> 00:15:19.140 William Cheng: Okay, so, so as long as you don't get any crashes over here. From this point on, if we get a crash, we do the system, you will find a new file system in a checkpoint by be and check with Bobby will be a consistent have our system. 116 00:15:19.740 --> 00:15:27.840 William Cheng: Well guys, so this way, whether you crash in the middle of the updates of the desk or, you know, afterwards, you always go back to the states that have a system. 117 00:15:28.380 --> 00:15:36.810 William Cheng: Okay, so, of course, you know, we hear what you said it right to the right, the entire cylinder. You got to make sure that the policy is me. That is isn't a consistent day you know at 118 00:15:37.800 --> 00:15:48.270 William Cheng: That time you'll do that. So how do you make sure that your facility is inconsistency. So would you do that you you go verify that inside the buffer cache to make sure that all the data structure, the data structure. 119 00:15:50.430 --> 00:16:00.270 William Cheng: Okay, so later on when we talk about crushing recovery. What sort of talk a little bit more detail of how to use this particular technique to make sure that your file system is questions. Yeah. 120 00:16:01.740 --> 00:16:03.330 William Cheng: All right, so 121 00:16:04.830 --> 00:16:09.660 William Cheng: All right, so, so the launch of us is mid advantage over here is that, you know, again, the purpose of, you know, 122 00:16:09.960 --> 00:16:18.210 William Cheng: For the Berkeley people to create a lot to process them is to have good performance rights, right, because the new version of good performance were reading what is very simple. You just use a large buffer cache. 123 00:16:18.870 --> 00:16:23.790 William Cheng: To to achieve new 100% of this right throughput, you have to use something like a 124 00:16:24.270 --> 00:16:32.820 William Cheng: Larger file system that you can also recover from crashes easily through the use of checkpoint out again. They use double buffering, just like the graphic system. 125 00:16:33.090 --> 00:16:36.930 William Cheng: So this way you can crash. Anytime you can always recover and go back to the previous state. 126 00:16:37.530 --> 00:16:43.320 William Cheng: Which is a good state for the file system. Again, the main disadvantage of a lot of our system is is not very, very useful. 127 00:16:43.710 --> 00:16:47.490 William Cheng: Because as soon as you finish right into this that this become a read only file system. 128 00:16:47.760 --> 00:16:58.860 William Cheng: Okay. And also if you keep modifying files over here, you can see that a lot of space over here. They are wasted, right, because once you modify it was, you know, the data that was in the earlier part of the desk. You can never reclaim it. 129 00:16:59.700 --> 00:17:10.950 William Cheng: Okay. So, therefore, you know, it's really not a very good is really, it's not really a very useful file since then. Okay. But again, people gain some experience by playing with a system like that. Yeah. 130 00:17:12.690 --> 00:17:13.230 William Cheng: All right. 131 00:17:14.610 --> 00:17:16.050 William Cheng: Okay, the last part. 132 00:17:17.730 --> 00:17:27.660 William Cheng: Over here talking about the performance over here is that we're going to sort of briefly talk about the Windows file system. So again, we don't you know know that much about it other than, you know, the work that are published by the window passes them. 133 00:17:28.170 --> 00:17:40.380 William Cheng: Windows has, you know, a fast 1632 they use a data structure extents. So, so this is this is the equivalent to that this map of this isn't bob bob surpasses any Windows. Yeah. 134 00:17:40.830 --> 00:17:46.200 William Cheng: So what are the expense right extend is the list of runs as tell you where all the data blocks are so for example. 135 00:17:46.410 --> 00:17:52.920 William Cheng: You know, this is, this is one of my files. Right. Instead of using the map right and remembering system of us in this map is the last 13 pointers. 136 00:17:53.130 --> 00:17:59.820 William Cheng: So we here. They use the data structure this data show you tell you where all this block is ok. So, for example, the beginning part over here. 137 00:18:00.180 --> 00:18:09.840 William Cheng: They tell you where the first three block is and then the next one over here. Tell you what the next four block is an excellent tell you where the next so many blocks is etc. So if you can coordinate all these blocks together, you have a 138 00:18:10.500 --> 00:18:23.820 William Cheng: Net. So the first three blocks over here starts at offset 117 to eight. So again, that's a block number. It says, if you go to blog 117 to eight of the file system over here and there are three contiguous blocks over here that belong to the beginning part of this file. 139 00:18:24.240 --> 00:18:34.590 William Cheng: Okay, so why do they use that this the data structure, right, it sounds like a really weird data structure. The reason they did this that because in order for them to allocate this space. They use the first fit memory alligator. 140 00:18:35.640 --> 00:18:41.130 William Cheng: Okay, the first of memory alligator right when you try to, you know, we need certain certain number this blog like you called mellow. 141 00:18:41.310 --> 00:18:47.070 William Cheng: So in this case, you called Malik, except it's for the desk. So you say, I need three blocks. So what it would do is that it will go to the first 142 00:18:47.250 --> 00:18:52.230 William Cheng: I'll go go go there free list find three blocks and it will allocate the first block over here, remember where it is. 143 00:18:52.440 --> 00:18:56.460 William Cheng: And later on, you try to make the font bigger is that, oh, this time I want to make it bigger by flow blocks. 144 00:18:56.610 --> 00:19:06.540 William Cheng: Again, you call the first fit you know memory allocated for this and they will allocate four blocks you starting at this this block number and then you've allocated for this particular about you keep doing this and you'll fall getting bigger and bigger and bigger. 145 00:19:07.050 --> 00:19:16.620 William Cheng: Okay, so this is a horrible data structure because anytime he tried to retrieve all the data from the disco be here. You need to go through this linear list find all the offset and try to retrieve all these block line at a time. 146 00:19:17.460 --> 00:19:23.040 William Cheng: Now, so this is called Alyssa runs right so so the first three block over here is a run and it starts at a certain 147 00:19:23.880 --> 00:19:26.730 William Cheng: Offset number. Okay, there's another thing that's really 148 00:19:27.000 --> 00:19:33.840 William Cheng: Wrong over here is that this roundness data structure. Where do you store that right. You also need to store this data structure inside of our system as well. Again, just like in 149 00:19:34.110 --> 00:19:40.500 William Cheng: In system boss is there the data structure that store that points to all the data block is that 13 pointers. 150 00:19:40.950 --> 00:19:47.940 William Cheng: Okay, this isn't a policy that is pretty smart. They use a fixed size data structure. Use a 13 pointer over here, they can actually point to a file sizes. 151 00:19:48.450 --> 00:19:49.920 William Cheng: That can be as big as 16 gigabyte. 152 00:19:50.790 --> 00:19:53.310 William Cheng: Okay. So Microsoft over here if you want to have a really, really. 153 00:19:53.580 --> 00:19:58.050 William Cheng: Big fall over here. What does the data structure will look like, right. So if you create this fall a little bit at a time. 154 00:19:58.200 --> 00:20:09.750 William Cheng: In this case, the run this will be really, really long ride this run is over, you would say, Well, here's the first three block has an ex football. Here's the next seven blog has the next 11 o'clock. And then, and then this data structure can actually spend multiple this blocks. 155 00:20:10.770 --> 00:20:19.800 William Cheng: Okay, so, so when you perform a random search. So let's say you're doing a C Corp. I want to see to offset, you know, 10,000 where's that 156 00:20:20.580 --> 00:20:28.530 William Cheng: Okay, so what you need to do is that you actually need to retrieve the entire run list and start, you know, going from the beginning. Over here, try to find out you know where is offset $10,000 157 00:20:28.740 --> 00:20:31.710 William Cheng: Okay, so in that case the run this can actually spend many, many this block. 158 00:20:31.980 --> 00:20:43.650 William Cheng: The crucial thing about the performance of the fastest them is that you need to go to the dis as few times as possible. Right. So by traversing the wrongness. You have you actually have to go to order n number of this block, you know, 159 00:20:43.980 --> 00:20:47.970 William Cheng: This blog is that apostles them while you performance going to be really poor. Okay, so remember 160 00:20:48.510 --> 00:20:54.720 William Cheng: You know, for this isn't about us is that if you want to go to a random blog is that this isn't about bosses them. You know what, what is what's the performance 161 00:20:55.170 --> 00:20:56.520 William Cheng: The performance is order one. 162 00:20:57.090 --> 00:21:05.190 William Cheng: Okay, so what so so this isn't about us as as order one over here is order n because around this can be really, really long. So therefore, the performance is terrible for Windows 163 00:21:05.460 --> 00:21:11.370 William Cheng: Okay, so if you're using Windows. You should never really use passive seeing a factor be too because they are really, really slow file system. 164 00:21:12.210 --> 00:21:20.400 William Cheng: All right. There are also other other problem with a fast 1632 passes them is that you know pieces because they use the first 165 00:21:20.940 --> 00:21:26.850 William Cheng: Memory memory, memory allocated algorithm on this system, we're going to end up with the same problem with the first memory alligator. 166 00:21:27.240 --> 00:21:32.010 William Cheng: One of the big problem with the first segment memory alligator is that it's going to have external fragmentation. Okay. 167 00:21:32.520 --> 00:21:42.630 William Cheng: So if you use Windows, you will know that once in a while, you have to run this point on how the frag. So what does the frak do. Right. So if you look at the linear view of the desk right here is a booth, blah. Here's the the 168 00:21:43.110 --> 00:21:52.020 William Cheng: The Super blah, and then the data is scattered all over the place on the desk right because you're using the first remember alligator. So there's gonna be a lot of empty space over here, all over the desk. 169 00:21:52.410 --> 00:21:59.520 William Cheng: Okay, so when you perform the this a different operation. What they're trying to do is to take all the data blocks over here and then move them towards the left. 170 00:21:59.700 --> 00:22:05.250 William Cheng: And this way all the empty space over here will move to the right. So when you're done with the fragmentation while I've been inside the 171 00:22:05.520 --> 00:22:13.290 William Cheng: This will look like this. So this is still the booth, blah. And this is still the super blah, and then all they use data, they're all sitting over here and the rest of the days over here is completely empty. 172 00:22:15.000 --> 00:22:22.740 William Cheng: Okay, so he's not a good thing to do. I mean, first of all, when you try to perform this the fragmentation is. It's called the frag because the because they want to be fragmented disk. 173 00:22:23.850 --> 00:22:27.510 William Cheng: You're going to get a very scary message on the screen to say don't turn off your machine. 174 00:22:27.840 --> 00:22:35.700 William Cheng: Right, because if in the middle, when they try to move all these data blocks around over here if you turn off the machine. What then you'll fast as it will be an inconsistent state one, then 175 00:22:36.060 --> 00:22:40.140 William Cheng: That's how you put the fastest, then you'll assist them basically it's working. 176 00:22:40.650 --> 00:22:48.720 William Cheng: Okay, so therefore, oh yeah so so you can actually imagine the algorithm. So what it will do is that it will read the day and the entire data on the desk and try to sort of move them a little bit of time. Bye. 177 00:22:49.260 --> 00:22:58.530 William Cheng: Bye bye shuffling all these data forward to what the beginning of this rather yeah it's a very, very scary algorithm. If you lose power in the middle. Sometimes this operation can take hours to finish. 178 00:22:59.010 --> 00:23:06.960 William Cheng: So you know if it takes that long. You know, if you have operating system crash or you have a, you know, if you lose power than then they're going to end up with a 179 00:23:08.010 --> 00:23:15.090 William Cheng: Number that that file system. Okay, so, so, so, so, and you notice that if you're running Linux Linux never asked you to do diffract 180 00:23:15.660 --> 00:23:24.810 William Cheng: razon again. The reason is that because they, you know, so even if you're using system PA system, you know, you'll remember allocation over here is four kilobyte full, full, full, full 181 00:23:25.320 --> 00:23:34.770 William Cheng: Full is going to be four kilobytes, and you don't have internal you don't have external fragmentation. Okay. So, therefore, using a buddy system and you know in you. 182 00:23:35.580 --> 00:23:45.480 William Cheng: Know, not a buddy system. I mean, the algorithm that we mentioned before to allocate you know use block and free blah, you won't have any kind of internal you won't have any kind of external fragmentation. Right. 183 00:23:46.710 --> 00:23:54.930 William Cheng: Alright, so the other problem with, you know, the fast 16 and 32 bosses them is that, you know, when you perform a random access using LLC. 184 00:23:55.530 --> 00:24:02.970 William Cheng: So after that performance will be really, really poor that so the solution to fix the windows problem is to use multi level extends 185 00:24:03.480 --> 00:24:11.430 William Cheng: OK. So again, you know, if you have a you know problem that's not scalable. It's order n. The two solutions I either use a hash table or use a tree data structure. 186 00:24:11.820 --> 00:24:22.440 William Cheng: In window, they decide to use a tree data structure. So instead of having one run run list that can go on for forever, what they decided to do is that they divide them into two level runners. Right. So you have a tree data structure. 187 00:24:22.590 --> 00:24:32.550 William Cheng: There's a top level run this and then there's a second level run this. The second level runners. They look just like to run this before. Okay. And the first level rounders keep track of all the second number on this. 188 00:24:33.210 --> 00:24:44.280 William Cheng: Okay. So, this way we ensure the performer random access. What you do is that you go to the first. You know, you go to the first level run this and again the trick. Over here is that every run list, they will fit inside one this bar. 189 00:24:45.210 --> 00:24:52.110 William Cheng: Okay, so therefore, when you try to access the first first level runners. All you do is that you treat that this block and then you use all the 190 00:24:52.590 --> 00:24:57.240 William Cheng: They'll get the first level here. They also have a run this us to run this inside the first level to try to figure out 191 00:24:57.480 --> 00:25:04.110 William Cheng: You know, which seven second level runners, you have to go to. And then you go to the second level run this again. We only one this block and now 192 00:25:04.980 --> 00:25:11.850 William Cheng: And now, will you finish that when you finish that region that this law you know exactly where the block number that you need in order for you to go to the go retrieve your data. 193 00:25:12.600 --> 00:25:17.070 William Cheng: Okay, so this way, it becomes, you know, so, so, I mean, cuz he can he can say that this is Otter login 194 00:25:17.610 --> 00:25:26.160 William Cheng: But, you know, typically, you know, the multi level be here. It's going to be two level. So in this case, it becomes order one. And this actually perform even better than system PA system. 195 00:25:26.730 --> 00:25:33.720 William Cheng: Right, because this is above us to send the worst case in order for you to find out a block number. You have to actually go down because it could be a triple triple interact 196 00:25:33.900 --> 00:25:45.090 William Cheng: So in that case, you have to go to this, you know, three times right plus last time you will be four times that. So in Windows over here. The only overhead is going to be the first level around this plus the second number on this and they you know exactly what you have to go 197 00:25:45.540 --> 00:25:47.490 William Cheng: Now, so the picture. It looks like this. 198 00:25:48.120 --> 00:26:00.030 William Cheng: You have a top level run, let's just say that the first 84 blocks over here is stored inside the first run list. So if you add up all the length over here is going to be equal to 84 and the next 132 blocks over here. It's going to be in the second 199 00:26:00.420 --> 00:26:05.880 William Cheng: Is going to be another runners and then the next 98 blocks over here, it's gonna be another one is over here. So this way. 200 00:26:06.510 --> 00:26:15.660 William Cheng: You know, by going to to to level runners are you, you will be able to find to do to do find out what is the block number for the, you know, for data block. Are you looking for that. 201 00:26:17.310 --> 00:26:24.120 William Cheng: Alright, so that's all we want to say about Windows. So Windows NT that, you know, so when they use a to extend 202 00:26:24.540 --> 00:26:33.000 William Cheng: This data structure is used inside MTF s. So when you're using Windows, you should always use MT FS because their performance is much better than the file system. Yeah. 203 00:26:33.840 --> 00:26:44.340 William Cheng: Alright, so we are done with the file system performance issues over here. The next part. We're going to look at is look at crush recovery. Okay, so, so this is about track pressure resiliency. 204 00:26:44.730 --> 00:26:52.560 William Cheng: When we get a crash inside our file system. We don't want to lose data. Okay. But if you really think about it. Is there any way not to lose any data at all. 205 00:26:53.430 --> 00:27:01.290 William Cheng: And so for example, if I'm using Microsoft Word. I'm trying to save a file at the time when I press the Save button. What if I get a power outage right at that time. 206 00:27:02.700 --> 00:27:12.570 William Cheng: Okay, I'm trying to say. Right. And then, and then at exactly the same time I do. I have a power outage. Okay. So, in that case, you know, my data. My never go into the desk. 207 00:27:13.890 --> 00:27:24.990 William Cheng: Okay. So, therefore, you know. It's kind of impossible to actually make sure that you don't lose any data whatsoever. Right. So, so, I mean, again, there's a way to do it right the way to do is use some kind of a memory device to make sure that whatever you 208 00:27:27.000 --> 00:27:33.390 William Cheng: Whatever you're making a modification. They don't go into this actually go somewhere else. Right. So again, that will be a really, really expensive system. 209 00:27:33.840 --> 00:27:43.440 William Cheng: There are people actually feel system like that because, again, these are not your whole purpose albinism. I, for example, if you are, if you have a banking system we have a banking system. You want to make sure that everything you know 210 00:27:44.850 --> 00:27:51.510 William Cheng: Nothing back and ever happened, Keke Keke Keke Keke ever happen your system. Right. So in that case, you will build a very, very reliable system. 211 00:27:51.720 --> 00:28:00.420 William Cheng: And they're special tricks that you can do it. So again, it's out of the scope for our class that. But typically, you know, when you have a general purpose operating system. It's very difficult to build a, you know, to 212 00:28:00.720 --> 00:28:02.880 William Cheng: To to do the build system, we don't lose any data. 213 00:28:03.390 --> 00:28:11.160 William Cheng: Okay, so therefore we're going to sort of consider case, you know, maybe losing a little bit of data is ok but we don't want to lose the entire file system right losing a top autism is really not acceptable. 214 00:28:11.400 --> 00:28:13.950 William Cheng: Right. So again, I mentioned before, you guys are pretty lucky today. 215 00:28:14.370 --> 00:28:22.650 William Cheng: Pretty much anytime. When you lose power or some people I see that they have laptop. They're running out of power when they lose power and next time when they reboot the system all the files are there. 216 00:28:23.160 --> 00:28:29.970 William Cheng: Okay, it is possible that the file you just use that they are editing, you just trying to say, maybe you lose the part of the data for that file. 217 00:28:30.450 --> 00:28:37.200 William Cheng: Okay, but at least all the directory structure is going to be intact. So therefore, in this case, you're not losing our directory you also you're not going to lose the root directory 218 00:28:38.250 --> 00:28:44.700 William Cheng: Okay, so therefore, when you try to repeat your sister, maybe some of the data is gone, but at least any, any of the important data they're still going to be there. Yeah. 219 00:28:45.270 --> 00:28:53.520 William Cheng: Well, that's okay. It's still very important if you are using a regular, you know, a general purpose, you know, a file system is still need to required to do backup, right, because otherwise fire to any company has 220 00:28:54.450 --> 00:29:02.550 William Cheng: Has to make backup if the file system can never go back that. Alright, so, so we're going to sort of talk about some of the techniques that are using the modern operating system. 221 00:29:03.120 --> 00:29:11.550 William Cheng: You know, to, to, to, to explain why, you know why, when your laptop, you lose power, you don't have to worry about it you reboot it and then you know the system, you know, 222 00:29:13.050 --> 00:29:23.490 William Cheng: Everything is still going to be there. Okay, just about 15 years ago. Whenever you lose power, you know, if you're using Windows machine with window vision come up, you will say checking file system. Please wait 223 00:29:24.150 --> 00:29:36.900 William Cheng: So that you use your finger. Hopefully you don't lose anything. Sometimes you will lose an entire this. Yeah, so, so we're going to talk about some of the techniques that are used by the modern operating system. Right. So some of the stuff is going to be go beyond other sixth edition Unix 224 00:29:39.150 --> 00:29:43.800 William Cheng: Alright, so we're going to assume that you know this isn't me going to use these using a buffer cache, right. So, 225 00:29:44.010 --> 00:29:51.060 William Cheng: The bazooka you are you're if you're using right through. Well then, you know, the only data blocks that can be wrong will be the one that you're actually writing to the test right now. 226 00:29:51.360 --> 00:29:57.360 William Cheng: Yeah, so when you get a crash when you're trying to transfer data to this. Well, they are you going to end up with the inconsistency, because the data inside your buffer cache. 227 00:29:57.600 --> 00:30:02.700 William Cheng: Is not the same as it is and if you just lose power right at that time. Well, they don't lose some data. 228 00:30:03.270 --> 00:30:07.320 William Cheng: Okay, so, so if you're using right right through your performance is going to be terrible. 229 00:30:07.530 --> 00:30:14.040 William Cheng: But at least you have assumes that be consistent. So, but again that's not very, very interesting. So what we're gonna do is we're going to, we're going to assume that you're using right back. 230 00:30:14.490 --> 00:30:18.120 William Cheng: You guys will get the way you do right back is that every time we perform a racism call 231 00:30:18.300 --> 00:30:24.060 William Cheng: We're going to dirty these data blocks instead of buffer cache. And then right is going to return as if the data has gone to the desk. 232 00:30:24.240 --> 00:30:31.230 William Cheng: Okay, and now the current. The Colonel is going to run a disc update tasks that this update test. What it will do is that it will go through the buffer cache. 233 00:30:31.650 --> 00:30:37.770 William Cheng: You know, look for all these data blocks that are dirty and then it will clean them when it has it clean them is that they will write them out to this one block at a time. 234 00:30:38.370 --> 00:30:40.890 William Cheng: Okay, why does it right the data out of this one pocket of hot 235 00:30:41.100 --> 00:30:50.250 William Cheng: Well, because this is a sequential device. So therefore, there's no way for you to actually speed it up, you have to do one at a time. So this is really, really slow slow process. And during this time, you can actually have a crash. 236 00:30:50.610 --> 00:31:01.860 William Cheng: Okay, so if you only right to have these data blog so we don't do this right, what about the other two. They didn't get onto the desk. Why not case it these two data blocks over here belong to directory, I note or are you going to be in big trouble. 237 00:31:02.910 --> 00:31:08.670 William Cheng: Okay, so how come that this update task over here doesn't know that these data blocks over here are more important than the other data box right 238 00:31:08.850 --> 00:31:17.130 William Cheng: Because it this update test is a very, very low level function inside of Colonel all it does is to go through the buffer cache. Look at all these data blocks over here and then write it out to the 239 00:31:17.580 --> 00:31:21.240 William Cheng: To the desk. Okay, so therefore it doesn't really know that, oh, this is a directory 240 00:31:21.810 --> 00:31:28.560 William Cheng: This, this part, you don't know directory that this block we don't belong to. So far, so therefore updating the directory is more important. 241 00:31:29.010 --> 00:31:37.230 William Cheng: Okay, but what if all these forget about. So we have a directory. Right. So in that case, again, if you update to have them on to the days and then you know you missed the other two next time we, you know, 242 00:31:37.560 --> 00:31:40.950 William Cheng: If you get a crash. Are you still going to get up with a big problem is like a file system. 243 00:31:41.940 --> 00:31:52.320 William Cheng: Alright so so so anyways so so this particular you know that that's what we're going to assume it's what you have is it optimization. And now we're going to see what you can do to improve the situation. Yeah. 244 00:31:54.120 --> 00:32:00.150 William Cheng: All right. So, in the event of crash right bad things that happen. So let's take a look at the example you know what are things going to have them. 245 00:32:00.450 --> 00:32:05.070 William Cheng: And also why, you know what, why, why do we enable the inconsistent analysis that right 246 00:32:05.310 --> 00:32:10.380 William Cheng: The main reason we get to we have inconsistent with the fastest them. Is that because there's really no way to lock the desk. 247 00:32:10.620 --> 00:32:14.340 William Cheng: There. So for example, if we start with the previous version of the file system, it looks like this. 248 00:32:14.580 --> 00:32:24.900 William Cheng: And now we change to data blocks inside of houses and by changing that inside the buffer cache there. So what if we update all that data by writing it down to the days in the end that this will look like this. 249 00:32:25.950 --> 00:32:28.920 William Cheng: Okay, so again we started out with this day right where you know you 250 00:32:29.100 --> 00:32:38.190 William Cheng: You have this block over here that point to something. And now we're going to modify this this blog. We also got to create this blog and now we're going to write these to this blog simultaneously to the desk. 251 00:32:38.460 --> 00:32:48.480 William Cheng: If we can do that everything will be fine. Right. The final step over here will be will be this one will be the final stay inside of our system. So if you go from here to here in one atomic operation then there will be no problem. 252 00:32:49.500 --> 00:32:51.780 William Cheng: OK, so the problem is how do you make the disk atomic 253 00:32:52.590 --> 00:32:57.750 William Cheng: What's impossible to make this atomic right you can say, I want to perform a lock on it is I tell you this, not to crash. 254 00:32:57.930 --> 00:33:04.380 William Cheng: And then I'm going to write one blog, followed by the other blog and and then a discount crash in the middle, because if you lose power. There's really nothing you can do. 255 00:33:04.800 --> 00:33:12.450 William Cheng: But we all want some people say, what, what did you buy a battery backup, right, if you buy a battery backup eventually run out of battery. When the battery dies exactly the same situation. 256 00:33:12.870 --> 00:33:19.440 William Cheng: Okay, so no matter what you do, you can really, you really can I tell the file system to say don't crash until I'm done. And there's really no way to do that. 257 00:33:19.950 --> 00:33:28.770 William Cheng: Yeah. Alright. So. So how to go from one to step one over here to step three over here in atomic operation. So why would we do that with memory. 258 00:33:29.670 --> 00:33:32.970 William Cheng: Right, because when you lose power you wipe out everything in memory. You don't really care anymore. 259 00:33:33.840 --> 00:33:39.570 William Cheng: Okay, so therefore the memory system is very different from this system right whether this system over here. They actually they try to remember things 260 00:33:39.810 --> 00:33:44.040 William Cheng: Okay. So, therefore, you got to make sure that you can actually implement this this operation automatically. Yeah. 261 00:33:45.000 --> 00:33:55.500 William Cheng: All right, so how do you do it right, so, so one way to do it is that you can actually write these to this bar, right, so, so, so, so again, you start with this this law. What you're doing over here is that you need to write to data blogs under the desk. 262 00:33:56.280 --> 00:34:00.210 William Cheng: Then if you can only write one this block at a time. What choices do you have 263 00:34:01.290 --> 00:34:06.570 William Cheng: Well, then you can either what right this one. First of all, by this one or you can write this one followed by the, by by by the other one. 264 00:34:06.810 --> 00:34:12.180 William Cheng: Either this update hat doesn't really know any better. Well done. There's a 5050 chance you will write a one way versus the other way. 265 00:34:12.750 --> 00:34:23.130 William Cheng: So let's say that this update tasks, write it this way for us. So what it will do is that it will write a new book on to the disperse. Okay. And then you will read the old block right if nothing happened in the middle over here. They then in this case. 266 00:34:23.670 --> 00:34:31.950 William Cheng: You'll go to a new state that, but what if you get a crash right after you write the first log on to this and now you get a crash. And then when you reboot the system. What are you going to say 267 00:34:32.250 --> 00:34:39.240 William Cheng: What are you going to see this file system because this one over here that you know this one over here. There's no way for you to find this new block in the file system hierarchy. 268 00:34:39.960 --> 00:34:49.590 William Cheng: In order for you to find this new blog in the file system hierarchy. You need this point over here to point to it right so if you only update this this blog and the other one is still the statements I did this. 269 00:34:49.830 --> 00:34:52.110 William Cheng: This way when you reboot the system, you're gonna go back to the Allstate 270 00:34:53.010 --> 00:34:54.240 William Cheng: Okay, so this is actually good. 271 00:34:54.540 --> 00:35:02.820 William Cheng: Man, because, you know, even though you might lose new data, but I mentioned before, there's really no way for you to make sure that I mean it's going to be really expensive for you to make sure that that no data is ever lost 272 00:35:03.030 --> 00:35:08.640 William Cheng: In this case, we're going to lose a little bit of data that we just worked on. Okay, but at least the file system over here will be an interesting 273 00:35:09.000 --> 00:35:16.770 William Cheng: Right, because the previous slide over here. Everything is still good guys so, so if this log on to this, and the other one didn't. And if we get a crash. Everything will be fine. 274 00:35:17.280 --> 00:35:26.280 William Cheng: That and also there's no crash that will be Adobe best because we go to a new state. Okay, so they all say, or the new state both states are good. We don't have nothing in between. Yeah. 275 00:35:27.150 --> 00:35:34.920 William Cheng: All right. What if you're unlucky right that this update has somehow pick the other one around to the disperse. So again, you know, this is what sitting on this over here. 276 00:35:35.940 --> 00:35:44.760 William Cheng: Okay in memory over here. They are not on disk over here. But now what you decided to do is to write the first block over here onto the onto the disperse that so in that case. 277 00:35:44.940 --> 00:35:56.130 William Cheng: This will go on to this right. The second the new block hasn't gotten to this yet, but a new blog over here you have run your, you know, File System algorithm you allocate this blogs dollar free blog and now you put it inside your this map. 278 00:35:56.430 --> 00:36:05.370 William Cheng: So you can find this block. And now if you get a crash. This is on the desk and this one is not on a desk. In this case, what you will do is it will go. You got a crash and next time when you reboot. 279 00:36:05.910 --> 00:36:16.560 William Cheng: You know, the system. What do you see, while you see that this one over here pointed this block, right. So, therefore, and then this blog over here that's still have the old data on it. Now, what is the old data, the old data is garbage. 280 00:36:17.250 --> 00:36:24.120 William Cheng: Okay, so therefore you look like that. So if you're able to foster them over here, this data block over here, you're, you know, this pointer is a valid point or because 281 00:36:24.720 --> 00:36:33.000 William Cheng: At this point, upon it just, it's just a block number right so this block number will point the right thing because this data is good. And now the data points of view is going to be garbage. 282 00:36:34.350 --> 00:36:41.790 William Cheng: OK. So again, if this one is a directory file. While in this case you point to that you're going to think that your directory went crazy, you can end up with all kinds of crazy data. 283 00:36:42.240 --> 00:36:50.610 William Cheng: Okay, if it's a regular file while then in this case we will do some data, but it's a directory file, you're gonna end up with a lot of directory entry. They have no idea what they are. So in this case, this might be really bad. 284 00:36:51.960 --> 00:37:02.370 William Cheng: Okay, so. So in this case, you know, basically work depending on luck, whether that this update tasks, pick the right one, or pick the wrong way but pick the right one, everything's okay and repeat the wrong one. If it turns out to a directory file. 285 00:37:02.730 --> 00:37:12.810 William Cheng: So in that case really bad things can happen. Okay, so therefore we need to sort of find a way to make sure that the good thing always happen and not count on luck because you should never, you know, depend on luck. Yeah. 286 00:37:14.550 --> 00:37:24.450 William Cheng: All right, so, so, so I guess that was kind of silly example a more realistic example is for all I know is system PA system. Right. So, for example, the I know look like this. Right. Yeah, so 287 00:37:24.720 --> 00:37:36.330 William Cheng: It has a bunch of stuff at the top at the bottom over here. They're 13 pointers or again there are 13 pointers over here, some point directly to it. This blah some point or indirect this bar. So let's say them that might fall over here is exactly 10 kilobytes long 288 00:37:36.750 --> 00:37:42.030 William Cheng: Okay, so therefore the first 10 partners over here appointed direct but the last three pointers over here are now. 289 00:37:42.420 --> 00:37:46.140 William Cheng: Okay, now I tried to make the file, a little bigger. So now I have to use the indirect pointer. 290 00:37:46.560 --> 00:37:51.360 William Cheng: Okay. So, therefore, that the 10th point over here the you know the 11th pointer over here that 291 00:37:51.660 --> 00:37:55.830 William Cheng: I need to use the 11th pointer. So what I would do is I will allocate to this block. 292 00:37:56.040 --> 00:38:02.880 William Cheng: One to store the new data and the other one is stored in direct pointer as remember the indirect point over here. There's 256 pointer. 293 00:38:03.090 --> 00:38:14.940 William Cheng: One of them will point to the new one and all the other mobile point, you know, and then the 11th pointer inside my mind this Bible be here will pointed this indirect, blah, blah. Okay. So in this case, which block or dirty. 294 00:38:15.990 --> 00:38:25.680 William Cheng: That this block is dirty. This book is dirty. Also, this book is dirty right the blog that contains the, the, the, I know is also dirty. Right. So I'm going to call these two blocks x y AMP z. 295 00:38:26.310 --> 00:38:33.600 William Cheng: So now if we let this update has to to to to to randomly pick these this bottle right to this, then how many different combinations are there. 296 00:38:34.290 --> 00:38:43.200 William Cheng: While since there are three display. There's going to be, you know, to, to the three there's gonna be a different combination right x, y, z, followed by X z y. Bye bye. But 297 00:38:44.640 --> 00:38:55.230 William Cheng: Sorry, there's actually three factorial. So there are six different choices, x, y, z, x, y, y, z, y, z, x, y, z, x, y, and z y x 298 00:38:56.190 --> 00:39:05.520 William Cheng: Okay, which combination is good. We always have an agent bad, right. So again, we need to look at all these cases, to find out which one actually works and what doesn't work. We don't really want to count on that we want to have an algorithm. 299 00:39:05.730 --> 00:39:12.390 William Cheng: You know, to, to, to, to, to make sure that that this update has will you know will write all these data blocks out there in the right sequence. 300 00:39:12.870 --> 00:39:14.700 William Cheng: So how do we actually tell that this update has that 301 00:39:14.970 --> 00:39:24.060 William Cheng: Well, so in that case what we need to do is that we need to build a data structure to say that these three data blog actually actually related. So this way we can pass that information to the lower level. 302 00:39:24.270 --> 00:39:36.030 William Cheng: You know, fastest the functionality so that when the this other tasks decide to write all these three data shots on the street, this bug onto the onto this it will look at this data showed you say, oh, I have to read in this way. Otherwise, I'm going to be in trouble. 303 00:39:37.050 --> 00:39:44.670 William Cheng: Okay, so therefore, again, you know, so the you know the forces of is gonna get a little more complicated because you need to pass that information to the desert a task. Yeah. 304 00:39:46.230 --> 00:39:56.010 William Cheng: Alright, so in this example if the sequence, I can come up with over here is that you should write it down to this x fault. Yes. Okay. If you let this update has choose right then. What it will do is it can be random. 305 00:39:56.550 --> 00:40:06.330 William Cheng: The order that it uses X follow my wife. I love ice. So in that case, you will write x to the desk and I followed by why to the desk and now before you write in the z. They know how to do this. In this case, you will have a crash. 306 00:40:06.720 --> 00:40:16.980 William Cheng: OK. So again, this is like the back case that I talked about when you reboot the fastest. And what do you see, you see the X go into Newsday why going to Newsday and now this point over here again will point to garbage data. 307 00:40:18.090 --> 00:40:24.510 William Cheng: Okay, so. Okay, this one is a directory file, you're going to see the director of over here as many, many entry and some of them I look like complete garbage. 308 00:40:24.900 --> 00:40:30.300 William Cheng: OK. So, again, that will be really bad if this is a regular file then in this case is the mountain be too bad, then 309 00:40:30.960 --> 00:40:41.670 William Cheng: What if the the disability has pick a different order Z fall by X followed by why there's this guy Z go to this this verse, and then you get a crash before excellent gone through this and now when you reboot of 310 00:40:41.940 --> 00:40:49.830 William Cheng: Them were you gonna say you got to see the original configuration over here which is going to be a consistent state, you know, you know, consensus and stay in 311 00:40:50.760 --> 00:40:54.870 William Cheng: Which is the previous day of the file system, right, because see over here. This one gone to the desk. 312 00:40:55.050 --> 00:41:06.180 William Cheng: You won't be able to find this this Z block. If you look, if you go to your file system hierarchy is that. Where is he, nobody can find the anymore. Okay. And if you go to the free list, you can't find Z. Anyway, so this blog over here will be gone. 313 00:41:06.960 --> 00:41:14.790 William Cheng: Okay. Similarly, the blog why over here will be gone. They will be allocated from the free list, but you can't find it in the file system hierarchy, because you know this this changes hasn't 314 00:41:16.050 --> 00:41:23.040 William Cheng: It hasn't made it out to the desk. Okay. So in this case, you move to this blocks over here but you go to a good state, which is the previous owner of the file system. 315 00:41:24.030 --> 00:41:32.850 William Cheng: Okay, if this happened over and over again your disability because smaller, smaller, smaller because every time we get a crash, you're going to lose a few this blah, but again, at least you'll fascism is inconsistency. 316 00:41:33.540 --> 00:41:44.280 William Cheng: Okay. So, depends on you know the other, the decision was made by this have a cat and exactly when you crash. Sometimes you're going to be okay. Sometimes going to be in really bad shape right so clearly that's really not a good way to go. Yeah. 317 00:41:45.960 --> 00:41:53.400 William Cheng: All right. How do you cope rice. Oh no, of course you know you founded this not the crash. It's really hard to do unless you you want to spend a lot of money. 318 00:41:53.940 --> 00:42:03.180 William Cheng: On you know banking system where they can make sure that I did you know that this isn't doesn't crash. Again, very, very expensive to do that. So again, you know, there are I think their classes. 319 00:42:03.600 --> 00:42:09.210 William Cheng: You know, a USC that talk about, you know, building a reliable, you know, system. So in that case, you can actually, you know, see. 320 00:42:09.780 --> 00:42:14.220 William Cheng: If you're interested in those kind of classes you can take that class, and they will tell you how to actually do that. Yeah. 321 00:42:14.910 --> 00:42:25.110 William Cheng: The editor. The other thing, who has to two different approaches. One is performing a multi disk update in an order. So, so therefore you cannot build a data structure, you can send the data shows you to this update as 322 00:42:25.320 --> 00:42:31.050 William Cheng: And what it will do is that they will write it out to the desk in a consistent in a consistency preserving matter. 323 00:42:31.950 --> 00:42:37.140 William Cheng: Okay, so this way no matter when you have a crash and when you reboot the system will be inconsistent state now. 324 00:42:37.620 --> 00:42:44.760 William Cheng: The other way to do is run transactions as what a transaction. So I'm going to talk about that next time transaction is a concept from a database of them. 325 00:42:45.150 --> 00:42:49.470 William Cheng: So what do we do that you will borrow this this concept on a daily basis then and use the inside of our system. 326 00:42:50.010 --> 00:42:56.280 William Cheng: Okay me a long time ago, people didn't really think that was possible because databases them is big and they are slow. It's okay to run transactions. 327 00:42:56.760 --> 00:42:59.580 William Cheng: If you try to use it in the fastest the fastest and have good performance. 328 00:43:00.240 --> 00:43:06.300 William Cheng: Okay but but but later on people sort of figure out how to do it. So we're going to sort also talk about how to actually do that in a regular file system. 329 00:43:07.200 --> 00:43:10.890 William Cheng: Alright, so let's talk about the first you know approaches over here, a consistent approach. 330 00:43:11.370 --> 00:43:15.540 William Cheng: Here's sort of a picture of what it looks like on the horizontal axis over here is consistency. 331 00:43:15.750 --> 00:43:25.230 William Cheng: To the right things. I'm welcome Stan to the left over here, there are less consistent. So you can actually see that the fastest to segment. This is the boss is then they are you know they're just as bad as 332 00:43:25.860 --> 00:43:35.910 William Cheng: The just as bad as each other, each other. Okay. The vertical axis over years performance. You can see that the fast, fast, fast. Our system is a lot perform a lot better than this is the buffer system they're 333 00:43:36.510 --> 00:43:45.240 William Cheng: Out of the consistent actions over here. The first solution. We're going to look at is look at something called a soft update I saw update is to use why I mentioned before using a, you know, 334 00:43:45.870 --> 00:43:56.790 William Cheng: Using a consistency preserving approach get as it turns out, even, even in the best case for this for the self update the kind of consistent that you get is really not what we want. 335 00:43:57.150 --> 00:44:01.170 William Cheng: Okay, there will be some cases still, you know, things can be bad. We're going to talk about those cases. 336 00:44:01.740 --> 00:44:07.860 William Cheng: So, India, you know the right way to do it is to follow the idea from the databases that and running transaction. So those of you 337 00:44:08.460 --> 00:44:15.510 William Cheng: Who know about databases, the database system. It's always you know your database always in a consistent day because all they run our transactions. 338 00:44:15.810 --> 00:44:20.550 William Cheng: Okay, so by using transaction on the file system. You'll file system will always be in a consistent state. 339 00:44:20.970 --> 00:44:30.690 William Cheng: Method. Again, there are cases where you just tried to save a file and you lose power rather that time. Well, we need to make sure that you go to the previous day, which is a which is a consistent state instead of our system. 340 00:44:31.050 --> 00:44:35.520 William Cheng: Okay, there's no one for you. Do we do not lose any data, unless you're willing to spend a lot of money. Right. 341 00:44:36.630 --> 00:44:44.760 William Cheng: Alright, so first today itself update and next time we're going to see journaling. Also, there's something called shadow paging their face. They're based on exactly the same idea that 342 00:44:45.420 --> 00:44:50.730 William Cheng: All right, so the main idea here is that when we write it down to the desk. We need to have a order preserving 343 00:44:51.030 --> 00:44:57.780 William Cheng: You know what we do right data. So basically what and call all that information into a data structure. I'm gonna pass it to this up a task and this 344 00:44:58.020 --> 00:45:09.060 William Cheng: Task will look at that data should just say, oh, we're gonna write this data first of the desk and if all of the data. So therefore, no matter when you crash the data and the data on this will always be consistent that 345 00:45:10.530 --> 00:45:24.900 William Cheng: Alright so here. There's also a sort of a comment over here it says that, you know, there can be enough innocuous inconsistency and that will be consider okay right so like the example that we saw the blog y AMP Z. They never gone to this guy. When we reboot the system. 346 00:45:26.010 --> 00:45:28.950 William Cheng: They go back to the previous day, and y and z. Our last 347 00:45:29.520 --> 00:45:40.500 William Cheng: Okay, so in this case the file system is really not consistent. Right. But why is it not consistent right because y AMP z and they looked in the long no longer belong to the free list and also they no longer belong to the 348 00:45:41.340 --> 00:45:54.540 William Cheng: People they also no longer belong to the fastest of hierarchy present this case, you did. I might be lost. Okay. And this will get smaller and smaller and smaller, but again, again, at least the rest of the data over here inside of our system hierarchy there continue to be consistent. 349 00:45:55.590 --> 00:46:01.740 William Cheng: Okay. So, therefore, this guy will allow these kind of innocuous inconsistency. You know, you're using software. Yeah. 350 00:46:02.550 --> 00:46:08.190 William Cheng: Alright, so the synchronous right you know to this can be really, really. So, so therefore we're going to build these data structures described the dependency. 351 00:46:08.400 --> 00:46:18.360 William Cheng: Pass it to that this up up a task and that this athlete has will use those dependency data structure. Now, you know, run analysis on it, trying to figure out what is the best way to out to update a desk. Yeah. 352 00:46:21.330 --> 00:46:25.980 William Cheng: All right, so. So again, the idea here is that you if you have the picture. Yeah, this is the before picture. And this is the 353 00:46:26.820 --> 00:46:30.750 William Cheng: This is the after picture in the after picture all the green data over here are new. 354 00:46:30.990 --> 00:46:37.290 William Cheng: Guys. So in this case, you know, the old data of your point is our own note over here. And now we need to point to the new note right so this case. 355 00:46:37.500 --> 00:46:45.450 William Cheng: This data block changes change. And also, we get a new data, blah, so therefore there's a dependency between this blog in this blog, the dependency, you can actually, you know, 356 00:46:46.830 --> 00:46:51.990 William Cheng: I don't know you have taken the CSI 70 or they talk about some kind of a dependency graph you can use that to to 357 00:46:52.530 --> 00:46:59.790 William Cheng: Describe dependencies. This data depend on this this data, so therefore you draw a graph like this. The point of arrow from, you know, from 358 00:47:00.480 --> 00:47:08.010 William Cheng: From the top data over here to to to to to be the new data. You can also see this one as a pointer that point if on the first block to the second one. 359 00:47:08.250 --> 00:47:18.750 William Cheng: So, therefore, that also show you what the dependencies are right. So by passing this this information to the disability task, it will say that this data depend on this one. So what he would do is he would write the data to the this that doesn't depend on anything. 360 00:47:19.350 --> 00:47:28.260 William Cheng: Okay, this blog over here is that the end of the dependency less so therefore it will write this data out to the dispersed and in case if you have a crash, you go back to the previous day. 361 00:47:28.890 --> 00:47:36.240 William Cheng: Okay. So, therefore, what it will do is that I will organize it, this right over here to write the one that has the least amount of dependency to the disperse. And then he will write it 362 00:47:36.720 --> 00:47:45.180 William Cheng: Despite that depends on the other side of this one. Okay, so this way you know in the actual versus what he will do is that it will actually write this 363 00:47:45.630 --> 00:47:53.640 William Cheng: So against the different way to do it right. One is that you you pass it the inner structure to the this update as they use data structure to do it. The other one is that you can actually use right through. 364 00:47:54.000 --> 00:48:03.750 William Cheng: Okay, so if I need to write to this blog over here into the file system. Actually, I want to can do is that I can write that data that the other one depends on I'm going to write the data to the data using right through. 365 00:48:04.830 --> 00:48:13.140 William Cheng: Okay, so this way. I have to wait for it. Right, so get your file system is going to be a little slow. So what you will do that. You write a data into this and now you wait for it to get that will you finish writing this data out to this. 366 00:48:13.350 --> 00:48:17.220 William Cheng: Then you can release the first block over here into this update tasks. 367 00:48:17.880 --> 00:48:24.180 William Cheng: There. So in this example over here inside the actual file system, you will write this one synchronously to the desk by using right through. 368 00:48:24.480 --> 00:48:36.030 William Cheng: Okay. And when this operation is done, what you do is that you release this this block to the this update has and this one will be, you know, you don't even need any data structure. You say that this of it has. You can read this data to this anytime you want. 369 00:48:37.260 --> 00:48:44.400 William Cheng: Okay, so if you have a data dependency, like the one that we saw before X depends on why depends on z over here. So in this case, 370 00:48:44.610 --> 00:48:55.050 William Cheng: It's going to take even longer time. So what do you need to do. So you need to write z to the dis using right through. And then you write why to this to using right through. And then you release X over here to the discipline has 371 00:48:56.130 --> 00:49:02.910 William Cheng: Okay. So that would be one way to do it. And the other one is that you pass the data structure to describe the dependency a teller decided to say, hey, why don't you, you know, 372 00:49:03.120 --> 00:49:08.430 William Cheng: analyzing the data structure and try to sort of figure out that you use right Z to this first, followed by why followed by x 373 00:49:09.150 --> 00:49:19.830 William Cheng: OK, so again in the implementation people do either way if you don't like the idea of, you know, waiting for, you know, a right to to finish, you know, using synchronous right you can actually use data structures that will get a little more 374 00:49:21.660 --> 00:49:26.010 William Cheng: They'll get a little more complicated us and also you need to make your this update 375 00:49:26.940 --> 00:49:36.720 William Cheng: The update tasks that Colonel a little smart because it needs to run some algorithm in order for them to analyze dependencies. Yeah, I mean, this one is a very, very simple example, they only have three despite your writing. 376 00:49:36.960 --> 00:49:39.570 William Cheng: What if you have you know 500 this blog. You're trying to update 377 00:49:40.350 --> 00:49:47.310 William Cheng: Right, you have 530 blocks. OK, so now that is why I started for free. Which one need to go to the disperse. Well, in that case is going to be a little more challenging. 378 00:49:48.180 --> 00:49:58.380 William Cheng: Okay, so, so again, yeah. Either way, you can you can do either way. Yeah. All right. If crash happened before the second step was done over here is over here, you write the first one synchronous 379 00:49:58.890 --> 00:50:05.070 William Cheng: To the desk and now you release the first one to get this update tasks and you know you're gonna get a crash before this one gone to the desk. 380 00:50:05.550 --> 00:50:13.620 William Cheng: Okay, so in this case over here, this know you know this particular this blog was still have the old data and we know the old data is actually even consistency. So pointed their own 381 00:50:13.800 --> 00:50:20.550 William Cheng: Next block. So everything is fine. Okay. What happened to this particular spot. Well, this particular, this one will be lost because nobody's pointing to it. 382 00:50:21.120 --> 00:50:35.880 William Cheng: Okay. So, therefore, what you need to do is that, you know, so, so when you reboot us to sell this point of view or still pointed to data because the data hasn't been modified so therefore this one needs to be reclaimed then. So over here, you know, after the crash over here. 383 00:50:37.230 --> 00:50:43.560 William Cheng: So some system, what it will do is that they will actually have sort of a thread inside of kernel called the disc scavenger thread. 384 00:50:44.010 --> 00:50:48.870 William Cheng: what he would do is that it will go to the Go, go to this over here and try to look for data off like this. 385 00:50:49.320 --> 00:50:55.380 William Cheng: Okay, and what it will do is that it basically doing garbage collection. Right, so, so, so, so, so how do you do that right but the basic idea over here is that 386 00:50:55.950 --> 00:51:03.930 William Cheng: You know, you go through every block on your desk right for every blog you ask the question, is this blog inside of our system hierarchy or is it inside of free list. 387 00:51:04.470 --> 00:51:09.180 William Cheng: Okay. If the answer is no. In both cases, well then this will be a dispute that look like this. 388 00:51:10.110 --> 00:51:17.190 William Cheng: That so so the job of that this a scavenger is to look for block like this and then what they will do is that they will add it to the file system. 389 00:51:17.760 --> 00:51:23.160 William Cheng: Okay, so what it will do is that, you know, in some of the system like this. There's a directory in the root directory called Lost and Found 390 00:51:23.520 --> 00:51:30.570 William Cheng: Yeah. So what are, what does that when I found out this blog that they will add this this block create a file and then put it into the size lost and found 391 00:51:31.050 --> 00:51:40.560 William Cheng: Okay, so if you go to the size larger and found, you're gonna find a bunch of this blog. They are all one block in size over here. And then what it would do that is that they also need to figure out who actually own this particular this block. 392 00:51:40.800 --> 00:51:47.400 William Cheng: If they can figure out who actually understood this law, they will send an email to the user over here to say hey you know there's a system crash. 393 00:51:47.580 --> 00:51:52.950 William Cheng: I found one of your files over here and I put in a session last info, followed by some kind of a random file name. 394 00:51:53.250 --> 00:52:03.180 William Cheng: Okay, so go take a look at that file to see if you want to save data, you know, the phone from that file. I mean, clearly, if this one is a text file on this case, you know, you can actually can recover for most of the data that you have lost 395 00:52:03.630 --> 00:52:06.360 William Cheng: But it turns out to be a binary file was in that case will be useless. 396 00:52:06.720 --> 00:52:13.770 William Cheng: Okay, but in the good old days over here. This was actually very useful you're editing file you editing a dicey file and all of a sudden you get it, you're good at this crash. 397 00:52:14.160 --> 00:52:24.990 William Cheng: The organism send you an email and you go to slash Lost and Found and then you find you, you, you, you found that I see father you editing and then you copy them them and then paste them into your code, everything will be okay. 398 00:52:26.160 --> 00:52:33.480 William Cheng: All right. Today's is that we don't see that anymore. So, so, so, so, you know, for the Linux file system Linux file system has something called 399 00:52:36.960 --> 00:52:41.310 William Cheng: St to are using this particular scheme so it also the fastball system. 400 00:52:42.120 --> 00:52:45.510 William Cheng: But you mentioned a fast also them. That's an improvement over system of our system in 401 00:52:45.840 --> 00:52:53.190 William Cheng: Our system, are they actually they use software updates get. So, therefore, if you look at the fast offices them. They have a slash also found directory 402 00:52:53.400 --> 00:53:00.630 William Cheng: Also inside the ST to process them. They have a slash Lost and Found directory. But if you're using EFT three or GST for you see that this one is gone. 403 00:53:01.350 --> 00:53:08.430 William Cheng: Okay, because he has to USD etc, for they are using transaction based recovery techniques. So therefore, they don't have slash slash and patent. 404 00:53:09.570 --> 00:53:14.940 William Cheng: So by looking at this directory you know that whether you know the the file system is using software update or they're not. Yeah. 405 00:53:16.650 --> 00:53:25.800 William Cheng: Alright, so this is self update. So, so let's go back to our example over here, right, if you have the picture that x depends on why it depends on z. So, in this case, when you try to write to this. 406 00:53:26.220 --> 00:53:33.060 William Cheng: Well, after you analyze the dependency right at the X depends on why it depends on z, which one is the display with the least amount of dependency. 407 00:53:33.240 --> 00:53:41.280 William Cheng: Well, that's going to be Z. Right. So, z, go to the dispersed. So, therefore I'm going to create a data structure to say z go into this first, followed by waterfall by X are so different. It's going to be the order 408 00:53:42.090 --> 00:53:49.920 William Cheng: That. So in this case, you know, there are four different times that you can crash, you can add your crush right here or right between them over here. So if you crash too early. 409 00:53:50.370 --> 00:53:53.280 William Cheng: Oh, guys. In this case, when you reboot of us is there. What is going to look like why 410 00:53:53.910 --> 00:54:02.820 William Cheng: Why is he an ex. None of them go to the Go go go go go into the actual process them. So in our case we reboot know x over here will be in the state. So in this case, again, you gotta 411 00:54:03.240 --> 00:54:08.580 William Cheng: You gotta, you gotta go back to the previous state of this particular file. So this is a directory file again. Everything's gonna be okay. 412 00:54:09.030 --> 00:54:13.470 William Cheng: Yeah. What if you crash in between over here, right. So in this case, Z went on to the desk. 413 00:54:13.800 --> 00:54:17.430 William Cheng: Okay, so you mentioned before, we reboot the system over here. You're going to go back to the previous day. 414 00:54:17.580 --> 00:54:28.380 William Cheng: In this case, you're going to end up with innocuous inconsistency and that will be considered acceptable in this case. So again, that this scavenger you know thread inside the colonel will find this one at this loss about you will get an email. Yeah. 415 00:54:29.160 --> 00:54:38.730 William Cheng: Well, what if you crash between y x over here. So yeah, Z went on to this wild. Welcome to the desk, but excellent gone through this. So when you reboot the process them again, you're going to see the host, Jay. 416 00:54:38.910 --> 00:54:48.000 William Cheng: Z and white over here, they, they're going to disappear from the fastest and then from the free list. So in this case, it's considered innocuous inconsistency. And then, again, this is the first day. 417 00:54:48.480 --> 00:54:58.140 William Cheng: What did you get crushed afterwards. So this guy's when you reboot it faster than me going to go into the new state. So in this case, no matter when you crash. You can crash and all this different time you'll fosters that will go to the state. 418 00:54:59.370 --> 00:55:08.280 William Cheng: Okay, so using software update. You know, if you can figure out what is the right order over here to write to this, no matter when you when you crash either go back to the last day or you go back to the new state. 419 00:55:09.420 --> 00:55:21.240 William Cheng: Guys. So, therefore, you know, this is the other technique actually works pretty well. Yeah, let's take a look at an example guys over here would try to create a new file with the new data block right so you have to start with a direct 420 00:55:22.920 --> 00:55:29.550 William Cheng: You know, start with a director. I know it over here. When you create a new file, you're going to create a new. I know you also can create one display to store the data. 421 00:55:29.640 --> 00:55:36.120 William Cheng: So this will be the fall. I know. And this will be the data. You also need to add this file into the directory. So therefore inside the directory 422 00:55:36.360 --> 00:55:44.670 William Cheng: Data search over here, we need to add an entry inside the you know the the other the inside of directory file, right. So again, the directory file is a 423 00:55:46.290 --> 00:55:52.830 William Cheng: It's an array of directory entries. Right. So in this case, we need to add a new directory entries over here to point to a new file. 424 00:55:53.010 --> 00:55:58.980 William Cheng: So in this case, again, we're going to end up with three data blocks over here, x is going to be the new directory data for a year for 425 00:55:59.220 --> 00:56:03.660 William Cheng: For the direct without this file, and then you know this. I know here's going to be new at this data will be new. 426 00:56:04.020 --> 00:56:13.320 William Cheng: And I said again X depends on why why depends on z and then again we know what to do now. So by receiving this dependency, you know, the, the, this update as we know what to do. Yeah. 427 00:56:15.030 --> 00:56:24.090 William Cheng: All right, let's take a look at another example. This example here, you try to move a file, you're doing Colonel to you know exactly how to move a file right moving a file is on link, followed by a link 428 00:56:24.510 --> 00:56:29.820 William Cheng: There. So therefore, we're going to start moving, moving one file from direct directly x prime directive y prime 429 00:56:30.420 --> 00:56:41.970 William Cheng: Yeah. So in that case, we need to move on from this one to the director and the director. So how do we do it right. We unlinked this link over here. And then we create a link to this particular file. So this is the starting state. And this one is the 430 00:56:42.660 --> 00:56:51.090 William Cheng: final state. Okay. So in this case, which this block or dirty. Okay. So in this case, x is x prime is dirty over here, y prime is dirty over here. 431 00:56:51.480 --> 00:57:03.420 William Cheng: So in this case, there are your sensor only two blocks or 32 factorial is equal to choose. So there was only two order that they can use the right to do this right, so the to order is going to be exploited by while we followed by x 432 00:57:05.190 --> 00:57:10.740 William Cheng: Okay, so. So in this case, these two data blocks has no dependency, right. So this is where soft update will get into trouble. 433 00:57:11.190 --> 00:57:17.970 William Cheng: There. So if there's no dependency over here. That means that you can read in any order that you want. So if you write X followed by while we are, what's going to happen. 434 00:57:18.330 --> 00:57:22.110 William Cheng: That. So if you right, absolutely. This first and then you have this crash. 435 00:57:22.350 --> 00:57:32.610 William Cheng: X has gone to the new state. So, this will be the new stack. And why is going to be in the last day or so let me clean up this picture over here. So X is going on to a new state. And why is going to go into the day. So what happened in this case you lose a file. 436 00:57:33.420 --> 00:57:36.540 William Cheng: Okay, so when you try to move a file in the middle of it, you're gonna crash, and now the files gone 437 00:57:37.230 --> 00:57:42.720 William Cheng: Okay, so this is really not good. What about the second way over here. Why you right why to this verse. And now you 438 00:57:43.620 --> 00:57:54.540 William Cheng: Know you get a crash. Right. So in this case, why go to the new state and X over here still stay in the whole state that. So in this case, you're going to end up with both x and y pointed this file. Okay, is that okay 439 00:57:55.770 --> 00:57:58.830 William Cheng: Well, you just said, you know, this is innocuous inconsistency. 440 00:57:59.070 --> 00:58:08.520 William Cheng: But maybe it's not a noxious inconsistency, because maybe the reason that the user wants to move this file from one directory to the other directory is because of protection problem. 441 00:58:08.790 --> 00:58:14.190 William Cheng: Okay, this path over here, coming from the root directory, maybe you know we perform, you know, 442 00:58:14.970 --> 00:58:27.870 William Cheng: Of the way to address a path of resolution everybody on the system can actually access this way. The reason you want to hide it into the directory is that because if you're traversing the other path this directory over here. Nobody else has access to 443 00:58:28.350 --> 00:58:38.610 William Cheng: Okay, so basically you take a file and try to hide it into private directory. But now if you crash in the middle over here. No, no, you think that this particular file now moving to a private directory for everybody else can still access the file. 444 00:58:39.720 --> 00:58:43.200 William Cheng: Is that bad one. That's really, really bad because that's a security problem. 445 00:58:44.490 --> 00:58:55.500 William Cheng: Okay, so in this case the two cases over here. One way is that you either Lucifer and the other case over here, you're going to end up being a security problem. So in this case, this is why why solve update doesn't really have a good 446 00:58:55.830 --> 00:59:00.630 William Cheng: You know, a good good good consistency, because they are these kind of cases that can happen. Yeah. 447 00:59:02.610 --> 00:59:09.330 William Cheng: Alright so saw saw saw saw saw update is sort of a technique that was implemented using this particular 448 00:59:09.900 --> 00:59:20.220 William Cheng: Kind of technique. And then what it will do is that they will actually use data structure that would describe the dependencies. Okay. And then what it, what does that entail as a dependency and then to figure out how to actually write it out to the desk. Yeah. 449 00:59:21.780 --> 00:59:28.170 William Cheng: So it's all over here. So let's say that you have you know this. They just went over here again director. I know point to a file and quantify know 450 00:59:28.500 --> 00:59:32.970 William Cheng: If you have these dependency over here, you know, how do we actually write it out to the desk. Okay. 451 00:59:33.420 --> 00:59:42.330 William Cheng: Again, if you have taken the CSI 70 you probably learned the algorithm know as topological sore political score is actually a very powerful algorithm. It can take any kind of grab that describe 452 00:59:43.020 --> 00:59:48.420 William Cheng: Describe a dependency described can be really, really large can have 1000 you know data structure in that 453 00:59:48.750 --> 01:00:00.090 William Cheng: Okay, you can add you have them, you know, point to each other, you know, do all kinds of dependency over here. And then what it would do is that it would generate a linear order when you write data out to this is not linear order all the dependency will be satisfied. 454 01:00:01.650 --> 01:00:06.450 William Cheng: Okay, so all that this of it has has to do over here is to run topology topological so work. 455 01:00:07.170 --> 01:00:15.840 William Cheng: OK, and then he will come up with the linear order. So if you write down in your water, you know, that will give you a sequence your heart right there under the desk. So this way all the dependencies will be satisfied. 456 01:00:16.560 --> 01:00:20.790 William Cheng: Okay, unfortunately. So even though you know so. So if it works. It works really well. 457 01:00:21.330 --> 01:00:27.660 William Cheng: But what if you know theological sore actually fail right so the bad news. Over here is that the particle swarm. There's no guarantee that it will actually work. 458 01:00:27.960 --> 01:00:35.910 William Cheng: Okay, if you have cycles in your data, right, you've got data shows you like this you know that point to each other. This depend on this been on this depend on this. Why, in that case. 459 01:00:37.140 --> 01:00:43.320 William Cheng: Topological so or fail, and then we'll say that, well, if you have a psycho see your so again you can describe the UC dependency graph. 460 01:00:44.130 --> 01:00:53.550 William Cheng: So again, the graph is a data structure if the graph has a cycle in them or theological sure there's no way for you to store that because there's no way to generate a linear order. So the order dependency are satisfied. 461 01:00:54.390 --> 01:01:01.530 William Cheng: Okay, so in this case couple articles or return error. And then this update has to say, why give up because I have no idea. No way to actually write the data out to this. 462 01:01:02.250 --> 01:01:10.680 William Cheng: You know so so that all the all dependency are preserved then. So, by the way, I'm in the data shows that will that we have seen before. How do you actually 463 01:01:11.220 --> 01:01:22.500 William Cheng: How do you actually end up with the cycle. Okay. So, depends on why depends on Z was a linear this okay but in reality what happened can can be that these two data blocks over here. In this example there are both I notes. 464 01:01:23.220 --> 01:01:27.990 William Cheng: Okay. The I know they're very, very small. They actually they can actually sit inside the same data, blah. 465 01:01:28.410 --> 01:01:41.790 William Cheng: Okay, so it is possible. They belong to the same data block when this case if you put names on a date, same data blog, they will look like this. Well, what I notice here, there is here. This one depend on this, this one depend on this and now you end up with the cycle in the 466 01:01:42.870 --> 01:01:48.000 William Cheng: You know the cycle. OK. So again, this is the real reason soft update is not as good as as a 467 01:01:48.510 --> 01:01:56.640 William Cheng: Transaction is because when there's a cycle in the graph, then you know that this update has is going to give up and say, I don't know what to do. Okay. 468 01:01:57.420 --> 01:02:03.120 William Cheng: All right, so, so, so, so I mean in the good old days, are they really trying to get this to work. So what it will do is that again. 469 01:02:03.690 --> 01:02:08.100 William Cheng: Would you can do is that, you know, so let's say that this is the this block that they that they 470 01:02:08.790 --> 01:02:10.290 William Cheng: That you try to update and all the data. 471 01:02:10.560 --> 01:02:21.180 William Cheng: Are inside your buffer cache, right. So what you would do is that you can actually not modify the first part of the data block over here, so they can actually store the old data inside this data blog over here. So the data, why should look like this. 472 01:02:21.390 --> 01:02:29.280 William Cheng: This is the old directory. I know. And this one depends on this one depend on this. And then what you can do is I is that you can again. Right. The state of block on to the dis synchronously. 473 01:02:29.880 --> 01:02:39.030 William Cheng: Okay, and then you can write this one into the data into the sink synchronously. And then you replace the top part of this data blog over here. And then, and then 474 01:02:39.930 --> 01:02:50.610 William Cheng: Finally, we write this new book over here, you know, onto the desk. Okay, so this way again whenever you will. This is okay and I guess if you have take some other 475 01:02:51.480 --> 01:03:01.380 William Cheng: You know, graduate class at USC, you probably have learned some other techniques, how to actually break a cycle, we have a cycle over here. What you can do that. You can actually break a cycle by, you know, so, you know, 476 01:03:01.740 --> 01:03:04.260 William Cheng: For example, you have, you have to data search over here. 477 01:03:04.740 --> 01:03:11.610 William Cheng: You want to break the cycle over here, what you do is that you duplicate this data structure somewhere else and then you break the cycle and by creating two notes over here. 478 01:03:11.970 --> 01:03:20.130 William Cheng: Okay, so again, they use the same kind of technique inside the file system over here. So that creates some some different data structure over here. So, so this way. 479 01:03:20.730 --> 01:03:22.590 William Cheng: The consistency can always be preserved. 480 01:03:22.980 --> 01:03:28.920 William Cheng: Right. So again, in this case, what you have to do is that you have to do right through. We have to do synchronous. Right. So again, instead of passes them up here. 481 01:03:29.100 --> 01:03:36.780 William Cheng: The performance is going to be suffer. You cannot release all these data structure, you know, into this up a task. So you're gonna be so you're gonna you're gonna get slow down. 482 01:03:37.140 --> 01:03:45.180 William Cheng: But still it is possible when you get more complicated cases, there might not be a way to write all these data, you don't do this or the file system is going to become really, really slow. 483 01:03:46.230 --> 01:03:54.540 William Cheng: Yeah, right. So in the end, I'll be here the you know the the software update was implementing the fastpass is then in 1994 484 01:03:55.320 --> 01:04:04.290 William Cheng: So, and also, you know, in the other implementation over here. They have this scavenger you know thread over here, they'll actually be able to reclaim all these last this blog. 485 01:04:04.770 --> 01:04:11.730 William Cheng: But pretty soon people sort of figure out how to do transaction and then I guess these days, you know, only on the very, very old system, there were students of update 486 01:04:12.180 --> 01:04:25.740 William Cheng: The newer version of the rest of them. They all they all going to be using transactions. Okay. All right. So, so this is going to be a good place to break. So next time we're going to talk about, I guess the the lecture next week, we're going to talk about running transactions. Yeah.