WEBVTT

1
00:00:01.709 --> 00:00:07.410
William Cheng: Welcome to the lecture. So now we are two days into kernel 3.

2
00:00:08.069 --> 00:00:15.450
William Cheng: Kernel 3 is due in 14 days, two weeks from today. If you have code from a previous semester, don't look at it. Don't copy it. It's best to get rid of it.

3
00:00:15.839 --> 00:00:21.630
William Cheng: And it's important that you read the kernel 3 FAQ and understand pretty much all the lecture material covered so far.

4
00:00:22.110 --> 00:00:30.900
William Cheng: The spec and the Weenix documentation are not enough to get you to finish the kernel. Please also watch the week 9 discussion section videos.

5
00:00:31.830 --> 00:00:37.440
William Cheng: The grading guidelines are how we'll grade, so you should get familiar with them. You should do GDB assignment number 3 to,

6
00:00:37.770 --> 00:00:41.100
William Cheng: you know, to try to figure things out. What we're trying to do there is to

7
00:00:41.550 --> 00:00:47.640
William Cheng: sort of force you to use the kernel 3 FAQ, to find out what commands you have to use in order for you to

8
00:00:48.180 --> 00:00:57.570
William Cheng: debug the user space program there. So again, if your code hasn't gotten that far yet, setting a breakpoint doesn't really make sense. But again, you should go ahead and look through it

9
00:00:58.020 --> 00:01:02.190
William Cheng: to see what kind of stuff you have to learn in order to debug kernel 3 code.

10
00:01:02.670 --> 00:01:08.670
William Cheng: If you get stuck, come to office hours, send me email, post to the class Google group, and don't wait too long.

11
00:01:09.630 --> 00:01:16.380
William Cheng: Kernel 3 recommended timeline. Again, I divided it into five phases. Phase one should be finished by the end of today.

12
00:01:17.130 --> 00:01:27.270
William Cheng: That's with the VFS test running with the S5 file system. The next phase is going to be very difficult: getting a hello program to run. So I set aside five days to do that.

13
00:01:27.720 --> 00:01:35.100
William Cheng: You need to build the address space and run /usr/bin/hello directly from initproc_run() using kernel_execve().

14
00:01:35.880 --> 00:01:43.590
William Cheng: So again, you should read the kernel FAQ to see how to do that. And then phase three: after you get hello

15
00:01:44.100 --> 00:01:51.660
William Cheng: world, you need to pass all the tests in section B. The first one is hello and the last one is fork-and-wait.

16
00:01:52.230 --> 00:01:59.640
William Cheng: So, at that time, you need to get shadow objects and fork to work together in order to pass the last test. And then in phase four,

17
00:02:00.060 --> 00:02:05.400
William Cheng: you should get the user space shell to work. So again, there are going to be new functions you have to write:

18
00:02:05.730 --> 00:02:14.190
William Cheng: do_mmap(), which implements the mmap system call, and this will be the first time you're going to call malloc and free. So again, you need to get that code to work.

19
00:02:14.640 --> 00:02:25.650
William Cheng: And then finally, once you get the user space shell to work, there are five user space programs that you have to pass. So in the last few days you should try to pass all those tests. Okay.

20
00:02:27.360 --> 00:02:29.760
William Cheng: All right, so

21
00:02:31.170 --> 00:02:39.210
William Cheng: you should read the kernel 3 FAQ. Some of this stuff I mentioned before, so we're going to repeat it a little bit. Number one: this one applies, you know, ever since
kernel 1.

22
00:02:39.630 --> 00:02:43.740
William Cheng: Whenever you get a kernel page fault, especially in kernel 3, you're going to get,

23
00:02:44.190 --> 00:02:52.320
William Cheng: you're going to get some really weird kernel page faults. What's important is to pinpoint exactly which instruction caused the kernel page fault.

24
00:02:52.620 --> 00:03:00.720
William Cheng: Okay, in kernel 1 and 2, you're going to be looking at C code. In kernel 3, sometimes you have to look at assembly code in order to figure out exactly where the program crashed.

25
00:03:01.320 --> 00:03:12.030
William Cheng: Okay. So again, whenever you get a bad kernel page fault, follow the kernel FAQ to figure out exactly where it crashed. Without doing that first, it's pretty useless to guess

26
00:03:13.230 --> 00:03:16.980
William Cheng: what could have caused the crash.

27
00:03:17.400 --> 00:03:26.910
William Cheng: Okay, you've got to pinpoint exactly where the bug is, sorry, not where the bug is, where the crash is, and then you try to figure out why you're getting a crash there.

28
00:03:27.720 --> 00:03:36.360
William Cheng: Sometimes you need to work backwards. And also, when you get your first legitimate page fault, right, that means that you finished building your address space and now

29
00:03:36.630 --> 00:03:45.480
William Cheng: your loader is going to start running your program. Since we're doing on-demand paging, the first thing that happens is that you get a page fault and, boom, you're right back inside the kernel.

30
00:03:45.720 --> 00:03:52.920
William Cheng: As soon as you get your first page fault, the first thing you should do is to type the following kernel debugging command.

31
00:03:53.250 --> 00:04:05.640
William Cheng: Okay: kernel info vmmap_mapping_info curproc->p_vmmap. So the vmmap is your address space. This command asks GDB to print out what your address space looks like.

32
00:04:06.450 --> 00:04:13.860
William Cheng: Okay, so you've got to ask yourself the question: do I have the right address space? If your address space is wrong, then from this point on, nothing's going to work.

33
00:04:14.430 --> 00:04:17.370
William Cheng: Okay, so you've got to make sure that you have a good address space.

34
00:04:18.000 --> 00:04:25.260
William Cheng: So you can look at the code in vmmap_mapping_info(). As soon as you run this command, if your kernel crashes, that means that your data structure is bad.

35
00:04:26.010 --> 00:04:35.130
William Cheng: Okay, so in that case, maybe there's a bug in the code where you build your memory map. So once you get your memory map,

36
00:04:35.910 --> 00:04:41.670
William Cheng: once this command shows you a reasonable address space... again, the address space is not going to be the same as the one that we

37
00:04:41.970 --> 00:04:46.710
William Cheng: talked about in lecture. It's going to look a little bit different. There might be some surprises there.

38
00:04:46.980 --> 00:04:52.740
William Cheng: Okay. So if you're wondering whether your address space is right, that's where you should ask your classmates for help.

39
00:04:53.370 --> 00:05:03.030
William Cheng: Okay, so I cannot tell you what the address space should look like. You should feel free to post to the class Google group to say: this is the printout of this command, this is my address space, are you guys getting the same thing?

40
00:05:04.080 --> 00:05:06.960
William Cheng: We had some people just post a message there saying, oh,

41
00:05:07.320 --> 00:05:16.950
William Cheng: what are you getting, so I can check if my address space is correct? That's the wrong way to do it. Okay, you have to post your address space and ask other people

42
00:05:17.340 --> 00:05:22.050
William Cheng: to see if they're getting the same thing. Okay, don't try to

43
00:05:22.500 --> 00:05:26.610
William Cheng: hide your address space because you're too embarrassed to show it to other people.

44
00:05:26.970 --> 00:05:34.110
William Cheng: Okay, so just let other people see it, and other people can comment to tell you what they think is correct and what they think is wrong.

45
00:05:34.530 --> 00:05:47.100
William Cheng: So again, remember I also cannot tell you exactly what the address space is supposed to look like. Even though this is not telling you what code to write, there are things I really don't want to tell you. Okay, I really want you to discover this yourself.

46
00:05:48.750 --> 00:05:51.510
William Cheng: Alright, so again, I cannot tell you what the right

47
00:05:51.690 --> 00:05:58.980
William Cheng: values are for this particular case. Sometimes I can tell you what they are. But a lot of the time, like with the program printout, I won't tell you what it's supposed to look like, right.
So again,

48
00:05:59.160 --> 00:06:03.930
William Cheng: you should start a discussion. I kind of wonder why nobody discusses anything in the class Google group. You should,

49
00:06:04.350 --> 00:06:12.870
William Cheng: you know, you should use this opportunity to use the class Google group. This way you don't have to wait for my response, things will go faster, and also,

50
00:06:13.230 --> 00:06:26.580
William Cheng: I think it's really, really important to learn how to talk in a forum like the class Google group. Okay, so this is really good practice. People haven't been taking advantage of that.

51
00:06:28.110 --> 00:06:30.300
William Cheng: And also you need to read about... so, at the

52
00:06:31.320 --> 00:06:38.790
William Cheng: end of handling the page fault. Okay, so remember, you have a page fault, and we talked in class about what you have to do, right. You have to

53
00:06:39.330 --> 00:06:48.930
William Cheng: go to the address space, walk it, try to look up the page frame, and so on. In the end, when you are about to return back into user space, you have to fix the page table.

54
00:06:49.560 --> 00:06:54.090
William Cheng: Okay, I mean, sometimes you don't have to do that, but clearly, when you take the first page fault,

55
00:06:54.270 --> 00:07:00.720
William Cheng: V equals zero. So if you go back into user space with V equal to zero, you're going to get another page fault, and you're going to keep doing this forever.

56
00:07:01.500 --> 00:07:05.730
William Cheng: Okay, so therefore, at the end of handling a page fault, you have to change the page table.

57
00:07:05.940 --> 00:07:08.160
William Cheng: The function that you need to use is called pt_map().

58
00:07:08.340 --> 00:07:19.980
William Cheng: pt_map() sets up the page table.
So in this case, what you want to do is set up a mapping entry in the page table. So again, there will be one page table entry over here that you need to set up using this function.

59
00:07:20.760 --> 00:07:25.410
William Cheng: Okay, so again, look at the comment block and do some grep to find out

60
00:07:26.730 --> 00:07:33.030
William Cheng: what you're supposed to do with this particular function. Again, you should discuss in the class Google group if this doesn't work.

61
00:07:33.600 --> 00:07:41.460
William Cheng: In the kernel FAQ, look for the string pt_map and see what kind of hints it's giving you.

62
00:07:41.820 --> 00:07:45.930
William Cheng: So follow the instructions there and try to verify whether

63
00:07:46.650 --> 00:07:51.240
William Cheng: things look correct or not. If things don't look right, don't bother with pt_map(); it's not going to work.

64
00:07:52.020 --> 00:08:04.020
William Cheng: Okay, so before you call pt_map(), you've got to make sure that all your data structures are in reasonable shape, because otherwise you're mapping nonsense into your address space and it's not going to work. Okay. So again, read the kernel FAQ.

65
00:08:05.340 --> 00:08:14.370
William Cheng: All right. And also, we have an FAQ item about how to single-step in assembly code, because once a function gets compiled into many, many lines of assembly code,

66
00:08:14.880 --> 00:08:21.600
William Cheng: sometimes your program can crash in one place or another. So if you only set a breakpoint inside a C function, you're not going to see any more detail.

67
00:08:21.810 --> 00:08:31.860
William Cheng: Okay, so if you want to go into the details you really have to look at the assembly code.
So again, look at the kernel FAQ to see how to do that and how to set a breakpoint in assembly code or using a virtual address.

68
00:08:32.370 --> 00:08:42.540
William Cheng: Okay, so in kernel 3, that's probably something that you have to do. In the next discussion section, I will show you,

69
00:08:44.220 --> 00:08:50.460
William Cheng: I'll show you a little bit of the stuff inside the kernel 3 FAQ. But again, you should start doing this as early as possible.

70
00:08:51.330 --> 00:08:57.750
William Cheng: Also, learn how to set conditional breakpoints, because otherwise, for example, if there's a function

71
00:08:58.050 --> 00:09:04.380
William Cheng: where the 15th time you hit that function it's going to crash, but the first 14 times it doesn't crash,

72
00:09:04.680 --> 00:09:14.100
William Cheng: then in this case you'd have to keep hitting the breakpoint over and over again, and it would be really tiring. It would be nice to learn how to set a conditional breakpoint. This way, when it breaks, it's

73
00:09:15.330 --> 00:09:26.430
William Cheng: going to break exactly at the time that you want. Okay. So again, if you don't know how to do that, read the kernel 3 FAQ, and you can also ask me about things inside the kernel FAQ.

74
00:09:27.240 --> 00:09:30.840
William Cheng: There's lots of stuff in the kernel 3 FAQ. So, again, get familiar with it. Okay.

75
00:09:31.710 --> 00:09:42.270
William Cheng: Alright, so now we're going to go back to chapter 6, and last time we were going to talk about...
We'll talk about how to implement directories efficiently and also have long component names, right.

76
00:09:42.990 --> 00:09:48.270
William Cheng: At the end of the last lecture, we ended up with the System 5 directory file implementation.

77
00:09:48.630 --> 00:09:59.730
William Cheng: So remember, the directory file is an array of records and each record is exactly 32 bytes long, right: here's 32 bytes, 32 bytes, 32 bytes. So a directory file is just an array of records.

78
00:10:00.090 --> 00:10:07.710
William Cheng: Okay, so the array of records is very, very rigid because the component name has to be at most 27 characters, plus the backslash zero.

79
00:10:08.100 --> 00:10:17.310
William Cheng: Okay, so in this case, it's not flexible enough. So what we need to do is change the data structure for the directory file so that the component name can be as long as you want.

80
00:10:17.700 --> 00:10:22.980
William Cheng: Okay, so how long is it going to be? Again, in the end, there's going to be a limit. Okay, but the limit should be very, very big.

81
00:10:23.550 --> 00:10:32.250
William Cheng: Okay. And also, we don't want to waste too much space inside directory files. We're going to see a flexible data structure that allows you to specify a component name

82
00:10:33.480 --> 00:10:36.450
William Cheng: that can be long or can be short. Okay.

83
00:10:37.890 --> 00:10:50.520
William Cheng: Alright, so this is how it's done in the Fast File System. A directory is not going to be an array of fixed-size directory entries. Now every directory entry can be a different size.

84
00:10:51.570 --> 00:10:57.810
William Cheng: Okay, so every directory entry can be a different size. So how do you know how long the directory entry is, right?
So again,

85
00:10:58.020 --> 00:11:05.700
William Cheng: every directory entry over here is going to be divided into two parts. One is the header and the other one is the body, and the header will tell you how many bytes are inside the body.

86
00:11:06.150 --> 00:11:09.480
William Cheng: Okay, so therefore, in the Fast File System,

87
00:11:09.720 --> 00:11:19.020
William Cheng: the directory entry header is 8 bytes long. So it's the top 8 bytes over here. When you try to read a directory entry you should read the first 8 bytes, and those bytes will tell you how big the entire data structure is.

88
00:11:19.170 --> 00:11:23.670
William Cheng: So the first four bytes over here are going to be the inode number, right. So again,

89
00:11:24.030 --> 00:11:34.410
William Cheng: the purpose of a directory entry is to tell you what component name maps to what inode number. So here's the inode number; it takes up four bytes. The second part here is two bytes long.

90
00:11:34.740 --> 00:11:42.630
William Cheng: Okay, it tells you how big this data structure is, right. So this one says that this data structure is 16 bytes. So, therefore, you know that the entire data structure is 16 bytes.

91
00:11:42.780 --> 00:11:49.020
William Cheng: You already read 8 bytes, so if you want to finish the entire data structure, you're going to read 8 more bytes and then you're going to have the entire directory entry.

92
00:11:49.410 --> 00:11:56.370
William Cheng: The next one over here is going to be the length of the string. And by the way, the size of the record has to be a multiple of four bytes.

93
00:11:57.120 --> 00:11:59.370
William Cheng: So therefore you need

94
00:12:00.060 --> 00:12:06.180
William Cheng: one more field in your data structure to tell you how long the component name is, right. So basically this is the string length

95
00:12:06.420 --> 00:12:14.280
William Cheng: of the component name.
So in this case, it's equal to four, and then you know that you need the backslash zero at the end. So that means that this particular string is going to take up five bytes.

96
00:12:14.460 --> 00:12:17.730
William Cheng: So the bytes after those five bytes over here can be garbage, and you're supposed to ignore them.

97
00:12:18.300 --> 00:12:30.540
William Cheng: So in this example, for the first directory entry over here, the inode number is 107, the size of the record is 16, and then this one is equal to four. So you have a string of five characters, counting the backslash zero, and the last three bytes over here are garbage.

98
00:12:31.170 --> 00:12:36.570
William Cheng: Okay, so in this case, after you finish reading the first directory entry, you know exactly where the next one starts.

99
00:12:36.930 --> 00:12:40.200
William Cheng: Right. So the first one over here is 8 bytes plus 8 bytes. And once you finish reading that,

100
00:12:40.380 --> 00:12:50.610
William Cheng: you can continue to read the directory file. You again read the first 8 bytes, and this one tells you that the entire data structure is 12 bytes long. So that means that there are four more bytes for you to read. In this case, again,

101
00:12:50.820 --> 00:12:54.660
William Cheng: the string length is equal to three, so therefore it will take up the entire four bytes.

102
00:12:55.050 --> 00:13:07.410
William Cheng: So in this example, the last one over here is a little weird. This is the last entry over here, so it tells you the number here is equal to 180, and the size of this data structure actually includes all the free space at the bottom of the directory file.

103
00:13:08.520 --> 00:13:18.210
William Cheng: Okay, so, because again, for a directory file, they're allocating a multiple of the page size. And so therefore, at the end over here.

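NOTE A minimal sketch of the variable-size directory entry described above: an 8-byte header (4-byte inode number, 2-byte record size, then the name length, assumed here to fill the remaining 2 bytes), followed by the name, a terminating zero byte, and padding up to a multiple of 4. The little-endian byte order, the helper names, and the example names and inode numbers are assumptions for illustration, not the on-disk format of any particular file system.

```python
import struct

# Assumed header layout: inode number (4 bytes), record size (2 bytes),
# name length (2 bytes). Little-endian byte order is an assumption.
HEADER = struct.Struct("<IHH")


def min_record_size(name_len):
    """Smallest legal record: header + name + NUL, rounded up to 4 bytes."""
    size = HEADER.size + name_len + 1
    return size + (-size % 4)


def pack_entry(inode, name, record_size=None):
    """Build one entry; an oversized record_size models trailing free space."""
    size = min_record_size(len(name)) if record_size is None else record_size
    if size < min_record_size(len(name)) or size % 4:
        raise ValueError("record size too small or not a multiple of 4")
    body = name.encode("ascii") + b"\0"
    # 0xFF filler stands in for the "garbage" bytes a reader must ignore.
    return HEADER.pack(inode, size, len(name)) + body.ljust(size - HEADER.size, b"\xff")


def walk(data):
    """Yield (inode, name, free_bytes) for each entry, in file order."""
    offset = 0
    while offset < len(data):
        inode, size, name_len = HEADER.unpack_from(data, offset)
        if size < HEADER.size:
            raise ValueError("corrupt record size")
        start = offset + HEADER.size
        name = data[start:start + name_len].decode("ascii")
        yield inode, name, size - min_record_size(name_len)
        offset += size  # the record size says where the next entry starts


# Hypothetical 64-byte directory; names and inode numbers are made up.
block = (pack_entry(107, "unix") + pack_entry(4, "etc")
         + pack_entry(180, "usr", record_size=64 - 16 - 12))
entries = list(walk(block))
```

In this sketch the last entry's record size covers the rest of the 64-byte block, so `walk` reports the leftover bytes as free space, while the first two entries report none.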
104
00:13:18.600 --> 00:13:27.180
William Cheng: If the page size is 1024, then the minimum size of a directory is going to be 1024 bytes long, right. If the page size is 4096, then again the minimum

105
00:13:27.660 --> 00:13:30.120
William Cheng: size of the block is going to be 4096.

106
00:13:30.810 --> 00:13:40.110
William Cheng: So this one tells you how big the entire block is. And then over here, it tells you what the string length is. So if this one is equal to three, then you should know that this should be equal to 12, right, just like in the second case over here.

107
00:13:40.380 --> 00:13:47.760
William Cheng: Okay, so since this value over here does not equal 12, you know that the remaining space inside this directory entry is free space.

108
00:13:48.390 --> 00:13:54.870
William Cheng: Okay, so next time when you try to add an entry into the directory file, you know exactly where to go. And this way you can add more directory entries over here

109
00:13:55.920 --> 00:14:04.950
William Cheng: at the bottom of your directory file. Alright. So again, this is the content of a directory file. The way you should think about the content of a file is that it's

110
00:14:05.250 --> 00:14:13.440
William Cheng: simply an array of bytes. Okay, except that in this case, since we know that it's a directory file, we know that

111
00:14:13.650 --> 00:14:23.310
William Cheng: it's not only an array of bytes, it's also an array of directory entries. Right. And we know exactly what the data structure is for a directory entry. So this way we can read one directory entry at a time.

112
00:14:25.140 --> 00:14:32.130
William Cheng: Right, so this particular file can be as long as you want. So remember the component name: how long can a component name be

113
00:14:33.240 --> 00:14:41.910
William Cheng: at most? So remember that over here we have the number of bytes in a directory entry.
So, for the number of bytes in a directory entry, the largest one over here is going to be...

114
00:14:42.330 --> 00:14:58.590
William Cheng: This is a two-byte number, and a two-byte number is 16 bits. So, therefore, 2 to the 16 is going to be the biggest data structure. Okay, so you can see that the component name can be almost 64K bytes long.

115
00:15:00.270 --> 00:15:03.210
William Cheng: Right, 2 to the 16 is 64K. So

116
00:15:03.570 --> 00:15:15.660
William Cheng: this component name can be really, really long. But also we're trying to be space efficient, right, because if it turns out your component name is very, very short, like the example of etc, which only takes up three bytes, then in this case the directory entry is only 12 bytes long.

117
00:15:16.500 --> 00:15:28.530
William Cheng: Alright, so again, the Fast File System designs things this way so it can have a very flexible data structure. Alright. So again, if you look at your directory inode, here's a directory inode, right. So,

118
00:15:30.360 --> 00:15:34.530
William Cheng: if this is using the Fast File System, and the Fast File System is using the same disk map,

119
00:15:34.890 --> 00:15:41.370
William Cheng: the same disk map as the S5 file system, then in that case, in the bottom part over here, there are 13 pointers. The first 10 pointers

120
00:15:41.580 --> 00:15:50.760
William Cheng: point directly to disk blocks, and the next one is the indirect block, and then double indirect, triple indirect, etc. Okay. So again, if you look at the file data, the file data looks like this.

121
00:15:51.660 --> 00:15:56.580
William Cheng: Okay. So this picture over here is showing you what the file data looks like for a directory file.

122
00:15:57.060 --> 00:16:05.460
William Cheng: Okay. And this particular file over here.
Again, it's just a concatenation of a bunch of directory entries, and this file can be really, really long. So if you look at the directory,

123
00:16:05.760 --> 00:16:13.920
William Cheng: the directory /usr/bin, last time I mentioned that it has over 1000 entries on a Linux system. So in that case, this file can be really, really long.

124
00:16:14.370 --> 00:16:20.820
William Cheng: Okay, so over here, every one of these is a disk block, right. So in this case, the file will span multiple disk blocks.

125
00:16:21.150 --> 00:16:23.850
William Cheng: Every disk block is one kilobyte, four kilobytes, whatever it is.

126
00:16:24.240 --> 00:16:27.570
William Cheng: Every disk block over here holds, on the average, for the Fast File System,

127
00:16:27.780 --> 00:16:37.650
William Cheng: 100 to 200 entries. Since the directory entry is not fixed size, we don't really know how many entries a block is going to hold, right. It depends on whether the user

128
00:16:38.040 --> 00:16:42.210
William Cheng: makes the component names very short or very long. So in this case it's variable.

129
00:16:42.660 --> 00:16:54.780
William Cheng: Right, so these things don't really have to fit inside a disk block. As far as we're concerned, all these disk blocks over here should be concatenated one after another. And then we look at

130
00:16:56.310 --> 00:17:04.710
William Cheng: all the disk blocks together as one long stream of bytes. Okay. Because in a Unix or Linux system, whenever you look at a file, a file is simply,

131
00:17:05.700 --> 00:17:14.730
William Cheng: every file is simply a stream of bytes. Okay, so this way again, when we open a file, and we read, and we read, and we read, we're reading this byte stream.

132
00:17:16.080 --> 00:17:19.860
William Cheng: All right.
But in this case, if this particular file is a directory file, then

133
00:17:20.970 --> 00:17:28.500
William Cheng: in that case we know that it's not just a byte stream. The byte stream also has a structure, and we know the structure has to look like this.

134
00:17:29.940 --> 00:17:36.240
William Cheng: Alright, so in this case, this data structure over here can be really long. Right. So in that case, say this

135
00:17:36.540 --> 00:17:49.680
William Cheng: inode is the /usr/bin inode, right. Every time you try to run a program, we need to scan through all these directory entries over here to look for the program that we're trying to run. So in this case, if the number of blocks over here is equal to n,

136
00:17:50.730 --> 00:17:59.130
William Cheng: okay, since we are looking through these things sequentially, the performance is going to be order n. So in this case we might hit the disk on the order of n times.

137
00:17:59.640 --> 00:18:03.330
William Cheng: Okay, so if n is a large number, the performance will be really, really poor,

138
00:18:03.690 --> 00:18:10.620
William Cheng: because we need to wait for the disk to transfer all this data. Again, if we have a good buffer cache and all these blocks are cached, then in that case

139
00:18:11.190 --> 00:18:18.780
William Cheng: maybe the performance will be just fine. Okay. But again, without a large buffer cache, we have to go to the disk many times.

140
00:18:19.530 --> 00:18:27.630
William Cheng: And in this case, again, even if all these blocks are in memory, you need to walk down a linear list. If the list is really, really long, it's going to take you a long time.

141
00:18:27.930 --> 00:18:38.100
William Cheng: So in this case we have a typical computer science problem.
We're using an array implementation to implement a directory file. In order for us to speed it up, we can use a tree data structure or use a hash table.

142
00:18:38.820 --> 00:18:44.040
William Cheng: Okay, so if we use a tree data structure, we're going to end up with log n performance, so in that case it's pretty good.

143
00:18:44.340 --> 00:18:50.970
William Cheng: We can also use a hash table to get order 1 performance. So we're going to do that. We're going to first look at the hash table solution

144
00:18:51.360 --> 00:18:59.040
William Cheng: inside a file system, and then we're going to look at a tree solution. As it turns out, the tree solution we're going to use is also something borrowed from the database community.

145
00:18:59.310 --> 00:19:04.020
William Cheng: We call it a B+ tree. Those of you who have taken the database class know that databases use a lot of B-trees.

146
00:19:04.380 --> 00:19:17.460
William Cheng: So we're going to talk about... well, actually we're not going to talk too much about the B+ tree. I'm just going to introduce it to you. I just want you to see what the data structure looks like. Okay, so first over here,

147
00:19:18.900 --> 00:19:23.220
William Cheng: we need to implement a directory file using a hash table.

148
00:19:24.120 --> 00:19:31.710
William Cheng: Okay, so remember that a directory is simply performing a lookup function: given a component name, return the inode number.

149
00:19:32.010 --> 00:19:42.390
William Cheng: Okay, so therefore I can use an array implementation, or the variable-size array implementation that we just saw. Or we can use a tree data structure to perform the lookup operation, or we can use a hash table to perform the lookup operation.

150
00:19:42.840 --> 00:19:49.380
William Cheng: Okay, so that's the first solution we're going to look at.
We'll look at a hash table implementation. Okay, so what would be a reasonable implementation with a hash table?

151
00:19:49.680 --> 00:19:55.830
William Cheng: Okay, the reasonable implementation of a hash table is that every disk block over here is going to correspond to a hash bucket.

152
00:19:56.610 --> 00:20:01.410
William Cheng: Because remember, whenever you try to use a hash table, the first thing that you do is take the key.

153
00:20:02.310 --> 00:20:08.100
William Cheng: So in this case, what would be the key, right? Again, we're trying to map a component name to an inode number. So therefore, the component name will be the key.

154
00:20:08.280 --> 00:20:17.910
William Cheng: We're going to take the key and send it to a hash function, and the hash function is going to give us a bucket number, right. So in this case over here, we might have multiple buckets: bucket zero, bucket one, bucket two.

155
00:20:18.360 --> 00:20:28.800
William Cheng: Inside the bucket is going to be the collision resolution chain, right. Inside the collision resolution chain are all the key-value pairs that hash into the same bucket.

156
00:20:29.910 --> 00:20:36.210
William Cheng: Okay, so what do we need to do if we want to perform a lookup? Let's look up a file called foo.rc.

157
00:20:37.260 --> 00:20:40.650
William Cheng: Okay, so this will be the component name. Right. So, therefore, we're going to send this to a hash function.

158
00:20:40.860 --> 00:20:44.550
William Cheng: The hash function is going to give us a bucket number over here. So in this case we get bucket number two.

159
00:20:44.760 --> 00:20:54.630
William Cheng: So inside this bucket is going to be one disk block. We go to the disk and read only one disk block, and this block will contain all the key-value pairs that hash to bucket number two.

160
00:20:55.140 --> 00:21:01.980
William Cheng: Okay.
So, therefore, if we just look through this entire block, either we're going to find it or we know that foo.c is not inside this directory. 161 00:21:03.060 --> 00:21:08.040 William Cheng: Okay, so once we retrieve one of these disk blocks, the data structure will look like what we saw before. 162 00:21:08.220 --> 00:21:19.710 William Cheng: We will be looking at this data structure. So again, we can read the whole thing into memory and then quickly go through this data structure to see if foo.c is one of the component names or not. 163 00:21:20.220 --> 00:21:24.750 William Cheng: Okay. If the component name over here doesn't exist, then we know that this component does not 164 00:21:25.770 --> 00:21:27.060 William Cheng: exist inside this directory. 165 00:21:28.200 --> 00:21:36.630 William Cheng: Right. So again, the hash table is pretty straightforward, right, and it also has very good performance. It's actually order one in terms of the number of disk blocks you have to go to. 166 00:21:36.840 --> 00:21:45.390 William Cheng: All you have to do is go to one disk block, and you either find it or you can determine that the file is not inside this directory. Okay, so what is wrong with this implementation? 167 00:21:49.800 --> 00:22:01.950 William Cheng: Okay, so the typical problem with hash table implementations is that eventually you're going to run out of space. Okay, this bucket over here: when I keep adding files into my hash table, at some point it's going to overflow, you know, 168 00:22:02.550 --> 00:22:06.030 William Cheng: overflow this particular bucket. Why would a bucket overflow? 169 00:22:06.660 --> 00:22:11.760 William Cheng: Right. I mean, typically, if you think about it, if you've taken a data structures class, your bucket is a linked list that can grow forever.
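As a rough sketch of the lookup just described, the following Python models each hash bucket as one fixed-size disk block holding (component name, inode number) pairs. The bucket count, the two-entry capacity, the CRC32 hash, and the function names are assumptions made for illustration; the lecture does not specify them.

```python
import zlib

NUM_BUCKETS = 3          # assumed: buckets 0, 1 and 2
ENTRIES_PER_BLOCK = 2    # assumed tiny capacity; a real block holds far more

# blocks[b] stands in for the single disk block that backs bucket b.
blocks = [[] for _ in range(NUM_BUCKETS)]


def bucket_of(name):
    """Stand-in hash function (CRC32 modulo the bucket count)."""
    return zlib.crc32(name.encode()) % NUM_BUCKETS


def lookup(name):
    """One 'disk read': fetch the bucket's block, then scan it linearly."""
    for entry_name, inode in blocks[bucket_of(name)]:
        if entry_name == name:
            return inode
    return None  # not in this directory


def insert(name, inode):
    """Add an entry; fails once the bucket's fixed-size block is full."""
    if lookup(name) is not None:
        raise FileExistsError(name)
    block = blocks[bucket_of(name)]
    if len(block) >= ENTRIES_PER_BLOCK:
        raise OverflowError("bucket %d is full" % bucket_of(name))
    block.append((name, inode))


insert("foo.c", 42)
print(lookup("foo.c"))   # 42
print(lookup("bar.c"))   # None
```

With three buckets of two entries each, at most six names fit, so some insert must eventually raise `OverflowError`. That is the overflow problem the lecture turns to next.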
170 00:22:11.910 --> 00:22:21.420 William Cheng: But now, since the entire bucket needs to fit inside a disk block and the disk block is fixed size, sooner or later you're going to run out of space inside this disk block. 171 00:22:21.930 --> 00:22:25.290 William Cheng: Okay, so in that case, what do you have to do? Right. So again, the typical, 172 00:22:25.740 --> 00:22:35.340 William Cheng: sort of traditional hash table implementation is that when you run out of space, you need to get a new hash function, and then you have to take all the content over here and rehash everything. 173 00:22:35.880 --> 00:22:45.870 William Cheng: Right. So in this example, I started out with a hash function that hashes into three buckets: 0, 1, and 2. Right. Once one of the buckets over here is completely full, I need to get a new hash function that will hash to four buckets. 174 00:22:46.080 --> 00:22:52.980 William Cheng: Okay, 1, 2, 3, 4 over here. And when I use a different hash function, all these keys over here will hash to a different bucket. 175 00:22:53.310 --> 00:22:59.310 William Cheng: So I need to take all the data over here, read all of it into memory, rehash all of it, and then write it back out to the disk. 176 00:23:00.030 --> 00:23:11.070 William Cheng: Right. So this operation can be really, really slow. Okay. Because in the end, you have to read all the data blocks over here, and then you have to write all the data blocks. Right. So the number of disk reads and the number of disk writes can be too many. 177 00:23:11.550 --> 00:23:17.490 William Cheng: So therefore, we need a solution in case we run out of space inside a disk block. 178 00:23:18.810 --> 00:23:27.120 William Cheng: Alright, so we're going to see that some of the file systems actually implement something called extensible hashing.
So extensible hashing uses a specialized hash function. 179 00:23:27.420 --> 00:23:32.400 William Cheng: Okay, we're going to use a sequence of hash functions, and these hash functions are related to each other. 180 00:23:32.970 --> 00:23:45.150 William Cheng: Okay, we're going to call these functions h0, h1, h2, and so on, so the bigger index is the one that hashes to more buckets. Okay. So whenever you run out of space, you go to the next hash function. 181 00:23:45.600 --> 00:23:56.850 William Cheng: So let's take a look at all these hash functions to see how they are related to each other. Number one, with the subscript i over here, h_i hashes the names, which are the component names, into 2 to the i buckets. 182 00:23:57.360 --> 00:24:07.470 William Cheng: Okay, so h0 hashes to one bucket, h1 hashes to two buckets, h2 hashes to four buckets, h3 hashes to eight buckets, and then 16, 32, 64, so they are powers of two. 183 00:24:08.040 --> 00:24:20.130 William Cheng: Okay, so that's one way they're related to each other. And number two, this is more important over here: for any component name x, the low-order i bits of h_i(x) are the same in h_(i+1)(x). 184 00:24:21.300 --> 00:24:22.560 William Cheng: They are the same in h_i 185 00:24:24.180 --> 00:24:32.940 William Cheng: and h_(i+1) of x. Okay. So this sounds really weird, so let's put some numbers here. Say x is equal to foo.c, right over here. 186 00:24:33.180 --> 00:24:48.510 William Cheng: Okay. The lower i bits: let's say i equals 2. Right. So h2 of foo.c: what are the possible values? h2 hashes into four buckets, so the possible values are 00, 01, 10, and 11 in binary. 187 00:24:49.980 --> 00:24:59.610 William Cheng: Okay, so when we hash h2 of foo.c over here, these are the four possible values. So, for example, let's say that we use this hash function to hash foo.c.
In the end, we get 10. 188 00:25:00.210 --> 00:25:14.100 William Cheng: Okay, so what does this rule tell you? Right. For any component name, such as foo.c over here, the lowest i bits, so two bits, of h2 of x are the same in h3 of x. Okay, so if you take h3 of foo.c, 189 00:25:15.180 --> 00:25:24.510 William Cheng: h3 of foo.c over here, right, it says the lowest two bits have to be the same as h2 of foo.c. 190 00:25:24.840 --> 00:25:36.990 William Cheng: Okay. And we know that h2 of foo.c is equal to 10. So that means that for h3 of foo.c, the lowest two bits have to be 10, and the leading bit can be either zero or one, so it can be 010 or 110. 191 00:25:37.590 --> 00:25:45.060 William Cheng: Okay, so another way to look at it is that if h2 of foo.c hashes to the number two, h3 of foo.c has to be either two or six. 192 00:25:46.560 --> 00:25:54.690 William Cheng: Okay, so if you're using extensible hashing, these are the only two possible values if h2 of the same string is equal to two. 193 00:25:55.530 --> 00:26:08.520 William Cheng: Okay, so again, this is why it says h3 is an extension of h2. You start with the value of h2, you extend it by one bit, and that gives you h3, because all the low-order bits are exactly the same. 194 00:26:10.050 --> 00:26:16.620 William Cheng: Okay, so we're going to take a look at an example to see how this hash function is actually used. I'll show you why this hash function 195 00:26:17.070 --> 00:26:27.030 William Cheng: can actually make it so that, when you run out of space and try to rehash everything, it can be done in a fairly efficient manner. 196 00:26:29.130 --> 00:26:39.660 William Cheng: Let's use extensible hashing.
We also need one level of indirection, because we love one level of indirection, right? In this case, without one level of indirection, it's not going to work very well. So again, we're going to add one level of indirection. 197 00:26:40.080 --> 00:26:42.120 William Cheng: So let's take a look at how we actually implement this. 198 00:26:42.870 --> 00:26:46.530 William Cheng: Right. So we're going to have a bunch of indirect buckets over here, and we're going to use the 199 00:26:46.830 --> 00:26:57.870 William Cheng: hash function h2 over here to hash all the component names. So when we compute h2 of foo.c over here, what we're going to get is one of the buckets: it's going to be 00, 01, 10, or 11. 200 00:26:58.140 --> 00:27:02.880 William Cheng: And then you follow the pointer, and the pointer over here will point to the disk block. 201 00:27:04.080 --> 00:27:14.460 William Cheng: Okay. So, therefore, when you perform h2 over here, you don't go directly to the disk block. You go to an indirect bucket over here, and the indirect bucket will point you to a disk block. So it sort of 202 00:27:14.730 --> 00:27:25.410 William Cheng: gives you a disk block number, like 15278 over here. Okay, so that's the disk block you need to go to in order to find the collision resolution chain for every key that hashes to bucket number two. 203 00:27:26.640 --> 00:27:36.990 William Cheng: Yeah, alright. So, again, every bucket over here is going to be implemented using one disk block. So when we try to retrieve a disk block, what we are doing is reading one block of data from the disk. 204 00:27:37.440 --> 00:27:42.780 William Cheng: Alright, so let's take a look at this example. Here's an indirect block that contains the block numbers of the actual data blocks.
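One way to picture the hash family and the indirect block is the minimal Python sketch below. It assumes that h_i is simply the low-order i bits of a base hash (CRC32 here), which satisfies the rule that h_(i+1) extends h_i by one leading bit. The block numbers 100 to 103, the inode numbers, and the helper names are made up for illustration.

```python
import zlib


def h(i, name):
    """h_i: keep the low-order i bits of an assumed base hash (CRC32)."""
    return zlib.crc32(name.encode()) & ((1 << i) - 1)


# Rule from the lecture: h_(i+1)(x) is h_i(x) with one extra leading bit,
# so it is either h_i(x) or h_i(x) + 2**i.
for name in ["foo.c", "Ralph", "Fritz"]:
    for i in range(6):
        low, extended = h(i, name), h(i + 1, name)
        assert extended in (low, low + (1 << i))

# Indirect block for h2: bucket number -> disk block number (made-up numbers).
indirect = [100, 101, 102, 103]
disk = {block_no: [] for block_no in indirect}   # block number -> entries


def add(name, inode):
    """Append an entry to the block its bucket points to (no overflow check)."""
    disk[indirect[h(2, name)]].append((name, inode))


def lookup(name):
    """Hash, follow the pointer in the indirect block, scan one data block."""
    for entry_name, inode in disk[indirect[h(2, name)]]:
        if entry_name == name:
            return inode
    return None


for inode, name in enumerate(["Ralph", "Lily", "Joe"], start=1):
    add(name, inode)
print(lookup("Joe"))     # 3
print(lookup("Fritz"))   # None
```

The extra hop through `indirect` is what later allows two bucket numbers to share one data block.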
205 00:27:43.260 --> 00:27:48.450 William Cheng: So again, every disk block holds on the order of 100 to 200 strings. I don't really want to draw all that. 206 00:27:48.690 --> 00:27:55.800 William Cheng: Okay, so I'm going to use a very, very simple example where the maximum number of entries that you can store inside a disk block is two strings, 207 00:27:56.280 --> 00:28:00.690 William Cheng: okay, or two components. Right. And then we're going to run out of space. Right. 208 00:28:01.110 --> 00:28:04.830 William Cheng: Alright, so this picture tells you that if you hash the string Ralph 209 00:28:05.070 --> 00:28:09.480 William Cheng: with the h2 function, you're going to get 00, and that's why it sits inside this disk block. 210 00:28:09.660 --> 00:28:14.490 William Cheng: And this disk block is pointed to by entry number 00. Okay, so over here again, 211 00:28:14.670 --> 00:28:28.080 William Cheng: this is the collision resolution chain for bucket zero. So bucket zero contains Ralph. It also contains Lily. So if you hash Lily with h2, you also get the value 00. So again, you follow the pointer and you come to this data structure. 212 00:28:28.290 --> 00:28:34.020 William Cheng: And then again you can perform a linear search inside this disk block, and that way you will find the corresponding inode number. 213 00:28:34.950 --> 00:28:39.000 William Cheng: Okay, so again, this is a data structure where, given a component name, you need to look up 214 00:28:39.300 --> 00:28:44.370 William Cheng: the inode number. Right. So again, you compute h2, you follow the pointer over here, you go to this disk block, 215 00:28:44.580 --> 00:28:54.510 William Cheng: read the entire disk block from the disk into memory, and then you go through it linearly, comparing every entry against the key that you're looking for. You may find the key there.
If you find it, you also find the corresponding inode number. 216 00:28:55.500 --> 00:29:04.680 William Cheng: Alright, so what about Joe? Right. If you hash Joe with h2, you get bucket number one. Again, you can follow the pointer over here, you will find Joe, and you find the corresponding inode number. 217 00:29:04.890 --> 00:29:10.680 William Cheng: And this disk block over here, as it turns out, is 50% full. Okay, so you can actually add more stuff into this disk block. 218 00:29:11.520 --> 00:29:21.150 William Cheng: Right. Belinda and George: if you hash them with h2, you get the value 10. So again, if you follow the pointer, you're going to find this disk block, and they're sitting there for you to fetch their inode numbers. 219 00:29:21.390 --> 00:29:32.010 William Cheng: Then finally, Harry and Betty: they both hash to three when you use the h2 hash function. Okay, so in that case you're going to follow the pointer, and then you find them inside this disk block. 220 00:29:32.580 --> 00:29:40.320 William Cheng: Right. So far so good when you perform lookup operations. So let's say you want to look up George. Right. When you look up George over here, 221 00:29:40.470 --> 00:29:48.510 William Cheng: you send it to h2. George hashes to bucket number two, you follow this pointer, you read this disk block, and you will find the inode number for George. 222 00:29:49.980 --> 00:29:54.840 William Cheng: Alright, so lookup is very straightforward, and it's only going to cost you 223 00:29:55.110 --> 00:30:04.290 William Cheng: one disk read, because you have to read the disk block from the disk into memory. Right. Also, this indirect block will also be stored on the disk, so in the worst case you're going to read two disk blocks. Right.
224 00:30:04.500 --> 00:30:13.170 William Cheng: Once you read this first disk block over here, you keep it inside the buffer cache. So from this point on, all you have to do is read the second block from the disk. Okay. 225 00:30:14.760 --> 00:30:21.510 William Cheng: Alright, so now we're going to look at a case where you run out of space. So in this case, we're going to insert Fritz into 226 00:30:21.810 --> 00:30:29.370 William Cheng: our directory file. So in this case, h2 of Fritz is equal to two. Right. When we try to perform an insert, what do we have to do? 227 00:30:29.700 --> 00:30:38.580 William Cheng: Okay, so just like when you're using a tree data structure or a linked list, when you try to add something to a linked list, you've got to make sure that whatever you're adding does not exist on the linked list already. 228 00:30:39.120 --> 00:30:47.820 William Cheng: Okay, so here we have the hash table. When we try to perform an insertion operation, the first thing we need to do is verify that Fritz is not inside the 229 00:30:48.870 --> 00:30:58.920 William Cheng: that Fritz is not inside this directory already. So again, what should we do? We take the string Fritz and hash it with h2, and in this example it's equal to two. Right. Again, 0, 1, 2, 3 over here. 230 00:30:59.130 --> 00:31:04.080 William Cheng: We're going to follow the pointer over here, read this disk block from the disk into memory, and go through 231 00:31:04.290 --> 00:31:16.890 William Cheng: the directory entries over here. And then we find out that there's no Fritz, so therefore we are allowed to insert Fritz into this directory file. And then we find out that this particular disk block is completely full. 232 00:31:17.910 --> 00:31:24.450 William Cheng: Right. So in this case, we need to go to the next hash function, right, because what we need to do is rehash everything.
Okay, so we need 233 00:31:24.660 --> 00:31:34.320 William Cheng: to replace this hash function with h3, and we need to take all the keys over here and rehash everything. As it turns out, if you're using extensible hashing, this can be done very, very easily. 234 00:31:34.590 --> 00:31:41.700 William Cheng: All you have to do is the following. I'm going to take my indirect block and make it twice as big, because now it needs eight buckets. 235 00:31:42.030 --> 00:31:49.590 William Cheng: Right. So, therefore, the new buckets over here are going to be four, five, six, and seven. All I have to do is take the top part over here and make a copy into the bottom part. 236 00:31:50.820 --> 00:32:03.240 William Cheng: Okay, when you finish doing that, when you copy a pointer, they point to the same place. So therefore 4, 5, 6, 7: where do they point? Four points right here, five points right here, six points right here, and seven points right here. Okay, so when I finish doing that, it will look like this. 237 00:32:04.260 --> 00:32:07.860 William Cheng: Okay, I'm going to claim that I've finished rehashing everything. 238 00:32:09.000 --> 00:32:13.080 William Cheng: Okay, because now we take Ralph and rehash it with h3. 239 00:32:13.320 --> 00:32:14.070 William Cheng: Ralph, when you hash it with 240 00:32:14.280 --> 00:32:28.170 William Cheng: h3, what are you going to get? Right. We know that h2 of Ralph is equal to 00, so therefore h3 of Ralph can be either 000 or 100. So it's going to be either 0 or 4. In this case, when you try to look up Ralph, whether it's equal to zero or four, 241 00:32:28.380 --> 00:32:31.650 William Cheng: they both point to the same disk block, so either way you can find it. 242 00:32:32.310 --> 00:32:40.200 William Cheng: Okay. Same thing when you rehash Lily with h3: you will see that it's actually sitting in the right place.
That is because it can only be 0 or 4. 243 00:32:40.740 --> 00:32:53.880 William Cheng: With Joe, Joe hashes to one with h2, so therefore the only choices for Joe with h3 are going to be either one or five. So whether it's one, following this pointer, or five, following this pointer, Joe is sitting in the right place. 244 00:32:54.270 --> 00:33:05.460 William Cheng: For Belinda and George, they hash to two with h2, so they need to hash to two or six with h3. So again, they are sitting in the right place: two is here and six is here. And then finally, Harry and Betty: 245 00:33:06.030 --> 00:33:10.530 William Cheng: they hash to three with h2, so they will hash to either three or seven, 246 00:33:11.070 --> 00:33:17.190 William Cheng: seven or three. Right. So what you see in the picture on the right-hand side over here is that each disk block is being shared. 247 00:33:17.520 --> 00:33:27.840 William Cheng: So two buckets over here share the same disk block. Right. Bucket numbers zero and four share this disk block, bucket numbers one and five share this one here, bucket numbers two and six are right here, and bucket numbers three and seven are right here. 248 00:33:28.800 --> 00:33:37.380 William Cheng: Right. So in this case, rehashing everything is super easy if you're using extensible hashing. Okay. But we're not done yet. We still need to insert Fritz 249 00:33:38.070 --> 00:33:46.200 William Cheng: into our directory file. So in this case, again, if you hash Fritz using h3, we're going to get either two or six. So we actually have to figure out which one 250 00:33:46.860 --> 00:33:49.950 William Cheng: we get. Right. So in this example over here, 251 00:33:50.310 --> 00:34:00.750 William Cheng: h3 of Fritz is going to be equal to six. So we come to six over here, we follow this pointer, and I can see that this disk block is full.
Also, we know that this disk block is being shared by buckets two and six. 252 00:34:01.440 --> 00:34:08.250 William Cheng: Okay, so therefore we need to take all the content over here and rehash everything, and then some of it is going to stay in bucket number two 253 00:34:08.430 --> 00:34:18.660 William Cheng: and the rest is going to go to bucket number six. Okay. So, therefore, in this case we're going to allocate a new disk block over here. We're going to call this one bucket number six, and this one is going to be exclusively for bucket number two. 254 00:34:18.900 --> 00:34:24.180 William Cheng: And we're going to take all the keys over here and rehash everything using h3, for only this disk block. 255 00:34:24.840 --> 00:34:37.470 William Cheng: Okay, if we have a good hash function, when you rehash every key inside the bucket, we're going to end up with 50% of them going to bucket number two and 50% of them going to bucket number six, on average. 256 00:34:38.610 --> 00:34:49.140 William Cheng: Okay, so therefore, when we finish doing that, it will look like this. Right. In this case it is 50/50: Belinda stays in bucket two, and George goes to bucket number six. And now we can actually add Fritz into it. Again, 257 00:34:49.650 --> 00:34:58.800 William Cheng: h3 of Fritz over here is equal to six. So therefore, we follow this pointer, and we find that inside this disk block there is space. So we're going to insert Fritz right here, and we end up with the example right here. 258 00:35:00.480 --> 00:35:08.820 William Cheng: Okay, so you see that in this case, when we go from h2 to h3 and rehash everything, in the end, how many disk blocks do we have to access? 259 00:35:09.300 --> 00:35:17.460 William Cheng: Okay, we need to access this disk block, right, and we also need to access this disk block over here.
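The Fritz insertion can be traced with the Python sketch below. It is one possible rendering of extensible hashing, not the lecture's code. The 3-bit hash values are assumptions, except that Belinda maps to 2 and George and Fritz map to 6 under h3, as stated in the lecture. The class name, field names, and the per-block bit count used to detect a shared block are hypothetical.

```python
CAPACITY = 2  # entries per disk block, as in the lecture's toy example

# Assumed 3-bit hash values. Only Belinda -> 2, George -> 6 and Fritz -> 6
# under h3 come from the lecture; the other leading bits are made up.
HASH_BITS = {
    "Ralph": 0b000, "Lily": 0b100, "Joe": 0b001, "Belinda": 0b010,
    "George": 0b110, "Harry": 0b011, "Betty": 0b111, "Fritz": 0b110,
}


def h(i, name):
    """h_i: the low-order i bits of the name's hash."""
    return HASH_BITS[name] & ((1 << i) - 1)


class Directory:
    def __init__(self, depth=2):
        self.depth = depth                             # currently using h_depth
        self.blocks = [[] for _ in range(1 << depth)]  # data "disk blocks"
        self.bits = [depth] * (1 << depth)             # hash bits each block uses
        self.indirect = list(range(1 << depth))        # bucket -> block index

    def lookup(self, name):
        for entry, inode in self.blocks[self.indirect[h(self.depth, name)]]:
            if entry == name:
                return inode
        return None

    def insert(self, name, inode):
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        # With only 3 hash bits, three names with identical bits would loop
        # forever; a real implementation needs a wider hash.
        while True:
            blk = self.indirect[h(self.depth, name)]
            if len(self.blocks[blk]) < CAPACITY:
                self.blocks[blk].append((name, inode))
                return
            if self.bits[blk] == self.depth:
                # Block is not shared: move to the next hash function by
                # copying the top half of the indirect block into the bottom.
                self.indirect = self.indirect + self.indirect
                self.depth += 1
            self._split(blk)

    def _split(self, blk):
        """Rehash one full block with one more bit, into itself and a new block."""
        bit = 1 << self.bits[blk]
        self.bits[blk] += 1
        new = len(self.blocks)
        old = self.blocks[blk]
        self.blocks[blk] = [e for e in old if not HASH_BITS[e[0]] & bit]
        self.blocks.append([e for e in old if HASH_BITS[e[0]] & bit])
        self.bits.append(self.bits[blk])
        for bucket, target in enumerate(self.indirect):
            if target == blk and bucket & bit:
                self.indirect[bucket] = new


d = Directory()
names = ["Ralph", "Lily", "Joe", "Belinda", "George", "Harry", "Betty"]
for inode, name in enumerate(names, start=1):
    d.insert(name, inode)
d.insert("Fritz", 8)
print(d.indirect)   # [0, 1, 2, 3, 0, 1, 4, 3]
```

After the insert, buckets 0 and 4, 1 and 5, and 3 and 7 still share a block, while bucket 6 points at the newly allocated block holding George and Fritz. Only the indirect block, the split block, and the new block were touched.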
So no matter how big this directory file is, 260 00:35:17.670 --> 00:35:26.100 William Cheng: all we are doing over here is modifying two disk blocks. Okay, what about the indirect block? This one over here, we need to make it twice as big, so we also need to modify this disk block. 261 00:35:26.310 --> 00:35:38.070 William Cheng: So in the end, by using extensible hashing with one level of indirection, all we're doing is modifying exactly three disk blocks. So, therefore, this is going to be order one even in the case where we have to rehash everything. 262 00:35:38.880 --> 00:35:46.410 William Cheng: Okay. So again, this shows you the power of indirection. If you do it just the right way, in the end everything is still going to be order one. Yeah. 263 00:35:49.500 --> 00:35:56.550 William Cheng: So that's using the hash table. The next thing we're going to look at is how to use a tree data structure. So in this case, we're going to use a specialized data structure 264 00:35:56.820 --> 00:36:07.860 William Cheng: that's used inside of databases, known as a B-tree. Okay. So a B-tree is very different from a binary tree. What is a binary tree? Right. In a binary tree, every internal node can have at most 265 00:36:08.940 --> 00:36:15.180 William Cheng: two children. Right. It can have one child or two children, or it has no children, in which case it is a leaf node. 266 00:36:15.660 --> 00:36:21.000 William Cheng: Right. So that's a binary tree. In a B-tree, a node can have more than two children. 267 00:36:21.360 --> 00:36:31.860 William Cheng: Okay, so here we're going to use the parameter m to show you the maximum number of children a node can have. So a B-tree of order m has the following properties. Okay, every node has less than or equal to m children. 268 00:36:32.550 --> 00:36:35.250 William Cheng: Alright, so for example, if m equals three,
269 00:36:35.460 --> 00:36:45.570 William Cheng: that means every node can have one, two, or three children. Right. This one over here can have three children, this one may have two, and this one may have three children, or something like that. 270 00:36:46.140 --> 00:36:53.070 William Cheng: Yeah. So number one is that every node has less than or equal to m children. Number two is that every node has a minimum number of children, 271 00:36:53.310 --> 00:37:03.180 William Cheng: greater than or equal to the ceiling of m over 2 children. So again, if m equals 3, 3 divided by 2 is 1.5, and the ceiling of 1.5 is equal to two. 272 00:37:04.440 --> 00:37:08.910 William Cheng: Right, so that means that every node can have two or three children. It cannot have only one child. 273 00:37:09.390 --> 00:37:19.860 William Cheng: Okay. So in the example shown over here, this one has two children. This one has no children, and in that case it could be a leaf node. A leaf node has no children, so this one is actually fine. 274 00:37:20.220 --> 00:37:23.880 William Cheng: Well, this one is wrong. This one only has one child, so this is actually no good. Right. 275 00:37:24.300 --> 00:37:31.800 William Cheng: Okay, so the B-tree is kind of a weird tree: every node has at most three children and at least two children 276 00:37:32.250 --> 00:37:34.980 William Cheng: in the case where m is equal to three. But 277 00:37:35.580 --> 00:37:45.180 William Cheng: the root has at least two children, unless it's also a leaf. So if the root is also a leaf, then in that case you have a tree that has only one node. That's not very interesting, and it's also a special case.
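The child-count rules so far can be written down as a small check. This Python sketch is only an illustration: the function name and its flags are hypothetical, and it covers just the three rules stated above (at most m children, at least the ceiling of m/2, and at least two for a root that is not a leaf).

```python
import math


def child_count_ok(num_children, m, is_root=False, is_leaf=False):
    """Check one node of a B-tree of order m against the child-count rules."""
    if is_leaf:
        return num_children == 0           # a leaf has no children
    minimum = 2 if is_root else math.ceil(m / 2)
    return minimum <= num_children <= m


# Order 3: ceil(3 / 2) == 2, so internal nodes have two or three children.
print(child_count_ok(2, 3))                 # True
print(child_count_ok(3, 3))                 # True
print(child_count_ok(1, 3))                 # False: only one child
print(child_count_ok(4, 3))                 # False: more than m children
print(child_count_ok(0, 3, is_leaf=True))   # True
```

For a larger order such as m = 100, the same check requires between 50 and 100 children per internal node, while a non-leaf root may have as few as two.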
278 00:37:45.390 --> 00:37:55.170 William Cheng: Okay, so when you have a tree that has one node, in that case that node will be the root, and it doesn't have two children. Okay. Otherwise, the root always has at least two children. 279 00:37:56.760 --> 00:38:01.950 William Cheng: The next one over here is a rule that says all leaves appear at the same level and carry no keys, 280 00:38:02.370 --> 00:38:10.950 William Cheng: and they're usually omitted from the drawing. Right. So remember, over here the tree is the index structure. What it does is lead you to a disk block. 281 00:38:11.850 --> 00:38:21.030 William Cheng: Okay, so the goal: we're trying to implement a directory file, and the purpose of the directory file is that, given a component name, you try to find the disk block that contains the inode number. 282 00:38:21.570 --> 00:38:32.550 William Cheng: Alright, so it says over here that all the leaves appear at the same level. So in that case, what would the picture actually look like? A lot of times when you see a B-tree being drawn, people will draw a B-tree like this. 283 00:38:34.530 --> 00:38:44.040 William Cheng: Okay, just a big triangle. The reason for that is that all the leaf nodes are at the bottom of the tree over here, and all the leaves are the same distance from the root. 284 00:38:44.520 --> 00:38:47.340 William Cheng: Okay, so it's a really weird-looking tree over here. 285 00:38:47.760 --> 00:38:53.370 William Cheng: All the leaves appear at the same level and they carry no keys. What they do is point to a disk block. 286 00:38:53.490 --> 00:38:58.800 William Cheng: And inside this disk block, you're going to have the component name followed by the inode number. Right.
So, in that case, the disk block will have the key and the value. 287 00:38:59.280 --> 00:39:06.450 William Cheng: Okay, so the ones at the bottom over here will be the ones that point to the content: the disk block number will tell you where to go. 288 00:39:07.590 --> 00:39:17.760 William Cheng: And the leaf nodes over here are typically omitted from the drawing. So what you want is a way to traverse all the way to the bottom of the tree, and in that case you're going to get the 289 00:39:18.060 --> 00:39:27.210 William Cheng: disk block number, and you use that number to read that disk block. All the disk blocks, again, hold on the order of 100 to 200 entries. Okay. So again, the goal is to reach that disk block. 290 00:39:28.530 --> 00:39:36.120 William Cheng: Alright, a non-leaf node with k children contains k minus 1 keys over here. So remember, if you have a binary tree over here, 291 00:39:36.660 --> 00:39:44.220 William Cheng: inside a binary tree node we are going to store a key. Right. The key tells you whether you should go left or right. Right. So if you try to look up a 292 00:39:45.090 --> 00:39:51.270 William Cheng: particular component name, you need to compare it against the key. If it's less than or equal to the key, you go left, 293 00:39:51.450 --> 00:40:00.270 William Cheng: and if it's greater, you go right. Or it can be: if it's less than, you go left, and if it's greater than or equal, you go right. So again, you have to decide which way you want to go. 294 00:40:00.720 --> 00:40:09.210 William Cheng: Okay, so now suppose a node has three children.
In this case, what would you do? Well, in that case, you would need two keys. Right. One is k1 and the other one is k2. 295 00:40:09.420 --> 00:40:17.640 William Cheng: If it's less than or equal to k1, you go left. If it's greater than k1 but less than or equal to k2, you go to the middle. If it's greater than k2, then you go to the right. 296 00:40:18.780 --> 00:40:29.550 William Cheng: Okay, so if you have k children, in that case you have k minus 1 keys. So therefore, all the internal nodes inside the B-tree store these k minus 1 keys. 297 00:40:30.480 --> 00:40:42.270 William Cheng: Okay, so again, depending on how many children you have, you store a different number of keys. Okay. So in this case, what's interesting about the B-tree over here is that for every internal node, what we try to do is fit it inside a disk block. 298 00:40:43.500 --> 00:40:52.290 William Cheng: Okay, so just like with a hash table, where the collision resolution chain needs to fit inside a disk block. In a B-tree, typically m is a very, very large number. 299 00:40:52.650 --> 00:40:57.900 William Cheng: Okay, m can be on the order of 50 or 100 or 200. Okay, so in that case it's a tree that has lots 300 00:40:58.170 --> 00:41:06.540 William Cheng: and lots of pointers over here. Again, the purpose of that is that every node over here needs to fit inside a disk block, and we want this disk block to be 301 00:41:06.840 --> 00:41:17.790 William Cheng: as full as possible. Okay, so by requiring the number of children to be between m divided by 2 and m, that means that every node over here in the middle is at least 50% full. 302 00:41:18.480 --> 00:41:22.830 William Cheng: Okay. So this way we don't use up too much disk space over here to store this index structure.
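The rule for picking a child from the k minus 1 keys can be sketched with a binary search over the node's sorted keys. The Python below is an illustration that assumes the "less than or equal goes left" convention described above; the function name and the sample keys are made up.

```python
from bisect import bisect_left


def child_index(keys, name):
    """Pick which child to descend into, given a node's sorted keys.

    Returns 0 when name <= keys[0], j when keys[j-1] < name <= keys[j],
    and len(keys) when name is greater than every key.
    """
    return bisect_left(keys, name)


keys = ["g", "p"]                 # k1 and k2 for a node with three children
print(child_index(keys, "a"))     # 0: less than or equal to k1, go left
print(child_index(keys, "g"))     # 0: equal to k1 still goes left
print(child_index(keys, "h"))     # 1: between k1 and k2, go to the middle
print(child_index(keys, "z"))     # 2: greater than k2, go right
```

With m on the order of 100, one node holds up to 99 keys, so a binary search inside the node keeps the in-memory work small once the block has been read.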
303 00:41:23.190 --> 00:41:35.850 William Cheng: Because, yeah, the B-tree is just an index structure. What it's trying to do is point you to the disk block that contains the actual content of the directory, the key-value pairs. 304 00:41:38.640 --> 00:41:46.170 William Cheng: Alright, also, m is often large to reduce the number of disk accesses over here, so it can be on the order of 50 or 100 or even 200. 305 00:41:46.980 --> 00:41:55.320 William Cheng: So if you have taken a database class, you have seen B-trees already. And there are lots of different versions of the B-tree: there's the B* tree, there's the B+ tree, you know, 306 00:41:55.800 --> 00:42:01.470 William Cheng: there are lots of different varieties of B-tree. We're going to look at one special version known as the B+ tree. 307 00:42:01.680 --> 00:42:12.030 William Cheng: So in this case, the internal nodes do not contain any data. Right. If you have a binary tree, you can actually store data inside a binary tree node. But in this case, the B-tree is just an index structure for you to get to the bottom. 308 00:42:12.480 --> 00:42:16.260 William Cheng: Okay, so the goal is always to get to the bottom, not to go to somewhere in the middle. 309 00:42:16.770 --> 00:42:24.360 William Cheng: So the nodes in the middle part over here only contain keys. Again, the nodes are at least 50% full because we follow the B-tree rules over here. 310 00:42:25.050 --> 00:42:29.100 William Cheng: Also, the leaf nodes over here are linked to ease sequential traversal. 311 00:42:30.000 --> 00:42:41.040 William Cheng: Right. So if you have a data structure like this, where you always go to the left for something smaller and go to the right for something bigger, then all the keys at the bottom over here are sorted linearly if you can actually go across.
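The linked bottom level can be illustrated with the small Python sketch below. It simplifies the picture by letting each `Leaf` object stand for a data block of sorted (component name, inode number) pairs with a `next` link. The class names, file names, and inode numbers are made up for illustration.

```python
class Leaf:
    """Stands in for one data block at the bottom of a B+ tree."""

    def __init__(self, entries):
        self.entries = entries   # sorted (component name, inode number) pairs
        self.next = None         # link to the next leaf to the right


class Internal:
    """Index node: keys only, no data."""

    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


def list_sorted(root):
    """Yield every entry in sorted order using the leaf links."""
    node = root
    while isinstance(node, Internal):   # go all the way down the left edge
        node = node.children[0]
    while node is not None:             # then walk across the bottom
        yield from node.entries
        node = node.next


leaves = [
    Leaf([("a.c", 11), ("b.c", 12)]),
    Leaf([("c.c", 13), ("d.c", 14)]),
    Leaf([("e.c", 15)]),
]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal(["b.c", "d.c"], leaves)

names = [name for name, _ in list_sorted(root)]
print(names)   # ['a.c', 'b.c', 'c.c', 'd.c', 'e.c']
```

The traversal descends the tree once and then never revisits an index node, so each data block is read exactly once.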
312 00:42:41.670 --> 00:42:51.000 William Cheng: Okay, so sometimes, for example, you want to traverse a directory file and you just want to list all the file names in the directory in sorted order. 313 00:42:51.510 --> 00:43:00.480 William Cheng: Okay, so what would you do, right? I mean, again, if you have learned the binary search tree algorithms, there is something called in-order traversal, post-order traversal, pre-order traversal. 314 00:43:01.050 --> 00:43:08.790 William Cheng: So if you want to traverse a binary search tree in sorted order, you perform an in-order traversal. This way you will visit all the nodes in sorted order. 315 00:43:09.480 --> 00:43:15.090 William Cheng: Okay, for a B-tree over here, you don't really want to go up and down 316 00:43:15.300 --> 00:43:18.180 William Cheng: the B-tree data structure. Again, if you use a recursive algorithm, you 317 00:43:18.390 --> 00:43:25.410 William Cheng: end up going up and down the B-tree over here. And again, you need to visit many, many disk blocks, and that will be really, really inefficient. 318 00:43:25.740 --> 00:43:35.010 William Cheng: Okay, so if you want to retrieve the content of the directory file in sorted order, all you have to do is go all the way to the left over here and find the smallest component name 319 00:43:35.310 --> 00:43:40.170 William Cheng: inside the directory file. And then the bottom part over here is actually linked together using a linked list. 320 00:43:41.370 --> 00:43:44.850 William Cheng: Okay, for a B+ tree, the leaf nodes are linked together 321 00:43:45.210 --> 00:43:50.640 William Cheng: to ease sequential traversal. So once you find the smallest component name over here, here is what you can do.
You can actually 322 00:43:50.850 --> 00:44:00.720 William Cheng: just go across the linked list pointers at the bottom over here. And this way you can actually retrieve all these data blocks without going up and down the B+ tree. 323 00:44:01.830 --> 00:44:11.340 William Cheng: Okay. A lot of times when you try to retrieve directory content, you want things to be sorted. So, therefore, this is a very common operation. So the B+ tree is optimized for sequential traversal 324 00:44:11.760 --> 00:44:16.320 William Cheng: in a sorted manner, you know, for all the component names inside the directory file. 325 00:44:17.670 --> 00:44:26.070 William Cheng: All right, so there are lots of B-tree variations, and the B-tree is very complicated. So again, we're only going to introduce the B-tree to you briefly. I'll be showing you some examples 326 00:44:26.550 --> 00:44:28.470 William Cheng: without going too much into it. Okay. 327 00:44:29.010 --> 00:44:37.710 William Cheng: All right. Here is a picture of a B+ tree of order three. Okay, if it's order three, every node has either two or three children. So you can see that this node has three pointers pointing to three children. 328 00:44:37.920 --> 00:44:48.060 William Cheng: This one points to three children. This one points to three children. And this one points to two children over here. So again, these point directly to the disk blocks where the data is stored. Okay, this is the 329 00:44:48.600 --> 00:44:54.780 William Cheng: sort of index structure over here. It looks like a triangular shape because all the leaf nodes are at the same level. 330 00:44:55.620 --> 00:45:05.460 William Cheng: Okay. So in this case, when you need to go to the bottom part over here, you need to retrieve all the disk blocks along the way. So in this case, you're going to end up with log n performance, right, 331 00:45:06.660 --> 00:45:17.460 William Cheng: log n over here.
Here n is the number of internal nodes inside this data structure. But in this case, since we're using a B-tree of order m, the base over here is not equal to two. The base is actually m. 332 00:45:18.360 --> 00:45:28.650 William Cheng: Okay, so you can imagine if m over here is equal to 100, then in that case, you know, log base 100 of n typically is going to be a very, very small number. It's going to be equal to two; in the worst case it can be equal to three. 333 00:45:29.520 --> 00:45:37.350 William Cheng: Okay, so this way, again, we can go down the B-tree very, very quickly in order for us to get to the data blocks. 334 00:45:39.330 --> 00:45:49.950 William Cheng: So what we need to do over here is that when we try to perform insertion and deletion operations on the B-tree, if a node ends up with four children, in that case we need to take that node and break it up into two. 335 00:45:50.160 --> 00:45:58.080 William Cheng: When we delete something from the B-tree, if a node has fewer than two children, in that case we need to merge it with one of the neighbors. 336 00:45:58.410 --> 00:46:01.470 William Cheng: Okay, so let's take a look at how these operations actually work over here. 337 00:46:02.190 --> 00:46:09.270 William Cheng: So let's say, for example, we perform an insertion operation. You know, I guess with the hash function earlier we inserted Fritz into the 338 00:46:09.690 --> 00:46:17.730 William Cheng: extensible hashing. So now we're going to insert Lucy into our data structure. Okay. So again, these are the keys over here inside the internal node. 339 00:46:18.210 --> 00:46:24.960 William Cheng: We're going to do a string compare. If it's less than or equal, we're going to go to the left. If it's greater than or equal to, sorry, if it's greater than the first key.
340 00:46:26.550 --> 00:46:30.780 William Cheng: Let me look at the data structure over here. Okay. If it's greater than or equal, we're actually going to go to the right. 341 00:46:31.020 --> 00:46:39.150 William Cheng: So again, if it's less than the first key, we're going to go left. If it's greater than or equal to the first key but less than the second key, we're going to go to the middle. If it's greater than or equal to the third key, I mean, 342 00:46:39.390 --> 00:46:41.160 William Cheng: the second key over here, we're going to go to the right. 343 00:46:41.760 --> 00:46:48.900 William Cheng: Okay, so we perform the lookup function for Lucy when we try to insert it. I don't know if you remember the insertion algorithm for a binary search tree. 344 00:46:49.110 --> 00:46:57.420 William Cheng: In order for you to insert something, first you have to perform a lookup operation and make sure you cannot find it. And then the last node that you visit will be where you try to insert, 345 00:46:58.200 --> 00:47:04.680 William Cheng: where you have to create a new node. Okay, so we're going to do the same thing with the B-tree. First we need to perform a lookup operation. 346 00:47:05.160 --> 00:47:13.560 William Cheng: So in this case, where is Lucy? Lucy starts with L, which is bigger than I, so we're going to follow the middle pointer over here. It's less than R, so we're going to go to the middle over here. 347 00:47:13.830 --> 00:47:21.060 William Cheng: And then Lucy is bigger than Lisa, right? So, therefore, again, we need to follow the middle pointer over here. And then finally we reach the leaf over here. 348 00:47:21.840 --> 00:47:28.080 William Cheng: So in this case, we find that Lucy is not inside over here, because we have Lisa, Matthew, and Nicole. 349 00:47:29.070 --> 00:47:36.690 William Cheng: Yeah.
So in this case, we cannot find Lucy, so that's good news. And now we need to add Lucy into this data structure. 350 00:47:36.990 --> 00:47:43.230 William Cheng: So this is where we would add Lucy, right? Lucy needs to be added between Lisa and Matthew because it needs to be in sorted order. 351 00:47:43.710 --> 00:47:49.230 William Cheng: Okay, so in this case we're going to end up with four entries over here inside this data structure, and that will be too many. 352 00:47:49.830 --> 00:47:56.970 William Cheng: So in this case, we need to split this into two nodes, right? When we split it into two nodes, each one of them is going to contain two keys. 353 00:47:57.600 --> 00:48:03.780 William Cheng: So I'm out of space over here, so I'm going to shuffle things around a little bit to create some room over here, right? So what we're going to do is create a new node over here. 354 00:48:04.020 --> 00:48:14.310 William Cheng: Into this new node we're going to copy Matthew and Nicole, to the right over here. So we're going to have Nicole over here and Matthew over here, we're going to remove Nicole and Matthew from here, and then we're going to put Lucy right here. 355 00:48:15.810 --> 00:48:22.560 William Cheng: Okay, so when we finish doing this, this is going to work, right, because every node over here contains two or three keys. So again, 356 00:48:22.800 --> 00:48:30.840 William Cheng: they conform to the B-tree data structure. But what about this internal node over here? This one needs a pointer to the new one over here. So this one has four pointers. 357 00:48:31.050 --> 00:48:36.570 William Cheng: Again, that's too many pointers over here. So we need to split this one into two. So I'm going to create a new node over here. 358 00:48:36.720 --> 00:48:48.660 William Cheng: And then the two right pointers over here, we're going to copy them to the right.
So this one is going to point right here, and the right pointer is going to point here, so the new node has these two pointers. And now the original 359 00:48:49.050 --> 00:48:52.320 William Cheng: intermediate node over here is going to have two pointers, pointing to this one and this one. 360 00:48:53.400 --> 00:49:01.200 William Cheng: But then again, the parent over here is going to point to four children. So, again, that doesn't really work. So again, we need to take this node and split it into two. 361 00:49:01.410 --> 00:49:06.870 William Cheng: The original one keeps track of the two left pointers over here, and the other one is going to keep track of the two right pointers. 362 00:49:07.590 --> 00:49:18.510 William Cheng: So now we're going to end up with two roots. That's not allowed. So therefore, we're going to create a new node right here to keep track of these two, and also we need to set up the keys over here just right so that we know whether we need to go left or go right. 363 00:49:19.080 --> 00:49:27.330 William Cheng: Okay, so I'm not going to go into the details to see exactly how to build this data structure. Again, if you're interested in that, you should take a database class. So when you're done, it will look like this. 364 00:49:29.100 --> 00:49:33.480 William Cheng: Okay, so you will see that when the B-tree gets taller, it actually grows at the top. 365 00:49:34.860 --> 00:49:40.380 William Cheng: Okay, so that's the main difference between the B-tree and other kinds of trees, right? If you have a binary tree, where do you grow? You grow at the bottom. 366 00:49:40.680 --> 00:49:46.530 William Cheng: When you have a B-tree, you grow at the top. But if you think about it, it actually makes sense, right, because the B-tree looks like a triangle over here. 367 00:49:46.950 --> 00:49:52.260 William Cheng: Okay, can I grow at the bottom? If I grow at the bottom over here,
then one of the leaves over here will be at the wrong level. 368 00:49:53.340 --> 00:50:03.210 William Cheng: Okay, so therefore I cannot grow at the bottom. So the only way I can grow the B-tree is to do something like this, right? And now I'm going to end up with a tree where, again, all the leaf nodes over here are at exactly the same level. 369 00:50:04.770 --> 00:50:12.810 William Cheng: Okay, so the B-tree is kind of weird. When you try to grow, you actually grow at the top, like this. But that's one of the interesting characteristics of the B-tree. 370 00:50:15.420 --> 00:50:18.870 William Cheng: So, as it turns out, there's another, you know, 371 00:50:19.470 --> 00:50:25.470 William Cheng: So, by the way, once the B-tree becomes taller, now in order for you to get to the bottom over here, every time you access it you 372 00:50:25.740 --> 00:50:34.320 William Cheng: actually have to retrieve four data blocks. Before, you had to retrieve three data blocks, and now you're going to be 33% slower. 373 00:50:35.070 --> 00:50:42.330 William Cheng: Okay. So in this case, when you grow the B-tree, there's going to be a very, very big penalty. So therefore we should try very, very hard not to grow the B-tree. 374 00:50:43.050 --> 00:50:50.880 William Cheng: Okay, so in that case, what other options do we have, right? So here again we try to insert Lucy. Lucy needs to go right here and there's no room over here. What are the other options? 375 00:50:51.270 --> 00:50:55.200 William Cheng: Okay, so maybe, in this case, you know, maybe you try to squeeze in a new roommate and you find out 376 00:50:55.710 --> 00:51:03.420 William Cheng: that you don't fit. So maybe you can actually tell one of your roommates, hey, you know, in the other room there are people that you like, why don't you move to the other room?
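The cost argument can be checked with a little arithmetic. The sketch below is a rough estimate under loud assumptions: every node is completely full, one block is read per level, and the function names `levels_needed` and `slowdown` are hypothetical. It shows why a fan-out near 100 keeps the tree two or three levels tall, and where the 33% figure for going from three block reads to four comes from.

```python
def levels_needed(num_entries, fanout):
    """Smallest number of levels L such that fanout**L >= num_entries.

    Uses integer multiplication instead of math.log to avoid rounding issues.
    Assumes every node is completely full, so this is a best-case height.
    """
    levels, reach = 0, 1
    while reach < num_entries:
        reach *= fanout
        levels += 1
    return levels


def slowdown(old_reads, new_reads):
    """Relative extra cost when a lookup needs new_reads instead of old_reads."""
    return (new_reads - old_reads) / old_reads


print(levels_needed(10_000, 100))      # 2 levels with fan-out 100
print(levels_needed(1_000_000, 100))   # 3 levels with fan-out 100
print(levels_needed(1_000_000, 2))     # 20 levels for a binary tree
print(round(slowdown(3, 4) * 100))     # 33 (percent) when height goes 3 -> 4
```

Real nodes are only guaranteed to be half full, so actual heights can be somewhat larger than this best case.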
377 00:51:03.960 --> 00:51:13.020 William Cheng: And so in this case we should look left and right over here, and we can see that on the right over here there's a room that only has two keys. So if we ask Nicole over here to move to 378 00:51:13.710 --> 00:51:19.110 William Cheng: that room, then everything's going to be okay, and we'll end up creating room here for Lucy. Okay, so in that case we 379 00:51:20.220 --> 00:51:29.250 William Cheng: will be able to do that. We will move Nicole from here to the right over here. But in this case, we need to update all these keys. So if you have actually taken a 380 00:51:30.210 --> 00:51:41.010 William Cheng: data structures class, there's an operation known as a rotation operation. And so you can actually rotate Nicole to the right over here. And then in this case we're going to end up creating space inside this node, and we can put Lucy in there. 381 00:51:42.240 --> 00:51:50.280 William Cheng: Okay, what if the neighboring node over here doesn't have space? How far do we actually have to look to the right and to the left to see if there is a place where we can actually, you know, move things around? 382 00:51:50.460 --> 00:52:02.970 William Cheng: Because in the end, we don't want the B-tree to grow, because growing the B-tree is a very expensive operation. Okay. So again, the actual B-tree algorithm is very, very complicated. I'm going to just show you what it's going to look like without going through the details. 383 00:52:04.380 --> 00:52:09.630 William Cheng: What if we end up deleting a particular node, right? We're going to delete Otto over here. Again, we need to perform a search 384 00:52:09.900 --> 00:52:15.360 William Cheng: starting at the root over here. Otto is bigger than Matthew, so we're going to go right. Otto is less than Richard, so we need to go left. 385 00:52:15.570 --> 00:52:26.040 William Cheng: Otto is greater than or equal to Otto.
We're going to go to the right over here, where we find Otto, and we're going to delete this entry over here. So now in this node we only have one key, and that's not allowed. We have to have either two or three keys. 386 00:52:26.250 --> 00:52:34.080 William Cheng: Right. So in this case, what should we do? Well, we can merge it with our neighbor over here, because our neighbor has two keys. If we merge with them, we're going to end up with three keys, and that will be perfectly fine. 387 00:52:34.290 --> 00:52:42.510 William Cheng: But when we do that, the parent over here only has one pointer, right? So again, that's not allowed. So therefore, we're going to merge it with the neighboring node over here, which we know has two pointers. 388 00:52:42.690 --> 00:52:46.860 William Cheng: If we merge with it, then we end up with three pointers over here. Again, the parent over here is only 389 00:52:47.280 --> 00:52:52.740 William Cheng: going to have one pointer because they've merged into one block. So in this case, again, we need to 390 00:52:53.190 --> 00:52:58.110 William Cheng: merge with the neighbor. When we do that, the root will disappear. Okay, because now we're going to end up with one root. 391 00:52:58.530 --> 00:53:07.920 William Cheng: Okay, so B-trees grow at the top, and they also shrink at the top. Again, the reason for that is that all the leaf nodes have to be at exactly the same level. 392 00:53:10.110 --> 00:53:12.600 William Cheng: Also, we have the same thing over here, you know, 393 00:53:13.710 --> 00:53:18.330 William Cheng: So again, when you try to change the height of the tree, you know, 394 00:53:19.020 --> 00:53:27.240 William Cheng: you're actually going to end up making a lot of changes over here. So we try not to do that if possible. When we delete Otto over here,
395 00:53:27.480 --> 00:53:35.580 William Cheng: another possibility is that when we delete Otto over here, you know, maybe we can actually look for a roommate from another room, to borrow a roommate. 396 00:53:35.820 --> 00:53:40.560 William Cheng: And then in this case we don't have to reshuffle this data structure. So in this case, on the right over here, 397 00:53:41.490 --> 00:53:51.660 William Cheng: on the right over here we have three people. So if we go ahead and move Richard from the right into this node over here, now every node has, you know, 398 00:53:52.200 --> 00:53:55.710 William Cheng: two keys. So in that case, we don't have to change this particular data structure. 399 00:53:56.400 --> 00:54:04.110 William Cheng: But you've got to be very careful, because when you move Richard from the right to the left over here, some of the keys over here will have to change because now Richard is moving to the left side. 400 00:54:04.500 --> 00:54:11.040 William Cheng: Okay, so in the end, you still might end up having to modify a lot of the data structure over here, because the change can propagate all the way to the root of 401 00:54:11.670 --> 00:54:18.000 William Cheng: the tree. So again, the B-tree algorithm is very complicated. If you're interested, you should take a database class. 402 00:54:19.080 --> 00:54:25.170 William Cheng: All right, this is a good breaking point. So next I'm going to continue and look at some other issues, you know, inside of 403 00:54:26.490 --> 00:54:27.630 William Cheng: inside of the file system.
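The two repair options after a deletion (borrow a key from a neighbor, or merge with it) can be sketched at the leaf level. This is a simplified illustration, not the full B-tree algorithm: it only looks at the right sibling, it does not update the parent's keys or propagate changes upward, and the function name `delete_from_leaf` and the names Peter, Sam and Tom are invented for the example.

```python
MIN_KEYS = 2  # order-three tree from the lecture: a node holds 2 or 3 keys
MAX_KEYS = 3


def delete_from_leaf(leaf, right, key):
    """Delete key from leaf, then repair an underflow using the right sibling.

    leaf and right are sorted lists of keys for two adjacent leaves.
    Returns (action, leaves_after). Parent key updates are NOT modeled.
    """
    remaining = [k for k in leaf if k != key]
    if len(remaining) >= MIN_KEYS:
        return "ok", [remaining, right]
    if len(right) > MIN_KEYS:
        # The sibling can spare a key: move its smallest key over.
        return "borrow", [remaining + [right[0]], right[1:]]
    # The sibling is at the minimum: combine both leaves into one.
    merged = remaining + right
    assert len(merged) <= MAX_KEYS
    return "merge", [merged]


# Sibling has three keys, so Richard can move over and the tree keeps its shape.
print(delete_from_leaf(["Otto", "Peter"], ["Richard", "Sam", "Tom"], "Otto"))

# Sibling has only two keys, so the leaves merge and the parent loses a pointer.
print(delete_from_leaf(["Otto", "Peter"], ["Richard", "Sam"], "Otto"))
```

In the merge case the parent ends up with one pointer fewer, which is how the repair can cascade toward the root and shrink the tree at the top.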