1
00:00:00,000 --> 00:00:14,759
Hey guys and welcome back.

2
00:00:14,759 --> 00:00:20,960
So in the previous nugget we explored some solutions that we could use in order to monitor

3
00:00:20,960 --> 00:00:25,760
our CPU utilization as well as our memory utilization.

4
00:00:25,760 --> 00:00:29,679
Now in this nugget right here we're going to be looking at some additional tools that

5
00:00:29,679 --> 00:00:38,760
we can use with respect to monitoring, and we'll also focus on disk I/O, that is, disk input/output

6
00:00:38,760 --> 00:00:39,760
monitoring.

7
00:00:39,760 --> 00:00:43,960
Now the first command which I just wanted to touch upon is the uptime command. So if

8
00:00:43,960 --> 00:00:48,799
we go into the man page here we can see here this is actually going to allow us to see

9
00:00:48,799 --> 00:00:52,520
how long our system has been running.

10
00:00:52,520 --> 00:00:55,440
It's also going to tell us our load averages.

11
00:00:55,440 --> 00:01:01,840
Now we're going to see three different values within our output with the uptime command

12
00:01:01,840 --> 00:01:07,600
and these three different values are going to correlate to three different time spans.

13
00:01:07,600 --> 00:01:14,400
The first one is one minute, the second one is five minutes and the last value correlates

14
00:01:14,400 --> 00:01:18,680
to the system load average over the last 15 minutes.

15
00:01:18,680 --> 00:01:25,080
Now when we're actually talking about the system load averages right here it says here

16
00:01:25,080 --> 00:01:32,240
this is the average number of processes that are either runnable or in an uninterruptible

17
00:01:32,240 --> 00:01:33,240
state.

18
00:01:33,240 --> 00:01:37,040
Now ultimately what we're really talking about here is CPU usage.

19
00:01:37,040 --> 00:01:44,920
So a runnable state is when the CPU is being used or a process is waiting to use a CPU.

20
00:01:44,920 --> 00:01:51,240
Now with respect to an uninterruptible state this is when the process is waiting for input

21
00:01:51,240 --> 00:01:58,879
output access, i.e. the hard disk may be fully utilized, therefore a particular process has

22
00:01:58,879 --> 00:02:03,599
to wait before it can access the disk in order to execute a particular task.

23
00:02:03,599 --> 00:02:10,000
Now these load averages we are seeing within this uptime command relate to this type of

24
00:02:10,000 --> 00:02:11,400
information right here.

25
00:02:11,400 --> 00:02:17,280
Now one thing to note about the output with respect to the uptime command and let me just

26
00:02:17,280 --> 00:02:19,200
press Q to clear my screen.

27
00:02:19,200 --> 00:02:24,960
Now when we're talking about our three different values of one minute, five minutes and fifteen

28
00:02:24,960 --> 00:02:29,360
minutes we're actually going to see the values denoted a particular way.

29
00:02:29,360 --> 00:02:38,040
Now if we happen to see the value 0.5 what this tells us is that the CPU is ultimately

30
00:02:38,040 --> 00:02:39,040
half used.

31
00:02:39,039 --> 00:02:40,039
Okay?

32
00:02:40,039 --> 00:02:43,039
Kind of makes sense 0.5 half used.

33
00:02:43,039 --> 00:02:50,439
If the CPU is at the value of one that means that it's being fully utilized whereas if

34
00:02:50,439 --> 00:02:58,799
it's at 0.25 then only a quarter of the CPU is being used and again you can see this at

35
00:02:58,799 --> 00:03:00,719
different time intervals.

36
00:03:00,719 --> 00:03:06,280
Now an important point to note is that you might actually see a value such as 1.5;

37
00:03:06,280 --> 00:03:12,000
this is going to tell you that your CPU really is being taxed because whilst it might not

38
00:03:12,000 --> 00:03:19,639
make sense (how can 1.5 of the CPU be used if one is the entire CPU?), what

39
00:03:19,639 --> 00:03:26,400
this is actually indicating to us is that some processes are having to wait and as such

40
00:03:26,400 --> 00:03:29,840
the CPU is in effect being overloaded.

41
00:03:29,840 --> 00:03:35,280
So if we get a value greater than one well we may have problems to address.

42
00:03:35,280 --> 00:03:39,640
Now one thing I just want to point out here is that the values I'm giving you relate

43
00:03:39,640 --> 00:03:43,719
to if you happen to have one CPU available.

44
00:03:43,719 --> 00:03:50,599
If you happen to have let's say for example four CPUs on your system this actually changes

45
00:03:50,599 --> 00:03:51,599
the values.

46
00:03:51,599 --> 00:03:56,639
So think about this if I happen to see the value from the uptime command and the value

47
00:03:56,639 --> 00:03:58,080
is at one.

48
00:03:58,080 --> 00:04:05,840
So 1.0 you may automatically think that the CPU is at full utilization.

49
00:04:05,840 --> 00:04:13,880
In actuality, what this means in the context of having four CPUs is that only 25% of the CPU capacity is

50
00:04:13,880 --> 00:04:19,000
being used. Ultimately what you're doing is you're dividing this number here by the number

51
00:04:19,000 --> 00:04:27,840
of CPUs, so one divided by four CPUs is 0.25, so only 25% is being used.

52
00:04:27,839 --> 00:04:35,319
Again, if we happen to see something like, say, 0.5 being used over four CPUs.

53
00:04:35,319 --> 00:04:42,479
The CPUs are actually not at half utilization; instead it's going to be 0.5 divided by four,

54
00:04:42,479 --> 00:04:48,719
so this is only going to come out as 12.5% utilization.
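The divide-by-the-number-of-CPUs arithmetic described here can be sketched in Python. This is only an illustration and is not shown in the video: the helper name per_cpu_load is made up for this example, and the live part assumes a Unix-like system where the standard library's os.getloadavg() is available.

```python
import os


def per_cpu_load(load_average, cpu_count):
    """Divide a load average by the number of CPUs (hypothetical helper)."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


if __name__ == "__main__":
    # The two worked examples from the narration, both on four CPUs.
    print(per_cpu_load(1.0, 4))  # 0.25
    print(per_cpu_load(0.5, 4))  # 0.125

    # The same three values that uptime prints, normalised per CPU.
    cpus = os.cpu_count() or 1
    try:
        for minutes, load in zip((1, 5, 15), os.getloadavg()):
            print(f"{minutes:>2} min: load {load:.2f}, per CPU {per_cpu_load(load, cpus):.2f}")
    except OSError:
        print("load averages are not available on this platform")
```

A per-CPU figure near or above 1.0 is the case the narration describes as processes having to wait.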

55
00:04:48,719 --> 00:04:54,719
If we actually happen to run this command here uptime we can see these values right

56
00:04:54,720 --> 00:05:04,600
here: 0.09, 0.07 and 0.01. Again, these are values over one minute in this case here, five minutes

57
00:05:04,600 --> 00:05:08,360
in this case and fifteen minutes in this case.

58
00:05:08,360 --> 00:05:13,320
So the uptime command really is a valuable command that we can use to assess these types

59
00:05:13,320 --> 00:05:18,200
of values it really does give us insight into our system.

60
00:05:18,199 --> 00:05:25,879
Now the next thing I want to talk to you about is relating to I/O wait, i.e. input/output

61
00:05:25,879 --> 00:05:28,039
wait. So where am I going with this?

62
00:05:28,039 --> 00:05:36,560
So what I/O wait actually is is ultimately the percentage of time the CPU spends idle, waiting

63
00:05:36,560 --> 00:05:43,560
for disk access. So, like I happened to say when we talked about I/O waiting, if you had, maybe,

64
00:05:43,560 --> 00:05:50,439
say a program perhaps it might be a browser and your system is already using 100% of the

65
00:05:50,439 --> 00:05:57,079
disk then your browser which needs disk access in order to perform its actual task this process

66
00:05:57,079 --> 00:06:01,840
i.e. the process being used for the browser opening this is actually going to go into

67
00:06:01,840 --> 00:06:09,720
a state of I/O blocked: input/output has been blocked. In real terms what this means for

68
00:06:09,720 --> 00:06:16,440
us as the user is that the system is overloaded at least in respect to the disk utilization

69
00:06:16,440 --> 00:06:21,560
so when you try to open that browser you double click that icon the process might actually

70
00:06:21,560 --> 00:06:26,500
hang i.e. the browser is just not going to pop up on your screen you might be waiting

71
00:06:26,500 --> 00:06:32,560
a while because the process really does need to wait until the resource is free. And here

72
00:06:32,560 --> 00:06:40,319
is the thing: the percentage of time the CPU sits idle while processes wait on the disk is referred

73
00:06:40,319 --> 00:06:47,519
to as the I/O wait percentage. Now one of the ways we can interrogate this type of information

74
00:06:47,519 --> 00:06:52,439
comes from my favorite command we just talked about it in the previous nugget that happens

75
00:06:52,439 --> 00:06:57,600
to be the top command let's try and check this out and see the information we can actually

76
00:06:57,600 --> 00:07:02,760
derive so what i'll do here is i'll just clear my screen right now and i'll use that

77
00:07:02,760 --> 00:07:09,560
good old command top so check this out there is a particular piece of information we can

78
00:07:09,560 --> 00:07:15,760
focus in on relating to our I/O wait. If you can spot it here, it's this one right here,

79
00:07:15,760 --> 00:07:20,760
which is in fact let me get my drawing pen because it's not showing quite so well this

80
00:07:20,759 --> 00:07:29,240
one right here, wa. So this number is ultimately a percentage and right now it is at 0.0. That

81
00:07:29,240 --> 00:07:36,519
is good, meaning the CPU isn't sitting idle waiting on I/O. So we know we actually

82
00:07:36,519 --> 00:07:43,159
have a tool downloaded on our system which we can use to stress our system to simulate

83
00:07:43,159 --> 00:07:48,399
this type of problem. That command is going to be the stress command. So what I will do

84
00:07:48,439 --> 00:07:55,000
actually is i'll go to my options and i will open a new terminal so we have a new terminal

85
00:07:55,000 --> 00:08:00,279
right here. What I'm going to do is use this stress command. If I happen to say man stress,

86
00:08:00,279 --> 00:08:07,319
we can see we have the option of -d or --hdd. This is going to spawn a number

87
00:08:07,319 --> 00:08:13,439
of workers spinning on write(). Basically this is going to allow us to simulate a lot of

88
00:08:13,439 --> 00:08:19,079
disk utilization so what i'll do here then is i'll press q to quit to clear the screen

89
00:08:19,079 --> 00:08:26,319
and I will say stress --hdd, and I'll choose the number one for one hard drive worker.

90
00:08:26,319 --> 00:08:31,759
if i hit enter in fact before i actually do this let me minimize this keep an eye on this

91
00:08:31,759 --> 00:08:40,320
value here, the wa value. Okay, so if I hit enter now, that is going to actually spin up

92
00:08:40,320 --> 00:08:46,040
a process and you can automatically see or rather instantly see should i say the waiting

93
00:08:46,040 --> 00:08:50,600
value the percentage is actually beginning to spike that is due to the fact that we're

94
00:08:50,600 --> 00:08:56,360
actually stressing our hard disk so to speak we are really using that resource and as such

95
00:08:56,360 --> 00:09:01,600
if i actually try to go and maybe say open a browser like here it might take a little

96
00:09:01,600 --> 00:09:07,160
bit of time to actually open up it's not going to be quite so snappy that is because a lot

97
00:09:07,159 --> 00:09:12,000
of my processes are actually having to wait right now they're not getting access to the

98
00:09:12,000 --> 00:09:17,039
disk therefore they can't operate as they ordinarily would so right now we can see here

99
00:09:17,039 --> 00:09:22,360
we're still waiting on this browser and waiting and waiting and waiting sadly our browser

100
00:09:22,360 --> 00:09:27,759
is kind of at the back of the queue. It can't access this resource, which is being hogged. So if

101
00:09:27,759 --> 00:09:34,079
I actually just stop stress by pressing Ctrl+C now, straight away the browser is going to

102
00:09:34,080 --> 00:09:39,720
load up because we are no longer overloading the hard disk therefore the process could access

103
00:09:39,720 --> 00:09:44,639
the resource and lo and behold we could open up the browser but again think about what

104
00:09:44,639 --> 00:09:49,400
we're actually seeing here in this case we had a problem whereby our browser was very

105
00:09:49,400 --> 00:09:54,160
very slow to load. If we had just happened to be looking at the top command, we would

106
00:09:54,160 --> 00:10:01,759
have seen that the I/O wait value was around 80% or 90%, and we could actually tell or infer

107
00:10:01,759 --> 00:10:06,879
that the hard disk was being particularly taxed so straight away with our troubleshooting

108
00:10:06,879 --> 00:10:12,439
mind on we would have quite a good idea of where to go to solve that problem or the type

109
00:10:12,439 --> 00:10:18,840
of action we would need to take in order to solve that problem now honestly as helpful

110
00:10:18,840 --> 00:10:25,120
as top is, and it is very helpful, with respect to input/output it is a little bit limited.

111
00:10:25,120 --> 00:10:30,960
so what i want to do is to actually show you some additional tools that we can use if i

112
00:10:30,960 --> 00:10:36,840
press q to quit i'll just close this terminal for now and i'll also close my browser for

113
00:10:36,840 --> 00:10:42,240
now. I'll clear the screen, and what I'm going to do is install something called iotop.

114
00:10:42,240 --> 00:10:50,080
This is like top but for input/output. So I'll say sudo apt install iotop,

115
00:10:50,080 --> 00:10:54,960
hit enter type in the passwords this is going to begin installation just give it a little

116
00:10:54,960 --> 00:11:01,759
minute or so. If I try to run the command iotop, watch what happens: operation not permitted.

117
00:11:01,759 --> 00:11:06,600
you can probably guess where i'm going with this the way to actually use this command

118
00:11:06,600 --> 00:11:12,160
requires superuser privileges. So clear the screen and I'll say sudo iotop and I will

119
00:11:12,160 --> 00:11:17,800
hit enter so what we're seeing is top like information but we are really focusing on

120
00:11:17,800 --> 00:11:24,200
this right here, the I/O utilization, and what processes happen to be using a lot of disk.

121
00:11:24,759 --> 00:11:29,240
so what i can actually do here is i can go and open up a new terminal and we'll do a similar

122
00:11:29,240 --> 00:11:35,280
thing to what we saw before we'll use the stress command to actually put some stress on the

123
00:11:35,280 --> 00:11:41,720
disk. So I'll say stress --hdd and I'll just say one, and now the stress command

124
00:11:41,720 --> 00:11:47,080
should actually be putting some pretty hard work on the I/O. But look at this, we're actually

125
00:11:47,080 --> 00:11:51,640
getting to see the input output and getting to see what is taking up a lot of that so

126
00:11:51,639 --> 00:11:56,879
we can see stress --hdd 1 taking up quite a lot of the resources. And again, if we go back

127
00:11:56,879 --> 00:12:02,960
here, if I just press Ctrl+C to stop this, instantly that's going to come back down, and the

128
00:12:02,960 --> 00:12:09,519
iotop command shows us that. So a really valuable tool right here, as opposed to just using the

129
00:12:09,519 --> 00:12:15,679
regular top command now what i will do here is i'll press q to quit this clear the screen

130
00:12:15,679 --> 00:12:22,279
I'll now say sudo apt install, and the package I want to install is called sysstat. If I

131
00:12:22,279 --> 00:12:27,120
hit enter here, this is going to install the sysstat package. Just give it a little minute

132
00:12:27,120 --> 00:12:33,120
or so. So now I should have access to a command called iostat. If I go into the man page here,

133
00:12:33,120 --> 00:12:39,519
this is going to report CPU statistics and input/output statistics for devices, so this is going to give us, again,

134
00:12:39,519 --> 00:12:46,199
just like with iotop, I/O-related information and statistics. Now this is, in my opinion, not

135
00:12:46,199 --> 00:12:52,759
quite as good as iotop, but nevertheless the examination does mention iostat within

136
00:12:52,759 --> 00:12:58,159
the exam objectives so it does actually make sense that we are actually familiar with it

137
00:12:58,159 --> 00:13:02,439
So clear the screen and I'll just say iostat, and what this is ultimately doing is giving

138
00:13:02,439 --> 00:13:07,399
us a snapshot so we're getting to see a lot of information about data being written to

139
00:13:07,399 --> 00:13:12,240
disk and read from disk this is ultimately what is taxing the particular disk we can

140
00:13:12,240 --> 00:13:18,799
see the I/O wait percentage right here, which in this case is 1.77, quite low. But like I

141
00:13:18,799 --> 00:13:24,299
say, this is really just a snapshot, averaged since boot, whereas if you recall the iotop command was giving

142
00:13:24,299 --> 00:13:30,199
us real time updates and much more expansive so i can clear the screen another command

143
00:13:30,199 --> 00:13:35,879
we want to be able to be familiar with, again not quite as good as just using iotop, but

144
00:13:35,960 --> 00:13:41,840
the exam does require we are familiar with it. If we go into the man page for the sar command,

145
00:13:41,840 --> 00:13:48,000
this is going to allow you to collect, report or save system activity information. So you

146
00:13:48,000 --> 00:13:53,439
can scroll right through this man page and you will see plenty options and plenty descriptions

147
00:13:53,439 --> 00:13:58,039
of what this command can actually do there is quite a lot here for now i'll press q

148
00:13:58,039 --> 00:14:04,360
what i'm going to do is i'm just going to say sar 1 and then 5 so what this command

149
00:14:04,360 --> 00:14:11,440
is going to do it's going to allow us to take 5 different data points 1 second apart so this

150
00:14:11,440 --> 00:14:17,279
is quite good to check how things progress over a time frame in this case here 5 seconds

151
00:14:17,279 --> 00:14:23,639
because we have 5 recordings, one every second. So it's a little bit better of a snapshot

152
00:14:23,639 --> 00:14:28,680
than iostat gives us, we have more data points for example, but it's not quite, like

153
00:14:28,759 --> 00:14:34,839
I say, as good as iotop giving us that streaming information and continually updating. But

154
00:14:34,839 --> 00:14:39,799
nevertheless let's hit enter and see how we go we're going to get 5 pieces of information so we

155
00:14:39,799 --> 00:14:48,039
start at 27 seconds we get our first data point at 28 that's number one then two three four and

156
00:14:48,039 --> 00:14:53,879
five so five data points one second apart we can see the percentage so we get statistics over a

157
00:14:53,960 --> 00:15:00,120
whole bunch of different types of information such as system or idle but really the one we are

158
00:15:00,120 --> 00:15:07,320
focusing on right now is I/O wait. As it transpires, we are not actually taxing the I/O, so the values

159
00:15:07,320 --> 00:15:13,960
right here all happen to be zero but as you can expect we could like we say actually stress the

160
00:15:13,960 --> 00:15:20,840
HDD once again, and we can rerun this command. By hitting enter we get five different values right

161
00:15:20,920 --> 00:15:28,040
here and at the very bottom we get the average of those five values and again we could change

162
00:15:28,040 --> 00:15:35,480
these values to be maybe two and three meaning we will get three data points two seconds apart if

163
00:15:35,480 --> 00:15:41,560
we hit enter we can see here the actual times the first one on zero zero the next one on zero two

164
00:15:41,560 --> 00:15:48,040
the next one on zero four and at the bottom we get the average of those three over that time period
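The interval-and-count idea behind sar 2 3 can be sketched in Python as a rough illustration of where an I/O wait percentage comes from. This is an assumption-laden sketch and not how the video does it: the function names are made up for this example, and it assumes the Linux /proc/stat layout in which the fifth counter on the aggregate cpu line is iowait.

```python
import time


def read_cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time between two samples that was spent in iowait."""
    # Field order: user nice system idle iowait irq softirq steal ...
    # Only the first eight are summed; guest time is already counted in user.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


def sample_iowait(interval, count, path="/proc/stat"):
    """Take `count` readings, `interval` seconds apart (Linux only)."""
    with open(path) as stat_file:
        previous = read_cpu_times(stat_file.read())
    readings = []
    for _ in range(count):
        time.sleep(interval)
        with open(path) as stat_file:
            current = read_cpu_times(stat_file.read())
        readings.append(iowait_percent(previous, current))
        previous = current
    return readings


if __name__ == "__main__":
    # Two made-up snapshots, so the arithmetic can be checked by hand.
    first = read_cpu_times("cpu  100 0 50 800 50 0 0 0 0 0")
    second = read_cpu_times("cpu  150 0 70 1500 280 0 0 0 0 0")
    print(iowait_percent(first, second))  # 23.0
```

On a Linux machine, sample_iowait(2, 3) would return three readings two seconds apart, and averaging that list gives a figure comparable to the Average line at the bottom of the sar output.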

165
00:15:48,039 --> 00:15:53,399
so again another really cool command to be aware of now what i'll do here is i'll just press control

166
00:15:53,399 --> 00:15:58,759
c to stop this stress test in fact let me just close this terminal right here and i'll clear the

167
00:15:58,759 --> 00:16:05,799
screen. Now another command I want to show you is one called lsof. This is going to allow us to list

168
00:16:05,799 --> 00:16:12,039
our open files. This should be one you will recall from the LPIC-1 examination, no doubt, and like I say,

169
00:16:12,039 --> 00:16:18,679
it's going to allow us to track the open files on your system, that being all the open files. Now if we

170
00:16:18,679 --> 00:16:24,279
just happen to use this command on its own, lsof, we're going to be bombarded with an absolute

171
00:16:24,279 --> 00:16:31,240
megaton of information just far too much to actually read but we can do things such as filtering to

172
00:16:31,240 --> 00:16:38,679
make this information much more palatable so what i'll do actually let me just quickly open terminal

173
00:16:38,679 --> 00:16:43,959
and we'll just rerun this stress command what i can actually do here is i can run this command

174
00:16:43,959 --> 00:16:50,039
okay that would just show me absolutely everything but what if i happen to grep for stress itself

175
00:16:50,039 --> 00:16:56,919
now we can actually see here what files are being opened by the stress program so if we happen to

176
00:16:57,479 --> 00:17:04,200
note a particular process or a particular command is causing a lot of I/O issues, we can actually

177
00:17:04,200 --> 00:17:11,080
correlate what actual files are being opened, are being used, by that particular command or process.
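As a rough Python counterpart to filtering lsof output for one program, the sketch below walks the Linux /proc filesystem directly. It is an illustration only: the function names are invented for this example, it matches the command name exactly rather than grepping every line, and it lists only file descriptors, whereas lsof also reports entries such as the working directory and mapped libraries. lsof itself also offers the -c option to select by command name.

```python
import os


def pids_for_command(name):
    """Return PIDs whose command name in /proc/<pid>/comm equals `name`."""
    try:
        entries = os.listdir("/proc")
    except OSError:
        return []  # no /proc here, so this is not a Linux system
    pids = []
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as comm_file:
                if comm_file.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited or is not readable
    return pids


def open_files(pid):
    """Return the targets of a process's open file descriptors."""
    fd_dir = f"/proc/{pid}/fd"
    try:
        descriptors = os.listdir(fd_dir)
    except OSError:
        return []  # no such process, or permission denied
    targets = []
    for fd in descriptors:
        try:
            targets.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            continue  # descriptor was closed in the meantime
    return targets


if __name__ == "__main__":
    # Show what each running stress worker currently has open.
    for pid in pids_for_command("stress"):
        for target in open_files(pid):
            print(pid, target)
```

Inspecting processes owned by other users generally needs superuser privileges, much as iotop did earlier.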

178
00:17:11,080 --> 00:17:16,600
so this just gives us more granular information that we can use to interrogate what is exactly going

179
00:17:16,600 --> 00:17:23,000
on now one point i should actually add and in fact again let me just stop this right here one point

180
00:17:23,000 --> 00:17:29,160
i should add is that with respect to disk i o problems one of the ways of course we can stop

181
00:17:29,160 --> 00:17:36,200
such a thing is by not running so many processes that are actually requiring that resource but as

182
00:17:36,200 --> 00:17:42,040
you know in the world of enterprise computing it's not quite so easy just to make sure that everyone

183
00:17:42,040 --> 00:17:48,759
is giving up processes that they might actually need, in which case we may need better solutions altogether.

184
00:17:48,759 --> 00:17:56,040
Now one of the underlying problems may be that the writing and reading of the hard disk is kind of slow and

185
00:17:56,039 --> 00:18:01,319
inefficient and this can actually cause things to be bogged down because the longer it takes a

186
00:18:01,319 --> 00:18:07,480
particular process to do its writes and reads, the longer it's going to engage that resource, i.e. the more

187
00:18:07,480 --> 00:18:14,039
of a backlog on that resource is going to happen. One of the best ways you can actually ease this

188
00:18:14,039 --> 00:18:21,159
type of problem is by changing things from a hard disk to an SSD, a solid-state drive. This has got

189
00:18:21,240 --> 00:18:27,720
far far far faster and more efficient read and write operations and this will actually alleviate

190
00:18:27,720 --> 00:18:33,480
a lot of the problems that you're seeing, if you do happen to be encountering a lot of I/O wait

191
00:18:33,480 --> 00:18:39,640
on your system. Okey-doke, so that is us for our introduction to being able to monitor our disk

192
00:18:39,640 --> 00:18:45,720
I/O utilization. Like I say, we saw a handful of different commands, some more useful than others,

193
00:18:45,720 --> 00:18:50,920
but as we know in these exams very often you still have to be able to be familiar with

194
00:18:51,160 --> 00:18:58,759
perhaps some less efficient and older commands, such as what we saw with respect to the iostat

195
00:18:58,759 --> 00:19:04,200
command, for example. So really what I'm saying is don't just skip over the commands that may not be so

196
00:19:04,200 --> 00:19:10,440
obviously useful definitely dig in and be familiar with them at least just for the purposes of the

197
00:19:10,440 --> 00:19:15,880
examination. Okey-doke, so that is us for monitoring disk input/output. The next thing we have to tackle is

198
00:19:16,200 --> 00:19:22,200
network input output and how we can actually monitor that too and well that is what we're

199
00:19:22,200 --> 00:19:27,160
going to be doing in the very next nugget. I hope this has been informative for you. I'd like to thank

200
00:19:27,160 --> 00:19:28,280
you for viewing.