1
00:00:00,000 --> 00:00:18,160
Hey guys and welcome back. So in the previous nugget we discussed the general concept around

2
00:00:18,160 --> 00:00:24,719
backups, we also discussed the type of directories that we may or may not want to target for

3
00:00:24,719 --> 00:00:30,640
our backups and now what I want to do is to continue further down this path of discussing

4
00:00:30,640 --> 00:00:37,200
backup strategies. So what I first want to talk about are the different types of backups that we

5
00:00:37,200 --> 00:00:44,159
can actually do. Now the very first type that we can do is something called a full backup. This name

6
00:00:44,159 --> 00:00:51,439
is rather self-explanatory. Simply put when we conduct a full backup of our data what it's going

7
00:00:51,439 --> 00:01:00,000
to do is copy the entire data set. So absolutely everything that you have specified

8
00:01:00,000 --> 00:01:06,159
should be backed up will be backed up. So the advantage of this is that it's going to be the

9
00:01:06,159 --> 00:01:12,400
most robust in terms of backing up your data, i.e. it's going to provide the best protection for

10
00:01:12,400 --> 00:01:19,439
your data because absolutely everything is going to be copied. But the reality is this is not always

11
00:01:19,439 --> 00:01:26,480
used. So the question may be why would you not want to invoke this type of backup? Well simply put

12
00:01:26,480 --> 00:01:33,359
because when you conduct a full backup this is ultimately going to take up a lot of storage.

13
00:01:33,359 --> 00:01:39,759
Think about it: if you're copying all the data all the time, that data has to be stored somewhere,

14
00:01:39,759 --> 00:01:44,719
and the more data you're copying, the more storage you are going to need, and it is for this reason

15
00:01:44,719 --> 00:01:51,359
that most organizations will not rely on a full backup as their sole means of a backup strategy.

16
00:01:51,359 --> 00:01:58,640
And by the way, it is not just the storage space that is a problem when dealing with a full backup;

17
00:01:58,640 --> 00:02:05,039
you can imagine that the more data you have to copy, the more time-consuming that is; therefore the

18
00:02:05,039 --> 00:02:11,840
process indeed does take a lot longer. So if you happen to be an organization that prioritizes time

19
00:02:11,840 --> 00:02:17,280
and efficiency, and you don't necessarily have unlimited storage, you're not exactly someone

20
00:02:17,280 --> 00:02:26,400
like, say, Google, you may instead rely on something called incremental backups. Now, incremental backups

21
00:02:26,400 --> 00:02:34,240
were designed to reduce the time it takes to perform a backup, as well as reduce

22
00:02:34,240 --> 00:02:40,640
the amount of storage needed to perform the backup. So that sounds like a big win: we're getting

23
00:02:40,639 --> 00:02:47,039
our data backed up quicker and using less disk space. How on earth is this actually implemented?

24
00:02:47,039 --> 00:02:53,279
Well, the reality is, when we are talking about an incremental backup, what we're doing is we are

25
00:02:53,279 --> 00:03:01,199
backing up data that has changed since the last backup. So think about this: imagine that you have

26
00:03:01,199 --> 00:03:08,000
regular backups, so really what you're going to do is a combination of a full backup and

27
00:03:08,000 --> 00:03:14,319
an incremental backup. So let's say, on the very first day of the week, that would be

28
00:03:14,319 --> 00:03:20,240
maybe, say, Monday. On Monday the organization decides to do a full backup. That means, like we

29
00:03:20,240 --> 00:03:28,000
say, absolutely everything is going to be copied and carried over onto some type of storage, maybe,

30
00:03:28,000 --> 00:03:34,159
say, an external server. So a full backup is done; however, we don't want to repeat that process on

31
00:03:34,159 --> 00:03:40,400
a Tuesday because again it's going to take a long time and it's going to take up a lot of storage.

32
00:03:40,400 --> 00:03:45,919
So instead on Tuesday we could just do an incremental backup. Similarly you could do the

33
00:03:45,919 --> 00:03:52,319
same thing on Wednesday, an incremental backup, and the reality is you could do the same on Thursday

34
00:03:52,319 --> 00:03:58,319
and Friday. So think about it: on the very first day we're copying all the data,

35
00:03:58,319 --> 00:04:04,479
and on the next day, if the data has not changed from the day before, we don't actually have to

36
00:04:04,479 --> 00:04:11,039
copy anything. There is no backup necessary because the data is exactly the same as when we backed it up

37
00:04:11,039 --> 00:04:20,399
on Monday. However, let's say on day three we happen to add file 1.txt, just one additional file. Obviously

38
00:04:20,399 --> 00:04:26,000
this is not the most realistic example, but just play along. So indeed, with respect to this backup

39
00:04:26,079 --> 00:04:32,959
here the only thing that we have to copy is this file because this file is the only thing that has

40
00:04:32,959 --> 00:04:40,720
changed since the last backup was performed. Now, this is actually more efficient with respect

41
00:04:40,720 --> 00:04:47,040
to backing up data because, like I say, we just have to back up those changes, and if those changes

42
00:04:47,040 --> 00:04:52,800
are very minimal, then that is going to be a very speedy process. The downside here, though, is that

43
00:04:52,800 --> 00:05:00,160
it can actually take a long time to restore our data. Think about it: what if you wanted to

44
00:05:00,160 --> 00:05:07,280
restore data from Wednesday? Well, you couldn't just restore Wednesday on its own, because that would

45
00:05:07,280 --> 00:05:14,639
only contain the changes that were made on Wednesday, say, for example, that file 1.txt. That

46
00:05:14,639 --> 00:05:20,240
is clearly not a full backup. Instead, we would have to restore whatever it is on Wednesday, and then

47
00:05:20,240 --> 00:05:25,840
we would have to restore whatever it was on Tuesday, and then we would have to restore the full backup

48
00:05:25,840 --> 00:05:32,400
on Monday. So really we have to do multiple restores from multiple different points and build

49
00:05:32,400 --> 00:05:39,199
that data up gradually, depending on when it is you actually want to restore from. Also, if you happen

50
00:05:39,199 --> 00:05:46,160
to have some type of damaged media, say, for example, the disk that you're copying to is damaged, you may

51
00:05:46,160 --> 00:05:52,880
only have some type of incomplete data recovery, i.e. you do not get to restore all of the data.
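
A minimal sketch of that restore chain, assuming each backup is modelled as a plain Python dictionary of file name to contents. The names restore_chain, monday_full, tuesday_inc and wednesday_inc are hypothetical, and deleted files are not handled here.

```python
def restore_chain(full_backup, incrementals):
    """Rebuild a file set from a full backup plus incrementals, oldest first."""
    restored = dict(full_backup)       # start from the complete copy
    for increment in incrementals:     # layer each day's changes on top
        restored.update(increment)
    return restored


# Hypothetical backups for the Monday-to-Wednesday example.
monday_full = {"report.txt": "v1", "notes.txt": "v1"}
tuesday_inc = {}                        # nothing changed on Tuesday
wednesday_inc = {"file1.txt": "v1"}     # only the newly added file

wednesday_state = restore_chain(monday_full, [tuesday_inc, wednesday_inc])
print(sorted(wednesday_state))
```

The full backup goes in first and each incremental is layered on top, oldest to newest, so losing any one of them leaves a gap in the chain.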

52
00:05:52,880 --> 00:06:01,280
Now, like I say, we have the full backup, we have the incremental backup, and we also have one called a

53
00:06:01,280 --> 00:06:07,760
differential backup. So what exactly is a differential backup? Well a differential backup is

54
00:06:07,760 --> 00:06:15,040
really quite similar to an incremental backup. The main difference here is that when you use the

55
00:06:15,040 --> 00:06:22,319
incremental backup method, as we just saw, what we are actually copying is only the data that has

56
00:06:22,319 --> 00:06:28,879
changed since the last backup. So think about it, to go back to our example: on Monday we do our full

57
00:06:28,879 --> 00:06:36,320
backup. On Tuesday, let's say we add file 1.txt, so when we do our backup this is the only file we

58
00:06:36,319 --> 00:06:46,480
have to back up. Then on Wednesday we add file 2.txt, so all we have to do now is to copy and back up

59
00:06:46,480 --> 00:06:52,319
file 2, because that is the only change since the last backup, i.e. the one on Tuesday.
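
To make that concrete, here is a rough sketch of how an incremental backup could pick its files, assuming we compare each file's modification time against the time of the previous backup. The function changed_since and the day-number timestamps are made up for illustration; real backup tools track changes in their own ways.

```python
def changed_since(mtimes, last_backup_time):
    """Return the paths modified after the previous backup was taken."""
    return sorted(
        path for path, mtime in mtimes.items() if mtime > last_backup_time
    )


# Hypothetical modification times as day numbers: the Monday full backup
# runs at 1.0, the Tuesday backup at 2.0, the Wednesday backup at 3.0.
tuesday_files = {"report.txt": 0.5, "file1.txt": 1.5}
wednesday_files = {"report.txt": 0.5, "file1.txt": 1.5, "file2.txt": 2.5}

tuesday_inc = changed_since(tuesday_files, last_backup_time=1.0)
wednesday_inc = changed_since(wednesday_files, last_backup_time=2.0)
print(tuesday_inc, wednesday_inc)
```

Each run only looks back as far as the previous backup, which is why Wednesday's incremental leaves out the file that Tuesday already captured.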

60
00:06:52,319 --> 00:06:58,560
Now, with respect to our differential backup strategy, the difference here is that every

61
00:06:58,560 --> 00:07:04,319
single backup we're doing with a differential backup is going to contain all of the data that

62
00:07:04,319 --> 00:07:10,319
was changed since the last full backup. Big difference. So think about this again: we do our

63
00:07:10,319 --> 00:07:18,800
strategy. On Monday we do our full backup. On Tuesday let's just add file 1.txt, so on this day here,

64
00:07:18,800 --> 00:07:26,719
just like with our incremental backup, we're only going to back up the changes

65
00:07:26,719 --> 00:07:31,839
since the last full backup, which in this case here is file 1. Nothing has changed here. However,

66
00:07:31,839 --> 00:07:41,439
if we go to day 3, on Wednesday, and we add in file 2.txt, the differential backup is not just going

67
00:07:41,439 --> 00:07:49,039
to copy file 2. It's going to make a copy of file 2, yes, but it's also going to make a copy of file 1,

68
00:07:49,039 --> 00:07:54,879
because these are still changes that have been made since the last full backup. So on

69
00:07:54,879 --> 00:08:00,799
day 3 here we're actually going to back up file 1 once again, which was already copied yesterday,

70
00:08:00,800 --> 00:08:08,240
as well as file 2. So a little bit different indeed. The advantage here is that we're going to have

71
00:08:08,240 --> 00:08:14,879
a shorter restore time, because all we have to do is pick a day that we want to restore from, and

72
00:08:14,879 --> 00:08:21,280
that's going to have all of the changes since the last full backup in one location, and then we combine

73
00:08:21,280 --> 00:08:27,600
that with the last full backup. Now you might be thinking, well, a differential backup sounds like it's

74
00:08:27,760 --> 00:08:33,279
a much better solution than an incremental backup. Why on earth would you ever want to choose an

75
00:08:33,279 --> 00:08:38,560
incremental backup? Well, even though a differential backup gives us

76
00:08:38,560 --> 00:08:45,200
similar properties and does allow us to restore quicker, the reality is the differential backup uses

77
00:08:45,200 --> 00:08:52,240
more storage. So if you happen to be low on storage, maybe still, despite the longer restore time, the

78
00:08:52,240 --> 00:08:57,680
incremental backup would be a better solution for you, whereas if you have a little bit more

79
00:08:57,680 --> 00:09:04,320
storage to spare, then indeed the differential backup strategy might be the best way to go. And of

80
00:09:04,320 --> 00:09:10,799
course, if you happen to have a ton of available storage and a lot of free time, then indeed you

81
00:09:10,799 --> 00:09:17,759
could just back up absolutely everything with a full backup every single day. But again, this begs the

82
00:09:17,759 --> 00:09:23,439
question: do we have to back up every day? Is that just a given that we have to implement? Well,

83
00:09:23,439 --> 00:09:29,279
the answer to that is no. Of course, what we want to do is use our judgment, so maybe it

84
00:09:29,279 --> 00:09:35,120
makes more sense for your company to back up maybe twice a week or once a week. Really, this is going

85
00:09:35,120 --> 00:09:41,840
to come down to you understanding what it is you want to back up, i.e. what file systems, and thinking

86
00:09:41,840 --> 00:09:48,560
about how often that data actually changes, and not just how often it changes but how much it changes.

87
00:09:48,560 --> 00:09:54,399
If we happen to have a lot of changes that would be hard to restore and recreate if we lost that

88
00:09:54,399 --> 00:10:00,800
data, then it makes sense to implement more frequent backups to protect you in the event of a

89
00:10:00,800 --> 00:10:07,840
catastrophic data loss. However, if your data is relatively unchanging, and even if it does change

90
00:10:07,840 --> 00:10:13,759
it only changes by a small amount, well, that may tell you that you can relax a little bit

91
00:10:13,759 --> 00:10:19,280
and back up a little less frequently. But the reality is that you may have to introduce

92
00:10:19,280 --> 00:10:26,480
a combination of backup strategies. Maybe for some data you want to be doing, say, a full backup

93
00:10:27,040 --> 00:10:32,639
regularly because it's super valuable, it's always changing, and you cannot afford to lose it. But there

94
00:10:32,639 --> 00:10:37,840
may be other disks on your systems, other partitions, whatever it may be, other file systems;

95
00:10:37,840 --> 00:10:43,759
maybe you want to back up that data a little less frequently. And again, there is no absolute, perfect

96
00:10:43,759 --> 00:10:49,679
golden rule here. This is again one of the cases where we want to understand the variables that

97
00:10:49,679 --> 00:10:56,799
are at play here and use our best judgment to inform our decision. Now, one final component we want to talk

98
00:10:56,799 --> 00:11:03,839
about is where exactly we should store this data. Does it just go to the cloud and disappear into

99
00:11:03,839 --> 00:11:09,120
the sky? Well, to be honest, there are different solutions that we can use, and we have seen

100
00:11:09,120 --> 00:11:14,959
different solutions historically, and we have to understand what those different solutions are. So

101
00:11:14,959 --> 00:11:22,319
one of the original solutions, the OG, would be magnetic tape. Okay, now one of the advantages

102
00:11:22,320 --> 00:11:28,320
of using magnetic tape is that it is fairly cheap to use, so it's quite cost-effective in that

103
00:11:28,320 --> 00:11:34,400
sense. Similarly, you also get fairly good storage capacity with this media: it's cheap, you

104
00:11:34,400 --> 00:11:40,879
get a lot of it, you can store a lot of data. You can also introduce fairly good security on this media,

105
00:11:40,879 --> 00:11:47,120
and it is generally quite reliable to use. Now, the problem here with tape is that you have fairly

106
00:11:47,200 --> 00:11:53,440
slow data access, so it's not exactly going to be like an SSD, where you can get quick reads and

107
00:11:53,440 --> 00:12:00,000
writes. And quite honestly, when you're using magnetic tape, it does require a lot of maintenance

108
00:12:00,000 --> 00:12:05,679
just by the physical nature of the media itself. So as an aside, with respect to magnetic tape, we

109
00:12:05,679 --> 00:12:11,279
know we have the command tar. This is where the name of this utility actually comes from: it comes

110
00:12:11,279 --> 00:12:18,240
from tape archive, so this was the media in mind when this utility was first developed. Now, as an

111
00:12:18,240 --> 00:12:24,319
aside, alongside the tar command we also have a command called mt. Let me show you what

112
00:12:24,319 --> 00:12:31,199
this is, and we can see here this command is going to allow us to control magnetic tape drive operation,

113
00:12:31,199 --> 00:12:37,679
so if you happen to be working with magnetic tape, this is the Linux-based tool that you would want

114
00:12:37,679 --> 00:12:44,239
to use in order to control this media. Okay, so we have magnetic tape here. The next one we can talk

115
00:12:44,239 --> 00:12:51,919
about is optical storage. This would be something like, say, a CD-ROM. The advantage of optical media

116
00:12:51,919 --> 00:12:57,120
is that it's fairly fast. It's not the fastest, of course, not like an SSD, but it still gives you

117
00:12:57,120 --> 00:13:03,519
fairly good speed. It's also fairly cheap to buy, so once again this is going to be a winner with

118
00:13:03,519 --> 00:13:09,919
respect to enterprises that don't want to spend a lot of money on storage. It's also easy to obtain;

119
00:13:09,919 --> 00:13:16,480
you can get optical media from pretty much anywhere, it's readily available. And by the way,

120
00:13:16,480 --> 00:13:22,319
the physical media itself, i.e. the discs, don't take up too much room, so you can

121
00:13:22,319 --> 00:13:27,840
actually store a lot of data on these physical discs and not have these physical discs taking up a

122
00:13:27,920 --> 00:13:33,920
massive amount of space within your physical location, say, for example, your office or whatever

123
00:13:33,920 --> 00:13:40,080
it may be. Now, the problem with optical media is that it doesn't have great storage capacity,

124
00:13:40,080 --> 00:13:44,639
so whilst it doesn't actually take up much room physically, it also doesn't allow you to save

125
00:13:44,639 --> 00:13:52,160
that much data relative to other storage media. Also, it has fairly low durability. If you

126
00:13:52,240 --> 00:13:58,799
remember the days of CDs, or you use DVDs, you may remember that you can easily scratch these discs,

127
00:13:58,799 --> 00:14:05,039
and if you scratch these discs, the disc starts to skip because the data on that disc

128
00:14:05,039 --> 00:14:10,240
has now become corrupted and cannot be correctly read. This is the same problem here

129
00:14:10,240 --> 00:14:16,639
when storing your system data on such media. If you happen to maybe leave the disc out and it

130
00:14:16,639 --> 00:14:23,679
gets scratched, or whatever it may be, the data can quite easily become corrupted and inaccessible,

131
00:14:23,679 --> 00:14:30,240
certainly something to consider. Also, with respect to your optical media, one of the things it may

132
00:14:30,240 --> 00:14:36,399
actually implement is that it can only be written to once, i.e. you're not just going to be able to

133
00:14:36,399 --> 00:14:42,879
easily reuse and overwrite data on this media, which means instead you would have to go out and buy

134
00:14:42,879 --> 00:14:49,840
more blank discs. Also, with respect to our optical drive, it still is not quite the fastest, so

135
00:14:49,840 --> 00:14:56,000
I would still say the speed leaves something to be desired. Next, of course, we have the

136
00:14:56,000 --> 00:15:01,439
physical hard disk. The advantage here is that the hard disk is relatively fast: we can read and

137
00:15:01,439 --> 00:15:07,919
write data fairly quickly on this media. It is also very widely available; it's not hard to get your

138
00:15:07,919 --> 00:15:15,199
hands on physical hard drives. And of course hard drives, especially nowadays, can actually offer you

139
00:15:15,199 --> 00:15:21,759
fairly good storage capacity, so you can store a lot of data on this media. The problem, maybe, with a

140
00:15:21,759 --> 00:15:27,360
hard disk, depending on the type of hard disk, is that you may have a rather sizable hard disk, so

141
00:15:27,360 --> 00:15:32,799
it's not the most portable to take about with you. It's not like, say, for example, a USB thumb drive,

142
00:15:32,799 --> 00:15:40,159
and hard drives can also sometimes lack durability, i.e. if you're not careful, the data itself can

143
00:15:40,159 --> 00:15:47,039
easily become corrupted, which is of course a big disaster scenario when this contains your backup

144
00:15:47,039 --> 00:15:53,120
data. Now, of course, the other option we could have is some type of cloud storage. The advantage of

145
00:15:53,120 --> 00:15:58,959
cloud storage would be that the availability is excellent. Assuming you have some type of internet

146
00:15:58,960 --> 00:16:06,160
connection, you could be on vacation in Paris and be able to get Wi-Fi access and download your

147
00:16:06,160 --> 00:16:12,720
data remotely. The data in the cloud is also typically very well secured, because you are relying

148
00:16:12,720 --> 00:16:18,960
on these massive corporations, which have very strong legal requirements to sufficiently

149
00:16:18,960 --> 00:16:24,320
protect your data, so they have every incentive to pay good engineers to ensure good security is

150
00:16:24,320 --> 00:16:29,520
in place. Of course, that is not always a given, but generally speaking, good security is a feature of

151
00:16:29,520 --> 00:16:35,040
the cloud. But of course, the problem with the cloud is that, depending on how much you use, the running

152
00:16:35,040 --> 00:16:42,640
costs of storing this data could add up and become significant. So really, all of these different

153
00:16:42,640 --> 00:16:49,040
factors, and the type of backup strategy we want to implement, be it a full backup, a differential backup

154
00:16:49,120 --> 00:16:55,839
or an incremental backup, as well as the actual media that we want to back up to, say, for example,

155
00:16:55,839 --> 00:17:02,159
magnetic tape or, in more modern terms, the cloud, all of these things have to be weighed up and

156
00:17:02,159 --> 00:17:08,879
tailored to suit your own needs. And like so often, there is no absolute golden rule. The best thing

157
00:17:08,879 --> 00:17:15,759
you can do is to understand the pros and the cons and use your best judgment. Now, there are some

158
00:17:15,759 --> 00:17:20,879
additional tools that we have to know for the purposes of the LPIC-2 examination that we can

159
00:17:20,879 --> 00:17:26,079
use directly on our Linux systems. So what exactly are those tools? Well, that's what we're talking about

160
00:17:26,079 --> 00:17:31,599
in the very next nugget. I hope this has been informative for you, and I'd like to thank you for viewing.

