1
00:00:06,639 --> 00:00:09,900
While working with text files, you
are going to like regular expressions.

2
00:00:10,660 --> 00:00:15,789
Regular expressions or regex are text patterns that
are used by tools like grep and others.

3
00:00:16,949 --> 00:00:21,559
When you use regular expressions,
always put them between single quotes.

4
00:00:22,219 --> 00:00:26,434
That is because regular expressions
contain special characters and you want

5
00:00:26,434 --> 00:00:30,649
to prevent these special characters
from being interpreted as shell metacharacters.

6
00:00:31,570 --> 00:00:34,049
In regular expressions you
have these special characters.

7
00:00:35,149 --> 00:00:39,500
They look the same as
wildcard characters, also known as globbing.

8
00:00:40,039 --> 00:00:41,500
Make sure not
to confuse them.

9
00:00:42,060 --> 00:00:48,310
A star in a regular expression means something completely
different from a star in a wildcard pattern.

10
00:00:49,429 --> 00:00:54,725
So that means that you can use a
command like grep 'a*' a*, where the first part identified

11
00:00:54,725 --> 00:01:00,020
by the single quotes is the regular expression
and the second part is the globbing pattern.
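
To make that difference concrete, here is a minimal sketch. The scratch directory and the file names are made up for illustration; they are not part of the demo.

```shell
# Hypothetical scratch directory and files, for illustration only.
demo_dir=$(mktemp -d)
printf 'banana\ncherry\n' > "$demo_dir/apple.txt"
printf 'kiwi\n' > "$demo_dir/avocado.txt"
printf 'plum\n' > "$demo_dir/berry.txt"

# The quoted 'a*' reaches grep untouched as a regex: zero or more a's,
# which matches every line. The unquoted a* is expanded by the shell
# to apple.txt and avocado.txt before grep even starts.
glob_demo=$(grep 'a*' "$demo_dir"/a*)
printf '%s\n' "$glob_demo"
```

All three lines from the two files that start with a are printed, and berry.txt is never read.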

12
00:01:01,679 --> 00:01:05,204
Regular expressions are for use
with specific tools only, and these

13
00:01:05,204 --> 00:01:08,730
are utilities like grep and
vim and awk and sed.

14
00:01:09,390 --> 00:01:14,540
Mainly, regular expressions are for utilities
that are doing something with text files.

15
00:01:16,120 --> 00:01:21,579
For more details, you can consult the
man page in section 7 for regex.

16
00:01:23,319 --> 00:01:29,870
Regular expressions are often considered to be confusing, and
I can understand that. Now, why is that so?

17
00:01:30,450 --> 00:01:34,060
That is because there are basic
regular expressions that work with most tools.

18
00:01:34,579 --> 00:01:37,900
There are also extended regular
expressions that don't always work.

19
00:01:38,859 --> 00:01:42,479
If it's an extended
regular expression, you need

20
00:01:42,479 --> 00:01:46,099
grep to identify it
as an extended regex.

21
00:01:47,239 --> 00:01:51,159
And some scripting languages like Perl come
with their own regular expressions as well.

22
00:01:52,909 --> 00:01:57,736
Now, as a result, you might be
googling for regular expressions and find something

23
00:01:57,736 --> 00:02:02,563
that looks very convenient, but it might
be a Perl regular expression, and that's

24
00:02:02,563 --> 00:02:07,390
not going to work if you
run it with commands like grep.

25
00:02:09,310 --> 00:02:11,590
Now let's talk
about using regular expressions.

26
00:02:12,330 --> 00:02:16,835
Now let me demonstrate regular expressions. For
your convenience, the commands that I'm going

27
00:02:16,835 --> 00:02:21,340
to use are on the slide
as well, so let me show you.

28
00:02:23,219 --> 00:02:29,599
So for these demos, I created a file with the name
regex with a lot of text, which you can see right here.

29
00:02:30,460 --> 00:02:35,460
Now let's demonstrate
some regular expressions starting

30
00:02:35,460 --> 00:02:40,460
with grep '^l'
on regex.

31
00:02:41,780 --> 00:02:45,500
What is that? Well, that is looking
for lines that start with an l.

32
00:02:46,039 --> 00:02:49,555
The caret is what we call an
anchor, and it looks for lines that

33
00:02:49,555 --> 00:02:53,069
start with a pattern. We also have
the opposite, and that is the dollar.

34
00:02:53,650 --> 00:02:57,270
So the dollar is looking for
lines that end with a pattern.

35
00:02:58,289 --> 00:03:00,629
Then we have the
one character regular expression.

36
00:03:01,889 --> 00:03:05,854
So what can we
do? Well, we can

37
00:03:05,854 --> 00:03:09,819
do something like
grep '^.$'.

38
00:03:10,419 --> 00:03:16,870
That means that it is looking for lines
that have one character and only one character.

39
00:03:17,409 --> 00:03:19,430
And as you can see, there
is one line that is matching.
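
The anchors and the single-character dot can be tried out like this. It is only a sketch: the sample file and its contents are invented here and are not the regex file from the demo.

```shell
# Hypothetical sample file, for illustration only.
sample=$(mktemp)
printf 'linux\nall\nx\nregex tool\n' > "$sample"

starts_with_l=$(grep '^l' "$sample")   # caret: line starts with l -> linux
ends_with_l=$(grep 'l$' "$sample")     # dollar: line ends with l -> all, regex tool
one_char=$(grep '^.$' "$sample")       # exactly one character on the line -> x

printf '%s\n' "$starts_with_l" "$ends_with_l" "$one_char"
```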

40
00:03:20,409 --> 00:03:25,270
You can look for words
using grep between single quotes.

41
00:03:27,090 --> 00:03:32,905
\banna\b on the
file, and that is looking

42
00:03:32,905 --> 00:03:38,719
for lines that have the
text anna as a word.

43
00:03:39,259 --> 00:03:45,199
It might be at the start, it might be in the
middle of the line, but for sure it's a word.

44
00:03:45,759 --> 00:03:50,219
Now, as you notice, the \b
followed by the text that you're looking

45
00:03:50,219 --> 00:03:54,680
for is not very clear, but
that's the way it works.
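
A small sketch of the word match, using invented sample lines. Note that \b is an extension, available in GNU grep among others, rather than part of the POSIX basic syntax, so treat its availability as an assumption about your grep.

```shell
# Hypothetical sample file, for illustration only.
sample=$(mktemp)
printf 'anna went home\nsusanna stayed\nhello anna\nbananna\n' > "$sample"

# \b marks a word boundary, so anna must stand on its own as a word.
as_word=$(grep '\banna\b' "$sample")

# Without the boundaries, anna is also found inside longer words.
as_substring=$(grep -c 'anna' "$sample")

printf '%s\n' "$as_word" "$as_substring"
```

Only the first and third lines match as a word, while the plain pattern counts all four lines.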

46
00:03:55,400 --> 00:03:59,909
Then you have the zero
or more times regular expression.

47
00:04:00,409 --> 00:04:03,930
So
grep 'n.*x'

48
00:04:03,930 --> 00:04:07,449
on
my file.

49
00:04:08,819 --> 00:04:14,900
The star here is a modifier of the character
right before. So we are looking for a line

50
00:04:14,900 --> 00:04:20,980
that starts with an n. Then we have zero
or more characters, and at the end of the

51
00:04:20,980 --> 00:04:27,060
line we have an x. My file,
that's regex, and there you can see the result.

52
00:04:27,860 --> 00:04:33,602
Also pretty convenient is
a regular expression for spaces

53
00:04:33,602 --> 00:04:39,345
or tabs. You can
use grep, single quote, dot,

54
00:04:39,345 --> 00:04:45,087
dot star, double square
bracket, colon, space colon double

55
00:04:45,087 --> 00:04:50,829
square bracket, dot dot
star single quote in regex.

56
00:04:51,670 --> 00:04:58,069
Now that looks complicated. How do we interpret it? Well, the
first dot means it should have at least one character.

57
00:04:58,670 --> 00:05:04,504
Then there's a dot star that means followed by
zero or more characters. But then we need a

58
00:05:04,504 --> 00:05:10,339
space or a tab or anything like that. And
that is followed again by at least one character.

59
00:05:11,100 --> 00:05:12,040
And this
is the result.

60
00:05:12,819 --> 00:05:15,639
Sadly, only one result,
but one is enough here.
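
Written out, the pattern that was just dictated looks like the sketch below. The sample lines are invented to show which cases match and which do not.

```shell
# Hypothetical sample file, for illustration only.
sample=$(mktemp)
printf 'two words\nsingleword\n leading\ntrailing \n' > "$sample"

# ..*          at least one character
# [[:space:]]  a space, a tab or other whitespace
# ..*          again at least one character
with_inner_space=$(grep '..*[[:space:]]..*' "$sample")

printf '%s\n' "$with_inner_space"
```

Only "two words" matches: the other lines have no whitespace at all, or nothing before it, or nothing after it.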

61
00:05:16,519 --> 00:05:19,420
So these are examples
of basic regular expressions.

62
00:05:20,560 --> 00:05:22,879
Let's have a look at
the next part of this demo.

63
00:05:27,920 --> 00:05:31,279
In the next example, we
have a few extended regular expressions.

64
00:05:31,850 --> 00:05:35,970
And extended regular expressions need to be
called with grep minus uppercase E.

65
00:05:36,850 --> 00:05:42,050
So I'm going to look
for 'bi+t' on regex. What

66
00:05:42,050 --> 00:05:47,250
is that? Well, the preceding
character needs to occur one or

67
00:05:47,250 --> 00:05:52,449
more times, and as you
can see. Oh, one time.

68
00:05:52,589 --> 00:05:53,709
Well, one
time is enough.

69
00:05:54,790 --> 00:05:59,050
Then we also have the zero or one
time, and that will be a question mark.

70
00:05:59,889 --> 00:06:05,730
So 'bi?t' is showing a match
on bt as well as bit.
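
Here is a short sketch of the plus and the question mark next to each other. The sample words are made up for the example.

```shell
# Hypothetical sample file, for illustration only.
sample=$(mktemp)
printf 'bt\nbit\nbiit\nbat\n' > "$sample"

one_or_more=$(grep -E 'bi+t' "$sample")   # i one or more times  -> bit, biit
zero_or_one=$(grep -E 'bi?t' "$sample")   # i zero or one time   -> bt, bit

printf '%s\n' "$one_or_more" "$zero_or_one"
```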

71
00:06:07,670 --> 00:06:09,269
Then we have
the repeating operator.

72
00:06:09,990 --> 00:06:15,480
And in the repeating operator you
can use indicators to tell the

73
00:06:15,480 --> 00:06:20,970
regular expression that a character should
be occurring a couple of times.

74
00:06:20,970 --> 00:06:26,460
Now, the syntax is a little
bit awkward because this is what

75
00:06:26,459 --> 00:06:31,949
you use to indicate that we
have three n's in our text.

76
00:06:32,810 --> 00:06:38,225
So we are looking for the n, and the
n is followed by the repeating operator. Now, the

77
00:06:38,225 --> 00:06:43,640
challenge is that in the repeating operator there is
a curly brace and we need to escape the

78
00:06:43,639 --> 00:06:49,055
special meaning of the curly brace. And that is
why in front of the opening, as well as

79
00:06:49,055 --> 00:06:54,470
the closing curly brace, there's a backslash. And oh
boy, nothing in there. Well, let's use echo bollen.

80
00:06:57,699 --> 00:07:02,240
Greater than, greater than, to my regex
file, and then we try it again.

81
00:07:03,339 --> 00:07:09,444
Oh boy. What is going wrong? Well, I
can see what is going wrong. I make a

82
00:07:09,444 --> 00:07:15,550
beginner error, and that is I'm using grep -E,
and grep -E is for extended regular expressions.

83
00:07:16,129 --> 00:07:19,894
But if the pattern you
are looking for is a

84
00:07:19,894 --> 00:07:23,659
standard regular expression, you
get no match. No match.
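
The mistake is easy to reproduce. In this sketch the sample word bonnnet is invented, and the last result is what GNU grep does with escaped braces in extended mode; other grep implementations may behave differently.

```shell
# Hypothetical sample file, for illustration only.
sample=$(mktemp)
printf 'bonnnet\nbonnet\n' > "$sample"

# Basic regex: the braces of the repeating operator are escaped.
bre=$(grep 'n\{3\}' "$sample")

# Extended regex: the braces are written without backslashes.
ere=$(grep -E 'n{3}' "$sample")

# Mixing them up: with -E, GNU grep reads \{ and \} as literal braces,
# so it looks for the text n{3} and finds nothing.
mixed=$(grep -cE 'n\{3\}' "$sample" || true)

printf '%s\n' "$bre" "$ere" "$mixed"
```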

85
00:07:25,490 --> 00:07:32,110
Now next let's look
for a string that must

86
00:07:32,110 --> 00:07:38,730
be a word. So
grep on Anna in regex.

87
00:07:39,389 --> 00:07:40,730
I think we
already saw that.

88
00:07:41,970 --> 00:07:46,950
But what is going on? Well, I'm
making a typo. It shouldn't be N,

89
00:07:46,950 --> 00:07:51,930
should be B. And the final option
is a pretty powerful option as well.

90
00:07:52,730 --> 00:07:56,540
grep -E, extended
regular expression this time.

91
00:07:57,100 --> 00:08:01,040
And we are
looking for either SVM

92
00:08:01,040 --> 00:08:04,980
or VMX in
the file regex.

93
00:08:05,560 --> 00:08:12,439
So that will give a match if either one of these texts
occurs. And if one of them doesn't occur, then it's okay as well.

94
00:08:13,240 --> 00:08:15,959
And that is also
an extended regular expression.
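
A sketch of that either/or match. The sample lines are invented and only loosely resemble CPU flag listings.

```shell
# Hypothetical sample file, for illustration only.
sample=$(mktemp)
printf 'flags: fpu vmx sse\nflags: fpu svm sse\nflags: fpu sse\n' > "$sample"

# The pipe means "or": a line matches if it contains svm or vmx.
either=$(grep -E 'svm|vmx' "$sample")

printf '%s\n' "$either"
```

The first two lines are printed; the third contains neither text.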

95
00:08:16,540 --> 00:08:20,465
That's fun, isn't it? These extended
regular expressions and basic regular expressions

96
00:08:20,465 --> 00:08:24,389
that need to be addressed separately.
But it's the way it is.

97
00:08:25,050 --> 00:08:29,100
Now, as I know that you
can't get enough of it, in

98
00:08:29,100 --> 00:08:33,149
this final slide we have some
terribly advanced examples of regular expressions.

99
00:08:34,389 --> 00:08:39,899
Not because I'm expecting you to learn immediately
what this is all about, but because it's a

100
00:08:39,899 --> 00:08:45,409
part of the power of regular expressions and
it doesn't hurt to have seen some of it.

101
00:08:46,409 --> 00:08:51,519
So the first one is going
to search for IP addresses in log

102
00:08:51,519 --> 00:08:56,629
files, for instance, so sudo grep
minus E. It's an extended regular expression.

103
00:08:57,149 --> 00:09:03,279
And here we go. Range 0-9
and repeating operator 1,3, at least 1

104
00:09:03,279 --> 00:09:09,409
and maximum 3 followed by a
dot, but the dot is escaped. So

105
00:09:09,409 --> 00:09:15,539
we are really looking for a
dot and not for any character.

106
00:09:16,100 --> 00:09:20,900
And then we repeat this
0-9 again, one up to three

107
00:09:20,900 --> 00:09:25,700
times. And we need to
repeat that a few more times.

108
00:09:29,519 --> 00:09:32,850
Let's do that
in /var/log/messages.

109
00:09:35,840 --> 00:09:37,980
Oh no, no
result. That can't happen.

110
00:09:38,399 --> 00:09:42,534
Let me do
a logger 192.68.29.11

111
00:09:42,534 --> 00:09:46,669
and do it
again and aha.

112
00:09:46,929 --> 00:09:47,570
Do you
see that?

113
00:09:48,250 --> 00:09:52,720
It's small but it's there. The so called
typo. And now it's looking so much better.

114
00:09:53,759 --> 00:09:59,240
I had a dot instead of a comma between the
one and the three and of course then it doesn't work.
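
Put together, the pattern can be sketched like this. It runs against an invented log file instead of /var/log/messages, so no sudo is needed, and the address in it is only an example.

```shell
# Hypothetical log file, for illustration only.
log=$(mktemp)
printf 'Jan 1 host test: 192.168.29.11 connected\n' > "$log"
printf 'Jan 1 host test: no address here\n' >> "$log"
printf 'Jan 1 host test: version 1.2.3\n' >> "$log"

# Four groups of one to three digits, separated by escaped (literal) dots.
ips=$(grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' "$log")

printf '%s\n' "$ips"
```

Only the first line matches. The pattern does not check that each number is 255 or lower, so it is a rough filter rather than a validator.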

115
00:10:00,600 --> 00:10:06,309
So that's example number one. Now
do we have any MAC addresses

116
00:10:06,309 --> 00:10:12,019
in the log? sudo grep -E.
Now, what is a MAC address?

117
00:10:12,860 --> 00:10:16,159
Well, a MAC
address is another pattern.

118
00:10:16,860 --> 00:10:20,129
And the pattern that we are
looking for is A-F, a-f, 0-9.

119
00:10:33,360 --> 00:10:38,269
Oh my goodness, that's a complex
pattern. So we're looking for uppercase

120
00:10:38,269 --> 00:10:43,179
A to F, lowercase a to
f, and 0 to 9.

121
00:10:45,159 --> 00:10:46,559
And that's what
we do two times.

122
00:10:47,179 --> 00:10:53,210
So AB, a0, 9A,
stuff like that would all work.

123
00:10:53,950 --> 00:10:56,919
And all of that is
what we put between parentheses.

124
00:10:58,679 --> 00:11:05,605
And we tell the regular
expression to repeat that five times

125
00:11:05,605 --> 00:11:12,529
in total, followed by
A-F once more, a-f and 0-9, two times.

126
00:11:19,700 --> 00:11:23,870
And we check for
that in /var/log/messages

127
00:11:23,870 --> 00:11:28,039
and there we can
see the MAC address.

128
00:11:29,500 --> 00:11:32,620
So that's a regular expression
that will look for MAC addresses.
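
A sketch of what such a pattern can look like. The colon between the pairs is an assumption, since the separator is not spoken out loud, and the log lines are invented.

```shell
# Hypothetical log file, for illustration only.
log=$(mktemp)
printf 'eth0 link 52:54:00:Ab:cd:EF up\n' > "$log"
printf 'eth0 link 52:54:00:ab up\n' >> "$log"

# Two hex digits plus a colon, five times, then two more hex digits.
macs=$(grep -E '([A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}' "$log")

printf '%s\n' "$macs"
```

The second line has too few pairs, so only the first line is printed.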

129
00:11:33,500 --> 00:11:37,610
Then let's see how we
can look for dates in the

130
00:11:37,610 --> 00:11:41,720
four-digit year, two-digit
month and two-digit day format.

131
00:11:42,240 --> 00:11:49,108
Well, that will be sudo
grep -E again, and this time

132
00:11:49,108 --> 00:11:55,976
\b. So we are looking
for a word: 0-9, and we

133
00:11:55,975 --> 00:12:02,844
will do that four times,
followed by 0-9, and now we're

134
00:12:02,844 --> 00:12:09,712
looking for months, so two
times only, and again 0-9,

135
00:12:09,711 --> 00:12:16,580
two times only, as a
word, in /var/log/messages.

136
00:12:18,360 --> 00:12:23,900
And I'm getting an error here. Do you see the typo?
Well, we have an invalid range end and that means that

137
00:12:23,900 --> 00:12:29,440
we have something in the square brackets, curly braces and so
on that doesn't match. And now it's working so much better.

138
00:12:29,940 --> 00:12:30,980
And here
we see.
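
A possible form of that date pattern is sketched below. The hyphen between year, month and day is an assumption, because the separator is not mentioned, and \b is again an extension found in GNU grep. The log lines are invented.

```shell
# Hypothetical log file, for illustration only.
log=$(mktemp)
printf 'backup finished 2024-03-15 ok\n' > "$log"
printf 'serial 12024-03-155 ignored\n' >> "$log"
printf 'no date here\n' >> "$log"

# Four digits, two digits, two digits, and the whole thing as a word.
dates=$(grep -E '\b[0-9]{4}-[0-9]{2}-[0-9]{2}\b' "$log")

printf '%s\n' "$dates"
```

The serial number on the second line is skipped because the digits around it break the word boundaries.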

139
00:12:31,960 --> 00:12:37,850
Now the final one
is sudo grep minus

140
00:12:37,850 --> 00:12:43,740
o minus E followed
by https question mark,

141
00:12:43,740 --> 00:12:49,629
colon, slash, slash,
a-z, A-Z, 0-9.

142
00:12:51,649 --> 00:12:53,289
That's a pattern that
we have seen before.

143
00:12:54,309 --> 00:13:00,423
Dot, slash, question
mark equals underscore minus

144
00:13:00,423 --> 00:13:06,536
square bracket, plus.
Well, let's do

145
00:13:06,536 --> 00:13:12,649
that on
/var/log/messages again.

146
00:13:14,730 --> 00:13:17,470
And that is
looking for the URL format.

147
00:13:18,370 --> 00:13:24,993
Now what is this? Well, in this regular expression, the
plus at the end continues the regular expression as long as

148
00:13:24,993 --> 00:13:31,616
any character between the square brackets is encountered and it
will stop on finding an invalid character like space. That's the

149
00:13:31,616 --> 00:13:38,240
end of the URL. And that brings us a result
in the way that you can see it right here.
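
Assembled from the pieces that were dictated, the command can be sketched as follows. The log file and the URLs in it are invented for the example.

```shell
# Hypothetical log file, for illustration only.
log=$(mktemp)
printf 'fetching https://example.com/path?id=1_a-b now\n' > "$log"
printf 'see http://example.org/ for details\n' >> "$log"
printf 'no link here\n' >> "$log"

# -o prints only the matching part of the line, not the whole line.
# s? makes the s optional; the bracket list plus the + keeps matching
# until a character outside the list, such as a space, is found.
urls=$(grep -oE 'https?://[a-zA-Z0-9./?=_-]+' "$log")

printf '%s\n' "$urls"
```

Each URL is printed on its own line without the surrounding text.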

150
00:13:38,799 --> 00:13:43,599
And of course I don't expect you to
be able to produce regular expressions like this

151
00:13:43,599 --> 00:13:48,399
immediately, but it does make sense having
seen them and knowing how to interpret them.
