A regular expression is a language for describing a pattern of text. You can refer to
Wikipedia for more details about regular expressions.
There are small differences between the variants of regular expressions.
ULogViewer uses
.NET Regular Expression.
Apart from letters and digits, many symbols such as
. (dot) or
* (asterisk) describe a pattern of text instead of the symbol itself.
You can use
\ (backslash) as the escape character to describe the symbol itself, just like in most programming languages.
For example,
\. means a dot character and
\( means a left parenthesis character.
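The effect of the escape character can be sketched as follows. This is a minimal illustration in TypeScript, whose built-in RegExp is a different variant from the .NET one used by ULogViewer; the dot and the backslash escape are assumed to behave the same way in both. The sample strings are made up for the illustration.

```typescript
// Unescaped dot: a pattern symbol that matches any single character.
const anyCharPattern = /^1.5$/;
// Escaped dot: matches only the dot character itself.
const literalDotPattern = /^1\.5$/;

const escapeDemo = {
  unescapedOnDot: anyCharPattern.test("1.5"),
  unescapedOnOther: anyCharPattern.test("1x5"),
  escapedOnDot: literalDotPattern.test("1.5"),
  escapedOnOther: literalDotPattern.test("1x5"),
};
console.log(escapeDemo);
```

Only the escaped pattern rejects "1x5", because its dot no longer stands for "any character".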
You can use the following to represent special characters in a regular expression:
-
. (dot)
Represents ANY single character except a line break (by default).
-
^
Represents the start of text. Please note that it is not actually a character.
-
$
Represents the end of text. Please note that it is not actually a character.
-
\s
Represents a whitespace character.
-
\S
Represents a character EXCEPT FOR whitespace.
-
\d
Represents a digit.
-
\D
Represents a character EXCEPT FOR digits.
-
\w
Represents a word character: a letter, a digit or _ (underscore).
-
\W
Represents a character EXCEPT FOR word characters.
For example, you can use the following Regular Expressions to describe the pattern of
"Hello World!":
- Hello World\!
- ^Hello World\!$
- He...\sWo...\!
- \w\w\w\w\w\s\w\w\w\w\w\!
- ^\S\S\S\S\S \D\D\D\D\D\W
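As a quick check, the five patterns above can be tried against the text "Hello World!". The sketch below uses TypeScript's built-in RegExp rather than the .NET engine used by ULogViewer, assuming these special characters behave the same way for plain ASCII text; the extra string "Say Hello World!" is a made-up counterexample.

```typescript
const helloText = "Hello World!";

// The five patterns listed above, written as regular expression literals.
const helloPatterns: RegExp[] = [
  /Hello World\!/,
  /^Hello World\!$/,
  /He...\sWo...\!/,
  /\w\w\w\w\w\s\w\w\w\w\w\!/,
  /^\S\S\S\S\S \D\D\D\D\D\W/,
];

// One boolean per pattern: does it describe "Hello World!"?
const helloResults = helloPatterns.map((pattern) => pattern.test(helloText));

// ^ and $ require the pattern to cover the whole text, so a prefix breaks the match.
const anchoredOnLongerText = /^Hello World\!$/.test("Say Hello World!");

console.log(helloResults, anchoredOnLongerText);
```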
You can use
[ ] to describe a character that belongs to a specific character group in a regular expression:
Positive Character Group
Use
[{characters...}] to match one of the given characters,
or
[{character}-{character}...] to match a character in the given range of characters. For example:
-
[abc]
Represents the character a, b or c.
-
[\w\d\s]
Represents a word character (letter, digit or underscore) or a whitespace character. Because \w already includes digits, \d is redundant here.
-
[a-k]
Represents a character in the range a, b, ..., k.
-
[0-9]
Represents a character in the range 0, 1, ..., 9.
-
[a-z\s]
Represents a character in the range a, b, ..., z or a whitespace character.
-
[\+0-9a-f]
Represents a character in the range 0, 1, ..., 9 or the range a, b, ..., f, or + (plus).
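A positive character group always matches exactly one character. The following TypeScript sketch illustrates this with two of the groups above; it runs on TypeScript's RegExp, not on the .NET engine, under the assumption that character groups work the same way in both. The ^ and $ anchors are added here only so that each test covers a single character.

```typescript
// A single lowercase hexadecimal digit.
const hexDigitPattern = /^[0-9a-f]$/;
// The same group extended with an escaped plus sign.
const plusOrHexPattern = /^[\+0-9a-f]$/;

const positiveGroupDemo = {
  digitInRange: hexDigitPattern.test("7"),
  letterInRange: hexDigitPattern.test("c"),
  letterOutOfRange: hexDigitPattern.test("g"),
  plusNotListed: hexDigitPattern.test("+"),
  plusListed: plusOrHexPattern.test("+"),
};
console.log(positiveGroupDemo);
```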
Negative Character Group
Use
[^{characters...}] to match a character
EXCEPT FOR the given characters or range of characters. For example:
-
[^xyz]
Represents a character EXCEPT FOR x, y and z.
-
[^\s]
Represents a character EXCEPT FOR whitespace. This is the same as \S.
-
[^0-9]
Represents a character EXCEPT FOR characters in the range 0, 1, ..., 9. This is the same as \D when only the ASCII digits 0 to 9 are considered.
-
[^\+0-9a-f]
Represents a character EXCEPT FOR characters in the range 0, 1, ..., 9, the range a, b, ..., f and + (plus).
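Negative character groups are handy for removing or masking everything except a given set of characters. The sketch below is written in TypeScript (a different regular expression variant from .NET, assumed to agree with it for these groups on ASCII text) and uses a made-up sample string.

```typescript
// Hypothetical sample: letters, a space, digits, a comma and a tab.
const negativeSample = "id 42,\tok";

// Remove every character that is NOT a digit.
const digitsOnly = negativeSample.replace(/[^0-9]/g, "");

// Mask every non-whitespace character, once with the group and once with \S.
const maskedByGroup = negativeSample.replace(/[^\s]/g, "#");
const maskedByShorthand = negativeSample.replace(/\S/g, "#");

console.log(digitsOnly, JSON.stringify(maskedByGroup), maskedByGroup === maskedByShorthand);
```

Both masking calls give the same result, which reflects that [^\s] and \S describe the same set of characters.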
You can use
( ) to describe a group (a sequence of characters) in a regular expression:
Anonymous Groups
Use
({expression}) to define an anonymous group. For example:
-
(\w\w\-\d\d)
Represents a group of 5 characters: 2 word characters, a hyphen and 2 digits.
-
(0x[0-9a-f][0-9a-f])
Represents a group of 4 characters: 0, x and 2 lowercase hexadecimal digits.
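A group also captures the text it matched, which can then be read back by its position. The following TypeScript sketch shows this for the two groups above; it uses TypeScript's RegExp instead of the .NET engine, and the input strings are invented for the illustration.

```typescript
// exec() returns the whole match at index 0 and the first group at index 1.
const codeMatch = /(\w\w\-\d\d)/.exec("ID: ab-12 ok");
const hexMatch = /(0x[0-9a-f][0-9a-f])/.exec("value=0x3f;");

// exec() returns null when nothing matches, so guard before indexing.
const capturedCode = codeMatch ? codeMatch[1] : undefined;
const capturedHex = hexMatch ? hexMatch[1] : undefined;

console.log(capturedCode, capturedHex);
```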
Use
(?<{name}>{expression}) to define a named group.
For
Log Patterns and
Log Analysis, named groups capture a sequence of characters and mark it with a specific name (usually the
Name of Log Property).
For
Log Filtering, named groups serve no special purpose.
For example:
-
(?<Level>\w+)
Captures one or more word characters as a group named Level.
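Reading a named group back by its name can be sketched as follows. TypeScript is used here because its RegExp accepts the same (?<name>...) syntax as .NET, although it is a different engine; the group names Key and Value and the input text are hypothetical and are not log properties defined by ULogViewer.

```typescript
// Two named groups separated by a literal equals sign.
const keyValuePattern = /(?<Key>\w+)=(?<Value>\d+)/;

// groups is undefined when the pattern does not match.
const keyValueGroups = keyValuePattern.exec("retry=3")?.groups;

const capturedKey = keyValueGroups ? keyValueGroups["Key"] : undefined;
const capturedValue = keyValueGroups ? keyValueGroups["Value"] : undefined;

console.log(capturedKey, capturedValue);
```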
Advanced Groups
To describe the number of occurrences of a character or group:
-
{character or group}*
The character or group can occur zero or more times. For example:
-
a*
a can occur zero or more times.
-
[\w\d]*
A word character (letter, digit or underscore) can occur zero or more times. \d is redundant here because \w already includes digits.
-
(Hello)*
The word Hello can occur zero or more times.
-
{character or group}+
The character or group should occur at least once. For example:
-
.+
At least one character should occur.
-
[\w\d]+
A sequence of one or more word characters.
-
(Hello)+
The word Hello should occur at least once.
-
{character or group}?
The character or group should occur zero or one time. For example:
-
.?
A single character or none.
-
\s?
A whitespace character or none.
-
{character or group}{{number}}
The character or group should occur exactly the given number of times. For example:
-
\d{4}
Exactly 4 digits.
-
{character or group}{{number},{number}}
The number of occurrences of the character or group should be in the given range. For example:
-
\d{2,4}
From 2 to 4 digits.
-
{character or group}{{number},}
The character or group should occur
AT LEAST the given number of times. For example:
-
\d{2,}
At least 2 digits.
-
{character or group}{0,{number}}
The character or group should occur
AT MOST the given number of times. The lower bound 0 is required because .NET treats {,{number}} as literal text. For example:
-
\d{0,3}
At most 3 digits.
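The occurrence rules above can be checked with a few short tests. This is a sketch in TypeScript, whose RegExp is a different variant from the .NET engine but is assumed to treat these quantifiers the same way; the ^ and $ anchors are added so that each pattern has to cover the whole sample text, and all sample texts are made up.

```typescript
const quantifierDemo = {
  // * allows zero or more occurrences.
  starOnEmpty: /^(Hello)*$/.test(""),
  starOnTwo: /^(Hello)*$/.test("HelloHello"),
  // + needs at least one occurrence.
  plusOnEmpty: /^(Hello)+$/.test(""),
  plusOnOne: /^(Hello)+$/.test("Hello"),
  // ? allows zero or one occurrence.
  optionalAbsent: /^a\s?b$/.test("ab"),
  optionalPresent: /^a\s?b$/.test("a b"),
  optionalTwice: /^a\s?b$/.test("a  b"),
  // {4} needs exactly four occurrences.
  exactlyFour: /^\d{4}$/.test("2024"),
  exactlyFourTooShort: /^\d{4}$/.test("202"),
  // {2,4} needs two to four occurrences.
  twoToFour: /^\d{2,4}$/.test("123"),
  twoToFourTooLong: /^\d{2,4}$/.test("12345"),
  // {2,} needs at least two occurrences.
  atLeastTwo: /^\d{2,}$/.test("12345"),
  atLeastTwoTooShort: /^\d{2,}$/.test("1"),
  // {0,3} allows at most three occurrences.
  atMostThree: /^\d{0,3}$/.test(""),
  atMostThreeTooLong: /^\d{0,3}$/.test("1234"),
};
console.log(quantifierDemo);
```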
Use
| to construct an alternation (selection) between two or more expressions. For example:
-
(a|b|c)
A character which is either a, b or c. This is the same as [abc].
-
(\w+|\d+)
A character sequence consisting of either word characters or digits. Because \w already includes digits, this matches the same text as (\w+).
-
Hello (John|Kate)
A sentence which is either "Hello John" or "Hello Kate". This is the same as (Hello John|Hello Kate).
-
(USD|EUR)\$\d+(\.\d+)?
A price with or without decimal places, prefixed with either USD$ or EUR$.
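The alternation examples above can be exercised as follows. The sketch is in TypeScript, not on the .NET engine used by ULogViewer, assuming alternation behaves identically for these patterns. The ^ and $ anchors are an addition so that partial matches are rejected, and the sample texts are made up.

```typescript
// Either of the two names is accepted after "Hello ".
const greetingPattern = /^Hello (John|Kate)$/;
// A currency code, a dollar sign, digits and an optional fraction.
const pricePattern = /^(USD|EUR)\$\d+(\.\d+)?$/;

const alternationDemo = {
  john: greetingPattern.test("Hello John"),
  kate: greetingPattern.test("Hello Kate"),
  mary: greetingPattern.test("Hello Mary"),
  wholePrice: pricePattern.test("USD$12"),
  fractionalPrice: pricePattern.test("EUR$3.50"),
  otherCurrency: pricePattern.test("GBP$1"),
  danglingDot: pricePattern.test("USD$1."),
};
console.log(alternationDemo);
```

The last case fails because the optional group (\.\d+)? needs at least one digit after the dot.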
-
\d{2}:\d{2}:\d{2} (am|pm)
A time in the format Hour:Minute:Second followed by am or pm.
-
^[\w\d\-]+(\s*\,|\s*[\w\d\-]+)*\.$
A simple sentence consisting of words, whitespace and , (comma), ending with . (dot). Each word consists of letters, digits, underscores and hyphens.
-
[\+\-]?(0|[1-9]\d*)(\.\d{1,3})?
A decimal number with or without a leading sign. The integer part is either zero or a digit from 1 to 9 followed by any number of digits. The fraction part, if present, has 1 to 3 decimal places.
-
(?<Timestamp>\d{4}\-\d{1,2}\-\d{1,2}\s+\d{2}\:\d{2}\:\d{2})\s+(?<Level>\w+)\s+(?<Message>.*)
Splits text into 3 named groups separated by one or more whitespace characters:
-
Timestamp
A timestamp consisting of a date (in the format Year-Month-Day) and a time (in the format Hour:Minute:Second). One or more whitespace characters are required between the date and the time.
-
Level
A word consisting of word characters (letters, digits and underscores).
-
Message
The rest of the text, starting with a non-whitespace character and extending to the end of the line.
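To see the last two patterns at work, the sketch below applies them to made-up input. It is written in TypeScript, which accepts the same pattern syntax but is not the .NET engine used by ULogViewer, so treat it as an approximation. The log line is hypothetical, and the decimal pattern is wrapped in ^ and $ here (an addition to the pattern above) so that it must cover the whole text.

```typescript
// The log line pattern from the list above, unchanged.
const logLinePattern =
  /(?<Timestamp>\d{4}\-\d{1,2}\-\d{1,2}\s+\d{2}\:\d{2}\:\d{2})\s+(?<Level>\w+)\s+(?<Message>.*)/;

// Hypothetical log line with several spaces before the message.
const sampleLogLine = "2024-01-05 13:45:07 WARN   Disk almost full";
const logGroups = logLinePattern.exec(sampleLogLine)?.groups;

const parsedLog = {
  timestamp: logGroups ? logGroups["Timestamp"] : undefined,
  level: logGroups ? logGroups["Level"] : undefined,
  message: logGroups ? logGroups["Message"] : undefined,
};

// The decimal number pattern, anchored to the whole text.
const decimalPattern = /^[\+\-]?(0|[1-9]\d*)(\.\d{1,3})?$/;

const decimalDemo = {
  signedFraction: decimalPattern.test("-12.345"),
  plainZero: decimalPattern.test("+0"),
  leadingZeros: decimalPattern.test("007"),
  tooManyDecimals: decimalPattern.test("1.2345"),
};

console.log(parsedLog, decimalDemo);
```

The whitespace between the groups is consumed by \s+ and therefore does not appear in any captured value.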