Quick-Start to Use Regular Expressions

Regular Expression is a language for describing patterns of text; refer to Wikipedia for more details. There are small differences between the variants of Regular Expression. ULogViewer uses .NET Regular Expression.

Basis of Language

Except for letters and digits, many symbols such as . (dot) or ( (left parenthesis) describe a pattern of text rather than the symbol itself. You can use \ (backslash) as an escape character to match a symbol literally, as in most programming languages. For example, \. means a dot character and \( means a left parenthesis character.
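
The effect of escaping can be checked with a small sketch. It uses Python's re module purely as a stand-in, which is an assumption: ULogViewer uses the .NET engine, and the two agree for these simple patterns. The sample strings are made up for illustration.

```python
import re

# Unescaped, the dot is a wildcard and matches any single character.
wildcard = re.compile(r"1.5")
# Escaped, the dot matches only a literal dot.
literal = re.compile(r"1\.5")

print(wildcard.fullmatch("1x5") is not None)  # True
print(literal.fullmatch("1x5") is not None)   # False
print(literal.fullmatch("1.5") is not None)   # True
```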

Special Characters

You can use the following to represent special characters in Regular Expression:
  • . (dot)
    Represents ANY single character (except a line break, by default).
  • ^
    Represents the start of the text. Note that it matches a position, not a character.
  • $
    Represents the end of the text. Note that it matches a position, not a character.
  • \s
    Represents a whitespace.
  • \S
    Represents characters EXCEPT FOR whitespaces.
  • \d
    Represents a digit.
  • \D
    Represents characters EXCEPT FOR digits.
  • \w
    Represents a word character: a letter, a digit or _ (underscore).
  • \W
    Represents characters EXCEPT FOR word characters.
For example, you can use the following Regular Expressions to describe the pattern of "Hello World!":
  • Hello World\!
  • ^Hello World\!$
  • He...\sWo...\!
  • \w\w\w\w\w\s\w\w\w\w\w\!
  • ^\S\S\S\S\S \D\D\D\D\D\W
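
As a quick check, the sketch below runs the five patterns above against the same text. It assumes Python's re module as a stand-in for the .NET engine; for these patterns the two behave the same.

```python
import re

text = "Hello World!"

# The five example patterns, copied as-is.
patterns = [
    r"Hello World\!",
    r"^Hello World\!$",
    r"He...\sWo...\!",
    r"\w\w\w\w\w\s\w\w\w\w\w\!",
    r"^\S\S\S\S\S \D\D\D\D\D\W",
]

# search() looks for the pattern anywhere in the text.
results = [re.search(p, text) is not None for p in patterns]
print(results)  # [True, True, True, True, True]
```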

Character Group

You can use [ ] to match a single character from a specific group of characters in Regular Expression:

Positive Character Group

Use [{characters...}] to match one of the given characters, or [{character}-{character}...] to match a character in the given range. For example:
  • [abc]
    Represents an a, b or c character.
  • [\w\d\s]
    Represents a word character (letter, digit or underscore) or a whitespace character. Because \w already includes digits, this is the same as [\w\s].
  • [a-k]
    Represents a character in the range a, b, ..., k.
  • [0-9]
    Represents a character in the range 0, 1, ..., 9.
  • [a-z\s]
    Represents a character in the range a, b, ..., z or a whitespace.
  • [\+0-9a-f]
    Represents a character in the range 0, 1, ..., 9, in the range a, b, ..., f, or + (plus).
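
To see which characters each group picks out, here is a minimal sketch. It uses Python's re module as an assumed stand-in for the .NET engine, and the sample string is invented for illustration.

```python
import re

sample = "id: +7e"

# findall() returns every single character that belongs to the group.
digits = re.findall(r"[0-9]", sample)
a_to_k = re.findall(r"[a-k]", sample)
lower_or_space = re.findall(r"[a-z\s]", sample)
hex_or_plus = re.findall(r"[\+0-9a-f]", sample)

print(digits)          # ['7']
print(a_to_k)          # ['i', 'd', 'e']
print(lower_or_space)  # ['i', 'd', ' ', 'e']
print(hex_or_plus)     # ['d', '+', '7', 'e']
```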

Negative Character Group

Use [^{characters...}] to match a character EXCEPT FOR the given characters or ranges of characters. For example:
  • [^xyz]
    Represents a character EXCEPT FOR x, y and z.
  • [^\s]
    Represents a character EXCEPT FOR whitespace. This is the same as \S.
  • [^0-9]
    Represents a character EXCEPT FOR characters in the range 0, 1, ..., 9. This is the same as \D.
  • [^\+0-9a-f]
    Represents a character EXCEPT FOR characters in the range 0, 1, ..., 9, the range a, b, ..., f, and + (plus).
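
The equivalences mentioned above can be confirmed on a small input. This sketch assumes Python's re module as a stand-in for the .NET engine; the sample string is made up, and for plain ASCII input like this the two engines agree.

```python
import re

sample = "a1 b2"

# A negative group and its shorthand select the same characters.
non_space_group = re.findall(r"[^\s]", sample)
non_space_short = re.findall(r"\S", sample)
non_digit_group = re.findall(r"[^0-9]", sample)
non_digit_short = re.findall(r"\D", sample)

print(non_space_group)  # ['a', '1', 'b', '2']
print(non_digit_group)  # ['a', ' ', 'b']
print(non_space_group == non_space_short)  # True
print(non_digit_group == non_digit_short)  # True
```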

Grouping

You can use ( ) to describe a group (sequence of characters) in Regular Expression:

Anonymous Groups

Use ({expression}) to define an anonymous group. For example:
  • (\w\w\-\d\d)
    Represents a group with 5 characters: 2 word characters (typically letters), a hyphen and 2 digits.
  • (0x[0-9a-f][0-9a-f])
    Represents a group with 4 characters: 0, x and 2 hexadecimal digits.
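
A group also captures the text it matched, which can be read back afterwards. The sketch below assumes Python's re module as a stand-in for the .NET engine, where anonymous groups are numbered from 1 in the same way; the input strings are hypothetical.

```python
import re

# Each pattern has one anonymous group, so it is group number 1.
part = re.search(r"(\w\w\-\d\d)", "part AB-12 shipped")
byte = re.search(r"(0x[0-9a-f][0-9a-f])", "value=0x3f;")

print(part.group(1))  # AB-12
print(byte.group(1))  # 0x3f
```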

Named Groups

Use (?<{name}>{expression}) to define a named group. For Log Patterns and Log Analysis, named groups capture a sequence of characters and mark it with a specific name (usually the Name of a Log Property). For Log Filtering, named groups serve no special purpose.
For example:
  • (?<Date>\d\d\/\d\d)
    Represents a group with 5 characters and named Date: 2 digits, / (slash) and 2 digits.
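
The following sketch shows how a named group exposes its captured text under its name. It uses Python's re module as an assumed stand-in, and note one real syntax difference: Python spells a named group (?P<name>...), whereas .NET (and therefore ULogViewer) uses (?<name>...). The input string is invented for illustration.

```python
import re

# Python syntax: (?P<Date>...). In ULogViewer write (?<Date>...) instead.
date_pattern = re.compile(r"(?P<Date>\d\d\/\d\d)")

match = date_pattern.search("Backup finished on 03/14 at noon")
print(match.group("Date"))  # 03/14
print(match.groupdict())    # {'Date': '03/14'}
```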

Advanced Groups


Quantifiers

To describe the number of occurrences of a character or group:
  • {character or group}*
    The character or group can occur zero or more times. For example:
    • a*
      a can occur zero or more times.
    • [\w\d]*
      A letter or digit can occur zero or more times.
    • (Hello)*
      The word Hello can occur zero or more times.
  • {character or group}+
    The character or group should occur at least once. For example:
    • .+
      At least one character should occur.
    • [\w\d]+
      A sequence of one or more letters and digits.
    • (Hello)+
      The word Hello should occur at least once.
  • {character or group}?
    The character or group should occur zero times or once. For example:
    • .?
      A character or none.
    • \s?
      A whitespace or none.
  • {character or group}{{number}}
    The character or group should occur exactly the given number of times. For example:
    • \d{4}
      4 digits.
    • 0x[0-9a-f]{8}
      A hexadecimal number starting with 0x and containing 8 hexadecimal digits.
  • {character or group}{{number},{number}}
    The number of occurrences of the character or group should be in the given range. For example:
    • \d{1,8}
      1 to 8 digits.
  • {character or group}{{number},}
    The character or group should occur AT LEAST the given number of times. For example:
    • \d{1,}
      1 or more digits, which is the same as \d+.
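  • {character or group}{0,{number}}
    The character or group should occur AT MOST the given number of times. .NET Regular Expression has no {,{number}} shorthand (it is read as literal text), so write the lower bound 0 explicitly. For example:
    • \d{0,8}
      At most 8 digits or none.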
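
A few of the quantifiers above can be exercised with a short sketch. It assumes Python's re module as a stand-in for the .NET engine and uses only quantifier forms that behave the same in both; fits is a hypothetical helper defined here, not part of any library.

```python
import re

def fits(pattern: str, text: str) -> bool:
    """Return True when the WHOLE text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(fits(r"(Hello)*", ""))                 # True: zero occurrences allowed
print(fits(r"(Hello)+", ""))                 # False: at least one required
print(fits(r"(Hello)+", "HelloHello"))       # True
print(fits(r"\s?", " "))                     # True
print(fits(r"\d{4}", "2024"))                # True
print(fits(r"\d{4}", "24"))                  # False: exactly 4 digits needed
print(fits(r"0x[0-9a-f]{8}", "0xdeadbeef"))  # True
print(fits(r"\d{1,8}", "123456789"))         # False: 9 digits is too many
print(fits(r"\d{1,}", "7"))                  # True
```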

Alternation

Use | to construct alternation/selection between two or more expressions. For example:
  • (a|b|c)
    A character which is either a, b or c. This is the same as [abc].
  • (\w+|\d+)
    A character sequence consisting of either word characters or digits.
  • Hello (John|Kate)
    A sentence which is either "Hello John" or "Hello Kate". This is the same as (Hello John|Hello Kate).
  • (USD|EUR)\$\d+(\.\d+)?
    A price with/without decimal places in either USD or EUR.
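
The alternation examples can be tried out as follows. This is a sketch using Python's re module as an assumed stand-in for the .NET engine; the price and greeting strings are invented for illustration.

```python
import re

price = re.compile(r"(USD|EUR)\$\d+(\.\d+)?")
greeting = re.compile(r"Hello (John|Kate)")

# fullmatch() requires the whole string to fit the pattern.
print(price.fullmatch("USD$12.50") is not None)  # True
print(price.fullmatch("EUR$7") is not None)      # True
print(price.fullmatch("GBP$7") is not None)      # False: GBP is not an option

# Group 1 holds whichever alternative was chosen.
print(greeting.fullmatch("Hello Kate").group(1))  # Kate
```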

Samples

  • \d{2}:\d{2}:\d{2} (am|pm)
    A time in format Hour:Minute:Second followed by am or pm.
  • ^[\w\d\-]+(\s*\,|\s*[\w\d\-]+)*\.$
    A simple sentence consisting of words, whitespace and , (comma), and ending with . (dot). Each word consists of letters, digits and hyphens.
  • [\+\-]?(0|[1-9]\d*)(\.\d{1,3})?
    A decimal number starting with/without sign. The integer part can be either zero or 1-9 followed by other digits. The number ends with at most 3 decimal places.
  • (?<Timestamp>\d{4}\-\d{1,2}\-\d{1,2}\s+\d{2}\:\d{2}\:\d{2})\s+(?<Level>\w+)\s+(?<Message>.*)
    Splits text into 3 named groups separated by runs of whitespace:
    • Timestamp
      A timestamp consisting of a date (in format Year-Month-Day) and a time (in format Hour:Minute:Second). One or more whitespaces are needed between the date and the time.
    • Level
      A word consisting of word characters (letters, digits or underscores).
    • Message
      Text starting with a non-whitespace character and extending to the end of the text.
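
To illustrate the last two samples, the sketch below applies them to made-up input. It uses Python's re module as an assumed stand-in for the .NET engine: the decimal-number pattern is copied unchanged, while the log pattern is rewritten with Python's (?P<name>...) spelling of named groups (ULogViewer expects the (?<name>...) form shown above). The log line is hypothetical.

```python
import re

# The log sample, with Python's (?P<name>...) group syntax.
LOG_LINE = re.compile(
    r"(?P<Timestamp>\d{4}\-\d{1,2}\-\d{1,2}\s+\d{2}\:\d{2}\:\d{2})"
    r"\s+(?P<Level>\w+)\s+(?P<Message>.*)"
)

# The decimal-number sample, unchanged.
DECIMAL = re.compile(r"[\+\-]?(0|[1-9]\d*)(\.\d{1,3})?")

# Hypothetical log line, invented for this example.
line = "2024-03-05 12:34:56  WARN  Disk usage above 90%"
fields = LOG_LINE.match(line).groupdict()
print(fields["Timestamp"])  # 2024-03-05 12:34:56
print(fields["Level"])      # WARN
print(fields["Message"])    # Disk usage above 90%

# fullmatch() requires the whole string to fit the pattern.
checks = {s: DECIMAL.fullmatch(s) is not None
          for s in ["-12.345", "+0.5", "012", "3.14159"]}
print(checks)
```

Here "012" is rejected because the integer part may not start with a leading zero, and "3.14159" is rejected because it has more than 3 decimal places.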