C# - Regular Expressions: A Comprehensive Guide with Examples

Regular expressions (often abbreviated as regex or regexp) are powerful patterns used to match and manipulate text strings in various programming languages, including C#. A regular expression is a sequence of characters that defines a search pattern. It allows you to find, match, or replace parts of a string based on specific rules or patterns.

In C#, regular expressions are supported through the System.Text.RegularExpressions namespace, which provides the Regex class. This class allows you to create regular expressions and use them for pattern matching operations.

In this article, we’ll explore the basics of regular expressions, their key concepts, and how to use them in C# with practical examples.

Basic Concepts of Regular Expressions

Before diving into code, let’s understand the basic building blocks of regular expressions:

1. Literal Characters

Literal characters match exactly the same characters in the target string. For example, the regex cat matches the string "cat".

2. Metacharacters

Metacharacters have special meanings in regular expressions and allow you to define patterns more flexibly. Some common metacharacters include:

  • . (dot): Matches any single character except a newline (\n) by default.
  • * (asterisk): Matches zero or more occurrences of the preceding element (a character, character class, or group).
  • + (plus): Matches one or more occurrences of the preceding element.
  • ? (question mark): Matches zero or one occurrence of the preceding element.
  • | (pipe): Specifies alternatives (e.g., cat|dog matches "cat" or "dog").
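As a minimal sketch of the metacharacters listed above, the following program prints the first substring each pattern matches. The class name MetacharacterDemo and the helper FirstMatch are hypothetical names introduced only for this illustration; Regex.Match and Match.Value are the actual .NET members used.

```csharp
using System;
using System.Text.RegularExpressions;

static class MetacharacterDemo
{
    // Hypothetical helper: returns the first substring matched by the pattern,
    // or an empty string when nothing matches.
    public static string FirstMatch(string input, string pattern)
        => Regex.Match(input, pattern).Value;

    static void Main()
    {
        Console.WriteLine(FirstMatch("a cut above", "c.t"));    // cut    (. stands for "u")
        Console.WriteLine(FirstMatch("ll", "lo*l"));            // ll     (* allows zero "o")
        Console.WriteLine(FirstMatch("loool", "lo+l"));         // loool  (+ needs at least one "o")
        Console.WriteLine(FirstMatch("ll", "lo+l") == "");      // True   (no "o", so no match)
        Console.WriteLine(FirstMatch("color", "colou?r"));      // color  (? makes "u" optional)
        Console.WriteLine(FirstMatch("colour", "colou?r"));     // colour
        Console.WriteLine(FirstMatch("hot dog", "cat|dog"));    // dog    (| picks an alternative)
        Console.WriteLine(FirstMatch("ababab", "(ab)+"));       // ababab (+ applied to a group)
    }
}
```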

3. Character Classes

Character classes allow you to match any character from a set of characters. For example:

  • [abc] matches "a", "b", or "c".
  • [a-z] matches any lowercase letter from "a" to "z".
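The two classes above can be tried out with a short sketch like this one. The class CharacterClassDemo and the helper AllMatches are hypothetical names for this illustration. Note that matching is case-sensitive by default, so [a-z] skips the capital letters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class CharacterClassDemo
{
    // Hypothetical helper: collects the text of every match into an array.
    public static string[] AllMatches(string input, string pattern)
        => Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    static void Main()
    {
        // [abc] matches one character at a time: a, b, or c.
        Console.WriteLine(string.Join(" ", AllMatches("a big cab", "[abc]")));        // a b c a b

        // [a-z]+ matches runs of lowercase letters only.
        Console.WriteLine(string.Join(" ", AllMatches("Hello World 42", "[a-z]+"))); // ello orld
    }
}
```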

4. Anchors

Anchors specify the position of a match within the string:

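  • ^ matches the beginning of the string; with RegexOptions.Multiline, it also matches the beginning of each line.
  • $ matches the end of the string (or just before a final newline); with RegexOptions.Multiline, it also matches the end of each line.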
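To see how the anchors behave with and without RegexOptions.Multiline, here is a small sketch. AnchorDemo and CountMatches are hypothetical names used only here, and the sample text uses \n line endings.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchorDemo
{
    // Hypothetical helper: counts how many times the pattern matches.
    public static int CountMatches(string input, string pattern,
                                   RegexOptions options = RegexOptions.None)
        => Regex.Matches(input, pattern, options).Count;

    static void Main()
    {
        string text = "cat nap\nbig cat\ncat";

        // Without Multiline, ^ and $ refer to the whole string.
        Console.WriteLine(CountMatches(text, "^cat"));                          // 1
        Console.WriteLine(CountMatches(text, "cat$"));                          // 1

        // With Multiline, ^ and $ also apply at every line break.
        Console.WriteLine(CountMatches(text, "^cat", RegexOptions.Multiline));  // 2
        Console.WriteLine(CountMatches(text, "cat$", RegexOptions.Multiline));  // 2
    }
}
```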

5. Quantifiers

Quantifiers control the number of occurrences of a pattern:

  • a{2} matches exactly two consecutive "a" characters.
  • a{2,4} matches between two and four consecutive "a" characters.
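One detail worth seeing in practice: without anchors or boundaries, a quantified pattern can match inside a longer run of the same character. The sketch below shows this. QuantifierDemo and AllMatches are hypothetical names, and \b (a word boundary) is used in the last call to restrict matches to whole words.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class QuantifierDemo
{
    // Hypothetical helper: collects the text of every match into an array.
    public static string[] AllMatches(string input, string pattern)
        => Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    static void Main()
    {
        string input = "a aa aaa aaaaa";

        // a{2} finds pairs anywhere, including inside longer runs.
        Console.WriteLine(string.Join(" ", AllMatches(input, "a{2}")));         // aa aa aa aa

        // a{2,4} is greedy: it takes up to four characters per match.
        Console.WriteLine(string.Join(" ", AllMatches(input, "a{2,4}")));       // aa aaa aaaa

        // \b on both sides accepts only whole words of two to four "a" characters.
        Console.WriteLine(string.Join(" ", AllMatches(input, @"\ba{2,4}\b")));  // aa aaa
    }
}
```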

6. Escape Sequences

Backslashes (\) are used to escape special characters. For example:

  • \. matches a literal period (.), not any character.
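  • \d matches any decimal digit. In .NET this includes Unicode digits from other scripts by default; it is equivalent to [0-9] only when RegexOptions.ECMAScript is specified.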
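The following sketch illustrates both escapes, along with Regex.Escape, which escapes metacharacters in a string so it can be used as a literal pattern. EscapeDemo and Has are hypothetical names for this illustration. The character \u0663 is the Arabic-Indic digit three, chosen here only as an example of a non-ASCII digit.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeDemo
{
    // Hypothetical helper: true when the pattern matches somewhere in the input.
    public static bool Has(string input, string pattern,
                           RegexOptions options = RegexOptions.None)
        => Regex.IsMatch(input, pattern, options);

    static void Main()
    {
        // An unescaped dot matches any character; an escaped dot matches only ".".
        Console.WriteLine(Has("3x14", "3.14"));     // True
        Console.WriteLine(Has("3x14", @"3\.14"));   // False
        Console.WriteLine(Has("3.14", @"3\.14"));   // True

        // Regex.Escape builds the escaped form for you.
        Console.WriteLine(Regex.Escape("3.14"));    // 3\.14

        // \d versus [0-9] on a non-ASCII digit.
        Console.WriteLine(Has("\u0663", @"\d"));                             // True
        Console.WriteLine(Has("\u0663", "[0-9]"));                           // False
        Console.WriteLine(Has("\u0663", @"\d", RegexOptions.ECMAScript));    // False
    }
}
```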

Using Regular Expressions in C#

To work with regular expressions in C#, you use the Regex class from the System.Text.RegularExpressions namespace. Here’s how you can use it:

Example: Matching and Replacing Text

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = "The quick brown fox jumps over the lazy dog.";

        // Create a Regex object with a pattern
        Regex regex = new Regex(@"\b\w{4}\b"); // Matches whole words of exactly four word characters

        // Match and print all occurrences of the pattern in the input string
        MatchCollection matches = regex.Matches(input);
        foreach (Match match in matches)
        {
            Console.WriteLine(match.Value);
        }

        // Replace occurrences of the pattern with a specific text
        string replacedText = regex.Replace(input, "****");
        Console.WriteLine(replacedText);
    }
}

Explanation of the Code

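  • Regex Pattern: The pattern @"\b\w{4}\b" matches four-letter words. \b marks a word boundary and \w{4} matches exactly four word characters (letters, digits, or underscores), so only whole words of length four match.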
  • Matching: The Matches method finds all occurrences of the pattern in the input string and returns a MatchCollection.
  • Replacing: The Replace method replaces all matches of the pattern with the specified text ("****" in this case).

Output

over
lazy
The quick brown fox jumps **** the **** dog.
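Replace is not limited to fixed replacement text. As a sketch of one variation, Regex.Replace also accepts a MatchEvaluator delegate that computes the replacement from each match. ReplaceEvaluatorDemo and ShoutFourLetterWords are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReplaceEvaluatorDemo
{
    // Hypothetical helper: upper-cases every whole word of exactly four word characters.
    public static string ShoutFourLetterWords(string input)
        => Regex.Replace(input, @"\b\w{4}\b", m => m.Value.ToUpperInvariant());

    static void Main()
    {
        string input = "The quick brown fox jumps over the lazy dog.";
        Console.WriteLine(ShoutFourLetterWords(input));
        // The quick brown fox jumps OVER the LAZY dog.
    }
}
```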

Practical Examples of Regular Expressions in C#

Let’s explore more examples to understand how regular expressions can be used in real-world scenarios.

Example 1: Validating an Email Address

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string email = "example@domain.com";
        Regex regex = new Regex(@"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$");

        if (regex.IsMatch(email))
        {
            Console.WriteLine("Valid email address.");
        }
        else
        {
            Console.WriteLine("Invalid email address.");
        }
    }
}

Explanation

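The regex pattern performs a basic format check on an email address. It is a simplification and does not cover every address that the email standards allow: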

  • ^[a-zA-Z0-9._%+-]+: Matches the local part of the email (before the @).
  • @[a-zA-Z0-9.-]+: Matches the domain part (after the @).
  • \.[a-zA-Z]{2,}$: Matches the top-level domain (e.g., .com, .org).
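It can help to run the pattern against a few edge cases to see where its limits are. The sketch below reuses the pattern from Example 1. EmailCheckDemo, LooksLikeEmail, and LooksLikeEmailStrictEnd are hypothetical names. The second variant swaps $ for \z, which is one possible way to reject a trailing newline, not the only one.

```csharp
using System;
using System.Text.RegularExpressions;

static class EmailCheckDemo
{
    // Same pattern as in Example 1.
    private const string Pattern =
        @"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$";

    // Variant ending in \z, which matches only at the very end of the string.
    private const string StrictEndPattern =
        @"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\z";

    public static bool LooksLikeEmail(string s) => Regex.IsMatch(s, Pattern);

    public static bool LooksLikeEmailStrictEnd(string s) => Regex.IsMatch(s, StrictEndPattern);

    static void Main()
    {
        Console.WriteLine(LooksLikeEmail("first.last+tag@sub.example.org")); // True
        Console.WriteLine(LooksLikeEmail("user@domain"));                    // False (no top-level domain)
        Console.WriteLine(LooksLikeEmail("user@@domain.com"));               // False (two @ signs)

        // Limits of the simple pattern:
        Console.WriteLine(LooksLikeEmail("user@domain..com"));               // True (consecutive dots slip through)
        Console.WriteLine(LooksLikeEmail("example@domain.com\n"));           // True ($ also matches before a final newline)
        Console.WriteLine(LooksLikeEmailStrictEnd("example@domain.com\n"));  // False
    }
}
```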

Example 2: Extracting Dates from a String

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = "Event dates: 2023-10-15, 2023-11-20, and 2024-01-05.";
        Regex regex = new Regex(@"\d{4}-\d{2}-\d{2}");

        MatchCollection matches = regex.Matches(input);
        foreach (Match match in matches)
        {
            Console.WriteLine(match.Value);
        }
    }
}

Output

2023-10-15
2023-11-20
2024-01-05

Explanation

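The regex pattern \d{4}-\d{2}-\d{2} matches text in the format YYYY-MM-DD. It checks only the shape of the text, not whether the date exists, so a string such as 2023-13-45 also matches.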
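When only real calendar dates are wanted, one possible approach is to let the regex find candidates and then hand each one to DateTime.TryParseExact. The sketch below assumes that approach. DateExtractionDemo and ExtractDates are hypothetical names, and the \b boundaries are an addition to the original pattern that keeps it from matching inside longer digit runs.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

static class DateExtractionDemo
{
    // Hypothetical helper: returns only the candidates that are real calendar dates.
    public static List<DateTime> ExtractDates(string input)
    {
        var dates = new List<DateTime>();
        foreach (Match m in Regex.Matches(input, @"\b\d{4}-\d{2}-\d{2}\b"))
        {
            // The regex finds the shape; TryParseExact checks that the date exists.
            if (DateTime.TryParseExact(m.Value, "yyyy-MM-dd", CultureInfo.InvariantCulture,
                                       DateTimeStyles.None, out DateTime date))
            {
                dates.Add(date);
            }
        }
        return dates;
    }

    static void Main()
    {
        string input = "Valid: 2023-10-15 and 2024-02-29. Invalid: 2023-13-45.";
        foreach (DateTime d in ExtractDates(input))
        {
            Console.WriteLine(d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
        }
        // Prints:
        // 2023-10-15
        // 2024-02-29
    }
}
```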

Example 3: Splitting a String by Multiple Delimiters

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = "apple,banana;orange mango";
        Regex regex = new Regex(@"[,; ]+");

        string[] fruits = regex.Split(input);
        foreach (string fruit in fruits)
        {
            Console.WriteLine(fruit);
        }
    }
}

Output

apple
banana
orange
mango

Explanation

The regex pattern [,; ]+ splits the string on commas (,), semicolons (;), or spaces ( ). Because of the + quantifier, a run of consecutive delimiters counts as a single separator.
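One behavior to be aware of: when the input begins or ends with a delimiter, Regex.Split includes an empty string at that end of the result. A minimal sketch of filtering those out follows. SplitDemo and SplitFields are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Hypothetical helper: splits on the delimiters and drops empty entries.
    public static string[] SplitFields(string input)
        => Regex.Split(input, "[,; ]+").Where(s => s.Length > 0).ToArray();

    static void Main()
    {
        string input = ", apple;;banana ,orange ";

        // Raw split: empty strings appear at both ends.
        Console.WriteLine(Regex.Split(input, "[,; ]+").Length);     // 5

        // Filtered split: only the actual fields remain.
        Console.WriteLine(string.Join("|", SplitFields(input)));    // apple|banana|orange
    }
}
```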

Tips for Using Regular Expressions

  • Keep It Simple: Start with simple patterns and gradually build complexity.
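  • Test Your Patterns: Use online regex testing tools like regex101.com to test and debug your patterns, and select the .NET flavor where the tool offers one, because syntax and behavior differ between regex engines.
  • Optimize for Performance: Complex patterns, especially those with nested quantifiers, can backtrack heavily and run slowly. Reuse a Regex instance instead of rebuilding it on every call, consider RegexOptions.Compiled for patterns used frequently, and pass a match timeout when processing untrusted input.
  • Use Comments: For complex patterns, use the RegexOptions.IgnorePatternWhitespace option, which ignores unescaped whitespace in the pattern and lets you add # comments to improve readability.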
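Several of these tips can be combined in one place. The sketch below is one possible arrangement, not a prescribed one: a single shared Regex instance with a commented pattern, RegexOptions.Compiled, and a match timeout. RegexTipsDemo, IsoDate, and MonthOf are hypothetical names, and the 250 ms timeout is an arbitrary value chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexTipsDemo
{
    // Created once and reused. IgnorePatternWhitespace allows the layout and # comments.
    private static readonly Regex IsoDate = new Regex(@"
        \b
        (?<year>\d{4})  -   # four-digit year
        (?<month>\d{2}) -   # two-digit month
        (?<day>\d{2})       # two-digit day
        \b",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled,
        TimeSpan.FromMilliseconds(250)); // illustrative timeout value

    // Hypothetical helper: returns the month part of the first date, or "" if none is found.
    public static string MonthOf(string input)
    {
        try
        {
            Match m = IsoDate.Match(input);
            return m.Success ? m.Groups["month"].Value : "";
        }
        catch (RegexMatchTimeoutException)
        {
            // Matching took longer than the timeout; treat it as "no result".
            return "";
        }
    }

    static void Main()
    {
        Console.WriteLine(MonthOf("Invoice due 2023-10-15"));   // 10
        Console.WriteLine(MonthOf("no date here") == "");       // True
    }
}
```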

Conclusion

Regular expressions are a powerful tool for text manipulation in C#. They allow you to search, match, and replace text based on specific patterns. By understanding the basic concepts and practicing with examples, you can leverage regular expressions to solve a wide range of text-processing problems.

Whether you’re validating user input, extracting data, or transforming strings, regular expressions are an essential skill for any C# developer. Take the time to learn and experiment with them, and you’ll find them invaluable in your programming toolkit.