C# - Regular Expressions: A Comprehensive Guide with Examples
Regular expressions (often abbreviated as regex or regexp) are powerful patterns used to match and manipulate text strings in various programming languages, including C#. A regular expression is a sequence of characters that defines a search pattern. It allows you to find, match, or replace parts of a string based on specific rules or patterns.
In C#, regular expressions are supported through the System.Text.RegularExpressions namespace, which provides the Regex class. This class allows you to create regular expressions and use them for pattern matching operations.
In this article, we’ll explore the basics of regular expressions, their key concepts, and how to use them in C# with practical examples.
Basic Concepts of Regular Expressions
Before diving into code, let’s understand the basic building blocks of regular expressions:
1. Literal Characters
Literal characters match exactly the same characters in the target string. For example, the regex cat matches the string "cat".
2. Metacharacters
Metacharacters have special meanings in regular expressions and allow you to define patterns more flexibly. Some common metacharacters include:
- . (dot): Matches any single character except a newline.
- * (asterisk): Matches zero or more occurrences of the preceding element (a character, character class, or group).
- + (plus): Matches one or more occurrences of the preceding element.
- ? (question mark): Matches zero or one occurrence of the preceding element.
- | (pipe): Specifies alternatives (e.g., cat|dog matches "cat" or "dog").
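The metacharacters above can be tried out with a small sketch like the following. The class name MetacharacterDemo and the helper Test are hypothetical names chosen for illustration; only Regex.IsMatch comes from the .NET library.

```csharp
using System;
using System.Text.RegularExpressions;

static class MetacharacterDemo
{
    // Hypothetical helper: true when the pattern matches somewhere in the input.
    public static bool Test(string pattern, string input) => Regex.IsMatch(input, pattern);

    static void Main()
    {
        Console.WriteLine(Test("c.t", "cut"));          // True: '.' matches 'u'
        Console.WriteLine(Test("^ab*c$", "ac"));        // True: 'b*' allows zero 'b' characters
        Console.WriteLine(Test("^ab+c$", "ac"));        // False: 'b+' needs at least one 'b'
        Console.WriteLine(Test("^colou?r$", "color"));  // True: 'u?' makes the 'u' optional
        Console.WriteLine(Test("^(cat|dog)$", "dog"));  // True: second alternative matches
    }
}
```

The anchors ^ and $ are used here only so that each pattern has to account for the whole input, which makes the difference between * and + visible.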
3. Character Classes
Character classes allow you to match any character from a set of characters. For example:
- [abc] matches "a", "b", or "c".
- [a-z] matches any lowercase letter from "a" to "z".
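As a rough illustration of character classes, the sketch below returns the first piece of text each class matches. CharacterClassDemo and FirstMatch are hypothetical names, not part of the library.

```csharp
using System;
using System.Text.RegularExpressions;

static class CharacterClassDemo
{
    // Hypothetical helper: returns the first match, or an empty string if there is none.
    public static string FirstMatch(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : "";
    }

    static void Main()
    {
        Console.WriteLine(FirstMatch("[abc]", "xyzb"));              // b
        Console.WriteLine(FirstMatch("[a-z]+", "ABC def GHI"));      // def
        Console.WriteLine(FirstMatch("[a-zA-Z0-9]+", "--id42--"));   // id42
    }
}
```

Ranges can be combined inside one pair of brackets, as in [a-zA-Z0-9], which is the same construction the email pattern later in this article relies on.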
4. Anchors
Anchors specify the position of a match within the string:
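- ^ matches the beginning of the string; with RegexOptions.Multiline it also matches the beginning of each line.
- $ matches the end of the string (or just before a final newline); with RegexOptions.Multiline it also matches the end of each line.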
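The effect of anchors, and of RegexOptions.Multiline on them, can be sketched as follows. AnchorDemo and CountLeadingWords are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchorDemo
{
    // Hypothetical helper: counts words that sit directly after a ^ position.
    public static int CountLeadingWords(string text, RegexOptions options) =>
        Regex.Matches(text, @"^\w+", options).Count;

    static void Main()
    {
        string text = "first line\nsecond line";

        Console.WriteLine(Regex.IsMatch("cat", "^cat$"));     // True: the whole string is "cat"
        Console.WriteLine(Regex.IsMatch("concat", "^cat$"));  // False: "cat" is not at the start

        // Without Multiline, ^ matches only at the start of the string.
        Console.WriteLine(CountLeadingWords(text, RegexOptions.None));       // 1
        // With Multiline, ^ also matches after each newline.
        Console.WriteLine(CountLeadingWords(text, RegexOptions.Multiline));  // 2
    }
}
```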
5. Quantifiers
Quantifiers control the number of occurrences of a pattern:
- a{2} matches exactly two "a" characters.
- a{2,4} matches between two and four "a" characters.
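A short sketch of how these counted quantifiers behave is shown below. QuantifierDemo and IsWhole are hypothetical names; the anchors are added so the count applies to the entire input.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuantifierDemo
{
    // Hypothetical helper: true when the whole input consists of the given pattern.
    public static bool IsWhole(string pattern, string input) =>
        Regex.IsMatch(input, "^" + pattern + "$");

    static void Main()
    {
        Console.WriteLine(IsWhole("a{2}", "aa"));       // True
        Console.WriteLine(IsWhole("a{2}", "aaa"));      // False: three is too many
        Console.WriteLine(IsWhole("a{2,4}", "aaa"));    // True
        Console.WriteLine(IsWhole("a{2,4}", "aaaaa"));  // False: five is too many

        // Quantifiers are greedy by default: a{2,4} takes as many as it can.
        Console.WriteLine(Regex.Match("aaaaaa", "a{2,4}").Value);  // aaaa
    }
}
```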
6. Escape Sequences
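Backslashes (\) are used to escape special characters and to introduce shorthand character classes. For example:
- \. matches a literal period (.), not any character.
- \d matches any decimal digit. In .NET this includes digits from other Unicode scripts by default; it is equivalent to [0-9] only when RegexOptions.ECMAScript is specified.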
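The following sketch illustrates escaping in practice. EscapeDemo is a hypothetical class name; Regex.Escape and RegexOptions.ECMAScript are part of System.Text.RegularExpressions. Verbatim strings (prefixed with @) are used so that backslashes do not have to be doubled in C# source.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeDemo
{
    static void Main()
    {
        // An unescaped dot matches any character, so "fileXtxt" slips through.
        Console.WriteLine(Regex.IsMatch("fileXtxt", @"file.txt"));   // True
        Console.WriteLine(Regex.IsMatch("fileXtxt", @"file\.txt"));  // False

        // Regex.Escape turns arbitrary text into a pattern that matches it literally.
        Console.WriteLine(Regex.Escape("1+1=2"));                    // 1\+1=2

        // U+0663 is the Arabic-Indic digit three, a Unicode decimal digit.
        string arabicThree = "\u0663";
        Console.WriteLine(Regex.IsMatch(arabicThree, @"\d"));                           // True
        Console.WriteLine(Regex.IsMatch(arabicThree, @"\d", RegexOptions.ECMAScript));  // False
    }
}
```

If only ASCII digits should be accepted, writing [0-9] explicitly is a simple alternative to the ECMAScript option.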
Using Regular Expressions in C#
To work with regular expressions in C#, you use the Regex class from the System.Text.RegularExpressions namespace. Here’s how you can use it:
Example: Matching and Replacing Text
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = "The quick brown fox jumps over the lazy dog.";

        // Create a Regex object with a pattern
        Regex regex = new Regex(@"\b\w{4}\b"); // Matches four-letter words

        // Match and print all occurrences of the pattern in the input string
        MatchCollection matches = regex.Matches(input);
        foreach (Match match in matches)
        {
            Console.WriteLine(match.Value);
        }

        // Replace occurrences of the pattern with a specific text
        string replacedText = regex.Replace(input, "****");
        Console.WriteLine(replacedText);
    }
}
Explanation of the Code
- Regex Pattern: The pattern @"\b\w{4}\b" is used to match four-letter words.
- Matching: The Matches method finds all occurrences of the pattern in the input string and returns a MatchCollection.
- Replacing: The Replace method replaces all matches of the pattern with the specified text ("****" in this case).
Output
over
lazy
The quick brown fox jumps **** the **** dog.
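Replace can also compute each replacement from the match itself. The sketch below is one possible variation on the example above, not part of it: it passes a MatchEvaluator (here a lambda) to Replace so that each four-letter word is upper-cased instead of masked. ReplaceDemo and ShoutFourLetterWords are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReplaceDemo
{
    private static readonly Regex FourLetterWord = new Regex(@"\b\w{4}\b");

    // Hypothetical helper: upper-cases every four-letter word in the input.
    public static string ShoutFourLetterWords(string input) =>
        FourLetterWord.Replace(input, match => match.Value.ToUpperInvariant());

    static void Main()
    {
        string input = "The quick brown fox jumps over the lazy dog.";
        Console.WriteLine(ShoutFourLetterWords(input));
        // The quick brown fox jumps OVER the LAZY dog.
    }
}
```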
Practical Examples of Regular Expressions in C#
Let’s explore more examples to understand how regular expressions can be used in real-world scenarios.
Example 1: Validating an Email Address
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string email = "example@domain.com";
        Regex regex = new Regex(@"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$");

        if (regex.IsMatch(email))
        {
            Console.WriteLine("Valid email address.");
        }
        else
        {
            Console.WriteLine("Invalid email address.");
        }
    }
}
Explanation
The regex pattern performs a basic format check on an email address. It does not cover every address the email standards allow:
- ^[a-zA-Z0-9._%+-]+ : Matches the local part of the email (before the @).
- @[a-zA-Z0-9.-]+ : Matches the @ and the domain name (everything up to the final dot).
- \.[a-zA-Z]{2,}$ : Matches the top-level domain (e.g., .com, .org).
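In an application, the same pattern might be wrapped in a reusable helper. The sketch below is one way to do that, under two assumptions that are not part of the original example: the helper name LooksLikeEmail is hypothetical, and the 250 ms match timeout is an arbitrary illustrative value.

```csharp
using System;
using System.Text.RegularExpressions;

static class EmailCheck
{
    // Built once and reused; the timeout value is an assumption for illustration.
    private static readonly Regex EmailPattern = new Regex(
        @"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
        RegexOptions.None,
        TimeSpan.FromMilliseconds(250));

    // Hypothetical helper: a format check only, not proof that the mailbox exists.
    public static bool LooksLikeEmail(string candidate)
    {
        if (string.IsNullOrWhiteSpace(candidate))
        {
            return false;
        }

        try
        {
            return EmailPattern.IsMatch(candidate);
        }
        catch (RegexMatchTimeoutException)
        {
            // Treat input that takes too long to evaluate as invalid.
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(LooksLikeEmail("example@domain.com"));   // True
        Console.WriteLine(LooksLikeEmail("user@sub.domain.org"));  // True
        Console.WriteLine(LooksLikeEmail("example@domain"));       // False: no top-level domain
        Console.WriteLine(LooksLikeEmail("a..b@domain.com"));      // True: the pattern is permissive
    }
}
```

The last call shows a limitation of the pattern: it accepts consecutive dots in the local part, so it is best treated as a first-pass filter.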
Example 2: Extracting Dates from a String
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = "Event dates: 2023-10-15, 2023-11-20, and 2024-01-05.";
        Regex regex = new Regex(@"\d{4}-\d{2}-\d{2}");

        MatchCollection matches = regex.Matches(input);
        foreach (Match match in matches)
        {
            Console.WriteLine(match.Value);
        }
    }
}
Output
2023-10-15
2023-11-20
2024-01-05
Explanation
The regex pattern \d{4}-\d{2}-\d{2} matches dates in the format YYYY-MM-DD. It checks only the shape of the text, so a string such as 2023-99-99 also matches.
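If the extracted text needs to be an actual calendar date, one option is to pair the regex with DateTime.TryParseExact. The sketch below assumes that approach; DateExtractor and ExtractDates are hypothetical names, and the named groups (year, month, day) are added here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

static class DateExtractor
{
    // Same shape as the article's pattern, with named groups and word boundaries added.
    private static readonly Regex DatePattern =
        new Regex(@"\b(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})\b");

    // Hypothetical helper: returns only the matches that are real calendar dates.
    public static List<DateTime> ExtractDates(string input)
    {
        var dates = new List<DateTime>();
        foreach (Match match in DatePattern.Matches(input))
        {
            if (DateTime.TryParseExact(match.Value, "yyyy-MM-dd",
                    CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime date))
            {
                dates.Add(date);
            }
        }
        return dates;
    }

    static void Main()
    {
        // The second candidate has the right shape but is not a valid date.
        foreach (DateTime date in ExtractDates("Valid: 2023-10-15, invalid: 2023-99-99."))
        {
            Console.WriteLine(date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
        }

        // Named groups give access to the individual parts of a match.
        Match first = DatePattern.Match("Due 2024-01-05");
        Console.WriteLine(first.Groups["year"].Value);  // 2024
    }
}
```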
Example 3: Splitting a String by Multiple Delimiters
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = "apple,banana;orange mango";
        Regex regex = new Regex(@"[,; ]+");

        string[] fruits = regex.Split(input);
        foreach (string fruit in fruits)
        {
            Console.WriteLine(fruit);
        }
    }
}
Output
apple
banana
orange
mango
Explanation
The regex pattern [,; ]+ splits the string on commas (,), semicolons (;), or spaces. Because of the + quantifier, a run of consecutive delimiters counts as a single separator.
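One detail worth knowing is that Regex.Split returns an empty string when the input starts or ends with a delimiter. The sketch below shows one way to drop those entries; SplitDemo and SplitItems are hypothetical names, and filtering with LINQ is just one possible approach.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class SplitDemo
{
    private static readonly Regex Delimiters = new Regex(@"[,; ]+");

    // Hypothetical helper: splits on the delimiters and discards empty entries.
    public static string[] SplitItems(string input) =>
        Delimiters.Split(input).Where(item => item.Length > 0).ToArray();

    static void Main()
    {
        // A leading delimiter produces an empty first element.
        string[] raw = Delimiters.Split(",apple;banana");
        Console.WriteLine(raw.Length);             // 3
        Console.WriteLine(raw[0].Length);          // 0

        string[] cleaned = SplitItems(",apple;; banana ");
        Console.WriteLine(cleaned.Length);         // 2
        Console.WriteLine(string.Join("|", cleaned));  // apple|banana
    }
}
```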
Tips for Using Regular Expressions
- Keep It Simple: Start with simple patterns and gradually build complexity.
- Test Your Patterns: Use online regex testing tools like regex101.com to test and debug your patterns.
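- Optimize for Performance: Complex patterns, especially those with nested quantifiers, can backtrack heavily and run slowly. Reuse a single Regex instance instead of rebuilding it on every call, consider RegexOptions.Compiled for patterns used very frequently, and pass a match timeout to the Regex constructor when the input is untrusted.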
- Use Comments: For complex patterns, use the RegexOptions.IgnorePatternWhitespace option, which ignores unescaped whitespace in the pattern and lets you add comments starting with #.
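As an illustration of the last tip, the date pattern from Example 2 could be written with comments roughly as follows. CommentedPatternDemo and IsIsoDateShape are hypothetical names, and the ^ and $ anchors are added here so the whole input must match.

```csharp
using System;
using System.Text.RegularExpressions;

static class CommentedPatternDemo
{
    // With IgnorePatternWhitespace, unescaped whitespace in the pattern is ignored
    // and everything from # to the end of the line is treated as a comment.
    private static readonly Regex IsoDateShape = new Regex(@"
        ^        # start of the string
        \d{4}    # four-digit year
        -        # separator
        \d{2}    # two-digit month
        -        # separator
        \d{2}    # two-digit day
        $        # end of the string
        ", RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: checks the shape only, not whether the date exists.
    public static bool IsIsoDateShape(string input) => IsoDateShape.IsMatch(input);

    static void Main()
    {
        Console.WriteLine(IsIsoDateShape("2023-10-15"));  // True
        Console.WriteLine(IsIsoDateShape("15/10/2023"));  // False
    }
}
```

Note that with this option a literal space in the pattern has to be escaped (for example as "\ ") or placed inside a character class.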
Conclusion
Regular expressions are a powerful tool for text manipulation in C#. They allow you to search, match, and replace text based on specific patterns. By understanding the basic concepts and practicing with examples, you can leverage regular expressions to solve a wide range of text-processing problems.
Whether you’re validating user input, extracting data, or transforming strings, regular expressions are an essential skill for any C# developer. Take the time to learn and experiment with them, and you’ll find them invaluable in your programming toolkit.