Grabbing HTML Tags分析HTML标记
<TAG\b[^>]*>(.*?)</TAG> matches the opening and closing pair of a specific HTML tag. Anything between the tags is captured into the first backreference. The question mark in the regex makes the star lazy, to make sure it stops before the first closing tag rather than before the last, like a greedy star would do. This regex will not properly match tags nested inside themselves, like in <TAG>one<TAG>two</TAG>one</TAG>.
<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1> will match the opening and closing pair of any HTML tag. Be sure to turn off case sensitivity. The key in this solution is the use of the backreference \1 in the regex. Anything between the tags is captured into the second backreference. This solution will also not match tags nested in themselves.
Trimming Whitespace去除空白
You can easily trim unnecessary whitespace from the start and the end of a string or the lines in a text file by doing a regex search-and-replace. Search for ^[ \t]+ and replace with nothing to delete leading whitespace (spaces and tabs). Search for [ \t]+$ to trim trailing whitespace. Do both by combining the regular expressions into ^[ \t]+|[ \t]+$ . Instead of [ \t] which matches a space or a tab, you can expand the character class into [ \t\r\n] if you also want to strip line breaks. Or you can use the shorthand \s instead.
IP AddressesIP地址
Matching an IP address is another good example of a trade-off between regex complexity and exactness. \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b will match any IP address just fine, but will also match 999.999.999.999 as if it were a valid IP address. Whether this is a problem depends on the files or data you intend to apply the regex to. To restrict all 4 numbers in the IP address to 0..255, you can use this complex beast: \b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b (everything on a single line). The long regex stores each of the 4 numbers of the IP address into a capturing group. You can use these groups to further process the IP number.
If you don't need access to the individual numbers, you can shorten the regex with a quantifier to: \b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b . Similarly, you can shorten the quick regex to \b(?:\d{1,3}\.){3}\d{1,3}\b
数字范围
- 000..255: ^([01][0-9][0-9]|2[0-4][0-9]|25[0-5])$
- 0 or 000..255: ^([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])$
- 0 or 000..127: ^(0?[0-9]?[0-9]|1[0-1][0-9]|12[0-7])$
- 0..999: ^([0-9]|[1-9][0-9]|[1-9][0-9][0-9])$
- 000..999: ^[0-9]{3}$
- 0 or 000..999: ^[0-9]{1,3}$
- 1..999: ^([1-9]|[1-9][0-9]|[1-9][0-9][0-9])$
- 001..999: ^(00[1-9]|0[1-9][0-9]|[1-9][0-9][0-9])$
- 1 or 001..999: ^(0{0,2}[1-9]|0?[1-9][0-9]|[1-9][0-9][0-9])$
- 0 or 00..59: ^[0-5]?[0-9]$
- 0 or 000..366: ^(0?[0-9]?[0-9]|[1-2][0-9][0-9]|3[0-6][0-9]|36[0-6])$
-
To match a date in mm/dd/yyyy format, rearrange the regular expression to (0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d . For dd-mm-yyyy format, use (0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)\d\d . You can find additional variations of these regexes in RegexBuddy's library.
-
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b 电子邮件
评论