I've seen plenty of tutorials and documentation that mention some of the special characters inside a character class. The caret or hat (^) negates the class when it is the first character in it.
Ex. [^ABC] matches any single character that is not A, B, or C.
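Here is a quick sketch of that negation using Python's re module. The sample string is just something I made up for illustration.

```python
import re

# [^ABC] matches any single character that is NOT A, B, or C.
not_abc = re.findall(r"[^ABC]", "ABCD1")

print(not_abc)  # ['D', '1']
```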
A lesser-known special character is the period (.), which outside a character class usually means any character except a newline but inside a character class matches only a literal period.
In the string 123.456
.+ matches the entire string
but [.]+ only matches the period in the string
I say this is lesser known because most of the expressions I’ve seen escape the period within the character class: [\.]. While this isn't wrong, it also isn't necessary, though I think a lot of people believe it is.
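The 123.456 example can be checked with a few lines of Python. This is only a sketch using the standard re module; the variable names are mine.

```python
import re

text = "123.456"

# Outside a class, "." matches any character except a newline.
dot_outside = re.findall(r".+", text)

# Inside a class, "." is a literal period, escaped or not.
dot_inside = re.findall(r"[.]+", text)
dot_escaped = re.findall(r"[\.]+", text)

print(dot_outside)  # ['123.456']
print(dot_inside)   # ['.']
print(dot_escaped)  # ['.']
```

The escaped and unescaped classes give the same result, which is the point: the backslash is harmless but redundant.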
The only other special character I see mentioned regarding a character class is \b, which has a totally different meaning inside the brackets than outside: in most engines it is a backspace inside the brackets and a word boundary outside.
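To see the two meanings of \b side by side, here is a small sketch in Python, where [\b] is a backspace character and a bare \b is a word boundary. Other engines may differ, so treat this as one engine's behavior rather than a rule.

```python
import re

# Outside a class, \b is a zero-width word boundary.
boundary_hits = re.findall(r"\bcat\b", "cat concat cat")

# Inside a class, [\b] matches the backspace character (U+0008).
backspace_hits = re.findall(r"[\b]", "a\x08b")

print(boundary_hits)   # ['cat', 'cat']
print(backspace_hits)  # ['\x08']
```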
But the one special character I haven't seen mentioned is the dash (-).
While working on the latest version of my datetime regex I realized something tricky about character classes. I use the dash as one of my three date separators (a period, a slash, or a dash) using the character class [./-]. Notice that none of the characters are escaped. I mentioned to someone about adding an additional separator, a colon (:), to the class when I realized you can't just add it to the end of the existing characters. [./-:] does not simply add the colon to the previous group of three characters. It also adds all of the digits and removes the dash, because /-: is now read as the range from / to :.
Most people know you can use the dash to specify a range: [a-zA-Z] for all the letters in the English alphabet and [0-9] for the digits zero through nine are common constructs. But the dash isn't restricted to those characters or combinations; those are just the ones used most. In fact the range can be between any two SINGLE characters. So [&-Z] is perfectly valid and matches all digits and uppercase letters plus a lot of other characters, like = or @. The range is simply every character between the two ASCII or Unicode values, so you can type the characters themselves, use the ASCII \xXX syntax, or use the Unicode \uXXXX syntax. The only catch is that the second character must not have a lower (ASCII or Unicode) value than the first.
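One way to see exactly what a class matches is to test every ASCII character against it. The sketch below does that in Python; class_members is a hypothetical helper I wrote for this post, not part of any library.

```python
import re

def class_members(pattern: str) -> str:
    """Return every ASCII character matched by a one-character pattern."""
    regex = re.compile(pattern)
    return "".join(chr(c) for c in range(128) if regex.fullmatch(chr(c)))

separators = class_members(r"[./-]")     # dash is last, so it is literal
with_colon = class_members(r"[./-:]")    # "/-:" is read as a range
amp_to_z = class_members(r"[&-Z]")       # range from & (0x26) to Z (0x5A)
hex_range = class_members(r"[\x2F-\x3A]")  # same range as "/-:" in hex

# A range whose end comes before its start is rejected by Python's re.
try:
    re.compile(r"[Z-&]")
    reversed_ok = True
except re.error:
    reversed_ok = False

print(separators)   # -./
print(with_colon)   # ./0123456789:
print(amp_to_z)     # &'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(hex_range)    # /0123456789:
print(reversed_ok)  # False
```

Note that the output is sorted by character code, so the dash shows up first in the [./-] result even though it is written last in the class.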
But here's the tricky part: the dash is only a range indicator if it's between two SINGLE characters, or the escaped equivalent of a single character (\xXX or \uXXXX). Otherwise it only matches a dash. In the character class used in my datetime regex, [./-], it only matches a dash because it is at the end of the list of characters. If it were at the beginning it would also work, but [.-/] is not the same: it is the range from the period to the slash, which are adjacent in ASCII, so the class matches only those two characters and the dash doesn't match at all. I think I realized this in an earlier version of the datetime regex but I didn't fully grasp what was happening. You'll notice I stressed the words 'single character' because, as of this writing, if the dash is between a character escape that represents multiple characters (such as \d) and a single character, the dash matches only a dash.
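The effect of the dash's position is easy to check. This sketch asks whether a lone dash matches each of the three orderings; the dictionary name is mine.

```python
import re

# Does a single "-" match each ordering of the same three characters?
dash_matches = {
    pattern: bool(re.fullmatch(pattern, "-"))
    for pattern in (r"[./-]", r"[-./]", r"[.-/]")
}

for pattern, matched in dash_matches.items():
    print(pattern, matched)
# [./-] True
# [-./] True
# [.-/] False
```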
For instance, [0-Z] matches a bunch of characters, including the digits and the uppercase English letters, but [\d-Z] matches any digit, a dash, or a Z.
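Engines do not all agree on the [\d-Z] case, so it is worth testing in whatever engine you use. As a sketch of one engine's behavior: Python's re module, as far as I can tell, refuses to compile [\d-Z] at all rather than treating the dash as a literal, so there you have to escape the dash to get the digit-or-dash-or-Z meaning. class_members is a hypothetical helper written for this example.

```python
import re

def class_members(pattern: str) -> str:
    """Return every ASCII character matched by a one-character pattern."""
    regex = re.compile(pattern)
    return "".join(chr(c) for c in range(128) if regex.fullmatch(chr(c)))

# A true range between two single characters.
zero_to_z = class_members(r"[0-Z]")

# Python rejects a range whose start is a multi-character escape like \d.
try:
    re.compile(r"[\d-Z]")
    shorthand_range_accepted = True
except re.error:
    shorthand_range_accepted = False

# Escaping the dash gives "any digit, a dash, or a Z" unambiguously.
escaped_dash = class_members(r"[\d\-Z]")

print(zero_to_z)                 # 0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(shorthand_range_accepted)  # False
print(escaped_dash)              # -0123456789Z
```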
Bottom line: if you want to match a dash in a character class alongside other characters, place it first or last, or be safe and escape it.
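As a sketch of the safe version, here is the four-separator class with the dash escaped, so its position no longer matters. The sample timestamp is made up and is not from my actual datetime regex.

```python
import re

# Escaped dash: literal no matter where it sits in the class.
separator = re.compile(r"[./\-:]")

parts = separator.split("2024-01/15.10:30")

print(parts)  # ['2024', '01', '15', '10', '30']
```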