I'm trying to create a regex that matches valid SharePoint file names. We have designed some ASP.NET 2.0 web pages within SharePoint that allow our users to upload a file attachment into SharePoint and rename the attachment to whatever they like before the system actually saves it. We need to ensure that the file name entered by the user fits the required format. Because we are using a .NET RegularExpressionValidator control to perform the validation, we need to perform the validation with a single regex which is defined in the RegularExpressionValidator's "ValidationExpression" property.
Here are SharePoint's rules for file names:
1. The following are invalid characters in a file name: " # % & * : < > ? \ / { | } ~
2. The length of the file name can't exceed 128 characters
3. The period (dot) character can't be used at the beginning of a file name
4. The period (dot) character can't be used at the end of a file name
5. The period (dot) character can't be used consecutively in the middle of a file name
The following would be valid matches:
- Filename 1.jpg
- Proposal_document (1st draft).doc
- R.txt
- 1R
The following would be invalid file names:
- Filename 1..jpg (...use of consecutive dot characters)
- Proposal_Document {1st draft}.doc (...use of invalid characters...curly braces)
- R.txt. (...use of dot character at end of file name)
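To pin down exactly what the regex has to accept and reject, the rules and examples above can be written as a plain check. This is only a minimal sketch in Python for illustration (the real validation has to happen in the RegularExpressionValidator). The names is_valid_sharepoint_name and INVALID_CHARS are made up for this example, and rejecting an empty name is an assumption that the rules don't state.

```python
# Hypothetical reference check for the five rules listed above.
# Not the regex itself, just a plain statement of what it must enforce.

INVALID_CHARS = set('"#%&*:<>?\\/{|}~')  # rule 1

def is_valid_sharepoint_name(name):
    if not name:                       # assumption: an empty name is rejected
        return False
    if len(name) > 128:                # rule 2: at most 128 characters
        return False
    if any(ch in INVALID_CHARS for ch in name):   # rule 1
        return False
    if name.startswith("."):           # rule 3: no leading dot
        return False
    if name.endswith("."):             # rule 4: no trailing dot
        return False
    if ".." in name:                   # rule 5: no consecutive dots
        return False
    return True

# The example names from the lists above.
should_pass = ["Filename 1.jpg", "Proposal_document (1st draft).doc", "R.txt", "1R"]
should_fail = ["Filename 1..jpg", "Proposal_Document {1st draft}.doc", "R.txt."]

for name in should_pass + should_fail:
    print(repr(name), is_valid_sharepoint_name(name))
```

The regex needs to give the same answers as this function for every input.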
I've been able to create a regex that satisfies requirements 1-4, but I can't figure out how to add in #5 (disallowing consecutive dot characters). Here's what I have so far:
([^\"#%&*:<>?\\/{|}~.][^\"#%&*:<>?\\/{|}~]{0,126}[^\"#%&*:<>?\\/{|}~.]|[^\"#%&*:<>?\\/{|}~.])
This regex repeatedly uses the negated character class [^\"#%&*:<>?\\/{|}~] to disallow the characters in the "invalid characters" list of requirement #1. To ensure that the dot character does not appear at the beginning or end of the file name, I added the dot to the end of the negated character class to make [^\"#%&*:<>?\\/{|}~.] and used this class twice: once at the beginning and once at the end of the left side of the alternation. In between, I used the original class [^\"#%&*:<>?\\/{|}~] (without the dot) with a repetition of 0 to 126 characters. This combination requires one valid non-dot character at the beginning, 0-126 valid characters (dots allowed) in the middle, and another valid non-dot character at the end, for a maximum of 128 characters. The problem is that the left side on its own requires a minimum length of two characters, so it wouldn't allow a one-character file name (which would be weird but still valid). To solve this, I used the dot-excluding class [^\"#%&*:<>?\\/{|}~.] by itself on the right side of the alternation to allow for single-character file names.
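Here is a small sketch that exercises the regex above, assuming Python's re module treats this particular pattern the same way .NET does (I believe it does for these constructs, but that is an assumption). re.fullmatch stands in for the validator, which only passes input when the whole string matches. CURRENT_PATTERN and accepts are names invented for this example.

```python
import re

# The regex from above, copied verbatim into a raw string.
CURRENT_PATTERN = re.compile(
    r'([^\"#%&*:<>?\\/{|}~.][^\"#%&*:<>?\\/{|}~]{0,126}[^\"#%&*:<>?\\/{|}~.]|[^\"#%&*:<>?\\/{|}~.])'
)

def accepts(name):
    # fullmatch: the entire file name must be consumed by the pattern.
    return CURRENT_PATTERN.fullmatch(name) is not None

# Requirements #1-4 behave as intended.
assert accepts("Filename 1.jpg")
assert accepts("R")                                      # right side of the alternation
assert accepts("a" * 128)                                # 1 + 126 + 1 characters
assert not accepts("a" * 129)                            # too long
assert not accepts("Proposal_Document {1st draft}.doc")  # invalid characters
assert not accepts("R.txt.")                             # trailing dot

# Requirement #5 is the gap: the middle class happily consumes "..".
assert accepts("Filename 1..jpg")
```

The last assertion is the behavior I'm trying to get rid of.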
I'm brand new to regexes and I can't help thinking "man, your regex looks ugly"... so feel free to let me know if there is a more elegant way to do what I've already done! If not, it's at least working for requirements #1-4, so what I need most is help adding requirement #5 to the regex. I've tried a number of ways to prevent consecutive dot characters from showing up in the file name, but to no avail. I keep trying something to the effect of putting [^\.][^\.] into the regex to prevent two consecutive dots, but the character class with the {0,126} repeater always ends up matching the consecutive dots anyway, so they still count as valid. Instead of the syntax for excluding single characters that I'm using now, is there a syntax for excluding a sequence of multiple consecutive characters? I feel like this should be simple and I'm just missing something obvious because I'm new at this. Can any of you seasoned veterans help me out?
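One possible direction, sketched here as an assumption rather than a confirmed fix: a negative lookahead, written (?!...), rejects a position if the sub-pattern inside it could match from there, and it can contain a multi-character sequence. The sketch below puts (?!.*\.\.) in front of the existing regex and tries it in Python. It has not been verified inside a RegularExpressionValidator, and the names NO_DOUBLE_DOT, CANDIDATE, and matches are invented for this example.

```python
import re

# Character class bodies copied from the regex in this post.
INVALID = r'\"#%&*:<>?\\/{|}~'
EDGE = "[^" + INVALID + ".]"     # first/last character: dot excluded
MIDDLE = "[^" + INVALID + "]"    # middle characters: dot allowed

# Assumption: a negative lookahead that fails if ".." occurs anywhere ahead.
NO_DOUBLE_DOT = r"(?!.*\.\.)"

CANDIDATE = re.compile(
    NO_DOUBLE_DOT + "(" + EDGE + MIDDLE + "{0,126}" + EDGE + "|" + EDGE + ")"
)

def matches(name):
    # fullmatch stands in for the validator's whole-string requirement.
    return CANDIDATE.fullmatch(name) is not None

for name in ["Filename 1.jpg", "R.txt", "1R", "Filename 1..jpg", "R.txt."]:
    print(repr(name), matches(name))
```

If this carries over, the ValidationExpression would be the existing regex with (?!.*\.\.) added at the very start.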
Thanks much,
Scott