
Validating SharePoint file names

Last post 09-07-2007, 3:00 AM by ScottW. 3 replies.
  •  09-06-2007, 5:06 PM 34558

    Validating SharePoint file names

    I'm trying to create a regex that matches valid SharePoint file names. We have designed some ASP.NET 2.0 web pages within SharePoint that allow our users to upload a file attachment into SharePoint and rename the attachment to whatever they like before the system actually saves it. We need to ensure that the file name entered by the user fits the required format. Because we are using a .NET RegularExpressionValidator control to perform the validation, we need to perform the validation with a single regex which is defined in the RegularExpressionValidator's "ValidationExpression" property.

     Here are SharePoint's rules for file names:

    1. The following are invalid characters in a file name: " # % & * : < > ? \ / { | } ~ 
    2. The length of the file name can't exceed 128 characters
    3. The period (dot) character can't be used at the beginning of a file name
    4. The period (dot) character can't be used at the end of a file name
    5. The period (dot) character can't be used consecutively in the middle of a file name

    The following would be valid matches:

    • Filename 1.jpg
    • Proposal_document (1st draft).doc
    • R.txt
    • 1R

    The following would be invalid file names:

    • Filename 1..jpg          (...use of consecutive dot characters)
    • Proposal_Document {1st draft}.doc       (...use of invalid characters...curly braces)
    • R.txt.          (...use of dot character at end of file name)

    I've been able to create a regex that satisfies requirements 1-4, but I can't figure out how to add in #5 (disallowing consecutive dot characters). Here's what I have so far:

    ([^\"#%&*:<>?\\/{|}~.][^\"#%&*:<>?\\/{|}~]{0,126}[^\"#%&*:<>?\\/{|}~.]|[^\"#%&*:<>?\\/{|}~.])

    This regex repeatedly uses the negated character class [^\"#%&*:<>?\\/{|}~] to disallow the characters listed in requirement #1. To keep the dot character from appearing at the beginning or end of the file name, I added the dot to the end of the negated character class to make [^\"#%&*:<>?\\/{|}~.] and used that class twice: once at the beginning and once at the end of the left side of the alternation. Between them I put the original class [^\"#%&*:<>?\\/{|}~] (without the dot) with a repetition of 0 to 126 characters. This combination requires a valid non-dot character at the beginning, 0-126 valid characters (dots allowed) in the middle, and another valid non-dot character at the end. The problem is that this requires at least two characters, which wouldn't allow a one-character file name (weird, but still valid). To solve this, I used the dot-excluding class [^\"#%&*:<>?\\/{|}~.] on its own on the right side of the alternation to allow single-character file names.
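
The behaviour described above can be checked with a minimal sketch in Python, used here only as a stand-in for the .NET engine (the character classes behave the same way for these inputs). The names `ALLOWED`, `ALLOWED_NO_DOT`, `ORIGINAL` and `is_match` are hypothetical, and `re.fullmatch` is an assumption that mimics the validator requiring the whole input to match.

```python
import re

# Negated classes copied from the post (the \" escape is not needed here).
ALLOWED = r'[^"#%&*:<>?\\/{|}~]'
ALLOWED_NO_DOT = r'[^"#%&*:<>?\\/{|}~.]'

# Left side: 2 to 128 characters with no dot first or last.
# Right side: exactly one non-dot character.
ORIGINAL = re.compile(
    f"{ALLOWED_NO_DOT}{ALLOWED}{{0,126}}{ALLOWED_NO_DOT}|{ALLOWED_NO_DOT}"
)

def is_match(name: str) -> bool:
    """True when the whole name matches the original pattern."""
    return ORIGINAL.fullmatch(name) is not None

# Requirements 1-4 hold, but consecutive dots in the middle still pass,
# because the middle class allows any run of dots.
for name in ["Filename 1.jpg", "R", "R.txt.", ".hidden", "Filename 1..jpg"]:
    print(f"{name!r}: {is_match(name)}")
```

The last sample is the gap: nothing in the middle class stops two dots from sitting next to each other.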

    I'm brand new to regexes and I can't help thinking "man your regex looks ugly"...so feel free to let me know if there is a more elegant way to do what I've already done! However, if not, it's at least working for requirements #1-4, so what I need most is help on adding requirement #5 to the regex. I've tried a number of ways to prevent consecutive dot characters from showing up in the file name, but to no avail. I keep trying something to the effect of putting [^\.][^\.] into the regex to prevent two consecutive dots, but the construction of the current regex always ends up including the consecutive dots as valid matches by the character class that uses the {0,126} repeater. Instead of using the syntax for excluding single characters as I'm doing, is there a syntax for excluding multiple consecutive characters? I feel like this should be simple and I'm just missing something obvious because I'm new at this. Can any of you seasoned veterans help me out?

    Thanks much,

    Scott

  •  09-06-2007, 5:36 PM 34560 in reply to 34558

    Re: Validating SharePoint file names

    Here's something to try (.NET):

    ^(?!\..*)(?!.*\.\.)(?=.*[^.]$)[^\"#%&*:<>?\\/{|}~]{1,128}$

     This handles your conditions as follows:

    (?!\..*) prevents the string from starting with a period.

    (?!.*\.\.) prevents two consecutive periods within the string.

    (?=.*[^.]$) requires a non-period at the end of the string

    The [^\"#%&*:<>?\\/{|}~] is the negated character class listing the disallowed characters, as you're aware.

    {1,128} forces the string to be between 1 and 128 characters long.

    ^ and $ define the beginning and end of the string.
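
As a rough check of the pattern above against the sample names from the original question, here is a sketch in Python rather than .NET (the constructs used are supported by both engines). `SHAREPOINT_NAME` and `is_valid` are hypothetical names introduced for illustration.

```python
import re

# The reply's pattern, split into its parts. Adjacent raw strings are
# concatenated by Python, so this compiles to the same single regex.
SHAREPOINT_NAME = re.compile(
    r'^(?!\..*)'                    # must not start with a period
    r'(?!.*\.\.)'                   # no two consecutive periods anywhere
    r'(?=.*[^.]$)'                  # last character must not be a period
    r'[^"#%&*:<>?\\/{|}~]{1,128}$'  # 1 to 128 allowed characters
)

def is_valid(name: str) -> bool:
    """True when the name satisfies all five rules from the question."""
    return SHAREPOINT_NAME.match(name) is not None

VALID = ["Filename 1.jpg", "Proposal_document (1st draft).doc", "R.txt", "1R"]
INVALID = ["Filename 1..jpg", "Proposal_Document {1st draft}.doc", "R.txt."]

for name in VALID + INVALID:
    print(f"{name!r}: {is_valid(name)}")
```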

  •  09-07-2007, 12:55 AM 34573 in reply to 34560

    Re: Validating SharePoint file names

    Yes!! Beautiful! This works perfectly!

    This solution is so elegant, and I am going to take some time now to learn from what you have shown me here, especially to better understand the (?!) and (?=) constructs. I had seen them in the regex documentation, but while I was trying to absorb as much as possible these past couple of days they just seemed too far over my head.

    Thank you for your help!

    Scott

  •  09-07-2007, 3:00 AM 34574 in reply to 34560

    Re: Validating SharePoint file names

    Your solution has really helped me learn a lot. I now can begin to grasp and understand the use of lookaheads. When I had originally looked at the documentation, I saw the lookaheads being explained as a way to test that something does not follow something else. So I had tried using something to the effect of .*(?!\.\.) but this wasn't working. I never realized that you could construct the lookahead without including a specific token for the "something else" that precedes the "something" (whoa does that make any sense?!).
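
The difference between the two placements can be sketched in Python (a stand-in for .NET; the variable names are hypothetical). The assumption is that the failed attempt was applied to the whole string, as a validator would do.

```python
import re

name = "Filename 1..jpg"

# Failed attempt: .* consumes the whole string first, so the lookahead is
# tested at the very end, where nothing follows and it trivially succeeds.
attempt = re.fullmatch(r'.*(?!\.\.)', name)

# Working form: the lookahead sits at position 0 and carries its own .*,
# so it scans ahead through the whole string looking for two dots.
working = re.match(r'(?!.*\.\.)', name)

# The same lookahead on a name without consecutive dots.
clean = re.match(r'(?!.*\.\.)', "Filename 1.jpg")

print(attempt is not None)  # the double dot slips through
print(working is None)      # the double dot is rejected
print(clean is not None)    # a clean name passes
```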

    It was really helpful for me to learn how the lookaheads do not consume any characters in the string, but only return a success or failure and then leave the regex to continue matching from the same string position. That opens up a huge realm of possibilities in checking for multiple invalid conditions because it lets you always "restart" the checking at the beginning of the string for each condition you want to test. I tweaked the (?!\..*) just a bit to become (?!\.) and that seemed to work too.
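
A small Python sketch of both points in this paragraph (again standing in for .NET, with hypothetical names): lookaheads leave the match position where it was, and the trailing .* in the first lookahead makes no difference to the result.

```python
import re

# Two lookaheads in a row: both are tested from position 0, and after they
# succeed the match position has not moved.
m = re.match(r'(?!\.)(?!.*\.\.)', "R.txt")
position_after_lookaheads = m.end()

# (?!\..*) and (?!\.) accept and reject the same names, because .* can
# always match zero characters.
SAMPLES = ["R.txt", ".hidden", "..", "1R", ".", "a.b.c"]
same_result = all(
    bool(re.match(r'(?!\..*)', s)) == bool(re.match(r'(?!\.)', s))
    for s in SAMPLES
)

print(position_after_lookaheads)
print(same_result)
```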

    I had to mull over the (?!.*\.\.) for a while because I didn't really grasp backtracking, and I kept asking myself, "how will this lookahead ever get to check for the \.\. when the .* will always successfully consume every character in the string?" After studying some more I realized that yes, the .* first consumes every character in the string, but then the engine reaches the \.\. and finds nothing left to match it against. So it backtracks: the .* gives back one character at a time, and at each position the engine tries the \.\. again. If two dots are found at any position, the lookahead's pattern matches and the negative lookahead fails the whole match. Only when the .* has given back everything without \.\. ever matching can the engine be sure there are no consecutive dots anywhere in the string.

    The (?=.*[^.]$) lookahead was great because it helped me to learn that the $ can be used to designate the ending position of the string...but that it can actually be used more than once within the regex and not just as the very last token of the complete regex. May sound silly but it's amazing how easy it can be to misinterpret these things when you're starting off. With my newfound skills I tried to see if I could convert your positive lookahead into an equivalent negative lookahead as you had done with the first two...and (?!.*\.$) seemed to work. Cool!
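
The swap described here can be spot-checked with a short Python sketch (standing in for .NET; `POSITIVE`, `NEGATIVE` and `SAMPLES` are hypothetical names, and only single-line names are considered). On their own the two lookaheads differ on the empty string, but the {1,128} quantifier rejects that case in both full patterns.

```python
import re

CHARS = r'[^"#%&*:<>?\\/{|}~]{1,128}$'
POSITIVE = re.compile(r'^(?!\.)(?!.*\.\.)(?=.*[^.]$)' + CHARS)
NEGATIVE = re.compile(r'^(?!\.)(?!.*\.\.)(?!.*\.$)' + CHARS)

SAMPLES = ["Filename 1.jpg", "R.txt", "R.txt.", "1R", ".", "", "a..b", "R"]

# The full patterns give the same verdict on every sample.
agree = all(
    bool(POSITIVE.match(s)) == bool(NEGATIVE.match(s)) for s in SAMPLES
)

# Taken alone, the lookaheads disagree on the empty string: the positive
# form needs at least one character, the negative form does not.
positive_on_empty = re.match(r'(?=.*[^.]$)', "") is not None
negative_on_empty = re.match(r'(?!.*\.$)', "") is not None

print(agree, positive_on_empty, negative_on_empty)
```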

    Finally it was really neat to see how you used the [^\"#%&*:<>?\\/{|}~]{1,128}$ after all the lookahead checks to actually consume the characters in the string and ensure that none were in the invalid character set.

    When I started this a few days ago I knew I could have used some string checking code on the server side to do multiple IndexOf() checks, Length checks, etc., on the file name that was input by the user to make sure that none of the conditions were violated. I could have written this code in a couple of minutes which would have been the quick short-term solution but it would have required a server round-trip and multiple lines of code...ugly and inefficient. It was really worth it to finally look into using regexes. Your solution allows me to accomplish the validation completely on the client side in one line of code. Nice!
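
For comparison, the string-checking alternative mentioned above might look roughly like this. It is a sketch in Python rather than the .NET code the post has in mind, and `INVALID_CHARS` and `is_valid_plain` are hypothetical names.

```python
# Characters from rule 1 of the question.
INVALID_CHARS = set('"#%&*:<>?\\/{|}~')

def is_valid_plain(name: str) -> bool:
    """Check the five rules with plain string operations, no regex."""
    if not 1 <= len(name) <= 128:                    # rule 2 (and non-empty)
        return False
    if any(ch in INVALID_CHARS for ch in name):      # rule 1
        return False
    if name.startswith('.') or name.endswith('.'):   # rules 3 and 4
        return False
    return '..' not in name                          # rule 5

for name in ["Filename 1.jpg", "Filename 1..jpg", "R.txt.", "1R"]:
    print(f"{name!r}: {is_valid_plain(name)}")
```

Each rule maps to one check here, whereas the regex folds all five into a single expression that the validator control can use directly.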

    Thanks again for the help,

    Scott
