Without a good set of sample text to test against, I can't really comment on whether they 'look good'. I would suggest that you create a set of examples that you know:
- definitely should match
- definitely should not match
- are on the borderline but should match
- are on the borderline but should not match
Looking at the example of:
([^<]*?)\"
I think there may be an easier way. What you are asking for is a lazy search forward that will stop just before the next '<' OR the next double-quote. The regex engine will be stepping forward a character at a time (effectively) checking for both the '<' and '"' end conditions. Therefore you can make its job more deterministic by using
([^<"]*)" (I'll leave you to work out the necessary escaping)
When the first part completes, the next character is either a '<', the end of the text/line or the double-quote character. In the first two cases you are not dealing with a double-quote before the start of the next tag, so you want it to fail anyway. The third case is what you want, and it will succeed on the following double-quote.
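As a rough sketch of that comparison (the sample strings and variable names are made up for illustration, not taken from your data), both forms capture the same text when a double-quote comes before the next tag, and neither matches when there is no double-quote at all:

```javascript
// Original lazy form and the suggested character-class form.
const lazy = /([^<]*?)"/;
const greedy = /([^<"]*)"/;

// Hypothetical sample: a double-quote appears before the next tag.
const sample = 'hello "world" and more<b>';
const lazyMatch = sample.match(lazy);
const greedyMatch = sample.match(greedy);
console.log(lazyMatch[1]);   // 'hello '
console.log(greedyMatch[1]); // 'hello '

// Hypothetical sample with no double-quote anywhere: both fail.
const noQuote = 'plain text<b>bold</b>';
console.log(noQuote.match(lazy));   // null
console.log(noQuote.match(greedy)); // null
```

The second form never has to give characters back or look ahead one step at a time, because the character class can only stop at a '<', a double-quote or the end of the text.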
Once you have found a 'quote-space' (or whatever) pattern you are then scanning forward to the start of the next tag (the same comments about using a character class apply here, rather than searching for the '>' at the end of the next tag and backtracking to the '<'). This will miss the cases where there are two (or more) instances of the pattern between the tags. For example:
<a>hello "world" and "how are you" today</a>
would have:
hello "world
as match #1 and:
and "how are you" today
as match #2. Although you have the 'g' option set, the regex engine will not rescan the replaced text, so you will get:
<a>hello "world”\sand "how are you" today</a>
as the result without multiple passes. Do you really need to match the text to the start of the next tag? You already know that the '" ' text must be after a tag and before the start of the next one from the first match part and, because of the check for the end of the preceding tag, you will need to make multiple passes anyway. Personally, I would use
"\s(?=[^<>]*(<|$))
(I'm fairly sure that variable-length lookaheads are supported), which only matches the 'double-quote space' pattern but checks ahead for the start of the next tag or the end of the string (assuming the multiline option is off). It will still not find a match within a tag: the character set within the lookahead stops at either a '<' or a '>' (or, by implication, the end of the text), and inside a tag the character it stops at is a '>', which makes the lookahead fail.
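Here is a minimal sketch of that lookahead version run against the example above. I am assuming the replacement you want is a closing curly quote followed by a plain space, and the extra test strings are my own inventions:

```javascript
// 'double-quote space' that is followed, without crossing any '<' or '>',
// by the start of the next tag or the end of the string.
const quoteSpace = /"\s(?=[^<>]*(<|$))/g;

// Assumed replacement: closing curly quote (U+201D) plus a literal space.
const closing = '\u201D ';

// Both instances between the tags are replaced in a single pass.
const between = '<a>hello "world" and "how are you" today</a>';
const betweenOut = between.replace(quoteSpace, closing);
console.log(betweenOut);
// <a>hello "world” and "how are you” today</a>

// Inside a tag the lookahead runs into '>' first, so nothing is replaced.
const insideTag = '<a class="x" id="y">text</a>';
const insideOut = insideTag.replace(quoteSpace, closing);
console.log(insideOut === insideTag); // true

// Text after the last tag is still handled via the end-of-string check.
const trailing = '<b>x</b> say "hi" now';
const trailingOut = trailing.replace(quoteSpace, closing);
console.log(trailingOut); // <b>x</b> say "hi” now
```

Because the lookahead consumes nothing, each match is only two characters long, so consecutive instances between the same pair of tags do not interfere with one another.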
By the way, you will not find the patterns you are after before the first or after the last tag. Is that a problem? My suggestion above will get over that as well.
Finally, can you put a '\s' into the replacement string in JavaScript without it putting the literal "\s" characters into the output? Normally '\s' is a match shortcut for a character set with all of the whitespace characters in it. I suspect that you want a literal space character in the replacement string.
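A quick way to check this is a small experiment like the one below (the sample text and the curly-quote replacement are assumptions on my part). As far as I know, a JavaScript string literal treats '\s' as an unrecognised escape and keeps just the 's', so neither a space nor a backslash ends up in the output:

```javascript
// Hypothetical sample text containing one 'double-quote space'.
const text = 'say "hi" now';

// '\s' inside a string literal is not a whitespace shortcut:
// the backslash is dropped and a plain 's' is inserted.
const withEscape = text.replace(/"\s/g, '\u201D\s');
console.log(withEscape); // say "hi”snow

// Doubling the backslash inserts the two literal characters '\' and 's'.
const withDoubled = text.replace(/"\s/g, '\u201D\\s');
console.log(withDoubled); // say "hi”\snow

// A literal space in the replacement gives what is probably intended.
const withSpace = text.replace(/"\s/g, '\u201D ');
console.log(withSpace); // say "hi” now
```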
Susan