Can you please read the posting guidelines in the sticky note at the beginning of this forum? There we ask you to identify the regex variant and the platform you are using, and we also ask that you do NOT make up examples. In your case, the output you say you get appears to be derived from some input other than your sample text (the "aearttt" has disappeared and been replaced by "nullzfezggezg").
The '(?d)' part of your pattern looks like it is applying a modifier of some form, but I don't recognise it (and neither does my regex tester, though that may be because I am not using the same regex variant as you).
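If you are using Java's java.util.regex (the "Pattern.DOTALL" mentioned further down suggests so), then '(?d)' is the embedded form of the Pattern.UNIX_LINES flag: in that mode only '\n' is treated as a line terminator by '.', '^' and '$'. A minimal sketch, assuming Java, that shows the difference on a carriage return (the class and method names are just for illustration):

```java
import java.util.regex.Pattern;

public class UnixLinesDemo {
    // By default Java treats '\r' as a line terminator, so '.' does not match it.
    static boolean plainDotMatchesCR() {
        return Pattern.compile("a.b").matcher("a\rb").matches();
    }

    // With the embedded (?d) flag only '\n' is a line terminator, so '.' matches '\r'.
    static boolean embeddedFlagMatchesCR() {
        return Pattern.compile("(?d)a.b").matcher("a\rb").matches();
    }

    // The same mode requested through the flag constant instead.
    static boolean flagConstantMatchesCR() {
        return Pattern.compile("a.b", Pattern.UNIX_LINES).matcher("a\rb").matches();
    }

    public static void main(String[] args) {
        System.out.println(plainDotMatchesCR());     // false
        System.out.println(embeddedFlagMatchesCR()); // true
        System.out.println(flagConstantMatchesCR()); // true
    }
}
```

So if your text only ever uses '\n' line breaks, the '(?d)' probably makes no visible difference to your results.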
Also, your pattern can be cleaned up quite a bit. For example, you have added quite a few capture groups that appear to be unnecessary: '(((200)[0-9]|(201)[0-9]))' can be replaced by '(200[0-9]|201[0-9])'. Adding groups whose captured text you never use just creates unnecessary work for the regex engine. (I know there are cases where you do need groups, for example to limit the scope of the alternation operator.)
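A quick way to see the extra bookkeeping is to ask Java how many capture groups each version defines, and to check that both still accept the same years. This is a small sketch assuming Java; the class name and the sample years are my own:

```java
import java.util.regex.Pattern;

public class GroupCountDemo {
    static final Pattern ORIGINAL = Pattern.compile("(((200)[0-9]|(201)[0-9]))");
    static final Pattern CLEANED = Pattern.compile("(200[0-9]|201[0-9])");

    // Number of capture groups the engine has to record on every match.
    static int groups(Pattern p) {
        return p.matcher("").groupCount();
    }

    // True if both patterns agree on whether the whole string matches.
    static boolean sameResult(String year) {
        return ORIGINAL.matcher(year).matches() == CLEANED.matcher(year).matches();
    }

    public static void main(String[] args) {
        System.out.println(groups(ORIGINAL)); // 4
        System.out.println(groups(CLEANED));  // 1
        for (String y : new String[] {"1999", "2000", "2009", "2015", "2019", "2020"}) {
            System.out.println(y + " matches: " + CLEANED.matcher(y).matches()
                    + ", same as original: " + sameResult(y));
        }
    }
}
```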
However, even this can be simplified. First, factor the '20' out of the beginning of the alternatives:
20(0[0-9]|1[0-9])
and then factor the '[0-9]' out of the end of the alternatives (also using '\d', which amounts to the same thing):
20(0|1)\d
The alternation can also be expressed as a character class, so the pattern becomes
20[01]\d
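If you want to convince yourself that each factoring step is safe, you can test every four-digit string against all of the versions and check that they always agree. A sketch assuming Java (where '\d' means [0-9] unless Pattern.UNICODE_CHARACTER_CLASS is set); the class and method names are illustrative only:

```java
import java.util.regex.Pattern;

public class YearPatterns {
    // Each step of the simplification above, from longest to shortest.
    static final String[] STEPS = {
        "(200[0-9]|201[0-9])",
        "20(0[0-9]|1[0-9])",
        "20(0|1)\\d",
        "20[01]\\d"
    };

    // Tries "0000" to "9999" and reports whether every step gives the same answer.
    static boolean allAgree() {
        Pattern[] compiled = new Pattern[STEPS.length];
        for (int i = 0; i < STEPS.length; i++) {
            compiled[i] = Pattern.compile(STEPS[i]);
        }
        for (int n = 0; n <= 9999; n++) {
            String s = String.format("%04d", n);
            boolean expected = compiled[0].matcher(s).matches();
            for (Pattern p : compiled) {
                if (p.matcher(s).matches() != expected) {
                    return false;
                }
            }
        }
        return true;
    }

    // Counts the four-digit strings the final form accepts (2000 to 2019).
    static int acceptedCount() {
        Pattern last = Pattern.compile(STEPS[STEPS.length - 1]);
        int count = 0;
        for (int n = 0; n <= 9999; n++) {
            if (last.matcher(String.format("%04d", n)).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(allAgree());      // true
        System.out.println(acceptedCount()); // 20
    }
}
```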
A similar thing can be done to the '(\\>\\>|\\>)' part. I can only assume that the programming language or the regex variant you are using needs the ">" character to be escaped (though none that I can think of at the moment do). When you remove the '\\' parts (if only temporarily) and start factoring, this becomes
>(>|)
Now, an alternation that has an empty option is generally better expressed using the '?' quantifier, so this becomes
>>?
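In a Java string literal, "\\>" becomes the regex '\>', and Java's regex syntax allows a backslash before any non-alphabetic character, so '\>' simply matches a literal ">". That means the escapes are harmless but unnecessary there. A small sketch, assuming Java, comparing the original alternation with '>>?' (the class name and test strings are mine):

```java
import java.util.regex.Pattern;

public class GreaterThanPatterns {
    // "\\>" in the Java source is the regex \> which matches a literal '>'.
    static final Pattern ORIGINAL = Pattern.compile("(\\>\\>|\\>)");
    static final Pattern SIMPLIFIED = Pattern.compile(">>?");

    // True if both patterns agree on whether the whole string matches.
    static boolean agree(String s) {
        return ORIGINAL.matcher(s).matches() == SIMPLIFIED.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", ">", ">>", ">>>", "x"}) {
            System.out.println("'" + s + "' matches: " + SIMPLIFIED.matcher(s).matches()
                    + ", same as original: " + agree(s));
        }
    }
}
```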
This will also involve a bit less processing by the regex engine. If it sees ">DECEMBER" in the text, it will try the alternatives in the order they are written in the pattern, and so will first attempt to match the '>>' alternative. It will match the first character but fail on the second, so it has to backtrack and release the matched ">" character before trying the second alternative. In this case that is a single '>', which matches, and the engine then goes on to the next part of the pattern. Compare that with my suggestion, where the ">" is matched and then the "D" of the text is compared with the '>?' of the pattern: there is no match, but the '?' quantifier says that no match is OK, and the overall matching process continues.
There is no need to include the modifiers both within the pattern and as options in the function call, so use either the "Pattern.DOTALL" parameter value or the '(?s)', not both. Also, I would combine all of the in-pattern options into a single group: '(?smi)'.
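To illustrate, here is a sketch assuming Java, where passing the flag constants and writing a single '(?smi)' group give the same result, and leaving out DOTALL stops '.' from matching the line break. The sample text and class name are my own:

```java
import java.util.regex.Pattern;

public class DotallDemo {
    // Hypothetical text: a number, a line break, then a lower-case month.
    static final String TEXT = "15\n>december";

    // Options supplied only as flags to Pattern.compile.
    static boolean viaFlags() {
        int flags = Pattern.DOTALL | Pattern.MULTILINE | Pattern.CASE_INSENSITIVE;
        return Pattern.compile("15.>DECEMBER", flags).matcher(TEXT).matches();
    }

    // The same options written once, inside the pattern, as a single group.
    static boolean viaEmbedded() {
        return Pattern.compile("(?smi)15.>DECEMBER").matcher(TEXT).matches();
    }

    // Without the 's' (DOTALL) option, '.' will not match the '\n'.
    static boolean withoutDotall() {
        return Pattern.compile("(?mi)15.>DECEMBER").matcher(TEXT).matches();
    }

    public static void main(String[] args) {
        System.out.println(viaFlags());      // true
        System.out.println(viaEmbedded());   // true
        System.out.println(withoutDotall()); // false
    }
}
```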
I assume that you are using the '(.{0,10})' part of your pattern to match the line break after the digits on the first line and the ">" or ">>" on the second. The problem with this is that the '{0,10}' form of the quantifier is greedy, so it will start by grabbing 10 characters, which is far more than is needed. In this case it will not quite get as far as the end of "DECEMBER", but if it were "MAY" it would actually get beyond the ">" or ">>" at the start of the third line. When the engine backtracks to try to match the '>' in the pattern that follows, you might end up at the start of the wrong line. Your pattern will probably find this out eventually (when it tries to match the last part of your pattern after the next '(.{0,10})' instance), but it may have to try up to 100 different character-matching combinations to determine this.
If you make it non-greedy by writing '.{0,10}?', it will only match as many characters as it needs in order to find the ">" that follows. If your sample text is truly representative of your actual text, this will only match the line-terminating character(s), which are '\r', '\n' or '\r\n' depending on your platform; either way, that is at most 2 characters. I must admit that I would be tempted to use '.*?' and let the regex engine find the number for me.
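Here is a sketch, assuming Java and a hypothetical "MAY" version of your text with '\n' line breaks, that captures what the quantified '.' consumes before the following '>'. The greedy form overshoots onto the third line; the reluctant form stops after the line break. The pattern around the quantifier is simplified for illustration and is not your full pattern:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Hypothetical input: the "MAY" case described above.
    static final String TEXT = "15\n>MAY\n>2009";

    // Returns what the quantified '.' captured between the digits and a following '>'.
    static String gap(String quantifiedDot) {
        Pattern p = Pattern.compile("(\\d+)(" + quantifiedDot + ")>", Pattern.DOTALL);
        Matcher m = p.matcher(TEXT);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Greedy: grabs 10 characters, then backs off until a '>' follows,
        // which ends up being the '>' at the start of the third line.
        System.out.println(gap(".{0,10}").replace("\n", "\\n"));  // \n>MAY\n
        // Reluctant: only the line break is consumed before the first '>'.
        System.out.println(gap(".{0,10}?").replace("\n", "\\n")); // \n
    }
}
```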
I do not understand why your "result" includes the "null" text fragments, as these appear to be where the line terminators would go. Do you really get the "null" text added in there?
Keeping your patterns for validating the first and last numbers (but removing the '{1,2}' quantifier: is it really necessary? Can you really have up to 4 digits there? You can also merge the 2 inner alternatives), the pattern:
(0?[1-9]|[12]\d|3[01]).*?>>?.*?>>?.?(20[01]\d)
and the replacement text of
<deadline>$0</deadline>
with the 'singleline' option set would appear to match your text and generate the output you want:
aearttt <deadline>15
>>DECEMBER
>>2009</deadline>
aearttt <deadline>15
>DECEMBER
>2009</deadline>
from my regex tester.
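For Java, a sketch of the same replacement might look like this. DOTALL is Java's name for the 'singleline' option, and '$0' in the replacement refers to the whole match. The input string is an assumption: I have reconstructed it from the output above by removing the tags and using '\n' line breaks, so your real text may differ:

```java
import java.util.regex.Pattern;

public class DeadlineTagger {
    // The suggested pattern, compiled with DOTALL so '.' can cross line breaks.
    static final Pattern DEADLINE = Pattern.compile(
            "(0?[1-9]|[12]\\d|3[01]).*?>>?.*?>>?.?(20[01]\\d)", Pattern.DOTALL);

    // Wraps every match in <deadline> tags; $0 is the entire matched text.
    static String tag(String text) {
        return DEADLINE.matcher(text).replaceAll("<deadline>$0</deadline>");
    }

    public static void main(String[] args) {
        // Assumed input, reconstructed from the tester output above.
        String text = "aearttt 15\n>>DECEMBER\n>>2009\naearttt 15\n>DECEMBER\n>2009";
        System.out.println(tag(text));
    }
}
```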
Susan