
invalid characters in xml tags

    02-24-2008, 6:51 PM


    I'm looking for a way to strip invalid characters out of XML tag names. Ideally they wouldn't be put there in the first place, but the program that outputs the XML is dumb and basically just dumps text in between < and > characters. So, for example, I need to change

    <elem/en/t>Elements / are / good</elem/en/t>

    into

    <element>Elements / are / good</element>

    In other words, I need to strip out every / inside the tags except the one that marks an end tag (and the end tag isn't necessarily on the same line).

    I'm reading the file line by line in Perl, so I need to assume a tag name won't start with a /; otherwise there would be no way to tell a closing tag from an invalid one.
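A minimal sketch of one way to do this per line, under the same assumption that a leading / always means a closing tag. It also assumes each tag holds only a name: no attributes, comments, or self-closing tags such as <br/> (which this would turn into <br>). The helper name clean_tags is hypothetical, and the in-memory file stands in for the real input. The substitution uses /e to run code on each tag and tr///d to delete the remaining slashes.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every '/' between '<' and '>', keeping one
# leading '/', which is assumed to mark a closing tag.
# Assumes tags contain only a name (no attributes, comments or <br/>).
sub clean_tags {
    my ($line) = @_;
    $line =~ s{<(/?)([^<>]*)>}{
        my ($slash, $name) = ($1, $2);
        $name =~ tr!/!!d;          # delete the slashes left in the name
        "<$slash$name>";
    }ge;
    return $line;
}

# Read the input one line at a time. Each tag is cleaned on its own, so it
# does not matter that the end tag sits on a later line than the start tag.
my $input = "<elem/en/t>Elements / are\n / good</elem/en/t>\n";
open my $fh, '<', \$input or die "cannot open in-memory file: $!";
while (my $line = <$fh>) {
    print clean_tags($line);
}
close $fh;
```

Slashes in the element text are left alone because only text between < and > is touched; a single tag split across two lines would not be matched by this sketch.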

    I've been playing around with things like:

     s#(<[^/]*)/([^/]*)>#$1$2>#g;
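A single substitution of the shape above can remove at most one slash per tag per match, and because [^/]* after the slash stops at the next slash, a name with two slashes such as <elem/en/t> does not match at all. A sketch of a plain-regex alternative (no /e) is to delete one slash per tag per pass and repeat until nothing changes. The helper name strip_tag_slashes is hypothetical, and like the original attempt it assumes tags contain only a name, so a self-closing <br/> would become <br>.

```perl
use strict;
use warnings;

# Hypothetical helper. (</?[^/<>]+) matches '<', an optional closing '/',
# and at least one character that is not '/', '<' or '>', so the match
# never leaves the current tag; the '/' right after it is dropped.
# Each /g pass removes at most one slash per tag, so loop until a pass
# makes no substitutions.
sub strip_tag_slashes {
    my ($line) = @_;
    1 while $line =~ s{(</?[^/<>]+)/}{$1}g;
    return $line;
}

print strip_tag_slashes("<elem/en/t>Elements / are / good</elem/en/t>"), "\n";
# prints: <element>Elements / are / good</element>
```

The 1 while ... idiom re-scans the line once per extra slash in the longest tag name, which is cheap for typical line lengths.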
