Hello,
I've been trying to write a regex that would split any sql select query and capture different clauses to the groups. That is, there should be 5 groups (eg. top, select, from, where, order) each containing corresponding text (the group is empty if the clause is missing).
For example, lets try and split this query:
SELECT * FROM dbo.dvProblems JOIN (select * from Actions where name = 'as') as tbl_act on dvProblems.ID = tbl_act.ProblemID WHERE (SELECT COUNT(ID) FROM ProblemStakeHolders WHERE ProblemID = dvProblems.ID AND EmployeeID='1')>0 ORDER BY ID
this should output the following groups:
top:
select: *
from: dbo.dvProblems JOIN (select * from Actions where name = 'as') as tbl_act on dvProblems.ID = tbl_act.ProblemID
where: (SELECT COUNT(ID) FROM ProblemStakeHolders WHERE ProblemID = dvProblems.ID AND EmployeeID='1')>0
order: ID
The expression should also match if there is no where or order by clauses, but select and from clauses are required.
I've ended up with the following expression:
^SELECT\s+(?:TOP\s+(?<top>\d*\s))?(?<select>.*?)\bFROM\b(?<from>.*?)(?:\bWHERE\b(?<where>.*?))?(?:\bORDER\sBY\b(?<order>.*))?$
but it doesn't work with subqueries in both from and where clauses. i guess i need to match only those FROM and WHERE statements that are not surrounded by brackets, but i don't know how to do it.
I'm working with .NET.
Any help would be greatly appreciated,
Donatas