Question:
Microsoft has exposed the ScriptDom API for parsing and generating T-SQL. I'm new to it and still experimenting with it. I want to know how to get the cross-database references from queries like this one:
UPDATE t3
SET description = 'abc'
FROM database1.dbo.table1 t1
INNER JOIN database2.dbo.table2 t2
ON (t1.id = t2.t1_id)
LEFT OUTER JOIN database3.dbo.table3 t3
ON (t3.id = t2.t3_id)
INNER JOIN database2.dbo.table4 t4
ON (t4.id = t2.t4_id)
What I want is a list of the references:
database1.dbo.table1.id = database2.dbo.table2.t1_id
database3.dbo.table3.id = database2.dbo.table2.t3_id
database2.dbo.table4.id = database2.dbo.table2.t4_id
However, in the last entry, database2.dbo.table4.id = database2.dbo.table2.t4_id, both columns come from the same database (database2), so I don't want it. My final required result is therefore:
database1.dbo.table1.id = database2.dbo.table2.t1_id
database3.dbo.table3.id = database2.dbo.table2.t3_id
Is it possible to implement this with ScriptDom?
Answer 1:
A robust implementation is not easy. For the limited problem posed in this question, the solution is relatively simple (with the stress on "relatively"). I assume the following:
- The query only has one level -- there are no UNIONs, subqueries, WITH expressions or other things that introduce new scopes for aliases (and this can get complicated very quickly).
- All identifiers in the query are fully qualified so there is no doubt what object it's referring to.
The solution strategy looks like this: we first visit the TSqlFragment to make a list of all table aliases, then visit it again to get all equijoins, expanding aliases along the way. From that list, we keep only the equijoins whose two sides are in different databases. In code:
// Requires: using System; using System.Collections.Generic; using System.IO;
// using System.Linq; using Microsoft.SqlServer.TransactSql.ScriptDom;
var sql = @"
UPDATE t3
SET description = 'abc'
FROM database1.dbo.table1 t1
INNER JOIN database2.dbo.table2 t2
ON (t1.id = t2.t1_id)
LEFT OUTER JOIN database3.dbo.table3 t3
ON (t3.id = t2.t3_id)
INNER JOIN database2.dbo.table4 t4
ON (t4.id = t2.t4_id)
";
var parser = new TSql120Parser(initialQuotedIdentifiers: false);
IList<ParseError> errors;
TSqlScript script;
using (var reader = new StringReader(sql)) {
    script = (TSqlScript) parser.Parse(reader, out errors);
}
// First resolve aliases.
var aliasResolutionVisitor = new AliasResolutionVisitor();
script.Accept(aliasResolutionVisitor);
// Then find all equijoins, expanding aliases along the way.
var findEqualityJoinVisitor = new FindEqualityJoinVisitor(
    aliasResolutionVisitor.Aliases
);
script.Accept(findEqualityJoinVisitor);
// Now list all equijoins where the left database is not the same
// as the right database.
foreach (
    var equiJoin in
    findEqualityJoinVisitor.EqualityJoins.Where(
        j => !j.JoinsSameDatabase()
    )
) {
    Console.WriteLine(equiJoin.ToString());
}
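Output (the joins come out in reverse order because the last join in the FROM clause is the outermost node in the parse tree and is visited first):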
database3.dbo.table3.id = database2.dbo.table2.t3_id
database1.dbo.table1.id = database2.dbo.table2.t1_id
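The two-pass strategy itself does not depend on ScriptDom. Here is a language-neutral sketch in Python that shows only the logic. It assumes the table references and ON-clause equalities have already been extracted as plain strings, which is the job ScriptDom does in the C# code. The names TABLES, JOIN_EQUALITIES, collect_aliases, resolve and cross_database_joins are hypothetical and exist only for this illustration.

```python
# Hypothetical, parser-free sketch of the two-pass strategy.
# Assumes the query's table references and ON-clause equalities were already
# extracted as strings (ScriptDom does that part in the C# version).

# (fully qualified table name, alias) for each table reference
TABLES = [
    ("database1.dbo.table1", "t1"),
    ("database2.dbo.table2", "t2"),
    ("database3.dbo.table3", "t3"),
    ("database2.dbo.table4", "t4"),
]

# (left column, right column) for each equality found in an ON clause
JOIN_EQUALITIES = [
    ("t1.id", "t2.t1_id"),
    ("t3.id", "t2.t3_id"),
    ("t4.id", "t2.t4_id"),
]


def collect_aliases(tables):
    """Pass 1: map each alias to its fully qualified table name."""
    return {alias: name for name, alias in tables if alias}


def resolve(column, aliases):
    """Expand 'alias.column' to 'db.schema.table.column'; leave other names as-is."""
    parts = column.split(".")
    if len(parts) == 2 and parts[0] in aliases:
        return aliases[parts[0]] + "." + parts[1]
    return column


def cross_database_joins(tables, equalities):
    """Pass 2: resolve both sides, keep only joins whose first name parts differ."""
    aliases = collect_aliases(tables)
    result = []
    for left, right in equalities:
        left_full = resolve(left, aliases)
        right_full = resolve(right, aliases)
        # The first part of a four-part column name is the database.
        if left_full.split(".")[0] != right_full.split(".")[0]:
            result.append(f"{left_full} = {right_full}")
    return result


for line in cross_database_joins(TABLES, JOIN_EQUALITIES):
    print(line)
```

Running it prints the two cross-database joins from the question, in FROM-clause order, and drops the database2-to-database2 join.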
AliasResolutionVisitor is a simple thing:
public class AliasResolutionVisitor : TSqlFragmentVisitor {
    // Case-insensitive keys, so that T1.id still matches alias t1
    // (as under a case-insensitive collation).
    readonly Dictionary<string, string> aliases =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    public Dictionary<string, string> Aliases { get { return aliases; } }
    // Called once for every table reference such as "database1.dbo.table1 t1".
    public override void Visit(NamedTableReference namedTableReference) {
        Identifier alias = namedTableReference.Alias;
        string baseObjectName = namedTableReference.SchemaObject.AsObjectName();
        if (alias != null) {
            aliases.Add(alias.Value, baseObjectName);
        }
    }
}
We simply go through all the named table references in the query and, if they have an alias, add it to a dictionary. Note that this will fail miserably if subqueries are introduced, because this visitor has no notion of scope (and indeed, adding scope to a visitor is much harder because the TSqlFragment offers no way to annotate the parse tree or even walk it from a node).
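To make the scope problem concrete, here is a small hypothetical Python illustration. It is not ScriptDom code, and the ScopedAliases class is invented for this sketch. A single flat map like the aliases dictionary above cannot hold the same alias bound to one table in the outer query and another in a subquery. A stack of per-scope maps can. Deciding when to enter and leave a scope from the parse tree is the hard part the answer refers to, and it is not shown here.

```python
# Hypothetical illustration of alias scoping; not part of ScriptDom.


class ScopedAliases:
    """A stack of alias maps in which inner scopes shadow outer ones."""

    def __init__(self):
        self.scopes = [{}]  # scope of the outermost query

    def enter(self):
        self.scopes.append({})  # e.g. when a subquery starts

    def leave(self):
        self.scopes.pop()  # the subquery's aliases go out of scope

    def add(self, alias, table):
        if alias in self.scopes[-1]:
            raise ValueError(f"alias {alias!r} already defined in this scope")
        self.scopes[-1][alias] = table

    def lookup(self, alias):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if alias in scope:
                return scope[alias]
        return None


# Outer query:  FROM database1.dbo.table1 t
# Subquery:     (SELECT ... FROM database2.dbo.table2 t ...)
aliases = ScopedAliases()
aliases.add("t", "database1.dbo.table1")
aliases.enter()
aliases.add("t", "database2.dbo.table2")  # the C# Dictionary.Add would throw here
inner = aliases.lookup("t")               # database2.dbo.table2
aliases.leave()
outer = aliases.lookup("t")               # database1.dbo.table1

flat = {}
flat["t"] = "database1.dbo.table1"
flat["t"] = "database2.dbo.table2"        # a flat map loses the outer binding
```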
The FindEqualityJoinVisitor is more interesting:
public class FindEqualityJoinVisitor : TSqlFragmentVisitor {
    readonly Dictionary<string, string> aliases;
    public FindEqualityJoinVisitor(Dictionary<string, string> aliases) {
        this.aliases = aliases;
    }
    readonly List<EqualityJoin> equalityJoins = new List<EqualityJoin>();
    public List<EqualityJoin> EqualityJoins { get { return equalityJoins; } }
    // Called for every JOIN ... ON ...; collects the equalities in its ON clause.
    public override void Visit(QualifiedJoin qualifiedJoin) {
        var findEqualityComparisonVisitor = new FindEqualityComparisonVisitor();
        qualifiedJoin.SearchCondition.Accept(findEqualityComparisonVisitor);
        foreach (
            var equalityComparison in findEqualityComparisonVisitor.Comparisons
        ) {
            var firstColumnReferenceExpression =
                equalityComparison.FirstExpression as ColumnReferenceExpression;
            var secondColumnReferenceExpression =
                equalityComparison.SecondExpression as ColumnReferenceExpression;
            // Only column = column comparisons count as join conditions.
            if (
                firstColumnReferenceExpression != null &&
                secondColumnReferenceExpression != null
            ) {
                string firstColumnResolved = resolveMultipartIdentifier(
                    firstColumnReferenceExpression.MultiPartIdentifier
                );
                string secondColumnResolved = resolveMultipartIdentifier(
                    secondColumnReferenceExpression.MultiPartIdentifier
                );
                equalityJoins.Add(
                    new EqualityJoin(firstColumnResolved, secondColumnResolved)
                );
            }
        }
    }
    // Expands "alias.column" to "database.schema.table.column";
    // any other name is returned unchanged.
    private string resolveMultipartIdentifier(MultiPartIdentifier identifier) {
        if (
            identifier.Identifiers.Count == 2 &&
            aliases.ContainsKey(identifier.Identifiers[0].Value)
        ) {
            return
                aliases[identifier.Identifiers[0].Value] + "." +
                identifier.Identifiers[1].Value;
        } else {
            return identifier.AsObjectName();
        }
    }
}
This hunts for QualifiedJoin instances and, when it finds one, examines its search condition to find all equality comparisons. Comparisons where either side is not a plain column reference (for example t1.id = 5) are skipped. Note that this does work with nested search conditions: in Bar JOIN Foo ON Bar.Quux = Foo.Quux AND Bar.Baz = Foo.Baz, we will find both expressions.
How do we find them? Using another small visitor:
public class FindEqualityComparisonVisitor : TSqlFragmentVisitor {
    readonly List<BooleanComparisonExpression> comparisons =
        new List<BooleanComparisonExpression>();
    public List<BooleanComparisonExpression> Comparisons {
        get { return comparisons; }
    }
    // Called for every comparison in the tree, however deeply nested.
    public override void Visit(BooleanComparisonExpression e) {
        if (e.IsEqualityComparison()) comparisons.Add(e);
    }
}
Nothing complicated here. It wouldn't be hard to fold this code into the other visitor, but I think this is clearer.
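Nested conditions are found because a visitor descends into every child node, so an equality buried inside an AND is still visited. The hypothetical Python model below imitates that recursive walk over a small tuple-based expression tree. The node shapes are invented for this sketch. ScriptDom uses its own node classes instead.

```python
# Hypothetical model of a search condition (not ScriptDom types):
#   ("and", left, right) / ("or", left, right) / ("not", expr)
#   ("cmp", operator, left_column, right_column)


def find_equalities(node, found=None):
    """Walk the whole tree and collect every '=' comparison, however deeply nested."""
    if found is None:
        found = []
    if node[0] == "cmp":
        _, op, left, right = node
        if op == "=":
            found.append((left, right))
    else:
        # "and" / "or" / "not": recurse into every child expression
        for child in node[1:]:
            find_equalities(child, found)
    return found


# Bar JOIN Foo ON Bar.Quux = Foo.Quux AND Bar.Baz = Foo.Baz
condition = ("and",
             ("cmp", "=", "Bar.Quux", "Foo.Quux"),
             ("cmp", "=", "Bar.Baz", "Foo.Baz"))
print(find_equalities(condition))
# [('Bar.Quux', 'Foo.Quux'), ('Bar.Baz', 'Foo.Baz')]
```

Like the C# visitor, this also collects equalities that sit under OR or NOT. Those are not necessarily join keys, so whether that matters depends on the queries being analysed.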
That's it, except for some helper code which I'll present without comment:
public class EqualityJoin {
    readonly SchemaObjectName left;
    public SchemaObjectName Left { get { return left; } }
    readonly SchemaObjectName right;
    public SchemaObjectName Right { get { return right; } }
    public EqualityJoin(
        string qualifiedObjectNameLeft, string qualifiedObjectNameRight
    ) {
        var parser = new TSql120Parser(initialQuotedIdentifiers: false);
        IList<ParseError> errors;
        using (var reader = new StringReader(qualifiedObjectNameLeft)) {
            left = parser.ParseSchemaObjectName(reader, out errors);
        }
        using (var reader = new StringReader(qualifiedObjectNameRight)) {
            right = parser.ParseSchemaObjectName(reader, out errors);
        }
    }
    public bool JoinsSameDatabase() {
        // For a four-part name, Identifiers[0] is the database.
        // Compared case-insensitively, as under a case-insensitive server collation.
        return string.Equals(
            left.Identifiers[0].Value, right.Identifiers[0].Value,
            StringComparison.OrdinalIgnoreCase
        );
    }
    public override string ToString() {
        return String.Format("{0} = {1}", left.AsObjectName(), right.AsObjectName());
    }
}
public static class MultiPartIdentifierExtensions {
    // Joins the unquoted parts with dots, e.g. "database1.dbo.table1".
    public static string AsObjectName(this MultiPartIdentifier multiPartIdentifier) {
        return string.Join(".", multiPartIdentifier.Identifiers.Select(i => i.Value));
    }
}
public static class ExpressionExtensions {
    public static bool IsEqualityComparison(this BooleanExpression expression) {
        return
            expression is BooleanComparisonExpression &&
            ((BooleanComparisonExpression) expression).ComparisonType == BooleanComparisonType.Equals;
    }
}
As I mentioned before, this code is quite brittle. It assumes queries have a particular form, and it could fail (quite badly, by giving misleading results) if they don't. A major open challenge would be to extend it so it can handle scopes and unqualified references correctly, as well as the other weirdness a T-SQL script can feature, but I think it's a useful starting point nevertheless.
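One way to make such failures loud rather than misleading is to refuse to classify a column unless it resolved to a full database.schema.table.column name. As written, a three-part name such as dbo.table1.id, or an alias the first pass never saw, would have its first part ("dbo", or the alias) compared as if it were a database. Below is a hypothetical Python sketch of the stricter check. The names database_of, is_cross_database and UnresolvedColumnError are invented here, and the sketch assumes no name part contains a dot. In the C# code, the analogue would be to check Identifiers.Count before comparing Identifiers[0].

```python
# Hypothetical stricter comparison: only four-part names are classified.
# Assumes no name part contains a dot (the split is a plain str.split).


class UnresolvedColumnError(ValueError):
    pass


def database_of(resolved_column):
    """Return the database part of database.schema.table.column, or raise."""
    parts = resolved_column.split(".")
    if len(parts) != 4 or not all(parts):
        raise UnresolvedColumnError(
            f"cannot tell which database {resolved_column!r} belongs to")
    return parts[0]


def is_cross_database(left, right):
    return database_of(left) != database_of(right)


print(is_cross_database("database1.dbo.table1.id", "database2.dbo.table2.t1_id"))  # True
try:
    # No database part: a naive "first part" rule would compare 'dbo'.
    is_cross_database("dbo.table1.id", "database2.dbo.table2.t1_id")
except UnresolvedColumnError as e:
    print("skipped:", e)
```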
Answer 2:
Another way to attempt this is to ask SQL Server for the query plan instead of parsing the query yourself:
SET SHOWPLAN_XML ON;
GO
UPDATE t3
SET description = 'abc'
FROM database1.dbo.table1 t1
INNER JOIN database2.dbo.table2 t2
ON (t1.id = t2.t1_id)
LEFT OUTER JOIN database3.dbo.table3 t3
ON (t3.id = t2.t3_id)
INNER JOIN database2.dbo.table4 t4
ON (t4.id = t2.t4_id)
GO
SET SHOWPLAN_XML OFF;
GO
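This returns an XML query plan. While SHOWPLAN_XML is on, the UPDATE itself is not executed, and the SET statement has to be alone in its batch, hence the GO separators. In the XML you will find the join conditions under RelOp nodes. For example, for a hash join you will see something like: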
<RelOp NodeId="7" PhysicalOp="Hash Match" LogicalOp="Inner Join" EstimateRows="1" EstimateIO="0" EstimateCPU="0.0177716" AvgRowSize="15" EstimatedTotalSubtreeCost="0.0243408" Parallel="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row">
  .. some stuff cut from here
  <Hash>
    ..
    <ProbeResidual>
      <ScalarOperator ScalarString="[database2].[dbo].[table4].[Id] as [t4].[Id]=[database2].[dbo].[table2].[t4_Id] as [t2].[t4_Id]">
        <Compare CompareOp="EQ">
          <ScalarOperator>
            <Identifier>
              <ColumnReference Database="[database2]" Schema="[dbo]" Table="[table4]" Alias="[t4]" Column="Id" />
            </Identifier>
          </ScalarOperator>
          <ScalarOperator>
            <Identifier>
              <ColumnReference Database="[database2]" Schema="[dbo]" Table="[table2]" Alias="[t2]" Column="t4_Id" />
            </Identifier>
          </ScalarOperator>
        </Compare>
      </ScalarOperator>
For a nested loops join, you will see something along the lines of:
<NestedLoops Optimized="0">
  <Predicate>
    <ScalarOperator ScalarString="[database3].[dbo].[table3].[Id] as [t3].[Id]=[database2].[dbo].[table2].[t3_id] as [t2].[t3_id]">
      <Compare CompareOp="EQ">
        <ScalarOperator>
          <Identifier>
            <ColumnReference Database="[database3]" Schema="[dbo]" Table="[table3]" Alias="[t3]" Column="Id" />
          </Identifier>
        </ScalarOperator>
        <ScalarOperator>
          <Identifier>
            <ColumnReference Database="[database2]" Schema="[dbo]" Table="[table2]" Alias="[t2]" Column="t3_id" />
          </Identifier>
        </ScalarOperator>
      </Compare>
    </ScalarOperator>
  </Predicate>
You could then process this XML in C# to extract all the join predicates and compare the Database attributes of the two column references in each one.
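Here is a rough sketch of that processing step, written in Python with the standard xml.etree.ElementTree module (in C#, LINQ to XML would play the same role). It assumes you already have the plan XML as a string, for example saved from SSMS. It strips XML namespaces so it does not depend on the showplan namespace URI. The helper names are hypothetical. The sketch only inspects Compare elements with CompareOp="EQ" whose operands are two ColumnReference elements.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def qualified(column_ref):
    """Build db.schema.table.column from a ColumnReference's attributes."""
    return ".".join(column_ref.get(attr, "").strip("[]")
                    for attr in ("Database", "Schema", "Table", "Column"))


def cross_database_equalities(plan_xml):
    """List 'a = b' for each column-to-column EQ comparison spanning two databases."""
    root = ET.fromstring(plan_xml)
    found = []
    for compare in root.iter():
        if local_name(compare.tag) != "Compare" or compare.get("CompareOp") != "EQ":
            continue
        refs = [e for e in compare.iter() if local_name(e.tag) == "ColumnReference"]
        if len(refs) != 2:
            continue  # column vs. constant, or a more complex expression
        left_db = refs[0].get("Database", "").strip("[]")
        right_db = refs[1].get("Database", "").strip("[]")
        if left_db and right_db and left_db != right_db:
            text = f"{qualified(refs[0])} = {qualified(refs[1])}"
            if text not in found:  # the same predicate may appear more than once
                found.append(text)
    return found


# Trimmed-down plan built from the two excerpts above (RelOp wrappers omitted).
SAMPLE_PLAN = """<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <Hash><ProbeResidual><ScalarOperator><Compare CompareOp="EQ">
    <ScalarOperator><Identifier><ColumnReference Database="[database2]" Schema="[dbo]" Table="[table4]" Alias="[t4]" Column="Id" /></Identifier></ScalarOperator>
    <ScalarOperator><Identifier><ColumnReference Database="[database2]" Schema="[dbo]" Table="[table2]" Alias="[t2]" Column="t4_Id" /></Identifier></ScalarOperator>
  </Compare></ScalarOperator></ProbeResidual></Hash>
  <NestedLoops><Predicate><ScalarOperator><Compare CompareOp="EQ">
    <ScalarOperator><Identifier><ColumnReference Database="[database3]" Schema="[dbo]" Table="[table3]" Alias="[t3]" Column="Id" /></Identifier></ScalarOperator>
    <ScalarOperator><Identifier><ColumnReference Database="[database2]" Schema="[dbo]" Table="[table2]" Alias="[t2]" Column="t3_id" /></Identifier></ScalarOperator>
  </Compare></ScalarOperator></Predicate></NestedLoops>
</ShowPlanXML>"""

for line in cross_database_equalities(SAMPLE_PLAN):
    print(line)
```

This collects every column-to-column equality in the plan, so it can also pick up WHERE-clause comparisons. Other plan shapes, such as index seeks or merge joins, record their join columns in different elements and would be missed by a Compare-only search.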
Source: https://stackoverflow.com/questions/27240983/how-to-extract-cross-databases-references-using-scriptdom-api