How to extract cross-database references using the ScriptDom API

执念已碎 2021-01-14 07:16

Microsoft has exposed the ScriptDom API to parse and generate T-SQL. I'm new to it and still playing with it. I want to know how to get the cross-database references from q

2 Answers
  • 2021-01-14 07:27

    Perhaps another way to attempt this is to execute your query as:

    SET SHOWPLAN_XML ON;
    GO
    UPDATE  t3
    SET     description = 'abc'
    FROM    database1.dbo.table1 t1
            INNER JOIN database2.dbo.table2 t2
                ON (t1.id = t2.t1_id)
            LEFT OUTER JOIN database3.dbo.table3 t3
                ON (t3.id = t2.t3_id)
            INNER JOIN database2.dbo.table4 t4
                ON (t4.id = t2.t4_id)
    

    This returns the estimated query plan as XML; with SHOWPLAN_XML on, the statement is compiled but not executed. In the XML you will find the join conditions under a RelOp node. For a hash join, for example, you will see something like:

    <RelOp NodeId="7" PhysicalOp="Hash Match" LogicalOp="Inner Join" EstimateRows="1" EstimateIO="0" EstimateCPU="0.0177716" AvgRowSize="15" EstimatedTotalSubtreeCost="0.0243408" Parallel="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row">
    .. some stuff cut from here
      <Hash>
    ..
    <ProbeResidual>
      <ScalarOperator ScalarString="[database2].[dbo].[table4].[Id] as [t4].[Id]=[database2].[dbo].[table2].[t4_Id] as [t2].[t4_Id]">
       <Compare CompareOp="EQ">
         <ScalarOperator>
           <Identifier>
             <ColumnReference Database="[database2]" Schema="[dbo]" Table="[table4]" Alias="[t4]" Column="Id" />
           </Identifier>
         </ScalarOperator>
         <ScalarOperator>
           <Identifier>
             <ColumnReference Database="[database2]" Schema="[dbo]" Table="[table2]" Alias="[t2]" Column="t4_Id" />
           </Identifier>
         </ScalarOperator>
       </Compare>
     </ScalarOperator>
    

    For a nested loops join, something along the lines of:

    <NestedLoops Optimized="0">
    <Predicate>
      <ScalarOperator ScalarString="[database3].[dbo].[table3].[Id] as [t3].[Id]=[database2].[dbo].[table2].[t3_id] as [t2].[t3_id]">
        <Compare CompareOp="EQ">
          <ScalarOperator>
            <Identifier>
              <ColumnReference Database="[database3]" Schema="[dbo]" Table="[table3]" Alias="[t3]" Column="Id" />
            </Identifier>
          </ScalarOperator>
          <ScalarOperator>
            <Identifier>
              <ColumnReference Database="[database2]" Schema="[dbo]" Table="[table2]" Alias="[t2]" Column="t3_id" />
            </Identifier>
          </ScalarOperator>
        </Compare>
      </ScalarOperator>
    </Predicate>
    

    Perhaps you could then process this in C# to extract all the joins and compare the databases held in the column references.

    Apologies for the formatting.
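
    The C# post-processing suggested above might look roughly like the sketch below. It is an illustration, not a tested tool: the class and method names are made up, it matches elements by local name so the showplan XML namespace does not matter, and it only looks at Compare nodes with CompareOp="EQ" whose two sides are each a single ColumnReference. It therefore also picks up column-to-column equalities in a WHERE clause, and it will miss hash joins that list their keys only under HashKeysBuild/HashKeysProbe with no residual predicate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ShowplanCrossDatabase
{
    // Returns the predicate text of every equality comparison whose two sides
    // are single column references that belong to different databases.
    public static List<string> FindCrossDatabaseComparisons(string planXml)
    {
        var results = new List<string>();
        XDocument doc = XDocument.Parse(planXml);

        // Match on local names so the showplan XML namespace is irrelevant.
        var compares = doc.Descendants()
            .Where(e => e.Name.LocalName == "Compare" &&
                        (string)e.Attribute("CompareOp") == "EQ");

        foreach (XElement compare in compares)
        {
            // Each side of the comparison is a direct ScalarOperator child.
            List<XElement> sides = compare.Elements()
                .Where(e => e.Name.LocalName == "ScalarOperator")
                .ToList();
            if (sides.Count != 2) continue;

            string left = SingleColumnDatabase(sides[0]);
            string right = SingleColumnDatabase(sides[1]);
            if (left == null || right == null) continue;

            // Assumption: database names are compared case-insensitively.
            if (!string.Equals(left, right, StringComparison.OrdinalIgnoreCase))
            {
                // The enclosing ScalarOperator carries the readable predicate.
                string text = (string)compare.Parent?.Attribute("ScalarString");
                results.Add(text ?? left + " = " + right);
            }
        }
        // The same predicate can appear more than once in a plan.
        return results.Distinct().ToList();
    }

    // Database attribute of a side that is exactly one ColumnReference, else null.
    private static string SingleColumnDatabase(XElement side)
    {
        List<XElement> columns = side.Descendants()
            .Where(e => e.Name.LocalName == "ColumnReference")
            .ToList();
        return columns.Count == 1 ? (string)columns[0].Attribute("Database") : null;
    }
}
```

    Fed the nested loops fragment shown earlier, this should report the database3/database2 predicate; the hash join fragment compares two database2 columns and would be skipped.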

  • 2021-01-14 07:44

    A robust implementation is not easy. For the limited problem as posed in this question, the solution is relatively simple -- stress "relatively". I assume the following:

    • The query only has one level -- there are no UNIONs, subqueries, WITH expressions or other things that introduce new scopes for aliases (and this can get complicated very quickly).
    • All identifiers in the query are fully qualified so there is no doubt what object it's referring to.

    The solution strategy looks like this: we first visit the TSqlFragment to make a list of all table aliases, then visit it again to get all equijoins, expanding aliases along the way. Using that list, we determine the list of equijoins that do not refer to the same database. In code:

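    // Requires the Microsoft.SqlServer.TransactSql.ScriptDom assembly.
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using Microsoft.SqlServer.TransactSql.ScriptDom;

    var sql = @"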
      UPDATE  t3
      SET     description = 'abc'
      FROM    database1.dbo.table1 t1
          INNER JOIN database2.dbo.table2 t2
            ON (t1.id = t2.t1_id)
          LEFT OUTER JOIN database3.dbo.table3 t3
            ON (t3.id = t2.t3_id)
          INNER JOIN database2.dbo.table4 t4
            ON (t4.id = t2.t4_id)
    
    ";                
    
    var parser = new TSql120Parser(initialQuotedIdentifiers: false);
    IList<ParseError> errors;
    TSqlScript script;
    using (var reader = new StringReader(sql)) {
      script = (TSqlScript) parser.Parse(reader, out errors);
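    }
    // Stop on syntax errors rather than analysing an incomplete tree.
    if (errors.Count > 0) {
      throw new InvalidOperationException(errors[0].Message);
    }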
    // First resolve aliases.
    var aliasResolutionVisitor = new AliasResolutionVisitor();
    script.Accept(aliasResolutionVisitor);
    
    // Then find all equijoins, expanding aliases along the way.
    var findEqualityJoinVisitor = new FindEqualityJoinVisitor(
      aliasResolutionVisitor.Aliases
    );
    script.Accept(findEqualityJoinVisitor);
    
    // Now list all joins where the left database is not the same
    // as the right database.
    foreach (
      var equiJoin in 
      findEqualityJoinVisitor.EqualityJoins.Where(
        j => !j.JoinsSameDatabase()
      )
    ) {
      Console.WriteLine(equiJoin.ToString());
    }
    

    Output:

    database3.dbo.table3.id = database2.dbo.table2.t3_id
    database1.dbo.table1.id = database2.dbo.table2.t1_id
    

    AliasResolutionVisitor is a simple thing:

    public class AliasResolutionVisitor : TSqlFragmentVisitor {
      readonly Dictionary<string, string> aliases = new Dictionary<string, string>();
      public Dictionary<string, string> Aliases { get { return aliases; } }
    
      public override void Visit(NamedTableReference namedTableReference ) {
        Identifier alias = namedTableReference.Alias;
        string baseObjectName = namedTableReference.SchemaObject.AsObjectName();
        if (alias != null) {
          aliases.Add(alias.Value, baseObjectName);
        }
      }
    }
    

    We simply go through all the named table references in the query and, if they have an alias, add this to a dictionary. Note that this will fail miserably if subqueries are introduced, because this visitor has no notion of scope (and indeed, adding scope to a visitor is much harder because the TSqlFragment offers no way to annotate the parse tree or even walk it from a node).
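
    To make that failure mode concrete: Dictionary.Add throws an ArgumentException when the key already exists, so a query that reuses an alias in a subquery would make the visitor above throw rather than resolve the alias per scope. The sketch below reproduces only the dictionary bookkeeping, without ScriptDom; the class name, method name and sample aliases are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class AliasScopeDemo
{
    // Mimics the visitor's bookkeeping: one flat dictionary shared by every scope.
    // Returns the first alias that is declared twice, or null if there is none.
    public static string FirstDuplicateAlias(
        IEnumerable<KeyValuePair<string, string>> declarations)
    {
        var aliases = new Dictionary<string, string>();
        foreach (var declaration in declarations)
        {
            try
            {
                aliases.Add(declaration.Key, declaration.Value);
            }
            catch (ArgumentException)
            {
                // Same alias seen again, e.g. reused inside a subquery.
                return declaration.Key;
            }
        }
        return null;
    }
}
```

    Note also that the default string comparer is case-sensitive, so an alias declared as T1 and referenced as t1 would not be resolved.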

    The FindEqualityJoinVisitor is more interesting:

    public class FindEqualityJoinVisitor : TSqlFragmentVisitor {
      readonly Dictionary<string, string> aliases;
      public FindEqualityJoinVisitor(Dictionary<string, string> aliases) {
        this.aliases = aliases;
      }
    
      readonly List<EqualityJoin> equalityJoins = new List<EqualityJoin>();
      public List<EqualityJoin> EqualityJoins { get { return equalityJoins; } }
    
      public override void Visit(QualifiedJoin qualifiedJoin) {
        var findEqualityComparisonVisitor = new FindEqualityComparisonVisitor();
        qualifiedJoin.SearchCondition.Accept(findEqualityComparisonVisitor);
        foreach (
          var equalityComparison in findEqualityComparisonVisitor.Comparisons
        ) {
          var firstColumnReferenceExpression = 
            equalityComparison.FirstExpression as ColumnReferenceExpression
          ;
          var secondColumnReferenceExpression = 
            equalityComparison.SecondExpression as ColumnReferenceExpression
          ;
          if (
            firstColumnReferenceExpression != null && 
            secondColumnReferenceExpression != null
          ) {
            string firstColumnResolved = resolveMultipartIdentifier(
              firstColumnReferenceExpression.MultiPartIdentifier
            );
            string secondColumnResolved = resolveMultipartIdentifier(
              secondColumnReferenceExpression.MultiPartIdentifier
            );
            equalityJoins.Add(
              new EqualityJoin(firstColumnResolved, secondColumnResolved)
            );
          }
        }
      }
    
      private string resolveMultipartIdentifier(MultiPartIdentifier identifier) {
        if (
          identifier.Identifiers.Count == 2 && 
          aliases.ContainsKey(identifier.Identifiers[0].Value)
        ) {
          return 
            aliases[identifier.Identifiers[0].Value] + "." + 
            identifier.Identifiers[1].Value;
        } else {
          return identifier.AsObjectName();
        }
      }
    }
    

    This hunts for QualifiedJoin instances and, if we find them, we in turn examine the search condition to find all occurrences of equality comparisons. Note that this does work with nested search conditions: in Bar JOIN Foo ON Bar.Quux = Foo.Quux AND Bar.Baz = Foo.Baz, we will find both expressions.

    How do we find them? Using another small visitor:

    public class FindEqualityComparisonVisitor : TSqlFragmentVisitor {
      List<BooleanComparisonExpression> comparisons = 
        new List<BooleanComparisonExpression>()
      ;
      public List<BooleanComparisonExpression> Comparisons { 
        get { return comparisons; } 
      }
    
      public override void Visit(BooleanComparisonExpression e) {
        if (e.IsEqualityComparison()) comparisons.Add(e);
      }
    }
    

    Nothing complicated here. It wouldn't be hard to fold this code into the other visitor, but I think this is clearer.

    That's it, except for some helper code which I'll present without comment:

    public class EqualityJoin {
      readonly SchemaObjectName left;
      public SchemaObjectName Left { get { return left; } }
    
      readonly SchemaObjectName right;
      public SchemaObjectName Right { get { return right; } }
    
      public EqualityJoin(
        string qualifiedObjectNameLeft, string qualifiedObjectNameRight
      ) {
        var parser = new TSql120Parser(initialQuotedIdentifiers: false);
        IList<ParseError> errors;
        using (var reader = new StringReader(qualifiedObjectNameLeft)) {
          left = parser.ParseSchemaObjectName(reader, out errors);
        }
        using (var reader = new StringReader(qualifiedObjectNameRight)) {
          right = parser.ParseSchemaObjectName(reader, out errors);
        }
      }
    
      public bool JoinsSameDatabase() {
        return left.Identifiers[0].Value == right.Identifiers[0].Value;
      }
    
      public override string ToString() {
        return String.Format("{0} = {1}", left.AsObjectName(), right.AsObjectName());
      }
    }
    
    public static class MultiPartIdentifierExtensions {
      public static string AsObjectName(this MultiPartIdentifier multiPartIdentifier) {
        return string.Join(".", multiPartIdentifier.Identifiers.Select(i => i.Value));
      }
    }
    
    public static class ExpressionExtensions {
      public static bool IsEqualityComparison(this BooleanExpression expression) {
        return 
          expression is BooleanComparisonExpression && 
          ((BooleanComparisonExpression) expression).ComparisonType == BooleanComparisonType.Equals
        ;
      }
    }
    

    As I mentioned before, this code is quite brittle. It assumes queries have a particular form, and it could fail (quite badly, by giving misleading results) if they don't. A major open challenge would be to extend it so it can handle scopes and unqualified references correctly, as well as the other weirdness a T-SQL script can feature, but I think it's a useful starting point nevertheless.
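
    One cheap way to make the misleading results less likely is to refuse to answer when a resolved column name does not have exactly four parts, instead of blindly comparing the first identifier. The sketch below shows the idea on plain strings; it is not part of the code above, the names are hypothetical, and it assumes that no identifier contains a dot and that database names can be compared case-insensitively.

```csharp
using System;

public static class ColumnNameParts
{
    // Returns the database part of "database.schema.table.column", or null
    // when the name does not have exactly four parts.
    // Assumption: no part contains a dot (no bracket-quoted names with dots).
    public static string DatabaseOf(string resolvedColumnName)
    {
        string[] parts = resolvedColumnName.Split('.');
        return parts.Length == 4 ? parts[0] : null;
    }

    // true/false when both databases are known, null when either is unknown.
    public static bool? JoinsSameDatabase(string left, string right)
    {
        string leftDb = DatabaseOf(left);
        string rightDb = DatabaseOf(right);
        if (leftDb == null || rightDb == null) return null;
        // Assumption: database names are compared case-insensitively.
        return string.Equals(leftDb, rightDb, StringComparison.OrdinalIgnoreCase);
    }
}
```

    A null result then marks a join that needs scope or default-database information to classify, which the visitors above do not track.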
