How to extract cross databases references using scriptdom API

问题

Microsoft has exposed the scriptdom API to parse and generate TSQL. I'm new to it and still playing with it. I want to know how to get the cross databases references from queries like this one.

UPDATE  t3
SET     description = 'abc'
FROM    database1.dbo.table1 t1
        INNER JOIN database2.dbo.table2 t2
            ON (t1.id = t2.t1_id)
        LEFT OUTER JOIN database3.dbo.table3 t3
            ON (t3.id = t2.t3_id)
        INNER JOIN database2.dbo.table4 t4
            ON (t4.id = t2.t4_id)

What I want is a list of the references:

database1.dbo.table1.id = database2.dbo.table2.t1_id
database3.dbo.table3.id = database2.dbo.table2.t3_id
database2.dbo.table4.id = database2.dbo.table2.t4_id

However, for the last entry database2.dbo.table4.id = database2.dbo.table2.t4_id, both of the columns from the 2 ends are from the same database database2, this is not what I want. So my final required result is:

database1.dbo.table1.id = database2.dbo.table2.t1_id
database3.dbo.table3.id = database2.dbo.table2.t3_id

Is is possible to be implemented with scriptdom?

回答1:

A robust implementation is not easy. For the limited problem as posed in this question, the solution is relatively simple -- stress "relatively". I assume the following:

The query only has one level -- there are no UNIONs, subqueries, WITH expressions or other things that introduce new scopes for aliases (and this can get complicated very quickly).
All identifiers in the query are fully qualified so there is no doubt what object it's referring to.

The solution strategy looks like this: we first visit the TSqlFragment to make a list of all table aliases, then visit it again to get all equijoins, expanding aliases along the way. Using that list, we determine the list of equijoins that do not refer to the same database. In code:

var sql = @"
  UPDATE  t3
  SET     description = 'abc'
  FROM    database1.dbo.table1 t1
      INNER JOIN database2.dbo.table2 t2
        ON (t1.id = t2.t1_id)
      LEFT OUTER JOIN database3.dbo.table3 t3
        ON (t3.id = t2.t3_id)
      INNER JOIN database2.dbo.table4 t4
        ON (t4.id = t2.t4_id)

";                

var parser = new TSql120Parser(initialQuotedIdentifiers: false);
IList<ParseError> errors;
TSqlScript script;
using (var reader = new StringReader(sql)) {
  script = (TSqlScript) parser.Parse(reader, out errors);
}
// First resolve aliases.
var aliasResolutionVisitor = new AliasResolutionVisitor();
script.Accept(aliasResolutionVisitor);

// Then find all equijoins, expanding aliases along the way.
var findEqualityJoinVisitor = new FindEqualityJoinVisitor(
  aliasResolutionVisitor.Aliases
);
script.Accept(findEqualityJoinVisitor);

// Now list all aliases where the left database is not the same
// as the right database.
foreach (
  var equiJoin in 
  findEqualityJoinVisitor.EqualityJoins.Where(
    j => !j.JoinsSameDatabase()
  )
) {
  Console.WriteLine(equiJoin.ToString());
}

Output:

database3.dbo.table3.id = database2.dbo.table2.t3_id
database1.dbo.table1.id = database2.dbo.table2.t1_id

AliasResolutionVisitor is a simple thing:

public class AliasResolutionVisitor : TSqlFragmentVisitor {
  readonly Dictionary<string, string> aliases = new Dictionary<string, string>();
  public Dictionary<string, string> Aliases { get { return aliases; } }

  public override void Visit(NamedTableReference namedTableReference ) {
    Identifier alias = namedTableReference.Alias;
    string baseObjectName = namedTableReference.SchemaObject.AsObjectName();
    if (alias != null) {
      aliases.Add(alias.Value, baseObjectName);
    }
  }
}

We simply go through all the named table references in the query and, if they have an alias, add this to a dictionary. Note that this will fail miserably if subqueries are introduced, because this visitor has no notion of scope (and indeed, adding scope to a visitor is much harder because the TSqlFragment offers no way to annotate the parse tree or even walk it from a node).

The EqualityJoinVisitor is more interesting:

public class FindEqualityJoinVisitor : TSqlFragmentVisitor {
  readonly Dictionary<string, string> aliases;
  public FindEqualityJoinVisitor(Dictionary<string, string> aliases) {
    this.aliases = aliases;
  }

  readonly List<EqualityJoin> equalityJoins = new List<EqualityJoin>();
  public List<EqualityJoin> EqualityJoins { get { return equalityJoins; } }

  public override void Visit(QualifiedJoin qualifiedJoin) {
    var findEqualityComparisonVisitor = new FindEqualityComparisonVisitor();
    qualifiedJoin.SearchCondition.Accept(findEqualityComparisonVisitor);
    foreach (
      var equalityComparison in findEqualityComparisonVisitor.Comparisons
    ) {
      var firstColumnReferenceExpression = 
        equalityComparison.FirstExpression as ColumnReferenceExpression
      ;
      var secondColumnReferenceExpression = 
        equalityComparison.SecondExpression as ColumnReferenceExpression
      ;
      if (
        firstColumnReferenceExpression != null && 
        secondColumnReferenceExpression != null
      ) {
        string firstColumnResolved = resolveMultipartIdentifier(
          firstColumnReferenceExpression.MultiPartIdentifier
        );
        string secondColumnResolved = resolveMultipartIdentifier(
          secondColumnReferenceExpression.MultiPartIdentifier
        );
        equalityJoins.Add(
          new EqualityJoin(firstColumnResolved, secondColumnResolved)
        );
      }
    }
  }

  private string resolveMultipartIdentifier(MultiPartIdentifier identifier) {
    if (
      identifier.Identifiers.Count == 2 && 
      aliases.ContainsKey(identifier.Identifiers[0].Value)
    ) {
      return 
        aliases[identifier.Identifiers[0].Value] + "." + 
        identifier.Identifiers[1].Value;
    } else {
      return identifier.AsObjectName();
    }
  }
}

This hunts for QualifiedJoin instances and, if we find them, we in turn examine the search condition to find all occurrences of equality comparisons. Note that this does work with nested search conditions: in Bar JOIN Foo ON Bar.Quux = Foo.Quux AND Bar.Baz = Foo.Baz, we will find both expressions.

How do we find them? Using another small visitor:

public class FindEqualityComparisonVisitor : TSqlFragmentVisitor {
  List<BooleanComparisonExpression> comparisons = 
    new List<BooleanComparisonExpression>()
  ;
  public List<BooleanComparisonExpression> Comparisons { 
    get { return comparisons; } 
  }

  public override void Visit(BooleanComparisonExpression e) {
    if (e.IsEqualityComparison()) comparisons.Add(e);
  }
}

Nothing complicated here. It wouldn't be hard to fold this code into the other visitor, but I think this is clearer.

That's it, except for some helper code which I'll present without comment:

public class EqualityJoin {
  readonly SchemaObjectName left;
  public SchemaObjectName Left { get { return left; } }

  readonly SchemaObjectName right;
  public SchemaObjectName Right { get { return right; } }

  public EqualityJoin(
    string qualifiedObjectNameLeft, string qualifiedObjectNameRight
  ) {
    var parser = new TSql120Parser(initialQuotedIdentifiers: false);
    IList<ParseError> errors;
    using (var reader = new StringReader(qualifiedObjectNameLeft)) {
      left = parser.ParseSchemaObjectName(reader, out errors);
    }
    using (var reader = new StringReader(qualifiedObjectNameRight)) {
      right = parser.ParseSchemaObjectName(reader, out errors);
    }
  }

  public bool JoinsSameDatabase() {
    return left.Identifiers[0].Value == right.Identifiers[0].Value;
  }

  public override string ToString() {
    return String.Format("{0} = {1}", left.AsObjectName(), right.AsObjectName());
  }
}

public static class MultiPartIdentifierExtensions {
  public static string AsObjectName(this MultiPartIdentifier multiPartIdentifier) {
    return string.Join(".", multiPartIdentifier.Identifiers.Select(i => i.Value));
  }
}

public static class ExpressionExtensions {
  public static bool IsEqualityComparison(this BooleanExpression expression) {
    return 
      expression is BooleanComparisonExpression && 
      ((BooleanComparisonExpression) expression).ComparisonType == BooleanComparisonType.Equals
    ;
  }
}

As I mentioned before, this code is quite brittle. It assumes queries have a particular form, and it could fail (quite badly, by giving misleading results) if they don't. A major open challenge would be to extend it so it can handle scopes and unqualified references correctly, as well as the other weirdness a T-SQL script can feature, but I think it's a useful starting point nevertheless.

回答2:

Perhaps another way to attempt this is to execute your query as:

SET SHOWPLAN_XML ON;
UPDATE  t3
SET     description = 'abc'
FROM    database1.dbo.table1 t1
        INNER JOIN database2.dbo.table2 t2
            ON (t1.id = t2.t1_id)
        LEFT OUTER JOIN database3.dbo.table3 t3
            ON (t3.id = t2.t3_id)
        INNER JOIN database2.dbo.table4 t4
            ON (t4.id = t2.t4_id)

This returns an XML query plan. In the XML you will find the join conditions under a RelOp node. For example for a hash join loop you will see something like:

<RelOp NodeId="7" PhysicalOp="Hash Match" LogicalOp="Inner Join" EstimateRows="1" EstimateIO="0" EstimateCPU="0.0177716" AvgRowSize="15" EstimatedTotalSubtreeCost="0.0243408" Parallel="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row">
.. some stuff cut from here
  <Hash>
..
<ProbeResidual>
  <ScalarOperator ScalarString="[database2].[dbo].[table4].[Id] as [t4].[Id]=[database2].[dbo].[table2].[t4_Id] as [t2].[t4_Id]">
   <Compare CompareOp="EQ">
     <ScalarOperator>
       <Identifier>
         <ColumnReference Database="[database2]" Schema="[dbo]" Table="[table4]" Alias="[t4]" Column="Id" />
       </Identifier>
     </ScalarOperator>
     <ScalarOperator>
       <Identifier>
         <ColumnReference Database="[database2]" Schema="[dbo]" Table="[table2]" Alias="[t2]" Column="t4_Id" />
       </Identifier>
     </ScalarOperator>
   </Compare>
 </ScalarOperator>

For a nested loop something along the lines of:

<NestedLoops Optimized="0">
<Predicate>
  <ScalarOperator ScalarString="[database3].[dbo].[table3].[Id] as [t3].[Id]=[database2].[dbo].[table2].[t3_id] as [t2].[t3_id]">
    <Compare CompareOp="EQ">
      <ScalarOperator>
        <Identifier>
          <ColumnReference Database="[database3]" Schema="[dbo]" Table="[table3]" Alias="[t3]" Column="Id" />
        </Identifier>
      </ScalarOperator>
      <ScalarOperator>
        <Identifier>
          <ColumnReference Database="[database2]" Schema="[dbo]" Table="[table2]" Alias="[t2]" Column="t3_id" />
        </Identifier>
      </ScalarOperator>
    </Compare>
  </ScalarOperator>
</Predicate>

Perhaps you could then process this in C# to extract all joins then compare the databases held in the column references.

Apologies for the formatting.

来源：https://stackoverflow.com/questions/27240983/how-to-extract-cross-databases-references-using-scriptdom-api

标签

sql-server

tsql

parsing

foreign-keys

scriptdom