Follow-up: DAX Drop Duplicates

Submitted by 吃可爱长大的小学妹 on 2021-02-11 06:58:26

Question


Similar to a question asked here: given this table, I want to keep only the record where each email first appears.

email        firstname  Lastname  Address  City  Zip
ABC@XYZ.com  Scott      Johnson   A        Z     1111
ABC@XYZ.com  Bill       Johnson   B        Y     2222
ABC@XYZ.com  Ted        Smith     C        X     3333
DEF@QRP.com  Steve      Williams  D        W     4444
XYZ@LMN.com  Sam        Samford   E        U     5555
XYZ@LMN.com  David      Beals     F        V     6666
DEF@QRP.com  Stephen    Jackson   G        T     7777
TUV@DEF.com  Seven      Alberts   H        S     8888

Expected output table:

email        firstname  Lastname  Address  City  Zip
ABC@XYZ.com  Scott      Johnson   A        Z     1111
DEF@QRP.com  Steve      Williams  D        W     4444
XYZ@LMN.com  Sam        Samford   E        U     5555
TUV@DEF.com  Seven      Alberts   H        S     8888

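As a reference point, here is a minimal plain-Python sketch (not DAX) of the behaviour being asked for. It is similar in spirit to pandas `drop_duplicates(subset="email", keep="first")`. It assumes the rows arrive in the order they are listed in the question, which is exactly the assumption a DAX table does not give you. The function name `keep_first_per_email` is hypothetical.

```python
# Rows from the question, in the order they were listed.
ROWS = [
    ("ABC@XYZ.com", "Scott", "Johnson", "A", "Z", 1111),
    ("ABC@XYZ.com", "Bill", "Johnson", "B", "Y", 2222),
    ("ABC@XYZ.com", "Ted", "Smith", "C", "X", 3333),
    ("DEF@QRP.com", "Steve", "Williams", "D", "W", 4444),
    ("XYZ@LMN.com", "Sam", "Samford", "E", "U", 5555),
    ("XYZ@LMN.com", "David", "Beals", "F", "V", 6666),
    ("DEF@QRP.com", "Stephen", "Jackson", "G", "T", 7777),
    ("TUV@DEF.com", "Seven", "Alberts", "H", "S", 8888),
]


def keep_first_per_email(rows):
    """Return the rows whose email has not been seen earlier in the list."""
    seen = set()
    result = []
    for row in rows:
        email = row[0]
        if email not in seen:  # first time this email appears
            seen.add(email)
            result.append(row)
    return result


for row in keep_first_per_email(ROWS):
    print(row)
```

This prints the Scott, Steve, Sam and Seven rows, matching the expected output table. The result depends entirely on list order, which is why the answers below first make that order explicit.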

Answer 1:


There is no inherent ordering of a table in DAX, so in order to take the first row you need to add an index column or define an ordering on the table somehow.

For this answer, I'll assume that you've added an index column somehow (in the query editor or with a DAX calculated column).
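The role of the index column can be illustrated with a small plain-Python sketch. This is an analogy only, not Power Query or DAX code. Once each row carries its original position as data, "first" simply means "smallest Index", and that stays true however the rows are later reordered. The helper `add_index_column` and the start value of 1 are assumptions made for illustration.

```python
import random

# (email, firstname) pairs from the question, in the order they were listed.
ROWS = [
    ("ABC@XYZ.com", "Scott"),
    ("ABC@XYZ.com", "Bill"),
    ("ABC@XYZ.com", "Ted"),
    ("DEF@QRP.com", "Steve"),
    ("XYZ@LMN.com", "Sam"),
    ("XYZ@LMN.com", "David"),
    ("DEF@QRP.com", "Stephen"),
    ("TUV@DEF.com", "Seven"),
]


def add_index_column(rows, start=1):
    """Hypothetical helper: attach each row's position as an explicit Index value."""
    return [
        {"Index": i, "email": email, "firstname": firstname}
        for i, (email, firstname) in enumerate(rows, start=start)
    ]


indexed = add_index_column(ROWS)

# Simulate a table with no guaranteed row order.
shuffled = indexed[:]
random.shuffle(shuffled)

# The first ABC@XYZ.com row can still be recovered from the data alone.
first_abc = min(
    (row for row in shuffled if row["email"] == "ABC@XYZ.com"),
    key=lambda row: row["Index"],
)
print(first_abc["firstname"])  # Scott
```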

You can create a filtered table as follows:

FilteredTable1 =
FILTER (
    Table1,
    Table1[Index]
        = CALCULATE ( MIN ( Table1[Index] ), ALLEXCEPT ( Table1, Table1[email] ) )
)

For each row in Table1, this checks whether its index is the smallest index among all rows with the same email, so only the first occurrence of each email is kept.
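The row-by-row check can be modelled in plain Python as a rough analogy. This is not how the DAX engine actually executes the query. For every row, the model collects the Index values of all rows sharing its email, which is the filter that `ALLEXCEPT ( Table1, Table1[email] )` leaves after the context transition. It then keeps the row if its own Index is the minimum. The Index values and the function name `filtered_table` are illustrative assumptions.

```python
# Indexed version of the question's rows (Index = original position, assumed).
TABLE1 = [
    {"Index": 1, "email": "ABC@XYZ.com", "firstname": "Scott"},
    {"Index": 2, "email": "ABC@XYZ.com", "firstname": "Bill"},
    {"Index": 3, "email": "ABC@XYZ.com", "firstname": "Ted"},
    {"Index": 4, "email": "DEF@QRP.com", "firstname": "Steve"},
    {"Index": 5, "email": "XYZ@LMN.com", "firstname": "Sam"},
    {"Index": 6, "email": "XYZ@LMN.com", "firstname": "David"},
    {"Index": 7, "email": "DEF@QRP.com", "firstname": "Stephen"},
    {"Index": 8, "email": "TUV@DEF.com", "firstname": "Seven"},
]


def filtered_table(table):
    """Keep rows whose Index is the minimum Index among rows with the same email."""
    result = []
    for row in table:  # FILTER iterates every row of Table1
        # Recomputed for each row, analogous to one context transition per row:
        # only the current row's email survives as a filter.
        same_email = [r["Index"] for r in table if r["email"] == row["email"]]
        if row["Index"] == min(same_email):
            result.append(row)
    return result


for row in filtered_table(TABLE1):
    print(row["Index"], row["email"], row["firstname"])
```

In this Python model the inner scan runs once per row, so the cost grows quadratically with the number of rows. That per-row recomputation is what Answer 2 tries to reduce.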




Answer 2:


Assuming that we have added an Index column with unique values, it's possible to reduce the number of context transitions to one per email (instead of one per row, as in the FILTER version) by first preparing an Indexes table containing the indexes to be selected, and then applying this Indexes table as a filter using TREATAS.

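T Index Unique =
VAR Indexes =
    // one row per distinct email: the smallest Index for that email
    // (CALCULATE performs one context transition per email)
    SELECTCOLUMNS(
        ALL( 'T Index'[Email] ),
        "MinIndex", CALCULATE( MIN( 'T Index'[Index] ) )
    )
RETURN
    // apply the MinIndex values as a filter on the Index column
    CALCULATETABLE( 'T Index', TREATAS( Indexes, 'T Index'[Index] ) )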
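In the same plain-Python analogy, the `Indexes` variable corresponds to computing the minimum Index once per email. `TREATAS` corresponds to keeping the rows whose Index appears among those minima. The sketch assumes, as the answer does, that Index values are unique across the whole table. The function name `unique_by_min_index` is hypothetical.

```python
# Indexed rows; Index is assumed unique across the whole table.
T_INDEX = [
    {"Index": 1, "Email": "ABC@XYZ.com", "firstname": "Scott"},
    {"Index": 2, "Email": "ABC@XYZ.com", "firstname": "Bill"},
    {"Index": 3, "Email": "ABC@XYZ.com", "firstname": "Ted"},
    {"Index": 4, "Email": "DEF@QRP.com", "firstname": "Steve"},
    {"Index": 5, "Email": "XYZ@LMN.com", "firstname": "Sam"},
    {"Index": 6, "Email": "XYZ@LMN.com", "firstname": "David"},
    {"Index": 7, "Email": "DEF@QRP.com", "firstname": "Stephen"},
    {"Index": 8, "Email": "TUV@DEF.com", "firstname": "Seven"},
]


def unique_by_min_index(table):
    # VAR Indexes: one minimum Index per distinct Email, built in a single pass.
    min_index = {}
    for row in table:
        email = row["Email"]
        if email not in min_index or row["Index"] < min_index[email]:
            min_index[email] = row["Index"]
    # TREATAS( Indexes, 'T Index'[Index] ): the minima filter the Index column alone,
    # which is only safe because Index values never repeat across emails.
    keep = set(min_index.values())
    return [row for row in table if row["Index"] in keep]


for row in unique_by_min_index(T_INDEX):
    print(row["Index"], row["Email"], row["firstname"])
```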

If instead we have a column that is not unique across different emails but is unique within each email, such as a timestamp, we can prepare a filter table containing both the email and the timestamp.

For instance, given a T Date table with Email and Date columns, the calculated table becomes:

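T Date Unique =
VAR EmailDate =
    // one row per email: the Email itself plus its earliest Date
    ADDCOLUMNS(
        ALL( 'T Date'[Email] ),
        "MinDate", CALCULATE( MIN( 'T Date'[Date] ) )
    )
RETURN
    // filter on (Email, Date) pairs, so a date only matches within its own email
    CALCULATETABLE( 'T Date', TREATAS( EmailDate, 'T Date'[Email], 'T Date'[Date] ) )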
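Here is a plain-Python sketch of why the filter table needs both columns. The `T Date` rows are made up for illustration: the emails, names and dates are all hypothetical. They are chosen so that dates repeat across emails but not within one. Filtering on the (Email, MinDate) pairs keeps exactly one row per email. Filtering on the minimum dates alone lets through a row (Bill) that merely shares its date with another email's earliest row.

```python
from datetime import date

# Hypothetical 'T Date' rows: Date is unique within an email but not across emails.
T_DATE = [
    {"Email": "ABC@XYZ.com", "Date": date(2020, 12, 1), "Name": "Scott"},
    {"Email": "ABC@XYZ.com", "Date": date(2020, 12, 2), "Name": "Bill"},
    {"Email": "DEF@QRP.com", "Date": date(2020, 12, 1), "Name": "Steve"},
    {"Email": "DEF@QRP.com", "Date": date(2020, 12, 3), "Name": "Stephen"},
    {"Email": "XYZ@LMN.com", "Date": date(2020, 12, 5), "Name": "Sam"},
    {"Email": "XYZ@LMN.com", "Date": date(2020, 12, 2), "Name": "David"},
]


def min_date_per_email(table):
    """Like VAR EmailDate: one (Email, MinDate) entry per distinct email."""
    result = {}
    for row in table:
        email = row["Email"]
        if email not in result or row["Date"] < result[email]:
            result[email] = row["Date"]
    return result


def unique_by_email_and_min_date(table):
    # Like TREATAS( EmailDate, 'T Date'[Email], 'T Date'[Date] ): match the pair.
    pairs = set(min_date_per_email(table).items())
    return [row for row in table if (row["Email"], row["Date"]) in pairs]


def filter_on_date_only(table):
    # Counter-example: filter on the minimum dates alone, ignoring Email.
    min_dates = set(min_date_per_email(table).values())
    return [row for row in table if row["Date"] in min_dates]


print([row["Name"] for row in unique_by_email_and_min_date(T_DATE)])  # ['Scott', 'Steve', 'David']
print([row["Name"] for row in filter_on_date_only(T_DATE)])  # ['Scott', 'Bill', 'Steve', 'David']
```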



Source: https://stackoverflow.com/questions/65397214/follow-up-dax-drop-duplicates
