Remove duplicate values based on multiple columns with a condition in the Power BI Query Editor

Submitted by 北城余情 on 2020-06-15 23:45:52

Question


I am new to Power BI and would appreciate your help with the issue described below.

I am taking three columns into consideration, as shown below:

enter image description here

Question: I would like to remove duplicate rows from the table above based on this condition: equal values for "Time" and "ID", and an absolute difference in "Time spent" lower than or equal to 1. The highlighted rows in the image fall into this category.

I would like the rows below to be removed based on that condition.

enter image description here

I can do this in Excel by using a fourth column with the formula =IF(AND(A3=A2,B3=B2,ABS(F3-F2)<1),"problem",0) and then filtering out the rows marked as "problem". Please help!

Regards

Mahi


Answer 1:


I bet the suggestion from @Alexis Olson works just fine, but since you specifically mentioned the Query Editor, here's how I would do it there:


  1. Load your data as shown below, and accept the changes made under Changed Type:

Don't worry about the other steps under Query Settings; we'll get to them shortly.

  2. Select Add Column and click Index Column, so that you get this:

  3. Select Add Column, click Custom Column, and enter this formula in the dialog box that appears: Table.AddColumn(#"Added Index", "Custom", each #"Added Index"[Time Spent]{[Index]}-#"Added Index"[Time Spent]{[Index]-1})

  4. Click OK, and make sure that you're getting this:

  5. This step looks a little odd, but you'll have to click 'Table' in the new column:

  6. You will get an Error value in the first row, but you can remove it by right-clicking that column and clicking Remove Errors:

  7. Now click the drop-down menu in the Custom column, and select Number Filters and then Does Not Equal:

  8. Enter 0, or select 0 from the drop-down menu in the dialog box:

  9. That's it. The rows you wanted removed should now be filtered out:

Note, however, that this procedure comes at a cost: you lose the first row. It has no previous row to compare against, so its Custom value is an error, and Remove Errors discards it. If the rest of this is useful to you, I can see if we can fix that last part as well.
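
One possible way to keep the first row is to guard the lookup of the previous row instead of removing the error afterwards. The query below is only a sketch under assumptions: the inline sample table, the step names (AddedIndex, Buffered, Flagged, Kept) and the helper column IsDuplicate are made up for illustration, and the condition follows the question's wording (same Time, same ID, absolute difference in Time Spent of at most 1) rather than the "does not equal 0" filter used in the steps above.

```powerquery
let
    // Hypothetical sample data; in a real report this step is your own source.
    Source = #table(
        {"Time", "ID", "Time Spent"},
        {
            {"09:00", "A", 10.0},
            {"09:00", "A", 10.5},
            {"09:00", "B", 10.5},
            {"10:00", "B", 12.0},
            {"10:00", "B", 14.0}
        }
    ),
    // 0-based index so each row can look up the row directly above it.
    AddedIndex = Table.AddIndexColumn(Source, "Index", 0, 1),
    // Buffer once so the per-row lookups below do not re-evaluate the source.
    Buffered = Table.Buffer(AddedIndex),
    // Flag a row when Time and ID match the previous row and the absolute
    // difference in Time Spent is at most 1. The first row has no previous
    // row, so it is never flagged and no error is produced.
    Flagged = Table.AddColumn(
        Buffered,
        "IsDuplicate",
        each
            if [Index] = 0 then false
            else
                let
                    Prev = Buffered{[Index] - 1}
                in
                    Prev[Time] = [Time]
                        and Prev[ID] = [ID]
                        and Number.Abs([Time Spent] - Prev[Time Spent]) <= 1,
        type logical
    ),
    // Keep only the rows that were not flagged, then drop the helper columns.
    Kept = Table.SelectRows(Flagged, each not [IsDuplicate]),
    Result = Table.RemoveColumns(Kept, {"Index", "IsDuplicate"})
in
    Result
```

With this sample data, only the second row (09:00, A, 10.5) should be dropped, since it is the only one that matches the row above it on Time and ID with a difference of at most 1. Like the Excel formula in the question, this compares each row only with the row directly above it, so the table needs to be sorted appropriately first.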




Answer 2:


You can pick a representative [Time Spent] value from each unique set of rows by taking a max or min over the list of "duplicate" values. Here's the formula for such a custom column, which I'll call [Min Time]:

= List.Min(
      Table.SelectRows(#"Previous Step",
          (C) => (C[Time] = [Time] and
                  C[ID] = [ID] and
                  Number.Abs(C[Time Spent] - [Time Spent]) < 1)
      )[Time Spent])

Once you have this custom column, you can group by [Time], [ID], and [Min Time] to roll up the duplicates and then rename the [Min Time] column to [Time Spent].
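
Put together, the custom column, the grouping and the rename might look like the query below. It is a minimal sketch, not a definitive implementation: the inline sample table, the step names and the extra Row Count aggregation are assumptions added for illustration, and Table.Buffer is included only to avoid re-evaluating the source for every row.

```powerquery
let
    // Hypothetical sample data standing in for #"Previous Step".
    Source = #table(
        {"Time", "ID", "Time Spent"},
        {
            {"09:00", "A", 10.0},
            {"09:00", "A", 10.5},
            {"09:00", "B", 10.5},
            {"10:00", "B", 12.0},
            {"10:00", "B", 14.0}
        }
    ),
    Buffered = Table.Buffer(Source),
    // For each row, take the smallest Time Spent among the rows that share
    // its Time and ID and lie within 1 of its own Time Spent.
    AddedMinTime = Table.AddColumn(
        Buffered,
        "Min Time",
        each List.Min(
            Table.SelectRows(
                Buffered,
                (C) => C[Time] = [Time]
                    and C[ID] = [ID]
                    and Number.Abs(C[Time Spent] - [Time Spent]) < 1
            )[Time Spent]
        ),
        type number
    ),
    // Grouping on the three key columns rolls the duplicates up into one row.
    // Row Count is an illustrative extra showing how many rows were merged.
    Grouped = Table.Group(
        AddedMinTime,
        {"Time", "ID", "Min Time"},
        {{"Row Count", each Table.RowCount(_), Int64.Type}}
    ),
    // The original Time Spent column is gone after grouping, so the
    // representative value can take over its name.
    Renamed = Table.RenameColumns(Grouped, {{"Min Time", "Time Spent"}})
in
    Renamed
```

Because every row is compared against the whole table, this approach does not depend on sort order, but it can become slow on large tables.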



Source: https://stackoverflow.com/questions/51798712/remove-duplicates-values-based-on-multiple-column-with-a-condition-in-query-edit
