Power Query: how to add one to a column when a specific values appear in an other column

蓝咒 提交于 2020-05-08 14:25:48

问题


I have an ID column and I am looking for ways to increment my IDs each time a specific item appears in my Geography column (ItalyZ, ItalyM, UKY or UKM) is found.

The ID of ItalyZ starts at 0 and ends at 4000.

The ID of ItalyB starts at 4000 and ends at 8000.

The ID of UKY starts at 0 and ends at 4000.

The ID of UKM starts at 4000 and ends at 8000.

However, I am refreshing my file, and I will thus have from time to time new arrivals of "geographies" without the origins or first IDs. These boundaries/ranges are only known beginning and ends.

Here is a sample of my data:

  |---------------------|------------------|    
  |       ID            |   Geography      |
  |---------------------|------------------|
  |    AB0000           |      ItalyZ      |
  |---------------------|------------------|
  |    AB4041           |      ItalyB      |
  |---------------------|------------------|
  |    BC0000           |      UKY         |
  |---------------------|------------------|
  |    BC4001           |      UKM         |
  |---------------------|------------------|
  |    NULL             |      ItalyZ      |
  |---------------------|------------------|
  |    NULL             |      ItalyZ      |
  |---------------------|------------------|
  |    NULL             |      UKY         |
  |---------------------|------------------|
  |    NULL             |      UKM         |
  |---------------------|------------------|  

Here is my expected output :

  |---------------------|------------------|    
  |       ID            |   Geography      |
  |---------------------|------------------|
  |    AB0000           |      ItalyZ      |
  |---------------------|------------------|
  |    AB4041           |      ItalyB      |
  |---------------------|------------------|
  |    BC0000           |      UKY         |
  |---------------------|------------------|
  |    BC4001           |      UKM         |
  |---------------------|------------------|
  |    AB0001           |      ItalyZ      |
  |---------------------|------------------|
  |    AB0001           |      ItalyZ      |
  |---------------------|------------------|
  |    AB4042           |      UKY         |
  |---------------------|------------------|
  |    BC0001           |      UKM         |
  |---------------------|------------------|  

I have been trying many various ways and trying to adapt running total solutions. I have also been trying to break apart my file in four different ones in order not to have an If function alternating between cases, and thus making it simpler, like this in my power query:

 #"Added Custom2" = Table.AddColumn(#"Reordered Columns", "Sum", each if [Geography] = "UKM" then [Number AB range below 4000] + 1 
else if [Geography] = "UKY" then [Number AB range above 4000] + 1 
else if [Geography] = "ItalyB" then [Number BC range above 5000]
else [Number BC range below 5000] + 1)

But absolutely nothing works. This maddening.


回答1:


As in my other answer, here's a simplified problem ignoring the ID letter prefixes you have.

ID,  Group | NewID
-----------|------
4,     A   | 4
7,     A   | 7
300,   B   | 300
525,   C   | 525
null,  A   | 10
9,     A   | 9
null,  A   | 11
null,  B   | 301
null,  C   | 526
null,  A   | 12
null,  B   | 302

Starting from the left part of the table, we want to compute the new column NewID.

In this answer, I will write a custom function that's written recursively using the List.Generate function.

From the linked documentation the function is set up like this

List.Generate(
    initial as function,                    /*Set all your initial variables*/
    condition as function,                  /*Stopping criteria.*/
    next as function,                       /*Define how to update at each step.*/
    optional selector as nullable function  /*Pick output element.*/
) as list

Define a function that takes a column potentially containing nulls and fills the nulls incrementally from the maximal non-null value:

(Column as list) as list =>
let
    Generate =
    List.Generate(
        () => [x = Column{0}, i = 0, n = List.Max(Column)],
        each [i] < List.Count(Column),
        each [
            i = [i] + 1,
            x = if Column{i} = null then [n] + 1 else Column{i},
            n = if Column{i} = null then [n] + 1 else [n]
            ],     
        each [x]
    )
in
    Generate

When you define the function, it looks like this and can be re-used in any other queries:

You can use it by choosing an existing column in an existing table and clicking the Invoke button.

This will create a new list in your Queries pane named Invoked Function that is that function applied to the column you selected.

You can also create a blank query and pass it a list. For example, FilterNulls({4,7,null,9,null,null}) returns {4,7,10,9,11,12}.

Here's what that looks like in the query editor.


What we really want to do is use this function as a column transformation in a group by operation and then expand:

let
    Source = <Data Table Source Here>,
    #"Grouped Rows" = Table.Group(Source, {"Group"}, {{"FillNulls", each FillNulls([ID]), type list}}),
    #"Expanded FillNulls" = Table.ExpandListColumn(#"Grouped Rows", "FillNulls")
in
    #"Expanded FillNulls"

Here's what it looks like after grouping but before expanding:

Notice what the function is doing. We're applying the function FillNulls on the ID column for each separate Group.


This is a similar amount of steps and complexity as the other answer but uses a function that is constructed in a recursive way that may be more familiar to you.




回答2:


I will answer a further simplified problem as I don't want to address the ID letter prefixes.

Let's say we have the following table (I've included :

ID,  Group
-----------
0,     A
1,     A
300,   B
525,   C
null,  A
null,  B
null,  B
null,  C

And want to generate a new column NewID that will replace ID.

ID,  Group, NewID
------------------
0,     A,   0
1,     A,   1
300,   B,   300
525,   C,   525
null,  A,   2
null,  B,   301
null,  B,   302
null,  C,   526

Here's a method that uses Table.AddIndexColumn:

let
    Source = <First Table Above>,
    #"Grouped Rows" = Table.Group(Source, {"Group"}, {{"ID", each List.Max([ID]), type number}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", (C) => Table.AddIndexColumn(Table.SelectRows(Source, each _[Group] = C[Group]),"NewID",C[ID],1)),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"NewID"}, {"NewID"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Custom",{"ID"})
in
    #"Removed Columns"

First, we group by Group to find the maximal ID per Group:

Then we add a new column where each row in the column is a table defined by filtering the original table down to just the current group and then adding an index column starting at the maximal ID we just found. This is the most complex step.

From here, we expand the Custom table column (choosing the column we don't already have) and remove the old ID column. We're now where we need to be short of any sorting or column typing we choose to do.


Edit: I've made a mistake in the above. Note that the NewID for Group A is 1,2,3 instead of 0,1,2 I was trying for.

To fix this for this simple example, you can use List.Min instead of List.Max in the group-by step.

For a more complex example, you may need to add an index column to the source table so you can merge back to it after expanding and only use the new NewID for previously null ID values since we can't guarantee they are contiguous.

Here's the code:

let
    Source = <First Table Above>,
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1),
    #"Grouped Rows" = Table.Group(#"Added Index", {"Group"}, {{"ID", each List.Max([ID]), type number}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", (C) => Table.AddIndexColumn(Table.SelectRows(Table.Sort(#"Added Index",{"ID"}), each _[Group] = C[Group]),"NewID",C[ID]+1,1)),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Index", "NewID"}, {"Index", "NewID"}),
    #"Merged Queries" = Table.NestedJoin(#"Added Index", {"Index"}, #"Expanded Custom", {"Index"}, "Expanded Custom", JoinKind.LeftOuter),
    #"Expanded Expanded Custom" = Table.ExpandTableColumn(#"Merged Queries", "Expanded Custom", {"NewID"}, {"NewID"}),
    #"Added Custom1" = Table.AddColumn(#"Expanded Expanded Custom", "ReplaceID", each if [ID] = null then [NewID] else [ID]),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"ID", "NewID"})
in
    #"Removed Columns"

The complex step changes just a bit:

(C) => Table.AddIndexColumn(
           Table.SelectRows(
               Table.Sort(#"Added Index", {"ID"}),
               each _[Group] = C[Group]
           ),
           "NewID", C[ID] + 1, 1
       )

The difference is that we need to add a sort so the nulls come after all the already assigned ID values and start indexing the nulls at C[ID] + 1 instead of just C[ID].


Here's a version that has fewer steps (no group by, expand, or merge) but a bit more complicated function:

let
    Source = <First Table Above>,    
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1),
    #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", (C) => Table.SelectRows(#"Added Index", each _[Group] = C[Group])),
    #"Added NewID" = Table.AddColumn(#"Added Custom", "NewID", (C) => if C[ID] = null then Table.SelectRows(Table.AddIndexColumn(Table.SelectRows(C[Custom], each _[ID] = null), "NewID", List.Max(C[Custom][ID])+1,1), each _[Index] = C[Index]){0}[NewID] else C[ID]),
    #"Removed Columns" = Table.RemoveColumns(#"Added NewID",{"Custom"})
in
    #"Removed Columns"

The first added Custom column is just the indexed source table filtered to the current Group. Then we add the NewID column defined as:

(Read from the inside out.)

(C) =>
  if C[ID] = null
  then Table.SelectRows(
           Table.AddIndexColumn(
               Table.SelectRows(C[Custom], each _[ID] = null),
               "NewID", List.Max(C[Custom][ID]) + 1, 1
           ),
           each _[Index] = C[Index]
       ){0}[NewID]
  else C[ID]

Similar to before, we take the group subtable Custom, just pick the null ID rows and index them starting from the max non-null ID plus one. This still leaves us with a table, so we just want the row in this subtable that corresponds to the Index from the whole table. We use {0}[NewID] to extract the value from the cell in the first (only) row in the table in the [NewID] column. For non-null ID values, the else clause just leaves them as they were.



来源:https://stackoverflow.com/questions/61488154/power-query-how-to-add-one-to-a-column-when-a-specific-values-appear-in-an-othe

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!