How to resolve duplicate column names in an Excel file with Alteryx?

Submitted by 半腔热情 on 2019-12-12 06:37:27

Question


I have a wide Excel file with price data that looks like this:

Product | 2015-08-01 | 2015-09-01 | 2015-09-01 | 2015-10-01
ABC     | 13         | 12         | 15         | 14
CDE     | 69         | 70         | 71         | 67
FGH     | 25         | 25         | 26         | 27

The date 2015-09-01 appears twice, which is valid in this context but obviously messes up my workflow. The first value is to be read as the minimum price and the second as the maximum price. If there is only one column for a date, min and max are the same.

Is there a way to resolve this issue?

An idea I had was the following: I also have cells that contain a value like "38 - 42", again indicating min and max. I resolved this by splitting the value with a regular expression. A possible solution would be to join the two columns that share a header and afterwards split the values according to my rules. That, however, would require me to detect dynamically whether the headers are duplicates.
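For illustration, the "38 - 42" split could look like the following outside Alteryx. This is only a sketch in Python: the pattern and the helper name `split_price` are assumptions, since the actual expression used in the workflow is not shown here. A cell holding a single number is treated as min and max being equal, following the rule stated above.

```python
import re

# Matches either a single number ("38") or a range ("38 - 42").
# The exact pattern is an assumption; the original expression is not shown.
RANGE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(?:-\s*(\d+(?:\.\d+)?))?\s*$")


def split_price(cell):
    """Return (low, high) for a cell such as "38 - 42" or "38"."""
    match = RANGE_RE.match(str(cell))
    if match is None:
        raise ValueError(f"unrecognized price cell: {cell!r}")
    low = float(match.group(1))
    # A single value means min and max are the same.
    high = float(match.group(2)) if match.group(2) is not None else low
    return low, high


print(split_price("38 - 42"))
print(split_price("13"))
```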

Is that something that is possible in Alteryx or is there an easier solution for this problem?

And of course, asking the supplier of the file to change it is not really an option, unfortunately.

Thanks

EDIT: Just got another idea: I transpose the table to have the format

Product | Date | Price Low | Price High

So if I could check for duplicates in that table and somehow merge these records into one, that would do the trick as well.

EDIT2: Since I don't seem to have made that clear, my final result should look like the transposed table in EDIT1. If there is only one value, it should go in "Price Low" (and then I will probably copy it to "Price High" anyway). If there are two values, they should go in the corresponding columns. @Poornima's suggestion resolves the duplicate issue in a more sophisticated way than putting a "_2" behind the column name, but doesn't put the value in the required column.


Answer 1:


If this format works for you:

Product | Date | Price Low | Price High

Then:
- Transpose with Product as a key field.
- Use a Select tool to truncate your Name field to 10 characters. This will remove any _2 suffixes that Alteryx has automatically added when renaming.
- Summarize:
  - Group by Product
  - Group by Name
  - Then apply Min and Max operations to Value.

Result is:

Product  |  Name       |  Min_Value  |  Max_Value  
ABC      |  2015-08-01 |  13         |  13
ABC      |  2015-09-01 |  12         |  15
ABC      |  2015-10-01 |  14         |  14
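The same transpose, truncate and summarize logic can be sketched in plain Python to check the expected output. This is not an Alteryx workflow, just an illustration: the function name `min_max_by_date` is hypothetical, and the header list assumes the duplicate column arrives renamed with a "_2" suffix as described above.

```python
from collections import defaultdict

# Headers as they might look after the duplicate has been renamed on import
# (the "_2" suffix is an assumption taken from the answer above).
headers = ["Product", "2015-08-01", "2015-09-01", "2015-09-01_2", "2015-10-01"]
rows = [
    ["ABC", 13, 12, 15, 14],
    ["CDE", 69, 70, 71, 67],
    ["FGH", 25, 25, 26, 27],
]


def min_max_by_date(headers, rows, date_len=10):
    # Transpose: collect every value under (product, truncated header).
    grouped = defaultdict(list)
    for row in rows:
        product = row[0]
        for name, value in zip(headers[1:], row[1:]):
            # Truncating to 10 characters drops any rename suffix.
            grouped[(product, name[:date_len])].append(value)
    # Summarize: min and max per (product, date) group.
    return [
        (product, date, min(values), max(values))
        for (product, date), values in grouped.items()
    ]


result = min_max_by_date(headers, rows)
for record in result:
    print(record)
```

Because the grouping key is the truncated header, a date with a single column simply yields the same value for min and max.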



Answer 2:


For this problem, you can leverage the native Excel (.xlsx) driver available in Alteryx 9.1. If multiple columns in Excel use the same header string, the native driver renames the duplicates by appending an underscore and a number, e.g., 2015-09-01, 2015-09-01_1. By leveraging this, we can reformat the data in three steps:

  1. As you suggested, we start by transposing the data so that we can leverage the column headers.
  2. We can then write a formula with the Formula Tool that evaluates whether the column header for the date is the first or the last one based on the header length.
  3. The final step would be to bring the data back into the same format as before, which can be done via the Crosstab Tool.
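As a rough illustration of these three steps outside Alteryx, the Python sketch below labels each transposed value as "Price Low" or "Price High" based on header length and then pivots the labels back into columns. The function name `to_low_high` is hypothetical, the "_1" suffix is assumed from the example above, and copying a lone value into "Price High" follows what the question says it would probably do anyway.

```python
# Headers as the native driver might deliver them (the "_1" suffix is assumed).
headers = ["Product", "2015-08-01", "2015-09-01", "2015-09-01_1", "2015-10-01"]
rows = [
    ["ABC", 13, 12, 15, 14],
    ["CDE", 69, 70, 71, 67],
    ["FGH", 25, 25, 26, 27],
]

DATE_LEN = 10  # length of a plain "YYYY-MM-DD" header


def to_low_high(headers, rows):
    table = {}  # (product, date) -> {"Price Low": ..., "Price High": ...}
    for row in rows:
        product = row[0]
        # Step 1: transpose, i.e. walk the row one (header, value) pair at a time.
        for name, value in zip(headers[1:], row[1:]):
            # Step 2: a header longer than a plain date is the renamed duplicate.
            column = "Price High" if len(name) > DATE_LEN else "Price Low"
            # Step 3: pivot the labelled values back into columns.
            table.setdefault((product, name[:DATE_LEN]), {})[column] = value
    records = []
    for (product, date), prices in table.items():
        low = prices["Price Low"]
        # With only one column for a date, reuse the low price as the high price.
        high = prices.get("Price High", low)
        records.append((product, date, low, high))
    return records


records = to_low_high(headers, rows)
for record in records:
    print(record)
```

Unlike a min/max summary, this keeps the positional meaning of the columns: the first column for a date always lands in "Price Low", even if its value happens to be larger.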

You can review the configurations for each of these tools here. The end result would be as follows.

Hope this helps.

Regards,

Poornima



Source: https://stackoverflow.com/questions/33161337/how-to-resolve-duplicate-column-names-in-excel-file-with-alteryx
