Pandas concat failing

无人及你

I am trying to concatenate dataframes built from the following 2 CSV files:

df_a: https://www.dropbox.com/s/slcu7o7yyottujl/df_current.csv?dl=0

df_b: https://www.dropbo

4 Answers

温柔的废话

2020-12-03 10:30
I believe that this error occurs if the following two conditions are met:
1. The data frames have different columns (i.e. df1.columns.equals(df2.columns) is False).
2. The columns contain a repeated name.
Basically, if you concat dataframes with columns [A,B,C] and [B,C,D], pandas can work out that it should make one series for each distinct column name. But if I then try to join a third dataframe with columns [B,B,C], it does not know which column to append to and ends up with fewer distinct columns than it thinks it needs.

If your dataframes have identical columns, it will work anyway. So you can join [B,B,C] to [B,B,C], but not to [C,B,B]; when the columns are identical it probably just uses the integer positions or something.
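The three cases described above can be checked with a small sketch. The helper names frame and concat_succeeds are made up for this illustration, and the exact exception raised may differ between pandas versions, so the helper only records whether pd.concat succeeded:

```python
import pandas as pd


def frame(columns):
    """Build a one-row DataFrame with the given (possibly repeated) column names."""
    return pd.DataFrame([list(range(len(columns)))], columns=columns)


def concat_succeeds(*column_sets):
    """Return True if pd.concat accepts frames with these column sets."""
    try:
        pd.concat([frame(cols) for cols in column_sets])
        return True
    except Exception:
        # The exception type and message vary across pandas versions.
        return False


# Different columns, no repeated names.
distinct_ok = concat_succeeds(["A", "B", "C"], ["B", "C", "D"])
# Repeated names, but both frames have identical columns.
identical_dupes_ok = concat_succeeds(["B", "B", "C"], ["B", "B", "C"])
# Different columns and a repeated name: both conditions are met.
mismatched_dupes_ok = concat_succeeds(["A", "B", "C"], ["B", "B", "C"])

print(distinct_ok, identical_dupes_ok, mismatched_dupes_ok)
```

Only the last call combines differing columns with a repeated name, which is the combination this answer points to.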
孤街浪徒

2020-12-03 10:34
You can get around this issue with a 'manual' concatenation. In this case your list is
```
list_of_dfs = [df_a, df_b]
```
And instead of running
```
giant_concat_df = pd.concat(list_of_dfs, axis=0)
```
You can turn all of the dataframes into lists of dictionaries and then make a new data frame from these lists (merged with chain):
```
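from itertools import chain
import pandas as pd

# One dict per row; repeated column names collapse into a single key
list_of_dicts = [cur_df.to_dict("records") for cur_df in list_of_dfs]
giant_concat_df = pd.DataFrame(list(chain.from_iterable(list_of_dicts)))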
```
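As a rough, self-contained sketch of this approach, here is a variant wrapped in a function and run on two tiny frames. The function name manual_concat and the sample data are invented for illustration. Because dictionary keys are unique, any repeated column names would collapse into one column, so this avoids the error rather than preserving the duplicated data:

```python
from itertools import chain

import pandas as pd


def manual_concat(frames):
    """Concatenate frames row-wise by going through per-row dictionaries."""
    # Each frame becomes a list of {column name: value} dicts, one per row.
    rows = chain.from_iterable(df.to_dict("records") for df in frames)
    # Columns missing from a given row are filled with NaN by the constructor.
    return pd.DataFrame(list(rows))


df_a = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_b = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

giant_concat_df = manual_concat([df_a, df_b])
print(giant_concat_df)
```

The result gets a fresh integer index, much like passing ignore_index=True to pd.concat.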
傲寒

2020-12-03 10:38
The answers here did not solve my issue, but this answer did.

The issue was duplicated columns in one or both DataFrames.

Here's a duplicated-column fix (as per the answer above):
```
df = df.loc[:,~df.columns.duplicated()]
```
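A minimal end-to-end sketch of applying that fix before concatenating follows. The helper name drop_duplicate_columns and the sample frames are assumptions made for this example. Note that it keeps only the first column for each repeated name, so the data in the later duplicates is discarded:

```python
import pandas as pd


def drop_duplicate_columns(df):
    """Keep only the first column for each repeated column name."""
    return df.loc[:, ~df.columns.duplicated()]


df1 = pd.DataFrame([["a", "b", "x", 1]], columns=["A", "B", "id", "id"])
df2 = pd.DataFrame([["b", "c", "y", 2]], columns=["B", "C", "id", "id"])

# Deduplicate each frame first, then concatenate as usual.
cleaned = [drop_duplicate_columns(df) for df in (df1, df2)]
result = pd.concat(cleaned, ignore_index=True)
print(result)
```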
南方客

2020-12-03 10:53
Unfortunately, the source files are no longer available, so I can't check my solution against your case. In my case the error occurred when:
1. The data frames have two columns with the same name (I had ID and id columns, which I then converted to lower case, so they became the same)
2. The value types of the same-named columns are different
Here is an example which gives me the error in question:
```
import pandas as pd

df1 = pd.DataFrame(data=[
    ['a', 'b', 'id', 1],
    ['a', 'b', 'id', 2]
], columns=['A', 'B', 'id', 'id'])

df2 = pd.DataFrame(data=[
    ['b', 'c', 'id', 1],
    ['b', 'c', 'id', 2]
], columns=['B', 'C', 'id', 'id'])

pd.concat([df1, df2])
# Raises:
# AssertionError: Number of manager items must equal union of block items
# # manager items: 4, # tot_items: 5
```
Removing / renaming one of the columns makes this code work.
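If both columns need to be kept, one possible way to do the renaming is sketched below. The helper make_columns_unique and its numeric suffix scheme are assumptions for illustration, not something pandas provides under that name. It also does not check whether a generated name such as id_1 already exists in the frame:

```python
import pandas as pd


def make_columns_unique(df, sep="_"):
    """Return a copy of df whose repeated column names get a numeric suffix."""
    counts = {}
    new_names = []
    for name in df.columns:
        seen = counts.get(name, 0)
        # First occurrence keeps its name; later ones become name_1, name_2, ...
        new_names.append(name if seen == 0 else f"{name}{sep}{seen}")
        counts[name] = seen + 1
    out = df.copy()
    out.columns = new_names
    return out


df1 = pd.DataFrame(data=[
    ['a', 'b', 'id', 1],
    ['a', 'b', 'id', 2]
], columns=['A', 'B', 'id', 'id'])

df2 = pd.DataFrame(data=[
    ['b', 'c', 'id', 1],
    ['b', 'c', 'id', 2]
], columns=['B', 'C', 'id', 'id'])

renamed = [make_columns_unique(df) for df in (df1, df2)]
result = pd.concat(renamed, ignore_index=True)
print(result.columns.tolist())
```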