I have input data like this.
NAME | PLACE | DATE
A | X | 2020-04-30
B | Y | 2019-04-30
I want to duplicate each row 5 times and change the date, adding one more year in each copy.
Here is a way to do it:
# Assumes df['DATE'] is already datetime64, e.g. via pd.to_datetime(df['DATE'])
df_out = df.reindex(df.index.repeat(6))
df_out['DATE'] += pd.Series([pd.DateOffset(years=i)
                             for i in df_out.groupby('NAME').cumcount()],
                            index=df_out.index)
df_out.reset_index(drop=True)
Output:
NAME PLACE DATE
0 A X 2020-04-30
1 A X 2021-04-30
2 A X 2022-04-30
3 A X 2023-04-30
4 A X 2024-04-30
5 A X 2025-04-30
6 B Y 2019-04-30
7 B Y 2020-04-30
8 B Y 2021-04-30
9 B Y 2022-04-30
10 B Y 2023-04-30
11 B Y 2024-04-30
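If you want something you can paste and run, here is a minimal sketch of the same reindex idea. It assumes the two-row frame from the question with DATE parsed as datetime. The names `copies` and `step` are mine, and it counts on the repeated index (`level=0`) instead of NAME, so rows that share a name would still each start from zero.

```python
import pandas as pd

# Frame from the question, with DATE parsed up front
df = pd.DataFrame({
    "NAME": ["A", "B"],
    "PLACE": ["X", "Y"],
    "DATE": pd.to_datetime(["2020-04-30", "2019-04-30"]),
})

copies = 6  # the original row plus 5 later years

# Repeat every row `copies` times; the index becomes 0,0,...,1,1,...
df_out = df.reindex(df.index.repeat(copies))

# Position of each copy within its original row: 0..copies-1
step = df_out.groupby(level=0).cumcount()

# Add that many years to each copy, one Timestamp at a time
df_out["DATE"] = [
    d + pd.DateOffset(years=int(k)) for d, k in zip(df_out["DATE"], step)
]
df_out = df_out.reset_index(drop=True)
print(df_out)
```

The per-row list comprehension is slower than a vectorized add, but for a few thousand rows that is unlikely to matter.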
I don't think it's possible to repeat the rows in exactly the way you want, but maybe this approach will help you:
first_clm = (["A"] * 6)
first_clm.extend(["B"] * 6)
scnd_clm = (["X"] * 6)
scnd_clm.extend(["Y"] * 6)
third_clm = ["20%s-04-30" % i for i in range(20,26)]
third_clm.extend(["20%s-04-30" % i for i in range(19,25)])
pd.DataFrame({"NAME": first_clm, "PLACE": scnd_clm, "DATE":third_clm})
Let's try this: convert each single date into a list of dates covering the given range, then use DataFrame.explode to turn each element of that list into its own row.
import pandas as pd
df = pd.DataFrame({
"Name": ["A", "B"],
"Place": ["X", "Y"],
"Date": ["2020-04-30", "2020-04-30"]
})
expand = 5
print(
df.assign(
Date=pd.to_datetime(df.Date)
.apply(lambda x: [x.replace(x.year + i) for i in range(0, expand + 1)])
).explode("Date").reset_index(drop=True)
)
Name Place Date
0 A X 2020-04-30
1 A X 2021-04-30
2 A X 2022-04-30
3 A X 2023-04-30
4 A X 2024-04-30
5 A X 2025-04-30
6 B Y 2020-04-30
7 B Y 2021-04-30
8 B Y 2022-04-30
9 B Y 2023-04-30
10 B Y 2024-04-30
11 B Y 2025-04-30
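One caveat with `Timestamp.replace`: it raises a `ValueError` when the source date is Feb 29 and the target year is not a leap year. A possible variation, sketched below, adds `pd.DateOffset(years=i)` instead, which clamps to Feb 28 in that case. The leap-day input here is my own example, not data from the question.

```python
import pandas as pd

# Hypothetical leap-day input to show the edge case
df = pd.DataFrame({"Name": ["A"], "Place": ["X"], "Date": ["2020-02-29"]})
expand = 2

out = (
    df.assign(
        # Build a list of dates per row: the original plus `expand` later years
        Date=pd.to_datetime(df["Date"]).apply(
            lambda d: [d + pd.DateOffset(years=i) for i in range(expand + 1)]
        )
    )
    .explode("Date")          # one row per list element
    .reset_index(drop=True)
)

# explode leaves an object column, so convert back to datetime
out["Date"] = pd.to_datetime(out["Date"])
print(out)
```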
Use:
import numpy as np
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
y = np.array([pd.offsets.DateOffset(years=_) for _ in np.tile(range(6), len(df.index))])
df = df.reindex(df.index.repeat(6)).assign(Date=lambda x: x['Date'] + y)
Details:
Create a np.array of DateOffset objects, one per output row, that need to be added to the Date column to get the desired year offset.
print(y)
array([<DateOffset: years=0>, <DateOffset: years=1>,
<DateOffset: years=2>, <DateOffset: years=3>,
<DateOffset: years=4>, <DateOffset: years=5>,
<DateOffset: years=0>, <DateOffset: years=1>,
<DateOffset: years=2>, <DateOffset: years=3>,
<DateOffset: years=4>, <DateOffset: years=5>], dtype=object)
Use reindex with df.index.repeat(6) to repeat each row six times, then use assign to add the year offsets to the Date column.
print(df)
Name Place Date
0 A X 2020-04-30
0 A X 2021-04-30
0 A X 2022-04-30
0 A X 2023-04-30
0 A X 2024-04-30
0 A X 2025-04-30
1 B Y 2019-04-30
1 B Y 2020-04-30
1 B Y 2021-04-30
1 B Y 2022-04-30
1 B Y 2023-04-30
1 B Y 2024-04-30
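Note that the index above still carries the repeated labels 0 and 1. If you need this in more than one place, the repeat-and-tile idea could be wrapped in a small helper like the sketch below. The function name `repeat_with_year_steps` and its parameters are hypothetical, and it adds the offsets row by row rather than through an object array.

```python
import numpy as np
import pandas as pd


def repeat_with_year_steps(df, date_col, copies):
    """Repeat every row `copies` times, adding 0..copies-1 years to `date_col`."""
    # Repeat rows by position, then give the result a fresh 0..n-1 index
    positions = np.repeat(np.arange(len(df)), copies)
    out = df.iloc[positions].reset_index(drop=True)

    # 0,1,...,copies-1 repeated once per original row
    steps = np.tile(np.arange(copies), len(df))

    dates = pd.to_datetime(out[date_col])
    out[date_col] = pd.to_datetime(
        [d + pd.DateOffset(years=int(k)) for d, k in zip(dates, steps)]
    )
    return out


df = pd.DataFrame({
    "Name": ["A", "B"],
    "Place": ["X", "Y"],
    "Date": ["2020-04-30", "2019-04-30"],
})
result = repeat_with_year_steps(df, "Date", 6)
print(result)
```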