How to iterate over columns of pandas dataframe to run regression

I'm sure this is simple, but as a complete newbie to Python, I'm having trouble figuring out how to iterate over variables in a pandas DataFrame and run a regression…

8 Answers

夕颜

2021-01-29 18:08
You can index DataFrame columns by position. This answer originally used `.ix`, which was deprecated in pandas 0.20 and removed in pandas 1.0; `.iloc` is the positional replacement.
```
df1.iloc[:, 1]
```
This returns the second column, because positions are zero-based (0 would be the first column).
```
df1.iloc[0, :]
```
This returns the first row.

This would be the value at the intersection of row 0 and column 1:
```
df1.iloc[0, 1]
```
and so on. So you can `enumerate()` over `returns.keys()` and use the number to index the DataFrame.
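
A minimal sketch of these positional lookups using `.iloc`, which works on current pandas versions. The DataFrame contents and column names here are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data standing in for df1; the column names are invented.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30], "c": [100, 200, 300]})

second_column = df1.iloc[:, 1]   # Series for the column at position 1 ("b")
first_row = df1.iloc[0, :]       # Series for the row at position 0
single_value = df1.iloc[0, 1]    # scalar at row 0, column 1

# Walk the columns by position, pairing each position with its label.
positions = {}
for pos, name in enumerate(df1.keys()):
    positions[name] = df1.iloc[:, pos].tolist()
```

Indexing by position is mostly useful when the column labels are unknown or awkward; when the labels are available, `df1[name]` reads more clearly.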
猫巷女王i

2021-01-29 18:08
Based on the accepted answer, if an index corresponding to each column is also desired:
```
for i, column in enumerate(df):
    print(i, df[column])
```
Each `df[column]` above is a pandas Series, which can be converted to a NumPy ndarray with `np.asarray`:
```
import numpy as np

for i, column in enumerate(df):
    print(i, np.asarray(df[column]))
```
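
As a quick check of the Series-to-ndarray conversion, here is a small sketch on made-up data. It uses `Series.to_numpy()`, which for plain numeric columns like these gives the same values as `np.asarray`.

```python
import numpy as np
import pandas as pd

# Made-up data; the column names are hypothetical.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]})

arrays = {}
for i, column in enumerate(df):
    series = df[column]                  # a pandas Series
    arrays[column] = series.to_numpy()   # a NumPy ndarray with the same values
    print(i, column, type(series).__name__, type(arrays[column]).__name__)
```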

[愿得一人]

2021-01-29 18:10

You can use `items()` (named `iteritems()` in older pandas; `iteritems()` was removed in pandas 2.0):

for name, values in df.items():
    print('{name}: {value}'.format(name=name, value=values.iloc[0]))
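
To connect this back to the question, here is a hedged sketch of running one simple regression per column while iterating with `items()`. The data, the column names (`market`, `fund_a`, `fund_b`), and the choice of `np.polyfit` as the fitting routine are all assumptions for illustration; the question's actual data and regression function may differ.

```python
import numpy as np
import pandas as pd

# Made-up data: "market" is the explanatory variable and the other
# columns are the series to regress on it. All names are hypothetical.
df = pd.DataFrame({
    "market": [1.0, 2.0, 3.0, 4.0],
    "fund_a": [2.0, 4.0, 6.0, 8.0],   # exactly 2 * market
    "fund_b": [1.5, 2.5, 3.5, 4.5],   # exactly market + 0.5
})

x = df["market"].to_numpy()
results = {}
for name, values in df.drop(columns="market").items():
    # Degree-1 least-squares fit: values = slope * x + intercept
    slope, intercept = np.polyfit(x, values.to_numpy(), 1)
    results[name] = (slope, intercept)

for name, (slope, intercept) in results.items():
    print(f"{name}: slope={slope:.3f}, intercept={intercept:.3f}")
```

Collecting the fits in a dictionary keyed by column name keeps each result tied to the column it came from, which is usually what you want after the loop.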

野的像风

2021-01-29 18:24

I'm a bit late but here's how I did this. The steps:

1. Create a list of all candidate predictor columns.
2. Use itertools to take every combination of x columns.
3. Fit a model for each combination and append its R-squared value to a result DataFrame, along with the list of excluded columns.
4. Sort the result DataFrame in descending order of R-squared to see which combination fits best.

This is the code I used on a DataFrame called aft_tmt. Feel free to adapt it to your use case.

import itertools

import pandas as pd
import statsmodels.formula.api as smf

# Set options to print without truncating output.
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

# Get the column names of the DataFrame and remove the columns
# that I don't want to use as predictors.
itercols = aft_tmt.columns.tolist()
itercols.remove("sc97")
itercols.remove("sc")
itercols.remove("grc")
itercols.remove("grc97")
print(itercols)

# One row per fitted model: R-squared, predictors used, predictors left out.
rows = []

# Change 9 to the number of columns you want to combine from N columns.
# Possibly run an outer loop from 0 to N/2?
for x in itertools.combinations(itercols, 9):
    lmstr = "+".join(x)
    f = smf.ols(formula="sc ~ " + lmstr, data=aft_tmt).fit()
    excluded = "+".join(y for y in itercols if y not in x)
    rows.append([f.rsquared, lmstr, excluded])

# DataFrame.append was removed in pandas 2.0, so build the results in one step.
regression_res = pd.DataFrame(rows, columns=["Rsq", "predictors", "excluded"])
regression_res = regression_res.sort_values(by="Rsq", ascending=False)
print(regression_res)
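
Since the code above depends on the `aft_tmt` DataFrame, here is a self-contained sketch of the same idea on synthetic data. Note the substitutions: it fits each model with NumPy's `np.linalg.lstsq` rather than statsmodels, and the data, column names, and combination size of 2 are all made up for illustration.

```python
import itertools

import numpy as np
import pandas as pd

# Synthetic data: y depends on x1 and x2 only; x3 is pure noise.
# All names here are hypothetical.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
data["y"] = 3.0 * data["x1"] - 2.0 * data["x2"] + 0.1 * rng.normal(size=200)

predictors = [c for c in data.columns if c != "y"]
y = data["y"].to_numpy()

rows = []
for combo in itertools.combinations(predictors, 2):
    # Design matrix with an intercept column, solved by ordinary least squares.
    X = np.column_stack([np.ones(len(data)), data[list(combo)].to_numpy()])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    # R-squared = 1 - SS_res / SS_tot (the residual mean is zero with an intercept).
    r_squared = 1.0 - residuals.var() / y.var()
    rows.append({"Rsq": r_squared, "predictors": "+".join(combo)})

ranking = (
    pd.DataFrame(rows)
    .sort_values(by="Rsq", ascending=False)
    .reset_index(drop=True)
)
print(ranking)
```

Keep in mind that the number of combinations grows quickly with the number of columns, so an exhaustive search like this is only practical for a modest number of predictors.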

醉梦人生

2021-01-29 18:25

Using a list comprehension, you can get all the column names (the header):

[column for column in df]
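
A small sketch on made-up data showing that iterating over a DataFrame yields its column labels, so the comprehension gives the same list as `list(df)` or `df.columns.tolist()`:

```python
import pandas as pd

# Made-up data; the column names are hypothetical.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

names_from_comprehension = [column for column in df]
names_from_list = list(df)
names_from_columns = df.columns.tolist()
print(names_from_comprehension)
```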

礼貌的吻别

2021-01-29 18:29
```
for column in df:
    print(df[column])
```