How do you programmatically retrieve the number of columns in a pandas dataframe? I was hoping for something like:
df.num_columns
Surprised I haven't seen this yet, so without further ado, here is:
df.columns.size
Like so:
import pandas as pd
df = pd.DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})
df.columns.size
3
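A small sketch of a common pitfall, reusing the fruit frame above (the empty frame is a made-up example). df.size is not the column count. It is the total number of cells (rows × columns). df.columns.size also stays correct when a frame has columns but no rows.

```python
import pandas as pd

df = pd.DataFrame({"pear": [1, 2, 3], "apple": [2, 3, 4], "orange": [3, 4, 5]})

# Column count: the size of the column Index.
n_cols = df.columns.size
# DataFrame.size counts every cell (rows * columns), not the columns.
n_cells = df.size

print(n_cols)   # 3
print(n_cells)  # 9

# The column count still works for a frame that has columns but no rows.
empty = pd.DataFrame(columns=["a", "b"])
print(empty.columns.size, empty.size)  # 2 0
```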
To include the row index "columns" in the total, I would personally add the number of columns, df.columns.size, to the number of index levels, given by the attribute pd.Index.nlevels / pd.MultiIndex.nlevels:
import pandas as pd

# Set up dummy data
flat_index = pd.Index([0, 1, 2], name="id")
multi_index = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)], names=["letter", "id"])
columns = ["cat", "dog", "fish"]
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat_df = pd.DataFrame(data, index=flat_index, columns=columns)
multi_df = pd.DataFrame(data, index=multi_index, columns=columns)
# Show data
# -----------------
# 3 columns, 4 including the index
print(flat_df)
    cat  dog  fish
id
0     1    2     3
1     4    5     6
2     7    8     9
# -----------------
# 3 columns, 5 including the index
print(multi_df)
           cat  dog  fish
letter id
a      1     1    2     3
       2     4    5     6
b      1     7    8     9
Writing our process as a function:
def total_ncols(df, include_index=False):
    # Number of data columns
    ncols = df.columns.size
    if include_index:
        # Add one per index level (1 for a flat Index, n for an n-level MultiIndex)
        ncols += df.index.nlevels
    return ncols
print("Ignore the index:")
print(total_ncols(flat_df), total_ncols(multi_df))
print("Include the index:")
print(total_ncols(flat_df, include_index=True), total_ncols(multi_df, include_index=True))
This prints:
Ignore the index:
3 3
Include the index:
4 5
If you only want to count the index levels when the index is a pd.MultiIndex, you can add an isinstance check to the function.
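One way that isinstance check might look. This is only a sketch: it reuses the total_ncols name and the dummy data from above, and it treats a flat index as contributing zero columns.

```python
import pandas as pd


def total_ncols(df, include_index=False):
    """Count columns, adding index levels only when the index is a MultiIndex."""
    ncols = df.columns.size
    # A flat Index is ignored; only MultiIndex levels are added.
    if include_index and isinstance(df.index, pd.MultiIndex):
        ncols += df.index.nlevels
    return ncols


columns = ["cat", "dog", "fish"]
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat_df = pd.DataFrame(data, index=pd.Index([0, 1, 2], name="id"), columns=columns)
multi_df = pd.DataFrame(
    data,
    index=pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)], names=["letter", "id"]),
    columns=columns,
)

print(total_ncols(flat_df, include_index=True))   # 3 (flat index not counted)
print(total_ncols(multi_df, include_index=True))  # 5
```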
As an alternative, you could use df.reset_index().columns.size to get the same result. This is less efficient, though: reset_index() builds a whole new DataFrame, moving the index levels into columns and creating a new default index, just so the columns can be counted.
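A quick sketch comparing the two approaches on the MultiIndex frame from the dummy data. Both report 5, but only reset_index() materializes a new DataFrame to get there.

```python
import pandas as pd

multi_df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)], names=["letter", "id"]),
    columns=["cat", "dog", "fish"],
)

# reset_index() returns a new frame whose columns are letter, id, cat, dog, fish.
via_reset = multi_df.reset_index().columns.size
# nlevels reads the level count directly, with no new frame.
via_nlevels = multi_df.columns.size + multi_df.index.nlevels

print(via_reset, via_nlevels)  # 5 5
```

One extra caveat: reset_index() raises a ValueError if an index level's name is already used as a column label. The nlevels approach does not have that problem.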
This worked for me: len(list(df)). Iterating over a DataFrame yields its column labels, so list(df) is the list of columns.
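To see what list(df) actually contains, here is a tiny sketch using the fruit frame from the first answer:

```python
import pandas as pd

df = pd.DataFrame({"pear": [1, 2, 3], "apple": [2, 3, 4], "orange": [3, 4, 5]})

# Iterating over a DataFrame yields column labels, not rows.
labels = list(df)
print(labels)       # ['pear', 'apple', 'orange']
print(len(labels))  # 3
```

This builds a throwaway Python list, so len(df.columns) gives the same number without the extra copy.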
There are multiple options for getting the number of columns and other column information. Let's check them:
import numpy as np
import pandas as pd
local_df = pd.DataFrame(np.random.randint(1, 12, size=(2, 6)), columns=['a', 'b', 'c', 'd', 'e', 'f'])
1. local_df.shape[1] --> the shape attribute returns a (rows, columns) tuple, so position 1 holds the column count.
2. local_df.info() --> the info method prints detailed information about the DataFrame and its columns, such as the column count, each column's data type, non-null value counts, and memory usage. It returns None.
3. len(local_df.columns) --> the columns attribute returns an Index of the column labels, and len() returns how many there are.
4. local_df.head(0) --> head with argument 0 returns the DataFrame with zero rows, i.e. nothing but the header (column labels).
5. A for loop, just for fun:
li_count = 0
for x in local_df:
    li_count = li_count + 1
print(li_count)
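A short sketch that checks two of the claims in that list: what head(0) returns and what info() returns. The io.StringIO buffer is only there to capture the summary that info() normally prints to stdout.

```python
import io

import numpy as np
import pandas as pd

local_df = pd.DataFrame(np.random.randint(1, 12, size=(2, 6)),
                        columns=['a', 'b', 'c', 'd', 'e', 'f'])

# head(0) keeps every column label but no rows.
header_only = local_df.head(0)
print(header_only.shape)  # (0, 6)

# info() writes its summary to a buffer (stdout by default) and returns None.
buf = io.StringIO()
result = local_df.info(buf=buf)
print(result)  # None
summary = buf.getvalue()
print("total 6 columns" in summary)  # True
```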
Alternative:
df.shape[1]
(df.shape[0] is the number of rows.)
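Because shape is a plain (rows, columns) tuple, a minimal sketch can unpack both counts at once (the two-column frame is just an example):

```python
import pandas as pd

df = pd.DataFrame({"pear": [1, 2, 3], "apple": [2, 3, 4]})

# shape is (number of rows, number of columns).
n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 3 2
```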