How do I retrieve the number of columns in a Pandas data frame?

心在旅途 2021-01-29 18:06

How do you programmatically retrieve the number of columns in a pandas dataframe? I was hoping for something like:

df.num_columns
9 Answers
  • 2021-01-29 18:45

    Surprised I haven't seen this yet, so without further ado, here is:

    df.columns.size
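
    A minimal sketch of this, assuming a small made-up frame (the fruit column names are only for illustration). df.columns is an Index of column labels, and its size attribute is the number of labels it holds:

```python
import pandas as pd

# Hypothetical example frame: three columns, two rows.
df = pd.DataFrame({"pear": [1, 2], "apple": [2, 3], "orange": [3, 4]})

# df.columns is a pandas Index; .size is the number of labels in it.
n_cols = df.columns.size
print(n_cols)
```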

  • 2021-01-29 18:50

    Like so:

    import pandas as pd
    df = pd.DataFrame({"pear": [1,2,3], "apple": [2,3,4], "orange": [3,4,5]})
    
    len(df.columns)
    3
    
  • 2021-01-29 18:54

    To include the row index levels (the index "columns") in the total, I would personally add the number of columns, df.columns.size, to the number of index levels, given by the nlevels attribute (pd.Index.nlevels / pd.MultiIndex.nlevels):

    Set up dummy data

    import pandas as pd
    
    flat_index = pd.Index([0, 1, 2], name="id")
    multi_index = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)], names=["letter", "id"])
    
    columns = ["cat", "dog", "fish"]
    
    data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    flat_df = pd.DataFrame(data, index=flat_index, columns=columns)
    multi_df = pd.DataFrame(data, index=multi_index, columns=columns)
    
    # Show data
    # -----------------
    # 3 columns, 4 including the index
    print(flat_df)
        cat  dog  fish
    id                
    0     1    2     3
    1     4    5     6
    2     7    8     9
    
    # -----------------
    # 3 columns, 5 including the index
    print(multi_df)
               cat  dog  fish
    letter id                
    a      1     1    2     3
           2     4    5     6
    b      1     7    8     9
    

    Writing our process as a function:

    def total_ncols(df, include_index=False):
        ncols = df.columns.size
        if include_index is True:
            ncols += df.index.nlevels
        return ncols
    
    print("Ignore the index:")
    print(total_ncols(flat_df), total_ncols(multi_df))
    
    print("Include the index:")
    print(total_ncols(flat_df, include_index=True), total_ncols(multi_df, include_index=True))
    

    This prints:

    Ignore the index:
    3 3
    
    Include the index:
    4 5
    

    If you want to only include the number of indices if the index is a pd.MultiIndex, then you can throw in an isinstance check in the defined function.

    As an alternative, you could use df.reset_index().columns.size to achieve the same result, but this won't be as performant, since it temporarily moves the index levels into new columns and builds a new index before getting the number of columns.
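
    The isinstance variant mentioned above could look roughly like the sketch below. The function name total_ncols_multi_only is a hypothetical helper, not something pandas provides, and the frames are rebuilt here so the snippet runs on its own. The reset_index line is only a cross-check of the count:

```python
import pandas as pd

def total_ncols_multi_only(df):
    """Count columns, adding index levels only when the index is a MultiIndex."""
    ncols = df.columns.size
    if isinstance(df.index, pd.MultiIndex):
        ncols += df.index.nlevels
    return ncols

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
columns = ["cat", "dog", "fish"]

flat_df = pd.DataFrame(data, index=pd.Index([0, 1, 2], name="id"), columns=columns)
multi_index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["letter", "id"]
)
multi_df = pd.DataFrame(data, index=multi_index, columns=columns)

flat_total = total_ncols_multi_only(flat_df)    # flat index is not counted
multi_total = total_ncols_multi_only(multi_df)  # both index levels are counted

# Cross-check: reset_index turns the index levels into ordinary columns.
reset_total = multi_df.reset_index().columns.size

print(flat_total, multi_total, reset_total)
```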

  • 2021-01-29 19:08

    This worked for me: len(list(df)).
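
    This works because iterating over a data frame yields its column labels, so list(df) is a list of column names. A small sketch, assuming a made-up frame with two rows and three columns so the two counts differ:

```python
import pandas as pd

# Hypothetical example frame: three columns, two rows.
df = pd.DataFrame({"pear": [1, 2], "apple": [2, 3], "orange": [3, 4]})

labels = list(df)     # iterating a DataFrame gives the column labels
n_cols = len(labels)  # number of columns
n_rows = len(df)      # len() on the frame itself counts rows, not columns

print(labels, n_cols, n_rows)
```

    Note that this builds a throwaway list, so it is a convenience rather than the cheapest option.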

  • 2021-01-29 19:09

    There are multiple options for getting the column count and related column information. Let's check them.

    import numpy as np
    import pandas as pd

    local_df = pd.DataFrame(np.random.randint(1, 12, size=(2, 6)), columns=['a', 'b', 'c', 'd', 'e', 'f'])

    1. local_df.shape[1] --> the shape attribute returns a tuple (rows, columns), so index 1 is the column count.

    2. local_df.info() --> the info method prints detailed information about the data frame and its columns, such as column count, column data types, non-null value counts, and memory usage.

    3. len(local_df.columns) --> the columns attribute returns the Index object holding the column labels, and len returns how many there are.

    4. local_df.head(0) --> the head method with parameter 0 returns an empty data frame with zero rows, which is nothing but the header.

    Assuming the number of columns is small (say, no more than 10), a for loop works too, just for fun:

    li_count = 0
    for x in local_df:  # iterating a data frame yields its column labels
        li_count = li_count + 1
    print(li_count)

  • 2021-01-29 19:10

    Alternative:

    df.shape[1]
    

    (df.shape[0] is the number of rows)
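
    Since shape is a plain (rows, columns) tuple, both numbers can be unpacked in one step. A short sketch, assuming a made-up frame:

```python
import pandas as pd

# Hypothetical example frame: three columns, two rows.
df = pd.DataFrame({"pear": [1, 2], "apple": [2, 3], "orange": [3, 4]})

# Unpack the (rows, columns) tuple.
n_rows, n_cols = df.shape
print(n_rows, n_cols)
```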
