get datatype of column using pyspark

野的像风 2021-01-31 15:58

We are reading data from a MongoDB collection. A collection column has two different value types (e.g.: (bson.Int64, int), (int, float)).

I am trying to get the datatype of each column using PySpark.
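In PySpark, `DataFrame.dtypes` returns the column types as a list of `(name, type_string)` pairs, so one column's type is a dict lookup away. A minimal sketch — the pairs below stand in for a real `spark_df.dtypes`, and the column names are hypothetical:

```python
# Stand-in for spark_df.dtypes, which PySpark returns as
# [(column_name, type_string), ...]; these columns are hypothetical.
dtypes = [('_id', 'bigint'), ('value', 'double'), ('label', 'string')]

# Map column name -> type string, then look a single column up.
dtype_by_name = dict(dtypes)
print(dtype_by_name['value'])  # double
```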

6 Answers
  •  隐瞒了意图╮
    2021-01-31 16:42

    import pandas as pd
    pd.set_option('display.max_colwidth', None)  # prevent truncating of column contents in Jupyter

    def count_column_types(spark_df):
        """Count number of columns per Spark type, listing the column names."""
        return (
            pd.DataFrame(spark_df.dtypes, columns=['name', 'type'])
            .groupby('type', as_index=False)
            .agg(count=('name', 'count'),
                 names=('name', lambda x: ' | '.join(set(x))))
        )
    

    Example output in a Jupyter notebook for a Spark dataframe with 4 columns:

    count_column_types(my_spark_df)
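    Because the helper only reads the `(name, type)` pairs from `spark_df.dtypes`, it can be tried without a running Spark session by passing any object that exposes a matching `dtypes` attribute. A sketch using a hypothetical `FakeDF` stand-in with four columns:

```python
import pandas as pd

class FakeDF:
    """Stand-in for a Spark DataFrame: exposes only .dtypes, which
    PySpark returns as a list of (column_name, type_string) tuples."""
    dtypes = [('id', 'bigint'), ('price', 'double'),
              ('qty', 'bigint'), ('label', 'string')]

def count_column_types(spark_df):
    """Count number of columns per Spark type, listing the column names."""
    return (
        pd.DataFrame(spark_df.dtypes, columns=['name', 'type'])
        .groupby('type', as_index=False)
        .agg(count=('name', 'count'),
             names=('name', lambda x: ' | '.join(sorted(set(x)))))
    )

result = count_column_types(FakeDF())
print(result)
```

    The result has one row per type: `bigint` with count 2 (`id | qty`), and `double` and `string` with one column each.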
    
