Julia | DataFrame | Replacing missing Values

爱一瞬间的悲伤 2021-02-20 13:53

How can we replace missing values with 0.0 for a column in a DataFrame?

4 Answers
  •  暖寄归人
    2021-02-20 14:15

    There are a few different approaches to this problem (valid for Julia 1.x):

    Base.replace!

    Probably the easiest approach is to use replace! or replace from base Julia. Here is an example with replace!:

    julia> using DataFrames
    
    julia> df = DataFrame(x = [1, missing, 3])
    3×1 DataFrame
    │ Row │ x       │
    │     │ Int64⍰  │
    ├─────┼─────────┤
    │ 1   │ 1       │
    │ 2   │ missing │
    │ 3   │ 3       │
    
    julia> replace!(df.x, missing => 0);
    
    julia> df
    3×1 DataFrame
    │ Row │ x      │
    │     │ Int64⍰ │
    ├─────┼────────┤
    │ 1   │ 1      │
    │ 2   │ 0      │
    │ 3   │ 3      │
    

    However, note that at this point the type of column x still allows missing values:

    julia> typeof(df.x)
    Array{Union{Missing, Int64},1}
    

    This is also indicated by the question mark following Int64 in column x when the data frame is printed out. You can change this by using disallowmissing! (from the DataFrames.jl package):

    julia> disallowmissing!(df, :x)
    3×1 DataFrame
    │ Row │ x     │
    │     │ Int64 │
    ├─────┼───────┤
    │ 1   │ 1     │
    │ 2   │ 0     │
    │ 3   │ 3     │
    

    Alternatively, if you use replace (without the exclamation mark) and assign the result back to the column, the new column has element type Int64 and no longer allows missing values:

    julia> df = DataFrame(x = [1, missing, 3]);
    
    julia> df.x = replace(df.x, missing => 0);
    
    julia> df
    3×1 DataFrame
    │ Row │ x     │
    │     │ Int64 │
    ├─────┼───────┤
    │ 1   │ 1     │
    │ 2   │ 0     │
    │ 3   │ 3     │
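
    The difference between the two variants can be seen on a plain vector, since a DataFrame column is an ordinary vector. The following is a minimal sketch using only base Julia; the variable names are illustrative and not part of the original answer:

```julia
# A DataFrame column is an ordinary vector, so a plain vector shows the same behavior.
x = [1, missing, 3]              # element type Union{Missing, Int}

# In-place variant: the values change, the element type does not.
y = copy(x)
replace!(y, missing => 0)
inplace_type = eltype(y)         # still Union{Missing, Int}

# Out-of-place variant: returns a new vector with a narrowed element type.
z = replace(x, missing => 0)
copy_type = eltype(z)            # Int
```

    replace! has to keep the original container, so it cannot change the element type, whereas replace is free to allocate a narrower vector.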
    

    Base.ismissing with logical indexing

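    You can use ismissing with logical indexing to assign a new value to all missing entries of an array. Because the assignment happens in place, the column type still allows missing values afterwards, as with replace!: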

    julia> df = DataFrame(x = [1, missing, 3]);
    
    julia> df.x[ismissing.(df.x)] .= 0;
    
    julia> df
    3×1 DataFrame
    │ Row │ x      │
    │     │ Int64⍰ │
    ├─────┼────────┤
    │ 1   │ 1      │
    │ 2   │ 0      │
    │ 3   │ 3      │
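
    If the narrower element type is wanted after an in-place fill, one option in base Julia is to convert the vector once no missing entries remain. This is a sketch on a plain vector under that assumption; convert throws an error if a missing value is still present:

```julia
x = [1, missing, 3]                  # element type Union{Missing, Int}

# Build a Boolean mask and overwrite the masked entries in place.
mask = ismissing.(x)
x[mask] .= 0

still_union = eltype(x)              # Union{Missing, Int}: unchanged by the assignment

# Copy into a vector whose element type excludes Missing.
narrowed = convert(Vector{Int}, x)
```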
    

    Base.coalesce

    Another approach is to use coalesce:

    julia> df = DataFrame(x = [1, missing, 3]);
    
    julia> df.x = coalesce.(df.x, 0);
    
    julia> df
    3×1 DataFrame
    │ Row │ x     │
    │     │ Int64 │
    ├─────┼───────┤
    │ 1   │ 1     │
    │ 2   │ 0     │
    │ 3   │ 3     │
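
    The question asks for 0.0 as the fill value, which suggests a floating-point column. A minimal sketch for that case on a plain vector (the sample data here is made up for illustration); keeping the fill value's type the same as the column's avoids mixing integer and floating-point values in the result:

```julia
x = [1.5, missing, 3.0]          # element type Union{Missing, Float64}

# coalesce returns its first non-missing argument; the dot applies it elementwise.
filled = coalesce.(x, 0.0)       # new vector, x itself is left untouched
```

    For a DataFrame column, the same call would be assigned back, as in df.x = coalesce.(df.x, 0.0).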
    

    DataFramesMeta

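    Both replace and coalesce can be used with the @transform macro from the DataFramesMeta.jl package. The examples here use the syntax of the DataFramesMeta.jl version current at the time of writing; newer releases expect the target column to be written as a symbol on the left-hand side, i.e. :x = ... instead of x = ...: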

    julia> using DataFramesMeta
    
    julia> df = DataFrame(x = [1, missing, 3]);
    
    julia> @transform(df, x = replace(:x, missing => 0))
    3×1 DataFrame
    │ Row │ x     │
    │     │ Int64 │
    ├─────┼───────┤
    │ 1   │ 1     │
    │ 2   │ 0     │
    │ 3   │ 3     │
    
    julia> df = DataFrame(x = [1, missing, 3]);
    
    julia> @transform(df, x = coalesce.(:x, 0))
    3×1 DataFrame
    │ Row │ x     │
    │     │ Int64 │
    ├─────┼───────┤
    │ 1   │ 1     │
    │ 2   │ 0     │
    │ 3   │ 3     │
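
    For reference, here is a sketch of the same transformation in the newer DataFramesMeta.jl syntax, where the target column is written as :x. It assumes a recent release of both packages is installed; the name df2 is only illustrative:

```julia
using DataFrames, DataFramesMeta

df = DataFrame(x = [1, missing, 3])

# Newer syntax: the column being created or overwritten is a symbol on the left-hand side.
df2 = @transform(df, :x = coalesce.(:x, 0))

# @transform returns a new data frame; df itself still contains the missing value.
```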
    

    Additional documentation

    • Julia manual
    • Julia manual - function reference
    • DataFrames.jl manual
