missing-data | 易学教程

Zeros as missing cases in R

Question: I have a CSV with millions of cases that look like this:

    Case_1,11,17481,172,4436,8,4436
    Case_2,11,1221,680,55200,1776,55200
    Case_3,16,6647,6449,579967,1,579967
    Case_4,22,0,0,0,0,0

In this case, Case_4 is missing data, since it has a bunch of zeros in it (there are hundreds of these in the file). I'm very new to R, and I was wondering if there is an efficient way of deleting these kinds of missing data from the file? Thanks.

Answer 1: Use the na.strings argument when reading in your file. df <-
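
The question is about R, and the answer above is cut off. As a rough illustration of the same idea in Python (standard library only, not the `na.strings` approach the answer starts to describe), the sketch below drops every case whose measurement fields are all zero. The function name `drop_all_zero_cases` and the choice of which columns count as measurements (`first_measure_col=2`, i.e. everything after the second field) are assumptions made for this example.

```python
import csv
import io

# Sample rows copied from the question; Case_4 is zero in every field
# after the second one.
RAW = """Case_1,11,17481,172,4436,8,4436
Case_2,11,1221,680,55200,1776,55200
Case_3,16,6647,6449,579967,1,579967
Case_4,22,0,0,0,0,0
"""


def drop_all_zero_cases(lines, first_measure_col=2):
    """Yield CSV rows unless every field from first_measure_col on is zero.

    first_measure_col is an assumption: the question does not say which
    columns are measurements.
    """
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        measures = row[first_measure_col:]
        if measures and all(float(v) == 0 for v in measures):
            continue  # treat the whole case as missing
        yield row


kept = list(drop_all_zero_cases(io.StringIO(RAW)))
print([row[0] for row in kept])
```

Because the rows are streamed one at a time, the same function could be pointed at an open file handle rather than an in-memory string when the file has millions of cases.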

Julia: creating a method for Any vector with missing values

Question: I would like to create a function that deals with missing values. However, when I tried to specify the missing type Array{Missing, 1}, it errors.

    function f(x::Array{<:Number, 1})
        # do something complicated
        println("no missings.")
        println(sum(x))
    end

    function f(x::Array{Missing, 1})
        x = collect(skipmissing(x))
        # do something complicated
        println("removed missings.")
        f(x)
    end

    f([2, 3, 5])
    f([2, 3, 5, missing])

I understand that my type is not Missing but Array{Union{Missing, Int64},1}. When I

How do I detect and re-insert missing data?

Question: I have a missing row in a data table which describes a function from time, sid, and s.c to count:

    > dates.dt[1001:1011]
            sid   s.c  count                time
     1: missing CLICK 104192 2013-05-25 10:00:00
     2: missing SHARE   7694 2013-05-25 10:00:00
     3: present CLICK  99573 2013-05-25 10:00:00
     4: present SHARE  89302 2013-05-25 10:00:00
     5: missing CLICK     28 2013-05-25 11:00:00
     6: present CLICK     25 2013-05-25 11:00:00
     7: present SHARE     15 2013-05-25 11:00:00
     8: missing CLICK 104544 2013-05-25 12:00:00
     9: missing SHARE
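
In the excerpt, the 11:00 hour has no row for sid "missing" with s.c "SHARE". One way to detect such gaps is to enumerate every combination of time, sid, and s.c and compare it with the combinations actually present. The Python sketch below does this for the three 11:00 rows only; the function name `reinsert_missing` and the use of `None` as the re-inserted count are assumptions, since the question is cut off before it says what value the restored row should carry.

```python
from itertools import product

# Rows 5-7 of the excerpt (the 11:00 hour) as (sid, s.c, count, time).
rows = [
    ("missing", "CLICK", 28, "2013-05-25 11:00:00"),
    ("present", "CLICK", 25, "2013-05-25 11:00:00"),
    ("present", "SHARE", 15, "2013-05-25 11:00:00"),
]


def reinsert_missing(rows, fill=None):
    """Return (all_rows, added_rows) with absent combinations filled in.

    fill is a placeholder count for re-inserted rows (an assumption).
    """
    seen = {(time, sid, sc) for sid, sc, _count, time in rows}
    times = sorted({r[3] for r in rows})
    sids = sorted({r[0] for r in rows})
    scs = sorted({r[1] for r in rows})
    # Every (time, sid, s.c) combination that has no row yet.
    added = [
        (sid, sc, fill, time)
        for time, sid, sc in product(times, sids, scs)
        if (time, sid, sc) not in seen
    ]
    # Sort by time, then sid, then s.c, matching the order in the excerpt.
    full = sorted(rows + added, key=lambda r: (r[3], r[0], r[1]))
    return full, added


full, added = reinsert_missing(rows)
print(added)
```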

Is it possible to get plot from panda dataframe includes missing data by Heatmap with especial color?

Question: I was wondering whether I can plot all columns of a pandas DataFrame in one window as heatmaps, using a 24x20 matrix layout that I designed to hold the 480 values of each column (one cycle), mapping the values into it cycle by cycle. The challenging point is that I want to show missing data in a special color that lies outside the color range of the colormap cmap ='coolwarm'. I have already tried df = df.replace([np.inf, -np.inf], np.nan) to make sure that all inf values convert to

How to replace NAs with row means if proportion of row-wise NAs is below a certain threshold?

Question: Apologies for the somewhat cumbersome question, but I am currently working on a mental health study. For one of the mental health screening tools there are 15 variables, each of which can have values of 0-3. The total score for each row/participant is then assigned by taking the sum of these 15 variables. The documentation for this tool states that if more than 20% of the values for a particular row/participant are missing, the total score should be taken as missing also, however if fewer
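
The rule in the title (replace NAs with the row mean when the share of missing values is small enough, otherwise mark the total as missing) can be sketched in plain Python as follows. The excerpt is cut off before it states the second half of the rule, so the handling of a row with exactly 20% missing is an assumption here: it is imputed, because the text only excludes rows with more than 20% missing. The function name `total_score` and the use of `None` for a missing value are also choices made for this example.

```python
def total_score(items, max_missing_frac=0.2):
    """Sum one participant's items, imputing missing ones with the row mean.

    Returns None when more than max_missing_frac of the items are missing
    (or when nothing was observed at all).
    """
    observed = [v for v in items if v is not None]
    n_missing = len(items) - len(observed)
    if not observed or n_missing / len(items) > max_missing_frac:
        return None
    row_mean = sum(observed) / len(observed)
    return sum(row_mean if v is None else v for v in items)


# 15 items with 3 missing (exactly 20%): imputed with the row mean of 1.
three_missing = [1] * 12 + [None] * 3
# 15 items with 4 missing (more than 20%): total is missing.
four_missing = [1] * 11 + [None] * 4

print(total_score(three_missing))
print(total_score(four_missing))
```

Summing after mean imputation is equivalent to scaling the observed sum by the ratio of total items to observed items, which may be a simpler way to express the same rule in R.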

Missing params in Ajax Post request in Laravel

Question: I am trying to make an Ajax POST request and pass params to use them in a query, but my params are always empty. Here is my code:

    $.ajaxSetup({
        headers: {
            'X-CSRF-TOKEN': $('meta[name="csrf-token"]').attr('content')
        }
    });

    function searchPatient(){
        var params = {
            'name' : $("#input-search-name").val(),
            'lastname' : $("#input-search-lastname").val()
        }
        console.log($('meta[name="csrf-token"]').attr('content'));
        $.ajax({
            data : params,
            url : '{{ route("searchPatient") }}',
            contentType:

Progression of non-missing values that have missing values in-between

Question: To continue on a previous topic, "Finding non-missing values between missing values": I would also like to find whether the value before the missing value is smaller than, equal to, or larger than the one after the missing value. To use the same example from before: df = structure(list(FirstYStage = c(NA, 3.2, 3.1, NA, NA, 2, 1, 3.2, 3.1, 1, 2, 5, 2, NA, NA, NA, NA, 2, 3.1, 1), SecondYStage = c(NA, 3.1, 3.1, NA, NA, 2, 1, 4, 3.1, 1, NA, 5, 3.1, 3.2, 2, 3.1, NA, 2, 3.1, 1), ThirdYStage = c(NA, NA, 3.1, NA,
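
The comparison described above can be sketched in plain Python for a single sequence of values. The excerpt does not show whether the comparison should run along a row or down a column, so the sketch takes any one sequence and uses the FirstYStage vector from the question purely as sample input. The function name `gap_trends` and the labels it returns are hypothetical.

```python
def gap_trends(values):
    """For each run of missing values, compare the value before with the one after.

    Returns "smaller", "equal" or "larger", describing the value before
    the gap relative to the value after it. Gaps at the very start or end
    have no neighbor on one side and are ignored.
    """
    trends = []
    prev = None       # last non-missing value seen so far
    gap_open = False  # True once a missing value has followed prev
    for v in values:
        if v is None:
            gap_open = prev is not None
            continue
        if gap_open:
            if prev < v:
                trends.append("smaller")
            elif prev > v:
                trends.append("larger")
            else:
                trends.append("equal")
        prev, gap_open = v, False
    return trends


# FirstYStage from the question, with NA written as None.
first_y_stage = [None, 3.2, 3.1, None, None, 2, 1, 3.2, 3.1, 1,
                 2, 5, 2, None, None, None, None, 2, 3.1, 1]
print(gap_trends(first_y_stage))  # ['larger', 'equal']
```

Here the first gap sits between 3.1 and 2, and the second between 2 and 2.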

How to get measures of model fit (AIC, F-statistics) in zelig for multiply imputed data?

Question: Following up on an earlier post, I am interested in learning how to get the usual measures of the relative quality of a statistical model in zelig for regression using multiply imputed data (created with Amelia).

    require(Zelig)
    require(Amelia)
    data(freetrade)
    #Imputation of missing data
    a.out <- amelia(freetrade, m=5, ts="year", cs="country")
    # Regression model
    z.out <- zelig(polity~tariff+gdp.pc, model="ls", data=a.out$imputations)
    summary(z.out)

    Model: ls Number of multiply imputed data

Pandas: filling missing values iterating through a groupby object

Question: I have the following dataset: d = {'player': ['1', '1', '1', '1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2', '3', '3', '3', '3', '3'], 'session': ['a', 'a', 'b', np.nan, 'b', 'c', 'c', 'c', 'c', 'd', 'd', 'e', 'e', np.nan, 'e', 'f', 'f', 'g', np.nan, 'g'], 'date': ['2018-01-01 00:19:05', '2018-01-01 00:21:07', '2018-01-01 00:22:07', '2018-01-01 00:22:15','2018-01-01 00:25:09', '2018-01-01 00:25:11', '2018-01-01 00:27:28', '2018-01-01 00:29:29', '2018-01-01 00:30:35', '2018-01-01 00

back fill missing data with a label for a window of a time

Question: I want to backfill each column based on time (1 day, 2 days) with a different label. Here is the code:

    from datetime import datetime, timedelta
    import pandas as pd
    import numpy as np
    import random

    np.random.seed(11)
    date_today = datetime.now()
    ndays = 15
    df = pd.DataFrame({'date': [date_today + timedelta(days=x) for x in range(ndays)],
                       'test': pd.Series(np.random.randn(ndays)),
                       'test2':pd.Series(np.random.randn(ndays))})
    df = df.set_index('date')
    df = df.mask(np.random.random(df.shape) < .7)