missing-data | 易学教程

Changing a continuous scale from decimal to percents

Question: The scale for penetration is listed as a decimal (.5 and down), but I am having a problem changing it to a percent. I tried to format it in my data as a percentage using this code:
    penetration_levels$Penetration<-sprintf("%.1f %%", 100*penetration_levels$Penetration)
That worked as far as formatting goes, but when I tried to draw the plot I got an error saying penetration was used as a discrete, not continuous, scale. To fix that, I used this code to format it as a numeric variable penetration_levels

NaN in the expected values, even though masked, introduces NaN in weight matrix

Question: Trying to deal with missing data, I wrote the following model and ran it. The output is given below. Why does the training step on NaN expected values, which are masked by loss_0_where_nan (and the history shows that the loss is indeed evaluated to 0.0), nonetheless introduce NaN weights in the weight matrices of both hidden and max_min_pred? I first thought this might be some weighting of individual parameter learning with output values, which I thought might be specific to the Adadelta
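One plausible explanation, which the truncated excerpt does not confirm, is that masking the loss value does not mask the gradient: under IEEE arithmetic `0 * NaN` is `NaN`, so a zero mask multiplied into a NaN derivative still yields NaN. The sketch below illustrates this with plain NumPy and a hand-written squared-error gradient; it is not the asker's model, and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical targets with one missing value, and some predictions.
y_true = np.array([1.0, np.nan, 3.0])
y_pred = np.array([0.5, 2.0, 2.5])
mask = ~np.isnan(y_true)

# Masking AFTER the arithmetic: the forward loss looks clean...
err = y_pred - y_true                      # NaN at the missing position
loss_terms = np.where(mask, err ** 2, 0.0)

# ...but a chain-rule gradient still multiplies the mask by a NaN term.
grad_naive = mask.astype(float) * 2.0 * err

# Replacing the NaN targets BEFORE the arithmetic keeps NaN out entirely.
y_safe = np.where(mask, y_true, y_pred)    # error is exactly 0 where masked
grad_safe = 2.0 * (y_pred - y_safe)

print(loss_terms.sum(), grad_naive, grad_safe)
```

If this is the cause, substituting a finite placeholder for the NaN targets before computing the error, rather than zeroing the loss afterwards, would be one way to keep the weights finite.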

Sample a single row, per column, with substantial missing data

Question: As an example of my data frame, which I will call df1, I have GROUP1 with three rows of data and GROUP2 with two rows of data. I have three variables, X1, X2, and X3:
    GROUP   X1  X2  X3
    GROUP1  A   NA  NA
    GROUP1  NA  NA  T
    GROUP1  C   T   G
    GROUP2  NA  NA  C
    GROUP2  G   NA  T
I am halfway to my answer, based on a previous question and answer (Sample a single row, per column, within a subset of a data frame in R, while following conditions), except I am having issues using characters. I would like to sample a

OpenBUGS: missing value in Bernoulli distribution

Question: I'm trying to model the observation "time" as a random variable with OpenBUGS via R (R2OpenBUGS). If all the observation times are available (no NAs) everything works, but if I set one of the times to NA, nothing happens. I tested the same code with WinBUGS, and I get the trap error 'NIL dereference (read)'. So my question is: is there something really wrong in my code, or is my model too weird for BUGS? My model is like this:
    model{
      for(i in 1:k){
        obs[i] ~ dbern(p) #is the observation done at

How to create missing values in table in R?

Question: I have 40 pairs of birds, with the male and female in each pair scored for their colour. The colour score is a categorical variable with a value range of 1 to 9. I would like to create a table with the number of each combination (1/1, 1/2, 1/3, ... 9/7, 9/8, 9/9). My problem is that some combinations do not exist in my data when I try to create the table (in these cases I would like zeros for the missing values). Below is the data and sample code. I am pretty sure the answer

Randomly insert NA's values in a pandas dataframe

Question: This question was migrated from Cross Validated because it can be answered on Stack Overflow. Migrated 3 years ago. How can I randomly insert np.nan values into a DataFrame? Let's say I want 10% null values inside my DataFrame. My data looks like this:
    df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'b', 'c', 'd', 'e'], columns=['one', 'two', 'three'])
            one       two     three
    a  0.695132  1.044791 -1.059536
    b -1.075105  0.825776  1.899795
    c -0.678980  0.051959 -0.691405
    d -0.182928  1.455268 -1.032353
    e  0
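One way to approach this, offered as a minimal sketch rather than the accepted answer, is to pick a fixed number of distinct cell positions at random and blank them with `DataFrame.mask`. The helper name `insert_random_nans` and its `frac` and `seed` parameters are hypothetical, and the sketch assumes all columns can hold NaN (for example float columns).

```python
import numpy as np
import pandas as pd


def insert_random_nans(df, frac=0.10, seed=None):
    """Return a copy of df with about `frac` of its cells set to NaN."""
    rng = np.random.default_rng(seed)
    n_cells = df.size
    n_missing = int(round(frac * n_cells))
    # Draw distinct flat positions so exactly n_missing cells are hit.
    flat_idx = rng.choice(n_cells, size=n_missing, replace=False)
    cell_mask = np.zeros(n_cells, dtype=bool)
    cell_mask[flat_idx] = True
    # DataFrame.mask replaces values with NaN wherever the condition is True.
    return df.mask(cell_mask.reshape(df.shape))


# Example frame shaped like the one in the question (values are random).
df = pd.DataFrame(
    np.random.default_rng(0).standard_normal((5, 3)),
    index=["a", "b", "c", "d", "e"],
    columns=["one", "two", "three"],
)
df_missing = insert_random_nans(df, frac=0.10, seed=42)
print(df_missing)
```

Sampling positions without replacement gives an exact count of missing cells; drawing an independent random number per cell and comparing it to `frac` would instead give 10% only on average.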

Changing or Turning off _FillValues

Question: I want to either turn off the filling or change the _FillValue to None/NaN in the NetCDF file. How do you do this? I have tried looking it up and could not find any discussion of it. When I output a variable such as longitude, this is what I get:
    float32 lons(lons)
        units: degree_east
    unlimited dimensions:
    current shape = (720,)
    filling on, default _FillValue of 9.969209968386869e+36 used
I have also tried masking, but it still gives me the information above. Here is some code I have:
    lati = numpy.arange(

How do you make a heat map and cluster with NA values?

Question: I am trying to make a heat map using my data, but I am struggling to code it properly. My matrix is filled with log(x+1) values so that I don't encounter log(0) errors. However, due to the nature of my data I have a lot of 0 values, and they mask any trends the heat map could be showing. Because of that, I want to colour any 0 values grey or black and colour the rest of my data along a blue-white-red spectrum. Here is the code I am using:
    RHeatmap <- read.delim("~/Desktop/RHeatmap

Exporting ints with missing values to csv in Pandas

Question: When saving a Pandas DataFrame to csv, some integers are getting converted to floats. It happens where a column of floats has missing values (np.nan). Is there a simple way to avoid it? (Especially in an automatic way - I often deal with many columns of various data types.) For example:
    import pandas as pd
    import numpy as np
    df = pd.DataFrame([[1,2],[3,np.nan],[5,6]], columns=["a","b"], index=["i_1","i_2","i_3"])
    df.to_csv("file.csv")
yields
    ,a,b
    i_1,1,2.0
    i_2,3,
    i_3,5,6.0
What I would like
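A possible approach, sketched here under the assumption that a reasonably recent pandas with the nullable `Int64` extension dtype is available, is to convert every float column whose non-missing values are all whole numbers before writing the CSV. The helper name `ints_with_na` is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def ints_with_na(df):
    """Return a copy where whole-number float columns use nullable Int64."""
    out = df.copy()
    for col in out.select_dtypes(include="float").columns:
        values = out[col].dropna()
        # Convert only when every present value is finite and integral.
        if np.isfinite(values).all() and (values == values.round()).all():
            out[col] = out[col].astype("Int64")
    return out


df = pd.DataFrame(
    [[1, 2], [3, np.nan], [5, 6]],
    columns=["a", "b"],
    index=["i_1", "i_2", "i_3"],
)

# to_csv() without a path returns the CSV text instead of writing a file.
csv_text = ints_with_na(df).to_csv()
print(csv_text)
```

Missing entries in an `Int64` column are written as empty fields by default, so column `b` comes out as `2`, an empty field, and `6` rather than `2.0` and `6.0`. Columns containing genuine fractional values are left untouched.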

R: remove multiple rows based on missing values in fewer rows

Question: I have an R data frame with data from multiple subjects, each tested several times. To perform statistics on the set, there is a factor for subject ("id") and a row for each observation (given by the factor "session"). I.e. print(allData) gives:
    id  session  measure
    1   1        7.6
    2   1        4.5
    3   1        5.5
    1   2        7.1
    2   2        NA
    3   2        4.9
In the above example, is there a simple way to remove all rows with id==2, given that the "measure" column contains NA in one of the rows where id==2? More generally, since I actually have a lot