missing-data

Changing a continuous scale from decimal to percents

元气小坏坏 submitted on 2019-12-11 00:29:25
Question: The scale for penetration is listed as a decimal (.5 and down), but I am having a problem changing it to a percent. I tried to format it in my data as a percentage using this code: penetration_levels$Penetration<-sprintf("%.1f %%", 100*penetration_levels$Penetration). That worked as far as formatting goes, but when I tried to draw the plot I got an error saying penetration was used as a discrete, not continuous, scale. To fix that, I used this code to format it as a numeric variable penetration_levels

NaN in the expected values, even though masked, introduces NaN in weight matrix

£可爱£侵袭症+ submitted on 2019-12-10 21:41:00
Question: Trying to deal with missing data, I wrote the following model and ran it. The output is given below. Why does the training step on NaN expected values, which are masked by loss_0_where_nan (and the history shows that the loss is indeed evaluated to 0.0), nonetheless introduce NaN weights in the weight matrices of both hidden and max_min_pred? I first thought this might be some weighting of individual parameter learning with output values, which I thought might be specific to the Adadelta

Sample a single row, per column, with substantial missing data

我们两清 submitted on 2019-12-10 21:07:39
Question: As an example of my data frame, which I will call df1, I have GROUP1 with three rows of data and GROUP2 with two rows of data. I have three variables, X1, X2, and X3:

    GROUP   X1  X2  X3
    GROUP1  A   NA  NA
    GROUP1  NA  NA  T
    GROUP1  C   T   G
    GROUP2  NA  NA  C
    GROUP2  G   NA  T

I am halfway to my answer, based on a previous question and answer (Sample a single row, per column, within a subset of a data frame in R, while following conditions), except that I am having issues using characters. I would like to sample a

OpenBUGS: missing value in Bernoulli distribution

ⅰ亾dé卋堺 submitted on 2019-12-10 16:50:42
Question: I'm trying to model the observation "time" as a random variable with OpenBUGS via R (R2OpenBUGS). If all the observation times are available (no NAs), everything works, but if I set one of the times to NA, nothing happens. I tested the same code with WinBUGS, and I get the trap error 'NIL dereference (read)'. So my question is: is there something really wrong in my code, or is my model too weird for BUGS? My model is like this:

    model{
      for(i in 1:k){
        obs[i] ~ dbern(p) #is the observation done at

How to create missing values in table in R?

前提是你 submitted on 2019-12-10 14:27:21
Question: I have 40 pairs of birds, with each male and female in the pair scored for their colour. The colour score is a categorical variable with a value range of 1 to 9. I would like to create a table with the number of each combination (1/1, 1/2, 1/3, ... 9/7, 9/8, 9/9). My problem is that some combinations do not exist in my data, so they are missing when I try to create the table (in these cases I would like zeros for the missing values). Below is the data and sample code. I am pretty sure the answer

Randomly insert NA's values in a pandas dataframe

一个人想着一个人 submitted on 2019-12-10 12:42:47
Question: This question was migrated from Cross Validated because it can be answered on Stack Overflow. Migrated 3 years ago.

How can I randomly insert np.nan values in a DataFrame? Let's say I want 10% null values inside my DataFrame. My data looks like this:

    df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'b', 'c', 'd', 'e'], columns=['one', 'two', 'three'])

            one       two     three
    a  0.695132  1.044791 -1.059536
    b -1.075105  0.825776  1.899795
    c -0.678980  0.051959 -0.691405
    d -0.182928  1.455268 -1.032353
    e 0
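
One possible approach to the question above, as a minimal sketch rather than the accepted answer: pick a fixed number of cell positions at random without replacement and set them to NaN. The helper name insert_nans, the fixed seed, and the choice to round the cell count are assumptions made for illustration; the frame is regenerated from a seed, so its values differ from those shown in the question.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (the seed is an arbitrary choice).
rng = np.random.default_rng(0)

df = pd.DataFrame(
    rng.standard_normal((5, 3)),
    index=["a", "b", "c", "d", "e"],
    columns=["one", "two", "three"],
)


def insert_nans(frame, fraction, rng):
    """Return a copy of `frame` with about `fraction` of its cells set to NaN.

    Hypothetical helper: the number of cells is round(fraction * frame.size),
    and positions are drawn without replacement so no cell is picked twice.
    """
    n_missing = int(round(fraction * frame.size))
    flat_positions = rng.choice(frame.size, size=n_missing, replace=False)
    # Convert flat positions into (row, column) pairs for the 2-D array.
    rows, cols = np.unravel_index(flat_positions, frame.shape)
    values = frame.to_numpy(dtype=float, copy=True)
    values[rows, cols] = np.nan
    return pd.DataFrame(values, index=frame.index, columns=frame.columns)


df_missing = insert_nans(df, 0.10, rng)
print(df_missing)
```

Drawing an exact count of cells gives a predictable amount of missing data on small frames. Masking each cell independently with probability 0.10 is a simpler alternative, but the number of NaN values then varies from run to run.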

Changing or Turning off _FillValues

ぐ巨炮叔叔 submitted on 2019-12-10 12:30:50
Question: I want to either turn off the filling or change the _FillValue to None/NaN in the NetCDF file. How do you do this? I have tried looking it up and nobody talks about it. When I output a variable such as longitude, this is what I get:

    float32 lons(lons)
        units: degree_east
    unlimited dimensions:
    current shape = (720,)
    filling on, default _FillValue of 9.969209968386869e+36 used

I have also tried masking, but it still gives me the information above. Here is some code I have: lati = numpy.arange(

How do you make a heat map and cluster with NA values?

大兔子大兔子 submitted on 2019-12-10 08:42:51
Question: I am trying to make a heat map using my data, but I am struggling to code it properly. My matrix is filled with log(x+1) values so that I don't encounter log(0) errors. However, due to the nature of my data I have a lot of 0 values, and they mask any trends the heat map could be showing. Because of that, I want to colour any 0 values grey or black and colour the rest of my data along a blue-white-red spectrum. Here is the code I am using: RHeatmap <- read.delim("~/Desktop/RHeatmap

Exporting ints with missing values to csv in Pandas

倾然丶 夕夏残阳落幕 submitted on 2019-12-10 01:31:25
Question: When saving a Pandas DataFrame to csv, some integers are getting converted to floats. It happens where a column of floats has missing values (np.nan). Is there a simple way to avoid it? (Especially in an automatic way, since I often deal with many columns of various data types.) For example,

    import pandas as pd
    import numpy as np
    df = pd.DataFrame([[1,2],[3,np.nan],[5,6]], columns=["a","b"], index=["i_1","i_2","i_3"])
    df.to_csv("file.csv")

yields

    ,a,b
    i_1,1,2.0
    i_2,3,
    i_3,5,6.0

What I would like
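
One way to approach this, sketched under the assumption that a pandas version with the nullable "Int64" extension dtype (0.24 or later) is available: convert every float column whose non-missing values are all whole numbers to "Int64" before writing. The helper name floats_to_nullable_ints is hypothetical, and this is not necessarily what the original answers proposed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, np.nan], [5, 6]],
    columns=["a", "b"],
    index=["i_1", "i_2", "i_3"],
)


def floats_to_nullable_ints(frame):
    """Return a copy where whole-number float columns use the nullable Int64 dtype.

    Hypothetical helper: a float column is converted only if every
    non-missing value equals its rounded value, so real fractions are kept.
    """
    out = frame.copy()
    for name in out.columns:
        col = out[name]
        if pd.api.types.is_float_dtype(col):
            present = col.dropna()
            if (present == present.round()).all():
                out[name] = col.astype("Int64")
    return out


converted = floats_to_nullable_ints(df)
# to_csv() with no path returns the CSV text instead of writing a file.
csv_text = converted.to_csv()
print(csv_text)
```

For the frame from the question, column b is then written as 2, an empty field, and 6 rather than 2.0, an empty field, and 6.0. Because the check runs per column, it applies automatically across frames with many columns of mixed types.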

R: remove multiple rows based on missing values in fewer rows

梦想与她 submitted on 2019-12-08 09:41:42
Question: I have an R data frame with data from multiple subjects, each tested several times. To perform statistics on the set, there is a factor for subject ("id") and a row for each observation (given by the factor "session"). I.e., print(allData) gives:

    id  session  measure
    1   1        7.6
    2   1        4.5
    3   1        5.5
    1   2        7.1
    2   2        NA
    3   2        4.9

In the above example, is there a simple way to remove all rows with id==2, given that the "measure" column contains NA in one of the rows where id==2? More generally, since I actually have a lot