missing-data | 易学教程

Endpoint /tags/tag/media/recent is not showing all related posts


Question: I am using the tag endpoint without OAuth to get all posts containing the hashtag "#hierfuereuch". It works and returns nearly all matching posts, except some of the posts from the account http://instagram.com/antenne1de. This is also the account the client ID is registered to. This is the API call I am making via PHP: https://api.instagram.com/v1/tags/hierfuereuch/media/recent?client_id=XXXXXXXXXXXXXX This post is in the result list: http://instagram.com/p/j8JTepLcS-/ But this post is not:

How to replace missing values with group mode in Pandas?


Question: I followed the method in this post to replace missing values with the group mode, but I get "IndexError: index out of bounds":

df['SIC'] = df.groupby('CIK').SIC.apply(lambda x: x.fillna(x.mode()[0]))

I guess this is probably because some groups have all missing values and therefore no mode. Is there a way to get around this? Thank you!

Answer 1: mode is quite difficult, given that there really isn't any agreed-upon way to deal with ties. Plus, it's typically very slow. Here's one way that
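A minimal sketch of one possible workaround, not necessarily the approach the truncated answer goes on to describe. It assumes the error comes from groups in which every value is missing, so that mode() returns an empty result. The helper name fill_with_group_mode and the sample data are hypothetical; ties are resolved by taking the first (smallest) mode, which is a choice made here, not an agreed convention.

```python
import numpy as np
import pandas as pd


def fill_with_group_mode(df, group_col, value_col):
    """Fill NaN in value_col with the mode of its group (hypothetical helper)."""

    def _fill(s):
        modes = s.mode()  # NaN is dropped by default, so an all-NaN group gives an empty Series
        if modes.empty:
            return s  # no mode available: leave the group untouched
        return s.fillna(modes.iloc[0])  # on ties, the first (smallest) mode is used

    return df.groupby(group_col)[value_col].transform(_fill)


# Made-up sample data: group 2 has only missing values
df = pd.DataFrame({
    "CIK": [1, 1, 1, 2, 2, 3],
    "SIC": [10.0, 10.0, np.nan, np.nan, np.nan, 30.0],
})
df["SIC"] = fill_with_group_mode(df, "CIK", "SIC")
print(df)
```

Groups without any observed value stay NaN, so they can be handled afterwards with a different rule (for example, an overall mode) if that suits the data.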

interactions terms in multiple imputations (Amelia or other mi packages)


Question: I have a question about interaction terms in multiple imputation. My understanding is that the imputation model is supposed to include all information used in the later analysis, including any transformations or interactions of variables (the Amelia user guide also makes this statement). But when I include the interaction term int=x1*x2 in the imputation, the imputed value for int is not equal to x1*x2. For example, when I have a binary variable x2 and a continuous variable x1 , int

How can I get missing values recorded as NULL when importing from csv


Question: I have multiple large CSV files, each of which has missing values in many places. When I import a CSV file into SQLite, I would like the missing values to be recorded as NULL, because another application expects missing data to be indicated by NULL. My current method does not produce the desired result. An example CSV file (test.csv) is:

12|gamma|17|delta
67||19|zeta
96|eta||theta
98|iota|29|

The first line is complete; each of the other lines has (or is meant to show!) a
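One way to get NULLs, sketched here with Python's standard csv and sqlite3 modules rather than with SQLite's own import command: read the pipe-delimited rows, turn empty fields into None, and insert them with bound parameters, which sqlite3 stores as NULL. The table name and the column names a, b, c, d are assumptions made for illustration; the sample rows are the ones from test.csv above.

```python
import csv
import io
import sqlite3

# Contents of the example test.csv from the question (pipe-delimited)
raw = "12|gamma|17|delta\n67||19|zeta\n96|eta||theta\n98|iota|29|\n"

# Empty fields become None, which sqlite3 writes as NULL
rows = [
    [field if field != "" else None for field in record]
    for record in csv.reader(io.StringIO(raw), delimiter="|")
]

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the question does not name the table or its columns
conn.execute("CREATE TABLE test (a INTEGER, b TEXT, c INTEGER, d TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)

# Count the NULLs per column to confirm how the missing values were stored
null_counts = conn.execute(
    "SELECT SUM(b IS NULL), SUM(c IS NULL), SUM(d IS NULL) FROM test"
).fetchone()
print(null_counts)
```

For a real file, io.StringIO(raw) would be replaced by an open file handle; for large files, rows can be built as a generator so the whole file is not held in memory.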

FB Graph / FQL: Current_location of friends is sometimes reading Null when FB page shows a location


Question: I am attempting to pull the current location of all of a user's friends on Facebook, and am running into a curious problem: some of those friends come back as NULL even though their actual Facebook pages say "Lives in , ." The difficult part of this error is that it happens in only about 30% of cases. In the remaining cases, it pulls all of the correct information, which tells me that the permissions are probably set up correctly. To be specific, the FQL code I am

Multi-level regression model on multiply imputed data set in R (Amelia, zelig, lme4)


Question: I am trying to run a multi-level model on multiply imputed data (created with Amelia); the data come from a clustered sample with group = 24, N = 150.

library("ZeligMultilevel")
ML.model.0 <- zelig(dv~1 + tag(1|group), model="ls.mixed", data=a.out$imputations)
summary(ML.model.0)

This code produces the following error:

Error in object[[1]]$result$call : $ operator not defined for this S4 class

If I run an OLS regression, it works:

model.0 <- zelig(dv~1, model="ls", data=a.out

Replace Nulls in DataFrame with Max in Row


Question: Is there a way (more efficient than using a for loop) to replace all the nulls in a pandas DataFrame with the max value in their respective rows?

Answer 1: I guess this is what you are looking for:

import pandas as pd
df = pd.DataFrame({'a': [1, 2, 0], 'b': [3, 0, 10], 'c': [0, 5, 34]})

   a   b   c
0  1   3   0
1  2   0   5
2  0  10  34

You can use apply to iterate over all rows and replace 0 with the row maximum using the replace function, which gives the expected output:

df.apply(lambda row: row
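A vectorized alternative can be sketched as follows. Unlike the answer's example, which treats 0 as the null marker, this sketch assumes the nulls are actual NaN values; the sample frame is made up for illustration. DataFrame.where keeps the cells where the condition holds and takes the rest from the row-wise maximum, aligned on the index via axis=0.

```python
import numpy as np
import pandas as pd

# Made-up frame with one NaN per row
df = pd.DataFrame({
    "a": [1, 2, np.nan],
    "b": [3, np.nan, 10],
    "c": [np.nan, 5, 34],
})

row_max = df.max(axis=1)  # NaN is skipped by default when computing the maximum

# Keep non-null cells; fill null cells with the maximum of their own row
filled = df.where(df.notna(), row_max, axis=0)
print(filled)
```

A row that consists only of NaN has no maximum and therefore stays NaN.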

Implementation of sklearn.impute.IterativeImputer


Question: Consider the data below, which contain some NaN values:

    Column-1  Column-2  Column-3  Column-4  Column-5
0        NaN      15.0      63.0       8.0      40.0
1       60.0      51.0       NaN      54.0      31.0
2       15.0      17.0      55.0      80.0       NaN
3       54.0      43.0      70.0      16.0      73.0
4       94.0      31.0      94.0      29.0      53.0
5       99.0      52.0      77.0      91.0      58.0
6       84.0      19.0      36.0       NaN      97.0
7       41.0      91.0      62.0      67.0      68.0
8       44.0      38.0      27.0      53.0      37.0
9       58.0       NaN      63.0      57.0      28.0
10      66.0      68.0      89.0      36.0      47.0
11       7.0      81.0       5.0      99.0      16.0
12      43.0      55.0      64.0      88.0       NaN
13       8.0      90.0      91.0      44.0       4.0
14      29.0      52.0      94.0      71.0      47.0
15      22.0

Identify missing values in a sequence / perform asymmetric difference between two lists


Question: Using R, I want to efficiently identify which values in a sequence are missing. Below is an example of how I currently do it. There must be a better way. Can someone help?

data.list=c(1,2,4,5,7,8,9)
full.list=seq(from = 1, to = 10, by =1)
output <- c()
for(i in 1:length(full.list)){
  holder1 <- as.numeric(any(data.list == i))
  output[i] <- holder1
}
which(output == 0)

Answer 1: Another possible solution:

setdiff(full.list,data.list)

Answer 2:

full.list[!full.list %in% data.list]

Answer 3: Another option using
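For comparison, the set-difference idea from Answer 1 can be sketched in Python using the same example values. The variable names mirror the R code and are otherwise arbitrary. Building a set first makes each membership check constant time on average, and iterating over the full sequence keeps the result in order.

```python
data_list = [1, 2, 4, 5, 7, 8, 9]
full_list = list(range(1, 11))  # 1 through 10, like seq(from = 1, to = 10, by = 1)

present = set(data_list)  # set lookup avoids rescanning data_list for every value
missing = [value for value in full_list if value not in present]

print(missing)  # prints [3, 6, 10]
```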

How does multinom() treat NA values by default?


Question: When I run multinom(), say Y ~ X1 + X2 + X3, if X1 is NA (i.e. missing) for one particular row but Y, X2 and X3 all have values, is the entire row thrown out (as it is in SAS)? How are missing values treated in multinom()?

Answer 1: Here is a simple example (from ?multinom in the nnet package) to explore the different na.action settings:

> library(nnet)
> library(MASS)
> example(birthwt)
> (bwt.mu <- multinom(low ~ ., bwt))

Intentionally create an NA value:

> bwt[1,"age"]<-NA #