missing-data

Endpoint /tags/tag/media/recent is not showing all related posts

天涯浪子 submitted on 2019-12-23 05:14:18
Question: I am using the tag endpoint without OAuth to get all posts containing the hashtag "#hierfuereuch". It works and returns nearly all matching posts, except for some of the posts from the account http://instagram.com/antenne1de, which is also the account the client ID is registered to. This is the API call I am making via PHP:

    https://api.instagram.com/v1/tags/hierfuereuch/media/recent?client_id=XXXXXXXXXXXXXX

This post is in the result list: http://instagram.com/p/j8JTepLcS-/
But this post is not:

How to replace missing values with group mode in Pandas?

时光毁灭记忆、已成空白 submitted on 2019-12-23 03:19:14
Question: I followed the method in this post to replace missing values with the group mode, but I get "IndexError: index out of bounds".

    df['SIC'] = df.groupby('CIK').SIC.apply(lambda x: x.fillna(x.mode()[0]))

I suspect this is because some groups contain only missing values and therefore have no mode. Is there a way around this? Thank you!

Answer 1: mode is quite difficult, given that there really isn't any agreed-upon way to deal with ties. Plus, it's typically very slow. Here's one way that
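A minimal sketch of one way around the all-missing-group case (not necessarily what the truncated answer goes on to describe): compute each group's mode with a helper that returns NaN when the group has no observed values, broadcast it back with `groupby(...).transform`, and fill only the missing entries. The column names `CIK` and `SIC` come from the question; the helper names `mode_or_nan` and `fill_with_group_mode` and the toy data are hypothetical. Ties are resolved by taking the smallest modal value, because `Series.mode()` returns its results in sorted order.

```python
import numpy as np
import pandas as pd


def mode_or_nan(s: pd.Series):
    """Return the first (smallest) mode of s, or NaN if s has no non-missing values."""
    modes = s.mode()  # mode() ignores NaN by default (dropna=True)
    return modes.iloc[0] if not modes.empty else np.nan


def fill_with_group_mode(df: pd.DataFrame, group_col: str, value_col: str) -> pd.Series:
    # transform() calls mode_or_nan once per group and broadcasts the scalar
    # back to every row of that group, so the result aligns with df's index.
    group_modes = df.groupby(group_col)[value_col].transform(mode_or_nan)
    # Only missing entries are replaced; groups that are entirely NaN stay NaN.
    return df[value_col].fillna(group_modes)


df = pd.DataFrame({
    "CIK": [1, 1, 1, 2, 2],
    "SIC": [10.0, 10.0, np.nan, np.nan, np.nan],  # group 2 has no mode
})
df["SIC"] = fill_with_group_mode(df, "CIK", "SIC")
print(df)
```

Groups with no observed value are left as NaN instead of raising, so they can be handled separately afterwards (for example with a global fill).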

Interaction terms in multiple imputation (Amelia or other MI packages)

拥有回忆 submitted on 2019-12-22 10:59:39
Question: I have a question about interaction terms in multiple imputation. My understanding is that the imputation model should include all information used in the later analysis, including any transformations or interactions of variables (the Amelia user guide also makes this statement). But when I include the interaction term int = x1*x2 in the imputation, the imputed value for int is not equal to x1*x2. For example, when I have a binary variable x2 and a continuous variable x1, int
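The mismatch is expected when int is imputed as "just another variable": the imputation model fills int from its own conditional distribution, and nothing forces the filled value to equal the product of the filled x1 and x2. The sketch below shows this with deliberately simple column-mean imputation (an assumption for illustration only; Amelia fits a multivariate normal model rather than filling means) and contrasts it with "passive" imputation, where the product is recomputed after x1 has been imputed. The toy arrays and the helper `mean_impute` are hypothetical. Which approach suits a given analysis is a modelling choice; the code only shows why the numbers disagree.

```python
import numpy as np


def mean_impute(col: np.ndarray) -> np.ndarray:
    """Replace NaNs with the mean of the observed values (illustration only)."""
    out = col.copy()
    out[np.isnan(out)] = np.nanmean(col)
    return out


x1 = np.array([1.0, 2.0, np.nan, 4.0])  # continuous, one value missing
x2 = np.array([0.0, 1.0, 1.0, 1.0])     # binary, fully observed
inter = x1 * x2                         # NaN wherever x1 is missing

x1_imp = mean_impute(x1)

# "Just another variable": the interaction column is imputed on its own.
inter_jav = mean_impute(inter)

# "Passive": recompute the product from the imputed components.
inter_passive = x1_imp * x2

print(inter_jav[2], x1_imp[2] * x2[2])  # the two disagree
print(inter_passive[2])                 # equals x1_imp[2] * x2[2] by construction
```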

How can I get missing values recorded as NULL when importing from csv

柔情痞子 submitted on 2019-12-22 08:24:24
Question: I have multiple large CSV files, each of which has missing values in many places. When I import a CSV file into SQLite, I would like the missing values to be recorded as NULL, because another application expects missing data to be indicated by NULL. My current method does not produce the desired result. An example CSV file (test.csv) is:

    12|gamma|17|delta
    67||19|zeta
    96|eta||theta
    98|iota|29|

The first line is complete; each of the other lines has (or is meant to show!) a
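One way to get real NULLs, sketched under assumptions: parse the file with Python's `csv` module (pipe delimiter, as in test.csv), convert empty fields to `None`, and insert with parameterised queries, since the `sqlite3` module binds `None` as SQL NULL. The table name `test`, the column names `c1` to `c4`, and the helper `load_with_nulls` are hypothetical. If the data has already been loaded with empty strings (the sqlite3 shell's `.import` generally stores empty fields as `''`), running `UPDATE test SET c2 = NULLIF(c2, '')` for each column is an alternative.

```python
import csv
import io
import sqlite3

# Contents of test.csv from the question (pipe-delimited, no header row).
TEST_CSV = "12|gamma|17|delta\n67||19|zeta\n96|eta||theta\n98|iota|29|\n"


def load_with_nulls(conn: sqlite3.Connection, text: str) -> None:
    reader = csv.reader(io.StringIO(text), delimiter="|")
    # An empty field becomes None, which sqlite3 binds as SQL NULL.
    rows = [[field if field != "" else None for field in row] for row in reader]
    conn.execute("CREATE TABLE test (c1, c2, c3, c4)")  # hypothetical schema
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_with_nulls(conn, TEST_CSV)
for row in conn.execute("SELECT * FROM test"):
    print(row)
```

For a real file, pass `open("test.csv", newline="")` to `csv.reader` instead of the `StringIO` wrapper.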

FB Graph / FQL: current_location of friends sometimes returns NULL when the FB page shows a location

≡放荡痞女 submitted on 2019-12-22 05:58:19
Question: I am attempting to pull the current location of all of a user's friends on Facebook, and I am running into a curious problem: some of those friends return NULL even though I can see on their actual Facebook pages that it says "Lives in , ." The difficult part of this error is that it only happens in roughly 30% of cases. In the remaining cases, it pulls all of the correct information, which tells me that the permissions are probably set up correctly. To be specific, the FQL code I am

Multi-level regression model on multiply imputed data set in R (Amelia, zelig, lme4)

不想你离开。 submitted on 2019-12-22 05:25:22
Question: I am trying to run a multilevel model on multiply imputed data (created with Amelia); the data come from a clustered sample with 24 groups and N = 150.

    library("ZeligMultilevel")
    ML.model.0 <- zelig(dv~1 + tag(1|group), model="ls.mixed", data=a.out$imputations)
    summary(ML.model.0)

This code produces the following error:

    Error in object[[1]]$result$call : $ operator not defined for this S4 class

If I run an OLS regression, it works:

    model.0 <- zelig(dv~1, model="ls", data=a.out
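When a wrapper such as Zelig cannot summarise a model class across imputations, a common fallback is to fit the model separately on each imputed dataset (for example, looping `lme4::lmer` over `a.out$imputations` in R) and pool the results with Rubin's rules; in R, Amelia's `mi.meld()` performs that combination. The Python sketch below shows only the pooling arithmetic for a single coefficient. The function name `pool_rubin` and the per-imputation estimates and standard errors are hypothetical inputs, and degrees-of-freedom adjustments are omitted.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine one coefficient across m imputations with Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least two imputations and one SE per estimate")
    q_bar = mean(estimates)                  # pooled point estimate
    w = mean(se ** 2 for se in std_errors)   # within-imputation variance
    b = variance(estimates)                  # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b              # total variance
    return q_bar, math.sqrt(total)


# Hypothetical intercepts and SEs from fitting the model on 3 imputed datasets.
est, se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(est, se)
```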

Replace Nulls in DataFrame with Max in Row

谁说胖子不能爱 submitted on 2019-12-21 16:48:35
Question: Is there a way (more efficient than a for loop) to replace all the nulls in a pandas DataFrame with the maximum value of their respective row?

Answer 1: I guess this is what you are looking for (this answer treats 0 as the missing-value marker):

    import pandas as pd
    df = pd.DataFrame({'a': [1, 2, 0], 'b': [3, 0, 10], 'c': [0, 5, 34]})

       a   b   c
    0  1   3   0
    1  2   0   5
    2  0  10  34

You can use apply, iterate over all rows, and replace 0 with the row's maximum using the replace function, which gives you the expected output: df.apply(lambda row: row
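For actual NaN values rather than zeros, a vectorised alternative (a sketch, not the answer's method) is `DataFrame.where` with the row maxima as the replacement and `axis=0`, so the Series of maxima is aligned on the row index and broadcast across the columns. `df.max(axis=1)` skips NaN, so a row that is entirely NaN stays NaN. If zeros mark the gaps, as in the answer, convert them first with `df.replace(0, np.nan)`. The helper name `fill_nulls_with_row_max` and the sample frame are hypothetical.

```python
import numpy as np
import pandas as pd


def fill_nulls_with_row_max(df: pd.DataFrame) -> pd.DataFrame:
    row_max = df.max(axis=1)  # NaN-skipping maximum of each row
    # Keep values where they are present; elsewhere take that row's maximum.
    return df.where(df.notna(), row_max, axis=0)


df = pd.DataFrame({
    "a": [1.0, np.nan, 0.0],
    "b": [3.0, 4.0, np.nan],
    "c": [np.nan, 5.0, 34.0],
})
print(fill_nulls_with_row_max(df))
```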

Implementation of sklearn.impute.IterativeImputer

风流意气都作罢 submitted on 2019-12-21 06:55:42
Question: Consider the following data, which contains some NaN values:

        Column-1  Column-2  Column-3  Column-4  Column-5
    0        NaN      15.0      63.0       8.0      40.0
    1       60.0      51.0       NaN      54.0      31.0
    2       15.0      17.0      55.0      80.0       NaN
    3       54.0      43.0      70.0      16.0      73.0
    4       94.0      31.0      94.0      29.0      53.0
    5       99.0      52.0      77.0      91.0      58.0
    6       84.0      19.0      36.0       NaN      97.0
    7       41.0      91.0      62.0      67.0      68.0
    8       44.0      38.0      27.0      53.0      37.0
    9       58.0       NaN      63.0      57.0      28.0
    10      66.0      68.0      89.0      36.0      47.0
    11       7.0      81.0       5.0      99.0      16.0
    12      43.0      55.0      64.0      88.0       NaN
    13       8.0      90.0      91.0      44.0       4.0
    14      29.0      52.0      94.0      71.0      47.0
    15      22.0
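The rough idea behind `IterativeImputer` can be sketched in NumPy: start from an initial fill, then repeatedly regress each incomplete column on the others and overwrite its missing entries with the predictions. This is a simplified illustration under stated assumptions, not scikit-learn's implementation: it uses ordinary least squares where `IterativeImputer` defaults to `BayesianRidge`, visits columns left to right rather than in the default ascending-missingness order, and has no convergence tolerance or clipping. The function name `iterative_impute` is hypothetical; the data are rows 0 to 14 of the question's table.

```python
import numpy as np


def iterative_impute(X, max_iter=10):
    """Round-robin regression imputation (simplified; OLS instead of BayesianRidge)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial fill: each column's mean over its observed values.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue  # fully observed column: nothing to impute
            # Predict column j from all other columns plus an intercept.
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(filled)), others])
            coef, *_ = np.linalg.lstsq(design[~miss_j], filled[~miss_j, j], rcond=None)
            # Overwrite only the originally missing entries of column j.
            filled[miss_j, j] = design[miss_j] @ coef
    return filled


# Rows 0-14 of the question's data (Column-1 .. Column-5).
data = np.array([
    [np.nan, 15.0, 63.0, 8.0, 40.0], [60.0, 51.0, np.nan, 54.0, 31.0],
    [15.0, 17.0, 55.0, 80.0, np.nan], [54.0, 43.0, 70.0, 16.0, 73.0],
    [94.0, 31.0, 94.0, 29.0, 53.0], [99.0, 52.0, 77.0, 91.0, 58.0],
    [84.0, 19.0, 36.0, np.nan, 97.0], [41.0, 91.0, 62.0, 67.0, 68.0],
    [44.0, 38.0, 27.0, 53.0, 37.0], [58.0, np.nan, 63.0, 57.0, 28.0],
    [66.0, 68.0, 89.0, 36.0, 47.0], [7.0, 81.0, 5.0, 99.0, 16.0],
    [43.0, 55.0, 64.0, 88.0, np.nan], [8.0, 90.0, 91.0, 44.0, 4.0],
    [29.0, 52.0, 94.0, 71.0, 47.0],
])
print(np.round(iterative_impute(data), 1))
```

For real use, `sklearn.impute.IterativeImputer` (which must be enabled with `from sklearn.experimental import enable_iterative_imputer`) adds the stopping rule, feature ordering and Bayesian estimator that this sketch leaves out.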

Identify missing values in a sequence / perform asymmetric difference between two lists

▼魔方 西西 submitted on 2019-12-21 04:31:36
Question: Using R, I want to efficiently identify which values in a sequence are missing. I've written the example below showing how I do it. There must be a better way. Can someone help?

    data.list=c(1,2,4,5,7,8,9)
    full.list=seq(from = 1, to = 10, by =1)
    output <- c()
    for(i in 1:length(full.list)){
      holder1 <- as.numeric(any(data.list == i))
      output[i] <- holder1
    }
    which(output == 0)

Answer 1: Another possible solution:

    setdiff(full.list,data.list)

Answer 2:

    full.list[!full.list %in% data.list]

Answer 3: Another option using
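For readers doing the same in Python, the asymmetric difference that the answers compute with `setdiff` and `%in%` maps onto a set-membership filter. The sketch below keeps the order of `full_list` (as `full.list[!full.list %in% data.list]` does) and tests membership against a set so each lookup is constant time; the variable names simply mirror the question's R objects.

```python
data_list = [1, 2, 4, 5, 7, 8, 9]
full_list = list(range(1, 11))  # same as seq(from = 1, to = 10, by = 1)

present = set(data_list)  # O(1) membership tests
missing = [x for x in full_list if x not in present]
print(missing)  # [3, 6, 10]
```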

How does multinom() treat NA values by default?

旧巷老猫 submitted on 2019-12-20 07:35:42
Question: When I run multinom(), say Y ~ X1 + X2 + X3, if for one particular row X1 is NA (i.e. missing) but Y, X2 and X3 all have values, would this entire row be thrown out (as it is in SAS)? How are missing values treated in multinom()?

Answer 1: Here is a simple example (from ?multinom in the nnet package) to explore the different na.action options:

    > library(nnet)
    > library(MASS)
    > example(birthwt)
    > (bwt.mu <- multinom(low ~ ., bwt))

Intentionally create an NA value:

    > bwt[1,"age"]<-NA #
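In R, model-fitting functions that build a model frame, multinom() among them, apply the `na.action` given as an argument or, failing that, `options("na.action")`, which defaults to `na.omit`. That is listwise (complete-case) deletion: any row with an NA in a variable used by the formula is dropped before fitting, which matches the SAS behaviour described in the question. The pandas sketch below only mimics that row filtering so its effect on the sample size is visible; the column names are the question's placeholders, the extra `unused` column and all values are invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Y": ["a", "b", "a", "c"],
    "X1": [1.0, np.nan, 3.0, 4.0],      # row 1 is missing X1 only
    "X2": [0.5, 0.1, np.nan, 0.7],      # row 2 is missing X2 only
    "X3": [10, 20, 30, 40],
    "unused": [np.nan, 1.0, 1.0, 1.0],  # not in the model, so ignored
})

model_vars = ["Y", "X1", "X2", "X3"]
# Analogue of na.omit applied to the model frame: keep only complete rows.
complete_cases = df.dropna(subset=model_vars)
print(len(df), "->", len(complete_cases))  # 4 -> 2
```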