data-manipulation

Sub Value and Add new column pandas

泄露秘密 posted on 2019-12-13 04:32:29
Question: I am trying to read a few files from a path, as an extension to my previous question. The answer given by Jianxun definitely makes sense, but I am getting a KeyError. I am very new to pandas and am not able to fix the error. Note: I use Python 2.7 and pandas 0.16.
File_1.csv
    Ids,12:00:00
    2341,9865
    7352,8969
File_2.csv
    Ids,12:45:00
    1234,9865
    8435,8969
Master.csv
    Ids,00:00:00,00:30:00,00:45:00
    1234,1000,500,100
    8435,5243,300,200
    2341,563,400,400
    7352,345,500,600
Programs:
    import pandas as pd
    import numpy as

Get location data from list comparison and plot

若如初见. posted on 2019-12-13 02:59:22
Question: In my code, the user inputs a text file, which is saved as the variable "emplaced_animals_data". This variable has four columns (Animal ID, X location, Y location, and Z location), and the number of rows varies depending on which text file is uploaded. I then have another list (listed_animals) containing the animals whose location data we want to gather from emplaced_animals_data. So far, I have created a new variable for each item in the listed_animals list. I want to be able to
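A minimal sketch of the lookup described above, assuming the uploaded file has already been parsed into (animal ID, x, y, z) tuples. The sample rows and the helper name `locations_by_animal` are hypothetical. Instead of one variable per listed animal, it keeps a single dictionary keyed by animal ID, which tends to be easier to loop over when plotting.

```python
# Hypothetical sample rows standing in for the uploaded text file:
# (animal_id, x, y, z). The real file contents are not shown in the question.
emplaced_animals_data = [
    ("fox", 1.0, 2.0, 0.0),
    ("owl", 4.0, 1.5, 3.0),
    ("fox", 2.5, 2.0, 0.0),
    ("deer", 7.0, 8.0, 0.0),
]
listed_animals = ["fox", "deer", "wolf"]


def locations_by_animal(rows, wanted):
    """Map each wanted animal ID to the list of (x, y, z) positions found for it."""
    wanted_set = set(wanted)                     # fast membership test
    found = {animal: [] for animal in wanted}    # keep animals with no rows, too
    for animal_id, x, y, z in rows:
        if animal_id in wanted_set:
            found[animal_id].append((x, y, z))
    return found


locations = locations_by_animal(emplaced_animals_data, listed_animals)
for animal, points in locations.items():
    print(animal, points)
```

Animals that are listed but absent from the data (here the made-up "wolf") end up with an empty list, so downstream plotting code can skip them explicitly.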

Clustering for Categorical and Numerical data

核能气质少年 posted on 2019-12-12 18:06:53
Question: I have a collection of alerts and I want to group them based on similarity/distance. As we have non-numeric data, how can I perform clustering on this kind of data?
    set.seed(42)
    data.frame(Host1 = rep("del",10),
               Host2 = c(rep("cpp",4), rep("sscp",3), rep("portal",3)),
               Host3 = c(rep("web",5), rep("apache",3), rep("app",2)),
               Host4 = c(sample(3,8, replace = TRUE), rep("con",2)),
               Date1 = abs(round(1:10 + rnorm(10),2)))
      Host1 Host2 Host3 Host4 Date1
    1   del   cpp   web     3  1.40
    2   del   cpp   web     3  1.89
    3   del
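One common way to get a distance for mixed categorical and numeric records is the Gower distance: a categorical field contributes 0 when two records match and 1 otherwise, a numeric field contributes the absolute difference divided by that field's range, and the contributions are averaged. The sketch below is an illustration in plain Python, not the answer from the original thread. The three sample alerts and the function names are hypothetical. (In R, `cluster::daisy` with `metric = "gower"` computes the same kind of dissimilarity.)

```python
def numeric_ranges(records, numeric_fields):
    """Range (max - min) of each numeric field over all records."""
    return {
        field: max(r[field] for r in records) - min(r[field] for r in records)
        for field in numeric_fields
    }


def gower_distance(a, b, ranges):
    """Gower distance between two records given as dicts with the same keys."""
    total = 0.0
    for key in a:
        if key in ranges:                        # numeric field
            span = ranges[key]
            total += abs(a[key] - b[key]) / span if span else 0.0
        else:                                    # categorical field
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


# Hypothetical alerts, loosely modelled on the columns in the question.
alerts = [
    {"Host2": "cpp", "Host3": "web", "Date1": 1.0},
    {"Host2": "cpp", "Host3": "web", "Date1": 2.0},
    {"Host2": "portal", "Host3": "app", "Date1": 5.0},
]
ranges = numeric_ranges(alerts, ["Date1"])
distances = [[gower_distance(a, b, ranges) for b in alerts] for a in alerts]
```

The resulting matrix can then be handed to any clustering method that accepts precomputed distances, such as hierarchical clustering.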

Pathways: Manipulate list of events in parent-child 'nodes' in R

醉酒当歌 posted on 2019-12-12 07:21:46
Question: I am interested in visualizing the pathways patients follow, based on a pre-specified list of events (e.g. diagnosis, surgery, treatment1, treatment2, death). A test data set might look like this:
    df <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"),
                         Event = structure(c(2L, 3L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 5L, 1L), .Label = c("death", "diagnosis", "surgery", "treatment1", "treatment2"), class = "factor"),
                         date =
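A rough sketch of turning per-patient event lists into parent-child links: sort each patient's events by date, then pair every event with the one that follows it. The IDs and event sequences mirror the factor codes in the snippet above, but the dates are invented because the `date` column is cut off in the source. The function name `pathway_edges` is hypothetical.

```python
from collections import Counter
from itertools import groupby

# (patient ID, event, ISO date). Dates are hypothetical placeholders.
records = [
    ("a", "diagnosis", "2019-01-01"), ("a", "surgery", "2019-02-01"),
    ("a", "death", "2019-03-01"),
    ("b", "diagnosis", "2019-01-05"), ("b", "surgery", "2019-02-05"),
    ("b", "treatment1", "2019-03-05"), ("b", "treatment2", "2019-04-05"),
    ("b", "death", "2019-05-05"),
    ("c", "diagnosis", "2019-01-10"), ("c", "surgery", "2019-02-10"),
    ("c", "treatment2", "2019-03-10"), ("c", "death", "2019-04-10"),
]


def pathway_edges(rows):
    """Count (parent event, child event) transitions across all patients."""
    # ISO dates sort correctly as strings, so sort by (ID, date).
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    edges = Counter()
    for _, group in groupby(ordered, key=lambda r: r[0]):
        events = [event for _, event, _ in group]
        edges.update(zip(events, events[1:]))    # consecutive pairs
    return edges


edges = pathway_edges(records)
for (parent, child), count in sorted(edges.items()):
    print(parent, "->", child, count)
```

The edge counts are the kind of input that flow or network plots usually expect.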

Handling missing data in R

送分小仙女□ posted on 2019-12-12 04:59:05
Question: I'm facing a ridiculous situation. To tackle the missing-data issue, I used this code:
    fixed_data <- fetch_data[-which(! complete.cases(train_sample)),]
    train_index <- sample(1:nrow(fixed_data), size = .7*nrow(fixed_data))
    train_sample <- fixed_data[train_index, ]
    test_sample <- fixed_data[-train_index,]
Then I check the rows of the partitioned data to make sure there are no missing values, but there are still missing values!
    length(which(! complete.cases(fixed_data)))
Answer 1: I changed the code to fixed_data <
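In the snippet above, the completeness mask is computed from `train_sample` but applied to `fetch_data`, so the row positions need not line up. A separate hazard in R is that `-which(...)` selects zero rows when `which()` returns nothing. The sketch below illustrates the safer order of operations in plain Python rather than R: filter complete rows on the same table first, then split. The sample rows and the helper names are hypothetical, and `None` stands in for `NA`.

```python
import random


def is_complete(row):
    """True when no field in the row is missing (None stands in for NA)."""
    return all(value is not None for value in row)


def clean_then_split(rows, train_fraction=0.7, seed=0):
    """Drop incomplete rows, then split what remains into train and test."""
    complete = [row for row in rows if is_complete(row)]   # mask built on the same table
    indices = list(range(len(complete)))
    random.Random(seed).shuffle(indices)
    cut = int(train_fraction * len(complete))
    train = [complete[i] for i in indices[:cut]]
    test = [complete[i] for i in indices[cut:]]
    return train, test


# Hypothetical data with two incomplete rows.
fetch_data = [(1, 2.0), (2, None), (3, 4.5), (None, 1.0), (5, 6.0), (6, 7.5)]
train_sample, test_sample = clean_then_split(fetch_data)
```

Because the filter keeps rows by a boolean test instead of removing rows by negative index, it behaves the same whether or not any row is incomplete.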

Combine different tables in a list in R

こ雲淡風輕ζ posted on 2019-12-12 04:56:49
Question: Update: the code below seems to work. I'm not entirely sure how to word this question, so I apologise if it is worded badly. I tried looking for "combine different elements of a list using apply", but that doesn't seem to work. Anyway, as the result of scraping a website, I have two vectors giving identifying information and a list that contains a number of different tables. A simplified version looks something like this:
    respondents <- c("A", "B")
    questions <- c("question1", "question2")
    df1 <- data

Tabulate a data frame in R

余生颓废 posted on 2019-12-12 04:49:43
Question: I wanted to tabulate data so that a factor variable becomes the columns and the values of another variable fill the cells. So I tried:
    a=rep(1:3,3)
    d<-rep(1:3, each=3)
    b=rnorm(9)
    c=runif(9)
    dt<-data.frame(a,d,b,c)
      a d          b         c
    1 1 1  0.3819762 0.5199602
    2 2 1  0.3896063 0.9144730
    3 3 1  2.4356972 0.2888464
    4 1 2  1.2697016 0.9831191
    5 2 2 -1.9844689 0.2046947
    6 3 2  0.3473766 0.4766178
    7 1 3 -1.5461235 0.6187189
    8 2 3  1.0829027 0.9089551
    9 3 3 -0.1305324 0.6326141
I looked at data.table, plyr, reshape2 but
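A small sketch of the long-to-wide reshaping being asked about, written in plain Python for illustration. The question is cut off before it says which variable should become the columns, so this assumes that `d` becomes the columns, `a` stays as the row key, and `b` fills the cells. The function name `pivot` is hypothetical. The values are the `a`, `d`, `b` columns from the printed data frame.

```python
# (a, d, b) triples taken from the data frame printed in the question.
rows = [
    (1, 1, 0.3819762), (2, 1, 0.3896063), (3, 1, 2.4356972),
    (1, 2, 1.2697016), (2, 2, -1.9844689), (3, 2, 0.3473766),
    (1, 3, -1.5461235), (2, 3, 1.0829027), (3, 3, -0.1305324),
]


def pivot(triples):
    """Turn (row key, column key, value) triples into a nested dict table."""
    table = {}
    for row_key, col_key, value in triples:
        table.setdefault(row_key, {})[col_key] = value
    return table


wide = pivot(rows)

# Print one line per value of a, with one cell per value of d.
columns = sorted({d for _, d, _ in rows})
print("a", *columns)
for a in sorted(wide):
    print(a, *[wide[a].get(d) for d in columns])
```

If a (row, column) pair occurred more than once, this version would keep the last value, so an aggregation step would be needed for data with duplicates.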

Manipulating separated species quantity data into a species abundance matrix

浪子不回头ぞ posted on 2019-12-12 02:07:25
Question: I was hoping somebody could help with some data manipulation in R. I'm struggling to get this to work because the data is currently in a slightly odd format. I need a species abundance table in order to run some of the functions in vegan. However, when I collected the data I entered it in a way that is not very compatible, because I had to keep species collected from the same site separated by date and other factors, which was necessary for another program. So my data currently looks like this:
    df <-
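The data frame itself is cut off above, so the following is only a sketch under assumed column names (site, date, species, count), all of which are hypothetical. It shows the general idea in plain Python: sum the counts over dates and other splitting factors, then lay the totals out with one row per site and one column per species, which is the sites-by-species layout that community analysis functions typically expect.

```python
from collections import defaultdict

# Hypothetical records: (site, date, species, count).
records = [
    ("site1", "2019-05-01", "sp_a", 3),
    ("site1", "2019-06-01", "sp_a", 2),
    ("site1", "2019-05-01", "sp_b", 1),
    ("site2", "2019-05-01", "sp_b", 4),
]


def abundance_matrix(rows):
    """Sum counts per (site, species) and return sites, species, and a matrix."""
    totals = defaultdict(int)
    for site, _date, species, count in rows:
        totals[(site, species)] += count          # collapse over dates
    sites = sorted({site for site, _ in totals})
    species_names = sorted({sp for _, sp in totals})
    matrix = [
        [totals.get((site, sp), 0) for sp in species_names]   # 0 when never recorded
        for site in sites
    ]
    return sites, species_names, matrix


sites, species_names, matrix = abundance_matrix(records)
```

Species that were never recorded at a site get an explicit zero, so the matrix is rectangular.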

Make dummy target that checks the age of an existing file?

我是研究僧i posted on 2019-12-11 17:45:57
Question: I'm using make to control the data flow in a statistical analysis. I have my raw data in a directory ./data/raw_data_files, and I've got a data manipulation script that creates a cleaned data cache at ./cache/clean_data. The make rule is something like:
    cache/clean_data: scripts/clean_data
I do not want to touch the data in ./data/, either with make or with any of my data munging scripts. Is there any way in make to create a dependency for cache/clean_data that just checks whether specific
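For context, make decides whether to rebuild a target by comparing modification times: a target is out of date when it is missing or older than any of its prerequisites. Reading those timestamps does not modify the prerequisite files. The Python sketch below illustrates that check only. It is not make syntax, and the function name `is_stale` is hypothetical.

```python
import os


def is_stale(target, prerequisites):
    """Return True if target is missing or older than any prerequisite.

    Only file modification times are read; no file is written or touched.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(path) > target_mtime for path in prerequisites)


# Example call with the paths from the question (they must exist to be checked):
# is_stale("cache/clean_data", ["scripts/clean_data", "data/raw_data_files"])
```

In a makefile, the same effect usually comes from listing the raw data files as additional prerequisites of the cache target without giving them a rule of their own.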

How to match and extract data using multiple criteria from 2 worksheets?

你说的曾经没有我的故事 posted on 2019-12-11 16:48:19
Question: I have 2 worksheets, Sheet1 and Sheet2. Sheet1 is empty save for product numbers. I have to extract the data from Sheet2 into Sheet1 to give a clearer overview of it. In Sheet1 the regions are differentiated as AP (Asia Pacific), EMEA (Europe & Middle East) and NA (North America), and in Sheet2 they are differentiated as IN (India), DE (Germany) and US (USA). My sheets look as follows:
Sheet1
            Air           |Ocean
    Number  AP  EMEA  NA  |AP  EMEA  NA
    1                     |
    2                     |
    3                     |
    4                     |
Sheet2
    NUMBER GEO_CODE FREIGHT_TYPE
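The matching logic can be sketched outside the spreadsheet as a lookup keyed on three criteria: product number, region, and freight type. The IN/DE/US to AP/EMEA/NA mapping comes from the question. The Sheet2 rows, the fourth "value" column, and the function names are hypothetical, since the rest of Sheet2 is cut off.

```python
# Mapping between Sheet2 country codes and Sheet1 regions, as stated in the question.
GEO_TO_REGION = {"IN": "AP", "DE": "EMEA", "US": "NA"}
REGIONS = ("AP", "EMEA", "NA")
FREIGHT_TYPES = ("Air", "Ocean")

# Hypothetical Sheet2 rows: (NUMBER, GEO_CODE, FREIGHT_TYPE, value).
sheet2 = [
    (1, "IN", "Air", 120.0),
    (1, "DE", "Ocean", 80.0),
    (2, "US", "Air", 95.0),
]


def build_lookup(rows):
    """Index Sheet2 values by (number, region, freight type)."""
    return {
        (number, GEO_TO_REGION[geo], freight): value
        for number, geo, freight, value in rows
    }


def fill_sheet1(numbers, lookup):
    """For each product number, fill every (freight type, region) cell or leave None."""
    return {
        number: {
            (freight, region): lookup.get((number, region, freight))
            for freight in FREIGHT_TYPES
            for region in REGIONS
        }
        for number in numbers
    }


sheet1 = fill_sheet1([1, 2, 3, 4], build_lookup(sheet2))
```

Cells with no matching Sheet2 row stay `None`, which corresponds to leaving the spreadsheet cell blank.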