data-manipulation | 易学教程

Sub Value and Add new column pandas

Question: I am trying to read a few files from a path, as an extension of my previous question. The answer given by Jianxun definitely makes sense, but I am getting a key error. I am very new to pandas and unable to fix the error. Note: I use Python 2.7 and Pandas 0.16.
File_1.csv
    Ids,12:00:00
    2341,9865
    7352,8969
File_2.csv
    Ids,12:45:00
    1234,9865
    8435,8969
Master.csv
    Ids,00:00:00,00:30:00,00:45:00
    1234,1000,500,100
    8435,5243,300,200
    2341,563,400,400
    7352,345,500,600
Programs:
    import pandas as pd
    import numpy as

Get location data from list comparison and plot

Question: In my code, the user inputs a text file, which is saved as the variable "emplaced_animals_data". This variable has four columns (Animal ID, X location, Y location, and Z location), and the number of rows varies depending on which text file is uploaded. I then have another list (listed_animals) containing the animals whose location data we want to gather from emplaced_animals_data. So far, I have created a new variable for each item in the listed_animals list. I want to be able to

Clustering for Categorical and Numerical data

Question: I have a collection of alerts and I want to group them based on similarity/distance. As we have non-numeric data, how can I perform clustering on this kind of data?
    set.seed(42)
    data.frame(Host1 = rep("del",10),
               Host2 = c(rep("cpp",4), rep("sscp",3), rep("portal",3)),
               Host3 = c(rep("web",5), rep("apache",3), rep("app",2)),
               Host4 = c(sample(3,8, replace = TRUE), rep("con",2)),
               Date1 = abs(round(1:10 + rnorm(10),2)))
      Host1 Host2 Host3 Host4 Date1
    1   del   cpp   web     3  1.40
    2   del   cpp   web     3  1.89
    3   del

Pathways: Manipulate list of events in parent-child 'nodes' in R

Question: I am interested in visualizing the pathways patients follow, based on a pre-specified list of events (e.g. diagnosis, surgery, treatment1, treatment2, death). A test data set might look like this:
    df <- structure(list(
      ID = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
                     .Label = c("a", "b", "c"), class = "factor"),
      Event = structure(c(2L, 3L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 5L, 1L),
                        .Label = c("death", "diagnosis", "surgery", "treatment1", "treatment2"),
                        class = "factor"),
      date =

Handling missing data in R

Question: I'm facing a ridiculous situation. To tackle a missing-data issue, I used this code:
    fixed_data <- fetch_data[-which(! complete.cases(train_sample)),]
    train_index <- sample(1:nrow(fixed_data), size = .7*nrow(fixed_data))
    train_sample <- fixed_data[train_index, ]
    test_sample <- fixed_data[-train_index,]
Then I checked the rows of the partitioned data to make sure there were no missing values, but there are still missing values!
    length(which(! complete.cases(fixed_data)))
Answer 1: I changed the code to fixed_data <
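Note that the snippet above computes complete.cases on train_sample but uses the result to index fetch_data, so the row positions need not correspond. The intended order of operations (drop incomplete rows from the frame being filtered, then split) can be sketched as follows. This is a pandas illustration rather than the asker's R code, and the frame contents and the random_state value are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for fetch_data; the values are made up.
fetch_data = pd.DataFrame({
    "x": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0, np.nan, 8.0, 9.0, 10.0],
    "y": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})

# Drop incomplete rows from the same frame that is being filtered.
fixed_data = fetch_data.dropna()

# Draw about 70% of the remaining rows for training; the rest form the test set.
train_sample = fixed_data.sample(frac=0.7, random_state=42)
test_sample = fixed_data.drop(train_sample.index)

# Check both splits for any remaining missing values.
print(train_sample.isna().any().any(), test_sample.isna().any().any())
```

Because the completeness check and the row filter act on the same object, no row index from one frame is applied to another.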

Combine different tables in a list in R

Question: Update: the code below seems to work. I'm not entirely sure how to word this question, so I apologise if it is worded badly. I tried looking for "combine different elements of a list using apply", but that doesn't seem to work. Anyway, as the result of scraping a website, I have two vectors giving identifying information and a list that contains a number of different tables. A simplified version looks something like this:
    respondents <- c("A", "B")
    questions <- c("question1", "question2")
    df1 <- data

Tabulate a data frame in R

Question: I want to tabulate data so that the levels of a factor variable become columns and the values of another variable fill the cells. So I tried:
    a=rep(1:3,3)
    d<-rep(1:3, each=3)
    b=rnorm(9)
    c=runif(9)
    dt<-data.frame(a,d,b,c)
      a d          b         c
    1 1 1  0.3819762 0.5199602
    2 2 1  0.3896063 0.9144730
    3 3 1  2.4356972 0.2888464
    4 1 2  1.2697016 0.9831191
    5 2 2 -1.9844689 0.2046947
    6 3 2  0.3473766 0.4766178
    7 1 3 -1.5461235 0.6187189
    8 2 3  1.0829027 0.9089551
    9 3 3 -0.1305324 0.6326141
I looked at data.table, plyr, and reshape2, but
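The reshaping described here (the values of one variable become column headers and another variable fills the cells) is a long-to-wide pivot. The question is about R; purely to illustrate the idea, a minimal pandas sketch follows. It assumes that d is the variable spread into columns, a identifies the rows, and b supplies the cell values, which the truncated question does not confirm.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (random values differ).
rng = np.random.default_rng(0)
dt = pd.DataFrame({
    "a": np.tile([1, 2, 3], 3),      # like rep(1:3, 3)
    "d": np.repeat([1, 2, 3], 3),    # like rep(1:3, each = 3)
    "b": rng.normal(size=9),
    "c": rng.uniform(size=9),
})

# Long to wide: one row per value of a, one column per value of d, cells hold b.
wide = dt.pivot(index="a", columns="d", values="b")
print(wide)
```

pivot raises an error if an (a, d) pair occurs more than once; in that case an aggregating variant such as pivot_table would be needed.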

Manipulating separated species quantity data into a species abundance matrix

Question: I was hoping somebody could help with some data manipulation in R. I'm struggling to get this to work because the data is currently in a slightly odd format. I need a species abundance table in order to run some of the functions in vegan. However, when I collected the data, I entered it in a way that is not very compatible: I had to keep species collected from the same site separated by date and other factors, which was necessary for another program. So my data currently looks like this:
    df <-
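A species abundance table of the kind vegan works with has one row per site and one column per species. The question's data frame is cut off, so the column names below (site, date, species, count) and all values are hypothetical. As a rough pandas illustration of collapsing repeated site/species records into such a matrix:

```python
import pandas as pd

# Hypothetical long-format records; the question's real columns are not shown.
records = pd.DataFrame({
    "site":    ["S1", "S1", "S1", "S2", "S2"],
    "date":    ["d1", "d2", "d1", "d1", "d2"],
    "species": ["sp_a", "sp_a", "sp_b", "sp_b", "sp_c"],
    "count":   [2, 3, 1, 4, 5],
})

# Sum counts over dates: rows are sites, columns are species, absent pairs become 0.
abundance = records.pivot_table(
    index="site", columns="species", values="count",
    aggfunc="sum", fill_value=0,
)
print(abundance)
```

Summing is one possible way to combine records from different dates; whether it is the right aggregation depends on how the samples were collected.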

Make dummy target that checks the age of an existing file?

Question: I'm using make to control the data flow in a statistical analysis. I have my raw data in a directory, ./data/raw_data_files, and I've got a data manipulation script that creates a cleaned data cache at ./cache/clean_data. The make rule is something like:
    cache/clean_data: scripts/clean_data
I do not want to touch the data in ./data/, either with make or with any of my data munging scripts. Is there any way in make to create a dependency for cache/clean_data that just checks whether specific

How to match and extract data using multiple criteria from 2 worksheets?

Question: I have 2 worksheets, Sheet1 and Sheet2. Sheet1 is empty save for product numbers. I have to extract the data from Sheet2 into Sheet1 to give a clearer overview of it. In Sheet1 the regions are differentiated as AP (Asia Pacific), EMEA (Europe & Middle East) and NA (North America), and in Sheet2 they are differentiated as IN (India), DE (Germany) and US (USA). My sheets look as follows:
Sheet1
            Air           |Ocean
    Number  AP  EMEA  NA  |AP  EMEA  NA
    1                     |
    2                     |
    3                     |
    4                     |
Sheet2
    NUMBER GEO_CODE FREIGHT_TYPE