data-manipulation | 易学教程

bash? - combining files into CSVs

Question: I know (see here) that you can use paste to combine multiple files into a .csv file if each file holds one column, i.e.

    paste -d "," column1.dat column2.dat column3.dat ... > myDat.csv

will result in myDat.csv:

    column1,    column2,    column3,    ...
    c1-1,       c2-1,       c3-1,       ...
    c1-2,       c2-2,       c3-2,       ...
    ...         ...         ...

(without the tabs; I just inserted them to make it more readable). What if I have multiple measurements instead? E.g. file1.dat has the format <xvalue> <y1value>, file2.dat has the format <xvalue> <y2value>, file3

Function to compute 3D gradient with unevenly spaced sample locations

Question: I have experimental observations in a volume:

    import numpy as np

    # observations are not uniformly spaced
    x = np.random.normal(0, 1, 10)
    y = np.random.normal(5, 2, 10)
    z = np.random.normal(10, 3, 10)
    xx, yy, zz = np.meshgrid(x, y, z, indexing='ij')

    # fake temperatures at those coords
    tt = xx*2 + yy*2 + zz*2

    # sample distances
    dx = np.diff(x)
    dy = np.diff(y)
    dz = np.diff(z)

    grad = np.gradient(tt, [dx, dy, dz])  # returns error

This gives me the error: ValueError: operands could not be broadcast
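The answer to this question is not part of the excerpt. As one possible way around the broadcast error, and only as a sketch: since NumPy 1.13, np.gradient accepts one coordinate array per axis in place of a scalar spacing, so the sample positions themselves can be passed instead of the np.diff distances. The sketch assumes the coordinates are sorted before the grid is built (the question uses unsorted random draws), and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only

# Unevenly spaced sample locations, sorted so each axis is monotonic
# (an assumption; the question does not sort them).
x = np.sort(rng.normal(0, 1, 10))
y = np.sort(rng.normal(5, 2, 10))
z = np.sort(rng.normal(10, 3, 10))
xx, yy, zz = np.meshgrid(x, y, z, indexing='ij')

# Same fake temperature field as in the question.
tt = xx*2 + yy*2 + zz*2

# Pass the coordinate arrays themselves, one per axis,
# rather than a list of np.diff distances.
gx, gy, gz = np.gradient(tt, x, y, z)
```

Because the fake temperature field is linear in each coordinate, every component of the gradient should come out as 2 everywhere, up to floating-point rounding, which makes this a convenient sanity check.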

R bootstrap statistics by group for big data

Question: I want to bootstrap a data set that has groups in it. A simple scenario would be bootstrapping simple means:

    data <- as.data.table(list(x1 = runif(200), x2 = runif(200), group = runif(200) > 0.5))
    stat <- function(x, i) {x[i, c(m1 = mean(x1), m2 = mean(x2)), by = "group"]}
    boot(data, stat, R = 10)

This gives me the error "incorrect number of subscripts on matrix" because of the by = "group" part. I managed to solve it using subsetting, but I don't like this solution. Is there a simpler way to make this

R data.table create new columns with standard names

Question: I want to create new columns in my data.table based on a ratio calculation. My variable names follow a fairly standard pattern, so I think there must be an easy way to achieve this in data.table, but I have not been able to work out how. Below are my sample data and code:

    set.seed(1200)
    ID <- seq(1001,1100)
    region <- sample(1:10,100,replace = T)
    Q21 <- sample(1:5,100,replace = T)
    Q22 <- sample(1:15,100,replace = T)
    Q24_LOC_1 <- sample(1:8,100,replace = T)
    Q24_LOC_2 <- sample(1

How to use pandas apply function on all columns of some rows of data frame

Question: I have a dataframe. I want to replace the values of all columns in some rows with a default value. Is there a way to do this via the pandas apply function? Here is the dataframe:

    import pandas as pd
    temp=pd.DataFrame({'a':[1,2,3,4,5,6],'b':[2,3,4,5,6,7],'c':['p','q','r','s','t','u']})
    mylist=['p','t']

How can I replace the values in columns a and b with the default value 0 where the value of column c is in mylist? Is there a way to do this using pandas functionality, avoiding for loops?

Answer 1: Use isin to create a boolean
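The answer excerpt is cut off after "Use isin to create a boolean". A minimal sketch of how such a boolean mask is commonly combined with .loc follows; it is an assumption about where the answer is heading, not its actual text. The name mask is introduced here for illustration.

```python
import pandas as pd

temp = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                     'b': [2, 3, 4, 5, 6, 7],
                     'c': ['p', 'q', 'r', 's', 't', 'u']})
mylist = ['p', 't']

# Boolean mask: True for rows whose 'c' value appears in mylist.
mask = temp['c'].isin(mylist)

# Assign the default value 0 to columns a and b on the masked rows only;
# column c and all other rows are left untouched.
temp.loc[mask, ['a', 'b']] = 0
```

This avoids both apply and an explicit loop: the selection and the assignment are each a single vectorized operation.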

counting after and before change in value, within groups, generating new variables for each unique shift

Question: I am working to count occurrences of unique values within my groups, id. I'm looking at TF. When TF changes, I want to count both forward and backward from that point. This count should be stored in a new variable PM#, so that PM# holds both the plus and minus counts for each unique shift in TF. From what I've gathered I need to use rle, but I am kind of stuck. I made this working example to illustrate my issue. I have this data:

    df <- structure(list(id = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L

How to remove groups of observation with dplyr::filter()

Question: For the following data

    ds <- read.table(header = TRUE, text ="
    id year attend
    1  2007 1
    1  2008 1
    1  2009 1
    1  2010 1
    1  2011 1
    8  2007 3
    8  2008 NA
    8  2009 3
    8  2010 NA
    8  2011 3
    9  2007 2
    9  2008 3
    9  2009 3
    9  2010 5
    9  2011 5
    10 2007 4
    10 2008 4
    10 2009 2
    10 2010 NA
    10 2011 NA
    ")
    ds <- ds %>% dplyr::mutate(time=year-2000)
    print(ds)

How would I write a dplyr::filter() command to keep only the ids that don't have a single NA? So only subjects with ids 1 and 9 should stay after the filter.

Answer 1: Use filter

pandas merge on date column issue

Question: I am trying to merge two dataframes on a date column (tried both as type object and as datetime.date), but it fails to give the desired merge output:

    import pandas as pd
    df1 = pd.DataFrame({'amt': {0: 1549367.9496070854, 1: 2175801.78219801, 2: 1915613.1629125737, 3: 1703063.8323954903, 4: 1770040.7987461537},
                        'month': {0: '2015-02-01', 1: '2015-03-01', 2: '2015-04-01', 3: '2015-05-01', 4: '2015-06-01'}})
    print(df1)

                amt       month
    0  1.549368e+06  2015-02-01
    1  2.175802e+06  2015-03-01
    2  1.915613e+06  2015-04-01
    3
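The second dataframe and the desired output are cut off in this excerpt, so the following is only a sketch under stated assumptions. It assumes the mismatch arises because one key column holds strings while the other holds datetime.date objects. The frame df2, its column other, and the rounded amt values are hypothetical stand-ins. Converting both keys with pd.to_datetime gives them a common datetime64 dtype before the merge.

```python
import datetime
import pandas as pd

# First frame: month stored as strings (amt values rounded for brevity).
df1 = pd.DataFrame({'amt': [1549367.95, 2175801.78, 1915613.16],
                    'month': ['2015-02-01', '2015-03-01', '2015-04-01']})

# Hypothetical second frame (not shown in the excerpt): month stored
# as datetime.date objects, which pandas holds in an object column.
df2 = pd.DataFrame({'month': [datetime.date(2015, 2, 1),
                              datetime.date(2015, 3, 1)],
                    'other': [10, 20]})

# Normalize both key columns to datetime64 so equal dates compare equal.
df1['month'] = pd.to_datetime(df1['month'])
df2['month'] = pd.to_datetime(df2['month'])

# Left merge keeps every row of df1; unmatched months get NaN in 'other'.
merged = df1.merge(df2, on='month', how='left')
```

Whether this matches the asker's actual problem cannot be confirmed from the excerpt. Checking df1.dtypes and df2.dtypes before merging is a cheap way to see whether the key columns really differ in type.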

perl, removing elements from array in for loop

Question: Will the following code always work in Perl?

    for loop iterating over @array {
        # do something
        if ($condition) {
            remove current element from @array
        }
    }

I ask because I know that in Java this results in exceptions. The above code is working for me for now, but I want to be sure that it will work in all cases in Perl. Thanks

Answer 1: Well, it's said in the doc: If any part of LIST is an array, foreach will get very confused if you add or remove elements within the loop body, for example with splice. So