data-manipulation

bash? - combining files into CSVs

南笙酒味 submitted on 2019-12-23 04:53:36
Question: I know (see here) that you can use paste to combine multiple files into a .csv file if each file holds one column, i.e.

    paste -d "," column1.dat column2.dat column3.dat ... > myDat.csv

will result in myDat.csv:

    column1, column2, column3, ...
    c1-1,    c2-1,    c3-1,    ...
    c1-2,    c2-2,    c3-2,    ...
    ...      ...      ...

(without the tabs; I just inserted them to make it more readable). What if I have multiple measurements instead? E.g. file1.dat has the format <xvalue> <y1value>, file2.dat has the format <xvalue> <y2value>, file3
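The question is cut off, so the desired output is not known. A minimal sketch, assuming the goal is one CSV row per x value with one y column per input file: the helper names merge_on_x and write_csv are hypothetical, and Python's standard library stands in for a shell pipeline here.

```python
import csv
import io


def merge_on_x(sources):
    """Join whitespace-separated "<x> <y>" inputs on x, one y column per input.

    `sources` is any iterable of line iterables (open files, StringIO, ...).
    x values are compared as strings, so "0" and "0.0" count as different keys.
    """
    columns = []   # one {x: y} mapping per input
    x_order = []   # x values in first-seen order
    seen = set()
    for src in sources:
        col = {}
        for line in src:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            x, y = parts[0], parts[1]
            col[x] = y
            if x not in seen:
                seen.add(x)
                x_order.append(x)
        columns.append(col)
    # Inputs that lack a given x get an empty cell
    return [[x] + [col.get(x, "") for col in columns] for x in x_order]


def write_csv(rows, out):
    """Write the merged rows as comma-separated lines."""
    csv.writer(out, lineterminator="\n").writerows(rows)


# Demo with in-memory stand-ins for file1.dat and file2.dat
file1 = io.StringIO("0.0 1.5\n1.0 2.5\n")
file2 = io.StringIO("0.0 10\n1.0 20\n")
out = io.StringIO()
write_csv(merge_on_x([file1, file2]), out)
print(out.getvalue(), end="")
```

With real files, the same function can be fed open file handles instead of the StringIO objects.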

Function to compute 3D gradient with unevenly spaced sample locations

本秂侑毒 submitted on 2019-12-21 21:38:37
Question: I have experimental observations in a volume:

    import numpy as np

    # observations are not uniformly spaced
    x = np.random.normal(0, 1, 10)
    y = np.random.normal(5, 2, 10)
    z = np.random.normal(10, 3, 10)
    xx, yy, zz = np.meshgrid(x, y, z, indexing='ij')

    # fake temperatures at those coords
    tt = xx*2 + yy*2 + zz*2

    # sample distances
    dx = np.diff(x)
    dy = np.diff(y)
    dz = np.diff(z)

    grad = np.gradient(tt, [dx, dy, dz])  # returns error

This gives me the error: ValueError: operands could not be broadcast
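One possible way around the error, sketched under two assumptions: NumPy 1.13 or later is available (np.gradient then accepts one coordinate array per axis instead of spacings), and the sample coordinates can be sorted along each axis, which the original snippet does not do.

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-uniform sample coordinates, sorted along each axis (an assumption;
# the question's coordinates are unsorted random draws)
x = np.sort(rng.normal(0, 1, 10))
y = np.sort(rng.normal(5, 2, 10))
z = np.sort(rng.normal(10, 3, 10))
xx, yy, zz = np.meshgrid(x, y, z, indexing="ij")

# Same fake temperature field as in the question
tt = 2 * xx + 2 * yy + 2 * zz

# Pass the coordinate arrays themselves, one per axis, not np.diff() output;
# each array's length must match the size of tt along that axis
dT_dx, dT_dy, dT_dz = np.gradient(tt, x, y, z)

print(dT_dx.shape)
```

Because the field is linear in each coordinate, every component of the result should be close to 2 everywhere, which makes a convenient sanity check.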

R bootstrap statistics by group for big data

烂漫一生 submitted on 2019-12-21 02:36:22
Question: I want to bootstrap a data set that has groups in it. A simple scenario would be bootstrapping simple means:

    data <- as.data.table(list(x1 = runif(200), x2 = runif(200), group = runif(200)>0.5))
    stat <- function(x, i) {x[i, c(m1 = mean(x1), m2 = mean(x2)), by = "group"]}
    boot(data, stat, R = 10)

This gives me the error "incorrect number of subscripts on matrix", because of the by = "group" part. I managed to solve it using subsetting, but I don't like this solution. Is there a simpler way to make this

R data.table create new columns with standard names

☆樱花仙子☆ submitted on 2019-12-20 05:23:37
Question: I want to create new columns in my data.table based on a ratio calculation. My variable names follow a fairly standard pattern, so I think there must be an easy way to achieve this in data.table. However, I can't figure out how. Below is my sample data and code:

    set.seed(1200)
    ID <- seq(1001,1100)
    region <- sample(1:10,100,replace = T)
    Q21 <- sample(1:5,100,replace = T)
    Q22 <- sample(1:15,100,replace = T)
    Q24_LOC_1 <- sample(1:8,100,replace = T)
    Q24_LOC_2 <- sample(1

How to use pandas apply function on all columns of some rows of data frame

隐身守侯 submitted on 2019-12-20 03:04:33
Question: I have a dataframe. I want to replace the values of all columns in some rows with a default value. Is there a way to do this via the pandas apply function? Here is the dataframe:

    import pandas as pd
    temp=pd.DataFrame({'a':[1,2,3,4,5,6],'b':[2,3,4,5,6,7],'c':['p','q','r','s','t','u']})
    mylist=['p','t']

How do I replace the values in columns a and b with the default value 0 where the value of column c is in mylist? Is there a way to do this using pandas functionality, avoiding for loops?

Answer 1: Use isin to create a boolean
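A minimal sketch of the isin approach the answer starts to describe. The answer is cut off, so using the boolean as a row mask with .loc is an assumption about how it continues; the variable name mask is illustrative.

```python
import pandas as pd

temp = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': [2, 3, 4, 5, 6, 7],
    'c': ['p', 'q', 'r', 's', 't', 'u'],
})
mylist = ['p', 't']

# Boolean Series: True for rows whose 'c' value appears in mylist
mask = temp['c'].isin(mylist)

# Assign the default value to columns a and b on the selected rows only
temp.loc[mask, ['a', 'b']] = 0

print(temp)
```

This keeps the whole operation vectorised, so neither apply nor an explicit loop is needed.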

counting after and before change in value, within groups, generating new variables for each unique shift

南笙酒味 submitted on 2019-12-20 02:13:36
Question: I am trying to count occurrences of unique values within my groups, id. I'm looking at TF. When TF changes, I want to count both forwards and backwards from that point. This count should be stored in a new variable PM#, so that PM# holds both the plus and minus counts for each unique shift in TF. From what I've gathered I need to use rle, but I am kind of stuck. I made this working example to illustrate my issue. I have this data:

    df <- structure(list(id = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L

How to remove groups of observation with dplyr::filter()

别说谁变了你拦得住时间么 submitted on 2019-12-18 12:19:02
Question: For the following data

    ds <- read.table(header = TRUE, text ="
    id year attend
    1 2007 1
    1 2008 1
    1 2009 1
    1 2010 1
    1 2011 1
    8 2007 3
    8 2008 NA
    8 2009 3
    8 2010 NA
    8 2011 3
    9 2007 2
    9 2008 3
    9 2009 3
    9 2010 5
    9 2011 5
    10 2007 4
    10 2008 4
    10 2009 2
    10 2010 NA
    10 2011 NA
    ")
    ds <- ds %>% dplyr::mutate(time=year-2000)
    print(ds)

how would I write a dplyr::filter() command to keep only the ids that don't have a single NA? So only subjects with ids 1 and 9 should stay after the filter.

Answer 1: Use filter

pandas merge on date column issue

人盡茶涼 submitted on 2019-12-18 08:59:22
Question: I am trying to merge two dataframes on a date column (tried both as type object and as datetime.date), but it fails to give the desired merge output:

    import pandas as pd
    df1 = pd.DataFrame({'amt': {0: 1549367.9496070854, 1: 2175801.78219801, 2: 1915613.1629125737, 3: 1703063.8323954903, 4: 1770040.7987461537}, 'month': {0: '2015-02-01', 1: '2015-03-01', 2: '2015-04-01', 3: '2015-05-01', 4: '2015-06-01'}})
    print(df1)
                amt       month
    0  1.549368e+06  2015-02-01
    1  2.175802e+06  2015-03-01
    2  1.915613e+06  2015-04-01
    3
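The snippet ends before the second dataframe and the actual failure are shown, so the cause is unknown. A sketch of one common remedy, assuming the problem is that the key columns hold different types (strings on one side, datetime.date objects on the other): convert both keys with pd.to_datetime before merging. The frame df2 and its qty column are hypothetical stand-ins.

```python
import datetime

import pandas as pd

# Left frame: dates stored as strings (shortened version of the question's df1)
df1 = pd.DataFrame({
    "month": ["2015-02-01", "2015-03-01"],
    "amt": [1549367.9496070854, 2175801.78219801],
})

# Hypothetical right frame: the same dates stored as datetime.date objects
df2 = pd.DataFrame({
    "month": [datetime.date(2015, 2, 1), datetime.date(2015, 3, 1)],
    "qty": [10, 20],
})

# Normalise both key columns to datetime64 so equal dates compare equal
df1["month"] = pd.to_datetime(df1["month"])
df2["month"] = pd.to_datetime(df2["month"])

merged = df1.merge(df2, on="month", how="inner")
print(merged)
```

Checking df1.dtypes and df2.dtypes before merging is a quick way to tell whether mismatched key types are involved at all.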

perl, removing elements from array in for loop

ぃ、小莉子 submitted on 2019-12-18 08:32:38
Question: Will the following code always work in Perl?

    for loop iterating over @array {
        # do something
        if ($condition) {
            remove current element from @array
        }
    }

I ask because I know that in Java this results in exceptions. The above code is working for me for now, but I want to be sure that it will work in all cases in Perl. Thanks.

Answer 1: Well, the documentation says: "If any part of LIST is an array, foreach will get very confused if you add or remove elements within the loop body, for example with splice." So