reshape

Flatten list column in data frame with ID column

ぃ、小莉子 submitted on 2019-12-19 12:29:13
Question: My data frame contains the output of a survey with a select-multiple question type. Some cells have multiple values.

    df <- data.frame(a=1:3,b=I(list(1,1:2,1:3)))
    df
      a       b
    1 1       1
    2 2    1, 2
    3 3 1, 2, 3

I would like to flatten the list to obtain the following output:

    df
      a b
    1 1 1
    2 2 1
    3 2 2
    4 3 1
    5 3 2
    6 3 3

It should be easy, but somehow I can't find the right search terms. Thanks.

Answer 1: You can just use unnest from "tidyr":

    library(tidyr)
    unnest(df, b)
    #   a b
    # 1 1 1
    # 2 2 1
    # 3 2 2
    # 4 3 1
    # 5 3 2
    # 6 3

Functions for creating and reshaping big data in R using the FF package

拈花ヽ惹草 submitted on 2019-12-19 10:23:14
Question: I'm new to R and the FF package, and am trying to better understand how FF allows users to work with large datasets (>4Gb). I have spent a considerable amount of time trawling the web for tutorials, but the ones I could find generally go over my head. I learn best by doing, so as an exercise, I would like to know how to create a long-format time-series dataset, similar to R's built-in "Indometh" dataset, using arbitrary values. Then I would like to reshape it into wide format. Then I would

Convert unstructured csv file to a data frame

孤街醉人 submitted on 2019-12-19 09:49:47
Question: I am learning R for text mining. I have a TV program schedule in the form of a CSV. The programs usually start at 06:00 AM and go on until 05:00 AM the next day, which is called a broadcast day. For example, the programs for 15/11/2015 start at 06:00 AM and end at 05:00 AM the next day. Here is some sample code showing what the schedule looks like: read.table(textConnection("Sunday|\n 01-Nov-15|\n 6|Tom\n some information about the program|\n 23.3|Jerry\n some information about the program|\n 5

Python - Unnest cells in Pandas DataFrame

爱⌒轻易说出口 submitted on 2019-12-19 06:30:11
Question: Suppose I have DataFrame df:

    a b c
    v f 3|4|5
    v 2 6
    v f 4|5

I'd like to produce this df:

    a b c
    v f 3
    v f 4
    v f 5
    v 2 6
    v f 4
    v f 5

I know how to make this transformation in R, using the tidyr package. Is there an easy way of doing this in pandas?

Answer 1: You could:

    import numpy as np
    df = df.set_index(['a', 'b'])
    df = df.astype(str) + '| '  # There's a space ' ' to match the replace later
    df = df.c.str.split('|', expand=True).stack().reset_index(-1, drop=True).replace(' ', np.nan).dropna().reset
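As a separate minimal sketch (not a reconstruction of the answer above), the same reshaping can be written with `Series.str.split` followed by `DataFrame.explode`. This assumes pandas 0.25 or newer, where `explode` is available, and that column `c` holds pipe-delimited strings; the variable name `out` is only illustrative.

```python
import pandas as pd

# Sample frame from the question; every column holds strings.
df = pd.DataFrame({
    "a": ["v", "v", "v"],
    "b": ["f", "2", "f"],
    "c": ["3|4|5", "6", "4|5"],
})

# Split each cell of "c" into a list, then emit one row per list element.
out = (
    df.assign(c=df["c"].str.split("|"))
      .explode("c")
      .reset_index(drop=True)
)

print(out)
```

The values in `c` remain strings after the split; if numbers are needed, `out["c"].astype(int)` would convert them, assuming every piece is a valid integer.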

How does numpy.swapaxes work?

风格不统一 submitted on 2019-12-19 06:16:10
Question: I created a sample array:

    a = np.arange(18).reshape(9,2)

On printing, I get this as output:

    [[ 0  1]
     [ 2  3]
     [ 4  5]
     [ 6  7]
     [ 8  9]
     [10 11]
     [12 13]
     [14 15]
     [16 17]]

On executing this reshaping:

    b = a.reshape(2,3,3).swapaxes(0,2)

I get:

    [[[ 0  9]
      [ 3 12]
      [ 6 15]]

     [[ 1 10]
      [ 4 13]
      [ 7 16]]

     [[ 2 11]
      [ 5 14]
      [ 8 17]]]

I went through this question, but it does not solve my problem: Reshape an array in NumPy. The documentation is not useful either. https://docs.scipy.org/doc/numpy/reference/generated
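One way to read the result in the question: `swapaxes(0, 2)` exchanges the first and last index, so element `[i, j, k]` of the result is element `[k, j, i]` of the reshaped array. The short check below is a sketch of that index rule; the intermediate name `c` is introduced here for illustration and does not appear in the question.

```python
import numpy as np

a = np.arange(18).reshape(9, 2)
c = a.reshape(2, 3, 3)      # c[k, j, i] == 9*k + 3*j + i
b = c.swapaxes(0, 2)        # axes 0 and 2 trade places: shape (3, 3, 2)

# Every element of b is the element of c with first and last index swapped.
for i, j, k in np.ndindex(b.shape):
    assert b[i, j, k] == c[k, j, i]

# For a 3-D array this is the same as reversing the axis order.
assert np.array_equal(b, c.transpose(2, 1, 0))

# No data is copied: b is a view onto the memory of a.
assert np.shares_memory(a, b)

print(b[0, 0, 1], c[1, 0, 0])
```

For instance, `b[0, 0, 1]` is `c[1, 0, 0]`, which is 9, matching the `[ 0  9]` row at the top of the printed result.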

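[tensorflow] The difference between static_rnn and dynamic_rnn

断了今生、忘了曾经 submitted on 2019-12-19 05:00:58
The main difference between static_rnn and dynamic_rnn lies in how they are implemented. static_rnn unrolls the RNN, trading space for time; the GPU may not be able to cope (based on my own tests). dynamic_rnn instead uses a for or while loop.

Calling static_rnn actually builds the graph of the RNN unrolled along the time sequence. If you open TensorBoard you will see sequence_length rnn_cells stacked together, except that these cells share weights. sequence_length is therefore bound to the topology of the graph, which in turn requires every batch to have the same sequence_length.

Calling dynamic_rnn does not unroll the RNN. Instead it uses the tf.while_loop API, with control flow nodes such as Enter, Switch, Merge, LoopCondition, and NextIteration, to build a graph that can execute a loop (this should still be a static graph, because its topology does not change during execution). In TensorBoard you will see only a single rnn_cell, surrounded by a group of control flow nodes. For dynamic_rnn, sequence_length only represents the number of loop iterations and has nothing to do with the topology of the graph itself, so each batch can have a different sequence_length.

static_rnn: importing packages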

split a Pandas series without a multiindex

五迷三道 submitted on 2019-12-19 04:20:34
Question: I would like to take a Pandas Series with a single-level index and split on that index into a dataframe with multiple columns. For instance, for input:

    s = pd.Series(range(10,17), index=['a','a','b','b','c','c','c'])
    s
    a    10
    a    11
    b    12
    b    13
    c    14
    c    15
    c    16
    dtype: int64

What I would like as an output is:

         a    b   c
    0   10   12  14
    1   11   13  15
    2  NaN  NaN  16

I cannot directly use the unstack command because it requires a multiindex and I only have a single-level index. I tried putting in a dummy index that all
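A minimal sketch of one possible approach, not taken from the truncated thread: number the rows within each index label using `groupby(level=0).cumcount()`, then pivot on that position. The column names `key`, `pos`, and `value` are placeholders chosen for this example.

```python
import pandas as pd

s = pd.Series(range(10, 17), index=["a", "a", "b", "b", "c", "c", "c"])

# Position of each row within its own index label: 0, 1, 0, 1, 0, 1, 2.
pos = s.groupby(level=0).cumcount()

# Build a long table and pivot it: one column per label, one row per position.
wide = pd.DataFrame(
    {"key": s.index, "pos": pos.to_numpy(), "value": s.to_numpy()}
).pivot(index="pos", columns="key", values="value")

print(wide)
```

Because the shorter groups are padded with NaN, the resulting columns are floating point rather than int64.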

Merge Multiple Data Frames by Row Names

只谈情不闲聊 submitted on 2019-12-18 12:08:43
Question: I'm trying to merge multiple data frames by row names. I know how to do it with two:

    x = data.frame(a = c(1,2,3), row.names = letters[1:3])
    y = data.frame(b = c(1,2,3), row.names = letters[1:3])
    merge(x,y, by = "row.names")

But when I try using the reshape package's merge_all() I'm getting an error.

    z = data.frame(c = c(1,2,3), row.names = letters[1:3])
    l = list(x,y,z)
    merge_all(l, by = "row.names")
    Error in -ncol(df) : invalid argument to unary operator

What's the best way to do this?

Answer 1:

melt to two variable columns

ε祈祈猫儿з submitted on 2019-12-18 10:27:33
Question: I have the following variables in a data frame:

    [1] "Type" "I.alt" "idx06" "idx07" "idx08" "farve1" "farve2"

If I do:

    dm <- melt(d, id=c("Type","I.alt"))

I get these variables:

    "Type" "I.alt" "variable" "value"

where "idx06", "idx07", "idx08", "farve1", "farve2" are represented in "variable". But what I really want is something like this:

    "Type" "I.alt" "variable" "value" "variable2" "value2"

where "farve1" and "farve2" are represented in variable2 and value2. The reason I want to do this, is

Convert R dataframe from long to wide format, but with unequal group sizes, for use with qcc

泪湿孤枕 submitted on 2019-12-18 04:59:06
Question: I would like to convert a dataframe from long format to wide format, but with unequal group sizes. The eventual use will be in 'qcc', which requires a data frame or a matrix with each row consisting of one group, using NAs in groups which have fewer samples. The following code will create an example dataset, as well as show manual conversion to the desired format.

    # This is an example of the initial data that I have
    # * 10 sample measurements, over 3 groups with 3, 2, and 5 elements