traminer

How to get the largest possible column sequence with the least possible row NAs from a huge matrix?

北城余情 posted on 2019-12-10 11:27:22
Question: I want to select columns from a data frame so that the resulting continuous column sequences are as long as possible, while the number of rows with NAs is as small as possible, because those rows have to be dropped afterwards. (The reason is that I want to run TraMineR::seqsubm() to automatically get a matrix of transition costs (by transition probability) and later run cluster::agnes() on it. TraMineR::seqsubm() doesn't like NA states and cluster::agnes() with NA states in the

Looping across 10 columns at a time in R

喜夏-厌秋 posted on 2019-12-10 10:45:53
Question: I have a data frame with 1000 columns. I am trying to loop over 10 columns at a time and use the seqdef() function from the TraMineR package to do sequence alignment across the data in those columns. Hence, I want to apply this function to columns 1-10 in the first go-around, and columns 11-20 in the second go-around, all the way up to 1000. This is the code I am using: library(TraMineR) by(df[, 1:10], seqdef(df)) However, this only loops over the first 10 and then stops. How do I loop it
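One way to approach the question above is to split the column indices into consecutive blocks of 10 and apply seqdef() to each block. The following is a minimal sketch, not the accepted answer: the data frame `df` is a made-up stand-in, and `block_size`, `col_blocks`, and `apply_by_block` are hypothetical names introduced for illustration. Note that seqdef() builds a state sequence object from the columns it receives; it does not align sequences by itself.

```r
# Hypothetical stand-in for the 1000-column data frame in the question
# (20 rows, 30 columns, three made-up states).
set.seed(1)
df <- as.data.frame(
  matrix(sample(c("a", "b", "c"), 20 * 30, replace = TRUE), nrow = 20)
)

# Split the column indices into consecutive blocks of 10: 1-10, 11-20, ...
block_size <- 10
col_blocks <- split(seq_len(ncol(df)), ceiling(seq_len(ncol(df)) / block_size))

# Apply `fun` to each block of columns and collect the results in a list.
apply_by_block <- function(data, blocks, fun) {
  lapply(blocks, function(cols) fun(data[, cols, drop = FALSE]))
}

# With TraMineR installed, each block becomes its own state sequence object;
# otherwise the sketch falls back to returning the column subsets unchanged.
if (requireNamespace("TraMineR", quietly = TRUE)) {
  seq_list <- apply_by_block(df, col_blocks, TraMineR::seqdef)
} else {
  seq_list <- apply_by_block(df, col_blocks, identity)
}
```

Each element of `seq_list` corresponds to one block of 10 columns, so `seq_list[[2]]` holds the result for columns 11-20.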

How to use discrepancy analysis with TraMineR and aggregated sequence data?

北城余情 posted on 2019-12-09 01:08:57
Question: As I have a big dataset and only limited computational resources, I want to make use of aggregated sequence objects for a discrepancy analysis using the R packages TraMineR and WeightedCluster. But I struggle to find the right syntax for doing so. In the example code below you find two discrepancy analyses: the first tree diagram uses the original dataset, the second uses aggregated data (that is, only unique sequences weighted by their frequencies).

How to get several columns from BigQuery?

情到浓时终转凉″ posted on 2019-12-08 05:13:18
Question: I am querying the GitHub public dataset on BigQuery. Currently, my best query for what I need looks like the following. SELECT type, created_at, repository_name FROM [githubarchive:github.timeline] WHERE (created_at CONTAINS '2012-') AND repository_owner="twitter" ORDER BY created_at, repository_name; This gives me all the events ("type") from the repository_owner twitter (or any other user) for all the repositories ("repository_name") that this user owns, but in a single column. However,

How to get several columns from BigQuery?

≯℡__Kan透↙ posted on 2019-12-07 03:13:19
I am querying the github public dataset on BigQuery. Currently, my best query for what I need looks like the following. SELECT type, created_at, repository_name FROM [githubarchive:github.timeline] WHERE (created_at CONTAINS '2012-') AND repository_owner="twitter" ORDER BY created_at, repository_name; This gives me all the events ("type") from the repository_owner twitter (or any other user) for all the repositories ("repository_name") that this user owns, but in a single column. However, what I really want is to have all the events ("type") in columns, one column for each repository (

How to get the largest possible column sequence with the least possible row NAs from a huge matrix?

自闭症网瘾萝莉.ら posted on 2019-12-06 04:12:56
I want to select columns from a data frame so that the resulting continuous column sequences are as long as possible, while the number of rows with NAs is as small as possible, because those rows have to be dropped afterwards. (The reason is that I want to run TraMineR::seqsubm() to automatically get a matrix of transition costs (by transition probability) and later run cluster::agnes() on it. TraMineR::seqsubm() doesn't like NA states and cluster::agnes() with NA states in the matrix doesn't necessarily make much sense.) For that purpose I already wrote a working function that
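The trade-off described above can be made concrete by scoring every contiguous column range by how many rows it leaves NA-free. The sketch below is not the asker's function (which is cut off in the excerpt); the matrix `m`, the helper `window_scores`, and the ranking criterion (retained cells = columns × complete rows) are all assumptions made for illustration. It is a brute-force scan, quadratic in the number of columns, so a huge matrix would need a smarter search.

```r
# Hypothetical data: 6 rows, 5 columns, with a few NAs.
m <- matrix(c(
  1,  NA, 1,  1, 1,
  1,  1,  1,  1, NA,
  NA, 1,  1,  1, 1,
  1,  1,  1,  1, 1,
  1,  1,  NA, 1, 1,
  1,  1,  1,  1, 1
), nrow = 6, byrow = TRUE)

# For every contiguous column range [from, to], count the rows that contain
# no NA inside that range, then rank ranges by the number of retained cells
# (assumed criterion: n_cols * n_rows).
window_scores <- function(x) {
  na_mat <- is.na(x)
  p <- ncol(x)
  res <- do.call(rbind, lapply(seq_len(p), function(from) {
    do.call(rbind, lapply(from:p, function(to) {
      n_complete <- sum(rowSums(na_mat[, from:to, drop = FALSE]) == 0)
      data.frame(from = from, to = to,
                 n_cols = to - from + 1, n_rows = n_complete)
    }))
  }))
  res$cells <- res$n_cols * res$n_rows
  res[order(-res$cells), ]
}

scores <- window_scores(m)
head(scores, 3)  # best-ranked column ranges under the assumed criterion
```

A different weighting of sequence length against dropped rows only requires changing how `cells` is computed.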

Is it possible to make a graph with pattern fills using TraMineR and R base graphs?

喜欢而已 posted on 2019-12-04 16:50:33
A common problem in the publication of a sequence analysis, or generally of graphs with many categorical states, is that they are not easily transferable to b/w paper publications. There are some tools, like Colorbrewer, which can help to make a well-informed decision on grey-scale colors. Nonetheless, the results are unsatisfactory if the color palette contains five or more shades of grey. Thus, it would be really helpful to add pattern fills to certain graph areas in these cases (although this is not recommended by the famous Edward Tufte). Would it be possible to

An “asymmetric” pairwise distance matrix

删除回忆录丶 posted on 2019-12-04 06:00:53
Suppose there are three sequences to be compared: a, b, and c. Traditionally, the resulting 3-by-3 pairwise distance matrix is symmetric, indicating that the distance from a to b is equal to the distance from b to a. I am wondering if TraMineR provides some way to produce an asymmetric pairwise distance matrix.
Answer: No, TraMineR does not produce 'asymmetric' dissimilarities, precisely for the reasons stressed in Pat's comment. The main interest of computing pairwise dissimilarities between sequences is that once we have such dissimilarities we can for instance measure the discrepancy among sequences,

Creating a sequence object from SPELL data

生来就可爱ヽ(ⅴ<●) posted on 2019-12-03 14:19:57
This question was migrated from Cross Validated because it can be answered on Stack Overflow. I am trying to create a sequence object with seqdef using SPELL format. Here is an example of my data: spell <- structure(list(ID = c(1, 3, 3, 4, 5, 5, 6, 8, 9, 10, 11, 11, 12, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19), status = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 3, 1, 1, 1, 3, 1, 3, 3, 1, 3, 1, 1, 1, 1, 1, 3, 3, 1, 3, 1, 1, 1), time1 = c(1, 1,

Definition of sequence notation…(A), (A>B), and (A) - (A>B)

余生颓废 posted on 2019-11-30 18:02:15
Question: Hopefully a quick one. Regarding the output from seqefsub() operations, please point me to a definition of the output notation. To be more specific: what do the parentheses in, e.g., (A) mean; what does the greater-than sign in (A>B) mean; and what does the hyphen in (A)-(A>B) mean? Section 10 of the excellent User Guide has examples, but I may have missed an unambiguous definition statement somewhere. To quote the example in Section 10.2 of the guide, what is the conceptual difference between