data-analysis

Difference of values that belong to same group but stored in two rows

别来无恙 submitted on 2020-01-24 18:50:00
Question: I need to fetch two specific records with two different values and find the difference between their amounts. This needs to be done for each device. Take the following table as an example:

DevID  reason  amount  DateTime
--------------------------------------------------
99     5       84      18-12-2016 18:10
99     0       35      18-12-2016 18:11
99     0       80      18-12-2016 18:12
99     0       34      18-12-2016 18:15
23     5       36      18-12-2016 18:16
23     4       22      18-12-2016 18:17
23     1       22      18-12-2016 18:18
23     2       22      18-12-2016 18:19
99     2       11      18
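One way to approach this, sketched in Python with the standard library only, is to pick the amount recorded for each of the two reason codes per device and subtract them. The excerpt is cut off before it says which two records should be compared, so the reason codes 5 and 2 used here are purely an assumption, as are the function and variable names. The sample rows are the ones visible in the table, with the timestamps left out.

```python
# Rows visible in the question's table: (DevID, reason, amount).
ROWS = [
    (99, 5, 84), (99, 0, 35), (99, 0, 80), (99, 0, 34),
    (23, 5, 36), (23, 4, 22), (23, 1, 22), (23, 2, 22),
    (99, 2, 11),
]


def amount_differences(rows, reason_a=5, reason_b=2):
    """Return {DevID: amount(reason_a) - amount(reason_b)}.

    reason_a and reason_b are hypothetical defaults: the original question
    is truncated before it names the two records to compare.
    Devices lacking either record are skipped.
    """
    picked = {}  # DevID -> {reason: amount}
    for dev_id, reason, amount in rows:
        if reason in (reason_a, reason_b):
            # If a reason repeats for a device, the last row seen wins.
            picked.setdefault(dev_id, {})[reason] = amount
    return {
        dev_id: amounts[reason_a] - amounts[reason_b]
        for dev_id, amounts in picked.items()
        if reason_a in amounts and reason_b in amounts
    }


print(amount_differences(ROWS))
```

Under these assumptions device 99 gives 84 - 11 = 73 and device 23 gives 36 - 22 = 14. In SQL the same idea is commonly written as a conditional aggregate (a CASE expression inside MAX or SUM) grouped by DevID.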

Violin plot for positive values with python

对着背影说爱祢 submitted on 2020-01-24 12:29:06
Question: I find violin plots very informative and useful, and I draw them with the Python library 'seaborn'. However, when applied to positive values, they nearly always show negative values at the lower end. I find this really misleading, especially when working with real-life datasets. In the official seaborn documentation https://seaborn.pydata.org/generated/seaborn.violinplot.html one can see examples with "total_bill" and "tip", which cannot be negative. The violin plots show negative values, however. For
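The negative tail comes from the smoothing step rather than from the data: a violin plot draws a kernel density estimate, and each Gaussian kernel spreads some mass beyond the smallest observation. The sketch below illustrates this with the standard library only. The sample values, the bandwidth, and the function names are made up for illustration. seaborn's violinplot documents a cut parameter, and setting cut=0 limits the violin to the range of the observed data, which is the effect clipped_density imitates here.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data) / norm

    return density


# Hypothetical strictly positive sample (not taken from any real dataset).
tips = [1.0, 1.5, 2.0, 2.0, 3.0, 3.5, 5.0]
density = gaussian_kde(tips, bandwidth=1.0)


def clipped_density(x):
    """Density drawn only inside the observed data range (tails cut off)."""
    return density(x) if min(tips) <= x <= max(tips) else 0.0


# The smoothed curve is non-zero below the smallest observation, even below 0.
print(density(-0.5) > 0)
# Once clipped to the data range, nothing is drawn at negative values.
print(clipped_density(-0.5))
```

Clipping only changes what is drawn; it does not renormalize the estimate, so the data near the boundary is still smoothed by kernels that extend past it.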

How to extract non NA values in a list or dict from a pandas dataframe

风流意气都作罢 submitted on 2020-01-24 10:15:17
Question: I have a df like this:

   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50

df_mask = pd.DataFrame({'AAA' : [True] * 4, 'BBB' : [False] * 4,'CCC' : [True,False] * 2})

and df.where(df_mask) is

   AAA  BBB    CCC
0    4  NaN  100.0
1    5  NaN    NaN
2    6  NaN  -30.0
3    7  NaN    NaN

I am trying to extract the non-null values. I tried

df[df.where(df_mask).notnull()].to_dict()

but it gives all the values. My expected output is

{'AAA': {0: 4, 1: 5, 2: 6, 3: 7}, 'CCC': {0: 100.0, 2: -30.0}}

Answer 1: Let's use
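The answer in this excerpt is cut off, so its actual method is unknown. Independently of it, one possible approach is sketched below: drop the NaN entries column by column and skip any column that ends up empty. It assumes pandas is installed and rebuilds df and df_mask from the values shown in the question; the names masked and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "AAA": [4, 5, 6, 7],
    "BBB": [10, 20, 30, 40],
    "CCC": [100, 50, -30, -50],
})
df_mask = pd.DataFrame({
    "AAA": [True] * 4,
    "BBB": [False] * 4,
    "CCC": [True, False] * 2,
})

# Keep values where the mask is True; everything else becomes NaN.
masked = df.where(df_mask)

# Drop the NaN entries column by column and skip columns with nothing left.
result = {
    col: values.dropna().to_dict()
    for col, values in masked.items()
    if values.notna().any()
}
print(result)
```

The original attempt indexes df with a boolean frame, which keeps the frame's full shape, so to_dict() still lists every cell; dropping NaN per column is what removes the unwanted entries.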

How to groupby count across multiple columns in pandas

筅森魡賤 submitted on 2020-01-23 12:08:14
Question: I have the following sample dataframe in Python pandas:

+---+------+------+------+
|   | col1 | col2 | col3 |
+---+------+------+------+
| 0 | a    | d    | b    |
+---+------+------+------+
| 1 | a    | c    | b    |
+---+------+------+------+
| 2 | c    | b    | c    |
+---+------+------+------+
| 3 | b    | b    | c    |
+---+------+------+------+
| 4 | a    | a    | d    |
+---+------+------+------+

I would like to perform a count of all the 'a', 'b', 'c', and 'd' values across columns 1-3 so that I would end up with a dataframe like
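The desired output layout is cut off in this excerpt, so the sketch below shows two plausible readings: counts of each value within every column, and totals across all three columns. It assumes pandas is installed and rebuilds the sample frame from the table; the names per_column and totals are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "c", "b", "a"],
    "col2": ["d", "c", "b", "b", "a"],
    "col3": ["b", "b", "c", "c", "d"],
})

# Count each value separately within every column; a value absent from a
# column yields NaN, which is replaced by 0.
per_column = df.apply(pd.Series.value_counts).fillna(0).astype(int)

# Total occurrences of each value across all three columns.
totals = per_column.sum(axis=1)

print(per_column)
print(totals)
```

The per-column table keeps one row per distinct value and one column per original column, which makes it easy to check the totals by hand against the sample.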

Random Number Generation to Memory from a Distribution using VBA

孤者浪人 submitted on 2020-01-23 01:25:26
Question: I want to generate random numbers from a selected distribution in VBA (Excel 2007). I'm currently using the Analysis Toolpak with the following code:

Application.Run "ATPVBAEN.XLAM!Random", "", A, B, C, D, E, F

where
A = how many variables are to be randomly generated
B = number of random numbers generated per variable
C = number corresponding to a distribution (1 = Uniform, 2 = Normal, 3 = Bernoulli, 4 = Binomial, 5 = Poisson, 6 = Patterned, 7 = Discrete)
D = random number seed
E = parameter of

Real-time peak detection in noisy sinusoidal time-series

。_饼干妹妹 submitted on 2020-01-22 10:27:25
Question: I have been attempting to detect peaks in sinusoidal time-series data in real time, but I've had no success thus far. I cannot seem to find a real-time algorithm that detects peaks in sinusoidal signals with a reasonable level of accuracy. I either get no peaks detected, or I get a zillion points along the sine wave being detected as peaks. What is a good real-time algorithm for input signals that resemble a sine wave and may contain some random noise? As a simple test case,
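As a rough illustration of one simple streaming approach (not a recommendation from the original thread), the sketch below smooths the input with an exponential moving average, reports a peak when the smoothed slope turns from rising to falling, and enforces a minimum spacing between peaks. The class name, the alpha and min_distance defaults, and the clean test sine are all assumptions made for this example.

```python
import math


class StreamingPeakDetector:
    """Flag local maxima of an exponentially smoothed signal, sample by sample."""

    def __init__(self, alpha=0.3, min_distance=10):
        self.alpha = alpha                # smoothing factor, 0 < alpha <= 1
        self.min_distance = min_distance  # minimum samples between two peaks
        self.smoothed = None              # previous smoothed value
        self.rising = False               # was the smoothed signal rising?
        self.index = -1                   # index of the latest sample
        self.last_peak = None             # index of the last reported peak

    def update(self, sample):
        """Feed one sample; return the index of a detected peak, or None."""
        self.index += 1
        if self.smoothed is None:
            self.smoothed = sample
            return None
        new = self.alpha * sample + (1 - self.alpha) * self.smoothed
        slope = new - self.smoothed
        self.smoothed = new
        peak = None
        if self.rising and slope <= 0:
            candidate = self.index - 1  # previous sample was the smoothed maximum
            if self.last_peak is None or candidate - self.last_peak >= self.min_distance:
                peak = self.last_peak = candidate
        self.rising = slope > 0
        return peak


# Hypothetical test input: a noise-free sine with a period of 50 samples.
detector = StreamingPeakDetector()
signal = [math.sin(2 * math.pi * n / 50) for n in range(200)]
peaks = [p for p in map(detector.update, signal) if p is not None]
print(peaks)
```

Because of the smoothing, reported peaks trail the true maxima by a few samples. With heavy noise a smaller alpha, a larger min_distance, or an additional amplitude threshold would likely be needed, so the defaults here should be treated as starting points only.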

SQL: count all records with consecutive occurrence of same value for each device set and return the highest count

为君一笑 submitted on 2020-01-21 10:24:20
Question: I want to find out how many times a particular value occurred consecutively within a particular partition and then display the highest count for that partition. For example, if below is the table:

Device ID     speed  DateTime
--------------------------------------------------
07777778999   34     18-12-2016 17:15
07777778123   15     18-12-2016 18:10
07777778999   34     19-12-2016 19:30
07777778999   34     19-12-2016 12:15
07777778999   20     19-12-2016 13:15
07777778999   20     20-12-2016 11:15
07777778123   15     20-12-2016 9:15
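The logic can be sketched in Python with the standard library: order each device's rows by time, split them into runs of equal speed, and keep the longest run. The expected output is cut off in this excerpt, so reading the task as "longest run per device" is an assumption, as are the function name and the choice to keep the earliest run on ties. The rows are the ones visible in the table.

```python
from datetime import datetime
from itertools import groupby

# Rows visible in the question's table: (device_id, speed, timestamp).
ROWS = [
    ("07777778999", 34, "18-12-2016 17:15"),
    ("07777778123", 15, "18-12-2016 18:10"),
    ("07777778999", 34, "19-12-2016 19:30"),
    ("07777778999", 34, "19-12-2016 12:15"),
    ("07777778999", 20, "19-12-2016 13:15"),
    ("07777778999", 20, "20-12-2016 11:15"),
    ("07777778123", 15, "20-12-2016 9:15"),
]


def longest_runs(rows):
    """Return {device_id: (speed, run_length)} for each device's longest run."""
    # Sort by device first, then chronologically within each device.
    parsed = sorted(
        (dev, datetime.strptime(ts, "%d-%m-%Y %H:%M"), speed)
        for dev, speed, ts in rows
    )
    best = {}
    for dev, dev_rows in groupby(parsed, key=lambda r: r[0]):
        # Consecutive rows with the same speed form one run.
        for speed, run in groupby(dev_rows, key=lambda r: r[2]):
            length = sum(1 for _ in run)
            if dev not in best or length > best[dev][1]:
                best[dev] = (speed, length)
    return best


print(longest_runs(ROWS))
```

For device 07777778999 the time-ordered speeds are 34, 34, 20, 34, 20, so its longest run is two rows at speed 34. In SQL this kind of task is usually described as a gaps-and-islands problem and handled with window functions.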

One of the omegaSem function arguments is an object not found

十年热恋 submitted on 2020-01-16 09:09:34
Question: I am reformulating this question because the information I provided before does not seem to have been clear. I'm new to R, so I'm not yet able to identify the most common and simple errors. I'm following a tutorial to perform a McDonald's omega analysis to estimate the reliability of a psychometric test. Here is the link, so that the source of the information I'm using is clear: http://personality-project.org/r/psych/HowTo/omega.pdf To run that analysis, we work with the "psych