information-theory

scipy.stats attribute `entropy` for continuous distributions doesn't work manually

≡放荡痞女 submitted on 2021-01-28 12:15:25
Question: Each continuous distribution in scipy.stats comes with an attribute that calculates its differential entropy: .entropy. Unlike the normal distribution (norm) and others that have a closed-form solution for entropy, other distributions have to rely on numerical integration. Trying to find out which function the .entropy attribute calls in those cases, I found a function called _entropy in scipy.stats._distn_infrastructure.py that does so with integrate.quad(pdf) (numerical integration)…
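A minimal sketch of what that numerical route amounts to (my own illustration, not the asker's code): the differential entropy is H = -∫ pdf(x) ln pdf(x) dx, so quadrature over the distribution's support should closely match .entropy(). The choice of gamma(a=2) is arbitrary; any well-behaved continuous distribution works.

```python
import numpy as np
from scipy import stats, integrate

dist = stats.gamma(2.0)

# Differential entropy by direct quadrature: -integral of pdf * ln(pdf)
# over the distribution's support.
def integrand(x):
    p = dist.pdf(x)
    return -p * np.log(p) if p > 0 else 0.0

manual, _err = integrate.quad(integrand, *dist.support())
print(manual, dist.entropy())  # the two values should agree closely
```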

What's the most that GZIP or DEFLATE can increase a file size?

生来就可爱ヽ(ⅴ<●) submitted on 2020-01-02 03:01:57
Question: It's well known that GZIP or DEFLATE (or any compression mechanism) can sometimes increase file size. Is there a maximum (either a percentage or a constant) by which a file can be increased? What is it? If a file is X bytes and I'm going to gzip it, and I need to budget for file space in advance, what's the worst-case scenario? UPDATE: There are two overheads: GZIP adds a header, typically 18 bytes but essentially arbitrarily long. What about DEFLATE? That can expand content by a multiplicative…
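For budgeting, a rough worst-case sketch (my own arithmetic, not from an answer): the gzip container adds about 18 bytes of fixed overhead (a 10-byte header plus an 8-byte CRC32/length trailer, more if optional fields such as a filename are present), and DEFLATE can always fall back to "stored" blocks, each adding about 5 bytes of framing per at most 65,535 bytes of payload.

```python
import math

def gzip_worst_case(x_bytes: int) -> int:
    # Incompressible data ends up in stored blocks: ~5 bytes of framing
    # per 65,535-byte block, plus the fixed gzip header and trailer.
    deflate = x_bytes + 5 * max(1, math.ceil(x_bytes / 65535))
    return deflate + 18

print(gzip_worst_case(1_000_000))  # ~1,000,098 bytes for a 1 MB input
```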

Optimal way to compute pairwise mutual information using numpy

给你一囗甜甜゛ submitted on 2019-12-28 07:42:11
Question: For an m x n matrix, what's the optimal (fastest) way to compute the mutual information for all pairs of columns (n x n)? By mutual information, I mean I(X, Y) = H(X) + H(Y) - H(X, Y), where H(X) refers to the Shannon entropy of X. Currently I'm using np.histogram2d and np.histogram to calculate the joint (X, Y) and individual (X or Y) counts. For a given matrix A (e.g. a 250000 x 1000 matrix of floats), I am doing a nested for loop:

    n = A.shape[1]
    for ix in arange(n):
        for jx in arange(ix + 1, n):
            …
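A sketch of the per-pair step described above (an assumed reconstruction, not the asker's exact code): each pair needs only one np.histogram2d call, since both marginals can be read off the joint histogram by summing rows and columns, and the three entropies combine as I(X, Y) = H(X) + H(Y) - H(X, Y).

```python
import numpy as np

def entropy_from_counts(counts):
    # Shannon entropy in nats from raw histogram counts
    # (use np.log2 instead of np.log for bits).
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def mutual_info(x, y, bins=10):
    jxy, _, _ = np.histogram2d(x, y, bins=bins)
    hx = entropy_from_counts(jxy.sum(axis=1))   # marginal of X
    hy = entropy_from_counts(jxy.sum(axis=0))   # marginal of Y
    hxy = entropy_from_counts(jxy.ravel())      # joint (X, Y)
    return hx + hy - hxy

# Pairwise over columns, mirroring the nested loop in the question.
rng = np.random.default_rng(0)
A = rng.random((1000, 5))
n = A.shape[1]
mi = np.zeros((n, n))
for ix in range(n):
    for jx in range(ix + 1, n):
        mi[ix, jx] = mi[jx, ix] = mutual_info(A[:, ix], A[:, jx])
```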

Finding conditional mutual information from 3 discrete variables

走远了吗. submitted on 2019-12-25 03:57:22
Question: I am trying to find the conditional mutual information among three discrete random variables using the pyitlib package for Python, with the help of the formula: I(X;Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z). The expected conditional mutual information value is 0.011. My 1st code:

    import numpy as np
    from pyitlib import discrete_random_variable as drv
    X = [0,1,1,0,1,0,1,0,0,1,0,0]
    Y = [0,1,1,0,0,0,1,0,0,1,1,0]
    Z = [1,0,0,1,1,0,0,1,1,0,0,1]
    a = drv.entropy_conditional(X, Z)
    ##print(a)
    b = drv.entropy_conditional(Y, Z)
    ##print(b)
    …
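One way to complete the computation (an assumption on my part, not the accepted answer): the remaining term H(X,Y|Z) needs the pair (X, Y) treated as a single discrete variable, which for binary data can be done by encoding each pair as one integer and reusing the same entropy_conditional call.

```python
import numpy as np
from pyitlib import discrete_random_variable as drv

X = np.array([0,1,1,0,1,0,1,0,0,1,0,0])
Y = np.array([0,1,1,0,0,0,1,0,0,1,1,0])
Z = np.array([1,0,0,1,1,0,0,1,1,0,0,1])

XY = X * 2 + Y  # unique integer code for each (x, y) pair of binary values

cmi = (drv.entropy_conditional(X, Z)
       + drv.entropy_conditional(Y, Z)
       - drv.entropy_conditional(XY, Z))
print(cmi)
```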

How to compute the edit distance between two very long strings in C++?

北城余情 submitted on 2019-12-23 05:00:22
Question: I would like to solve the Levenshtein distance problem where the strings are very long. Edit 2: As Bobah said, the title was misleading, so I have updated the title of the question. The initial title was "how to declare a 100000x100000 2-D integer array in C++?" and the content was: Is there any way to declare int x[100000][100000] in C++? When I declare it globally, the compiler produces the error: size of array 'x' is too large. One method could be using map<pair<int, int>, int> mymap. But allocating and…
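The usual way around the memory problem is to avoid the 100000x100000 table entirely: the Levenshtein recurrence only ever reads the previous row, so O(min(m, n)) memory suffices. A sketch of that standard space reduction (my illustration, not the accepted answer; shown in Python for brevity, though in C++ the same idea is two std::vector<int> rows):

```python
def levenshtein(a: str, b: str) -> int:
    if len(a) < len(b):
        a, b = b, a            # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```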

Theory: Compression algorithm that makes some files smaller but none bigger?

瘦欲@ submitted on 2019-12-22 08:00:11
Question: I came across this question: "A lossless compression algorithm claims to guarantee to make some files smaller and no files larger. Is this: a) Impossible; b) Possible, but it may run for an indeterminate amount of time; c) Possible for compression factor 2 or less; d) Possible for any compression factor?" I'm leaning towards (a), but couldn't give a solid explanation as to why. (I'll list the thoughts a friend and I came up with as a possible answer.) Answer 1: By the pigeon-hole principle, given a…
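The counting behind the pigeonhole argument can be made concrete (my own illustration): there are 2^n bit-strings of length exactly n but only 2^n - 1 of length strictly less than n, so a lossless (injective) compressor that never lengthens any string is a bijection on each set of strings of length ≤ n and has no room to strictly shorten even one of them.

```python
# 2**n strings of length n vs. 2**n - 1 strings of all shorter lengths:
# the shorter strings cannot absorb even one extra input injectively.
n = 8
print(2**n, sum(2**k for k in range(n)))  # 256 vs 255
```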