bayesian

Large scale naïve Bayes classifier with top-k output

你说的曾经没有我的故事 submitted on 2019-12-06 05:50:41
I need a library for large-scale naïve Bayes classification, with millions of training examples and more than 100k binary features. It must be an online version (updatable after training). I also need top-k output, that is, multiple classifications for a single instance. Accuracy is not very important. The purpose is an automatic text categorization application. Any suggestions for a good library are much appreciated. EDIT: The library should preferably be in Java. If a learning algorithm other than naïve Bayes is also acceptable, then check out Vowpal Wabbit (C++), which has the reputation of being one of the best
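The requirements in the question (online updates, sparse binary features, top-k labels) can be illustrated with a minimal sketch. This is not a library recommendation and not Vowpal Wabbit's algorithm; the class name `OnlineNaiveBayes`, its methods, and the smoothing constant `alpha` are assumptions made for illustration, and it is written in Python rather than the Java the asker prefers.

```python
import heapq
import math
from collections import defaultdict


class OnlineNaiveBayes:
    """Hypothetical naive Bayes over sparse binary features, updatable one example at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant (assumed value)
        self.class_counts = defaultdict(int)    # label -> number of training examples
        self.feature_counts = defaultdict(lambda: defaultdict(int))  # label -> feature -> count
        self.feature_totals = defaultdict(int)  # label -> total active features seen
        self.vocabulary = set()                 # every feature seen so far

    def update(self, features, label):
        """Fold a single labelled example into the counts (the online step)."""
        self.class_counts[label] += 1
        for feature in set(features):
            self.feature_counts[label][feature] += 1
            self.feature_totals[label] += 1
            self.vocabulary.add(feature)

    def log_scores(self, features):
        """Return the unnormalised log posterior of every known label."""
        n_examples = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary) + 1   # one extra slot for unseen features
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / n_examples)
            denom = self.feature_totals[label] + self.alpha * vocab_size
            for feature in set(features):
                seen = self.feature_counts[label].get(feature, 0)
                score += math.log((seen + self.alpha) / denom)
            scores[label] = score
        return scores

    def top_k(self, features, k=3):
        """Return the k most probable labels, best first."""
        scores = self.log_scores(features)
        return heapq.nlargest(k, scores, key=scores.get)


model = OnlineNaiveBayes()
model.update(["goal", "match", "team"], "sports")
model.update(["election", "vote"], "politics")
model.update(["stock", "market"], "finance")
print(model.top_k(["match", "team"], k=2))
```

Only the active features of each example are stored, so memory grows with the number of distinct (label, feature) pairs rather than with the full 100k-wide feature space.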

What is a relatively simple way to determine the probability that a sentence is in English?

只谈情不闲聊 submitted on 2019-12-05 23:23:07
Question: I have a number of strings (collections of characters) that represent sentences in different languages, say: Hello, my name is George. Das brot ist gut. ... etc. I want to assign each of them a score (from 0 to 1) indicating the likelihood that it is an English sentence. Is there an accepted algorithm (or Python library) for doing this? Note: I don't care if the grammar of the English sentence is perfect. Answer 1: A Bayesian classifier would be a good choice for this task: >>> from
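The answer's code is cut off in this excerpt, so the following is a separate minimal sketch of the Bayesian-classifier idea, not the answerer's implementation. It scores a sentence with a two-class naive Bayes model over character trigrams. The two training strings are tiny toy samples invented for illustration, and the score is only "probability of English relative to the one other sample language" under equal priors; a usable version would need far larger corpora.

```python
import math
from collections import Counter

# Toy training text, invented for illustration only.
ENGLISH_SAMPLE = ("the quick brown fox jumps over the lazy dog my name is john "
                  "and this is my house the bread is good")
OTHER_SAMPLE = ("der schnelle braune fuchs springt ueber den faulen hund mein name "
                "ist johann und das ist mein haus das brot ist gut")


def trigrams(text):
    """Lowercase, drop everything except letters and whitespace, return character trigrams."""
    cleaned = "".join(c for c in text.lower() if c.isalpha() or c.isspace())
    return [cleaned[i:i + 3] for i in range(len(cleaned) - 2)]


ENGLISH_COUNTS = Counter(trigrams(ENGLISH_SAMPLE))
OTHER_COUNTS = Counter(trigrams(OTHER_SAMPLE))
# Shared vocabulary size, plus one slot for trigrams never seen in training.
VOCAB_SIZE = len(set(ENGLISH_COUNTS) | set(OTHER_COUNTS)) + 1


def log_likelihood(text, counts, alpha=1.0):
    """Sum of Laplace-smoothed log probabilities of the text's trigrams."""
    total = sum(counts.values())
    return sum(math.log((counts[g] + alpha) / (total + alpha * VOCAB_SIZE))
               for g in trigrams(text))


def english_probability(text):
    """Posterior probability of the English class, assuming equal class priors."""
    log_odds = log_likelihood(text, ENGLISH_COUNTS) - log_likelihood(text, OTHER_COUNTS)
    log_odds = max(min(log_odds, 700.0), -700.0)  # keep math.exp within float range
    return 1.0 / (1.0 + math.exp(-log_odds))


for sentence in ["Hello, my name is George.", "Das brot ist gut."]:
    print(sentence, round(english_probability(sentence), 3))
```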

Bayesian Correlation with PyMC3

一笑奈何 submitted on 2019-12-05 21:45:04
I'm trying to convert this example of Bayesian correlation for PyMC2 to PyMC3, but get completely different results. Most importantly, the mean of the multivariate Normal distribution quickly goes to zero, whereas it should be around 400 (as it is for PyMC2). Consequently, the estimated correlation quickly goes towards 1, which is wrong as well. The full code is available in this notebook for PyMC2 and in this notebook for PyMC3. The relevant code for PyMC2 is

    def analyze(data):
        # priors might be adapted here to be less flat
        mu = pymc.Normal('mu', 0, 0.000001, size=2)
        sigma = pymc.Uniform(

Ranking Contest Results of Images with 5-Star Ratings

你。 submitted on 2019-12-05 21:23:39
I run a calendar photo contest that uses a 5-star rating system, which ranks the images according to their average rating. However, I would like to factor in the total number of votes a photo receives to get a more accurate ranking. For example, I do not want an image with one 5-star vote (avg rating: 5) to be ranked above an image with ten 5-star votes and one 4-star vote (avg rating: 4.9). I know this topic has been raised before, but I can't seem to find a straightforward answer to apply to my particular situation. The Evan Miller site goes way over my head… I'm just looking for a simple
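One simple way to factor in the vote count is a Bayesian average, sketched below. This is an assumption about what would fit the asker's needs, not necessarily the method from the Evan Miller article. Each photo is treated as if it started with `prior_weight` imaginary votes at `prior_mean` stars; both values are hypothetical tuning choices.

```python
def bayesian_average(ratings, prior_mean=3.0, prior_weight=5):
    """Average that shrinks towards prior_mean when a photo has few votes.

    prior_mean and prior_weight are assumed tuning values: the photo is treated
    as if it already had prior_weight votes of prior_mean stars each.
    """
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))


single_vote = [5]                # one 5-star vote, plain average 5.0
many_votes = [5] * 10 + [4]      # ten 5-star votes and one 4-star vote, plain average about 4.9

ranked = sorted([("single_vote", single_vote), ("many_votes", many_votes)],
                key=lambda item: bayesian_average(item[1]), reverse=True)
for name, ratings in ranked:
    print(name, round(bayesian_average(ratings), 3))
```

A larger `prior_weight` demands more votes before a photo's own average dominates its score; a natural choice for `prior_mean` is the average rating across all photos in the contest.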

Bayesian Linear Regression with PyMC3 and a large dataset - bracket nesting level exceeded maximum and slow performance

空扰寡人 submitted on 2019-12-05 21:14:53
I would like to use a Bayesian multivariate linear regression to estimate the strength of players in team sports (e.g. ice hockey, basketball or soccer). For that purpose, I create a matrix, X, with the players as columns and the matches as rows. For each match, a player's entry is 1 (the player plays for the home team), -1 (the player plays for the away team) or 0 (the player does not take part in the game). The dependent variable Y is defined as the score difference between the two teams in each match (Score_home_team - Score_away_team). Thus, the number of parameters will be quite large for one
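The encoding described above can be sketched in a few lines. The match record format (dictionaries with `home`, `away`, `home_score` and `away_score` keys) and the function name `build_design` are hypothetical; the question does not show how the data is stored.

```python
def build_design(matches, players):
    """Build X (one row per match, one column per player) and Y (home minus away score).

    X[i][j] is 1 if player j played for the home team in match i,
    -1 if they played for the away team, and 0 if they did not play.
    """
    column = {player: j for j, player in enumerate(players)}
    X, Y = [], []
    for match in matches:
        row = [0] * len(players)
        for player in match["home"]:
            row[column[player]] = 1
        for player in match["away"]:
            row[column[player]] = -1
        X.append(row)
        Y.append(match["home_score"] - match["away_score"])
    return X, Y


# Made-up example data.
players = ["ann", "bob", "cat", "dan"]
matches = [
    {"home": ["ann", "bob"], "away": ["cat"], "home_score": 3, "away_score": 1},
    {"home": ["dan"], "away": ["ann"], "home_score": 0, "away_score": 2},
]
X, Y = build_design(matches, players)
print(X)
print(Y)
```

Because most players sit out most matches, X is mostly zeros, which is worth keeping in mind when the number of players and matches grows.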

Looking for open source naive Bayesian Classifier in C# for a Twitter sentiment analysis project [closed]

落爺英雄遲暮 submitted on 2019-12-05 15:44:33
Question: Closed 6 years ago. I've found a similar project here: Sentiment analysis for Twitter in Python. However, I'm working in C# and need to use a naive

Bayesian Probabilistic Matrix Factorization (BPMF) with PyMC3: PositiveDefiniteError using `NUTS`

我与影子孤独终老i submitted on 2019-12-05 13:00:50
This question was migrated from Cross Validated because it can be answered on Stack Overflow. Migrated 4 years ago. I've implemented the Bayesian Probabilistic Matrix Factorization algorithm using pymc3 in Python. I also implemented its precursor, Probabilistic Matrix Factorization (PMF). See my previous question for a reference to the data used here. I'm having trouble drawing MCMC samples using the NUTS sampler. I initialize the model parameters using the MAP from PMF, and the hyperparameters using Gaussian random draws sprinkled around 0. However, I get a PositiveDefiniteError when

Highest Density Interval (HDI) for Posterior Distribution Pystan

拈花ヽ惹草 submitted on 2019-12-05 07:08:55
Question: I see that in PyStan an HDI function can be used to provide a 95% credible interval for the posterior distribution. However, it is said to work only for unimodal distributions. If my model may have a multimodal distribution (up to 4 peaks), is there a way I can find the HDI in PyStan? Thanks! Answer 1: I wouldn't consider this a Stan/PyStan-specific issue. The Highest Density Interval is by definition a single interval and therefore inappropriate for characterizing multimodal
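To make the answer's point concrete, here is a minimal sketch of how a single-interval HDI is commonly estimated from posterior draws: take the shortest window that contains the requested fraction of the sorted samples. The function name `hdi` is hypothetical and this is not a PyStan API; it works on any list of draws.

```python
import math


def hdi(samples, mass=0.95):
    """Shortest single interval containing at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    inside = math.ceil(mass * n)            # number of draws the interval must cover
    if inside < 1 or inside > n:
        raise ValueError("mass must be in (0, 1] and samples must be non-empty")
    # Slide a window of `inside` consecutive sorted draws and keep the narrowest one.
    start = min(range(n - inside + 1),
                key=lambda i: ordered[i + inside - 1] - ordered[i])
    return ordered[start], ordered[start + inside - 1]


draws = [0, 1, 2, 3, 4, 50, 60, 70, 80, 90]   # made-up draws
print(hdi(draws, mass=0.5))
```

With a multimodal posterior the result is still one interval, so it has to stretch across the low-density gaps between the peaks, which is why a single interval characterizes such a distribution poorly.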

PYMC3 Seasonal Variables

前提是你 submitted on 2019-12-05 06:04:20
I'm relatively new to PYMC3 and I'm trying to implement a Bayesian Structural Time Series (BSTS) model without regressors, for instance the model fit here in R. The model is as follows: I can implement the local linear trend using a GaussianRandomWalk as follows:

    delta = pymc3.GaussianRandomWalk('delta', mu=0, sd=1, shape=99)
    mu = pymc3.GaussianRandomWalk('mu', mu=delta, sd=1, shape=100)

However, I'm at a loss for how to encode the seasonal variable (tau) in PYMC3. Do I need to roll a custom random walk class or is there some other trick?

You can use

    w = pm.Normal('w', sd=sigma_tau, shape=S)
    tau = w - tt

Toy R code on Bayesian inference for mean of a normal distribution [data of snowfall amount]

余生长醉 submitted on 2019-12-05 02:56:02
Question: I have a number of snowfall observations: x <- c(98.044, 107.696, 146.050, 102.870, 131.318, 170.434, 84.836, 154.686, 162.814, 101.854, 103.378, 16.256) and I was told that they follow a normal distribution with a known standard deviation of 25.4 but an unknown mean mu. I have to make inference on mu using Bayes' formula. This is the information on the prior of mu:

mean of snow | 50.8 | 76.2 | 101.6 | 127.0 | 152.4 | 177.8
---------------------------------------------------------------
probability | 0.1 |
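With a discrete prior on mu, Bayes' formula reduces to multiplying each prior probability by the normal likelihood of the data at that value of mu and renormalising. The sketch below shows that calculation in Python rather than the question's R. The prior probabilities are cut off in this excerpt, so the uniform prior used in the example call is a placeholder assumption, not the asker's prior; the function name `grid_posterior` is also hypothetical.

```python
import math


def grid_posterior(data, grid, prior, sigma):
    """Posterior probability of each candidate mean in `grid` for normal data with known sigma."""
    log_post = []
    for mu, p in zip(grid, prior):
        # Normal log-likelihood up to a constant that is the same for every mu.
        log_lik = sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)
        log_prior = math.log(p) if p > 0 else float("-inf")
        log_post.append(log_prior + log_lik)
    peak = max(log_post)                       # subtract the peak for numerical stability
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


x = [98.044, 107.696, 146.050, 102.870, 131.318, 170.434,
     84.836, 154.686, 162.814, 101.854, 103.378, 16.256]
mu_grid = [50.8, 76.2, 101.6, 127.0, 152.4, 177.8]
placeholder_prior = [1.0 / 6.0] * 6            # ASSUMPTION: the real prior is truncated in the source

posterior = grid_posterior(x, mu_grid, placeholder_prior, sigma=25.4)
for mu, p in zip(mu_grid, posterior):
    print(mu, round(p, 4))
```

Replacing `placeholder_prior` with the six probabilities from the question's table gives the posterior the exercise asks for.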