data-analysis

Python - take out the data inside cell of dataframe to another cells

家住魔仙堡 submitted on 2020-05-31 04:03:54
Question: This is the data in a single cell of a dataframe with 14 columns, where a cell is one element of a column. There are 45k+ cells like this, so doing it by hand would be hell. [image: one cell data] I'd like to do three things with this cell: move the text part (address, state, zip) to another column; remove the parentheses (); and split the longitude and latitude into two columns. How can this be done? Answer 1: Here's a simple, working example with 2 data points: text1 = """30881 EKLUTNA LAKE RD CHUGIAK, AK 99567 (61.4478, -149
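The answer above is cut off, so the following is only a minimal sketch of one way to do the split, not the answerer's code. It assumes every cell has the layout "address text (number, number)"; the helper name split_cell and the sample address are made up for illustration, and which of the two numbers is latitude or longitude is left to the reader.

```python
import re

# Assumed cell layout: "<address text> (<number>, <number>)".
# The address part may span several lines, hence re.DOTALL.
CELL_PATTERN = re.compile(
    r"^(?P<address>.*?)\s*"
    r"\(\s*(?P<first>-?\d+(?:\.\d+)?)\s*,\s*(?P<second>-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.DOTALL,
)


def split_cell(cell):
    """Return (address, first_coord, second_coord).

    If the cell does not match the assumed layout, the stripped text is
    returned unchanged with None for both coordinates.
    """
    match = CELL_PATTERN.match(cell)
    if match is None:
        return cell.strip(), None, None
    # Collapse line breaks and repeated spaces inside the address.
    address = " ".join(match.group("address").split())
    return address, float(match.group("first")), float(match.group("second"))


# Made-up sample cell (not taken from the original data).
sample = "123 EXAMPLE ST\nSOMETOWN, AK 99999\n(61.5, -149.25)"
print(split_cell(sample))
```

With pandas, the same helper could be applied to the whole column, for example with Series.apply, and the resulting tuples unpacked into three new columns.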

zeroinflatedpoisson model in python

a 夏天 submitted on 2020-05-30 08:02:53
Question: I want to use Python 3 to build a zero-inflated Poisson model. In the statsmodels library I found the class statsmodels.discrete.count_model.ZeroInflatedPoisson, and I wonder how to use it. It seems I should call ZIFP(Y_train,X_train).fit(). But when I tried to predict using X_test, it told me that the length of X_test doesn't match X_train. Or is there another package that can fit this model? Here is the code I used: X1 = [random.randint(0,1) for i in range(200)] X2 = [random.randint(1,2) for i in

Histogram fitting with python

試著忘記壹切 submitted on 2020-05-13 19:21:32
Question: I've been searching but haven't found the correct way to do the following. I have a histogram made with matplotlib: hist, bins, patches = plt.hist(distance, bins=100, normed='True') From the plot, I can see that the distribution is more or less an exponential (Poisson distribution). How can I find the best fit, taking into account my hist and bins arrays? UPDATE I am using the following approach: x = np.float64(bins) # Had some troubles with data types float128 and float64 hist = np
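As a rough sketch of what such a fit could look like, the code below assumes the data follow an exponential density rate * exp(-rate * x) and estimates the rate in two ways: a weighted straight-line fit to the logarithm of the normalized histogram, and the maximum-likelihood estimate 1 / mean taken directly from the samples. The synthetic distance array, the bin count, and the variable names are assumptions for illustration; this is not the approach from the truncated update above.

```python
import numpy as np

# Synthetic stand-in for the asker's `distance` data (true rate = 0.5).
rng = np.random.default_rng(0)
distance = rng.exponential(scale=2.0, size=10_000)

# Normalized histogram, equivalent to np.histogram(..., density=True).
counts, bins = np.histogram(distance, bins=30)
hist = counts / (counts.sum() * np.diff(bins))
centers = 0.5 * (bins[:-1] + bins[1:])

# log(hist) is linear in x for an exponential density, so fit a line.
# Empty bins are skipped; sqrt(count) weights damp the noisy tail bins.
mask = counts > 0
slope, intercept = np.polyfit(
    centers[mask], np.log(hist[mask]), 1, w=np.sqrt(counts[mask])
)
rate_from_hist = -slope

# Maximum-likelihood estimate from the raw samples, for comparison.
rate_from_samples = 1.0 / distance.mean()

print(rate_from_hist, rate_from_samples)
```

When the raw samples are available, the estimate from the samples does not depend on the choice of bins, which is one reason to prefer it over fitting the histogram.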

Python Pandas - Concat two data frames with different number of rows and columns

前提是你 submitted on 2020-04-30 09:10:28
Question: I have two data frames with different numbers of rows and columns. Both tables have a few columns in common, including "Customer ID". Their sizes are 11697 rows × 15 columns and 385839 rows × 6 columns respectively. A Customer ID may be repeated in the second table. I want to combine the two tables and merge the shared columns using Customer ID. How can I do that with Python pandas? One table looks like this - and the other one looks like this - I am using the code below - pd
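Matching rows on a shared key is usually a merge (join) rather than a concat. The sketch below shows one possible way with DataFrame.merge; the two small frames and the columns other than "Customer ID" are invented for illustration, and the outer join is an assumption about what the asker wants.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
customers = pd.DataFrame(
    {"Customer ID": [1, 2, 3], "Name": ["Ann", "Bob", "Cy"]}
)
orders = pd.DataFrame(
    {"Customer ID": [1, 1, 3, 4], "Amount": [10.0, 20.0, 5.0, 7.5]}
)

# how="outer" keeps customers from both tables; a repeated Customer ID in
# `orders` yields one output row per matching order.
merged = customers.merge(orders, on="Customer ID", how="outer")
print(merged)
```

If only customers present in the first table should be kept, how="left" would be the usual choice; if the tables share further columns, they can be listed in `on` as well.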

Need help pulling JSON data with RSocrata from a website API

偶尔善良 submitted on 2020-04-18 05:49:23
Question: I need help writing code that pulls public data directly from a website that serves it in Socrata format. Here is a link: https://data.cityofchicago.org/Administration-Finance/Current-Employee-Names-Salaries-and-Position-Title/xzkq-xp2w There is an API endpoint: https://data.cityofchicago.org/resource/xzkq-xp2w.json After the data is loaded, null values in the "Annual Salary" column should be replaced with 50000. Answer 1: We can use the RSocrata package library(RSocrata) url <- "https://data.cityofchicago

Masking Using Pixel Statistics

£可爱£侵袭症+ submitted on 2020-03-23 08:55:33
Question: I'm trying to mask bad pixels in a dataset taken from a detector. To come up with a general way of doing this, so that I can run the same code across different images, I tried a few different methods, but none of them worked. I'm pretty new to coding and data analysis in Python, so I could use a hand putting things in terms the computer will understand. As an example, consider the matrix A = np.array([[3,5,50],[30,2,6],[25,1,1]]) What I want to do is set any element

How do I visualize n-dimensional features?

感情迁移 submitted on 2020-02-05 07:58:12
Question: I have two matrices A and B. A is a 200*1000 double matrix (the 1000 columns represent 1000 different features). Matrix A belongs to group 1, for which I use ones(200,1) as the label vector. B is also a 200*1000 double matrix (again with 1000 different features). Matrix B belongs to group 2, for which I use -1*ones(200,1) as the label vector. My question is: how do I visualize matrices A and B so that I can clearly distinguish them by group? Answer 1: I'm assuming each

Calculate the entropy of a list of 2D points in Matlab

半城伤御伤魂 submitted on 2020-01-30 11:28:05
Question: I have a list of points in an array like this: points = [[1,2];[2,5];[7,1]...[x,y]] Here x is between 0 and 1020 and y is between 0 and 1920. How can I calculate the entropy of the points array in Matlab? Many thanks! Answer 1: I assume you want to treat each [x,y] point as one data point. Let us define some example data: A = [[1,2];[2,5];[7,1];[1,2]]; First we give equal points equal identifiers, which we can do using [~,~,ic] = unique(A, 'rows'); Then we compute the frequency and with that the
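The steps the answer describes (identify equal points, count their frequencies, compute the entropy from the relative frequencies) can be sketched in Python as follows. This is a Python rendering rather than the answerer's Matlab code, and the base-2 logarithm is an assumption, since the answer is cut off before the formula.

```python
import math
from collections import Counter


def point_entropy(points):
    """Shannon entropy in bits, treating each (x, y) pair as one symbol."""
    counts = Counter(tuple(p) for p in points)  # equal points share a key
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Same example data as in the answer above.
A = [[1, 2], [2, 5], [7, 1], [1, 2]]
print(point_entropy(A))  # 1.5
```

Here [1,2] occurs twice among four points, so the probabilities are 0.5, 0.25 and 0.25, which gives 0.5*1 + 0.25*2 + 0.25*2 = 1.5 bits.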

Apply function to all items in a list Python

一个人想着一个人 submitted on 2020-01-30 09:22:04
Question: I am trying to apply a function to a list. The function takes a value and produces another. For example, myCoolFunction(75) would produce a new value. So far I am using this: x = 0 newValues = [] for value in my_list: x = x + 1 newValues.append(myCoolFunction(value)) print(x) I am working with around 125,000 values, and this does not seem to run very efficiently. Is there a more Pythonic way to apply the function to the values? Answer 1: You can use the map approach: list(map
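For illustration, here is a small sketch of the two usual spellings, a list comprehension and list(map(...)). The body of my_cool_function is a made-up stand-in, since the real function is not shown in the question.

```python
def my_cool_function(value):
    # Hypothetical stand-in for the asker's myCoolFunction.
    return value * 2


my_list = list(range(125_000))

# List comprehension: one pass, no manual counter or append calls.
new_values = [my_cool_function(value) for value in my_list]

# Equivalent spelling with map, as the answer suggests.
new_values_map = list(map(my_cool_function, my_list))

# The number of processed items is just the length of the result.
print(len(new_values))
```

Both forms still call the function once per element, so if the function itself is slow, the overall run time is likely to be dominated by it rather than by the loop.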

pandas mean calculation over a column in a csv

落爺英雄遲暮 submitted on 2020-01-25 00:34:11
Question: I have some data in a csv file as shown below (only partial data is shown here).
SourceID  local_date  local_time  Vge   BSs  PC       hour  Type
7208      8/01/2015   11:00:19    15.4  87   +BC_MSG  11    MAIN
11060     8/01/2015   11:01:56    14.9  67   +AB_MSG  11    MAIN
3737      8/01/2015   11:02:09    15.4  88   +AB_MSG  11    MAIN
9683      8/01/2015   11:07:19    14.9  69   +AB_MSG  11    MAIN
9276      8/01/2015   11:07:52    15.4  88   +AB_MSG  11    MAIN
7754      8/01/2015   11:09:26    14.7  62   +AF_MSG  11    MAIN
11111     8/01/2015   11:10:06    15.2  80   +AF_MSG  11    MAIN
9276      8/01/2015   11:10:52    15.4
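The question is cut off before it says which mean is wanted, so the sketch below only shows the general pattern under stated assumptions: the file is comma-separated, the column of interest is Vge, and a per-group mean by PC might also be useful. The inline text stands in for the real csv file and contains only the seven complete rows visible above.

```python
import io

import pandas as pd

# Inline stand-in for the csv file (comma separation is an assumption).
csv_text = """SourceID,local_date,local_time,Vge,BSs,PC,hour,Type
7208,8/01/2015,11:00:19,15.4,87,+BC_MSG,11,MAIN
11060,8/01/2015,11:01:56,14.9,67,+AB_MSG,11,MAIN
3737,8/01/2015,11:02:09,15.4,88,+AB_MSG,11,MAIN
9683,8/01/2015,11:07:19,14.9,69,+AB_MSG,11,MAIN
9276,8/01/2015,11:07:52,15.4,88,+AB_MSG,11,MAIN
7754,8/01/2015,11:09:26,14.7,62,+AF_MSG,11,MAIN
11111,8/01/2015,11:10:06,15.2,80,+AF_MSG,11,MAIN
"""

df = pd.read_csv(io.StringIO(csv_text))

# Mean over the whole column (choosing Vge is an assumption).
vge_mean = df["Vge"].mean()

# Mean of the same column within each PC value.
vge_mean_by_pc = df.groupby("PC")["Vge"].mean()

print(vge_mean)
print(vge_mean_by_pc)
```

For a real file, pd.read_csv would be given the file path instead of the StringIO object.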