data-analysis

Processing a very very big data set in python - memory error

只愿长相守, submitted on 2019-12-30 06:38:05
Question: I'm trying to process data obtained from a CSV file using the csv module in Python. There are about 50 columns and 401125 rows in it. I used the following code chunk to put that data into a list:

    csv_file_object = csv.reader(open(r'some_path\Train.csv','rb'))
    header = csv_file_object.next()
    data = []
    for row in csv_file_object:
        data.append(row)

I can get the length of this list using len(data), and it returns 401125. I can even get each individual record by indexing the list. But when I try to get the
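
The excerpt is cut off before it says which operation raises the memory error, so the following is only a sketch of one common way to lower memory use: read the file row by row and keep just an aggregate instead of a list of all rows. It is written for Python 3 (the question's 'rb' mode and .next() call are Python 2 style). The function name count_column_values and the column name in the usage note are hypothetical.

```python
import csv
from collections import Counter


def count_column_values(path, column):
    """Count the values of one column while reading the file row by row."""
    counts = Counter()
    # newline="" is the mode the csv module expects for text files in Python 3.
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)        # Python 3 spelling of csv_file_object.next()
        index = header.index(column)
        for row in reader:           # the loop never stores more than the current row
            counts[row[index]] += 1
    return counts
```

A call such as count_column_values(r'some_path\Train.csv', 'state') would then return the tallies without building the full 401125-row list; 'state' stands in for whichever column is actually of interest.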

How do I encircle different data sets in scatter plot? [closed]

梦想的初衷, submitted on 2019-12-29 07:16:58
Question: (Closed 2 years ago: this question needs details or clarity and is not currently accepting answers.) How do I encircle different data sets in a scatter plot? What I'm looking for is something like this: Also, how do I thereafter fill in the circle with a (shaded) colour? Answer 1: You may get the path that incorporates all points via a convex hull, scipy.spatial.ConvexHull. import matplotlib.pyplot as plt
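
The answer's code is cut off after its first import, so what follows is a minimal sketch of the idea it names rather than the answer's own code: compute the convex hull of one data set with scipy.spatial.ConvexHull and draw it as a filled Matplotlib polygon. The helper names hull_outline and encircle are hypothetical.

```python
import numpy as np
from scipy.spatial import ConvexHull


def hull_outline(points):
    """Return the corner coordinates of the convex hull of a 2-D point set."""
    pts = np.asarray(points, dtype=float)
    hull = ConvexHull(pts)
    # For 2-D input, hull.vertices lists the corner indices in counterclockwise order.
    return pts[hull.vertices]


def encircle(ax, points, **patch_kwargs):
    """Draw a polygon around `points` on the Matplotlib axes `ax`."""
    # Imported here so that hull_outline can be used without Matplotlib.
    from matplotlib.patches import Polygon

    patch = Polygon(hull_outline(points), closed=True, **patch_kwargs)
    ax.add_patch(patch)
    return patch
```

After ax.scatter(x, y), a call like encircle(ax, np.column_stack([x, y]), facecolor="tab:blue", edgecolor="tab:blue", alpha=0.2) would shade the region around that data set. A convex hull gives a polygon, not a smooth circle or ellipse.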

How do I encircle different data sets in scatter plot? [closed]

戏子无情, submitted on 2019-12-29 07:16:12
Question: (Closed 2 years ago: this question needs details or clarity and is not currently accepting answers.) How do I encircle different data sets in a scatter plot? What I'm looking for is something like this: Also, how do I thereafter fill in the circle with a (shaded) colour? Answer 1: You may get the path that incorporates all points via a convex hull, scipy.spatial.ConvexHull. import matplotlib.pyplot as plt

Dropping cell if it is NaN in a Dataframe in python

我怕爱的太早我们不能终老, submitted on 2019-12-25 18:44:28
Question: I have a dataframe like this:

    Project 4 Project1 Project2 Project3
    0 NaN laptio AB NaN
    1 NaN windows ten NaN
    0 one NaN NaN
    1 two NaN NaN

I want to delete the NaN values from the Project 4 column. My desired output, df, should be:

    Project 4 Project1 Project2 Project3
    0 one laptio AB NaN
    1 two windows ten NaN
    0 NaN NaN NaN
    1 NaN NaN

Answer 1: If your data frame's index is just the standard ordered integers 0 to n, you can pop the Project4 column to a series, drop the NaN values, reset the index, and then merge it
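
The answer breaks off at the merge step, so this is a sketch of the steps it describes under two assumptions: the frame has a default 0 to n-1 index, and the "merge" is done with DataFrame.join on that index. The function name shift_column_up is hypothetical, and the toy frame uses only two of the question's columns.

```python
import numpy as np
import pandas as pd


def shift_column_up(df, column):
    """Move the non-NaN values of `column` to the top rows; the rest become NaN."""
    out = df.copy()                       # leave the caller's frame untouched
    values = out.pop(column)              # take the column out as a Series
    values = values.dropna().reset_index(drop=True)  # renumber survivors 0, 1, ...
    # join aligns on the index, so rows without a surviving value get NaN.
    return out.join(values)


df = pd.DataFrame({
    "Project 4": [np.nan, np.nan, "one", "two"],
    "Project1": ["laptio", "windows", np.nan, np.nan],
})
result = shift_column_up(df, "Project 4")
```

Note that the re-joined column ends up as the last column; reorder with result[[...]] if the original column order matters.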

CParserError: Error tokenizing data

家住魔仙堡, submitted on 2019-12-25 07:18:04
Question: I'm having some trouble reading a CSV file:

    import pandas as pd
    df = pd.read_csv('Data_Matches_tekha.csv', skiprows=2)

I get

    pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 526, saw 5

and when I add sep=None to the call that creates df I get another error:

    Error: line contains NULL byte

I tried adding unicode='utf-8', I even tried the csv reader, and nothing works with this file. The CSV file is totally fine; I checked it and I see nothing wrong with it. Here are the errors I get:
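
No answer survives in this excerpt, so the following is only a diagnostic sketch, not a confirmed fix. A "line contains NULL byte" error often means the file is not plain UTF-8 or ASCII text (for example, it may have been saved as UTF-16), and looking at the raw bytes is a quick way to check. The function name inspect_raw_bytes and the returned labels are hypothetical.

```python
import codecs


def inspect_raw_bytes(path, sample_size=4096):
    """Classify a file by looking at its first bytes, without decoding it."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    # A UTF-16 file written with a byte-order mark starts with FF FE or FE FF.
    if sample.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "byte-order mark found (likely UTF-16)"
    if b"\x00" in sample:
        return "NUL bytes found"
    return "no NUL bytes in sample"
```

If the check reports a byte-order mark, passing encoding='utf-16' to pd.read_csv may be worth trying; if it reports stray NUL bytes, the file probably needs cleaning before any parser will accept it.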

How to plot selected input from selectInput() function in shiny?

半腔热情, submitted on 2019-12-25 03:29:14
Question: I am using the following two input functions in shiny:

    selectInput("categoryVisu", label="SELECT CATEGORY", choices = list("Full" = "full", "Fact" = "fact", "Fact Positive" = "factpos", selected = "full", multiple = TRUE)

and

    selectInput("investerVisu", label="SELECT INVESTOR", choices = list("Informed" = "inf", "Noise" = "noise"), selected = "inf", multiple = TRUE)

My task now is this: if the user selects, for example, "Full" and "Informed", then my code should take the column "InformedFull"

After generating dummy variables?

无人久伴, submitted on 2019-12-25 02:08:51
Question: I am trying to change the categorical variables into dummy variables. "season", "holiday", "workingday", "weather", "temp", "atemp", "humidity", "windspeed", "registered", "count", "hour", "dow" are all variables. Here is my code:

    #dummy
    library(dummies)
    #set up new dummy variables
    data.new = data.frame(data)
    data.new = cbind(data.new,dummy(data.new$season, sep = "_"))
    data.new = cbind(data.new,dummy(data.new$holiday, sep = "_"))
    data.new = cbind(data.new,dummy(data.new$weather, sep = "_"))
    data.new =

Making new variables for every group of observation in R

橙三吉。, submitted on 2019-12-25 00:43:20
Question: I have 11 variables in my dataframe. The first is the unique identifier of the observation (a plane). The second is a number from 1 to 21 representing a flight of the given plane. The rest of the variables are time, velocity, distance, etc. What I want to do is make new variables for every group (number) of flight, e.g. time_1, time_2, ..., velocity_1, velocity_2, etc., and consequently reduce the number of observations (the repeating ones). I don't really have an idea how to start. I was thinking

My objective is to predict the next 3 events of each id_num based on their previous events

限于喜欢, submitted on 2019-12-25 00:34:20
Question: I am new to data science and I am working on a model whose data looks roughly like the sample data shown below. However, in the original data there are many id_num values and Events. My objective is to predict the next 3 events of each id_num based on their previous Events. Please help me solve this, or suggest a method for solving it, using R. Answer 1: The simplest "prediction" is to assume that the sequence of letters will repeat for each id_num. I hope this is in line with what the
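
The question asks for R and the answer's own code is not part of this excerpt; the idea itself is language-independent, so here is a rough Python sketch of the "assume the sequence repeats" baseline. It reads "repeat" as "start over from the first observed event", which is an assumption, and the name predict_next and the sample histories are hypothetical.

```python
from itertools import cycle, islice


def predict_next(events, n=3):
    """Return the next n events, assuming the observed sequence starts over."""
    # cycle() replays the history endlessly; islice() takes the first n items.
    return list(islice(cycle(events), n))


# Hypothetical histories keyed by id_num.
history = {1: ["A", "B", "C", "D"], 2: ["B", "A"]}
predictions = {id_num: predict_next(seq) for id_num, seq in history.items()}
```

This is a baseline only: it ignores any dependence between events and is mainly useful as a reference point for a real sequence model.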

Python: Read and write a file with a complex, repeating format

生来就可爱ヽ(ⅴ<●), submitted on 2019-12-24 22:32:29
Question: To begin with, sorry for my poor English. I have a file with a repeating format, such as:

    326 Iteration: 0 #Bonds: 10
    1 6 7 14 54 70 77 0 0 0 0 0 1 0.693 0.632 0.847 0.750 0.644 0.000 0.000 0.000 0.000 0.000 3.566 0.000 0.028
    2 6 3 6 15 55 0 0 0 0 0 0 1 0.925 0.920 0.909 0.892 0.000 0.000 0.000 0.000 0.000 0.000 3.645 0.000 -0.040
    3 6 2 8 10 52 0 0 0 0 0 0 1 0.925 0.910 0.920 0.898 0.000 0.000 0.000 0.000 0.000 0.000 3.653 0.000 0.000
    ...
    324 8 323 0 0 0 0 0 0 0 0 0 100 0.871 0.000 0.000 0.000 0.000