data-science

Python Pandas - Concat two data frames with different number of rows and columns

Submitted by 前提是你 on 2020-04-30 09:10:28
Question: I have two data frames with different numbers of rows and columns. Both tables have a few common columns, including "Customer ID". The tables are 11697 rows × 15 columns and 385839 rows × 6 columns respectively, and Customer IDs may repeat in the second table. I want to combine the two tables, merging the shared columns on Customer ID. How can I do that with Python pandas? One table looks like this - and the other one looks like this - I am using the below code - pd
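The operation described above is a key-based join rather than a concatenation, so `DataFrame.merge` fits better than `pd.concat`. A minimal sketch with made-up stand-in tables (the real ones are 11697×15 and 385839×6):

```python
import pandas as pd

# Tiny stand-ins for the two tables in the question.
customers = pd.DataFrame({
    'Customer ID': [1, 2, 3],
    'Name': ['Ann', 'Ben', 'Cal'],
})
orders = pd.DataFrame({
    'Customer ID': [1, 1, 2, 4],
    'Amount': [10.0, 20.0, 5.0, 7.0],
})

# merge aligns rows by the shared key; pd.concat would only stack frames.
# how='left' keeps every customer; a repeated ID in `orders`
# produces one output row per match.
merged = customers.merge(orders, on='Customer ID', how='left')
```

With these stand-ins, customer 1 appears twice (two orders) and customer 3 gets a missing `Amount`; `how='outer'` would additionally keep order rows whose ID has no customer record.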

Converting string to date in numpy unpack

Submitted by 喜夏-厌秋 on 2020-04-17 20:27:40
Question: I'm learning how to extract data from links and then graph it. For this tutorial I was using the Yahoo dataset of a stock. The code is as follows: import matplotlib.pyplot as plt import numpy as np import urllib import matplotlib.dates as mdates import datetime def bytespdate2num(fmt, encoding='utf-8'): strconverter = mdates.strpdate2num(fmt) def bytesconverter(b): s = b.decode(encoding) return strconverter(s) return bytesconverter def graph_data(stock): stock_price_url =
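Note that `mdates.strpdate2num` has been deprecated and removed in newer Matplotlib releases, so the tutorial's helper may fail on a current install. A self-contained sketch of an equivalent converter for `np.loadtxt`, using only `datetime.strptime` (the `date,close` sample rows and the `%Y%m%d` format string are made up for illustration):

```python
import datetime as dt
import io
import numpy as np

def bytespdate2num(fmt, encoding='utf-8'):
    """Stand-in for the removed mdates.strpdate2num: parse each field
    with strptime and return a plottable day-ordinal float."""
    def converter(v):
        # Older numpy passes bytes to loadtxt converters, newer passes str.
        s = v.decode(encoding) if isinstance(v, bytes) else v
        return float(dt.datetime.strptime(s, fmt).toordinal())
    return converter

# Hypothetical data in the tutorial's "date,close" shape.
raw = io.BytesIO(b"20200101,100.0\n20200102,101.5\n20200103,99.8\n")
dates, closes = np.loadtxt(raw, delimiter=',', unpack=True,
                           converters={0: bytespdate2num('%Y%m%d')})
```

The resulting `dates` array holds one float per day, which `matplotlib.dates` can format back into calendar labels.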

Insert list in pandas dataframe cell

Submitted by a 夏天 on 2020-04-16 03:27:22
Question: I have a dictionary where each key has a list of values. The lists associated with the keys have different lengths. I want to convert the dictionary into a pandas dataframe with two columns, 'Key' and 'Values': each row should hold one dictionary key in the 'Key' column and the list of values associated with it in the 'Values' column. The dataframe will look as follows: mapping_dict = {'A':['a', 'b', 'c', 'd'], 'B':['aa', 'bb', 'cc']} df = Key Value 0 A ['a', 'b', 'c', 'd'] 1 B ['aa', 'bb', 'cc'] I
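One way to get the layout shown above is to build the frame from the dict's items, so each whole list lands in a single cell. A minimal sketch:

```python
import pandas as pd

mapping_dict = {'A': ['a', 'b', 'c', 'd'], 'B': ['aa', 'bb', 'cc']}

# Each (key, value-list) pair becomes one row; the list is stored
# as a single Python object in the 'Value' cell, so unequal list
# lengths are not a problem.
df = pd.DataFrame(list(mapping_dict.items()), columns=['Key', 'Value'])
```

Storing lists in cells works, but most pandas operations can't vectorize over them; `df.explode('Value')` would flatten to one row per list element if that is later needed.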

Difference between Standard scaler and MinMaxScaler

Submitted by 走远了吗. on 2020-03-18 05:13:35
Question: What is the difference between MinMaxScaler and StandardScaler? MMS = MinMaxScaler(feature_range=(0, 1)) (used in Program 1) sc = StandardScaler() (in another program they used StandardScaler and not MinMaxScaler) Answer 1: From the scikit-learn site: StandardScaler removes the mean and scales the data to unit variance. However, outliers have an influence when computing the empirical mean and standard deviation, which shrinks the range of the feature values as shown in the left figure below. Note
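The two transforms reduce to one line of arithmetic each, which a plain NumPy sketch makes explicit (the toy array with a deliberate outlier is made up for illustration; scikit-learn's classes do the same per feature column):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # note the outlier

# StandardScaler: subtract the mean, divide by the standard deviation.
# The outlier inflates both statistics, squeezing the inliers together.
standard = (x - x.mean()) / x.std()

# MinMaxScaler with feature_range=(0, 1): map min -> 0 and max -> 1.
# The outlier pins the max, so the inliers end up crowded near 0.
minmax = (x - x.min()) / (x.max() - x.min())
```

Either way the outlier dominates the result, which is why the scikit-learn docs suggest `RobustScaler` (median and IQR based) when outliers are expected.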

Cannot import name 'CRS' from 'pyproj' for using the osmnx library

Submitted by 谁都会走 on 2020-02-25 03:42:08
Question: I used a fresh Anaconda install to download and install all the required modules for the osmnx library, but I got the following error: Answer 1: I had the same issue; it turned out that it did not like the latest release of osmnx (0.11.3). It could be that that version is unstable, as it is new (9 January 2020). I sorted out the issue by uninstalling osmnx 0.11.3 (conda uninstall osmnx) and forcing an install of osmnx 0.11 (pip install osmnx==0.11). Source: https://stackoverflow.com
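The workaround in the answer amounts to pinning the package version; as an environment-setup fragment (versions as reported by the answerer, which may be superseded by later osmnx releases):

```shell
# Remove the release that triggers the pyproj CRS import error...
conda uninstall osmnx
# ...and pin the last version reported to work.
pip install osmnx==0.11
```
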

Not getting better results after using gridsearchCV(), rather getting better manually

Submitted by 拈花ヽ惹草 on 2020-02-25 02:08:27
Question: I was trying to learn how GridSearchCV works by testing it on K-nearest neighbors. When I assigned n_neighbors = 9 my classifier gave a score of 0.9122807017543859, but when I used GridSearchCV while giving it n_neighbors = 9 in the list, I got a score of 0.8947368421052632. What could possibly be the reason? Any effort is appreciated. Here's my code: from sklearn import datasets import pandas as pd import numpy as np from sklearn.model_selection import train_test_split as splitter import
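A likely explanation is that the two numbers come from different evaluation protocols: the manual score is accuracy on one held-out test split, while GridSearchCV's `best_score_` is the mean accuracy over cross-validation folds of the training data, so they need not match even for identical hyperparameters. A self-contained NumPy sketch of the two protocols, using a hand-rolled k-NN on made-up blob data (not the questioner's dataset or code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class data: two Gaussian blobs, stand-ins for a real dataset.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

def knn_predict(X_train, y_train, X_test, k):
    """Brute-force k-nearest-neighbours with Euclidean distance."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

def accuracy(y_true, y_pred):
    return float((y_true == y_pred).mean())

# Protocol 1: a single train/test split (what the manual score measures).
perm = rng.permutation(len(X))
train, test = perm[:70], perm[70:]
holdout = accuracy(y[test], knn_predict(X[train], y[train], X[test], k=9))

# Protocol 2: mean accuracy over 5 folds
# (what GridSearchCV's best_score_ reports).
folds = np.array_split(rng.permutation(len(X)), 5)
cv_scores = []
for i in range(5):
    te = folds[i]
    tr = np.concatenate([folds[j] for j in range(5) if j != i])
    cv_scores.append(accuracy(y[te], knn_predict(X[tr], y[tr], X[te], k=9)))

print(holdout, np.mean(cv_scores))  # two valid but different estimates
```

Both numbers estimate generalization accuracy of the same model; they differ because they average over different data partitions, which is the same gap the questioner observed.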

Difference pandas.DateTimeIndex without a frequency

Submitted by 冷暖自知 on 2020-02-23 09:25:09
Question: An irregular time series is stored in a pandas.DataFrame and a DatetimeIndex has been set. I need the time difference between consecutive entries in the index. I thought it would be as simple as data.index.diff(), but got AttributeError: 'DatetimeIndex' object has no attribute 'diff'. I tried data.index - data.index.shift(1), but got ValueError: Cannot shift with no freq. I do not want to infer or enforce a frequency before doing this operation. There are large gaps in the time series
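Both errors can be sidestepped without setting a frequency: convert the index to a Series (which does have `.diff()`), or subtract the index values from a slice of themselves. A minimal sketch with a made-up irregular index:

```python
import pandas as pd

# Irregular timestamps, no frequency attached.
idx = pd.DatetimeIndex(['2020-01-01 00:00', '2020-01-01 00:05',
                        '2020-01-02 09:30'])

# Option 1: a Series has .diff() even when its index has no freq;
# the first entry is NaT.
deltas = idx.to_series().diff()

# Option 2: stay value-based and subtract a shifted slice directly.
deltas2 = pd.Series(idx[1:] - idx[:-1])
```

Neither approach infers or enforces a frequency, so large gaps simply show up as large Timedeltas.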

How to get the unique pairs from the given data frame column with file handling?

Submitted by 谁都会走 on 2020-02-16 06:32:50
Question: Sample data from the dataframe:

Pairs
(8, 8), (8, 8), (8, 8), (8, 8), (8, 8)
(6, 7), (7, 7), (7, 7), (7, 6), (6, 7)
(2, 12), (12, 3), (3, 4), (4, 12), (12, 12)

```
new_col = []
for e in content.Pairs:
    new_col.append(list(dict.fromkeys(e)))
content['Unique'] = new_col
```

The expected output is the unique pairs from the Pairs column, like this: (8, 8), (6, 7), (7, 6), (7, 7), (2, 12) and so on. What I actually get when running the above code is:

Unique
['8', '']
['6', '7', '']
['2', '12', '3', '4', '']

What is
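The character-level output suggests each cell is a string, so `dict.fromkeys(e)` deduplicates its characters rather than its pairs. A sketch of a fix under that assumption (`content` and the sample rows are reconstructed from the question): parse each cell into real tuples first, then deduplicate.

```python
import ast

import pandas as pd

# Stand-in for content.Pairs: each cell is one string of pairs.
content = pd.DataFrame({'Pairs': [
    '(8, 8), (8, 8), (8, 8), (8, 8), (8, 8)',
    '(6, 7), (7, 7), (7, 7), (7, 6), (6, 7)',
    '(2, 12), (12, 3), (3, 4), (4, 12), (12, 12)',
]})

def unique_pairs(cell):
    # literal_eval turns the whole cell into a tuple of (a, b) tuples;
    # dict.fromkeys then drops duplicates while keeping first-seen order.
    pairs = ast.literal_eval(cell)
    return list(dict.fromkeys(pairs))

content['Unique'] = content['Pairs'].apply(unique_pairs)
```

With this parsing step the first row collapses to `[(8, 8)]` and order within each row is preserved, matching the expected output.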