data-science

ExponentialSmoothing - What prediction method to use for this date plot?

久未见 · Submitted on 2021-02-10 05:55:20
Question: I currently have these data points of date vs. cumulative sum, and I want to predict the cumulative sum for future dates using Python. What prediction method should I use? My date series is in this format: ['2020-01-20', '2020-01-24', '2020-01-26', '2020-01-27', '2020-01-30', '2020-01-31'], dtype='datetime64[ns]'. I tried a spline, but it seems splines can't handle datetime series. I tried Exponential Smoothing for time-series forecasting, but the result is incorrect. I don't understand what predict…
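
One approach (a minimal sketch, not the asker's code): resample the series onto a regular daily index first, since statsmodels' ExponentialSmoothing expects evenly spaced observations, then fit an additive-trend model. Here `dates` and `values` are hypothetical stand-ins for the asker's data.

    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    # Put the irregular observations on a regular daily grid and
    # linearly interpolate the missing days (hypothetical input names).
    s = pd.Series(values, index=pd.to_datetime(dates)).asfreq("D").interpolate()

    # An additive trend suits a steadily growing cumulative sum.
    fit = ExponentialSmoothing(s, trend="add").fit()
    forecast = fit.forecast(14)  # predict the next 14 days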

How to avoid running out of memory in Python?

半世苍凉 · Submitted on 2021-02-10 05:33:08
Question: I'm new to Python and Ubuntu. My process got killed while running my Python code. The file I'm using is around 2.7 GB and I have 16 GB of RAM and a one-terabyte hard drive. What should I do to avoid this problem? From searching, it seems to be an out-of-memory problem. I ran the command free -mh and got:

                  total        used        free      shared  buff/cache   available
    Mem:            15G        2.5G        9.7G        148M        3.3G         12G
    Swap:          4.0G        2.0G        2.0G

The code I tried (Link):

    import numpy as np
    import matplotlib.pyplot as plt
    class …
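
Since the kill looks like the whole 2.7 GB file being loaded into RAM at once, here is a minimal sketch of two common workarounds, with hypothetical file names, dtypes, and columns (adjust to the real data layout):

    import numpy as np
    import pandas as pd

    # Option 1: memory-map a binary array so NumPy pages data in on demand
    # instead of holding the entire file in RAM.
    data = np.memmap("big_file.dat", dtype=np.float64, mode="r")

    # Option 2: stream a large text/CSV file in fixed-size chunks.
    for chunk in pd.read_csv("big_file.csv", chunksize=1_000_000):
        partial = chunk["value"].sum()  # hypothetical per-chunk processing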

Subsample size in scikit-learn RandomForestClassifier

走远了吗. · Submitted on 2021-02-09 08:21:11
Question: How is it possible to control the size of the sub-sample used for training each tree in the forest? According to the scikit-learn documentation: "A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and use averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is always the same as the original input sample size but the samples are drawn with replacement if bootstrap=True (default…"
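
In scikit-learn 0.22 and later this is controlled directly by the max_samples parameter (it only takes effect when bootstrap=True); a minimal sketch, with X_train and y_train as hypothetical training data:

    from sklearn.ensemble import RandomForestClassifier

    # Each tree is fit on a bootstrap sample of 50% of the training rows.
    clf = RandomForestClassifier(
        n_estimators=100,
        bootstrap=True,
        max_samples=0.5,  # float = fraction of rows, int = absolute count
        random_state=0,
    )
    clf.fit(X_train, y_train)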

Convert Scala code to PySpark: Word2Vec Scala transform routine

泄露秘密 · Submitted on 2021-02-08 10:01:31
Question: I want to translate the following routine from the class Word2VecModel (https://github.com/apache/spark/blob/branch-2.3/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala) into PySpark:

    override def transform(dataset: Dataset[_]): DataFrame = {
      transformSchema(dataset.schema, logging = true)
      val vectors = wordVectors.getVectors
        .mapValues(vv => Vectors.dense(vv.map(_.toDouble)))
        .map(identity) // mapValues doesn't return a serializable map (SI-7005)
      val bVectors = dataset.sparkSession…
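
A sketch of an equivalent PySpark routine: collect the model's word vectors to the driver, broadcast them (mirroring bVectors in the Scala code), and average each row's word vectors in a UDF. Here model (a fitted pyspark.ml.feature.Word2VecModel), spark, df, and the column names are hypothetical.

    import numpy as np
    from pyspark.ml.linalg import Vectors, VectorUDT
    from pyspark.sql.functions import udf

    # word -> np.ndarray map, broadcast to the executors.
    word_vectors = {r["word"]: r["vector"].toArray()
                    for r in model.getVectors().collect()}
    b_vectors = spark.sparkContext.broadcast(word_vectors)
    size = len(next(iter(word_vectors.values())))

    @udf(returnType=VectorUDT())
    def average_vector(words):
        # Average the in-vocabulary word vectors; zero vector if none match.
        vecs = [b_vectors.value[w] for w in words if w in b_vectors.value]
        if not vecs:
            return Vectors.dense([0.0] * size)
        return Vectors.dense((np.sum(vecs, axis=0) / len(vecs)).tolist())

    result = df.withColumn("features", average_vector("words"))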

How to ensure neural net performance comparability?

﹥>﹥吖頭↗ · Submitted on 2021-02-08 07:23:18
Question: For my thesis I am trying to evaluate the impact of different parameters on my active-learning object detector built with TensorFlow (v1.14). I am using the standard faster_rcnn_inception_v2_coco config from the model zoo and a fixed random.seed(1). To make sure I have a working baseline experiment, I ran the object detector twice with the same dataset, learning time, pooling size, and so forth. However, the two plotted graphs after 20 active-learning cycles are quite different…
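
A fixed random.seed(1) only pins Python's built-in RNG; NumPy and TensorFlow each keep their own. A minimal sketch of the usual seeding for TF 1.14 (even with all three seeds set, GPU/cuDNN kernels can remain non-deterministic; TF 1.14 introduced the TF_CUDNN_DETERMINISTIC environment variable to mitigate this):

    import random
    import numpy as np
    import tensorflow as tf  # TF 1.14

    SEED = 1
    random.seed(SEED)         # Python's built-in RNG
    np.random.seed(SEED)      # NumPy RNG (weight init, shuffling in many libs)
    tf.set_random_seed(SEED)  # TensorFlow graph-level seed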

How to filter out positional data based on distance from a known reference trajectory?

假如想象 · Submitted on 2021-02-08 07:22:49
Question: I have an 87,288-point dataset that I need to filter. The filtering fields for the dataset are an X position and a Y position, as latitude and longitude. Plotted, the data looks like this: [plot omitted]. The problem is, I only need the data along a certain path, which is known in advance, something like this: [plot omitted]. I already know how to filter data in a pandas DataFrame, but given that the path is not linear, I need an effective strategy to clear out all the noisy data with a certain degree of precision (since the dataset is so…
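
A minimal sketch of one strategy, assuming the covered area is small enough to treat lon/lat as planar (otherwise project to a metric CRS first): build a k-d tree over the reference path's vertices and keep only the points within a chosen tolerance. Here points, path, tolerance, and df are hypothetical names for the asker's data.

    import numpy as np
    from scipy.spatial import cKDTree

    # points: (N, 2) array of [lon, lat]; path: (M, 2) reference trajectory.
    tree = cKDTree(path)
    dist, _ = tree.query(points, k=1)  # distance to the nearest path vertex
    mask = dist <= tolerance           # tolerance in the same (degree) units

    filtered = df[mask]  # keep only rows near the known path

If the path's vertices are sparse, interpolate extra vertices along it first so that distance-to-nearest-vertex approximates distance-to-path.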