Question
I downloaded Tesla (TSLA) stock data from www.nasdaq.com. After downloading the CSV file, I realized that I needed to convert it using Microsoft Excel 2016: I used the Data tab and clicked Text to Columns. The header is clear now; the columns are date, close, volume, open, high, low. The CSV file is available here: https://drive.google.com/open?id=1cirQi47U4uumvA14g6vOmgsXbV-YvS4l
Preview (the CSV data runs from 02/02/2017 to 02/02/2018):
date | close | volume | open | high | low
02/02/2018 | 343.75 | 3696157 | 348.44 | 351.95 | 340.51
01/02/2018 | 349.25 | 4187440 | 351.00 | 359.66 | 348.63
The challenge for me is to create one data point for each month, as close to the first of the month as possible. I filtered the Excel file, and this is the data I got (dates are day/month/year):
- date | close
- 01/02/2018 | 349.25
- 02/01/2018 | 320.53
- 01/12/2017 | 306.53
- 01/11/2017 | 321.08
- 02/10/2017 | 341.53
- 01/09/2017 | 355.40
- 01/08/2017 | 319.57
- 03/07/2017 | 352.62
- 01/06/2017 | 340.37
- 01/05/2017 | 322.83
- 03/04/2017 | 298.52
- 01/03/2017 | 250.02
- 02/02/2017 | 251.55
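The monthly selection above can also be done in code instead of filtering by hand in Excel. The following is a minimal sketch using only the standard library; the function name `first_point_per_month` is hypothetical, the rows are a few of those shown above, and the dates are assumed to be day/month/year.

```python
from datetime import datetime

# A few (date, close) rows from the question; dates are assumed to be day/month/year.
rows = [
    ("02/02/2018", 343.75),
    ("01/02/2018", 349.25),
    ("02/01/2018", 320.53),
    ("01/12/2017", 306.53),
]

def first_point_per_month(rows):
    """Keep, for each (year, month), the row with the earliest date."""
    earliest = {}
    for date_text, close in rows:
        day = datetime.strptime(date_text, "%d/%m/%Y").date()
        key = (day.year, day.month)
        if key not in earliest or day < earliest[key][0]:
            earliest[key] = (day, close)
    # Return the points in chronological order (oldest first).
    return [earliest[key] for key in sorted(earliest)]

monthly_points = first_point_per_month(rows)
for day, close in monthly_points:
    print(day.strftime("%d/%m/%Y"), close)
```

For February 2018 the sketch keeps 01/02/2018 and drops 02/02/2018, which matches the manual filter.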
If I turn these into data points, they look like this, which is what I need to create a graph. I want to display a graph of the original data and the “smoothed data” using simple exponential smoothing (sometimes called single exponential smoothing). This is about time series forecasting using python-ggplot.
- x | y
- 01/02/2018 | 349.25
- 02/01/2018 | 320.53
- 01/12/2017 | 306.53
- 01/11/2017 | 321.08
- 02/10/2017 | 341.53
- 01/09/2017 | 355.40
- 01/08/2017 | 319.57
- 03/07/2017 | 352.62
- 01/06/2017 | 340.37
- 01/05/2017 | 322.83
- 03/04/2017 | 298.52
- 01/03/2017 | 250.02
- 02/02/2017 | 251.55
The Python program I wrote is:
# -*- coding: utf-8 -*-
"""
Created on Sat Feb 3 13:20:28 2018
@author: johannesbambang
"""
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
my_data = pd.read_csv(r'C:\TESLA Exponential Smoothing\TSLA.csv', dayfirst=True, index_col=0)  # raw string keeps the backslashes literal
my_data.plot()
plt.show()
My question is: what should I improve in my Python program? Any help would be great. Thank you in advance.
Answer 1:
Use Simple Exponential Smoothing in Python.
Forecasts are calculated using weighted averages, where the weights decrease exponentially as observations come from further in the past; the smallest weights are associated with the oldest observations:
'''Simple exponential smoothing over the last n values:
y_hat_t+1 = a * y_t + a * (1-a)^1 * y_t-1 + a * (1-a)^2 * y_t-2 + ... + a * (1-a)^n * y_t-n'''
from math import sqrt
from sklearn.metrics import mean_squared_error

def exponential_smoothing(panda_series, alpha_value):
    # reversed() puts the last row first, so the series must be sorted oldest to newest
    output = sum([alpha_value * (1 - alpha_value) ** i * x
                  for i, x in enumerate(reversed(list(panda_series)))])
    return output

panda_series = mydata.y  # mydata: your DataFrame holding the x/y data points
smoothing_number = exponential_smoothing(panda_series, 0.6)  # use a=0.6 or 0.5, whichever gives the lower RMS error
estimated_values = testdata.copy()  # replace testdata with your test dataset
estimated_values['SES'] = smoothing_number
error = sqrt(mean_squared_error(testdata.y, estimated_values.SES))
print(error)
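The function above returns a single number, the one-step-ahead forecast. To draw the "smoothed data" next to the original series, a smoothed value is needed for every point. Here is a minimal sketch, assuming the common recursive form s_t = a * y_t + (1 - a) * s_(t-1) with the first smoothed value set to the first observation (one of several possible initializations). `smooth_series` and `one_step_rmse` are hypothetical helper names, and the values are the monthly closes from the question in chronological order.

```python
from math import sqrt

# Monthly closing prices from the question, oldest (02/02/2017) to newest (01/02/2018).
closes = [251.55, 250.02, 298.52, 322.83, 340.37, 352.62, 319.57,
          355.40, 341.53, 321.08, 306.53, 320.53, 349.25]

def smooth_series(values, alpha):
    """Return s_t = alpha * y_t + (1 - alpha) * s_(t-1), starting from s_0 = y_0."""
    smoothed = [values[0]]
    for y in values[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

def one_step_rmse(values, alpha):
    """RMSE when s_(t-1) is used as the forecast for y_t."""
    smoothed = smooth_series(values, alpha)
    errors = [y - s for y, s in zip(values[1:], smoothed[:-1])]
    return sqrt(sum(e * e for e in errors) / len(errors))

smoothed = smooth_series(closes, 0.6)
for alpha in (0.5, 0.6):
    print(alpha, round(one_step_rmse(closes, alpha), 2))
```

The `smoothed` list has the same length as `closes`, so both can be plotted against the same dates. The in-sample error printed for each alpha is one rough way to compare the two values suggested above.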
Answer 2:
What about ExponentialSmoothing from statsmodels?
The statsmodels package has a lot of tools for time series analysis in Python.
from statsmodels.tsa.api import ExponentialSmoothing
Also, take a look at this article about time series analysis in Python:
https://www.analyticsvidhya.com/blog/2018/02/time-series-forecasting-methods/
Source: https://stackoverflow.com/questions/48604184/python-simple-exponential-smoothing