Time Series Analysis of Holt Model and the ARIMA Model Facing Covid-19

Background: Since the novel coronavirus first appeared in Wuhan in December 2019, it has quickly swept the world and become a major security incident facing humanity today. While the novel coronavirus threatens people's lives and safety, the economies of various countries have also been severely damaged. Due to the epidemic, a large number of enterprises have faced closure, employment has become more difficult, and people's lives have been greatly affected. We therefore establish time series models for Hubei Province, where the novel coronavirus first broke out, and the United States, where the epidemic is most severe, to analyze the spreading trend and make short-term forecasts of the new coronavirus. This will help countries better understand the development trend of the epidemic and make more adequate preparations and timely intervention and treatment to prevent the further spread of the virus. Methods: For the data collected from Hubei Province, including cumulative diagnoses, cumulative deaths, and cumulative cures, we use SPSS to establish the time series models. Since there are no missing values, we define days as the time variable, remove outliers, set the width of the confidence interval to 95% for prediction, and then use SPSS's expert modeler to find the best-fit model for each sequence. ACF and PACF graphs of the residuals and Q-tests are used to determine whether the residuals are white noise sequences and thus whether each model is suitable. The Holt model is used for the cumulative number of diagnoses, and the ARIMA (1,2,0) model is used for cumulative cures and deaths. Similarly, we collect data for the US, including the cumulative number of diagnoses, cumulative deaths, and cumulative cures. For these, the ARIMA (2,2,6), ARIMA (0,2,0), and ARIMA (0,2,1) models are used for the cumulative diagnoses, cures, and deaths, respectively.
Findings: In our modeling of the data, the time series plots of the real and fitted data almost overlap, so the Holt model and the ARIMA models we use fit the data well. We compare the predicted values with the real values of the same period and find that the epidemic in Hubei Province had basically ended after May, while the epidemic in the United States became more severe after May, so the Holt model and the ARIMA model are also appropriate for predicting the epidemic situation in the short term. Interpretation: Because the Chinese government has always put the safety of people's lives first, it decisively closed off the cities of Hubei Province when the epidemic broke out. When one place is in trouble, help comes from all sides: the resources of the whole country were concentrated on Hubei Province, at the expense of the economy, in order to save more people. We can now clearly see that the epidemic has been controlled in China and the whole country is developing in a good direction. The situation in the United States, on the other hand, is also influenced by the social environment.


Introduction
Since December 2019, many cases of cough, dyspnea, [...] and now, in May, the epidemic in our country is almost under control,

Citation: Mingzhe

Not only that, but in later cases we also found a proportion of asymptomatic infections, defined as the absence of relevant clinical symptoms (e.g., self-perceived or clinically recognizable symptoms and signs such as fever, cough, sore

Data
The data for Hubei Province are taken from the official platform of the Health Commission of Hubei Province, starting from 20 January, 2020. The Hubei Province data collected in this paper include cumulative deaths, cumulative cures and cumulative diagnoses from 20 January, 2020 to 28 April, 2020 [3].
The data for the United States come from the domestic data platform, the Real-time Big Data Report of the Epidemic. The data collected in this paper include cumulative deaths, cumulative cures and cumulative diagnoses from 29 February, 2020 to 28 April, 2020 [4].

The model
Using the collected data, we conduct a time series analysis of the novel coronavirus [5][6][7]. Because no data are missing, we import the data into SPSS, define the day as the time variable, remove the outliers, and make time series graphs. The expert modeler then automatically finds the most suitable model for each of the cumulative deaths, cumulative cures and cumulative diagnoses series. The abbreviations used in time series analysis are explained below.
As Table 1 shows, the ACF and the PACF are used to check whether the model fits. The ACF is the autocorrelation coefficient and does not control for other variables. The partial autocorrelation coefficient, PACF, on the other hand, is the autocorrelation coefficient calculated after controlling for the other variables.

AR(p)
Auto Regressive. The AR model is a statistical method for processing time series that predicts the current value $x_t$ from the previous values of the same variable, e.g., $x_1$ to $x_{t-1}$, assuming that the relationship between them is linear. It is a development of linear regression in regression analysis, except that instead of using x to predict y, x predicts x (itself).
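The autoregressive idea described above can be illustrated with a small sketch. It assumes the simplest case, an AR(1) model without intercept, fitted by ordinary least squares; the function names `fit_ar1` and `predict_next` and the toy series are hypothetical and are not part of the SPSS workflow used in this paper.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + noise (no intercept)."""
    prev = series[:-1]   # x_1 ... x_{t-1}
    curr = series[1:]    # x_2 ... x_t
    return sum(p * c for p, c in zip(prev, curr)) / sum(p * p for p in prev)


def predict_next(series, phi):
    """One-step-ahead prediction: the series predicts itself from its last value."""
    return phi * series[-1]


toy = [8.0, 4.0, 2.0, 1.0]            # toy series that halves at every step
phi_hat = fit_ar1(toy)                # estimated coefficient
next_value = predict_next(toy, phi_hat)
```

Here the regressor is simply the lagged copy of the series, which is the sense in which "x predicts x".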

MA(q)
Moving Average. The MA model is one of the parametric model methods of spectral analysis and is commonly used in modern spectral estimation. The autocorrelation coefficients of a q-order moving average model, MA(q), cut off after lag q.
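The cut-off property of the MA(q) autocorrelation can be checked numerically. The sketch below computes the theoretical autocorrelation of an MA(q) process under the assumed sign convention x_t = e_t + theta_1 e_{t-1} + ... + theta_q e_{t-q}; the function name `ma_acf` and the coefficient value are illustrative only.

```python
def ma_acf(thetas, max_lag):
    """Theoretical autocorrelations of x_t = e_t + theta_1 e_{t-1} + ... + theta_q e_{t-q}."""
    coeffs = [1.0] + list(thetas)          # theta_0 = 1
    q = len(thetas)
    gamma0 = sum(c * c for c in coeffs)    # variance, up to the noise variance
    acf = []
    for k in range(max_lag + 1):
        if k > q:
            acf.append(0.0)                # truncation: no shared noise terms beyond lag q
        else:
            cov = sum(coeffs[j] * coeffs[j + k] for j in range(q - k + 1))
            acf.append(cov / gamma0)
    return acf


rho = ma_acf([0.5], max_lag=3)  # MA(1): non-zero at lag 1, zero afterwards
```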

ARIMA(p, d, q)
Autoregressive Integrated Moving Average. The ARIMA model, also known as the differenced integrated moving average autoregressive model, is one of the methods of time series prediction analysis. In ARIMA (p, d, q), AR is "autoregressive" and p is the number of autoregressive terms; MA is "moving average" and q is the number of moving average terms; d is the number of times (the order) the series is differenced to make it stationary.
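The role of d can be made concrete with a short sketch of repeated differencing. The helper name `difference` and the toy cumulative series are assumptions for illustration; the paper's own differencing is carried out inside SPSS.

```python
def difference(series, d=1):
    """Apply first differencing d times: each pass replaces x_t with x_t - x_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


cumulative = [1, 4, 9, 16, 25]         # toy cumulative counts with a curved trend
once = difference(cumulative, d=1)     # still trending upwards
twice = difference(cumulative, d=2)    # constant, i.e. the trend has been removed
```

Each pass shortens the series by one observation, so a model with d = 2 is fitted on two fewer points than were observed.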

ACF
Autocorrelation Function. Autocorrelation, also called serial correlation, is the correlation of a signal with itself at different points in time. Informally, it measures the similarity between two observations as a function of the time difference between them.

PACF
Partial Autocorrelation Function. Rather than finding the correlation between the current value and a lag, as the ACF does, the partial autocorrelation function finds the correlation between the residual (what remains after the effects of the earlier lags have been removed) and the next lag value. Thus, if there is any hidden information in the residuals that can be modeled by the next lag, we get a good correlation, and we use that lag as a feature when modeling.
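The two functions defined above can be sketched in a few lines. The code below is an illustrative implementation, assuming the usual sample autocorrelation estimator and the Durbin-Levinson recursion for the partial autocorrelations; the function names are hypothetical, and SPSS may use slightly different estimators.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf_from_acf(rho):
    """Partial autocorrelations from autocorrelations (rho[0] == 1) via Durbin-Levinson."""
    pacf = [1.0]
    phi_prev = []                      # coefficients of the AR(k-1) fit
    for k in range(1, len(rho)):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den             # correlation left after removing earlier lags
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.5: it decays geometrically,
# while the PACF is non-zero only at lag 1.
ar1_pacf = pacf_from_acf([1.0, 0.5, 0.25, 0.125])
```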

TS Model-based method for estimation in Hubei Province
Based on the data for the cumulative number of diagnoses in Hubei Province, we use the expert modeler to process the data and obtain the Holt model [8,9] to describe it. The corresponding equation set is shown below. The related mathematical symbols used above are listed in the following Table 3.
Finally, we find that .
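For orientation, Holt's method is commonly written as two recursions, one for a level and one for a trend, with the h-step forecast equal to the last level plus h times the last trend. The sketch below illustrates that standard textbook form and is not the SPSS implementation: the names `holt_forecast`, `alpha`, `beta` and `horizon`, the initialisation from the first two observations, and the toy numbers are all assumptions.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear trend method (textbook form) with a simple initialisation."""
    level = series[0]                  # assumed initial level
    trend = series[1] - series[0]      # assumed initial trend
    for x in series[1:]:
        prev_level = level
        # level: blend the new observation with the previous one-step forecast
        level = alpha * x + (1 - alpha) * (prev_level + trend)
        # trend: blend the latest change in level with the previous trend
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # h-step-ahead forecast: extrapolate the last level along the last trend
    return [level + h * trend for h in range(1, horizon + 1)]


toy_forecast = holt_forecast([10.0, 20.0, 30.0, 40.0], alpha=0.5, beta=0.3, horizon=2)
```

On a perfectly linear toy series the forecasts simply continue the line, whatever smoothing parameters are chosen.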
To check whether the selected Holt model correctly describes the cumulative number of diagnoses, we test whether its residuals are white noise [10].
From Figure 1, the ACF and PACF graphs of the residuals show that the autocorrelation coefficients and partial autocorrelation coefficients at all lags are not significantly different from 0. From Figure 2, it can also be seen that the P value obtained by the Q test is 1, so we cannot reject the null hypothesis and consider the residuals to be a white noise sequence.
From the analysis above, we conclude that the Holt model describes the cumulative number of diagnoses well.
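The residual check described above can be sketched as follows, assuming the Q test is the Ljung-Box statistic. The function names, the `fitted_params` argument (the number of estimated model parameters subtracted from the degrees of freedom) and the toy residuals are illustrative assumptions; the chi-square tail probability is computed with a plain series expansion so that only the standard library is needed.

```python
import math


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_k r_k^2 / (n - k), k = 1 .. max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (series for the incomplete gamma)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    series_sum = term
    n = 1
    while abs(term) > 1e-15 * abs(series_sum):
        term *= z / (a + n)
        series_sum += term
        n += 1
    lower = series_sum * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def ljung_box_p_value(residuals, max_lag, fitted_params=0):
    """P value of the Q test; a large value means white noise cannot be rejected."""
    q = ljung_box_q(residuals, max_lag)
    return chi2_sf(q, max_lag - fitted_params)


q_toy = ljung_box_q([1.0, -1.0, 1.0, -1.0], max_lag=1)  # strongly alternating residuals
```

A P value close to 1, as reported for the Holt model, means the residual autocorrelations are jointly very small relative to what white noise would produce.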
Similarly, the expert modeler finds that the cumulative number of cures conforms to the ARIMA (1, 2, 0) model [11]. The related equations are given below.
The related mathematical symbols used above are listed in the following Table 4.
The lag operator used in the equation indicates differencing. Table 5 below shows the specific functions of the lag operator.
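For reference, an ARIMA (1, 2, 0) model is usually written with the lag operator as in the sketch below. The symbols are assumptions, since the notation of Table 4 is not reproduced here: $B$ denotes the lag operator, $\nabla^d$ the d-order difference, $\phi_1$ the autoregressive coefficient, $\varepsilon_t$ the white noise term and $x_t$ the series, and any constant term is omitted.

```latex
\begin{aligned}
B\,x_t &= x_{t-1}, \qquad \nabla^d x_t = (1 - B)^d x_t, \\
(1 - \phi_1 B)\,(1 - B)^2 x_t &= \varepsilon_t
\;\Longleftrightarrow\;
\nabla^2 x_t = \phi_1 \nabla^2 x_{t-1} + \varepsilon_t .
\end{aligned}
```

In words, the twice-differenced series follows a first-order autoregression.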
We also test the residuals of this model for white noise. The cumulative number of deaths is likewise fitted with the ARIMA (1, 2, 0) model, and we again test its residuals for white noise. From Figure 4, the ACF and PACF graphs of the residuals show that the autocorrelation coefficients and partial autocorrelation coefficients at all lags are not significantly different from 0. Hence the ARIMA (1, 2, 0) model also describes the cumulative death toll well.

TS Model-based method for estimation in the United States
Based on the data for the United States, we use the expert modeler to process the data and find that the cumulative numbers of diagnoses, deaths and cures all conform to ARIMA models. However, the parameter settings differ from one series to another.
After processing, it is found that the cumulative diagnoses conform to the ARIMA (2, 2, 6) model. Next, we perform a residual test on the model based on white noise. As can be seen from Figure 5, the ACF and PACF graphs of the residuals show that the autocorrelation coefficients and partial autocorrelation coefficients at all lags are not significantly different from 0. From Figure 6, it can also be seen that the P value obtained from the Q test of the residuals is 0.304, so we cannot reject the null hypothesis and consider the residuals to be a white noise sequence.
Therefore, the ARIMA (2, 2, 6) model describes the cumulative number of diagnoses well.
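Written out with the lag operator, an ARIMA (2, 2, 6) model has the general shape sketched below. The symbols ($B$ for the lag operator, $\phi_i$ and $\theta_j$ for the autoregressive and moving average coefficients, $\varepsilon_t$ for white noise) and the minus signs on the moving average side are an assumed convention rather than the paper's own notation, and any constant term is omitted.

```latex
\left(1 - \phi_1 B - \phi_2 B^2\right)(1 - B)^2 x_t
  = \left(1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_6 B^6\right)\varepsilon_t
```

Two autoregressive terms and six moving average terms therefore act on the twice-differenced series.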
The cumulative number of cures conforms to the ARIMA (0, 2, 0) model, which is equivalent to a second-order difference equation. The related equation set is similar in form to the ARIMA equation set given earlier.
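Because an ARIMA (0, 2, 0) model says that the second difference of the series is white noise, its point forecasts reduce to a straight-line extrapolation of the last observed increment. The sketch below illustrates this under the assumption of no constant term; the function name and the toy values are hypothetical.

```python
def forecast_arima_020(series, horizon):
    """Point forecasts when the second difference is zero-mean white noise.

    Setting the expected second difference to zero gives
    x_{t+1} = 2 * x_t - x_{t-1}, i.e. the last increment is repeated.
    """
    last = series[-1]
    slope = series[-1] - series[-2]    # last observed daily increment
    return [last + h * slope for h in range(1, horizon + 1)]


toy_cures = [1, 3, 6]                          # toy cumulative counts
next_two = forecast_arima_020(toy_cures, 2)    # continues with increments of 3
```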
As before, we test the residuals for white noise. As Figure 7 shows, the ACF and PACF graphs of the residuals indicate that the autocorrelation coefficients and partial autocorrelation coefficients at all lags are not significantly different from 0.
Finally, we use the expert modeler to process the data on cumulative deaths in the U.S. and find that they conform to the ARIMA (0, 2, 1) model. The related equation set again has the same form as the ARIMA equation set given earlier.
We then perform a white noise residual test. As can be seen from Figure 8, the ACF and PACF graphs of the residuals show that the autocorrelation coefficients and partial autocorrelation coefficients at all lags are not significantly different from 0.

The TS Model-based method results in Hubei Province
We set the width of the confidence interval to 95%, then use the Holt model and the ARIMA model to fit and predict the cumulative numbers of people diagnosed, cured and deceased in Hubei Province, respectively. The results obtained are shown in the following figures.
As can be seen in Figures 9-11, the time series plots of the real and fitted data almost overlap, and the Holt and ARIMA models fit the original data well.
At the same time, after 28 April the epidemic in Hubei Province has been controlled, and the cumulative number of diagnoses will basically not increase dramatically. The Holt model and ARIMA model can also predict the cumulative diagnoses, cumulative cures and cumulative deaths well; see Table 6 and Table 7 below.

Discussion
At the time of the virus outbreak, everyone was living in panic, fearing for their own lives, the lives of their families and the safety of the country.
We use SPSS [12] to obtain the models we need, namely the Holt model and the ARIMA model, then use these models to fit the sequences and estimate the model parameters from the sequence values. Finally, we test the residuals of each model for white noise to check whether the model is applicable.
Time series analysis of COVID-19 describes the course, direction and trend of the epidemic and predicts the state into which it is likely to develop. This gives us some guidance, for example on what measures should be taken to intervene in the development of the epidemic in order to save more lives.
We can see from the results that the epidemic situations of China and the United States differ not only over the selected period but also in how the epidemic developed after about 60 days.
The outbreak in the United States continues to show exponential growth, whereas the epidemic in Hubei Province is basically controlled: after probably around 100 days, in May, the outbreak in Hubei Province was completely under control. The difference in outcome is largely related to the measures each country took to combat the epidemic, as can be seen from the attitudes of China and the United States towards it [13]. That is why it is important to have the right understanding in the face of an epidemic, and not to be blindly arrogant and underestimate its seriousness and potential dangers.
Time series can be used not only in the analysis of infectious diseases but also in many other disciplines such as measurement [14] and economics [15]. The order and magnitude of the data in a time series contain information about the objective world and its changes, and represent a dynamic process. Therefore, the main purpose of time series analysis is to understand the dynamic system under consideration, predict future events, and control future events through intervention [16,17].

Limitations
When establishing the models, we treat data with large fluctuations as outliers. In fact, there are many more complex models that can capture these outliers. In addition, the ARIMA model is only suitable for short-term prediction: beyond a certain horizon the predicted value no longer changes, which follows from the theory of the model. To work around this, we can treat the predicted values as if they were observed values, which makes long-term prediction possible, but the gap between the predictions and the truly observed data may grow larger and larger.
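The workaround of feeding predicted values back in as observations can be sketched as a recursive forecast. The example assumes an ARIMA (1, 2, 0) model without a constant, with a made-up coefficient `phi` and toy data; the function name is hypothetical and the sketch does not reproduce the SPSS forecasts.

```python
def forecast_arima_120(series, phi, horizon):
    """Recursive multi-step forecast for ARIMA(1, 2, 0) without a constant."""
    history = list(series)
    forecasts = []
    for _ in range(horizon):
        # second difference at the latest time point (observed or predicted)
        last_w = history[-1] - 2 * history[-2] + history[-3]
        next_w = phi * last_w                            # AR(1) step on the second difference
        next_x = 2 * history[-1] - history[-2] + next_w  # undo the double differencing
        forecasts.append(next_x)
        history.append(next_x)                           # treat the prediction as observed
    return forecasts


toy_path = forecast_arima_120([1.0, 3.0, 6.0, 10.0], phi=0.5, horizon=2)
```

Since every later step is built on earlier predictions rather than on data, any early error is carried forward, which is why the gap to the truly observed values tends to widen with the horizon.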
Since the epidemic is only predicted in the short term, the analysis charts for the United States show the cumulative numbers of people cured, deceased and diagnosed all moving in an increasing direction. In practice, however, the numbers in these categories should eventually reach stable values.