Open Journal of Bioinformatics and Biostatistics
Research Article       Open Access      Peer-Reviewed

Forecast number of new cases of Corona Virus Disease (COVID-19) in Ethiopia, using the case-based autoregressive integrated moving average model

Alemu Bekele Eticha*

Department of Statistics, College of Natural and Computational Science, Mizan-Tepi University, Tepi, Ethiopia
*Corresponding author: Alemu Bekele Eticha, Department of Statistics, College of Natural and Computational Science, Mizan-Tepi University, Tepi, Ethiopia, E-mail:
Received: 21 October, 2020 | Accepted: 30 December, 2020 | Published: 31 December, 2020
Keywords: COVID-19 trend; Time series forecaster of new case of Covid-19; ARIMA (2,2,2); New case confirmed of Corona virus

Cite this as

Eticha AB (2020) Forecast number of new cases of Corona Virus Disease (COVID-19) in Ethiopia, using the case-based autoregressive integrated moving average model. Open J Bioinform Biostat 4(1): 017-022. DOI: 10.17352/ojbb.000008

After the initial outbreak in Ethiopia, the spread of SARS-CoV-2 led to an elevated number of cases. Reported confirmed cases peaked in August 2020 and declined after that time, consistent with the response measures that have been invested in pandemic control in the country.

ARIMA models are among the most widely used approaches to time series forecasting; they aim to describe the autocorrelations in the data. Thus, in this study, the Autoregressive Integrated Moving Average (ARIMA) method is used to predict the number of new coronavirus cases. In short, the autoregressive part uses the dependence between an observation and its lagged observations; the integrated part uses differencing of the raw observations; and the moving average part uses the dependence between an observation and the residual errors.

ARIMA (2, 2, 2) was used to predict the number of confirmed cases of COVID-19, based on data for the period between March 2020 and December 2020, with 95% confidence intervals. The results revealed that the maximum expected number of new cases per day was 807 and the minimum forecast was 410 cases per day over the next two months. In addition, the total number of confirmed COVID-19 cases could reach about 160,585 by the end of February 2021.

In general, if the government of Ethiopia stops its COVID-19 control mechanisms, the pandemic may resurge severely and affect the country more. The study therefore proposes constructive steps that incorporate control mechanisms. All the organizations involved should establish policies based on the results of this report.


Human coronavirus infections are generally understood to cause mild respiratory disease. Three global coronavirus outbreaks, however, have contributed to significant mortality and morbidity: those of SARS-CoV-1, MERS-CoV and SARS-CoV-2. The first coronavirus outbreak of the twenty-first century was caused by SARS-CoV-1.

SARS-CoV-1 infection most commonly presented with a wide range of respiratory and gastrointestinal symptoms, and the last confirmed case was reported in 2004. The second outbreak, in 2012, was caused by Middle East respiratory syndrome coronavirus (MERS-CoV), and its case fatality was substantially higher than that of SARS-CoV-1. MERS-CoV causes a wide variety of mild, moderate and serious physical symptoms, and some patients develop acute respiratory distress syndrome. In December 2019 the third and most recent outbreak, that of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), began and led to a global pandemic. Patients with SARS-CoV-2 infection may be asymptomatic or may show the most typical signs of fever, cough and shortness of breath. The disease caused by SARS-CoV-2, known as COVID-19 [1], was the subject of this study.

The latest outbreak (COVID-19) became a global pandemic as of 4 May 2020 and is still ongoing [2]. The outbreak was first identified in Wuhan, China, in December 2019 [3]. Its rapid spread from one corner of the world to another then came as a sudden shock to the entire planet. As of 31 December 2020, more than 83.8 million cases of COVID-19 had been registered in about 210 countries and regions, resulting in more than 1.8 million deaths, while more than 59.3 million people had recovered [4].

In Africa, the first case of coronavirus was reported in Egypt on 14 February 2020. As of 31 December 2020, the five African countries reporting the most cases of COVID-19 were South Africa (1,057,161), Morocco (439,193), Tunisia (139,140), Egypt (138,062) and Ethiopia (124,264). Ethiopia has thus been one of the countries seriously affected by coronavirus disease (COVID-19) since its first case was identified on 13 March 2020. By the end of 2020, more than 124,264 confirmed COVID-19 cases had been recorded in Ethiopia, resulting in over 1,923 deaths, with more than 112,096 recoveries [4,5].

Although well-known interventions such as hand washing, social distancing and wearing face masks have been recommended by public health workers to control the spread of the coronavirus, its transmission has not stopped in Ethiopia [6]. The number of confirmed cases and the death rate in the country grew over the first three months [7,8]. Since COVID-19 continues to be a global public health concern and requires the utmost effort to track the prevalence of the virus, it is argued that the fundamental prerequisite for successful monitoring of the prevalence rate is the use of pandemic predictive methods [9]. Thus, this study forecast new cases of COVID-19 in Ethiopia for the next 60 days, with the goal of informing all concerned entities so that they can monitor the prevalence and hazard of this pandemic.


Data set

Across the world, researchers and policy makers are looking at confirmed cases and deaths to understand and compare the spread of COVID-19. In this study, COVID-19 data were extracted and organized from the daily reports of the public health institute up to 31 December 2020. The data were presented and analyzed using Minitab Version 13. A time series forecasting method was used to forecast the country's next 60 days. The study design was a registry-based time series trend analysis of daily data from the Federal Ministry of Health, covering March to the end of 2020.

Analytical methods

ARIMA modeling is one of the most effective modeling techniques for time series [2,9]. An ARIMA model is specified by a set of parameters and is written as ARIMA (p, d, q). Here, p stands for the order of autoregression, d for the degree of differencing applied to remove trend, and q for the order of the moving average [10]. For the confirmed COVID-19 cases in Ethiopia, I used the ARIMA (2,2,2) model. The model for predicting reported cases of COVID-19 is:

$$\mathrm{ARIMA}(p,d,q) = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \beta_1 Z_{t-1} + \beta_2 Z_{t-2} + Z_t$$

$$Z_t = X_t - X_{t-1}$$

Here Xt is the average number of confirmed cases of COVID-19 on day t; α1, α2, β1 and β2 are the parameters; and Zt is the residual term for day t. The trend can be estimated from the cases already confirmed, and the time series analysis was conducted for this purpose. Forecasting refers to applying a formula to estimate future figures on the basis of previous results. ARIMA (2, 2, 2) was used in this study to identify patterns in reported cases of COVID-19 in Ethiopia based on 2020 data. The level of statistical significance was set at 0.05. A plot of the actual confirmed cases against time was used to check the model's efficiency [11].
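The role of the three orders can be illustrated with a minimal Python sketch. It follows the standard textbook form of ARIMA(2,2,2), in which the series is differenced twice and an ARMA(2,2) recursion is applied to the differenced values; this is an assumption for illustration, not the Minitab procedure used in the study. The function names and the coefficient arguments `alpha` and `beta` are hypothetical and are not the estimates reported in Table 3.

```python
# Sketch of a one-step ARIMA(2,2,2) forecast in the standard textbook form.
# Coefficients passed in are hypothetical, NOT the estimates of this study.

def difference(series, d=1):
    """Apply d rounds of first differencing: w[t] = x[t] - x[t-1]."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def arma22_one_step(w, alpha, beta):
    """One-step-ahead ARMA(2,2) forecast for the differenced series w.

    Past errors e[t] are rebuilt recursively from one-step prediction
    errors, with the first two errors assumed to be zero.
    """
    e = [0.0, 0.0]
    for t in range(2, len(w)):
        pred = (alpha[0] * w[t - 1] + alpha[1] * w[t - 2]
                + beta[0] * e[t - 1] + beta[1] * e[t - 2])
        e.append(w[t] - pred)
    n = len(w)
    return (alpha[0] * w[n - 1] + alpha[1] * w[n - 2]
            + beta[0] * e[n - 1] + beta[1] * e[n - 2])


def forecast_next(x, alpha, beta):
    """Forecast the next value of x by undoing the two differences.

    Since w[n] = x[n] - 2*x[n-1] + x[n-2], the forecast is
    x[n] = w_hat + 2*x[n-1] - x[n-2]. Needs at least four observations.
    """
    w = difference(x, d=2)
    w_hat = arma22_one_step(w, alpha, beta)
    return w_hat + 2 * x[-1] - x[-2]
```

With all coefficients set to zero the forecast reduces to a straight-line extrapolation of the last two observations, which shows that d = 2 on its own already models a locally linear trend; the AR and MA terms then adjust that extrapolation.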


Descriptive statistics

The spread of the coronavirus in Ethiopia by month is shown in Table 1 and Figure 1. The Ethiopian Federal Ministry of Health reported the first COVID-19 case on 13 March 2020. By the end of that month, there were 25 confirmed cases, two recoveries and no deaths, leaving 23 active cases going into April 2020. In April there were 106 new cases, bringing the total number of confirmed cases to 131. There were three deaths. The number of people recovered increased to 59, leaving 69 active cases at the end of that month.

In May, 1,041 new cases were identified, taking the total number of cases to 1,172. The death toll reached only eleven. The number of recovered patients rose to 209, leaving 952 active cases of coronavirus. There were 4,674 new cases in June, taking the total number of confirmed cases to 5,846. The death toll rose to 103. The number of people recovered rose to 2,430, leaving 3,313 active cases at the end of June. In July, there were 11,684 new cases, bringing the total number of confirmed cases to 17,530. The death toll rose to 274. The number of people recovered increased to 6,950. At the end of the month, there were 10,306 active cases. There were 34,601 new cases in August, increasing the total number of confirmed cases to 52,131. The number of deaths nearly tripled to 809. At the end of August there were 32,328 active cases.

There were 23,237 new cases in September, taking the total number of confirmed cases to 75,368. The death toll increased to 1,198. The number of people recovered increased to 30,952, leaving 42,441 active cases at the end of the month. Similarly, 20,801 new cases and 271 additional deaths occurred in October. The total number of confirmed cases increased to 96,169, and the death toll increased to 1,469.

There were 13,905 new cases in November, raising the total number of confirmed cases to 110,074. The death toll rose to 1,706. The number of recovered patients increased to 73,815 at the end of the month. In the last month of 2020, there were 14,190 new cases and 217 additional deaths. The total number of confirmed cases increased to 124,264, and the death toll rose to 1,923. The total numbers of new cases, deaths and recoveries are shown in Table 1.
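The cumulative totals quoted above follow directly from the monthly new-case counts. The short Python sketch below recomputes them as a consistency check; the variable names are illustrative, and the figures are those given in the text.

```python
from itertools import accumulate

# Monthly new confirmed cases for 2020, as quoted in the text.
monthly_new_cases = {
    "March": 25, "April": 106, "May": 1041, "June": 4674, "July": 11684,
    "August": 34601, "September": 23237, "October": 20801,
    "November": 13905, "December": 14190,
}

# Running total of confirmed cases at the end of each month.
cumulative = dict(zip(monthly_new_cases,
                      accumulate(monthly_new_cases.values())))

# Month with the largest number of new cases.
peak_month = max(monthly_new_cases, key=monthly_new_cases.get)

for month, total in cumulative.items():
    print(f"{month:<10}{monthly_new_cases[month]:>8}{total:>9}")
print("Peak month:", peak_month)
```

The running total ends at 124,264 in December and the largest monthly count falls in August, in agreement with the figures reported in the text.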

The line of reported new cases declined after peaking in August 2020. Beyond this, no clear pattern in the cases is obvious from Figure 1. The death curve is roughly constant, while the recovery curve increases from month to month.

Measures of model accuracy

In this study, time series models were used to forecast COVID-19 cases for the coming 60 days. The measures of model accuracy for the ARIMA, Linear Trend, Quadratic Trend, S-Curve Trend, Moving Average and Exponential models are displayed in Table 2. The mean absolute percentage error (MAPE), mean absolute deviation (MAD) and mean squared deviation (MSD) values suggest that ARIMA(2,2,2) is the most accurate of all the models for forecasting future values, as it has the lowest value on every measure.
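For reference, the three accuracy measures can be computed as in the following Python sketch. It uses the usual definitions (MAPE as the mean of absolute percentage errors, MAD as the mean absolute error, MSD as the mean squared error); the function name is hypothetical, and skipping zero observations in MAPE is an assumption made here because the percentage error is undefined for them.

```python
def accuracy_measures(actual, fitted):
    """Return MAPE (in percent), MAD and MSD for paired actual/fitted values."""
    errors = [a - f for a, f in zip(actual, fitted)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    msd = sum(e * e for e in errors) / n
    # Percentage error is undefined when the actual value is zero,
    # so those observations are left out of MAPE (an assumption).
    pct = [abs(e / a) for e, a in zip(errors, actual) if a != 0]
    mape = 100 * sum(pct) / len(pct) if pct else float("nan")
    return {"MAPE": mape, "MAD": mad, "MSD": msd}


# Illustrative values only, not data from this study.
measures = accuracy_measures([100, 200, 400], [110, 180, 400])
print(measures)
```

Lower values on all three measures indicate a closer fit. MAPE is scale-free, whereas MAD and MSD are expressed in cases and squared cases respectively, and MSD penalizes large errors more heavily.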

The parameters of the ARIMA (2, 2, 2) model were then estimated and are displayed in Table 3. The two AR and two MA parameters have p-values of 0.000, 0.000, 0.000 and 0.0001, respectively, indicating that all parameters except the intercept are significant in the model at the 5% level of significance.

The forecast of new cases of COVID-19 in Ethiopia is shown in Table 4, with a 95% confidence interval. According to the forecast, the number of new confirmed COVID-19 cases will increase slightly over the next 60 days. This increase reflects the level of response that has been invested in pandemic control in the country. However, the predicted values are high, and more effort is required to minimize the spread of this pandemic across the country. A key problem of the outbreak is that some infected people show no signs of the virus and spread it to others without knowing that they are infected.

The forecast maximum and minimum numbers of new COVID-19 cases in one day are 807 and 410, respectively. In addition, the cumulative number of confirmed cases is forecast to reach 160,585 at the end of February 2021. These results are displayed in Table 4. Thus, more prevention measures and more resources should be introduced by the government; otherwise the coronavirus may resurge and affect the country more.
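The forecast figures can be cross-checked with simple arithmetic. The Python sketch below, with illustrative variable names, derives the number of new cases implied by the projected cumulative total and the corresponding daily average over the 60-day horizon stated in the study.

```python
cumulative_end_2020 = 124_264    # confirmed cases at 31 December 2020
projected_cumulative = 160_585   # projected total at end of February 2021
horizon_days = 60                # forecast horizon stated in the study
daily_min, daily_max = 410, 807  # forecast daily minimum and maximum

# New cases implied by the projected total, and their daily average.
implied_new_cases = projected_cumulative - cumulative_end_2020
implied_daily_mean = implied_new_cases / horizon_days

# The average should lie between the forecast daily minimum and maximum.
consistent = daily_min <= implied_daily_mean <= daily_max

print(implied_new_cases, implied_daily_mean, consistent)
```

The projected total implies 36,321 new cases, or about 605 per day, which lies between the forecast daily minimum and maximum.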

The trend analysis for the reported COVID-19 data is presented in Figure 2, which shows the time series plot of confirmed coronavirus cases from 14 March 2020 to the end of December 2020. From the plot, it is obvious that the time series is not stationary. The series grows until its peak in August 2020 and declines with time thereafter. The predicted values indicate that the pandemic could quickly resurge.

Figures 3-5 show the residual plots for confirmed COVID-19 cases in Ethiopia from 13 March 2020 to 31 December 2020. The normal probability plot shows minor and insignificant deviations of the residuals from the straight line (Figure 4). This implies that the errors are close to normal apart from a few outliers, such as the August reports. Thus the normality assumption holds only approximately on this plot, although the residual histogram is consistent with normality (Figure 5). The plot of residuals against fitted values shows a small dispersion at very early times, and about 19 percent of the new case values are zero (see Figure 3). This implies that the assumption of constant variance is also satisfied, apart from the smallest (zero) data values.

Figures 6 and 7 show the diagnostic plots for the expected cases of COVID-19 in Ethiopia. The ACF is the (complete) autocorrelation function, which gives the autocorrelation of a series with its lagged values. In simple terms, it describes how strongly the present value of the series is related to its previous values (Figure 6). The PACF is the partial autocorrelation function, which gives the correlation that remains between the series and a given lag after removing the effects already explained by the earlier lags. It is therefore 'partial' rather than 'complete', since the variation already accounted for is removed before the next correlation is computed (Figure 7).
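As an illustration of the two functions, the following Python sketch computes the sample ACF and then obtains the PACF from it with the Durbin-Levinson recursion. This is one standard way of computing these quantities and is not necessarily the algorithm used by Minitab; the function names and the example series are hypothetical.

```python
def acf(x, max_lag):
    """Sample autocorrelations r[0..max_lag] of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares (must be non-zero)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    out = [1.0]
    phi_prev = []  # AR coefficients of the order k-1 fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den  # partial autocorrelation at lag k
        phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
               for j in range(k - 1)]
        phi.append(phi_kk)
        out.append(phi_kk)
        phi_prev = phi
    return out


# Illustrative series only, not data from this study.
series = [1, 3, 2, 5, 4, 6, 8, 7]
print(acf(series, 3))
print(pacf(series, 3))
```

At lag 1 the two functions coincide; from lag 2 onward the PACF discounts the correlation already carried through the shorter lags, which is why the two plots are read together when choosing the orders p and q.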


The main objective of this analysis was to predict confirmed cases of COVID-19 in Ethiopia based on past recorded cases. It is important to build a reliable and effective predictive model that can enable governments and other stakeholders to monitor the further spread of COVID-19. ARIMA models are a predictive technique that offers good forecasts and has been widely used to track rapidly changing trends in infectious diseases [2,9,10,12].

The time series model and plot indicated a small rise in the number of future cases in Ethiopia. The number of reported cases in the country is expected to exceed 160,585 in the first two months of 2021. An earlier projection put cumulative infections across Ethiopia at 56,610 on average by the end of October [9], while the actual value up to 19 October was already 89,137 cases and 1,352 deaths [5]. This means that the number of people who are coronavirus-positive in Ethiopia can increase more than predicted [9]. In this analysis, the total number of confirmed cases had reached 124,264 at the end of 2020, so the disparity between this figure and the forecast total is 36,321 cases. On this basis, the present estimate is considered more reliable than those of other researchers. Similar to the predicted values in this study, the monthly number of cases in Ethiopia has been shown to increase during the first three months [9].

Scholars have disputed the effect of geographical differences and temperature in reducing the spread of COVID-19 [16]. Nevertheless, COVID-19 has caused serious global social and economic distress. Thus, COVID-19 can be controlled only by preventive strategies. However, no such factor was evaluated in this study owing to the unavailability of exposure data.

Case numbers in the US and India rose from 12 July 2020 to 11 September 2020 [10,13]. These countries now rank first and second in confirmed cases of the pandemic. For India, cases of COVID-19 were predicted using the ARIMA method, and the implementation of a lockdown was suggested [9]. The proposed measures were not, however, well implemented, and the pandemic in that country is now difficult to control. Ethiopia ranked fifth in the number of confirmed COVID-19 cases in Africa, according to the WHO report [5]. It is presumed that the worst-case scenario could occur if the standard WHO disease prevention and control measures are not taken [9]. This study therefore showed a slight rise in cases in Ethiopia.

Like the work of other scholars, this research was carried out to forecast potential cases of COVID-19 using the ARIMA model [9-12]. In line with this analysis, scholars have found that the ARIMA technique is suitable for predicting the prevalence of COVID-19 [13-16]. The ARIMA model predicted a pandemic outbreak in India and signalled substantial spread of the coronavirus in the country prior to a rise in confirmed cases [10]. Since the same model was used in this analysis, the ARIMA model appears useful in predicting potential cases of COVID-19. Ethiopia must also introduce control mechanisms and establish even more pandemic prevention policies.


Since the study shows a small increase in the number of cases in Ethiopia, more attention should be paid to the control of COVID-19. Unless the Government of Ethiopia implements a mechanism to control the pandemic, it may resurge and affect the country more. The study therefore suggests that proactive control mechanisms should be implemented on a continuous basis. Since the ARIMA model is an attempt to forecast the future spread of COVID-19 from current data, institutions should formulate their policies from now on based on the results of this study.

  1. Worldometer (2020) Ethiopia Corona Virus – Worldometer. WHO.
  2. Unicef. Ethiopia (2020) Socio-economic impacts of COVID-19. UNICEF Ethiopia. 1.
  3. Takele R (2020) Stochastic modelling for predicting COVID-19 prevalence in East Africa Countries. Infectious Disease Modelling 5: 598-607.
  4. Ayenew B, Yitayew M, Pandey D (2020) Challenges and opportunities to tackle COVID-19 spread in Ethiopia. Peer Scientist 2: e1000014.
  5. Abate L, Bekele A, Bedada B (2020) Status of distribution of coronavirus disease (COVID-19) in Ethiopia within first three months. AJRSP 2: 156-166.
  6. Na J, Tibebu H, De Silva V, Kondoz A, Caine M (2020) Probabilistic approximation of effective reproduction number of COVID-19 using daily death statistics. Chaos, Solitons and Fractals 140: 110181.
  7. Takele R (2020) Stochastic modelling for predicting COVID-19 prevalence in East Africa Countries. Infect Dis Model 5: 598-607.
  8. Tandon H, Ranjan P, Chakraborty T, Suhag V (2020) Coronavirus (COVID-19): ARIMA based time-series analysis to forecast near future.
  9. Das RC (2020) Forecasting incidences of COVID-19 using Box-Jenkins method for the period July 12-September 11, 2020: A study on highly affected countries. Chaos, Solitons and Fractals 140: 110248.
  10. Khana FM, Gupta R (2020) ARIMA and NAR based prediction model for time series analysis of COVID-19 cases in India. Journal of Safety Science and Resilience 1: 12-18.
  11. Zeynep C (2020) Estimation of COVID-19 prevalence in Italy, Spain, and France. Sci Total Environ 729: 138817.
  12. Papastefanopoulos V, Linardatos P, Kotsiantis S (2020) COVID-19: A Comparison of Time Series Methods to Forecast Percentage of Active Cases per Population. Applied Sciences 10: 3880.
  13. Lu H, Stratton CW, Tang YW (2020) Outbreak of Pneumonia of Unknown Etiology in Wuhan China: The Mystery and the Miracle. J Med Virol 92: 401-402.
  14. CSSE (2020) Corona virus COVID-19 Global Cases by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU).
  15. Mann R, Perisetti A, Gajendran M, Gandhi Z, Umapathy C, et al. (2020) Clinical Characteristics, Diagnosis, and Treatment of Major Coronavirus Outbreaks. Front Med 7.
  16. Fattorini D, Regoli F (2020) Role of the chronic air pollution levels in the Covid-19 outbreak risk in Italy. Environ Pollut 264: 114732.
© 2020 Eticha AB. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.