Volume 8, (Spl-1- SARS-CoV-2), October-November Issue - 2020, Pages:S01-S08 |
Authors: Syed Tahir Ali Shah, Abeer Iftikhar, Muhammad Imran Khan, Majad Mansoor, Adeel Feroz Mirza, Muhammad Bilal |
Abstract: Given the socio-economic impact of coronavirus disease 2019 (COVID-19), it is essential to gauge the spread of this disease. Pakistan was among the countries that were not initially affected by the disease. This study uses a linear regression model to predict COVID-19 cases in Pakistan. With this model, we can predict the number of infected cases in Pakistan efficiently. Linear regression and correlation are two techniques for estimating the linear relationship between variables. Correlation describes the direction and strength of a linear relationship between variables without distinguishing between dependent and independent variables (here, daily COVID-19 infections are the independent variable, and the predicted value is the dependent variable). Linear regression, by contrast, produces estimates that predict values from the given information (number of infected and number predicted) and does distinguish dependent from independent variables. A scatter plot is a useful tool for judging the strength of the relationship between variables. A correlation coefficient is a numerical measure of association between two variables and lies in the range [-1, 1]. Using this linear regression model, we can predict the number of cases in Pakistan. |
Full Text:
1 Introduction
The novel coronavirus is responsible for the pandemic that started in December 2019 in Wuhan, the capital city of Hubei province in China, and has rapidly spread over the world (Chimmula & Zhang, 2020). The World Health Organization (WHO) declared the coronavirus outbreak a global pandemic on 11 March 2020. Many countries around the world are trying to restrict the further spread of this virus (Tomar & Gupta, 2020). COVID-19 is the third coronavirus outbreak globally and has the third-highest death rate in the history of pandemics (Bilal et al., 2020). The world's economy stands on the verge of destruction; even the world's largest economies have been unable to control the virus. So far, there has been no treatment for the disease. The first COVID-19 case in Pakistan was reported in Karachi, Sindh province, on 26th February 2020 and was confirmed by the country’s Ministry of Health (Waris et al., 2020).
Multiple linear regression (Rath et al., 2020), linear regression (Ghosal et al., 2020), advanced autoregressive integrated moving average (Singh et al., 2020), autoregressive integrated moving average (ARIMA) (Ilie et al., 2020), data-based forecasting (Anastassopoulou et al., 2020), and short-term and long-term prediction models have been successfully used to gauge the impact of COVID-19 spread in different countries of the world (Ahmar & del Val, 2020; Ayyoubzadeh et al., 2020). Their common purpose is to identify the peak time and the scale of spread, and to allocate resources for the prevention, control, and cure of COVID-19. Such prediction tools may help administrators handle the crisis effectively. Recently, we used the SIR model to predict the expected number of infected cases and deaths (Ali Shah et al., 2020). Here, we propose a linear regression method for predicting the number of COVID-19 cases during the upcoming days. Based on previous data, Shah et al.
(2020) used the regression model to predict COVID-19 in Pakistan. Here, we will predict the number of infected and diagnosed cases and deaths in September 2020. Different beta and gamma values are used to predict COVID-19.
1.1. Related work
Waris et al. (2020) compared Pakistan with other countries having strong economies. They analyzed the number of hospitals available for COVID-19 and the inadequate health facilities in Pakistan, and also discussed quarantine facilities in the country. The authors covered the complete scenario of Pakistan’s hospital capability and economic condition.
Yang et al. (2020) used an artificial intelligence model trained on 2003 SARS data to predict the course of COVID-19. The authors also used the susceptible, exposed, infected, and recovered (SEIR) model to derive the COVID-19 curve and to find the peaks and magnitude of the spread of COVID-19. The study predicted that COVID-19 in China would peak in late February and gradually decline by the end of April (Yang et al., 2020).
A data-driven long short-term memory (LSTM) model has been used to predict COVID-19 in India for the following month (May 2020). The number of positive cases and the recovery rate per day were estimated using LSTM methods. The study evaluated the increase and decrease of cases and predicted that the spread of the virus could be reduced if preventive measures, such as lockdown and social distancing, were exercised (Tomar & Gupta, 2020).
Chimmula & Zhang (2020) applied LSTM to the available data to build a sequential time-series model and to predict the peak of COVID-19 in Canada. The study predicted that the pandemic in Canada would end within three months (first week of August 2020) and that a small number of infection clusters might appear until the end of 2020. The study also compared the transmission rate in Canada with that of other countries, such as Italy and America (Chimmula & Zhang, 2020). Ayyoubzadeh et al.
(2020) collected data from Google Trends and used LSTM and linear regression to estimate the number of positive cases of COVID-19 in Iran. The authors claimed that the prediction could help policymakers and the health care department allocate health resources accordingly.
Bandyopadhyay & Dutta (2020) used the deep learning neural networks long short-term memory (LSTM) and gated recurrent unit (GRU) to train a neural network (NN) model. The combination of LSTM and GRU provides better prediction of COVID-19. Supplementary rechecking of frontline medical workers could be a promising strategy. It is crucial to improve the quality of results by implementing an accurately designed and optimized detection process (Bandyopadhyay & Dutta, 2020).
2 Materials and Methods
All the global data were taken from the WHO daily situation report (WHO 2020), and the Pakistani data were obtained from the official COVID-19 page of the Government of Pakistan (Government of Pakistan 2020). The data were downloaded as a CSV file and analyzed in a Jupyter notebook with Python 3.5. To obtain information on infections and deaths by country, researchers often use linear regression, which estimates the average numerical value of Y for a known value of X by means of a regression line. Given the y-intercept and slope of the regression line, Y can be calculated for any value of X; the values obtained in this way are the estimated values. Linear regression analyses were used to establish a linear regression model that could predict the average numerical values of future COVID-19 cases in Pakistan. The cumulative number of COVID-19 cases in Pakistan was taken as the dependent variable, and the regional distribution of the number of cases was plotted.
3 Results and Discussion
Figure 1 explains the machine learning classification.
Machine learning has three main types: reinforcement learning, unsupervised learning, and supervised learning. The regression analysis proposed in the current manuscript falls into the category of supervised learning, an approach to creating artificial intelligence in which the program is given labelled input data and the expected output results. Supervised learning allows you to collect data or produce a data output from previous experience, and the linear regression model predicts a single output value using training data. Linear regression is not just a machine learning algorithm; it also plays a considerable role in statistics. In supervised learning, each input is associated with a target label. The task of the model is to learn the pattern and find the best-fit line through the input-target pairs. In the model, the value of y is calculated as a linear combination of the input values of the variables x. This hypothesis is written as the following equation.
Table 3 illustrates the state of daily confirmed cases in Pakistan. From this table, it can be seen that in the beginning period, there were no cases reported in Pakistan. The first case in Pakistan was registered on 26 February 2020; the number of cases then gradually increased, and the peak of COVID-19 was observed in mid-July. From August 2020, a gradually decreasing trend in cases was recorded.
3.3 Coefficients of regression polynomials
Regression coefficients are estimates of unknown parameters, and they express the relationship between a predictor variable and the response. In linear regression, the values that multiply the predictors are known as coefficients. For example, in the equation y = 5X + 3, the coefficient is +5, the predictor is X, and 3 is the constant. Another form of linear regression is polynomial regression, in which the relationship between the dependent and independent variables is modeled as a polynomial of a specific degree n.
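The two ideas above, a linear coefficient such as the 5 in y = 5X + 3 and a polynomial fit of a chosen degree, can be illustrated with a minimal sketch. It assumes synthetic data generated from known equations rather than the study's case counts, and the function name `polyfit_least_squares` is hypothetical; it is not the authors' code. The sketch solves the ordinary least-squares normal equations directly so that it needs only the Python standard library.

```python
def polyfit_least_squares(xs, ys, degree):
    """Least-squares fit of y ~ b0 + b1*x + ... + b_degree * x**degree.

    Returns the coefficients in ascending order: [b0, b1, ..., b_degree].
    Illustrative helper only; a numerical library would normally be used.
    """
    n = degree + 1
    # Normal equations A b = c with A[i][j] = sum(x^(i+j)), c[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    c = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Augmented matrix [A | c], reduced by Gaussian elimination with partial pivoting.
    m = [row + [rhs] for row, rhs in zip(a, c)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution recovers the coefficients from the triangular system.
    b = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * b[j] for j in range(i + 1, n))
        b[i] = (m[i][n] - tail) / m[i][i]
    return b


# Hypothetical inputs: ten evenly spaced x values (not data from the study).
xs = [float(i) for i in range(10)]

# Linear case: points generated exactly from the text's example y = 5X + 3.
linear = polyfit_least_squares(xs, [5.0 * x + 3.0 for x in xs], degree=1)

# Polynomial case: an illustrative quadratic y = 2x^2 - 3x + 1.
quadratic = polyfit_least_squares(
    xs, [2.0 * x ** 2 - 3.0 * x + 1.0 for x in xs], degree=2
)

print("linear fit    [constant, coefficient]:", [round(v, 6) for v in linear])
print("quadratic fit [b0, b1, b2]:           ", [round(v, 6) for v in quadratic])
```

Because the model is linear in its coefficients even when it is nonlinear in x, the same least-squares routine handles both cases; only the degree changes.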
Polynomial regression fits a nonlinear relationship between the dependent variable y and the independent variable x.
[0.00000000e+00 -2.96412582e+00 -5.40109773e+01 2.70260730e+00 -5.11969748e-02 4.53852559e-04 -1.85262628e-06 2.81785306e-09]
The coefficient of determination, R², indicates the percentage of the variance in the dependent variable that the independent variables explain collectively. R² measures the strength of the relationship between the model and the dependent variable on a convenient 0 to 100% scale. The value of R² = 0.9987181307458601 in this case.
4 Conclusion and future work
The linear regression model is commonly used to estimate the expected number of patients and to predict the expected peaks of diseases. Similarly, we have used the linear regression model to predict the number of COVID-19 cases in Pakistan. If the same situation prevails, according to the |
Ahmar AS, Del Val EB (2020) SutteARIMA: Short-term forecasting method, a case: Covid-19 and stock market in Spain. Science of The Total Environment 138883.
Ali Shah ST, Mansoor M, Mirza AF, Dilshad M, Khan MI, Farwa R, Khan MA, Bilal M, Iqbal HMN (2020) Predicting COVID-19 Spread in Pakistan using the SIR Model. Journal of Pure and Applied Microbiology 14(2): 1423-1430. 10.22207/JPAM.14.2.40
Anastassopoulou C, Russo L, Tsakris A, Siettos C (2020) Data-based analysis, modelling and forecasting of the COVID-19 outbreak. PLoS ONE 15(3): e0230405.
Ayyoubzadeh SM, Ayyoubzadeh SM, Zahedi H, Ahmadi M, Kalhori SRN (2020) Predicting COVID-19 Incidence Through Analysis of Google Trends Data in Iran: Data Mining and Deep Learning Pilot Study. JMIR Public Health and Surveillance 6(2): e18828. 10.2196/18828
Bandyopadhyay SK, Dutta S (2020) Machine Learning Approach for Confirmation of COVID-19 Cases: Positive, Negative, Death and Release. MedRxiv 10.1101/2020.03.25.20043505
Bilal M, Khan MI, Nazir MS, Ahmed I, Iqbal HMN (2020) Coronaviruses and COVID-19 – Complications and Lessons Learned for the Future. Journal of Pure and Applied Microbiology 14(suppl 1): 725-731. 10.22207/JPAM.14.SPL1.09
Chimmula VKR, Zhang L (2020) Time series forecasting of COVID-19 transmission in Canada using LSTM networks. Chaos Solitons Fractals 135: 109864. 10.1016/j.chaos.2020.109864
Ghosal S, Sengupta S, Majumder M, Sinha B (2020) Prediction of the number of deaths in India due to SARS-CoV-2 at 5–6 weeks. Diabetes & Metabolic Syndrome: Clinical Research & Reviews. 10.1016/j.dsx.2020.03.017
Government of Pakistan (2020) Coronavirus Dashboard of Pakistan. http://covid.gov.pk
Ilie OD, Cojocariu RO, Ciobica A, Timofte SI, Mavroudis I, Doroftei B (2020) Forecasting the spreading of COVID-19 across nine countries from Europe, Asia, and the American continents using the ARIMA models. Microorganisms 8(8): 1158.
Rath S, Tripathy A, Tripathy AR (2020) Prediction of new active cases of coronavirus disease (COVID-19) pandemic using multiple linear
Singh RK, Rani M, Bhagavathula AS, Sah R, Rodriguez-Morales AJ, Kalita H, Nanda C, Sharma S, Sharma YD, Rabaan AA (2020) Prediction of the COVID-19 pandemic for the top 15 affected countries: Advanced autoregressive integrated moving average (ARIMA) model. JMIR Public Health and Surveillance 6(2): e19115.
Tomar A, Gupta N (2020) Prediction for the spread of COVID-19 in India and effectiveness of preventive measures. Science of the Total Environment 728: 138762. 10.1016/j.scitotenv.2020.138762
Waris A, Atta UK, Ali M, Asmat A, Baset A (2020) COVID-19 outbreak: current scenario of Pakistan. New Microbes New Infect. 35: 100681. 10.1016/j.nmni.2020.100681
World Health Organization (2020) Coronavirus disease
Yang Z, Zeng Z, Wang K, Wong SS, Liang W, Zanin M, Liu P, Cao X, Gao Z, Mai Z, Liang J, Liu X, Li S, Li Y, Ye F, Guan W, Yang Y, Li F, Luo S, Xie Y, Liu B, Wang Z, Zhang S, Wang Y, Zhong N, He J (2020) Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions. Journal of Thoracic Disease 12(3): 165-174. 10.21037/jtd.2020.02.64
|