Exploring dependence of COVID-19 on environmental factors and spread prediction in India

Abstract

COVID-19 has taken the world by storm, with the majority of nations still being challenged by the novel coronavirus. The present work evaluates the spread of COVID-19 in India using the Susceptible-Exposed-Infectious-Removed (SEIR) model to establish the impact of socio-behavioural aspects, especially social distancing. The impact of environmental factors such as temperature and relative humidity (RH) on the number of COVID-19 cases per day is also studied using statistical methods, including Response Surface Methodology (RSM) and Pearson’s correlation. Here we report the changes resulting from the lockdowns and unlocks initiated by the Government of India for COVID-19, compared with a scenario of total lockdown. The phased unlocks and crowded gatherings increase the number of cases and stretch the mitigation timeline of COVID-19 spread, delaying the flattening of the curve. The SEIR model predictions have been fairly validated against the actual cases. The daily spread of COVID-19 cases is also fairly correlated with temperature in Indian cities, as supported by the well-established role of higher temperatures in disrupting the lipid layer of the coronavirus, but this effect is greatly undermined by the key factor of social distancing and is confounded with multiple other unknown co-varying environmental factors. However, the analysis could not clearly establish the role of RH in affecting daily COVID-19 cases. Hence, it becomes essential to include environmental parameters in epidemiological models like SEIR and to systematically plan controlled laboratory experiments and modeling studies to draw conclusive inferences, assisting policymakers and stakeholders in formulating comprehensive action plans to alleviate the COVID-19 spread.

Introduction

At the end of December 2019, an outbreak of atypical pneumonia [now called coronavirus disease 2019 (COVID-19)] started in Wuhan, China1,2,3,4. The virus is considered to be of zoonotic origin. It is referred to as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), or in general as the novel coronavirus, and the associated disease is called COVID-193,4,5. With the onslaught of SARS-CoV-2 in India, major interventions in epidemic preparedness started. These interventions include, but are not limited to, public awareness, deployment of widespread testing facilities, preparedness of medical institutions, surveillance and tracking of individual movement, and quarantine of suspected cases6. Presently, social distancing and regular hand washing are some of the best ways to keep this virus at bay7.

Countries all over the world are challenged by this virus and have declared lockdowns in various cities and states8,9,10,11. In India, a nationwide lockdown is declared for 21 days starting 25th March 202012. People are instructed to stay indoors except for emergency services13. All tourist visas and e-visas for travellers are suspended till 15th April 2020. Travellers who returned after 15th February 2020 are quarantined for a minimum of 14 days upon their arrival in India. Researchers estimate that each infected person transmits the virus to more than two others, highlighting its potential to infect millions. The effectiveness of the lockdown is contingent on people avoiding social gatherings and limiting population movement. India reported its first COVID-19 case, originating from China, on 30th January 2020 in Kerala. The Government of India (GoI) reported its first COVID-19 death on 10th March 2020 in Karnataka. Till 26th May 2020, India has reported 145,380 confirmed cases, with 67,404 recoveries and 4,167 deaths due to COVID-19.

Researchers are trying to understand the trend of SARS-CoV-2 movement, to predict scenarios of new potential cases and to plan effective remediation for their countries14,15. The role of environmental conditions in the survivability of this virus has also emerged as a potential factor impacting the spread16,17. A few studies have suggested that the virus should behave similarly to most influenza viruses, or to SARS, commonly suggesting reduced activity in hot and humid conditions18,19,20. However, it is necessary to investigate the impact of local environmental conditions on the virus spread. The alarming community spread of SARS-CoV-2 has a major impact on the public health, economic and behavioural aspects of society.

SARS-CoV-2 affects the human body in a similar fashion to influenza viruses, attacking the respiratory system, and is likewise affected by temperature and humidity21. Some studies correlate a decrease in COVID-19 mortality with each unit degree rise in temperature, which is similar to other viruses causing respiratory ailments22,23. The survival of this virus is evidently affected by environmental factors such as temperature and relative humidity (RH), as reinforced by recent literature24,25. Some recent evidence shows a link between air pollutants and SARS-CoV-2 susceptibility26,27,28. Past evidence can also be taken into consideration: laboratory studies on SARS-CoV and MERS-CoV have indicated that these viruses are more stable at lower temperatures and in dry conditions, which further aid transmissibility through factors such as increased suspension time, improved conditions for attachment, reduced function of cilia and so on29,30. Transmission of MERS-CoV via aerosols remains feasible while the virus is in an airborne state. It is observed that the decrease in the capability of the virus is significantly influenced by temperature and by human-to-human transmission31. Environmental factors are strongly related to the number of confirmed COVID-19 cases, as the flu virus spreads rapidly in cold and dry conditions and becomes inactive above 30 °C32. Hence, from past and recent studies, the importance of factors such as ambient temperature and RH is fairly clear and concrete.

The susceptible-exposed-infectious-recovered (SEIR) model has been reported as a successful tool to understand pandemic dynamics and to evaluate the impact of environmental and social conditions on the spread of COVID-1933,34,35,36. Hence, the present work focuses on highlighting the impact of the behavioural aspects of society and of local environmental conditions on COVID-19 spread using the SEIR model and statistical tools. The present assessment has been carried out in two parts. The first is to predict the number of confirmed cases of COVID-19 for India and the time period for its subsidence. The second is to evaluate the dependency of COVID-19 spread on meteorological (also referred to as environmental) factors, namely temperature and RH.

Results

SEIR model output

The SEIR model is simulated for two cases: Case A, in which the model considers input data till 3rd April 2020, and Case B, which is modelled with data up to 9th June 2020. The difference between the two datasets is that India had declared a lockdown on 25th March 2020, after recognising that possible community spread of COVID-19 had started and needed to be curbed almost immediately to flatten the curve37. Later on, it emerged that this was indeed the case, as a large number of gatherings were happening during the first and second weeks of March, the spillover of which is seen at the end of March and in early April, given that the incubation period of the novel coronavirus varies from 2 to 14 days38,39,40.

For Case A, within the limit of the available datasets, the rise in the number of COVID-19 cases was gradual. The results for Case A are shown in Table 1 and Fig. 1. With reference to the SEIR model equations given as supplementary data in Annexure-I, the S[t] data are taken from the COVID-19 cases in India (supplementary data in Annexure-II), and S[t + 1], i.e., the value for the following day, is computed through the model.
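The day-by-day update described above, in which the state at day t gives the state at day t + 1, can be sketched with the textbook discrete-time SEIR equations. This is only an illustrative sketch written for Python 3: the governing equations actually used are those in Annexure-I, and the function names `seir_step` and `simulate`, the parameters `beta`, `sigma` and `gamma`, and every numerical value below are assumptions chosen for demonstration, not values from this study.

```python
def seir_step(s, e, i, r, beta, sigma, gamma):
    """Advance a textbook SEIR model by one day (forward Euler, dt = 1 day).

    beta  : transmission rate per day (assumed value)
    sigma : 1 / incubation period, per day (assumed value)
    gamma : 1 / infectious period, per day (assumed value)
    """
    n = s + e + i + r                  # total population, conserved by the update
    new_exposed = beta * s * i / n     # S -> E
    new_infectious = sigma * e         # E -> I
    new_removed = gamma * i            # I -> R
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_removed,
            r + new_removed)


def simulate(s0, e0, i0, r0, beta, sigma, gamma, days):
    """Return the list of (S, E, I, R) tuples for day 0 through day `days`."""
    states = [(float(s0), float(e0), float(i0), float(r0))]
    for _ in range(days):
        states.append(seir_step(*states[-1], beta, sigma, gamma))
    return states


# Hypothetical inputs, for illustration only (not taken from the paper).
trajectory = simulate(s0=1_000_000, e0=10, i0=5, r0=0,
                      beta=0.30, sigma=1 / 6, gamma=1 / 14, days=120)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][2])
print("day with the most infectious individuals:", peak_day)
```

Lowering `beta` in such a sketch is one simple way to mimic stricter social distancing, while a week-by-week change in `beta` would mimic phased unlocking.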

Table 1 Simulation result of SEIR model for Case A.
Fig. 1 Trend of rise and fall of cases in each category of the SEIR model (Case A).

From Table 1 and Fig. 1, it can be inferred that if social distancing is strictly implemented to control the community spread, the total number of infected cases is contained at 10,050 by mid-June and follows a decreasing trend. In Case A, the drop in the curves of Susceptible, Exposed, Infected and Recovered cases is attributable to the hypothesized stringent lockdown in India. The decrease in the number of infections is noticed on 19th May. Interstate movement (inflow and outflow) is restricted up to 16th June for Case A, which leads to the fall in the number of cases. The simulations for Case B are shown in Fig. 2 and Table 2.

Fig. 2 Trend of rise and fall of cases in each category of the SEIR model (Case B).

Table 2 Simulation result of SEIR model for Case B.

Lapses in lockdown and social distancing, together with reopening, have led to an increased number of cases, as indicated above. The increase in the number of COVID-19 cases is influenced by the interstate movement of migrant workers, massive gatherings of people and other major lapses, as shown in Table 3. In Case B, the onset of possible community spread by the end of April and the start of May has given rise to the increase in the number of cases. The maximum total number of infected cases in Case B (Fig. 2) is 33.64 times that of Case A (Fig. 1) for 16th June 2020. Figure 2 shows that the number of infected cases follows an increasing trend, with the number of infections exceeding 550,827 by the end of June.

Table 3 List of events during lockdown.
Fig. 3 Percentage error for (a) infected cases, (b) recovered cases and (c) susceptible cases.

Validation is performed over 4 weeks to compare the predicted values with the actual values. Long-term validation may not be reliable given the ever-changing scenario in India, as indicated in Table 3. Figure 3 and Table 4 show that the percentage error lies within ~1–4%, with an average of 2.0%. The numbers of susceptible, infected and recovered COVID-19 cases in the middle and final weeks of June and the first few weeks of July are validated.
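The percentage error used for such a validation can be computed as in the short sketch below. The exact error definition used for Table 4 is not stated in the text, so the absolute percentage error relative to the actual value is assumed here; the function names and the weekly numbers are hypothetical and serve only to show the calculation.

```python
def percentage_error(actual, predicted):
    """Absolute percentage error of a prediction relative to the actual value."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(predicted - actual) / abs(actual) * 100.0


def mean_percentage_error(actual_series, predicted_series):
    """Average of the per-point absolute percentage errors."""
    if len(actual_series) != len(predicted_series) or not actual_series:
        raise ValueError("series must be non-empty and of equal length")
    errors = [percentage_error(a, p)
              for a, p in zip(actual_series, predicted_series)]
    return sum(errors) / len(errors)


# Hypothetical weekly case counts, for illustration only.
actual = [1000, 2000, 4000, 8000]
predicted = [1010, 1960, 4120, 8080]
weekly_errors = [percentage_error(a, p) for a, p in zip(actual, predicted)]
average_error = mean_percentage_error(actual, predicted)
print(weekly_errors, average_error)
```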

Table 4 Percentage error between actual and predicted cases.

Environmental parameters

State-averaged data on temperature and RH are considered to study their relationship with the COVID-19 cases per day in two Indian states, namely Maharashtra and Karnataka. The ranges of the parameters under study in these states are given in Table 5.

Table 5 Range of variables and coded units.

These two states were chosen as they represent two different environmental conditions with respect to the parameters under study, i.e., temperature and relative humidity. This is supported by a two-tailed t-test showing a significant difference in their meteorological conditions for temperature and RH, as depicted in Table 6, with T = −3.67, P < 0.001 and T = −2.20, P = 0.029 at a 95% confidence interval, respectively. Table 6 also depicts the significant difference in the meteorological conditions of the cities under study, i.e. Mumbai, Srinagar, Kasaragod and New York, with P < 0.05 at a CI of 95%.

Table 6 Two tailed t-test on the environmental parameters.

Table 7 delineates the results of ANOVA for the COVID-19 cases per day in the states of Maharashtra and Karnataka under the varied temperature and RH conditions. From Table 7, it can be inferred that the main effects of temperature and RH were significant for Maharashtra, with F = 18.46, P < 0.001 and F = 122.17, P < 0.001, respectively, and for Karnataka, with F = 6.82, P = 0.01 and F = 12.83, P < 0.001, respectively. The squared effect of temperature was found to be negligible for both states, with P > 0.05 (CI: 95%), whereas the squared effect of RH was prominent in both states, i.e. Maharashtra (F = 24.66, P < 0.001) and Karnataka (F = 11.5, P < 0.001). A significant interaction between the input variables, affecting the number of COVID-19 cases per day, is observed in Maharashtra with F = 4.25, P = 0.041 (CI: 95%). However, Karnataka shows no significant interaction between temperature and RH with respect to the number of cases per day, with F = 3.54, P = 0.063 (CI: 95%). The response surface mapped over the number of COVID-19 cases per day reflects a good model fit, with R-Sq of 83.63% and 57.22% for Maharashtra and Karnataka, respectively. The model fit is also confirmed by low standardized residuals for both Maharashtra and Karnataka. Figure 4 shows that the modelled data points are in close agreement with the normal distribution line and that the residuals are evenly distributed at different levels of the independent variables; hence, it can be said that equality of variance exists across the varied levels of the response.

Table 7 Summary of ANOVA results for COVID-19 cases per day in states of India: (a) Maharashtra (b) Karnataka and (c) T and P values for Maharashtra and Karnataka.
Fig. 4 Residual plots for states: (a) Maharashtra, (b) Karnataka.

The dependence of COVID-19 cases per day on meteorological parameters at the state level might indicate only feeble causation, as the meteorological parameters have been grossly averaged over the spatial and temporal variations across the large areas of these states. Hence, a detailed analysis was performed at the city level to better understand the dependence of COVID-19 cases per day on these parameters. ANOVA, tabulated in Table 8a, b, has been performed for 3 cities in India and 1 city in the USA, as these cities are observed to be significantly different from each other in terms of meteorological parameters, with P < 0.05 (CI: 95%) for both temperature and RH, as shown in Table 6. The ANOVA results show a significant dependence of the COVID-19 cases per day on temperature and RH in all four cities, with P < 0.05 (CI: 95%).

Table 8 ANOVA for COVID-19 cases per day in New York, Kasaragod, Srinagar and Mumbai.

In the case of New York, the main effects and the interaction effect of RH and temperature on the COVID-19 cases per day in the city are found to be significant, with p-values below 0.05 at 95% CI, except for the squared effects, with F = 1.16 and P = 0.318. The mapped response surface shows a negative impact of rising temperature and a positive impact of relative humidity. The positive impact of RH is superseded by the negative combined effect of T and RH on the COVID-19 cases per day, as indicated by the RSM model coefficients in coded units, with a reasonable model fit of R-Sq = 53.70% and R-Sq(adj) = 51.86%.

Similarly, in the case of Kasaragod, the RSM model shows a reasonable fit with R-Sq = 47.81% and R-Sq(adj) = 45.44%, where the effect of temperature was found to be significant and negatively correlated with the COVID-19 cases per day in the city, with P < 0.05 (CI: 95%). The squared effect and the interaction effect of temperature and RH were also found to be significant factors impacting COVID-19 cases, with P < 0.001. For Kasaragod, the negative effect of T is superseded by the positive impact of its squared effect and interaction effect.

For Srinagar, the main effect of temperature was the only significant factor, with F = 7.18 and P = 0.009 (CI: 95%), and it is positively correlated with the response, with R-Sq = 54.43% and R-Sq(adj) = 52.24%. In the case of Mumbai, the main effects of temperature and RH and the squared effect of temperature were found to be significant at P < 0.001 for all these factors, with R-Sq = 24.91% and R-Sq(adj) = 21.65%. Although the main effect of temperature shows a negative impact on the COVID-19 cases, it is outweighed by the positively correlated squared effect of temperature, which has a higher model coefficient in coded units.

The overall impact of temperature is observed to be positively correlated with the COVID-19 cases for all three Indian cities, i.e., Kasaragod, Srinagar and Mumbai, and negatively correlated for New York. This change in the sign of the correlation might be attributed to breaches in social distancing in the city of Mumbai39,41 and its extremely high population density42, where the effect of environmental factors on the COVID-19 cases becomes secondary and less impactful. Similarly, Srinagar has been witnessing regular lockdowns owing to continued political and public unrest43, and the spread of the virus is contained through the strict lockdown scenario already existing in Srinagar. Kasaragod, however, was taken as a model and much more balanced city, which does not have the extreme socio-political conditions of Mumbai and Srinagar. Even so, with India unlocking from 1st June 2020 and considering a span of 42 unlocked days, social/physical distancing has been breached at many levels across India (including trans-state movements) and has been hard to maintain and monitor, despite time-to-time government advisories. When the analysis for Kasaragod is restricted to the period before the unlock was implemented, i.e. up to 28th May 2020 (Annexure IV), temperature is negatively correlated with the cases, strengthening the hypothesis that social distancing is the key and primary parameter impacting the spread and rise in COVID-19 cases, and that environmental/meteorological factors are undermined by the social distancing factor and show only a feeble impact. The impacts of meteorological parameters might also be confounded with multiple environmental factors such as bio-aerosol concentrations, air pollution and particulates, and with many other unknown co-varying factors.

Figure 5, depicting the residual plots of the RSM models, confirms that the modelled data points are in agreement with the normal distribution line and that the residuals are evenly distributed at all levels of the input parameters; hence, the assumption of equality of variance holds true.

Fig. 5 Residual plots for cities: (a) Srinagar, (b) New York, (c) Kasaragod and (d) Mumbai.

In addition to the RSM exploration of the dependency of the above state and city cases on temperature and RH, linear statistical correlation is also explored to understand whether there is any significant difference between the results. The Indian cities considered for the analysis are Mumbai, Srinagar and Kasaragod. The Pearson’s correlation matrix for each city is shown in Table 9. New York, which has seen unprecedented growth in COVID-19 cases, is also analysed for correlation with temperature and RH, and yields results similar to those of the Indian cities. The linear correlation is assessed for the above-stated states as well.

Table 9 Pearson’s Correlation Matrix for meteorological parameters on COVID-19 cases per day.

From Table 9, Kasaragod and New York show a negative correlation with temperature, while Mumbai and Srinagar show a positive correlation. It is interesting to see that RH shows a positive correlation for all states and cities except Srinagar. The strength of these correlations varies from strong to weak, as indicated in Table 9. It is well established that RH should drop with an increase in temperature, given that the moisture-holding capacity of air increases. It is also interesting to note that for Kasaragod, Pearson’s correlation shows a negative correlation for temperature, in contrast to the positive correlation depicted by the RSM model. This change in correlation is attributed to the insufficiency of the linear correlation in Pearson’s model compared with the quadratic RSM model, which is able to capture the non-linearity of the dataset to a great extent. Hence, a cautious approach should be adopted when using linear models to draw conclusive inferences. Mumbai shows a weak positive correlation with both temperature and RH, while the temperature-RH relation is also positive. This might be because Mumbai is a coastal city and a densely packed one with respect to population and land use. Another reason could be that Mumbai, being a metropolitan city, depends on public transport, making the implementation of social distancing practically very difficult. In such cases, the dominance of other factors over environmental factors in disease spread cannot be ignored and needs separate detailed research. Further, this also confirms that environmental factors alone may not be able to mitigate this particular strain of novel coronavirus, and other behavioural attributes are bound to play a major role in its alleviation.
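The caution about linear models can be illustrated with a minimal sketch of Pearson's coefficient. The function `pearson_r` is a plain implementation of the standard formula, and both data series are hypothetical constructions for illustration, not data from Table 9.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's linear correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Hypothetical, purely illustrative series (not data from this study).
temperature = [0, 1, 2, 3, 4, 5, 6]
linear_cases = [30 - 2 * t for t in temperature]    # straight-line decrease
curved_cases = [(t - 3) ** 2 for t in temperature]  # symmetric U-shape

r_linear = pearson_r(temperature, linear_cases)  # exactly linear relation
r_curved = pearson_r(temperature, curved_cases)  # exact but non-linear relation
print(r_linear, r_curved)
```

In the second series the response depends entirely on temperature, yet the linear coefficient vanishes because the relationship is symmetric about its minimum. A quadratic term can represent this kind of non-linearity, which a linear coefficient cannot.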

In the present work, single-season, short-term meteorological conditions are studied, given the availability of data and the evolution of the COVID-19 situation. RSM is used to understand the intricacies of the effect of multiple parameters on the evolution of COVID-19 in the select cities. When considering long-term meteorological data series, it becomes essential to use de-trended anomaly data to establish an objective correlation between COVID-19 and meteorological parameters. Hence, the future scope of work may include a de-trended anomaly perspective applied to all variables under consideration to remove longer-term trends in the data. Meteorological variables would have to be calculated against the long-term average of the respective 6-day periods under consideration (thus taking into account the longer-term background climatology), which will take care of the seasonal trend in the temperature and humidity data.

Discussion

At the present time, when every COVID-19 affected nation is making efforts to mitigate and alleviate the spread of this virus, it becomes essential to study the correlation of the COVID-19 cases reported per day with behavioural and environmental attributes. India, like other countries, is making efforts to curb the spread by strictly monitoring the lockdowns and ensuring social distancing, apart from the quest for targeted medical solutions. In a country like India, where the population exceeds 1 billion, social distancing remains a challenging but promising and practically implementable solution. In the current research, an effort is made to establish the impact of social distancing in subduing the rising cases in India by simulating multiple cases using the SEIR model. The model concluded that a breach of social distancing through people engaging in crowded gatherings, with the onset of possible community spread, might extend the time needed to control the spread, with the total number of infected cases exceeding 1,350,000 and still increasing at a rapid rate. Compared with the baseline case of strict compliance with lockdown, the cases at present are 33.64 times higher and increasing. The above scenario considers the gatherings which occurred at the end of April and during May and June; however, if such gatherings continue, the outcome can be disastrous in terms of the number of people infected and the resulting deaths. Hence, the decision of the GoI to impose a nationwide lockdown was very timely to control the spread of the virus, and continued efforts are needed so that the pandemic curve rises only at a rate which does not become a burden on medical facilities. The findings of the SEIR model help to conclude that social distancing is one of the best tools available to governments right now to control the spread of this disease.

The SEIR model is validated for a period of four weeks and gives a better understanding of how the dynamics of the virus spread are changing.

Also, environmental conditions are found to significantly affect the spread of COVID-19, as established through statistical models such as RSM and correlation matrices. A mean COVID-19 incubation period of 6 days has been considered for establishing the correlation with environmental factors. The effect of meteorological parameters on daily COVID-19 cases, although showing a good RSM model fit with R-Sq of 83.63% for Maharashtra and a reasonable fit of 57.22% for Karnataka, does not clearly show the relationship with COVID-19 cases per day, as the environmental parameters are grossly averaged over the states and do not capture the essence of the relation between the meteorological parameters and COVID-19 cases. Hence, analysis at the city level is desirable to draw more conclusive inferences.

As temperature rise has been reported to mitigate the spread of viruses, similar trends are observed for SARS-CoV-2 in New York, USA, in this study. However, temperature is observed to be positively correlated with the COVID-19 cases for all the Indian cities, i.e., Srinagar, Mumbai and Kasaragod, which is the inverse of the relationship obtained for New York. The relation for Mumbai can be attributed to breaches in social distancing in the city and its extremely high population density, where the effect of environmental factors on the COVID-19 cases becomes secondary and less impactful. Similarly, Srinagar has been witnessing regular lockdowns owing to continued political and public unrest, and, unlike in New York, the spread of the virus is contained through a strict lockdown scenario.

Also, in the case of Kasaragod, with the implementation of the unlock, the effect of temperature and humidity is undermined and dominated by the social distancing factor, thereby showing a positive correlation, which was negative during the lockdown period. Learning from the analysis of Indian cities and the comparison with New York as a baseline city, it can be inferred that although the meteorological parameters are found to correlate with the COVID-19 cases, they are greatly undermined by the social distancing factor, whether breached or very strictly imposed, as seen in all the Indian cities. Hence, in such cities, social distancing is the key and primary parameter impacting the spread and rise in COVID-19 cases, undermining the environmental/meteorological factors, which might also be confounded with other co-varying environmental factors.

Research has suggested that a hot climate should reduce the spread of COVID-19, owing to the well-established role of higher temperatures in disrupting the lipid layer of the coronavirus and viruses of a similar nature, while the relation with RH has not been clearly established to date. From the current research, it can be concluded that the impact of environmental factors on COVID-19 cannot be negated; however, establishing the clear significance of these parameters would require more controlled laboratory experiments and modeling studies, including the incorporation of environmental parameters into epidemiological models like SEIR, to objectively conclude their impact on COVID-19 transmission and mitigation.

Methods

SEIR model

The prediction of COVID-19 cases is based on the SEIR (Susceptible-Exposed-Infectious-Removed) model, which has been used to simulate two practical implementation conditions related to societal behaviour. Recently, there have been advancements in terms of the modified SEIR model, and it has been upgraded to a global model by incorporating potentially infected cases and infected but not isolated cases; however, the current analysis is restricted to the globally accepted modified SEIR model, given the robust data available for this model. The first case is where the lockdown is followed diligently, and the other is with community breaches, partial lockdowns and failure to implement social distancing protocols. Both of these scenarios actually happened in India and have been considered accordingly.

The SEIR model for the spread of infectious disease is simulated in a manner similar to that used for severe acute respiratory syndrome44. Immunity, infected, exposure probability and recovery/removal are the compartments of the model45. The modified SEIR model, with its code and governing equations, is simulated for India, having the maximum number of reported COVID-19 cases, to understand the efficacy of measures at the current time and in the future. The governing equations, along with the data for the SEIR model, are given as supplementary data in Annexure I. The World Health Organization (WHO), India, reported the figures for the COVID-19 pandemic spread in India on a day-to-day basis. The primary sources of the data are the WHO situation reports for India. A daily migration index has been computed on the basis of mass movements of people from one city to another through various modes of transportation such as air, road and rail. The SEIR equations are modelled using Python v2.7.5 (open source) to predict the forthcoming Susceptible [S], Exposed [E], Infected [I] and Recovered [R] cases for India46. The parameters considered with respect to the model are given as follows:

SEIR model parameters

Table 10 below gives the SEIR model parameters. Detailed equations and corresponding values are given as supplementary data in Annexure-I.

Table 10 SEIR model parameters.

The model is run for two scenarios: one with complete lockdown throughout the duration (Case A) and the other with phased opening and breaches in lockdown (Case B). The duration of the model run is taken from 31st January until 4th August. The data for Case A are considered from 31st January to 3rd April 2020 to ensure no breach in lockdown, and the model is simulated further as a base scenario. Various breaches are observed from 31st March 2020, as given in Table 3, and are incorporated in Case B. The transmission rate is calculated on a weekly basis using methodologies adopted from the literature47,48 for the given data sets for each case, as stated above. The code for the SEIR model runs for each week by solving the equations given as supplementary data in Annexure-I. The simulations for both Cases A and B are done on the basis of a single run, as population and social distancing have proven to be among the most important parameters in controlling the spread of this virus, and the model is validated for a specific time period48.

Breaches in lockdown are reported by the end of March 2020. One was a mass gathering at a mosque by the Tablighi Jamaat, where approximately 2300 people gathered from numerous neighbouring countries. In another large event in Ahmedabad, thousands of people rushed to ration shops while rations were being distributed from approximately 17,000 outlets. Other events, such as Unlock 1 and 2, have also given rise to a number of cases. Such events led to the massive spread of this infectious disease. Further, due to massive economic loss, phased relaxation of the lockdown was given, which added to the number of cases. Some of the relevant and important breaches related to lockdown and changes in governing policies are highlighted in Table 3.

The predicted values determined from the input data are shown in supplementary Table 1. Datasets of incubation rate and recovery or death rate are used in the equations to estimate the upcoming cases. The data on confirmed cases for India used in the above model are given as supplementary data in the supplementary file.

Statistical analysis

Two Indian states, namely Maharashtra and Karnataka, have been selected for studying the impact of meteorological parameters on COVID-19 cases per day. The reasons for selecting these states were, firstly, the different environmental conditions with respect to temperature and RH in these 2 states and, secondly, the availability of consistent data on meteorological conditions and reported COVID-19 cases. The select states have also witnessed breaches in social distancing norms41,49,50; hence, it becomes important to study the impact of environmental conditions in such states to rationally weigh the impact of environmental factors against the social distancing factor. The raw data on meteorological parameters for the states are calculated by averaging the data from all stations, considering 24 h data including day and night values, available from the CPCB (Central Pollution Control Board), India; the station-wise data are appended as supplementary information in Annexure-V.

Averaging meteorological conditions over the entire spatial domain of a state gives only a gross representation of the environmental factors studied for their relationship with confirmed COVID-19 cases per day. Hence, it becomes important to explore the dependence on meteorological conditions at the city level to overcome the assumptions made during the state-level analysis. Therefore, four cities were additionally selected for in-depth analysis to establish the dependence of COVID-19 cases per day on environmental factors. Three Indian cities, i.e., Mumbai, Kasaragod and Srinagar, and 1 city abroad, i.e., New York, were selected for the analysis. The selection was primarily based on the different environmental conditions in the select Indian cities relative to the baseline city of New York, USA. These cities had also either witnessed one of the most rapid rises in COVID-19 cases per day, i.e. Mumbai in India and New York in the USA, or been among the first to report COVID-19 cases, i.e. Kasaragod in India; Srinagar was chosen for a baseline comparison as an Indian city with environmental conditions similar to those of New York in the USA.

Hypothesis testing is performed using a 2-sample t-test of the null hypothesis that the mean difference in environmental conditions, w.r.t. temperature and RH, between the selected cities is equal to 0 (vs. not = 0) at a 95% confidence level. The analysis helps establish whether the environmental conditions are distinct across the studied states and cities, aiding the selection and analysis of states and cities that differ significantly in meteorological conditions when examining the daily COVID-19 cases.
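As an illustration of the test described above, the following minimal sketch uses `scipy.stats.ttest_ind`. It is not the MINITAB procedure used in the study: the function name `differs_in_mean`, the choice of the unequal-variance (Welch) form, and the temperature values are all assumptions made for the example.

```python
from scipy import stats


def differs_in_mean(sample_a, sample_b, alpha=0.05):
    """Two-sided 2-sample t-test of H0: mean(a) - mean(b) = 0.

    Returns (t statistic, p-value, reject_H0). alpha=0.05 corresponds
    to the 95% confidence level used in the text.
    """
    # Assumption: equal_var=False (Welch's test); the source does not
    # state whether equal variances were assumed.
    result = stats.ttest_ind(sample_a, sample_b, equal_var=False)
    return float(result.statistic), float(result.pvalue), bool(result.pvalue < alpha)


# Made-up daily mean temperatures (deg C) for two hypothetical cities;
# these are NOT data from the study.
city_a_temp = [29.1, 30.4, 31.0, 29.8, 30.6, 31.2, 30.1]
city_b_temp = [12.3, 14.0, 11.8, 13.5, 12.9, 14.4, 13.1]

t_stat, p_value, reject = differs_in_mean(city_a_temp, city_b_temp)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}, reject H0: {reject}")
```

A rejected null hypothesis here would indicate that the two cities differ in mean temperature, which is the condition the selection of cities relies on.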

While studying the relation of environmental factors with COVID-19, RH and temperature are considered as the base parameters. The daily COVID-19 cases in various cities of India are correlated with these parameters to understand the relation. A mean COVID-19 incubation period of 6 days has been considered for establishing the correlation with environmental factors51,52. Therefore, the average temperature and RH over six days have been considered, with a one-day offset for the time taken for COVID-19 testing and reporting. Hence, the six-day moving average of the environmental conditions, i.e., temperature and RH, has been correlated with the COVID-19 cases reported on the seventh day, and the same has been used consistently in all subsequent analyses to study the SEIR model and the impact of meteorological conditions on COVID-19 cases.
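The pairing of a six-day average with the seventh-day case count can be sketched as follows. This is one possible reading of the procedure, not the authors' code: the function names and the synthetic series are hypothetical, and the window is assumed to cover the six days immediately preceding the reporting day.

```python
import numpy as np


def lagged_window_mean(values, window=6):
    """For each day t with a full window before it, mean of days t-window .. t-1."""
    values = np.asarray(values, dtype=float)
    if len(values) <= window:
        raise ValueError("series must be longer than the averaging window")
    return np.array([values[t - window:t].mean() for t in range(window, len(values))])


def pearson_with_lagged_mean(env, cases, window=6):
    """Pearson's r between the preceding-window mean of env and same-day cases."""
    env_avg = lagged_window_mean(env, window)
    # Day index `window` (the seventh day when window=6) is the first day
    # that has a complete six-day history.
    cases_aligned = np.asarray(cases, dtype=float)[window:]
    return float(np.corrcoef(env_avg, cases_aligned)[0, 1])


# Synthetic 15-day series, for illustration only (not data from the study):
# temperature rises linearly while reported cases fall linearly.
days = np.arange(15)
temperature = 20.0 + days
daily_cases = 500.0 - 10.0 * days

r = pearson_with_lagged_mean(temperature, daily_cases)
print(f"Pearson r = {r:.3f}")
```

Because both synthetic series are exactly linear in time, the correlation in this toy case is perfectly negative; real data would give a value strictly between -1 and 1.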

RSM is used as an effective statistical tool to analyse the linear and squared effects of the meteorological parameters and to understand the interaction between these parameters in affecting the daily COVID-19 cases. Hence, in addition to simple Pearson’s correlation (depicting a linear relation), Response Surface Methodology (RSM) is used to establish an empirical association between the meteorological parameters, namely temperature and RH, and the confirmed/reported COVID-19 cases per day in India. The raw data related to statistical modelling are given as supplementary data in Annexure III. Compared to classical methods, RSM is advantageous because it reduces the dataset required to map the response over the complete domain of the input variables. The quadratic model in RSM also helps minimise the crude linear approximation of the effect of the input parameters on the response function, and thus aids in capturing the non-linearity of the meteorological processes, and hence of the variables (being a consequence of those processes), through a full quadratic model as depicted by the RSM model Eq. (1).

A response surface is used to map the cases of a particular region over the entire input domain using a full quadratic function. The association of the parameters can be depicted as a second-order polynomial equation:

$$R = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \sum_{i=1}^{n} \beta_{ii} x_i^{2} + \sum\sum_{i<j} \beta_{ij} x_i x_j, \tag{1}$$

where R is the response variable, i.e., COVID-19 cases per day; β0 is a constant; βi is the linear coefficient; βii is the squared coefficient; βij is the product coefficient; xi represents the linear or main effect of the input variable; xi² represents the squared effect of the input variable; xixj represents the interaction term of the input variables; and n is the number of factors53,54. The model coefficients (i.e., β0, βi, βii and βij) are computed using the least squares method by minimizing the sum of squared residuals/errors. The significance of the respective parameters is established by Analysis of Variance (ANOVA), which can be interpreted in terms of P values55. The lower the P-value, the higher the significance of the corresponding parameter. For all the analyses, a p-value less than 0.05, at a confidence level of 95%, is considered a sufficient condition to reject the null hypothesis and indicates the significance of the parameter. A good model fit can be affirmed with a high R2 (also referred to as R-Sq) value53,54, which helps check the competency of the model. All the inputs and the response variable have been modelled in coded units, i.e., the highest parameter value is set to +1, the lowest parameter value to −1, and the remaining values are linearly interpolated between −1 and +1. The analysis in coded units helps understand and interpret the significance of the input parameters using the coefficients of the RSM model mapped within the limits of the dataset, i.e., a particular city/state. The two-sample t-test and Response Surface Modelling (RSM) are carried out using the statistical software MINITAB (version 14)53.
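For the two factors considered here (temperature and RH), Eq. (1) has six coefficients. The sketch below fits it by ordinary least squares with NumPy and reports R-Sq. It is an illustration under stated assumptions rather than the MINITAB workflow used in the study: the function names are hypothetical, only the inputs (not the response) are converted to coded units, the ANOVA P values are not computed, and the data are synthetic.

```python
import numpy as np


def to_coded(x):
    """Linearly rescale so the minimum maps to -1 and the maximum to +1."""
    x = np.asarray(x, dtype=float)
    return 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0


def fit_full_quadratic(temp, rh, response):
    """Least-squares fit of Eq. (1) for n = 2 factors in coded units.

    Returns (beta, r_sq) with beta ordered as
    [b0, b_T, b_RH, b_TT, b_RHRH, b_T_RH].
    """
    t, h = to_coded(temp), to_coded(rh)
    y = np.asarray(response, dtype=float)
    # Design matrix columns: constant, linear, squared, interaction terms.
    X = np.column_stack([np.ones_like(t), t, h, t**2, h**2, t * h])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    r_sq = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return beta, float(r_sq)


# Synthetic 3 x 3 grid of temperature (deg C) and RH (%), with a response
# generated from known coefficients; NOT data from the study.
temp_grid, rh_grid = np.meshgrid([20.0, 30.0, 40.0], [30.0, 60.0, 90.0])
temp, rh = temp_grid.ravel(), rh_grid.ravel()
t, h = to_coded(temp), to_coded(rh)
true_beta = np.array([100.0, 20.0, -5.0, 3.0, 2.0, -4.0])
cases = (true_beta[0] + true_beta[1] * t + true_beta[2] * h
         + true_beta[3] * t**2 + true_beta[4] * h**2 + true_beta[5] * t * h)

beta, r_sq = fit_full_quadratic(temp, rh, cases)
print(np.round(beta, 3), round(r_sq, 4))
```

Because the inputs are scaled to the same −1 to +1 range, the magnitudes of the fitted coefficients can be compared directly, which is the interpretive benefit of coded units noted above.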