Methods
Sample
1.1 Study population
There are currently 195 independent sovereign states in the world (including the disputed but de facto
independent Taiwan), plus about 60 dependent areas and five disputed
territories, such as Kosovo (http://www.nationsonline.org/oneworld/countries_of_the_world.htm).
1.2 Study sample
The current
study includes data from a sample of 248 of the world's countries.
1.3 Description of the sample
The sample
represents 95% (248/260) of the whole population of sovereign states, dependent
areas and disputed territories. Data are from the World Bank database of
country indicators and refer to the year 2013.
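The coverage figure follows from the counts given in sections 1.1 and 1.2. A minimal check, with variable names chosen here purely for illustration:

```python
# Counts as stated in sections 1.1 and 1.2 of this post.
sovereign_states = 195
dependent_areas = 60        # "about 60" in the source, so the total is approximate
disputed_territories = 5

population_size = sovereign_states + dependent_areas + disputed_territories
sample_size = 248

# Share of the whole population covered by the sample.
coverage = sample_size / population_size
print(f"{sample_size}/{population_size} = {coverage:.1%}")
```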
Measures
2.1 Description of the variables to be included in the analysis
Life expectancy at birth, the response variable, indicates the number of years a newborn infant would live if prevailing patterns of mortality at the time of its birth were to stay the same throughout its life. This variable refers to the year 2013 and is derived from male and female life expectancy at birth from sources such as: (1) United Nations Population Division. World Population Prospects, (2) United Nations Statistical Division. Population and Vital Statistics Report (various years), (3) Census reports and other statistical publications from national statistical offices, (4) Eurostat: Demographic Statistics, (5) Secretariat of the Pacific Community: Statistics and Demography Programme, and (6) U.S. Census Bureau: International Database.
The explanatory
variables included in the analysis are:
GDP per capita
GDP per
capita is gross domestic product divided by midyear population. GDP is the sum
of gross value added by all resident producers in the economy plus any product
taxes and minus any subsidies not included in the value of the products. It is
calculated without making deductions for depreciation of fabricated assets or
for depletion and degradation of natural resources. Data are in current U.S.
dollars. The World Bank is the source of data and values refer to year 2013.
Access to electricity
Access to
electricity is the percentage of population with access to electricity.
Electrification data are collected from industry, national surveys and
international sources. Values refer to year 2013.
Health expenditure per capita
Total
health expenditure is the sum of public and private health expenditures as a
ratio of total population. It covers the provision of health services
(preventive and curative), family planning activities, nutrition activities,
and emergency aid designated for health but does not include provision of water
and sanitation. Data are in current U.S. dollars and refer to the year 2013.
Improved sanitation facilities
Access to
improved sanitation facilities refers to the percentage of the population using
improved sanitation facilities. Improved sanitation facilities are likely to
ensure hygienic separation of human excreta from human contact. They include
flush/pour flush (to piped sewer system, septic tank, pit latrine), ventilated
improved pit (VIP) latrine, pit latrine with slab, and composting toilet. Data
are from the WHO/UNICEF Joint Monitoring Programme (JMP) for Water Supply and
Sanitation and refer to the year 2013.
Improved water source
Access to
an improved water source refers to the percentage of the population using an
improved drinking water source. The improved drinking water source includes
piped water on premises (piped household water connection located inside the
user’s dwelling, plot or yard), and other improved drinking water sources
(public taps or standpipes, tube wells or boreholes, protected dug wells,
protected springs, and rainwater collection). Data are from the WHO/UNICEF
Joint Monitoring Programme (JMP) for Water Supply and Sanitation and refer to the year 2013.
Labor force, female
Female
labor force as a percentage of the total shows the extent to which women are
active in the labor force. Labor force comprises people ages 15 and older who
meet the International Labour Organization's definition of the economically
active population. Data are from the International Labour Organization and refer
to the year 2013.
PM2.5 air pollution
Percent of
population exposed to ambient concentrations of PM2.5 that exceed the WHO
guideline value is defined as the portion of a country’s population living in
places where mean annual concentrations of PM2.5 are greater than 10 micrograms
per cubic meter, the guideline value recommended by the World Health
Organization as the lower end of the range of concentrations over which adverse
health effects due to PM2.5 exposure have been observed. Data are from the Institute
for Health Metrics and Evaluation, University of Washington in Seattle, and refer
to the year 2013.
2.2 How variables are managed.
Life expectancy
at birth, the response variable, was dichotomized to separate countries
with a value less than or equal to 65.65 years (countries with low life
expectancy at birth) from countries with a higher value.
All predictor
variables are kept in their original continuous format.
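The dichotomization of the response can be sketched as follows. The 65.65-year cutoff comes from the text; the function name, the 1/0 coding (1 = low life expectancy) and the example values are assumptions made for illustration and are not World Bank data.

```python
LOW_LIFE_EXPECTANCY_CUTOFF = 65.65  # years, the threshold stated in the text


def is_low_life_expectancy(years):
    """Return 1 if life expectancy is <= the cutoff, 0 if higher, None if missing.

    The 1/0 coding is an assumption; the text only says the variable is split in two.
    """
    if years is None:
        return None
    return 1 if years <= LOW_LIFE_EXPECTANCY_CUTOFF else 0


# Hypothetical life expectancies, including a value exactly at the cutoff.
examples = [58.2, 65.65, 65.66, 81.0, None]
labels = [is_low_life_expectancy(v) for v in examples]
print(labels)  # [1, 1, 0, 0, None]
```

A value exactly equal to 65.65 falls in the low group, matching the "less than or equal to" rule.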
Analyses
1) Description of the statistical methods
The distributions
for the predictors and life expectancy at birth, the response variable, were
evaluated by examining frequency tables for categorical variables and
calculating the mean, standard deviation and minimum and maximum values for
quantitative variables.
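As an illustration of the summary statistics listed above, here is a minimal sketch using only the Python standard library. The helper name `describe`, the choice to skip missing values, and the input values are assumptions; the analysis itself was run in SAS.

```python
import statistics


def describe(values):
    """Mean, sample standard deviation, minimum and maximum, ignoring missing values."""
    observed = [v for v in values if v is not None]
    return {
        "n": len(observed),
        "mean": statistics.mean(observed),
        "sd": statistics.stdev(observed),  # sample SD (n - 1 in the denominator)
        "min": min(observed),
        "max": max(observed),
    }


# Hypothetical access-to-electricity percentages, with one missing value.
summary = describe([100.0, 80.0, None, 60.0, 40.0])
print(summary)
```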
Scatter
plots and box plots were also examined, and Pearson correlation and Analysis of
Variance (ANOVA) were used to test bivariate associations between individual
predictors and life expectancy at birth, the response variable.
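For reference, the Pearson correlation coefficient used in the bivariate step can be computed as in the sketch below. It is a plain-Python illustration with hypothetical data, not the SAS procedure used in the analysis.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical sanitation coverage (%) and life expectancy (years).
sanitation = [20.0, 45.0, 60.0, 85.0, 100.0]
life_expectancy = [54.0, 63.0, 66.0, 74.0, 80.0]
r = pearson_r(sanitation, life_expectancy)
print(round(r, 3))
```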
Classification
and Regression Tree (CART) analysis was used to identify possible determinants
of life expectancy; CART analysis was performed using PROC HPSPLIT in SAS
version 9.4.
The entropy
criterion was selected in the GROW statement to split the observations during
the process of recursive partitioning that results in a large initial tree.
Cost-complexity was selected in the PRUNE statement to prune this tree and select a
smaller subtree that avoids overfitting the data.
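To make the entropy criterion concrete, the sketch below computes the entropy of a node and the reduction in entropy produced by one candidate split. It illustrates the general idea only and is not PROC HPSPLIT's internal implementation; the function names, the data and the threshold of 50 are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def entropy_gain(values, labels, threshold):
    """Reduction in entropy from splitting a node on values <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # the split leaves one side empty, so nothing is gained
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


# Hypothetical sanitation coverage (%) and low-life-expectancy indicator (1 = low).
sanitation = [15.0, 25.0, 35.0, 70.0, 85.0, 95.0]
low_life_exp = [1, 1, 1, 0, 0, 0]
gain = entropy_gain(sanitation, low_life_exp, threshold=50.0)
print(gain)
```

During recursive partitioning, the split with the largest entropy reduction is preferred at each node.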
The
cost-complexity plot was also displayed with estimates of the average square
error (ASE) for a series of progressively smaller subtrees of the large tree.
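Cost-complexity pruning trades a subtree's error against its size: each candidate subtree is scored as its error plus a penalty `alpha` times its number of leaves, and the lowest score wins. The sketch below uses hypothetical (leaves, error) pairs and hypothetical values of `alpha`; it is not output from the fitted tree.

```python
def cost_complexity(error, n_leaves, alpha):
    """Penalized cost of a subtree: error + alpha * number of leaves."""
    return error + alpha * n_leaves


def best_subtree(subtrees, alpha):
    """Return the (n_leaves, error) pair with the lowest penalized cost."""
    return min(subtrees, key=lambda t: cost_complexity(t[1], t[0], alpha))


# Hypothetical sequence of nested subtrees: (number of leaves, error).
subtrees = [(1, 0.40), (3, 0.20), (6, 0.15), (12, 0.13)]

for alpha in (0.0, 0.02, 0.2):
    print(alpha, best_subtree(subtrees, alpha))
```

With no penalty the largest tree is chosen; as `alpha` grows, progressively smaller subtrees become optimal.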
The
confusion matrix, used to evaluate the accuracy of the fitted tree, was also
calculated, together with the misclassification rate, the specificity and the
sensitivity. Missing values were assigned using ASSIGNMISSING=POPULAR.
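The accuracy measures named above all derive from the four cells of the confusion matrix. A minimal sketch, assuming that "low life expectancy" (coded 1) is the positive class and using made-up predictions:

```python
def classification_metrics(actual, predicted, positive=1):
    """Confusion-matrix cells and the rates derived from them."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "misclassification_rate": (fp + fn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of true positives detected
        "specificity": tn / (tn + fp),   # share of true negatives detected
    }


# Hypothetical observed and predicted classes (1 = low life expectancy).
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(actual, predicted)
print(metrics)
```

Here two of eight cases are misclassified (rate 0.25), sensitivity is 2/3 and specificity is 4/5.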
2) Training and test data sets
CART
divides the data into learning and test subsamples. The learning sample is used
to grow an overly large tree, while the test sample is then used to estimate
the rate at which cases are misclassified. The misclassification rate is
calculated for trees of every size, and the selected subtree is the one with the lowest
probability of misclassification.
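The selection rule can be sketched as below. The candidate list is hypothetical, and breaking ties in favor of the smaller tree is an assumption added here; the text does not say how ties are handled.

```python
def select_subtree(candidates):
    """Pick the (n_leaves, test_error) pair with the lowest test error.

    Ties go to the subtree with fewer leaves (an assumption, not stated in the text).
    """
    return min(candidates, key=lambda c: (c[1], c[0]))


# Hypothetical test-sample misclassification rates for subtrees of each size.
candidates = [(1, 0.38), (3, 0.21), (6, 0.18), (12, 0.18), (20, 0.24)]
chosen = select_subtree(candidates)
print(chosen)  # (6, 0.18)
```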
3) Type of cross validation to be used
A cross
validation of the final model parameters is performed and a table that
describes the cross validation error measures of the parameters is produced.
The cross
validation method used assigns each training observation randomly to one of 10 folds
(with a probability of 1/10 for any given fold).
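The fold assignment described above can be sketched as follows. Each observation is placed independently, so fold sizes are only approximately equal. The function name, the seed and the number of observations are assumptions for illustration; the actual assignment was done by SAS.

```python
import random
from collections import Counter


def assign_folds(n_observations, n_folds=10, seed=0):
    """Assign each observation independently to a fold, each fold with probability 1/n_folds."""
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    return [rng.randrange(n_folds) for _ in range(n_observations)]


# Hypothetical number of training observations.
folds = assign_folds(200)
fold_sizes = Counter(folds)
print(sorted(fold_sizes.items()))
```

In each round of cross validation, one fold is held out for evaluation and the model is fitted on the remaining nine.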