Write a 2- to 3-page critique of the research you found in the Walden Library that includes responses to the following prompts:
Why did the authors select binary logistic regression in the research?
Do you think this test was the most appropriate choice? Why or why not?
Did the authors display the results in a figure or table?
Does the results table stand alone? In other words, are you able to interpret the study from it? Why or why not?
Clinical Research Report
Differential analysis of disease risk assessment using binary logistic regression with different analysis strategies
Wenbo Xu1,*, Yang Zhao2,*, Shiyan Nian3, Lei Feng1, Xuejing Bai2, Xuan Luo2 and Feng Luo2
Abstract
Objective: To investigate the importance of controlling confounding factors during binary logistic regression analysis.
Methods: Male coronary heart disease (CHD) patients (n = 664) and healthy control subjects (n = 400) were enrolled. Fourteen indexes were collected: age, uric acid, cholesterol, triglyceride, high density lipoprotein cholesterol, low density lipoprotein cholesterol, apolipoprotein A1, apolipoprotein B100, lipoprotein a, homocysteine, total bilirubin, direct bilirubin, indirect bilirubin, and γ-glutamyl transferase. Associations between these indexes and CHD were assessed by logistic regression, and the results obtained with different analysis strategies were compared.
Results: 1) Without controlling for confounding factors, all 14 indexes were entered directly into the analysis, and 11 indexes were finally retained. The resulting model contained contradictory results. 2) Following the application conditions for logistic regression analysis, all 14 indexes were weighed according to their between-group differences and the results of correlation analysis. Seven indexes were finally included in the model. The model was verified by receiver operating characteristic curve analysis, with an area under the curve of 0.927.
Conclusions: When binary logistic regression analysis is used to evaluate the complex relationships between risk factors and CHD, strict control of confounding factors can improve the reliability and validity of the analysis.
1Department of Laboratory, People’s Hospital of Yuxi
City, Yuxi, Yunnan, P.R. China
2Department of Laboratory, The Sixth Affiliated Hospital
of Kunming Medical University, Yuxi, Yunnan, P.R. China
3Intensive Care Unit, People’s Hospital of Yuxi City, Yuxi,
Yunnan, P.R. China
*These authors contributed equally to this study.
Corresponding author:
Lei Feng, Department of Laboratory, People’s Hospital of
Yuxi City, 21 Nieer Road, Yuxi, Yunnan 653100, P.R. China.
Email: [email protected]
Journal of International Medical Research
2018, Vol. 46(9) 3656–3664
© The Author(s) 2018
Article reuse guidelines:
sagepub.com/journals-permissions
DOI: 10.1177/0300060518777173
journals.sagepub.com/home/imr
Creative Commons Non Commercial CC BY-NC: This article is distributed under the terms of the Creative
Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which
permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is
attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
Keywords
Binary logistic regression, confounding factor control, coronary heart disease, analysis strategies,
statistical methods, uric acid, cholesterol, triglycerides, lipoprotein, bilirubin
Date received: 9 February 2018; accepted: 23 April 2018
Introduction
Logistic regression has become a relatively common statistical method in studies of risk assessment for complex diseases, and its most common and mature form is binary logistic regression analysis.1 A large number of studies and applications have shown that the logistic regression model can meet the requirements of modeling categorical data, and it has become the standard method for modeling categorical dependent variables.2
However, the binary logistic regression model has strict application requirements. If the model is used under inappropriate conditions, or if those conditions are inadequately controlled, the results may be inexplicable and the conclusions erroneous.3 In this study, we use a practical case to compare two logistic regression results obtained by different analysis strategies, and then discuss the necessity and importance of confounding factor control in binary logistic regression analysis. In this case, we studied the correlations between 14 physiological and biochemical indexes and the risk of coronary heart disease (CHD), and attempted to establish a CHD risk assessment model through binary logistic regression analysis. Although there were gender differences in CHD risk factors, the results were similar between genders; for brevity, we have limited this report to the results of analyses involving male patients.
Materials and methods
Participants
This study was approved by the ethics review committee of Yuxi People’s Hospital of Yunnan Province. All subjects in this study were recruited from People’s Hospital of Yuxi City and provided written informed consent to participate. Inclusion criteria for CHD patients were: 1. coronary angiography clearly showing at least one vascular stenosis of >50%; 2. a history of coronary stent implantation. Inclusion criteria for the control group were: 1. normal coronary angiography results; 2. no symptoms of clinical ischemic chest pain (i.e., normal myocardial markers and a flat exercise electrocardiogram). Exclusion criteria for CHD patients were any one of the following: 1. coronary angiography showing stenosis of <50%; 2. serious systemic disease (e.g., cancer, respiratory failure, liver or kidney dysfunction); 3. a history of immunosuppressive therapy, trauma, cancer chemotherapy, infection, radiotherapy, or a recent operation (within 2 months); 4. an incomplete medical record or partial data collection.
Biochemical analysis
The following serum biochemical indicators were measured in the laboratory: uric acid (UA), total cholesterol (TC), triglyceride (TG), high density lipoprotein cholesterol (HDL-C), low density lipoprotein cholesterol (LDL-C), apolipoprotein A1 (ApoA1), apolipoprotein B100 (ApoB100), lipoprotein a (Lp(a)), homocysteine (HCY), total bilirubin (TBIL), direct bilirubin (DBIL), indirect bilirubin (IBIL), and γ-glutamyl transferase (γ-GT). All tests were performed on a Roche Cobas C 701 automatic biochemical analyzer imported from Switzerland (Roche Diagnostics Co., Ltd., Shanghai, China), using its supporting reagents.
Statistical analysis
IBM SPSS Statistics, version 20.0 (IBM Corp., Armonk, NY, USA) was used for statistical analysis. All variables were first tested for conformance to a normal distribution to assess their central tendency and dispersion. Variables with skewed distributions were log-transformed to ensure consistency in subsequent analyses. Mean and standard deviation were used to describe each variable, and independent-samples t-tests were used to compare means between the case and control groups. A Pearson correlation coefficient matrix was used to assess correlations between variables. Binary logistic regression with the forward conditional method was used to establish a mathematical model of the relationship between the variables and CHD. The validity of the model was evaluated by receiver operating characteristic (ROC) curve analysis, and the area under the ROC curve (AUC) was calculated. The significance level was set at α = 0.05 (Figure 1).
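As a rough consistency check on the independent-samples t-tests, Student's pooled-variance t statistic can be recomputed from the summary statistics in Table 2. The Python sketch below is an illustration, not the authors' SPSS output. It assumes the equal-variance form of the test, because the text does not say which SPSS row was reported. The function name pooled_t_from_summary is hypothetical.

```python
import math

def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (equal variances assumed) from summary data.

    Returns (t, degrees_of_freedom) for group 1 minus group 2, matching the
    controls-minus-CHD direction of the t values in Table 2.
    """
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df

# Age row of Table 2: controls 45.46 ± 12.13 (n = 400), CHD 62.16 ± 11.04 (n = 664).
t_age, df_age = pooled_t_from_summary(45.46, 12.13, 400, 62.16, 11.04, 664)
print(f"t = {t_age:.3f}, df = {df_age}")
```

For the age row, this gives a value close to the reported −23.016. The small gap comes from rounding in the published means and SDs.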
Results
Patients
We analyzed a total of 664 male patients with CHD who were initially diagnosed at the Internal Medicine-Cardiovascular Department of our hospital between October 2010 and March 2013. The patients were aged 27–87 years (mean age, 62.2 years). We also analyzed a total of 400 healthy males (control group) aged 24–84 years (mean age, 45.5 years) during the same period.
Logistic regression without controls for confounding factors
Without correcting for the distribution characteristics and internal correlations of the 14 independent variables, binary logistic regression (forward conditional method) of the dependent variable on the independent variables was performed directly; the results are shown in Table 1. After 11 regression steps, only TC, TBIL, and IBIL of the 14 variables were excluded. Age, APOB100, Lp(a), and HCY were positively correlated with CHD, and the odds ratio (Exp(B)) of APOB100 was 35.959, whereas LDL-C showed a negative correlation with CHD (Exp(B) = 0.396). These results cannot be reasonably explained. If the application conditions for binary logistic regression are adequately controlled, such results do not appear, as shown in the follow-up analysis. In this unadjusted model, UA, TG, HDL-C, LDL-C, APOA1, and DBIL were negatively correlated with CHD.
Figure 1. Receiver operating characteristic curve of the evaluation model.
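Exp(B) in Table 1 is the odds ratio e^B. Its 95% CI is the Wald interval exp(B ± 1.96 × SE). The Python sketch below recomputes the APOB100 row from its rounded B and SE, and the width of the recomputed interval makes the imprecision of this estimate visible. The helper name odds_ratio_with_ci is hypothetical.

```python
import math

Z_975 = 1.959964  # two-sided 95% standard normal quantile

def odds_ratio_with_ci(b, se, z=Z_975):
    """Convert a logistic regression coefficient B and its standard error into
    Exp(B) and the Wald confidence interval exp(B - z*SE) to exp(B + z*SE)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# APOB100 row of Table 1: B = 3.582, SE = 0.690.
or_apob, lo, hi = odds_ratio_with_ci(3.582, 0.690)
print(f"Exp(B) = {or_apob:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
```

Because B and SE are printed to three decimals, the recomputed values differ slightly from the table (35.959, 9.293–139.149).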
Logistic regression with controls for confounding factors
After normality testing (K-S test) of the 14 indexes, we found that UA, TC, HDL-C, LDL-C, and APOA1 conformed to a normal distribution, so their actual values were used in subsequent analyses. In contrast, APOB100, TG, Lp(a), HCY, TBIL, DBIL, IBIL, and γ-GT exhibited non-normal distributions, so these indexes were log-transformed. Notably, age did not conform to a normal distribution; however, to characterize the relationship between age and CHD, the actual value of age was used in this analysis. Difference analysis of the 14 indexes showed no significant difference between the case and control groups for lnγ-GT; for all other variables, the differences were significant. Detailed data are shown in Table 2.
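The normality screen that decides whether a variable is log-transformed can be sketched as follows. This is an approximation, not SPSS's procedure. D is computed against a normal distribution fitted to the sample, and the decision uses the approximate large-sample Lilliefors 5% critical value 0.886/√n. The function names and the simulated log-normal data are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def ks_statistic_vs_fitted_normal(values):
    """Kolmogorov-Smirnov D between a sample and a normal distribution whose
    mean and SD are estimated from that same sample."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # Largest gap between the fitted CDF and the empirical CDF on either side of x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def needs_log_transform(values, crit_coeff=0.886):
    """Flag a positive-valued variable when D exceeds the approximate large-sample
    Lilliefors 5% critical value crit_coeff / sqrt(n) (an assumed rule of thumb)."""
    return ks_statistic_vs_fitted_normal(values) > crit_coeff / math.sqrt(len(values))

# Hypothetical right-skewed analyte (log-normal), standing in for e.g. TG or HCY.
rng = random.Random(0)
skewed = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(1000)]
logged = [math.log(v) for v in skewed]
print(needs_log_transform(skewed),
      round(ks_statistic_vs_fitted_normal(skewed), 3),
      round(ks_statistic_vs_fitted_normal(logged), 3))
```

The D statistic of the logged values is much smaller than that of the raw values. This is the behavior that motivates the ln transformation used in the paper.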
Correlation analysis
To assess the internal correlations among the variables and to analyze the influence of confounding factors, Pearson correlation analysis was used to quantify correlations among variables; the results are shown in Table 3. There was a strong correlation between age and lnHCY (0.413); UA was relatively independent; TC was strongly correlated with LDL-C (0.748) and lnAPOB100 (0.645); HDL-C and APOA1 were strongly correlated (0.651); LDL-C and lnAPOB100 were strongly correlated (0.569); lnTG and lnγ-GT were strongly correlated (0.351); lnLp(a) was relatively independent, with low correlation coefficients with every other variable; lnTBIL was strongly correlated with lnDBIL (0.633) and especially with lnIBIL (0.930); and lnγ-GT was highly independent.
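The screening logic described here is to compute pairwise Pearson coefficients and avoid entering strongly correlated variables together. It can be sketched as below. The 0.5 cut-off mirrors the threshold highlighted in the Table 3 note. The toy values and function names are hypothetical.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_correlated_pairs(columns, threshold=0.5):
    """Return (name_a, name_b, r) for every variable pair whose |r| exceeds threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson_r(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical values for five subjects (not study data).
data = {
    "TC": [4.1, 5.0, 4.6, 5.8, 3.9],
    "LDL_C": [2.5, 3.2, 2.9, 3.9, 2.3],
    "UA": [400, 380, 420, 390, 360],
}
flagged = flag_correlated_pairs(data)
print(flagged)  # only the TC / LDL_C pair exceeds the cut-off in this toy data
```

In practice, the flagged list only highlights candidates. As in the paper, deciding which member of a correlated pair to keep still relies on clinical and biochemical judgment.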
Table 1. Binary logistic regression results without confounding factor correction
Step 11  B  SE  Wald  df  P  Exp(B)  95% CI of Exp(B)
Age  0.106  0.010  112.713  1  <0.001  1.112  1.090–1.134
UA  −0.004  0.001  12.842  1  <0.001  0.996  0.993–0.998
TG  −0.538  0.081  43.631  1  <0.001  0.584  0.498–0.685
HDL-C  −2.846  0.511  31.006  1  <0.001  0.058  0.021–0.158
LDL-C  −0.925  0.152  37.011  1  <0.001  0.396  0.294–0.534
APOA1  −2.503  0.571  19.195  1  <0.001  0.082  0.027–0.251
APOB100  3.582  0.690  26.924  1  <0.001  35.959  9.293–139.149
Lp(a)  0.001  0.001  5.726  1  0.017  1.001  1.000–1.002
HCY  0.198  0.024  69.720  1  <0.001  1.219  1.164–1.277
DBIL  −0.197  0.065  9.339  1  0.002  0.821  0.723–0.932
γ-GT  0.010  0.002  16.406  1  <0.001  1.010  1.005–1.014
Constant  1.053  1.226  0.738  1  0.390  2.866
B, coefficient value; SE, standard error; df, degrees of freedom; 95% CI, 95% confidence interval; UA, uric acid; TG, triglycerides; HDL-C, high density lipoprotein cholesterol; LDL-C, low density lipoprotein cholesterol; APOA1, apolipoprotein A1; APOB100, apolipoprotein B100; Lp(a), lipoprotein a; HCY, homocysteine; DBIL, direct bilirubin; γ-GT, γ-glutamyl transferase.
Characterization of the final model
Based on the results of the difference and correlation analyses, and in light of the application conditions for binary logistic regression, nine indicators (age, UA, TC, lnTG, lnLp(a), lnHCY, HDL-C, LDL-C, and lnTBIL) were entered into the binary logistic regression analysis. After seven regression steps, age, UA, HDL-C, lnTG, lnLp(a), lnHCY, and lnTBIL were retained in the regression model; detailed results are shown in Table 4. The model fit was tested by ROC analysis, and the result is shown in Figure 1. Age, UA, HDL-C, lnTG, lnLp(a), lnHCY, and lnTBIL jointly predicted the risk of CHD with an AUC of 0.927 (95% confidence interval: 0.911–0.942); thus, this evaluation model has relatively good sensitivity and specificity.
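An AUC such as 0.927 can be read as the probability that the model assigns a higher predicted risk to a randomly chosen CHD patient than to a randomly chosen control. The O(n·m) Python sketch below computes that concordance directly, counting ties as one half. It is not SPSS's ROC routine, and the predicted probabilities and function name are hypothetical.

```python
def auc_mann_whitney(case_scores, control_scores):
    """Area under the ROC curve as the proportion of case-control pairs in which
    the case has the higher score (ties count as one half)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

cases = [0.9, 0.8, 0.4]     # hypothetical predicted probabilities for CHD patients
controls = [0.3, 0.5, 0.1]  # hypothetical predicted probabilities for controls
auc = auc_mann_whitney(cases, controls)
print(auc)  # 8 of the 9 case-control pairs are ranked correctly
```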
Discussion
The aim of this study was to explore the correlations between 14 physiological and biochemical indexes and CHD in order to establish a disease risk assessment model, which is a commonly used approach in studying the risk factors of complex diseases. For methodology selection, “whether suffering from CHD” constitutes a typical binary classification variable, whereas the 14 indicators are all continuous variables. When studying the direction and strength of the correlations between continuous variables and a binary variable, the most common and mature analysis strategy is binary logistic regression. Each medical statistical method has its scope of application and its confounding factor control strategy; if these are ignored, erroneous results may be obtained, and the validity of the research will be called into question.4–12
Table 2. Data summary and t-test results
Variable  Controls (n = 400)  CHD (n = 664)  t  P
Age (years)  45.46 ± 12.13  62.16 ± 11.04  −23.016  <0.001
UA (μmol/L)  404.92 ± 86.14  376.08 ± 95.79  5.069  <0.001
TC (mmol/L)  4.92 ± 0.88  4.57 ± 1.08  5.821  <0.001
ln TG (mmol/L)  0.69 ± 0.63  0.53 ± 0.56  4.385  <0.001
HDL-C (mmol/L)  1.34 ± 0.33  1.13 ± 0.26  10.979  <0.001
LDL-C (mmol/L)  3.00 ± 0.76  2.83 ± 0.96  3.216  0.001
APOA1 (g/L)  1.42 ± 0.27  1.20 ± 0.23  13.518  <0.001
ln APOB100 (g/L)  −0.23 ± 0.20  −0.15 ± 0.23  −5.862  <0.001
ln Lp(a) (mg/L)  4.74 ± 0.88  4.99 ± 1.01  −4.241  <0.001
ln HCY (μmol/L)  2.39 ± 0.45  2.90 ± 0.36  −19.373  <0.001
ln TBIL (μmol/L)  2.61 ± 0.35  2.47 ± 0.41  5.958  <0.001
ln DBIL (μmol/L)  1.26 ± 0.39  1.11 ± 0.54  4.909  <0.001
ln IBIL (μmol/L)  2.31 ± 0.37  2.12 ± 0.52  7.066  <0.001
ln γ-GT (IU/L)  3.65 ± 0.77  3.69 ± 0.69  −0.859  0.391
Data are presented as mean ± standard deviation.
UA, uric acid; TC, total cholesterol; TG, triglycerides; HDL-C, high density lipoprotein cholesterol; LDL-C, low density lipoprotein cholesterol; APOA1, apolipoprotein A1; APOB100, apolipoprotein B100; Lp(a), lipoprotein a; HCY, homocysteine; TBIL, total bilirubin; DBIL, direct bilirubin; IBIL, indirect bilirubin; γ-GT, γ-glutamyl transferase.

Table 3. Pearson correlation analysis results
Variable  Age  UA  TC  HDL-C  LDL-C  APOA1  lnTG  lnAPOB100  lnLp(a)  lnHCY  lnTBIL  lnDBIL  lnIBIL  lnγ-GT
Age  1.000  −0.164**  −0.153**  −0.097**  −0.071*  −0.170**  −0.266**  0.040  0.144**  0.413**  −0.117**  −0.022  −0.158**  −0.155**
UA  −0.164**  1.000  0.092**  −0.023  0.010  0.010  0.251**  0.053  −0.075*  0.050  −0.051  −0.077*  −0.033  0.141**
TC  −0.153**  0.092**  1.000  0.282**  0.748**  0.020  0.324**  0.645**  0.084**  −0.082**  0.002  −0.163**  0.067*  0.177**
HDL-C  −0.097**  −0.023  0.282**  1.000  0.131**  0.651**  −0.322**  −0.069*  0.107**  −0.196**  0.198**  0.145**  0.191**  −0.025
LDL-C  −0.071*  0.010  0.748**  0.131**  1.000  −0.129**  0.042  0.569**  0.098**  −0.021  0.038  −0.114**  0.105**  0.028
APOA1  −0.170**  0.010  0.020  0.651**  −0.129**  1.000  −0.135**  −0.288**  0.027  −0.246**  0.173**  0.123**  0.176**  −0.036
lnTG  −0.266**  0.251**  0.324**  −0.322**  0.042  −0.135**  1.000  0.269**  −0.172**  −0.109**  −0.122**  −0.294**  −0.028  0.351**
lnAPOB100  0.040  0.053  0.645**  −0.069*  0.569**  −0.288**  0.269**  1.000  0.129**  0.112**  −0.061*  −0.230**  0.008  0.165**
lnLp(a)  0.144**  −0.075*  0.084**  0.107**  0.098**  0.027  −0.172**  0.129**  1.000  0.055  0.013  −0.009  0.001  −0.079*
lnHCY  0.413**  0.050  −0.082**  −0.196**  −0.021  −0.246**  −0.109**  0.112**  0.055  1.000  −0.112**  −0.073*  −0.129**  −0.003
lnTBIL  −0.117**  −0.051  0.002  0.198**  0.038  0.173**  −0.122**  −0.061*  0.013  −0.112**  1.000  0.633**  0.930**  −0.010
lnDBIL  −0.022  −0.077*  −0.163**  0.145**  −0.114**  0.123**  −0.294**  −0.230**  −0.009  −0.073*  0.633**  1.000  0.342**  −0.003
lnIBIL  −0.158**  −0.033  0.067*  0.191**  0.105**  0.176**  −0.028  0.008  0.001  −0.129**  0.930**  0.342**  1.000  −0.015
lnγ-GT  −0.155**  0.141**  0.177**  −0.025  0.028  −0.036  0.351**  0.165**  −0.079*  −0.003  −0.010  −0.003  −0.015  1.000
Note: **Correlation is significant at the 0.01 level; *correlation is significant at the 0.05 level. All values are Pearson correlation coefficients (PCC). PCC > 0.5: TC–LDL-C, TC–lnAPOB100, HDL-C–APOA1, LDL-C–lnAPOB100, lnTBIL–lnDBIL, and lnTBIL–lnIBIL. UA, uric acid; TC, total cholesterol; TG, triglycerides; HDL-C, high density lipoprotein cholesterol; LDL-C, low density lipoprotein cholesterol; APOA1, apolipoprotein A1; APOB100, apolipoprotein B100; Lp(a), lipoprotein a; HCY, homocysteine; TBIL, total bilirubin; DBIL, direct bilirubin; IBIL, indirect bilirubin; γ-GT, γ-glutamyl transferase.

In this study, the initial binary logistic regression was performed without any confounding factor control. Although the results seemed satisfactory, some of them were questionable. For instance, (1) LDL-C, a “recognized” cardiovascular risk factor, was shown as a protective factor; (2) γ-GT showed no significant difference between the CHD and control groups in the difference analysis, yet it appeared as an independent risk factor of CHD in the logistic regression model, with its level positively correlated with CHD; (3) the Exp(B) value of APOB100 was 35.959, which is abnormally high in the context of the model.
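The anomalies listed above are typical of multicollinearity: LDL-C turning "protective" and an extreme Exp(B) for APOB100 when strongly correlated lipid measures enter together. One common way to quantify multicollinearity, which this report does not present, is the variance inflation factor (VIF). VIF_j equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The Python sketch below uses the Table 3 coefficients among TC, LDL-C, and lnAPOB100 only. It therefore treats those three as if they were the only predictors, which gives a rough indication rather than the VIFs of the full model. The helper names are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vifs_from_correlation(corr):
    """Variance inflation factors: the diagonal of the inverse correlation matrix."""
    inv = invert(corr)
    return [inv[i][i] for i in range(len(corr))]

# Pairwise coefficients among TC, LDL-C and lnAPOB100, taken from Table 3.
lipids = [[1.000, 0.748, 0.645],
          [0.748, 1.000, 0.569],
          [0.645, 0.569, 1.000]]
print([round(v, 2) for v in vifs_from_correlation(lipids)])
```

For two predictors, this reduces to VIF = 1 / (1 − r²). That is about 2.27 for TC and LDL-C (r = 0.748).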
According to statistical guidelines, the application of binary logistic regression should meet the following conditions: a, the dependent variable should be binary; b, the relationship between the independent variables and logit(P) is linear; c, the residuals approach 0 and follow a binomial distribution; d, because binary logistic regression itself cannot perform a complete multicollinearity diagnosis, the observations should be mutually independent.1
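Condition b refers to the model form below. Here P is the probability of CHD, x_1, …, x_k are the independent variables, and each β_j corresponds to the B column of Tables 1 and 4. Exp(B) is the odds ratio for a one-unit increase in x_j with the other variables held fixed. This is the standard textbook form, stated here for reference rather than taken from the authors' text.

```latex
\operatorname{logit}(P) = \ln\frac{P}{1-P}
  = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k,
\qquad
\operatorname{Exp}(B_j) = e^{\beta_j}
  = \frac{\text{odds}(x_j + 1)}{\text{odds}(x_j)}
```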
In this study, “whether suffering from CHD” is a typical binary classification variable, which meets requirement a. The maximum likelihood method can assess the goodness of fit, which meets requirements b and c. Regarding the independence mentioned in requirement d, there were no repeated individuals, or at least no genetically repeated individuals, in the study population. The confounding factors arising among the independent variables affect the results to a great extent, so we had to rely on the relevant statistical analyses and biochemical knowledge to evaluate and screen them.
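The maximum likelihood estimation that SPSS performs can be illustrated for a single predictor. The Python sketch below fits logit(P) = b0 + b1·x by Newton-Raphson on simulated data. It is a teaching example only: there is no stepwise selection and no convergence check, and the data and function name are hypothetical. It is not the authors' SPSS procedure.

```python
import math
import random

def fit_logistic_1d(x, y, iterations=25):
    """Maximum likelihood fit of logit(P) = b0 + b1*x by Newton-Raphson.

    Returns (b0, b1, se_b1); se_b1 comes from the inverse information matrix of
    the last iteration, which is effectively at the estimate once converged.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            # Score vector (gradient of the log-likelihood).
            g0 += yi - p
            g1 += (yi - p) * xi
            # Information matrix X'WX.
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X'WX)^-1 * score, via the closed-form 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1

# Hypothetical simulated data from the true model logit(P) = -0.5 + 1.2 * x.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ys = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(-0.5 + 1.2 * x))) else 0 for x in xs]
b0_hat, b1_hat, se_b1 = fit_logistic_1d(xs, ys)
print(f"B0 = {b0_hat:.3f}, B1 = {b1_hat:.3f} (SE {se_b1:.3f}), Exp(B1) = {math.exp(b1_hat):.3f}")
```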
In this study, the scope of application and confounding factor control were given full consideration when using binary logistic regression, specifically in the following respects. First, the data distribution characteristics were strictly evaluated: we conducted normality tests (the Kolmogorov-Smirnov test and P-P plot trend analysis; the results are not shown for brevity), and part of the data was transformed. For APOB100, TG, Lp(a), HCY, TBIL, DBIL, IBIL, and γ-GT, a natural log transformation was used to ensure that these variables could be incorporated into subsequent linear analyses.
Second, the internal correlations of all variables were analyzed by Pearson correlation matrix analysis, and the regression variables were screened according to the correlation or independence of the variables. The results showed that UA, lnTG, lnLp(a), and lnHCY were relatively independent, and these were used in the subsequent logistic analysis. Within the blood lipid profile, lnTG is widely correlated with other indicators, but all of its correlation coefficients are < 0.5, so it was also included in the subsequent logistic analysis. HDL-C and LDL-C are common clinical indexes that serve as common risk factors for CHD, but their role in the prediction of CHD is still disputed13,14 and needs to be confirmed by research analyses, so they were also included in the subsequent logistic analysis. Pearson correlation matrix analysis showed that LDL-C and lnAPOB100 are strongly correlated, as are HDL-C and APOA1. To ensure the reliability of the analysis regarding HDL-C and LDL-C, APOA1 and lnAPOB100 were not included as regression variables. As a risk factor of CHD, TC was included in the subsequent logistic analysis. Of the four traditional liver function indexes (lnTBIL, lnDBIL, lnIBIL, and lnγ-GT), lnγ-GT is highly independent, but the difference analysis showed no significant difference between the case and control groups; therefore, it was not included in the subsequent analysis. Because lnTBIL is strongly correlated with both lnDBIL and lnIBIL, only lnTBIL was selected as the representative index of the three bilirubin indicators in the following analyses. Third, through this series of statistical processing steps, the final modeling variables were confirmed; these variables respectively represent individual blood lipid levels and metabolic characteristics. Fourth, during the binary logistic regression, we evaluated the reliability, likelihood, and effectiveness of the analysis through the score test, the −2 log likelihood value, the model prediction accuracy, ROC curves, and a series of statistical confounding factor control methods. Through all these means of effective control, a CHD risk assessment model with higher diagnostic efficiency was finally obtained.

Table 4. Binary logistic regression results with confounding factor correction
Step 7  B  SE  Wald  df  P  Exp(B)  95% CI of Exp(B)
Age  0.098  0.009  120.44  1  <0.001  1.103  1.084–1.122
UA  −0.004  0.001  10.459  1  0.001  0.997  0.994–0.999
HDL-C  −3.515  0.397  78.562  1  <0.001  0.030  0.014–0.065
ln TG  −0.534  0.188  8.112  1  0.004  0.586  0.406–0.847
ln Lp(a)  0.252  0.101  6.261  1  0.012  1.286  1.056–1.567
ln HCY  2.821  0.291  94.217  1  <0.001  16.789  9.499–26.675
ln TBIL  −0.619  0.258  5.77  1  0.016  0.538  0.325–0.892
Constant  −5.907  1.314  20.201  1  <0.001  0.003
B, coefficient value; SE, standard error; df, degrees of freedom; 95% CI, 95% confidence interval; UA, uric acid; TG, triglycerides; HDL-C, high density lipoprotein cholesterol; Lp(a), lipoprotein a; HCY, homocysteine; TBIL, total bilirubin.
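To see how the final model turns the Table 4 coefficients into a risk estimate, the Python sketch below computes P = 1 / (1 + e^−logit) for two illustrative profiles. The profiles are hypothetical: they simply plug in the control and CHD group means from Table 2 and do not represent real patients. The dictionary keys and function name are also hypothetical. Variables prefixed with ln are natural logarithms of the units used in Table 2.

```python
import math

# Step 7 coefficients (B) and constant from Table 4.
TABLE4_B = {
    "age": 0.098, "UA": -0.004, "HDL_C": -3.515, "lnTG": -0.534,
    "lnLpa": 0.252, "lnHCY": 2.821, "lnTBIL": -0.619,
}
TABLE4_CONSTANT = -5.907

def predicted_chd_probability(profile):
    """Predicted probability of CHD: 1 / (1 + exp(-(constant + sum of B * x)))."""
    logit = TABLE4_CONSTANT + sum(TABLE4_B[name] * value for name, value in profile.items())
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical profiles built from the Table 2 group means.
control_like = {"age": 45.46, "UA": 404.92, "HDL_C": 1.34, "lnTG": 0.69,
                "lnLpa": 4.74, "lnHCY": 2.39, "lnTBIL": 2.61}
case_like = {"age": 62.16, "UA": 376.08, "HDL_C": 1.13, "lnTG": 0.53,
             "lnLpa": 4.99, "lnHCY": 2.90, "lnTBIL": 2.47}
p_ctrl = predicted_chd_probability(control_like)
p_case = predicted_chd_probability(case_like)
print(round(p_ctrl, 3), round(p_case, 3))
```

Cases and controls were recruited as separate groups (664 and 400). The constant, and therefore any absolute probability, reflects this sample's case mix rather than population risk. The comparison between the two profiles is the informative part.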
The study of correlated factors is the premise of the prevention and control of complex diseases such as CHD. However, the use of related factors in CHD prevention and control is not simple. Although various risk factors have been studied for decades, few of them can be directly applied in the prediction of CHD. Age, smoking, hypertension, gender, and other factors have been widely used in prior prediction models for CHD (e.g., Framingham, FHS 1991, FHS 2008, ASSIGN, QRISK2, SCORE, Reynolds, and PROCAM). Other risk factors, such as blood lipids, have generally not been included in these prediction models. Besides the detection methods for blood lipids and how well the indicators represent underlying characteristics, the lack of rigorous statistical analysis is one of the reasons for this instability.15 Therefore, in the study of risk factors for complex diseases, it is particularly important to master the statistical methods. Taking confounding factor control in binary logistic regression analysis as its starting point and comparing regression results under different analysis strategies, this study preliminarily confirms the importance of confounding factor control for the reliability of logistic regression results. However, because of the limitations of our non-professional statistical backgrounds, there are many shortcomings in the application and presentation of these statistical methods, which will be strengthened and improved in future studies.
Declaration of conflicting interest
The author(s) declared no potential conflicts of
interest with respect to the research, authorship,
and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Regional Fund Project [grant number: 81460326] of the National Natural Science Foundation of China; Yunnan Provincial Science and Technology Department of Basic Research on the Application of Self-Financing Projects [2013FZ257]; Joint Special Funds [2013FZ283] from the Yunnan Province Science and Technology Department and Department of Applied Basic Research of Kunming Medical University; the Scientific Research Fund of Yunnan Provincial Education Department [2011C083]; and the Training Special Funds of High Level Health and Family Planning Technical Personnel in Yunnan Province [D-201644]. The funders had no role in study design; in the collection, analysis, and interpretation of data; in writing the report; or in the decision to submit the article for publication.
References
1. Hodeghatta UR and Nayak U. Logistic Regression. In: Business Analytics Using R - A Practical Approach. Berkeley, CA: Apress, 2017, pp.233–255.
2. Zhang WT and Kuang CW. SPSS statistical analysis-based tutorial. 2nd ed. Beijing: Higher Education Press, 2011.
3. Muche R. [Logistic regression: a useful tool in rehabilitation research]. Rehabilitation (Stuttg) 2008; 47: 56–62. [German]
4. Wijnands JM, Boonen A, Dagnelie PC, et al. The cross-sectional association between uric acid and atherosclerosis and the role of low-grade inflammation: the CODAM study. Rheumatology (Oxford) 2014; 53: 2053–2062.
5. Reschke LD, Miller ER, Fadrowski JJ, et al. Elevated uric acid and obesity-related cardiovascular disease risk factors among hypertensive youth. Pediatr Nephrol 2015; 30: 2169–2176.
6. Giallauria F, Predotti P, Casciello A, et al. Serum uric acid is associated with non-dipping circadian pattern in young patients (30–40 years old) with newly diagnosed essential hypertension. Clin Exp Hypertens 2016; 38: 1.
7. De LG, Venegoni L, Iorio S, et al. Platelet distribution width and the extent of coronary artery disease: results from a large prospective study. Platelets 2010; 21: 508.
8. Wood D. Joint European Societies Task Force. Established and emerging cardiovascular risk factor. Am Heart J 2001; 141: S49–S57.
9. Lin JP, O’Donnell CJ, Schwaiger JP, et al. Association Between the UGT1A1*28 Allele, Bilirubin levels, and coronary heart disease in the Framingham heart study. Circulation 2006; 114: 1476–1481.
10. Grundy SM. Gamma-glutamyl transferase: another biomarker for metabolic syndrome and cardiovascular risk. Arterioscler Thromb Vasc Biol 2007; 27: 4–7.
11. Seo Y and Aonuma K. Gamma-Glutamyl transferase as a risk biomarker of cardiovascular disease - does it have …