Assignment 2: Article Critique of Binary Logistic Regression

Write a 2- to 3-page critique of the research you found in the Walden Library that includes responses to the following prompts:

Why did the authors select binary logistic regression in the research?
Do you think this test was the most appropriate choice? Why or why not?
Did the authors display the results in a figure or table?
Does the results table stand alone? In other words, are you able to interpret the study from it? Why or why not?

Clinical Research Report

Differential analysis of disease risk assessment using binary logistic regression with different analysis strategies

Wenbo Xu1,*, Yang Zhao2,*, Shiyan Nian3, Lei Feng1, Xuejing Bai2, Xuan Luo2 and Feng Luo2

Abstract

Objective: To investigate the importance of controlling confounding factors during binary logistic regression analysis.

Methods: Male coronary heart disease (CHD) patients (n = 664) and healthy control subjects (n = 400) were enrolled. Fourteen indexes were collected: age, uric acid, cholesterol, triglyceride, high density lipoprotein cholesterol, low density lipoprotein cholesterol, apolipoprotein A1, apolipoprotein B100, lipoprotein a, homocysteine, total bilirubin, direct bilirubin, indirect bilirubin, and γ-glutamyl transferase. Associations between these indexes and CHD were assessed by logistic regression, and results were compared by using different analysis strategies.

Results: 1) Without controlling for confounding factors, 14 indexes were directly inputted in the analysis process, and 11 indexes were finally retained. A model was obtained with conflicting results. 2) According to the application conditions for logistic regression analysis, all 14 indexes were weighed according to their variances and the results of correlation analysis. Seven indexes were finally included in the model. The model was verified by receiver operating characteristic curve, with an area under the curve of 0.927.

Conclusions: When binary logistic regression analysis is used to evaluate the complex relationships between risk factors and CHD, strict control of confounding factors can improve the reliability and validity of the analysis.

1Department of Laboratory, People’s Hospital of Yuxi City, Yuxi, Yunnan, P.R. China
2Department of Laboratory, The Sixth Affiliated Hospital of Kunming Medical University, Yuxi, Yunnan, P.R. China
3Intensive Care Unit, People’s Hospital of Yuxi City, Yuxi, Yunnan, P.R. China

*These authors contributed equally to this study.

Corresponding author:
Lei Feng, Department of Laboratory, People’s Hospital of Yuxi City, 21 Nieer Road, Yuxi, Yunnan 653100, P.R. China.
Email: [email protected]

Journal of International Medical Research
2018, Vol. 46(9) 3656–3664
© The Author(s) 2018
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0300060518777173
journals.sagepub.com/home/imr

Creative Commons Non Commercial CC BY-NC: This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Keywords

Binary logistic regression, confounding factor control, coronary heart disease, analysis strategies, statistical methods, uric acid, cholesterol, triglycerides, lipoprotein, bilirubin

Date received: 9 February 2018; accepted: 23 April 2018

Introduction

Logistic regression has become a relatively commonly used statistical method for studies involving risk assessment of complex diseases. The most common and mature method is binary logistic regression analysis.1 A large number of studies and applications have shown that the logistic regression model can meet the requirements of classification data modeling, and it has become the standard method for modeling categorical dependent variables.2

However, the application of the binary logistic regression model has strict requirements. If the conditions in which the model is used are inappropriate or inappropriately controlled, this may lead to unexplained results and erroneous conclusions.3 In this study, we compare two logistic regression results obtained by different analysis strategies through the use of a practical case, and then discuss the necessity and importance of confounding factor control in binary logistic regression analysis. In this case, we studied the correlations between 14 physiological and biochemical indexes and the risk of coronary heart disease (CHD); we attempted to establish a CHD risk assessment model through binary logistic regression analysis. Although there were gender differences in CHD risk factors, the results were similar between genders. For brevity, we have limited this report to the results from analyses involving male patients.

Materials and methods

Participants

This study was approved by the ethics review committee of Yuxi People’s Hospital of Yunnan Province. All subjects in this study were recruited from People’s Hospital of Yuxi City, and provided written informed consent to participate. Inclusion criteria for CHD patients were: 1. Coronary angiography clearly showed at least one instance of vascular stenosis >50%. 2. A history of coronary stent implantation. Inclusion criteria for the control group were: 1. Coronary angiography results were normal; 2. There were no symptoms of clinical ischemic chest pain (i.e., myocardial markers were normal and exercise electrocardiogram was flat). Exclusion criteria for CHD patients were any one of the following: 1. Coronary angiography showed stenosis of <50%. 2. Serious systemic disease (e.g., cancer, respiratory failure, liver or kidney dysfunction). 3. A history of immunosuppressive therapy, trauma, cancer chemotherapy, infection, radiotherapy, or a recent operation (within 2 months). 4. Incomplete medical record or partial data collection.

Biochemical analysis

The following serum biochemical indicators were examined in the laboratory: uric acid (UA), cholesterol (TC), triglyceride (TG), high density lipoprotein cholesterol (HDL-C), low density lipoprotein cholesterol (LDL-C), apolipoprotein A1 (ApoA1), apolipoprotein B100 (ApoB100), lipoprotein a (Lp(a)), homocysteine (HCY), total bilirubin (TBIL), direct bilirubin (DBIL), indirect bilirubin (IBIL), and γ-glutamyl transferase (γ-GT). All testing was performed by using Switzerland-imported Roche Cobas C 701 automatic biochemical analyzer (Roche Diagnostics Co., Ltd., Shanghai, China) and supporting reagents.

Statistical analysis

IBM SPSS Statistics, version 20.0 (IBM Corp., Armonk, NY, USA) was used for statistical analysis. All variables were first evaluated for conformance to a normal distribution, to check for uniformity in concentration and dispersion. Logarithmic transformation of variables with skewed distributions ensured consistency in subsequent analyses. Mean and standard deviation were used to describe each variable; independent samples t-tests were used to compare means between case and control groups. The Pearson Correlation coefficient matrix was used to assess correlations between variables. Conditional forward method binary logistic regression analysis was used to establish a mathematical model of the relationship between the variables and CHD. The validity of the mathematical model was evaluated through receiver operating characteristic curve (ROC curve) analysis and the area under the ROC curve (AUC) was measured. The significance level was set at α = 0.05 (Figure 1).
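
As an illustration of the AUC mentioned above, the following minimal Python sketch computes the area under the ROC curve as the proportion of case/control pairs in which the case receives the higher predicted score, with ties counted as one half. The authors used SPSS, so this is not their procedure; the function name `roc_auc` and the score lists are hypothetical and are not data from the study.

```python
def roc_auc(case_scores, control_scores):
    """Estimate AUC as the probability that a random case outranks a random control."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5  # a tie counts as half a concordant pair
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical predicted probabilities, NOT values from the study.
cases = [0.91, 0.78, 0.66, 0.40]
controls = [0.12, 0.35, 0.45, 0.08]
print(roc_auc(cases, controls))
```

An AUC of 0.5 corresponds to a model that ranks cases and controls no better than chance, and 1.0 to perfect separation. The pairwise loop is quadratic in sample size, which is acceptable for a sketch but not for large data sets.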
Results

Patients

We analyzed a total of 664 male patients with CHD, who were initially diagnosed at the Internal Medicine-Cardiovascular Department of our hospital between October 2010 and March 2013. Patient ages were 27-87 years old, mean age 62.2 years. Additionally, we analyzed a total of 400 healthy males, ages 24-84 years old, mean age 45.5 years, during the same period (control group).

Logistic regression without controls for confounding factors

Without correcting for the distribution characteristics and internal correlation of the 14 independent variables, the conditional forward method of binary logistic regression of independent and dependent variables was performed directly; the results are shown in Table 1. After 11 regression steps, of all 14 variables, only TC, TBIL, and IBIL were excluded. Age, APOB100, Lp(a), and HCY were positively correlated with CHD, the OR value of APOB100 (odds ratio, Exp(B)) was 35.959, and LDL-C showed a negative correlation with CHD (Exp(B) value is 0.396). These results are all difficult to explain. If the application conditions for binary logistic regression are adequately controlled, such a result will not appear, as shown in the follow-up analysis. UA, TG, HDL-C, LDL-C, APOA1, and DBIL were negatively correlated with CHD.

Figure 1. Receiver operating characteristic curve of the evaluation model.

Logistic regression with controls for confounding factors

After normality testing (K-S test) for the 14 indexes, we found that UA, TC, HDL-C, LDL-C, and APOA1 conform to the normal distribution, so the actual values were used in subsequent analyses. In contrast, APOB100, TG, Lp(a), HCY, TBIL, DBIL, IBIL, and γ-GT exhibited non-normal distributions, so these indexes were converted by logarithm transformation.
Notably, age does not conform to the normal distribution; however, to understand the relationship between age and CHD, the actual value of age was used in this analysis. Difference analysis of the 14 indexes showed no significant differences between the case group and the control group in terms of lnγ-GT; for all other variables, the differences were significant. Detailed data are shown in Table 2.

Correlation analysis

To clearly assess the internal correlation of each variable, and to analyze the influences of confounding factors, Pearson correlation analysis was used to quantify correlations among variables; these results are shown in Table 3. There was a strong correlation between age and lnHCY (0.413); UA showed a relatively strong independence; TC exhibited strong correlations with LDL-C (0.748) and lnAPOB100 (0.645); a strong correlation was observed between HDL-C and APOA1 (0.651); there was a strong correlation between LDL-C and lnAPOB100 (0.569); there was a strong correlation between lnTG and lnγ-GT (0.351); lnLp(a) was relatively independent, and its correlation coefficients with respect to every other variable were lower; lnTBIL exhibited a strong correlation with lnDBIL (0.633) and especially with lnIBIL (0.930); lnγ-GT had strong independence.

Characterization of the final model

According to the difference and correlation analysis results, and combined with the appropriate application conditions for binary logistic regression, age, UA, TC, lnTG, lnLp(a), lnHCY, HDL-C, LDL-C and lnTBIL (nine indicators) were finally added to the binary logistic regression analysis. After seven regression steps, age, UA, HDL-C, lnTG, lnLp(a), lnHCY, and lnTBIL were fitted into the regression model; detailed results are shown in Table 4. The model fitting effect was tested by ROC and the result is shown in Figure 1. Age, UA, HDL-C, lnTG, lnLp(a), lnHCY, and lnTBIL jointly predicted the risk of CHD with an AUC of 0.927 (95% confidence interval: 0.911–0.942); thus, this evaluation model has relatively good sensitivity and specificity.

Table 1. Binary logistic regression results without confounding factor correction

| Step 11 | B | SE | Wald | df | P | Exp(B) | 95% CI of Exp(B) |
|---|---|---|---|---|---|---|---|
| Age | 0.106 | 0.010 | 112.713 | 1 | <0.001 | 1.112 | 1.090–1.134 |
| UA | −0.004 | 0.001 | 12.842 | 1 | <0.001 | 0.996 | 0.993–0.998 |
| TG | −0.538 | 0.081 | 43.631 | 1 | <0.001 | 0.584 | 0.498–0.685 |
| HDL-C | −2.846 | 0.511 | 31.006 | 1 | <0.001 | 0.058 | 0.021–0.158 |
| LDL-C | −0.925 | 0.152 | 37.011 | 1 | <0.001 | 0.396 | 0.294–0.534 |
| APOA1 | −2.503 | 0.571 | 19.195 | 1 | <0.001 | 0.082 | 0.027–0.251 |
| APOB100 | 3.582 | 0.690 | 26.924 | 1 | <0.001 | 35.959 | 9.293–139.149 |
| Lp(a) | 0.001 | 0.001 | 5.726 | 1 | 0.017 | 1.001 | 1.000–1.002 |
| HCY | 0.198 | 0.024 | 69.720 | 1 | <0.001 | 1.219 | 1.164–1.277 |
| DBIL | −0.197 | 0.065 | 9.339 | 1 | 0.002 | 0.821 | 0.723–0.932 |
| γ-GT | 0.010 | 0.002 | 16.406 | 1 | <0.001 | 1.010 | 1.005–1.014 |
| Constant | 1.053 | 1.226 | 0.738 | 1 | 0.390 | 2.866 | |

B, coefficient value; SE, standard error; df, degrees of freedom; 95% CI, 95% confidence interval; UA, uric acid; TG, triglycerides; HDL-C, high density lipoprotein cholesterol; LDL-C, low density lipoprotein cholesterol; APOA1, apolipoprotein A1; APOB100, apolipoprotein B100; Lp(a), lipoprotein a; HCY, homocysteine; DBIL, direct bilirubin; γ-GT, γ-glutamyl transferase.
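
The Exp(B) and 95% CI columns of Table 1 can be related to B and SE with a short calculation. The sketch below assumes the usual Wald interval, exp(B ± 1.96 × SE); the article does not state how SPSS derived the interval, so treat this as an illustration rather than the authors' procedure. The helper name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(b, se, z=1.96):
    """Return (Exp(B), lower, upper) for a logistic coefficient b with standard error se.

    Assumes a Wald-type interval: exp(b - z*se) to exp(b + z*se).
    """
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# B and SE for two rows of Table 1, as printed (rounded to three decimals).
for name, b, se in [("Age", 0.106, 0.010), ("APOB100", 3.582, 0.690)]:
    odds_ratio, lower, upper = odds_ratio_ci(b, se)
    print(f"{name}: Exp(B)={odds_ratio:.3f}, 95% CI {lower:.3f}-{upper:.3f}")
```

Because the table prints B and SE to three decimals, recomputed values can differ slightly from the printed Exp(B) and interval. Exp(B) is the odds ratio per one-unit increase in the variable, so its size depends on the measurement unit as well as on the strength of the association.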
Table 2. Data summary and t-test results

| Index | Controls (n = 400) | CHD (n = 664) | t | P |
|---|---|---|---|---|
| Age (years) | 45.46 ± 12.13 | 62.16 ± 11.04 | −23.016 | <0.001 |
| UA (μmol/L) | 404.92 ± 86.14 | 376.08 ± 95.79 | 5.069 | <0.001 |
| TC (mmol/L) | 4.92 ± 0.88 | 4.57 ± 1.08 | 5.821 | <0.001 |
| ln TG (mmol/L) | 0.69 ± 0.63 | 0.53 ± 0.56 | 4.385 | <0.001 |
| HDL-C (mmol/L) | 1.34 ± 0.33 | 1.13 ± 0.26 | 10.979 | <0.001 |
| LDL-C (mmol/L) | 3.00 ± 0.76 | 2.83 ± 0.96 | 3.216 | 0.001 |
| APOA1 (g/L) | 1.42 ± 0.27 | 1.20 ± 0.23 | 13.518 | <0.001 |
| ln APOB100 (g/L) | −0.23 ± 0.20 | −0.15 ± 0.23 | −5.862 | <0.001 |
| ln Lp(a) (mg/L) | 4.74 ± 0.88 | 4.99 ± 1.01 | −4.241 | <0.001 |
| ln HCY (μmol/L) | 2.39 ± 0.45 | 2.90 ± 0.36 | −19.373 | <0.001 |
| ln TBIL (μmol/L) | 2.61 ± 0.35 | 2.47 ± 0.41 | 5.958 | <0.001 |
| ln DBIL (μmol/L) | 1.26 ± 0.39 | 1.11 ± 0.54 | 4.909 | <0.001 |
| ln IBIL (μmol/L) | 2.31 ± 0.37 | 2.12 ± 0.52 | 7.066 | <0.001 |
| ln γ-GT (IU/L) | 3.65 ± 0.77 | 3.69 ± 0.69 | −0.859 | 0.391 |

Data are presented as mean ± standard deviation. UA, uric acid; TC, total cholesterol; TG, triglycerides; HDL-C, high density lipoprotein cholesterol; LDL-C, low density lipoprotein cholesterol; APOA1, apolipoprotein A1; APOB100, apolipoprotein B100; Lp(a), lipoprotein a; HCY, homocysteine; TBIL, total bilirubin; DBIL, direct bilirubin; IBIL, indirect bilirubin; γ-GT, γ-glutamyl transferase.

Table 3. Pearson correlation analysis results (all entries are Pearson correlation coefficients, PCC)

| | Age | UA | TC | HDL-C | LDL-C | APOA1 | lnTG | lnAPOB100 | lnLp(a) | lnHCY | lnTBIL | lnDBIL | lnIBIL | lnγ-GT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | −0.164** | −0.153** | −0.097** | −0.071* | −0.170** | −0.266** | 0.040 | 0.144** | 0.413** | −0.117** | −0.022 | −0.158** | −0.155** |
| UA | −0.164** | 1.000 | 0.092** | −0.023 | 0.010 | 0.010 | 0.251** | 0.053 | −0.075* | 0.050 | −0.051 | −0.077* | −0.033 | 0.141** |
| TC | −0.153** | 0.092** | 1.000 | 0.282** | 0.748** | 0.020 | 0.324** | 0.645** | 0.084** | −0.082** | 0.002 | −0.163** | 0.067* | 0.177** |
| HDL-C | −0.097** | −0.023 | 0.282** | 1.000 | 0.131** | 0.651** | −0.322** | −0.069* | 0.107** | −0.196** | 0.198** | 0.145** | 0.191** | −0.025 |
| LDL-C | −0.071* | 0.010 | 0.748** | 0.131** | 1.000 | −0.129** | 0.042 | 0.569** | 0.098** | −0.021 | 0.038 | −0.114** | 0.105** | 0.028 |
| APOA1 | −0.170** | 0.010 | 0.020 | 0.651** | −0.129** | 1.000 | −0.135** | −0.288** | 0.027 | −0.246** | 0.173** | 0.123** | 0.176** | −0.036 |
| lnTG | −0.266** | 0.251** | 0.324** | −0.322** | 0.042 | −0.135** | 1.000 | 0.269** | −0.172** | −0.109** | −0.122** | −0.294** | −0.028 | 0.351** |
| lnAPOB100 | 0.040 | 0.053 | 0.645** | −0.069* | 0.569** | −0.288** | 0.269** | 1.000 | 0.129** | 0.112** | −0.061* | −0.230** | 0.008 | 0.165** |
| lnLp(a) | 0.144** | −0.075* | 0.084** | 0.107** | 0.098** | 0.027 | −0.172** | 0.129** | 1.000 | 0.055 | 0.013 | −0.009 | 0.001 | −0.079* |
| lnHCY | 0.413** | 0.050 | −0.082** | −0.196** | −0.021 | −0.246** | −0.109** | 0.112** | 0.055 | 1.000 | −0.112** | −0.073* | −0.129** | −0.003 |
| lnTBIL | −0.117** | −0.051 | 0.002 | 0.198** | 0.038 | 0.173** | −0.122** | −0.061* | 0.013 | −0.112** | 1.000 | 0.633** | 0.930** | −0.010 |
| lnDBIL | −0.022 | −0.077* | −0.163** | 0.145** | −0.114** | 0.123** | −0.294** | −0.230** | −0.009 | −0.073* | 0.633** | 1.000 | 0.342** | −0.003 |
| lnIBIL | −0.158** | −0.033 | 0.067* | 0.191** | 0.105** | 0.176** | −0.028 | 0.008 | 0.001 | −0.129** | 0.930** | 0.342** | 1.000 | −0.015 |
| lnγ-GT | −0.155** | 0.141** | 0.177** | −0.025 | 0.028 | −0.036 | 0.351** | 0.165** | −0.079* | −0.003 | −0.010 | −0.003 | −0.015 | 1.000 |

Note: **Correlation is significant at the 0.01 level, *Correlation is significant at the 0.05 level. PCC: Pearson Correlation coefficient; in the original table, PCC > 0.5 were marked in bold. UA, uric acid; TC, total cholesterol; TG, triglycerides; HDL-C, high density lipoprotein cholesterol; LDL-C, low density lipoprotein cholesterol; APOA1, apolipoprotein A1; APOB100, apolipoprotein B100; Lp(a), lipoprotein a; HCY, homocysteine; TBIL, total bilirubin; DBIL, direct bilirubin; IBIL, indirect bilirubin; γ-GT, γ-glutamyl transferase.

Discussion

The aim of this study was to explore the correlations among 14 physiological and biochemical indexes and CHD, in order to establish a disease risk assessment model, as this is a commonly used method for the study of risk factors of complex diseases. For methodology selection, “whether suffering from CHD” constitutes the typical binary classification variable, whereas the 14 indicators are all continuous variables. When studying the direction of weak correlations between continuous variables and a binary classification variable, the most common and mature analysis strategy is binary logistic regression. Each medical statistical method has its scope of application and its confounding factor control strategy; if these are ignored, erroneous results may be obtained, such that the effectiveness of the research will be questioned.4–12

In this study, the initial binary logistic regression was performed without any confounding factor control, and the results seemed to be satisfactory, but some of them were questionable. For instance, (1) the “recognized” cardiovascular risk factor LDL-C was displayed as a protective factor; (2) γ-GT showed no significant difference between the CHD group and the control group in difference analysis, while γ-GT was shown to be an independent risk factor of CHD in the logistic regression modeling, and the level of γ-GT showed a positive correlation with CHD; (3) the Exp(B) value of APOB100 was 35.959, which is abnormally high in the context of the modeling. According to the statistical guidelines, the application of binary logistic regression should meet the following conditions: a, the dependent variable should be a binary variable; b, the relationship between the independent variables and logit(P) should be linear; c, the residuals approach 0 and follow a binomial distribution; d, binary logistic regression itself cannot complete a multicollinearity diagnosis, so the observed values are required to be mutually independent.1

In this study, “whether suffering from CHD” is the typical binary classification variable, meeting requirement a; the maximum likelihood method can examine coincidence, meeting requirements b and c; regarding the independence mentioned in requirement d, there are no repeated individuals, or at least no genetically repeated individuals, in the research population in this study. The confounding factors that were generated from the independent variables impact the results to a great extent, such that we have to rely on the relevant statistical analysis and biochemical knowledge to judge and screen them.
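
The correlation-based screening of independent variables described in this report can be sketched in a few lines of Python. The example computes Pearson correlation coefficients from raw values and flags pairs whose absolute coefficient exceeds 0.5, the cut-off used to highlight entries in Table 3. The function names and the toy data are hypothetical and are not taken from the study; the authors performed this step in SPSS and combined it with biochemical judgment.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def flag_correlated_pairs(columns, threshold=0.5):
    """List (name_a, name_b, r) for every pair of columns with |r| above threshold."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical toy measurements, NOT data from the study.
data = {
    "TC": [4.1, 5.0, 4.6, 5.8, 3.9, 5.2],
    "LDL_C": [2.4, 3.1, 2.9, 3.8, 2.2, 3.0],
    "UA": [410, 350, 395, 372, 388, 401],
}
print(flag_correlated_pairs(data))
```

A flagged pair indicates that the two variables carry overlapping information, so entering both into the regression can distort the coefficients. Which member of a pair to keep remains a subject-matter decision.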

Table 4. Binary logistic regression results with confounding factor correction

| Step 7 | B | SE | Wald | df | P | Exp(B) | 95% CI of Exp(B) |
|---|---|---|---|---|---|---|---|
| Age | 0.098 | 0.009 | 120.44 | 1 | <0.001 | 1.103 | 1.084–1.122 |
| UA | −0.004 | 0.001 | 10.459 | 1 | 0.001 | 0.997 | 0.994–0.999 |
| HDL-C | −3.515 | 0.397 | 78.562 | 1 | <0.001 | 0.030 | 0.014–0.065 |
| ln TG | −0.534 | 0.188 | 8.112 | 1 | 0.004 | 0.586 | 0.406–0.847 |
| ln Lp(a) | 0.252 | 0.101 | 6.261 | 1 | 0.012 | 1.286 | 1.056–1.567 |
| ln HCY | 2.821 | 0.291 | 94.217 | 1 | <0.001 | 16.789 | 9.499–26.675 |
| ln TBIL | −0.619 | 0.258 | 5.77 | 1 | 0.016 | 0.538 | 0.325–0.892 |
| Constant | −5.907 | 1.314 | 20.201 | 1 | <0.001 | 0.003 | |

B, coefficient value; SE, standard error; df, degrees of freedom; 95% CI, 95% confidence interval; UA, uric acid; TG, triglycerides; HDL-C, high density lipoprotein cholesterol; Lp(a), lipoprotein a; HCY, homocysteine; TBIL, total bilirubin.

In this study, the scope of the application and the confounding factor control were given full consideration when using the binary logistic regression, specifically in the following respects. First, strict evaluation was implemented regarding the data distribution characteristics, and we conducted normality tests: the Kolmogorov-Smirnov test and trend analysis of P-P plots (the results are not shown for brevity), and part of the data was transformed. For APOB100, TG, Lp(a), HCY, TBIL, DBIL, IBIL, and γ-GT, a natural log transformation was used to ensure that these variables could be incorporated into subsequent linear analysis. Second, the internal correlation of all variables was analyzed by Pearson correlation matrix analysis, and the regression indexes were screened according to the correlation or independence of the variables. The results showed that UA, lnTG, lnLp(a), and lnHCY were relatively independent and were used in subsequent logistic analysis; in the blood lipid spectrum, although lnTG has a wide correlation with other indicators, the correlation coefficient is < 0.5, so it was also included in the subsequent logistic analysis. HDL-C and LDL-C are common clinical indexes that serve as common risk factors for CHD, but their role in the prediction of CHD is still in dispute,13,14 and needs to be confirmed by research analyses, so they were also included in the subsequent logistic analysis. Pearson correlation matrix analysis showed that LDL-C and lnAPOB100 have a strong correlation, and that HDL-C and APOA1 have a strong correlation; in order to ensure the reliability of the results analysis regarding HDL-C and LDL-C, APOA1 and lnAPOB100 were not included in the regression indexes. As a risk factor of CHD, TC was analyzed in the subsequent logistic analysis. The traditional four liver function indexes, lnTBIL, lnDBIL, lnIBIL, and lnγ-GT, were analyzed: although lnγ-GT has strong independence, difference analysis showed that there was no significant difference between the case and control groups; therefore, it was not included in the subsequent analysis. Thus, only lnTBIL was selected as the representative index of the three bilirubin indicators in the following analyses.
Third, through the series of statistical processing steps, the final modeling variables were confirmed; these variables respectively represent individual blood lipid level and metabolic characteristics. Fourth, in the processing of the binary logistic regression, through the score test, the −2 log likelihood value, the model prediction accuracy based on the binary logistic regression analysis process, ROC curves and a series of statistical confounding factor control methods, we evaluated the reliability, likelihood, and effectiveness of the binary logistic regression analysis. Through all these means of effective control, a CHD risk assessment model with higher diagnostic efficiency was finally obtained.

The study of correlation factors is the premise of prevention and control of complex diseases, such as CHD. However, the use of related factors in CHD prevention and control is not simple. Although the various risk factors have been studied over decades, there are few risk factors that can be directly applied in the prediction of CHD. Age, smoking, hypertension, gender, and other factors have been widely used in prior prediction models for CHD (e.g., Framingham, FHS 1991, FHS 2008, ASSIGN, QRISK2, SCORE, Reynolds, and PROCAM). Other risk factors, such as blood lipids, have generally not been included in the above prediction models. In addition to the blood lipid detection methods and indicators of representative underlying characteristics, the lack of rigorous statistical analysis is one of the reasons leading to instability.15 Therefore, in the study of complex disease risk factors, it is particularly important to grasp the statistical methods. This study, which takes confounding factor control of the binary logistic regression analysis as the breakthrough point and compares regression results under different analysis strategies, preliminarily confirms the importance of confounding factor control for the reliability of logistic regression results.
However, due to the limitations of our non-professional statistical backgrounds, there are many shortcomings in the application and presentation of these statistical methods, which will be strengthened and improved in future studies.

Declaration of conflicting interest

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Regional Fund Project [grant number: 81460326] of the National Natural Science Foundation of China; Yunnan Provincial Science and Technology Department of Basic Research on the Application of Self-Financing Projects [2013FZ257]; Joint Special Funds [2013FZ283] from the Yunnan Province Science and Technology Department and Department of Applied Basic Research of Kunming Medical University; the Scientific Research Fund of Yunnan Provincial Education Department [2011C083]; and the Training Special Funds of High Level Health and Family Planning Technical Personnel in Yunnan Province [D-201644]. The funders had no role in study design; in the collection, analysis, and interpretation of data; in writing the report; or in the decision to submit the article for publication.

References

1. Hodeghatta UR and Nayak U. Logistic Regression. In: Business Analytics Using R - A Practical Approach. Berkeley, CA: Apress, 2017, pp.233–255.
2. Zhang WT and Kuang CW. SPSS statistical analysis-based tutorial. 2nd ed. Beijing: Higher education press, 2011.
3. Muche R. [Logistic regression: a useful tool in rehabilitation research]. Rehabilitation (Stuttg) 2008; 47: 56–62. [German]
4. Wijnands JM, Boonen A, Dagnelie PC, et al. The cross-sectional association between uric acid and atherosclerosis and the role of low-grade inflammation: the CODAM study. Rheumatology (Oxford) 2014; 53: 2053–2062.
5. Reschke LD, Miller ER, Fadrowski JJ, et al. Elevated uric acid and obesity-related cardiovascular disease risk factors among hypertensive youth. Pediatr Nephrol 2015; 30: 2169–2176.
6. Giallauria F, Predotti P, Casciello A, et al. Serum uric acid is associated with non-dipping circadian pattern in young patients (30–40 years old) with newly diagnosed essential hypertension. Clin Exp Hypertens 2016; 38: 1.
7. De LG, Venegoni L, Iorio S, et al. Platelet distribution width and the extent of coronary artery disease: results from a large prospective study. Platelets 2010; 21: 508.
8. Wood D. Joint European Societies Task Force. Established and emerging cardiovascular risk factor. Am Heart J. 2001; 141: S49–S57.
9. Lin JP, O’Donnell CJ, Schwaiger JP, et al. Association Between the UGT1A1*28 Allele, Bilirubin levels, and coronary heart disease in the Framingham heart study. Circulation 2006; 114: 1476–1481.
10. Grundy SM. Gamma-glutamyl transferase: another biomarker for metabolic syndrome and cardiovascular risk. Arterioscler Thromb Vasc Biol 2007; 27: 4–7.
11. Seo Y and Aonuma K. Gamma-Glutamyl transferase as a risk biomarker of cardiovascular disease - does it have …