Models, Statistical - Top 30 Publications

Logistic Regression Diagnostics: Understanding How Well a Model Predicts Outcomes.

Factor structure of parent and teacher ratings of the ODD symptoms for Malaysian primary school children.

The present study used confirmatory factor analysis (CFA) to examine the applicability of the one-factor, two-factor, three-factor, and second-order Oppositional Defiant Disorder (ODD) factor models proposed in previous studies, in a group of Malaysian primary school children. These models were primarily based on parent reports. In the current study, parent and teacher ratings of the ODD symptoms were obtained for 934 children. For both groups of respondents, the findings showed some support for all models examined, with most support for a second-order model with the three factors of Burke et al. (2010) (oppositional, antagonistic, and negative affect) as the primary factors. The diagnostic implications of the findings are discussed.

Examining the dimensional structure models of secondary traumatic stress based on DSM-5 symptoms.

The latent factor structure of Secondary Traumatic Stress (STS) has been examined using the Posttraumatic Stress Disorder (PTSD) nomenclature of the Diagnostic and Statistical Manual-IV (DSM-IV). With the advent of the Diagnostic and Statistical Manual-5 (DSM-5), there is a pressing need to reexamine STS using DSM-5 symptoms in light of the most up-to-date PTSD models in the literature. The study investigated and determined the best-fitting PTSD models using DSM-5 PTSD criteria symptoms. Confirmatory factor analysis (CFA) was conducted to examine model fit using the Secondary Traumatic Stress Scale in 241 registered and practicing Filipino nurses (166 females and 75 males) who worked in the Philippines and gave direct nursing services to patients. Based on multiple fit indices, the results showed that the 7-factor hybrid model, comprising intrusion, avoidance, negative affect, anhedonia, externalizing behavior, anxious arousal, and dysphoric arousal factors, has excellent fit to STS. This model asserts that: (1) the hyperarousal criterion needs to be divided into anxious and dysphoric arousal factors; (2) symptoms characterizing negative and positive affect need to be separated into two factors; and (3) a new factor would categorize externalized, self-initiated impulse and control-deficit behaviors. Comparison of nested and non-nested models showed the hybrid model to have superior fit over other models. The specificity of the symptom structure of STS based on DSM-5 PTSD criteria suggests a need for more specific interventions that address the more elaborate symptom groupings and would alleviate the condition of nurses exposed to STS on a daily basis.

Statistical Modeling and Prediction for Tourism Economy Using Dendritic Neural Network.

With the impact of global internationalization, the tourism economy has developed rapidly. The increasing interest in more advanced forecasting methods motivates innovation in this area. In this paper, a seasonal trend autoregressive integrated moving averages with dendritic neural network model (SA-D model) is proposed for tourism demand forecasting. First, we use the seasonal trend autoregressive integrated moving averages model (SARIMA model) to exclude the long-term linear trend; we then train the dendritic neural network model on the residual data and make a short-term prediction. As the results in this paper show, the SA-D model can achieve considerably better predictive performance. To demonstrate the effectiveness of the SA-D model, we also apply it to data that other authors used with other models and compare the results. This comparison also showed that the SA-D model achieved good predictive performance in terms of the normalized mean square error, absolute percentage of error, and correlation coefficient.
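The three evaluation criteria named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not give the formulas, so the definitions below (NMSE as mean squared error divided by the variance of the observed series, a mean absolute percentage error, and the Pearson correlation coefficient) are assumptions, and all function names are hypothetical.

```python
import math


def nmse(actual, predicted):
    """Normalized mean square error: MSE divided by the variance of the actual series.

    Assumed definition; other normalizations exist in the literature.
    """
    n = len(actual)
    mean_a = sum(actual) / n
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    var = sum((a - mean_a) ** 2 for a in actual) / n
    return mse / var


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Lower NMSE and percentage error and a correlation closer to 1 indicate a better forecast; the input series here would be observed and forecast tourism demand.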

Review and modelling of malaria crude incidence rate in a low incidence population, Illinois 1990 to 2013.

The highest risk of imported malaria in Illinois is associated with travel to countries of origin by immigrants to visit family and friends. We used joinpoint regression to analyze the malaria crude incidence rate (mCIR) trend from 1990 through 2013. We found joinpoint regression a useful way to summarize mCIR trends because it connects linear line segments over a fixed time interval (annual) and allows characterization of the trends using the Annual Percent Change.
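The Annual Percent Change for a single trend segment can be sketched as below. The sketch assumes the usual log-linear definition, in which ln(rate) is regressed on year and APC = 100 * (exp(slope) - 1); it does not search for joinpoints, which is the part a full joinpoint analysis adds. The function name is hypothetical.

```python
import math


def annual_percent_change(years, rates):
    """APC for one segment from an ordinary least-squares fit of ln(rate) on year.

    Assumes all rates are positive. This covers one linear segment only and
    does not locate joinpoints.
    """
    logs = [math.log(r) for r in rates]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return 100.0 * (math.exp(slope) - 1.0)
```

A series that grows by exactly 5% per year yields an APC of 5, which makes the measure easy to interpret for each segment between joinpoints.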

An Overview of the Models in Reporting School Data on Dental Credentialing Examinations.

The development and dissemination of meaningful and useful performance reports associated with examinations involved in the licensure process are important to the communities of interest, including state boards, candidates, and professional schools. Discussions of performance reporting have been largely neglected, however. The authors recognize and reinforce the need for such discussions by providing prototypes of performance reporting in dentistry with examples and recommendations to guide practice. For illustrative purposes, this article reviews and discusses the different reporting models used over the past ten years with Part I and Part II of the National Board Dental Examination (NBDE). These reporting models are distinguished by such features as the following: 1) scores in each discipline covered on the exam (four for Part I and nine for Part II) and an overall average are reported in a standard-score metric; 2) a single overall score in a standard-score metric is reported; and 3) performance on the exam is reported as pass/fail. Standard scores on the NBDE range from 49 to 99, with 75 being a passing score. Sample data, without identifying information, are used to illustrate the reporting models.

Direct diffusion tensor estimation using a model-based method with spatial and parametric constraints.

To develop a new model-based method with spatial and parametric constraints (MB-SPC) aimed at accelerating diffusion tensor imaging (DTI) by directly estimating the diffusion tensor from highly undersampled k-space data.

Meta-CART: A tool to identify interactions between moderators in meta-analysis.

In the framework of meta-analysis, moderator analysis is usually performed only univariately. When several study characteristics are available that may account for treatment effect, standard meta-regression has difficulties in identifying interactions between them. To overcome this problem, meta-CART has been proposed: an approach that applies classification and regression trees (CART) to identify interactions, and then subgroup meta-analysis to test the significance of moderator effects. The previous version of meta-CART has its shortcomings: when applying CART, the sample sizes of studies are not taken into account, and the effect sizes are dichotomized around the median value. Therefore, this article proposes new meta-CART extensions, weighting study effect sizes by their accuracy, and using a regression tree to avoid dichotomization. In addition, new pruning rules are proposed. The performance of all versions of meta-CART was evaluated via a Monte Carlo simulation study. The simulation results revealed that meta-regression trees with random-effects weights and a 0.5-standard-error pruning rule perform best. The required sample size for meta-CART to achieve satisfactory performance depends on the number of study characteristics, the magnitude of the interactions, and the residual heterogeneity.

Gaussian model-based partitioning using iterated local search.

The emergence of Gaussian model-based partitioning as a viable alternative to K-means clustering fosters a need for discrete optimization methods that can be efficiently implemented using model-based criteria. A variety of alternative partitioning criteria have been proposed for more general data conditions that permit elliptical clusters, different spatial orientations for the clusters, and unequal cluster sizes. Unfortunately, many of these partitioning criteria are computationally demanding, which makes the multiple-restart (multistart) approach commonly used for K-means partitioning less effective as a heuristic solution strategy. As an alternative, we propose an approach based on iterated local search (ILS), which has proved effective in previous combinatorial data analysis contexts. We compared multistart, ILS and hybrid multistart-ILS procedures for minimizing a very general model-based criterion that assumes no restrictions on cluster size or within-group covariance structure. This comparison, which used 23 data sets from the classification literature, revealed that the ILS and hybrid heuristics generally provided better criterion function values than the multistart approach when all three methods were constrained to the same 10-min time limit. In many instances, these differences in criterion function values reflected profound differences in the partitions obtained.

Response style analysis with threshold and multi-process IRT models: A review and tutorial.

Two different item response theory model frameworks have been proposed for the assessment and control of response styles in rating data. According to one framework, response styles can be assessed by analysing threshold parameters in Rasch models for ordinal data and in mixture-distribution extensions of such models. A different framework is provided by multi-process item response tree models, which can be used to disentangle response processes that are related to the substantive traits and response tendencies elicited by the response scale. In this tutorial, the two approaches are reviewed, illustrated with an empirical data set of the two-dimensional 'Personal Need for Structure' construct, and compared in terms of multiple criteria. Mplus is used as a software framework for (mixed) polytomous Rasch models and item response tree models as well as for demonstrating how parsimonious model variants can be specified to test assumptions on the structure of response styles and attitude strength. Although both frameworks are shown to account for response styles, they differ on the quantitative criteria of model selection, practical aspects of model estimation, and conceptual issues of representing response styles as continuous and multidimensional sources of individual differences in psychological assessment.

Correlation-based linear discriminant classification for gene expression data.

Microarray gene expression technology provides a systematic approach to patient classification. However, microarray data pose a great computational challenge owing to their large dimensionality, small sample sizes, and potential correlations among genes. A recent study has shown that gene-gene correlations have a positive effect on the accuracy of classification models, in contrast to some previous results. In this study, a recently developed correlation-based classifier, the ensemble of random subspace (RS) Fisher linear discriminants (FLDs), was utilized. The impact of gene-gene correlations on the performance of this classifier and other classifiers was studied using simulated datasets and real datasets. A cross-validation framework was used to evaluate the performance of each classifier using the simulated datasets or real datasets, and misclassification rates (MRs) were computed. Using the simulated data, the average MRs of the correlation-based classifiers decreased as the correlations increased when there were more correlated genes. Using real data, the correlation-based classifiers outperformed the non-correlation-based classifiers, especially when the gene-gene correlations were high. The ensemble RS-FLD classifier is a potential state-of-the-art computational method. The correlation-based ensemble RS-FLD classifier was effective and benefited from gene-gene correlations, particularly when the correlations were high.

Innovative software for recording preanalytical errors in accord with the IFCC quality indicators.

Analysis of volumetric response of pituitary adenomas receiving adjuvant CyberKnife stereotactic radiosurgery with the application of an exponential fitting model.

Tumor control rates of pituitary adenomas (PAs) receiving adjuvant CyberKnife stereotactic radiosurgery (CK SRS) are high. However, there is currently no uniform way to estimate the time course of the disease. The aim of this study was to analyze the volumetric responses of PAs after CK SRS and investigate the application of an exponential decay model in calculating an accurate time course and estimation of the eventual outcome. A retrospective review of 34 patients with PAs who received adjuvant CK SRS between 2006 and 2013 was performed. Tumor volume was calculated using the planimetric method. The percent change in tumor volume and tumor volume rate of change were compared at median 4-, 10-, 20-, and 36-month intervals. Tumor responses were classified as: progression for >15% volume increase, regression for ≤15% decrease, and stabilization for ±15% of the baseline volume at the time of last follow-up. For each patient, the volumetric change versus time was fitted with an exponential model. The overall tumor control rate was 94.1% in the 36-month (range 18-87 months) follow-up period (mean volume change of -43.3%). Volume regression (mean decrease of -50.5%) was demonstrated in 27 (79%) patients, tumor stabilization (mean change of -3.7%) in 5 (15%) patients, and tumor progression (mean increase of 28.1%) in 2 (6%) patients (P = 0.001). Tumors that eventually regressed or stabilized had a temporary volume increase of 1.07% and 41.5% at 4 months after CK SRS, respectively (P = 0.017). The tumor volume estimated using the exponential fitting equation demonstrated high positive correlation with the actual volume calculated by magnetic resonance imaging (MRI) as tested by Pearson correlation coefficient (0.9). Transient progression of PAs post-CK SRS was seen in 62.5% of the patients receiving CK SRS, and it was not predictive of eventual volume regression or progression. A three-point exponential model is of potential predictive value according to relative distribution. An exponential decay model can be used to calculate the time course of tumors that are ultimately controlled.
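Fitting an exponential decay to a volume-versus-time series can be sketched as follows. The abstract does not state the fitting equation, so the single-exponential form V(t) = V0 * exp(-k * t) and the log-linear least-squares fit used here are assumptions, and the function names are hypothetical.

```python
import math


def fit_exponential_decay(times, volumes):
    """Fit V(t) = v0 * exp(-k * t) by least squares on ln(volume).

    Returns (v0, k). Assumes all volumes are positive.
    """
    logs = [math.log(v) for v in volumes]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    stt = sum((t - mean_t) ** 2 for t in times)
    stl = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    slope = stl / stt
    intercept = mean_l - slope * mean_t
    return math.exp(intercept), -slope


def predict_volume(v0, k, t):
    """Volume predicted by the fitted model at time t (same units as the fit)."""
    return v0 * math.exp(-k * t)
```

With follow-up times in months, a positive k describes a shrinking tumor, and predict_volume gives the model's estimate at a later visit for comparison with the measured volume.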

Bayesian analysis of longitudinal multitrait-multimethod data with ordinal response variables.

A new multilevel latent state graded response model for longitudinal multitrait-multimethod (MTMM) measurement designs combining structurally different and interchangeable methods is proposed. The model allows researchers to examine construct validity over time and to study the change and stability of constructs and method effects based on ordinal response variables. We show how Bayesian estimation techniques can address a number of important issues that typically arise in longitudinal multilevel MTMM studies and facilitate the estimation of the model presented. Estimation accuracy and the impact of between- and within-level sample sizes as well as different prior specifications on parameter recovery were investigated in a Monte Carlo simulation study. Findings indicate that the parameters of the model presented can be accurately estimated with Bayesian estimation methods in the case of low convergent validity with as few as 250 clusters and more than two observations within each cluster. The model was applied to well-being data from a longitudinal MTMM study, assessing the change and stability of life satisfaction and subjective happiness in young adults after high-school graduation. Guidelines for empirical applications are provided, and advantages and limitations of a Bayesian approach to estimating longitudinal multilevel MTMM models are discussed.

National and Subnational Patterns of Cause of Death in Iran 1990-2015: Applied Methods.

Cause-of-death statistics provide crucial health intelligence to national and international communities. An efficient death registration system provides reliable information for the health policy system. In many developing countries, death registration systems suffer from a degree of misclassification and incompleteness. There are many impediments to estimating cause-specific death rates. Addressing those challenges could prevent misleading results.

CORAL: Binary classifications (active/inactive) for drug-induced liver injury.

Data on human hepatotoxicity (drug-induced liver injury) are extremely important from the point of view of drug discovery. Experimental clinical data on this endpoint are scarce, and extending databases on this endpoint experimentally is extremely difficult. Quantitative structure-activity relationships (QSAR) are an attractive alternative to the experimental approach.

Application of a statistical model for the assessment of environmental quality in neotropical semi-arid reservoirs.

The aim of this study was to develop a statistical model to assess the environmental quality of reservoirs located in semi-arid regions using metrics of anthropogenic disturbance, water quality variables, and benthic macroinvertebrate communities as indicators. The proposed model was applied to 60 sites located in three reservoirs in the Paraíba river basin, in the Brazilian semi-arid region. Collections were made in December 2011. At each site, we collected one sample of benthic macroinvertebrates and one water sample for the determination of physical and chemical parameters. The landscape was characterized by applying 10 physical habitat protocols at each site to collect information on disturbance and subsequently calculate disturbance metrics. The results showed the formation of two groups: group 1, consisting of 16 minimally altered sites, and group 2, with 44 severely altered sites. The proposed statistical model was sensitive enough to detect changes. In the minimally altered group, the chironomids Aedokritus and Fissimentum were dominant, indicating a higher environmental quality, while Coelotanypus and Chironomus were abundant in severely altered sites with lower environmental quality. The conservation and management of reservoirs in semi-arid regions should be intensified in view of the need to maintain the environmental quality of these ecosystems.

Derivation of economic values for production traits in aquaculture species.

In breeding programs for aquaculture species, breeding goal traits are often weighted based on the desired gains but economic gain would be higher if economic values were used instead. The objectives of this study were: (1) to develop a bio-economic model to derive economic values for aquaculture species, (2) to apply the model to determine the economic importance and economic values of traits in a case-study on gilthead seabream, and (3) to validate the model by comparison with a profit equation for a simplified production system.

Multiple Linear Regressions by Maximizing the Likelihood under Assumption of Generalized Gauss-Laplace Distribution of the Error.

Multiple linear regression analysis is widely used to link an outcome with predictors for better understanding of the behaviour of the outcome of interest. Usually, under the assumption that the errors follow a normal distribution, the coefficients of the model are estimated by minimizing the sum of squared deviations. A new approach based on maximum likelihood estimation is proposed for finding the coefficients of linear models with two predictors without any restrictive assumptions on the distribution of the errors. The algorithm was developed, implemented, and tested as a proof of concept using fourteen sets of compounds by investigating the link between activity/property (as outcome) and structural feature information incorporated by molecular descriptors (as predictors). The results on real data demonstrated that in all investigated cases the power of the error is significantly different from the convenient value of two when the Gauss-Laplace distribution was used to relax the restrictive assumption of the normal distribution of the error. Therefore, the Gauss-Laplace distribution of the error could not be rejected, while the hypothesis that the power of the error from the Gauss-Laplace distribution is normally distributed also failed to be rejected.

Granulocyte colony-stimulating factors in the prevention of febrile neutropenia: review of cost-effectiveness models.

We reviewed the evolution of the methods used in cost-effectiveness analyses of granulocyte colony-stimulating factors (G-CSFs) in the primary and secondary prevention of febrile neutropenia (FN) in patients receiving myelosuppressive cancer chemotherapy. Areas covered: FN is a side effect of myelosuppressive chemotherapy associated with significant morbidity, mortality, and costs. The risk of FN may depend on the drugs used within a chemotherapy regimen, and an FN event may cause chemotherapy dose reductions or delays in subsequent cycles. Expert commentary: More recent pharmacoeconomic models have reflected these clinical observations by modeling sequential chemotherapy regimens to account for FN risk on a per-cycle basis, and by accounting for chemotherapy dose reductions and consequent survival losses.

Identification and Prioritization of the Economic Impacts of Vaccines.

Understanding the most important economic impacts of vaccines can provide relevant information to stakeholders when selecting vaccine immunization strategies from a broader perspective. This study was therefore designed to first identify economic impacts to vaccinated individuals and, second, assess the relative importance of these economic impacts. A four-step approach was used, including a review of the literature, a pilot study, and expert consultation. As a fourth step, a survey utilizing best-worst scaling was conducted among 26 different stakeholders to assess the relative importance of the identified economic impacts. In each of the 15 choice tasks, participants were asked to choose the most important and the least important economic impact from a set of four from the master list. We identified 23 economic impacts relevant for vaccine introduction. Four domains were identified, namely, health-related benefits to vaccinated individuals, short- and long-term productivity gains, community or health systems externalities, and broader economic indicators. The first domain was seen as especially important, with mortality, health care expenditure, and morbidity ranking in the top three overall. In conclusion, our study suggests that domain A, "health related benefits to vaccinated individuals", is valued as more important than the other economic impacts.
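One simple way to score best-worst choice tasks like those described above is a count-based best-minus-worst score. The abstract does not say which scoring method the study used, so this approach, the function name, and the input layout are assumptions made for illustration.

```python
from collections import Counter


def best_worst_scores(choices, times_shown):
    """Count-based best-worst scores.

    choices:     list of (best_item, worst_item) pairs, one per answered task.
    times_shown: dict mapping each item to how often it appeared in a task.
    Returns a dict of (best count - worst count) / times shown, in [-1, 1].
    """
    best = Counter(b for b, _ in choices)
    worst = Counter(w for _, w in choices)
    return {
        item: (best[item] - worst[item]) / shown
        for item, shown in times_shown.items()
    }
```

Items chosen as most important more often than as least important receive positive scores, so sorting the result by score gives a ranking of the impacts.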

Volitional and Real-Time Control Cursor Based on Eye Movement Decoding Using a Linear Decoding Model.

The aim of this study is to build a linear decoding model that reveals the relationship between movement information and EOG (electrooculogram) data in order to control a cursor online and continuously with blinks and eye pursuit movements. First, a blink detection method is proposed to reject voluntary single-blink or double-blink information from the EOG. Then, a linear decoding model of time series is developed to predict the position of gaze, and the model parameters are calibrated by the RLS (recursive least squares) algorithm; in addition, decoding accuracy is assessed through a cross-validation procedure. Additionally, subsection processing, increment control, and online calibration are presented to realize the online control. Finally, the technology is applied to the volitional and online control of a cursor to hit multiple predefined targets. Experimental results show that the blink detection algorithm performs well, with a voluntary blink detection rate over 95%. By combining the merits of blinks and smooth pursuit movements, the movement information of the eyes can be decoded in good conformity, with an average Pearson correlation coefficient of up to 0.9592, and all signal-to-noise ratios are greater than 0. The novel system allows people to successfully and economically control a cursor online with a hit rate of 98%.
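The recursive least squares calibration mentioned above can be sketched with the textbook RLS update for a linear model y = w . x. This is a generic illustration, not the authors' decoder: the feature layout, the forgetting factor, and the initial value delta are hypothetical choices, and in the study's setting x would be EOG-derived features and y a gaze coordinate.

```python
def rls_fit(samples, n_features, forgetting=1.0, delta=1000.0):
    """Textbook recursive least squares for y = w . x.

    samples:    iterable of (x, y) with x a list of n_features floats.
    forgetting: forgetting factor in (0, 1]; 1.0 weights all samples equally.
    delta:      initial scale of the inverse-correlation matrix P.
    Returns the weight vector after processing all samples once, in order.
    """
    n = n_features
    w = [0.0] * n
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    for x, y in samples:
        # P x (P stays symmetric, so x^T P is the transpose of this vector)
        Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = forgetting + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]
        # Prediction error before the update
        err = y - sum(w[i] * x[i] for i in range(n))
        w = [w[i] + gain[i] * err for i in range(n)]
        # P <- (P - gain * x^T P) / forgetting
        P = [[(P[i][j] - gain[i] * Px[j]) / forgetting for j in range(n)]
             for i in range(n)]
    return w
```

Because each sample updates the weights in place, the same routine supports online calibration: new (x, y) pairs can be fed in as they arrive, and a forgetting factor below 1 lets the model track slow drift.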

Distinguishing between expert and statistical systems for application under ICH M7.

Application of accelerated failure time models for breast cancer patients' survival in Kurdistan Province of Iran.

Breast cancer is the most common cancer and the second most common cause of cancer-induced mortality in Iranian women. Hazard models and survival analysis have developed rapidly in the last decade.

Time-Series Modeling and Simulation for Comparative Cost-Effective Analysis in Cancer Chemotherapy: An Application to Platinum-Based Regimens for Advanced Non-small Cell Lung Cancer.

The purpose of this study was to propose a time-series modeling and simulation (M&S) strategy for probabilistic cost-effective analysis in cancer chemotherapy using a Monte-Carlo method based on data available from the literature. The simulation included the costs of chemotherapy, of pharmaceutical care for adverse events (AEs), and other medical costs. As an application example, we describe the analysis for the comparison of four regimens, cisplatin plus irinotecan, carboplatin plus paclitaxel, cisplatin plus gemcitabine (GP), and cisplatin plus vinorelbine, for advanced non-small cell lung cancer. The model included the following factors: drug efficacy expressed as overall survival or time to treatment failure, frequency and severity of AEs, utility values of AEs to determine quality of life (QOL), and the drugs' and other medical costs in Japan. The simulation was performed, and quality adjusted life years (QALY) and incremental cost-effectiveness ratios (ICER) were calculated. An index, the percentage of superiority (%SUP), defined as the rate of increased-cost vs. QALY-gained plots that fall within the area of positive QALY gained and below a given threshold value of the ICER, was calculated as a function of the ICER threshold. An M&S process was developed, and for the simulation example the GP regimen was the most cost-effective: at an ICER threshold of $70000/year, the %SUP for GP was more than 50%. We developed an M&S process for probabilistic cost-effective analysis; this method would be useful for decision-making when choosing a cancer chemotherapy regimen in terms of pharmacoeconomics.
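The two summary quantities in this abstract can be sketched as follows. The ICER is the standard ratio of incremental cost to incremental QALY. The %SUP function is only one reading of the abstract's verbal definition (the share of simulated points with a positive QALY gain and an ICER below the threshold); the function names and the input layout are hypothetical.

```python
def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio: extra cost per QALY gained."""
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when the QALY difference is zero")
    return delta_cost / delta_qaly


def percent_superiority(points, threshold):
    """Share (in percent) of simulated points that count as superior.

    points:    list of (delta_qaly, delta_cost) pairs from a Monte Carlo run.
    threshold: ICER threshold in cost per QALY.
    A point counts when the QALY gain is positive and its ICER is below
    the threshold (an assumed reading of the %SUP definition).
    """
    hits = sum(
        1 for dq, dc in points
        if dq > 0 and icer(dc, dq) < threshold
    )
    return 100.0 * hits / len(points)
```

Evaluating percent_superiority over a range of thresholds gives a curve of %SUP against the ICER threshold, which is how the abstract describes the index being reported.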

Comparison of Multivariate Poisson lognormal spatial and temporal crash models to identify hot spots of intersections based on crash types.

Most studies focus on general crashes or total crash counts, with considerably less research dedicated to different crash types. This study employs the systemic approach for detecting hot spots and comprehensively cross-validates five multivariate models of crash type-based hot spot identification (HSID) methods that incorporate spatial and temporal random effects. It is anticipated that comparing the crash estimation results of the five models will identify the impact of varied random effects on HSID. Data covering a ten-year period (2003-2012) were selected for analysis of a total of 137 intersections in the City of Corona, California. The crash types collected in this study include: Rear-end, Head-on, Side-swipe, Broad-side, Hit object, and Others. Statistically significant correlations among crash outcomes for the heterogeneity error term were observed, which clearly demonstrated their multivariate nature. Additionally, the spatial random effects revealed correlations among neighboring intersections across crash types. Five cross-validation criteria (Residual Sum of Squares, Kappa, Mean Absolute Deviation, Method Consistency Test, and Total Rank Difference) were applied to assess the performance of the five HSID methods at crash estimation. In terms of accumulated results, which combined all crash types, the model with spatial random effects consistently outperformed the other competing models by a significant margin. However, the inclusion of spatial random effects in temporal models fell short of attaining the expected results. The overall observation from the model fitness and validation results failed to highlight any correlation between better model fitness and superior crash estimation.

Rasch Analysis of the Malaysian Secondary School Student Leadership Inventory (M3SLI).

The importance of instilling leadership skills in students has always been a main subject of discussion in Malaysia. The Malaysian Secondary School Students Leadership Inventory (M3SLI) is an instrument that was pilot tested in 2013. The main purpose of this study is to examine and optimize the functioning of the rating scale categories in M3SLI by investigating the rating scale category counts, average and expected rating scale category measures, and step calibrations. In detail, the study aimed to (1) identify whether the five-point rating scale was functioning as intended and (2) review the effect of a rating scale category revision on the psychometric characteristics of M3SLI. The study was carried out on 2183 students aged 13 to 18 years, selected by stratified random sampling in 26 public schools in Sabah, Malaysia, with the results analysed using Winsteps. This study found that the rating scale of the Personality and Values constructs needed to be modified, while the scale for Leadership Skills was maintained. For future studies, other psychometric properties, such as differential item functioning (DIF) based on demographic variables such as gender, school location, and form, should be examined prior to the use of the instrument.

The Measurement Properties of the Assessing Math Concepts' Assessments of Primary Students' Number Sense Skills.

The purpose of this study was to examine the measurement properties of the Assessing Math Concepts AMC Anywhere Hiding and Ten Frame Assessments, formative assessments of primary students' number sense skills. Each assessment has two parts, where Part 1 is intended to cover foundational skills for Part 2. Part 1 includes manipulatives whereas Part 2 does not. Student data from 228 kindergarten through second grade teachers with a total of 3,666 students were analyzed using Rasch scaling. Data analyses indicated that when the two assessments were examined separately, the intended order of item difficulty was clear. When the parts of both assessments were analyzed together, the items in Part 2 were not consistently more difficult than the items in Part 1. This suggests an alternative sequence of tasks, in that students may progress from working with a specific number with manipulatives and then without manipulatives, rather than working with a variety of numbers with manipulatives before moving on to assessments without manipulatives.

The Self-assessment Practices of Hong Kong Secondary Students: Findings with a New Instrument.

Self-assessment is a core skill that enables students to engage in self-regulated learning. The purpose of this study was to examine the psychometric properties of a Self-assessment Practice Scale and to depict the characteristics of self-assessment practices of Hong Kong secondary students using this newly developed instrument. A total of 6,125 students from 10 Hong Kong secondary schools completed the survey. Both Rasch and factor analyses revealed a two-dimension scale structure (i.e., Self-directed Feedback Seeking and Self-reflection). The two subscales demonstrated acceptable psychometric properties and suggestions for further improvement were proposed. The findings regarding self-assessment practices of secondary students indicated that, in general, students were quite used to engaging in self-reflection based on available feedback, but they were less disposed to taking the initiative to seek feedback on their own performance. Key demographic variables, e.g., gender and year level, played important roles in students' self-assessment practices. Girls had significantly higher self-assessment measures on both scales than did boys. Junior students had higher measures on both scales than did their senior counterparts. Implications and directions for future research were discussed.

Differential Item Functioning (DIF) and Subsequent Bias in Group Comparisons using a Composite Measurement Scale: A Simulation Study.

To determine the conditions in which the estimation of a difference between groups for a construct evaluated using a composite measurement scale is biased if the presence of Differential Item Functioning (DIF) is not taken into account.