PubTransformer

A site that transforms PubMed publications into the following bibliographic reference formats: ADS, BibTeX, EndNote, ISI (used by the Web of Knowledge), RIS, MEDLINE, and Microsoft Word 2007 XML.

Models, Statistical - Top 30 Publications

Assessing the accuracy and stability of variable selection methods for random forest modeling in ecology.

Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological data sets, there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables is used or stepwise procedures are employed that iteratively remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating data set consists of the good/poor condition of n = 1365 stream survey sites from the 2008/2009 National Rivers and Streams Assessment, and a large set (p = 212) of landscape features from the StreamCat data set as potential predictors. We compare two types of RF models: a full variable set model with all 212 predictors and a reduced variable set model selected using a backward elimination approach. We assess model accuracy using RF's internal out-of-bag estimate and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors, and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial improvement in cross-validated accuracy as a result of variable reduction. Moreover, the backward elimination procedure tended to select too few variables and exhibited numerous issues, such as upwardly biased out-of-bag accuracy estimates and instabilities in the spatial predictions. We use simulations to further support and generalize the results from the analysis of real data. A main purpose of this work is to elucidate issues of model selection bias and instability to ecologists interested in using RF to develop predictive models with large environmental data sets.
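A minimal sketch of the methodological point at stake, using scikit-learn on synthetic data (the data set, the elimination rule, and all thresholds are stand-ins, not the paper's exact procedure): backward elimination is redone inside each cross-validation fold, so the held-out fold never informs which predictors are kept, avoiding the selection bias that inflates the internal out-of-bag estimate of a pre-selected reduced model.

```python
# Sketch: cross-validation external to variable selection for an RF model.
# Synthetic data sized like the study (n = 1365, p = 212); hypothetical rules.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=1365, n_features=212,
                           n_informative=15, random_state=0)

def backward_eliminate(X_tr, y_tr, keep=20):
    """Repeatedly drop the least important 20% of predictors (simplified)."""
    cols = np.arange(X_tr.shape[1])
    while len(cols) > keep:
        rf = RandomForestClassifier(n_estimators=100, random_state=0)
        rf.fit(X_tr[:, cols], y_tr)
        cols = cols[np.argsort(rf.feature_importances_)[len(cols) // 5:]]
    return cols

# Selection is repeated per fold, so the held-out fold stays truly external.
accs = []
for tr, te in StratifiedKFold(5, shuffle=True, random_state=0).split(X, y):
    cols = backward_eliminate(X[tr], y[tr])
    rf = RandomForestClassifier(n_estimators=300, oob_score=True,
                                random_state=0).fit(X[tr][:, cols], y[tr])
    accs.append(rf.score(X[te][:, cols], y[te]))
print("externally cross-validated accuracy:", np.mean(accs))
```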

Dynamic decomposition of spatiotemporal neural signals.

Neural signals are characterized by rich temporal and spatiotemporal dynamics that reflect the organization of cortical networks. Theoretical research has shown how neural networks can operate at different dynamic ranges that correspond to specific types of information processing. Here we present a data analysis framework that uses a linearized model of these dynamic states in order to decompose the measured neural signal into a series of components that capture both rhythmic and non-rhythmic neural activity. The method is based on stochastic differential equations and Gaussian process regression. Through computer simulations and analysis of magnetoencephalographic data, we demonstrate the efficacy of the method in identifying meaningful modulations of oscillatory signals corrupted by structured temporal and spatiotemporal noise. These results suggest that the method is particularly suitable for the analysis and interpretation of complex temporal and spatiotemporal neural signals.
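One generic way to read the decomposition idea is as additive Gaussian process regression, with one covariance function for rhythmic structure and one for slow non-rhythmic drift; the sketch below follows that reading on a simulated signal (the kernels, length-scales, and signal are assumptions, and the paper's own linearized state-space formulation is not reproduced here).

```python
# Sketch: decompose a 1-D signal into rhythmic + non-rhythmic components
# with an additive Gaussian process. Posterior mean of each component is
# K_component @ inv(K_total) @ y for an additive GP.
import numpy as np

t = np.linspace(0, 10, 400)[:, None]
rng = np.random.default_rng(1)
y = (np.sin(2 * np.pi * 1.5 * t[:, 0]) * np.exp(-0.1 * t[:, 0])
     + 0.5 * np.tanh(t[:, 0] - 5) + 0.3 * rng.standard_normal(len(t)))

def rbf(a, b, ell):                      # smooth, non-rhythmic drift
    return np.exp(-0.5 * (a - b.T) ** 2 / ell ** 2)

def periodic(a, b, ell, period):         # rhythmic (oscillatory) structure
    return np.exp(-2 * np.sin(np.pi * np.abs(a - b.T) / period) ** 2 / ell ** 2)

K_rhythm = periodic(t, t, ell=0.7, period=1 / 1.5)
K_drift = rbf(t, t, ell=3.0)
K_total = K_rhythm + K_drift + 0.3 ** 2 * np.eye(len(t))   # + noise variance

alpha = np.linalg.solve(K_total, y)
rhythmic_part = K_rhythm @ alpha         # posterior mean, rhythmic component
drift_part = K_drift @ alpha             # posterior mean, non-rhythmic component
```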

ROTS: An R package for reproducibility-optimized statistical testing.

Differential expression analysis is one of the most common types of analyses performed on various biological data (e.g. RNA-seq or mass spectrometry proteomics). It is the process that detects features, such as genes or proteins, showing statistically significant differences between the sample groups under comparison. A major challenge in the analysis is the choice of an appropriate test statistic, as different statistics have been shown to perform well in different datasets. To this end, the reproducibility-optimized test statistic (ROTS) adjusts a modified t-statistic according to the inherent properties of the data and provides a ranking of the features based on their statistical evidence for differential expression between two groups. ROTS has already been successfully applied in a range of different studies from transcriptomics to proteomics, showing competitive performance against other state-of-the-art methods. To promote its widespread use, we introduce here a Bioconductor R package for performing ROTS analysis conveniently on different types of omics data. To illustrate the benefits of ROTS in various applications, we present three case studies, involving proteomics and RNA-seq data from public repositories, including both bulk and single cell data. The package is freely available from Bioconductor (https://www.bioconductor.org/packages/ROTS).
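The package itself is an R/Bioconductor package; purely for intuition, here is a rough Python sketch of the reproducibility-optimization idea behind the statistic, in which a family of modified t-statistics d = |mean1 - mean2| / (a1 + a2·s) is tuned so that top-ranked feature lists overlap maximally across bootstrap resamples (the statistic family, parameter grid, and bootstrap scheme are simplified assumptions, not the package's implementation).

```python
# Sketch: pick the (a1, a2) whose top-k feature lists are most reproducible
# across bootstrap pairs. Simplified; the real method also optimizes k and
# corrects against a permutation null.
import numpy as np

rng = np.random.default_rng(0)
n_feat, n_rep = 1000, 6
x1 = rng.standard_normal((n_feat, n_rep))
x2 = rng.standard_normal((n_feat, n_rep))
x2[:50] += 2.0                                   # 50 truly changed features

def stat(x1, x2, a1, a2):
    diff = np.abs(x1.mean(1) - x2.mean(1))
    s = np.sqrt(x1.var(1, ddof=1) / x1.shape[1] + x2.var(1, ddof=1) / x2.shape[1])
    return diff / (a1 + a2 * s)

def top_overlap(x1, x2, a1, a2, k=50, n_boot=20):
    overlaps = []
    for _ in range(n_boot):
        i = rng.integers(0, x1.shape[1], x1.shape[1])   # bootstrap of samples
        j = rng.integers(0, x1.shape[1], x1.shape[1])
        top_i = set(np.argsort(stat(x1[:, i], x2[:, i], a1, a2))[-k:])
        top_j = set(np.argsort(stat(x1[:, j], x2[:, j], a1, a2))[-k:])
        overlaps.append(len(top_i & top_j) / k)
    return np.mean(overlaps)

candidates = [(a1, a2) for a1 in (0.0, 0.5, 1.0) for a2 in (0.0, 1.0)
              if (a1, a2) != (0.0, 0.0)]
best = max(candidates, key=lambda p: top_overlap(x1, x2, *p))
print("selected (a1, a2):", best)
```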

Quality of reporting of multivariable logistic regression models in Chinese clinical medical journals.

Multivariable logistic regression (MLR) has been increasingly used in Chinese clinical medical research during the past few years. However, few evaluations of the quality of the reporting strategies in these studies are available. Our aim was to evaluate the reporting quality and model accuracy of MLR used in published work, and to provide related advice for authors, readers, reviewers, and editors. A total of 316 articles published in 5 leading Chinese clinical medical journals with high impact factors from January 2010 to July 2015 were selected for evaluation. Articles were evaluated according to 12 established criteria for proper use and reporting of MLR models. Among the articles, the highest quality score was 9, the lowest 1, and the median 5 (4-5). A total of 85.1% of the articles scored below 6. No significant differences were found among these journals with respect to quality score (χ² = 6.706, P = .15). More than 50% of the articles met the following 5 criteria: complete identification of the statistical software application that was used (97.2%); calculation of the odds ratio and its confidence interval (86.4%); and description of sufficient events (>10) per variable, selection of variables, and fitting procedure (78.2%, 69.3%, and 58.5%, respectively). Fewer than 35% of the articles reported the coding of variables (18.7%). The remaining 5 criteria were not satisfied by a sufficient number of articles: goodness-of-fit (10.1%), interactions (3.8%), checking for outliers (3.2%), collinearity (1.9%), and participation of statisticians and epidemiologists (0.3%). The criterion of conformity with linear gradients was applicable to 186 articles; however, only 7 (3.8%) mentioned or tested it. The reporting quality and model accuracy of MLR in the selected articles were not satisfactory; in fact, severe deficiencies were noted. Only 1 article scored 9. We recommend that authors, readers, reviewers, and editors consider MLR models more carefully and cooperate more closely with statisticians and epidemiologists. Journals should develop statistical reporting guidelines concerning MLR.
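For readers unfamiliar with the criteria, a brief sketch of what two of them amount to in practice: fitting a multivariable logistic model and reporting odds ratios with 95% confidence intervals. The variables and data below are hypothetical, with statsmodels used for illustration.

```python
# Sketch: MLR fit plus OR and 95% CI reporting, two of the 12 criteria.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(60, 10, 300),
    "smoker": rng.integers(0, 2, 300),
})
logit = 0.03 * (df["age"] - 60) + 0.8 * df["smoker"] - 0.5
df["outcome"] = (rng.random(300) < 1 / (1 + np.exp(-logit))).astype(int)

X = sm.add_constant(df[["age", "smoker"]])
fit = sm.Logit(df["outcome"], X).fit(disp=0)

# Exponentiate coefficients and interval bounds to get ORs with 95% CIs.
report = np.exp(pd.concat([fit.params, fit.conf_int()], axis=1))
report.columns = ["OR", "2.5%", "97.5%"]
print(report)
```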

Spatial interpolation and radiological mapping of ambient gamma dose rate by using artificial neural networks and fuzzy logic methods.

The aim of this study was to determine the spatial risk dispersion of ambient gamma dose rate (AGDR) using both artificial neural network (ANN) and fuzzy logic (FL) methods, to compare the performances of the methods, to make dose estimations for intermediate stations with no previous measurements, and to create dose-rate risk maps of the study area. To determine the dose distribution with artificial neural networks, two main network types comprising five different network structures were used: feed-forward ANNs, namely the multi-layer perceptron (MLP), the radial basis function neural network (RBFNN), and the quantile regression neural network (QRNN); and recurrent ANNs, namely Jordan networks (JN) and Elman networks (EN). In the evaluation of estimation performance obtained for the test data, all models appear to give similar results. According to the cross-validation results obtained for explaining the AGDR distribution, Pearson's r coefficients were calculated as 0.94, 0.91, 0.89, 0.91, 0.91 and 0.92, and RMSE values as 34.78, 43.28, 63.92, 44.86, 46.77 and 37.92, for MLP, RBFNN, QRNN, JN, EN and FL, respectively. In addition, spatial risk maps showing the distribution of AGDR over the study area were created with all models, and the results were compared with the geological, topographical, and soil structure of the area.
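A compact sketch of the evaluation setup described above: interpolate a spatial field from station coordinates with an MLP and score cross-validated predictions by Pearson's r and RMSE. The field is synthetic and the network size arbitrary; the study's data are measured dose rates.

```python
# Sketch: spatial interpolation with an MLP, scored by Pearson's r and RMSE.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(150, 2))            # station x, y
dose = (80 + 0.5 * coords[:, 0] + 20 * np.sin(coords[:, 1] / 15)
        + 5 * rng.standard_normal(150))                # synthetic AGDR field

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(32, 32),
                                   max_iter=5000, random_state=0))
pred = cross_val_predict(model, coords, dose, cv=5)    # cross-validated fit

r = np.corrcoef(dose, pred)[0, 1]
rmse = np.sqrt(np.mean((dose - pred) ** 2))
print(f"Pearson r = {r:.2f}, RMSE = {rmse:.2f}")
```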

Evaluating Model-Data Fit by Comparing Parametric and Nonparametric Item Response Functions: Application of a Tukey-Hann Procedure.

This study describes an approach for examining model-data fit for the dichotomous Rasch model using Tukey-Hann item response functions (TH-IRFs). The procedure proposed in this paper is based on an iterative version of a smoothing technique proposed by Tukey (1977) for estimating nonparametric item response functions (IRFs). A root integrated squared error (RISE) statistic (Douglas and Cohen, 2001) is used to compare the TH-IRFs to the Rasch IRFs. Data from undergraduate students at a large university are used to demonstrate this iterative smoothing technique. The RISE statistic is used to compare the item response functions and assess model-data fit. A comparison between the residual-based Infit and Outfit statistics and the RISE statistic is also examined. The results suggest that the RISE statistic and TH-IRFs provide a useful analytical and graphical approach for evaluating item fit. Implications for research, theory, and practice related to model-data fit are discussed.
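The core comparison can be sketched numerically: estimate a nonparametric IRF by smoothing empirical proportions correct across ability, evaluate the parametric Rasch IRF at the same points, and integrate the squared difference over the ability distribution. The Tukey-Hann smoothing below is simplified to a Hann-weighted moving average, and all constants are assumptions.

```python
# Sketch: RISE-style comparison of a parametric Rasch IRF vs. a smoothed
# nonparametric IRF for one item, on simulated responses.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.standard_normal(2000)                 # person abilities
b = 0.3                                           # item difficulty
p_true = 1 / (1 + np.exp(-(theta - b)))
resp = rng.random(2000) < p_true                  # dichotomous responses

# Nonparametric IRF: proportion correct in ability bins, Hann-smoothed.
bins = np.linspace(-3, 3, 25)
centers = 0.5 * (bins[:-1] + bins[1:])
idx = np.digitize(theta, bins) - 1
emp = np.array([resp[idx == k].mean() if (idx == k).any() else np.nan
                for k in range(len(centers))])
hann = np.hanning(5); hann /= hann.sum()
smooth = np.convolve(np.nan_to_num(emp, nan=0.5), hann, mode="same")

# Parametric (Rasch) IRF at the same evaluation points.
rasch = 1 / (1 + np.exp(-(centers - b)))

# Root integrated squared error, weighted by persons per ability bin.
w = np.array([(idx == k).sum() for k in range(len(centers))], float)
w /= w.sum()
rise = np.sqrt(np.sum(w * (smooth - rasch) ** 2))
print(f"RISE = {rise:.3f}")
```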

Scale Anchoring with the Rasch Model.

Scale anchoring is a method to provide additional meaning to particular scores at different points along a score scale by identifying representative items associated with the particular scores. These items are then analyzed to write statements of what types of performance can be expected of a person with the particular scores to help test takers and other stakeholders better understand what it means to achieve the different scores. This article provides simple formulas that can be used to identify possible items to serve as scale anchors with the Rasch model. Specific attention is given to practical considerations and challenges that may be encountered when applying the formulas in different contexts. An illustrative example using data from a medical imaging certification program demonstrates how the formulas can be applied in practice.
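The article's exact formulas are not reproduced here, but the kind of calculation involved can be sketched. Under the Rasch model, the probability that a person at ability θ answers item i correctly is a logistic function of θ − b_i, and an item can serve as an anchor for a score point θ* if that probability at θ* exceeds a chosen response-probability criterion (the 0.80 criterion below is a common convention, assumed for illustration):

```latex
P(X_{pi} = 1 \mid \theta_p, b_i) \;=\; \frac{e^{\theta_p - b_i}}{1 + e^{\theta_p - b_i}},
\qquad
P(\theta^{*}) \ge 0.80
\;\Longleftrightarrow\;
b_i \;\le\; \theta^{*} - \ln\frac{0.80}{0.20} \;\approx\; \theta^{*} - 1.386 .
```

In words: an item anchors a given scale score if its difficulty sits far enough below that score that a test taker at the score is very likely to succeed on it.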

Comparing Imputation Methods for Trait Estimation Using the Rating Scale Model.

This study examined the performance of four methods of handling missing data for discrete response options on a questionnaire: (1) ignoring the missingness (using only the observed items to estimate trait levels); (2) nearest-neighbor hot deck imputation; (3) multiple hot deck imputation; and (4) semi-parametric multiple imputation. A simulation study examining three questionnaire lengths (41-, 20-, and 10-item) crossed with three levels of missingness (10, 25, and 40 percent) was conducted to see which methods best recovered trait estimates when data were missing completely at random and the polytomous items were scored with Andrich's (1978) rating scale model. The results showed that ignoring the missingness and semi-parametric imputation best recovered known trait levels across all conditions, with the semi-parametric technique providing the most precise trait estimates. This study demonstrates the power of specific objectivity in Rasch measurement, as ignoring the missingness leads to generally unbiased trait estimates.
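A minimal sketch of method (2), nearest-neighbor hot deck imputation: each respondent with missing answers borrows observed responses from the respondent most similar on the items both completed. The overlap threshold and single-donor rule are simplifications; the multiple hot deck variant repeats this with several donors.

```python
# Sketch: nearest-neighbor hot-deck imputation for polytomous items.
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 5, size=(200, 20)).astype(float)   # 0-4 rating scale
miss_mask = rng.random(data.shape) < 0.25                  # ~25% missing
data[miss_mask] = np.nan

def nn_hot_deck(data):
    """Fill each incomplete row from its nearest suitable donor row."""
    filled = data.copy()
    for p in range(len(data)):
        miss = np.isnan(data[p])
        if not miss.any():
            continue
        best, best_d = None, np.inf
        for q in range(len(data)):
            if q == p:
                continue
            both = ~np.isnan(data[p]) & ~np.isnan(data[q])
            # Donor must overlap enough and be observed where p is missing.
            if both.sum() < 5 or np.isnan(data[q][miss]).any():
                continue
            d = np.mean((data[p][both] - data[q][both]) ** 2)
            if d < best_d:
                best, best_d = q, d
        if best is not None:
            filled[p, miss] = data[best, miss]
    return filled

imputed = nn_hot_deck(data)
```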

Constructing an Outcome Measure of Occupational Experience: An Application of Rasch Measurement Methods.

Rasch methods were used to evaluate and further develop the Daily Experiences of Pleasure, Productivity, and Restoration Profile (PPR Profile) into a health outcome measure of occupational experience. Analyses of 263 participant PPR Profiles focused on rating scale structure, dimensionality, and reliability. All rating scale categories increased with the intended meaning of the scales, but only 20 of the 21 category measures fit the Rasch rating scale model (RRSM). Several items also did not fit the RRSM and results of residual principal components analyses suggested possible second dimensions in each scale. More importantly, reliability coefficients were very low and participants could not be separated into more than one group as demonstrated by low person separation indices. The authors offer several recommendations for the next steps in the development of the PPR Profile as a health outcome measure of occupational experience.

High resolution microscopy reveals significant impacts of ocean acidification and warming on larval shell development in Laternula elliptica.

Environmental stressors affect marine larval growth rates, quality, and size. Larvae of the Antarctic bivalve Laternula elliptica were raised to the D-larvae stage under temperature and pH conditions representing ambient values and end-of-century projections (-1.6°C to +0.4°C and pH 7.98 to 7.65). Previous observations using light microscopy suggested pH had no influence on larval abnormalities in this species. Detailed analysis of the shell using SEM showed that reduced pH is in fact a major stressor during development for this species, producing D-larvae with abnormal shapes, deformed shell edges, irregular hinges, cracked shell surfaces, and even uncalcified larvae. Additionally, reduced pH increased pitting and cracking on shell surfaces. Thus, apparently normal larvae may be compromised at the ultrastructural level, and such larvae would be in poor condition at settlement, reducing juvenile recruitment and overall survival. Elevated temperatures increased prodissoconch II sizes. However, the overall impacts on larval shell quality and integrity under concurrent ocean acidification would likely overshadow any beneficial results from warmer temperatures, limiting populations of this prevalent Antarctic species.

InMAP: A model for air pollution interventions.

Mechanistic air pollution modeling is essential in air quality management, yet the extensive expertise and computational resources required to run most models prevent their use in many situations where their results would be useful. Here, we present InMAP (Intervention Model for Air Pollution), which offers an alternative to comprehensive air quality models for estimating the air pollution health impacts of emission reductions and other potential interventions. InMAP estimates annual-average changes in primary and secondary fine particle (PM2.5) concentrations (the air pollution outcome generally causing the largest monetized health damages) attributable to annual changes in precursor emissions. InMAP leverages pre-processed physical and chemical information from the output of a state-of-the-science chemical transport model and a variable-resolution computational grid to perform simulations that are several orders of magnitude less computationally intensive than comprehensive model simulations. In comparisons run here, InMAP recreates comprehensive model predictions of changes in total PM2.5 concentrations with a population-weighted mean fractional bias (MFB) of -17% and a population-weighted R2 = 0.90. Although InMAP is not specifically designed to reproduce total observed concentrations, it is able to do so within published air quality model performance criteria for total PM2.5. Potential uses of InMAP include studying exposure, health, and environmental justice impacts of potential shifts in emissions for annual-average PM2.5. InMAP can be trained to run for any spatial and temporal domain given the availability of appropriate simulation output from a comprehensive model. The InMAP model source code and input data are freely available online under an open-source license.
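For concreteness, the two evaluation metrics quoted above can be computed as follows; the arrays here are hypothetical placeholders for gridded PM2.5 fields from the reduced-form and comprehensive models.

```python
# Sketch: population-weighted mean fractional bias and population-weighted
# R^2 between a reduced-form model's predictions and a comprehensive model's.
import numpy as np

rng = np.random.default_rng(0)
comprehensive = rng.gamma(5, 2, 500)              # "reference" per grid cell
reduced_form = comprehensive * rng.normal(0.9, 0.1, 500)
pop = rng.gamma(2, 1000, 500)                     # grid-cell population
w = pop / pop.sum()                               # population weights

# MFB: fractional difference of each pair, population-weighted.
mfb = np.sum(w * 2 * (reduced_form - comprehensive)
             / (reduced_form + comprehensive))

# Weighted R^2 relative to the weighted mean of the reference field.
mean_ref = np.sum(w * comprehensive)
ss_res = np.sum(w * (comprehensive - reduced_form) ** 2)
ss_tot = np.sum(w * (comprehensive - mean_ref) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"population-weighted MFB = {mfb:+.0%}, R^2 = {r2:.2f}")
```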

Electricity forecasting on the individual household level enhanced based on activity patterns.

Leveraging smart metering solutions to support energy efficiency at the individual household level poses novel research challenges in monitoring usage and providing accurate load forecasting. Forecasting electricity usage is an especially important component that can provide intelligence to smart meters. In this paper, we propose an enhanced approach for load forecasting at the household level. The impacts of residents' daily activities and appliance usage on the power consumption of the entire household are incorporated to improve the accuracy of the forecasting model. The contributions of this paper are threefold: (1) we address short-term electricity load forecasting 24 hours ahead, not at the aggregate level but at the individual household level, which falls within Residential Power Load Forecasting (RPLF) methods; (2) for the forecasting, we utilize a household-specific dataset of behaviors that influence power consumption, derived using segmentation and sequence mining algorithms; and (3) we undertake an extensive load forecasting study using different forecasting algorithms enhanced by the household activity patterns.
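A rough sketch of what such a forecaster looks like: hour-of-day, lagged load, and activity indicators feed a regression model predicting the next day's hourly load. The activity labels below are synthetic stand-ins for the behavior patterns the paper mines from household data, and the model choice is illustrative.

```python
# Sketch: 24-hour-ahead household load forecasting with activity features.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
hours = np.arange(24 * 90)                        # 90 days, hourly
hour_of_day = hours % 24
cooking = ((hour_of_day >= 18) & (hour_of_day <= 20)).astype(float)
laundry = (rng.random(len(hours)) < 0.05).astype(float)
load = (0.3 + 1.2 * cooking + 2.0 * laundry
        + 0.2 * np.sin(2 * np.pi * hour_of_day / 24)
        + 0.1 * rng.standard_normal(len(hours)))

X = np.column_stack([hour_of_day, cooking, laundry,
                     np.roll(load, 24)])          # same hour, previous day
y = load
train = slice(24, 24 * 80)                        # skip the undefined lag
test = slice(24 * 80, None)                       # last 10 days held out

model = GradientBoostingRegressor().fit(X[train], y[train])
rmse = np.sqrt(np.mean((model.predict(X[test]) - y[test]) ** 2))
print(f"RMSE on held-out days: {rmse:.3f} kW")
```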

Zooming in: From spatially extended traveling waves to localized structures: The case of the Sine-Gordon equation in (1+3) dimensions.

The Sine-Gordon equation in (1+3) dimensions has N-traveling-front ("kink", "domain wall") solutions for all N ≥ 1. A nonlinear functional of the solution, which vanishes on a single front, maps multi-front solutions onto sets of infinitely long, but laterally bounded, rods, which move in space. Each rod is localized in the vicinity of the intersection of two Sine-Gordon fronts. The rod systems are solutions of the linear wave equation, driven by a term that is constructed out of Sine-Gordon fronts. An additional linear operation maps multi-rod systems onto sets of blobs. Each blob is localized in the vicinity of a rod intersection and moves in space. The blob systems are solutions of the linear wave equation, driven by a term that is also constructed out of Sine-Gordon fronts. The temporal evolution of multi-blob solutions mimics elastic collisions of systems of spatially extended particles.
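For reference, the equation and its planar single-front ("kink") solution in a standard form (the notation below is the commonly used one, assumed rather than copied from the paper):

```latex
u_{tt} - u_{xx} - u_{yy} - u_{zz} + \sin u = 0,
\qquad
u_{\text{front}} = 4\arctan\!\left(e^{\,\mathbf{k}\cdot\mathbf{x} - \omega t + \delta}\right),
\quad
\mathbf{k}\cdot\mathbf{k} - \omega^{2} = 1 .
```

The dispersion condition reduces to the familiar (1+1)-dimensional kink when k = 1/√(1−v²) and ω = v/√(1−v²).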

Rainfall changes affect the algae dominance in tank bromeliad ecosystems.

Climate change and biodiversity loss have been reported as major disturbances in the biosphere that can trigger changes in the structure and functioning of natural ecosystems. Nonetheless, empirical studies demonstrating how the two factors interact to affect shifts in aquatic ecosystems are still lacking. Here, we experimentally test how changes in rainfall distribution and litter diversity affect the occurrence of the algae-dominated condition in tank bromeliad ecosystems. Tank bromeliads are miniature aquatic ecosystems shaped by the rainwater and allochthonous detritus accumulated in the bases of their leaves. We demonstrate that changes in rainfall distribution reduced the chlorophyll-a concentration in the water of bromeliad tanks, significantly affecting the occurrence of algae-dominated conditions. On the other hand, litter diversity did not affect algae dominance irrespective of the rainfall scenario. We suggest that rainfall changes may compromise important self-reinforcing mechanisms responsible for maintaining high levels of algae in tank bromeliad ecosystems. We summarize these results in a theoretical model which suggests that tank bromeliads may show two different regimes, determined by the bromeliad's ability to take up nutrients from the water and by the total amount of light entering the tank. We conclude that predicted climate changes might promote regime shifts in tropical aquatic ecosystems by shaping their structure and the relative importance of other regulating factors.

Variation in benthic long-term data of transitional waters: Is interpretation more than speculation?

Biological long-term data series in marine habitats are often used to identify anthropogenic impacts on the environment or climate-induced regime shifts. However, particularly in transitional waters, environmental properties like water mass dynamics, salinity variability, and the occurrence of oxygen minima not necessarily caused by either human activities or climate change can attenuate or mask apparent signals. At first glance it very often seems impossible to interpret the strong fluctuations of, e.g., abundances or species richness, since abiotic variables like salinity and oxygen content vary simultaneously as well as in apparently erratic ways. The long-term development of major macrozoobenthic parameters (abundance, biomass, species numbers) and derived macrozoobenthic indices (Shannon diversity, Margalef, Pielou's evenness, and Hurlbert) was successfully interpreted and related to the long-term fluctuations of salinity and oxygen, incorporating the North Atlantic Oscillation index (NAO index) and relying on the statistical analysis of modelled and measured data collected during 35 years of observation at three stations in the south-western Baltic Sea. Our results suggest that even at a restricted spatial scale the benthic system does not appear to be tightly controlled by any single environmental driver, and they highlight the complexity of spatially varying temporal responses.

Financial Forecasting and Stochastic Modeling: Predicting the Impact of Business Decisions.

In health care organizations, effective investment of precious resources is critical to assure that the organization delivers high-quality and sustainable patient care within a supportive environment for patients, their families, and the health care providers. This holds true for organizations independent of size, from small practices to large health systems. For radiologists whose role is to oversee the delivery of imaging services and the interpretation, communication, and curation of imaging-informed information, business decisions influence where and how they practice, the tools available for image acquisition and interpretation, and ultimately their professional satisfaction. With so much at stake, physicians must understand and embrace the methods necessary to develop and interpret robust financial analyses so they effectively participate in and better understand decision making. This review discusses the financial drivers upon which health care organizations base investment decisions and the central role that stochastic financial modeling should play in support of strategically aligned capital investments. Given a health care industry that has been slow to embrace advanced financial analytics, a fundamental message of this review is that the skills and analytical tools are readily attainable and well worth the effort to implement in the interest of informed decision making.
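To make the stochastic-modeling point concrete, here is a toy Monte Carlo sketch of a capital investment decision: instead of a single-point net present value, uncertain volumes and margins are simulated to produce a distribution of outcomes. All figures and the scanner example are invented for illustration, not drawn from the review.

```python
# Sketch: Monte Carlo NPV for a hypothetical capital investment.
import numpy as np

rng = np.random.default_rng(0)
n_sim, years, rate = 10_000, 7, 0.06
capital_cost = 2_000_000                       # e.g., a new imaging unit

npv = np.full(n_sim, -capital_cost, dtype=float)
for t in range(1, years + 1):
    volume = rng.normal(4000, 600, n_sim)      # annual exams, uncertain
    margin = rng.normal(120, 25, n_sim)        # net revenue per exam
    npv += volume * margin / (1 + rate) ** t   # discounted cash flow

print(f"median NPV: ${np.median(npv):,.0f}; "
      f"P(NPV < 0) = {np.mean(npv < 0):.1%}")
```

The downside probability, not just the expected value, is what a distribution of simulated outcomes adds to a deterministic forecast.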

The roles of prostate-specific antigen (PSA) density, prostate volume, and their zone-adjusted derivatives in predicting prostate cancer in patients with PSA less than 20.0 ng/mL.

The aim of this study was to develop nomograms for predicting prostate cancer and its zonal location using prostate-specific antigen density, prostate volume, and their zone-adjusted derivatives. A total of 928 consecutive patients with prostate-specific antigen (PSA) less than 20.0 ng/mL, who underwent transrectal ultrasound-guided transperineal 12-core prostate biopsy at West China Hospital between 2011 and 2014, were retrospectively enrolled. The patients were randomly split into a training cohort (70%, n = 650) and a validation cohort (30%, n = 278). The prediction models and associated nomograms were built using the training cohort, while validation of the models was conducted using the validation cohort. Univariate and multivariate logistic regression analyses were performed, and new nomograms were generated from the multivariate regression coefficients. The discriminative power and calibration of these nomograms were validated using the area under the ROC curve (AUC) and calibration curves. The potential clinical utility of these models was also tested using decision curve analysis. In total, 285 (30.7%) patients were diagnosed with prostate cancer. Among them, 131 (14.1%) and 269 (29.0%) had transition zone prostate cancer and peripheral zone prostate cancer, respectively. Each of the zone-adjusted derivative-based nomograms had an AUC greater than 0.75. All nomograms showed good calibration and much better net benefit in predicting the presence or absence of prostate cancer in different zones. Prostate-specific antigen density, prostate volume, and their zone-adjusted derivatives have important roles in detecting prostate cancer and its zonal location in patients with PSA of 2.5-20.0 ng/mL. To the best of our knowledge, this is the first nomogram study using these parameters to predict the outcomes of 12-core prostate biopsy. These instruments can help clinicians to increase the accuracy of prostate cancer screening and to avoid unnecessary prostate biopsies.
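A sketch of the derived predictors and the held-out check described above: PSA density is PSA divided by prostate volume, and a zone-adjusted variant divides by the transition-zone volume instead. All values are simulated and the coefficients hypothetical; only the 70/30 split mirrors the study design.

```python
# Sketch: PSA density and a zone-adjusted derivative feeding a logistic
# model, validated by AUC on a held-out 30% cohort.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 928
psa = rng.uniform(2.5, 20.0, n)                  # ng/mL
vol = rng.normal(45, 12, n).clip(15)             # total prostate volume, mL
tz_vol = vol * rng.uniform(0.2, 0.6, n)          # transition-zone volume

psad = psa / vol                                 # PSA density
psad_tz = psa / tz_vol                           # zone-adjusted derivative

logit = -3 + 2.5 * psad + 0.5 * psad_tz          # hypothetical true model
cancer = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([psa, vol, psad, psad_tz])
X_tr, X_va, y_tr, y_va = train_test_split(X, cancer, test_size=0.3,
                                          random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("validation AUC:", roc_auc_score(y_va, model.predict_proba(X_va)[:, 1]))
```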

Systematic Review of Health Economic Impact Evaluations of Risk Prediction Models: Stop Developing, Start Evaluating.

Although health economic evaluations (HEEs) are increasingly common for therapeutic interventions, they appear to be rare for the use of risk prediction models (PMs).

A Practical ANOVA Approach for Uncertainty Analysis in Population-Based Disease Microsimulation Models.

To provide a practical approach for calculating uncertainty intervals and variance components associated with initial-condition and dynamic-equation parameters in computationally expensive population-based disease microsimulation models.

A spatial method to calculate small-scale fisheries effort in data-poor scenarios.

To gauge the collateral impacts of fishing we must know where fishing boats operate and how much they fish. Although small-scale fisheries land approximately the same amount of fish for human consumption as industrial fleets globally, methods of estimating their fishing effort are comparatively poor. We present an accessible, spatial method of calculating the effort of small-scale fisheries based on two simple measures that are available, or at least easily estimated, in even the most data-poor fisheries: the number of boats and the local coastal human population. We illustrate the method using a small-scale fisheries case study from the Gulf of California, Mexico, and show that our measure of Predicted Fishing Effort (PFE), measured as the number of boats operating in a given area per day adjusted by the number of people in local coastal populations, can accurately predict fisheries landings in the Gulf. Comparing our values of PFE to commercial fishery landings throughout the Gulf also indicates that the current number of small-scale fishing boats in the Gulf is approximately double what is required to land theoretical maximum fish biomass. Our method is fishery-type independent and can be used to quantitatively evaluate the efficacy of growth in small-scale fisheries. This new method provides an important first step towards estimating the fishing effort of small-scale fleets globally.

Estimating the standardized incidence ratio (SIR) with incomplete follow-up data.

A standard parameter to compare the disease incidence of a cohort relative to the population is the standardized incidence ratio (SIR). For statistical inference, it is commonly assumed that the denominator, the expected number of cases, is fixed. If a disease registry is available, incident cases can sometimes be identified by linkage with the registry; however, registries may not contain information on migration or death from other causes. A complete follow-up with a population registry may not be possible. In that case, the end-of-follow-up date, and therefore the exact person-years of observation, are unknown.
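For readers new to the quantity, the SIR in its usual form is the ratio of observed to expected cases, with the expectation built from stratum-specific person-years and population reference rates (standard notation, not copied from the paper):

```latex
\mathrm{SIR} \;=\; \frac{O}{E},
\qquad
E \;=\; \sum_{s} \mathrm{PY}_s \,\lambda_s ,
```

where O is the number of observed cases in the cohort, PY_s the person-years at risk in stratum s (e.g., age, sex, calendar period), and λ_s the reference incidence rate in that stratum. With incomplete follow-up the PY_s, and hence E, are not exactly known, which is the problem the paper addresses.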

Gleason Grading, Biochemical Failure, and Prostate Cancer-Specific Death.

To examine the relationship between the recently defined Gleason grade groups and prostate cancer-specific mortality.

Computation and measurement of cell decision making errors using single cell data.

In this study, a new computational method is developed to quantify decision-making errors in cells caused by noise and signaling failures. Analysis of the tumor necrosis factor (TNF) signaling pathway, which regulates the transcription factor nuclear factor κB (NF-κB), using this method identifies two types of incorrect cell decisions, called false alarm and miss. These two events represent, respectively, declaring a signal that is not present and missing a signal that does exist. Using single-cell experimental data and the developed method, we compute false alarm and miss error probabilities in wild-type cells and provide a formulation which shows how these metrics depend on the signal transduction noise level. We also show that in the presence of abnormalities in a cell, decision-making processes can be significantly affected compared to a wild-type cell, and that the method is able to model and measure such effects. In the TNF-NF-κB pathway, the method computes and reveals changes in false alarm and miss probabilities in A20-deficient cells, caused by the cell's inability to inhibit the TNF-induced NF-κB response. In biological terms, a higher false alarm metric in this abnormal TNF signaling system indicates perceiving more cytokine signals that in fact do not exist at the system input, whereas a higher miss metric indicates that the cell is highly likely to miss signals that actually exist. Overall, this study demonstrates the ability of the developed method to model cell decision-making errors under normal and abnormal conditions and in the presence of transduction noise uncertainty. Compared to the previously reported pathway capacity metric, our results suggest that the introduced decision error metrics characterize signaling failures more accurately. This is mainly because, while capacity is a useful metric for studying information transmission in signaling pathways, it does not capture the overlap between TNF-induced noisy response curves.
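The two error types map directly onto classical binary detection. A toy sketch (Gaussian response distributions and all parameter values are illustrative assumptions, not the paper's measurements): responses under "no TNF input" and "TNF input" overlap, and a threshold decision yields false-alarm and miss probabilities.

```python
# Sketch: false-alarm and miss probabilities for a thresholded noisy readout.
import numpy as np
from scipy.stats import norm

mu_off, mu_on, sigma = 0.0, 2.0, 1.0       # noisy response, signal off/on
threshold = 1.0                            # cell "declares signal" above this

# Signal absent but declared present; signal present but missed.
p_false_alarm = 1 - norm.cdf(threshold, mu_off, sigma)
p_miss = norm.cdf(threshold, mu_on, sigma)

# An elevated baseline (loosely mimicking impaired inhibition, as in the
# A20-deficient case described above) raises the false-alarm probability.
p_false_alarm_elevated = 1 - norm.cdf(threshold, mu_off + 1.0, sigma)
print(p_false_alarm, p_miss, p_false_alarm_elevated)
```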

Modeling Variables With a Spike at Zero: Examples and Practical Recommendations.

In most epidemiologic studies and in clinical research generally, there are variables with a spike at zero, namely variables for which a proportion of individuals have zero exposure (e.g., never smokers) and among those exposed the variable has a continuous distribution. Different options exist for modeling such variables, such as categorization where the nonexposed form the reference group, or ignoring the spike by including the variable in the regression model with or without some transformation or modeling procedures. It has been shown that such situations can be analyzed by adding a binary indicator (exposed/nonexposed) to the regression model, and a method based on fractional polynomials with which to estimate a suitable functional form for the positive portion of the spike-at-zero variable distribution has been developed. In this paper, we compare different approaches using data from 3 case-control studies carried out in Germany: the Mammary Carcinoma Risk Factor Investigation (MARIE), a breast cancer study conducted in 2002-2005 (Flesch-Janys et al., Int J Cancer. 2008;123(4):933-941); the Rhein-Neckar Larynx Study, a study of laryngeal cancer conducted in 1998-2000 (Dietz et al., Int J Cancer. 2004;108(6):907-911); and a lung cancer study conducted in 1988-1993 (Jöckel et al., Int J Epidemiol. 1998;27(4):549-560). Strengths and limitations of different procedures are demonstrated, and some recommendations for practical use are given.
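A small sketch of the binary-indicator-plus-continuous approach for a spike-at-zero exposure (e.g., pack-years, with never smokers at zero): the model receives one term for exposed-versus-not and one fractional-polynomial-style term for the positive amounts. Data are simulated and the square-root transform is one arbitrary member of the FP family, chosen here for illustration.

```python
# Sketch: logistic model with a binary exposure indicator plus an
# FP-style transform of the positive part of a spike-at-zero variable.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
exposed = rng.random(n) < 0.6
amount = np.where(exposed, rng.gamma(2, 10, n), 0.0)    # spike at zero

logit = -1.0 + 0.7 * exposed + 0.02 * amount            # hypothetical truth
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Transform applied to the positive part only; zeros stay zero, and the
# binary indicator absorbs the exposed/nonexposed contrast.
fp_term = np.where(amount > 0, np.sqrt(amount), 0.0)
X = sm.add_constant(pd.DataFrame({"exposed": exposed.astype(int),
                                  "fp_amount": fp_term}))
print(sm.Logit(y, X).fit(disp=0).summary())
```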

Transcutaneous Bilirubin Nomogram for Healthy Term and Late Preterm Neonates in First 96 Hours of Life.

To develop nomogram of Transcutaneous Bilirubin among healthy term and late-preterm neonates during first 96 hours of age.

SucStruct: Prediction of succinylated lysine residues by using structural properties of amino acids.

Post-translational modification (PTM) is a biological reaction which contributes to diversifying the proteome. Among the many modifications with important roles in cellular activity, lysine succinylation has recently emerged as an important PTM mark. It alters the chemical structure of lysines, leading to remarkable changes in the structure and function of proteins. In contrast to the huge number of proteins being sequenced in the post-genome era, the experimental detection of succinylated residues remains expensive, inefficient, and time-consuming. Therefore, the development of computational tools for accurately predicting succinylated lysines is an urgent necessity. To date, several approaches have been proposed, but their sensitivity has been reportedly poor. In this paper, we propose an approach that utilizes structural features of amino acids to improve lysine succinylation prediction. Succinylated and non-succinylated lysines were first retrieved from 670 proteins, and characteristics such as accessible surface area, backbone torsion angles, and local structure conformations were incorporated. We used the k-nearest neighbors cleaning treatment to deal with class imbalance and designed a pruned decision tree for classification. Our predictor, referred to as SucStruct (Succinylation using Structural features), proved to significantly improve performance when compared to previous predictors, with sensitivity, accuracy, and Matthews correlation coefficient equal to 0.7334-0.7946, 0.7444-0.7608, and 0.4884-0.5240, respectively.
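The two modeling steps named above can be sketched as follows: a k-nearest-neighbor cleaning pass that drops majority-class training points whose neighborhoods disagree with their label, then a cost-complexity-pruned decision tree. The features are synthetic stand-ins for the structural properties, and the cleaning rule is a simplified version of such treatments.

```python
# Sketch: kNN-based cleaning of the majority class, then a pruned tree.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, weights=[0.9, 0.1],
                           random_state=0)        # imbalanced, like PTM data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Cleaning: remove majority-class points misclassified by their own
# neighborhood; minority (succinylated) points are always kept.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
agree = knn.predict(X_tr) == y_tr
keep = agree | (y_tr == 1)
X_cl, y_cl = X_tr[keep], y_tr[keep]

# Pruned decision tree via cost-complexity pruning (alpha is arbitrary).
tree = DecisionTreeClassifier(ccp_alpha=0.002, random_state=0).fit(X_cl, y_cl)
print("held-out accuracy:", tree.score(X_te, y_te))
```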

Learning about and from others' prudence, impatience or laziness: The computational bases of attitude alignment.

People's subjective attitudes towards costs such as risk, delay, or effort are key determinants of inter-individual differences in goal-directed behaviour. Thus, the ability to learn about others' prudent, impatient, or lazy attitudes is likely to be critical for social interactions. Conversely, how adaptive such attitudes are in a given environment is highly uncertain. Thus, the brain may be tuned to garner information about how such costs ought to be arbitrated. In particular, observing others' attitudes may change one's uncertain belief about how best to behave in related difficult decision contexts. In turn, learning from others' attitudes is determined by one's ability to learn about those attitudes. We first derive, from basic optimality principles, the computational properties of such a learning mechanism. In particular, we predict two apparent cognitive biases that should arise when individuals learn about others' attitudes: (i) people should overestimate the degree to which they resemble others (the false-consensus bias), and (ii) they should align their own attitudes with others' (the social influence bias). We show how these two biases non-trivially interact with each other. We then validate these predictions experimentally by profiling people's attitudes both before and after they guess a series of cost-benefit arbitrages performed by calibrated artificial agents (impersonating human individuals).

Phylodynamics on local sexual contact networks.

Phylodynamic models are widely used in infectious disease epidemiology to infer the dynamics and structure of pathogen populations. However, these models generally assume that individual hosts contact one another at random, ignoring the fact that many pathogens spread through highly structured contact networks. We present a new framework for phylodynamics on local contact networks based on pairwise epidemiological models that track the status of pairs of nodes in the network rather than just individuals. Shifting our focus from individuals to pairs leads naturally to coalescent models that describe how lineages move through networks and the rate at which lineages coalesce. These pairwise coalescent models not only consider how network structure directly shapes pathogen phylogenies, but also how the relationship between phylogenies and contact networks changes depending on epidemic dynamics and the fraction of infected hosts sampled. By considering pathogen phylogenies in a probabilistic framework, these coalescent models can also be used to estimate the statistical properties of contact networks directly from phylogenies using likelihood-based inference. We use this framework to explore how much information phylogenies retain about the underlying structure of contact networks and to infer the structure of a sexual contact network underlying a large HIV-1 sub-epidemic in Switzerland.

Measured glomerular filtration rate does not improve prediction of mortality by cystatin C and creatinine.

Cystatin C may add explanatory power for associations with mortality in combination with other filtration markers, possibly indicating pathways other than glomerular filtration rate (GFR). However, this has not been firmly established since interpretation of associations independent of measured GFR (mGFR) is limited by potential multicollinearity between markers of GFR. The primary aim of this study was to assess associations between cystatin C and mortality, independent of mGFR. A secondary aim was to evaluate the utility of combining cystatin C and creatinine to predict mortality risk.

PBMDA: A novel and effective path-based computational model for miRNA-disease association prediction.

In recent years, an increasing number of studies have shown that microRNAs (miRNAs) play critical roles in many fundamental and important biological processes. However, the molecular mechanisms by which miRNAs act as pathogenetic factors in human complex diseases are still not completely understood. Predicting potential miRNA-disease associations makes important contributions to understanding the pathogenesis of diseases, developing new drugs, and formulating individualized diagnosis and treatment for diverse human complex diseases. Instead of depending only on expensive and time-consuming biological experiments, computational prediction models are effective for predicting potential miRNA-disease associations, prioritizing candidate miRNAs for the investigated diseases, and selecting those miRNAs with higher association probabilities for further experimental validation. In this study, the Path-Based MiRNA-Disease Association (PBMDA) prediction model was proposed by integrating known human miRNA-disease associations, miRNA functional similarity, disease semantic similarity, and Gaussian interaction profile kernel similarity for miRNAs and diseases. The model constructs a heterogeneous graph consisting of three interlinked sub-graphs and adopts a depth-first search algorithm to infer potential miRNA-disease associations. As a result, PBMDA achieved reliable performance in the frameworks of both local and global LOOCV (AUCs of 0.8341 and 0.9169, respectively) and 5-fold cross-validation (average AUC of 0.9172). In the case studies of three important human diseases, 88% (Esophageal Neoplasms), 88% (Kidney Neoplasms), and 90% (Colon Neoplasms) of the top-50 predicted miRNAs have been confirmed by previous experimental reports in the literature. The performance comparison between PBMDA and previous models in the case studies further demonstrates that PBMDA can serve as a powerful computational tool to accelerate the identification of disease-miRNA associations.
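The path-based scoring idea can be sketched compactly: depth-first search over a heterogeneous graph whose edges carry miRNA-miRNA similarities, disease-disease similarities, and known miRNA-disease associations, accumulating a score along each simple path while penalizing longer paths. The toy graph, per-hop decay factor, and depth cap below are assumptions for illustration, not PBMDA's exact parameters.

```python
# Sketch: DFS path scoring on a toy heterogeneous miRNA-disease graph.
def dfs_paths(graph, node, target, depth, visited, weight, out, decay=0.5):
    """Accumulate products of edge weights along simple paths to target,
    attenuated by a decay factor per hop so long paths contribute less."""
    if node == target:
        out.append(weight)
        return
    if depth == 0:
        return
    for nxt, w in graph.get(node, []):
        if nxt not in visited:                    # simple paths only
            dfs_paths(graph, nxt, target, depth - 1, visited | {nxt},
                      weight * w * decay, out, decay)

def association_score(graph, mirna, disease, max_depth=3):
    scores = []
    dfs_paths(graph, mirna, disease, max_depth, {mirna}, 1.0, scores)
    return sum(scores)

# Edge weights stand in for similarities / known associations.
graph = {
    "miR-21":  [("miR-155", 0.8), ("ColonNeoplasms", 1.0)],
    "miR-155": [("KidneyNeoplasms", 1.0), ("miR-21", 0.8)],
    "ColonNeoplasms": [("KidneyNeoplasms", 0.6)],
}
print(association_score(graph, "miR-21", "KidneyNeoplasms"))
```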