PubTransformer

A site to transform PubMed publications into these bibliographic reference formats: ADS, BibTeX, EndNote, ISI (used by the Web of Knowledge), RIS, MEDLINE, and Microsoft Word 2007 XML.

Algorithms - Top 30 Publications

A Comparison of the Performance of EndoPredict Clinical and NHS PREDICT in 120 Patients Treated for ER-positive Breast Cancer.

Computational algorithms, such as NHS PREDICT, have been developed using cancer registry data to guide decisions regarding adjuvant chemotherapy. They are limited by biases in the underlying data. Recent breakthroughs in molecular biology have aided the development of genomic assays that provide superior clinical information. In this study, we compared the risk-stratification performance of EndoPredict Clinical (EPClin, a composite of clinical data and EndoPredict) and PREDICT in a cohort of patients with breast cancer whom clinicians considered potential candidates for chemotherapy.

Salient object segmentation based on active contouring.

Traditional saliency detection algorithms lack object-level semantic information, and segmentation algorithms cannot highlight the saliency of the segmented regions. To compensate for the shortcomings of both, this paper establishes a salient object segmentation model that combines the two approaches. With the help of a priori knowledge of image-boundary background traits, the K-means++ algorithm is used to cluster the pixels of each region; in line with the human eye's sensitivity to color and its attention mechanism, the joint probability distribution of regional contrast and spatial saliency is established. The salient area is selected on the basis of these probabilities; its region boundary is taken as the initial curve, and a level-set algorithm performs the salient object segmentation of the image. The curve convergence condition is established according to the confidence level of the segmented region, thus avoiding over-convergence of the segmentation curve. With this method, the salient region boundary lies close to the object contour, so the curve evolution time is shorter; compared with the traditional Li algorithm, the proposed algorithm achieves higher segmentation evaluation scores, with the additional benefit of emphasizing the importance of the object.
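As a rough illustration of the clustering-plus-contrast stage described above, the sketch below clusters pixels with k-means++ and scores each region by its color contrast against the other regions. It assumes scikit-learn and an RGB image array; the function name `regional_contrast_saliency` is illustrative, and the paper's level-set evolution step is not reproduced.

```python
# A minimal sketch of the region clustering and contrast stage, assuming an
# RGB image as a NumPy array. The level-set evolution described in the paper
# is not reproduced here; names are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def regional_contrast_saliency(image, n_regions=8):
    """Cluster pixels with k-means++ and score each region by its color
    contrast against all other regions (a rough per-pixel saliency proxy)."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=n_regions, init="k-means++", n_init=10).fit(pixels)
    labels = km.labels_.reshape(h, w)
    centers = km.cluster_centers_
    sizes = np.bincount(km.labels_, minlength=n_regions) / pixels.shape[0]
    # Regional contrast: color distance to other regions weighted by their size.
    contrast = np.array([
        sum(sizes[j] * np.linalg.norm(centers[i] - centers[j])
            for j in range(n_regions) if j != i)
        for i in range(n_regions)
    ])
    saliency = contrast[labels]            # per-pixel saliency from its region
    return saliency / (saliency.max() + 1e-12), labels

# Example: saliency, regions = regional_contrast_saliency(np.random.rand(64, 64, 3))
```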

Robust iterative closest point algorithm based on global reference point for rotation invariant registration.

The iterative closest point (ICP) algorithm is efficient and accurate for rigid registration, but it requires good initial parameters and easily fails when the rotation angle between two point sets is large. To deal with this problem, a new objective function is proposed by introducing a rotation-invariant feature based on the Euclidean distance between each point and a global reference point, where the global reference point is itself a rotation invariant. This optimization problem is then solved by a variant of the ICP algorithm, which is an iterative method. First, accurate correspondences are established by weighting the rotation-invariant feature distance and the position distance together. Second, the rigid transformation is solved by the singular value decomposition method. Third, the weight is adjusted to control the relative contribution of positions and features. Finally, the new algorithm accomplishes the registration in a coarse-to-fine way regardless of the initial rotation angle and is demonstrated to converge monotonically. The experimental results validate that the proposed algorithm is more accurate and robust than the original ICP algorithm.
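For orientation, here is a minimal sketch of a single standard ICP-style iteration (nearest-neighbour correspondence plus an SVD-based rigid fit), assuming two N x 3 NumPy point clouds. The rotation-invariant feature weighting proposed in the paper is only hinted at through optional per-point weights; this is not the authors' algorithm.

```python
# A minimal sketch of one ICP iteration with an SVD-based rigid fit, assuming
# two N x 3 point clouds. The paper's rotation-invariant feature weighting is
# only suggested via optional per-point weights.
import numpy as np
from scipy.spatial import cKDTree

def icp_step(source, target, weights=None):
    """Return (R, t) aligning `source` toward `target` for one iteration."""
    tree = cKDTree(target)
    _, idx = tree.query(source)               # nearest-neighbour correspondences
    matched = target[idx]
    w = np.ones(len(source)) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    mu_s = (w[:, None] * source).sum(axis=0)  # weighted centroids
    mu_t = (w[:, None] * matched).sum(axis=0)
    H = (w[:, None] * (source - mu_s)).T @ (matched - mu_t)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                  # fix a possible reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = mu_t - R @ mu_s
    return R, t

# Usage: iterate icp_step, applying source = source @ R.T + t after each step.
```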

A machine learning approach to triaging patients with chronic obstructive pulmonary disease.

COPD patients face a daily risk of acute exacerbation and loss of control, which could be mitigated by effective, on-demand decision support tools. In this study, we present a machine learning-based strategy for early detection of exacerbations and subsequent triage. Our application uses physician opinion on a statistically and clinically comprehensive set of patient cases to train a supervised prediction algorithm. The accuracy of the model is assessed against a panel of physicians, each triaging identical cases in a representative patient validation set. Our results show that the algorithm's accuracy and safety indicators surpass all individual pulmonologists both in identifying exacerbations and in predicting the consensus triage in a 101-case validation set. The algorithm is also the top performer in sensitivity, specificity, and positive predictive value (PPV) when predicting a patient's need for emergency care.
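The safety indicators cited above (sensitivity, specificity, PPV) are all derived from a confusion matrix; a minimal worked example with illustrative labels follows.

```python
# A minimal sketch of the triage performance metrics mentioned above
# (sensitivity, specificity, PPV), using illustrative binary labels.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # 1 = exacerbation, 0 = stable (example data)
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # hypothetical algorithm output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)       # recall for the exacerbation class
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)               # positive predictive value
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} ppv={ppv:.2f}")
```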

Comparative analysis of weighted gene co-expression networks in human and mouse.

The application of complex network modeling to analyze large co-expression data sets has gained traction during the last decade. In particular, the use of the weighted gene co-expression network analysis framework has allowed an unbiased and systems-level investigation of genotype-phenotype relationships in a wide range of systems. Since mouse is an important model organism for biomedical research on human disease, it is of great interest to identify similarities and differences in the functional roles of human and mouse orthologous genes. Here, we develop a novel network comparison approach which we demonstrate by comparing two gene-expression data sets from a large number of human and mouse tissues. The method uses weighted topological overlap alongside the recently developed network-decomposition method of s-core analysis, which is suitable for making gene-centrality rankings for weighted networks. The aim is to identify globally central genes separately in the human and mouse networks. By comparing the ranked gene lists, we identify genes that display conserved or diverged centrality-characteristics across the networks. This framework only assumes a single threshold value that is chosen from a statistical analysis, and it may be applied to arbitrary network structures and edge-weight distributions, also outside the context of biology. When conducting the comparative network analysis, both within and across the two species, we find a clear pattern of enrichment of transcription factors, for the homeobox domain in particular, among the globally central genes. We also perform gene-ontology term enrichment analysis and look at disease-related genes for the separate networks as well as the network comparisons. We find that gene ontology terms related to regulation and development are generally enriched across the networks. In particular, the genes FOXE3, RHO, RUNX2, ALX3 and RARA, which are disease genes in either human or mouse, are on the top-10 list of globally central genes in the human and mouse networks.
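The weighted topological overlap mentioned above is the standard WGCNA-style TOM measure; a minimal sketch follows, assuming a symmetric adjacency matrix with entries in [0, 1]. The s-core decomposition and cross-species comparison are not reproduced here.

```python
# A minimal sketch of weighted topological overlap (TOM), assuming a symmetric
# adjacency matrix A with entries in [0, 1] and a zero diagonal. The s-core
# decomposition used in the paper is not reproduced here.
import numpy as np

def topological_overlap(A):
    A = np.asarray(A, float)
    np.fill_diagonal(A, 0.0)
    k = A.sum(axis=1)                          # weighted connectivity per node
    shared = A @ A                             # sum_k a_ik * a_kj
    denom = np.minimum.outer(k, k) + 1.0 - A
    tom = (shared + A) / denom
    np.fill_diagonal(tom, 1.0)
    return tom

# Example with a random symmetric adjacency matrix:
rng = np.random.default_rng(0)
X = rng.random((5, 5))
A = (X + X.T) / 2
np.fill_diagonal(A, 0.0)
print(topological_overlap(A).round(3))
```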

Proprioceptive assessment in clinical settings: Evaluation of joint position sense in upper limb post-stroke using a robotic manipulator.

Proprioception is a critical component of motor function and directly affects motor learning after neurological injuries. Conventional methods for its assessment are generally ordinal in nature and hence lack sensitivity. Robotic devices designed to promote sensorimotor learning can potentially provide quantitative, precise, accurate, and reliable assessments of sensory impairments. In this paper, we investigate the clinical applicability and validity of using a planar 2-degrees-of-freedom robot to quantitatively assess proprioceptive deficits in post-stroke participants. Nine stroke survivors and nine healthy subjects participated in the study. Each participant's hand was passively moved to the target position guided by the H-Man robot (criterion movement), and participants were asked to indicate, during a second passive movement towards the same target (matching movement), when they felt that they had matched the target position. The assessment was carried out on a planar surface for movements in the forward and oblique directions on the contralateral and ipsilateral sides of the tested arm. Matching performance was evaluated in terms of error magnitude (absolute and signed) and its variability. Stroke patients showed higher variability in the estimation of the target position compared with the healthy participants. Further, an effect of target was found, with lower absolute errors on the contralateral side. Pairwise comparison between individual stroke participants and control participants showed significant proprioceptive deficits in two patients. The proposed assessment of passive joint position sense is inherently simple, and all participants, regardless of motor impairment level, could complete it in less than 10 minutes. Therefore, the method can potentially be used to detect changes in proprioceptive deficits in clinical settings.

Development and validation of QDiabetes-2018 risk prediction algorithm to estimate future risk of type 2 diabetes: cohort study.

Objectives: To derive and validate updated QDiabetes-2018 prediction algorithms to estimate the 10 year risk of type 2 diabetes in men and women, taking account of potential new risk factors, and to compare their performance with current approaches.
Design: Prospective open cohort study.
Setting: Routinely collected data from 1457 general practices in England contributing to the QResearch database: 1094 were used to develop the scores and a separate set of 363 were used to validate the scores.
Participants: 11.5 million people aged 25-84 and free of diabetes at baseline: 8.87 million in the derivation cohort and 2.63 million in the validation cohort.
Methods: Cox proportional hazards models were used in the derivation cohort to derive separate risk equations in men and women for evaluation at 10 years. Risk factors considered included those already in QDiabetes (age, ethnicity, deprivation, body mass index, smoking, family history of diabetes in a first degree relative, cardiovascular disease, treated hypertension, and regular use of corticosteroids) and new risk factors: atypical antipsychotics, statins, schizophrenia or bipolar affective disorder, learning disability, gestational diabetes, and polycystic ovary syndrome. Additional models included fasting blood glucose and glycated haemoglobin (HbA1c). Measures of calibration and discrimination were determined in the validation cohort for men and women separately and for individual subgroups by age group, ethnicity, and baseline disease status.
Main outcome measure: Incident type 2 diabetes recorded on the general practice record.
Results: In the derivation cohort, 178 314 incident cases of type 2 diabetes were identified during follow-up arising from 42.72 million person years of observation. In the validation cohort, 62 326 incident cases of type 2 diabetes were identified from 14.32 million person years of observation. All new risk factors considered met our model inclusion criteria. Model A included age, ethnicity, deprivation, body mass index, smoking, family history of diabetes in a first degree relative, cardiovascular disease, treated hypertension, and regular use of corticosteroids, plus the new risk factors: atypical antipsychotics, statins, schizophrenia or bipolar affective disorder, learning disability, and gestational diabetes and polycystic ovary syndrome in women. Model B included the same variables as model A plus fasting blood glucose. Model C included HbA1c instead of fasting blood glucose. All three models had good calibration and high levels of explained variation and discrimination. In women, model B explained 63.3% of the variation in time to diagnosis of type 2 diabetes (R2), the D statistic was 2.69, and Harrell's C statistic was 0.89. The corresponding values for men were 58.4%, 2.42, and 0.87. Model B also had the highest sensitivity compared with current recommended practice in the National Health Service based on bands of either fasting blood glucose or HbA1c. However, only 16% of patients had complete data for blood glucose measurements, smoking, and body mass index.
Conclusions: Three updated QDiabetes risk models to quantify the absolute risk of type 2 diabetes were developed and validated: model A does not require a blood test and can be used to identify patients for fasting blood glucose (model B) or HbA1c (model C) testing. Model B had the best performance for predicting 10 year risk of type 2 diabetes to identify those who need interventions and more intensive follow-up, improving on current approaches. Additional external validation of models B and C in datasets with more completely collected data on blood glucose would be valuable before the models are used in clinical practice.
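For readers unfamiliar with the Cox proportional hazards models and Harrell's C statistic used above, here is a minimal sketch assuming the `lifelines` package and an illustrative, randomly generated data frame; the QDiabetes variable set, D statistic, and R² measures are not reproduced.

```python
# A minimal sketch of a Cox proportional hazards fit and Harrell's C,
# assuming the `lifelines` package and illustrative columns; the QDiabetes
# variable set, D statistic, and R^2 measures are not reproduced here.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "age": rng.integers(25, 85, n),
    "bmi": rng.normal(27, 4, n),
    "smoker": rng.integers(0, 2, n),
    "years_to_event": rng.exponential(10, n),     # follow-up time (illustrative)
    "diabetes": rng.integers(0, 2, n),            # event indicator (illustrative)
})

cph = CoxPHFitter()
cph.fit(df, duration_col="years_to_event", event_col="diabetes")
# Higher predicted hazard should mean shorter time to event, hence the negation.
c = concordance_index(df["years_to_event"],
                      -cph.predict_partial_hazard(df),
                      df["diabetes"])
print(f"Harrell's C: {c:.3f}")
```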

Imaging of Benign Odontogenic Lesions.

Numerous benign cysts or solid tumors may present in the jaws. These arise from tooth-forming tissues in the dental alveolus or from nonodontogenic tissues in the basal bone of the mandible and maxilla. Radiologists provide 2 deliverables to assist in diagnosis and management: (1) appropriately formatted images demonstrating the location and extent of the lesion and (2) interpretive reports highlighting specific radiologic findings and an impression providing a radiologic differential diagnosis. This article provides guidance on essential image protocols for planning treatments, a radiologic differential diagnostic algorithm based on location and pattern recognition, and a summary of the main features of benign odontogenic lesions.

When Machines Think: Radiology's Next Frontier.

Artificial intelligence (AI), machine learning, and deep learning are terms now seen frequently, all of which refer to computer algorithms that change as they are exposed to more data. Many of these algorithms are surprisingly good at recognizing objects in images. Large amounts of machine-consumable digital data, increased and cheaper computing power, and increasingly sophisticated statistical models combine to enable machines to find patterns in data in ways that are not only cost-effective but also potentially beyond humans' abilities. Building an AI algorithm can be surprisingly easy; understanding the associated data structures and statistics, on the other hand, is often difficult. Converting the algorithm into a sophisticated product that works consistently in broad, general clinical use is complex and incompletely understood. Showing that these AI products reduce costs and improve outcomes will require clinical translation and industrial-grade integration into routine workflow. Radiology has the chance to leverage AI to become a center of intelligently aggregated, quantitative, diagnostic information. Centaur radiologists, formed as a synergy of human plus computer, will provide interpretations using data extracted from images by humans and by image-analysis computer algorithms, as well as from the electronic health record, genomics, and other disparate sources. These interpretations will form the foundation of precision health care, or care customized to an individual patient. © RSNA, 2017.

The influence of filtering and downsampling on the estimation of transfer entropy.

Transfer entropy (TE) provides a generalized and model-free framework to study Wiener-Granger causality between brain regions. Because of its nonparametric character, TE can infer directed information flow from nonlinear systems as well. Despite its increasing number of applications in neuroscience, little is known about the influence of common electrophysiological preprocessing on its estimation. We test the influence of filtering and downsampling on a recently proposed nearest-neighbor-based TE estimator. Different filter settings and downsampling factors were tested in a simulation framework using a model with a linear coupling function and two nonlinear models with sigmoid and logistic coupling functions. For nonlinear coupling and progressively lower low-pass filter cut-off frequencies, up to 72% false-negative direct connections and up to 26% false-positive connections were identified. In contrast, for the linear model, a monotonic increase was observed only for missed indirect connections (up to 86%). High-pass filtering (1 Hz, 2 Hz) had no impact on TE estimation. After low-pass filtering, interaction delays were significantly underestimated. Downsampling the data by a factor greater than the assumed interaction delay erased most of the transmitted information and thus led to a very high percentage (67-100%) of false-negative direct connections. Low-pass filtering increases the number of missed connections depending on the filter's cut-off frequency. Downsampling should only be done if the downsampling factor is smaller than the smallest assumed interaction delay of the analyzed network.
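The paper uses a nearest-neighbor TE estimator; as a much simpler illustration of the quantity itself and of the downsampling effect described above, the sketch below uses a coarse binned (histogram) estimator of TE from X to Y at lag 1 on synthetic data. It is not the authors' estimator.

```python
# A minimal binned (histogram) estimator of transfer entropy X -> Y at lag 1,
# on synthetic data. The paper uses a nearest-neighbour estimator; this much
# simpler sketch only illustrates the quantity and a naive downsampling step.
import numpy as np

def transfer_entropy(x, y, bins=8):
    """TE(X -> Y) in bits, estimated from discretized time series."""
    xd = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    yd = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    y_next, y_now, x_now = yd[1:], yd[:-1], xd[:-1]
    te = 0.0
    for yn in np.unique(y_next):
        for yc in np.unique(y_now):
            for xc in np.unique(x_now):
                p_xyz = np.mean((y_next == yn) & (y_now == yc) & (x_now == xc))
                if p_xyz == 0:
                    continue
                p_yz = np.mean((y_now == yc) & (x_now == xc))
                p_y = np.mean(y_now == yc)
                p_yn_given_y = np.mean((y_next == yn) & (y_now == yc)) / p_y
                te += p_xyz * np.log2((p_xyz / p_yz) / p_yn_given_y)
    return te

rng = np.random.default_rng(1)
x = rng.normal(size=5000)
y = np.roll(x, 1) + 0.5 * rng.normal(size=5000)   # y driven by x with delay 1
print("TE x->y:", round(transfer_entropy(x, y), 3))
print("TE x->y after downsampling by 4:",
      round(transfer_entropy(x[::4], y[::4]), 3))  # delay no longer resolvable
```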

Color and Spectral Doppler Sonography: How to Start an Examination - Step by Step.

Smartphone-based quantitative measurements on holographic sensors.

The research reported herein integrates a generic holographic sensor platform and a smartphone-based colour quantification algorithm in order to standardise and improve the determination of the concentration of analytes of interest. The utility of this approach has been exemplified by analysing the replay colour of the captured image of a holographic pH sensor in near real-time. Personalised image encryption followed by a wavelet-based image compression method was applied to secure image transfer across a bandwidth-limited network to the cloud. The decrypted and decompressed image was processed through four principal steps: (1) recognition of the hologram in the image against a complex background using a template-based approach; (2) conversion of device-dependent RGB values to device-independent CIEXYZ values using a polynomial model of the camera, followed by computation of the CIEL*a*b* values; (3) use of the colour coordinates of the captured image to segment the image, select the appropriate colour descriptors, and ultimately locate the region of interest (ROI), i.e. the hologram in this case; and (4) application of a machine learning-based algorithm to correlate the colour coordinates of the ROI with the analyte concentration. Integrating holographic sensors and the colour image processing algorithm potentially offers a cost-effective platform for the remote monitoring of analytes in real time in readily accessible body fluids by minimally trained individuals.
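A minimal sketch of the device-independent colour conversion in step (2) follows, assuming the standard sRGB matrix and a D65 white point rather than the camera-specific polynomial model used in the paper.

```python
# A minimal sketch of the colour-space conversion step (sRGB -> CIEXYZ ->
# CIEL*a*b*), assuming the standard sRGB matrix and D65 white point rather
# than the camera-specific polynomial model described in the paper.
import numpy as np

M_SRGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                          [0.2126, 0.7152, 0.0722],
                          [0.0193, 0.1192, 0.9505]])
WHITE_D65 = np.array([0.95047, 1.0, 1.08883])

def srgb_to_lab(rgb):
    """Convert an (..., 3) array of sRGB values in [0, 1] to CIEL*a*b*."""
    rgb = np.asarray(rgb, float)
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = lin @ M_SRGB_TO_XYZ.T / WHITE_D65
    f = np.where(xyz > (6 / 29) ** 3,
                 np.cbrt(xyz),
                 xyz / (3 * (6 / 29) ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

print(srgb_to_lab([1.0, 0.0, 0.0]).round(1))   # a saturated red patch
```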

Predictability of machine learning techniques to forecast the trends of market index prices: Hypothesis testing for the Korean stock markets.

The prediction of trends in stock and index prices is an important issue for market participants. Investors set trading or fiscal strategies based on these trends, and considerable research across academic fields has been conducted to forecast financial markets. This study predicts the trends of the Korea Composite Stock Price Index 200 (KOSPI 200) prices using nonparametric machine learning models: artificial neural networks and support vector machines with polynomial and radial basis function kernels. In addition, this study states controversial issues and tests hypotheses about them. Our results are inconsistent with those of the preceding research, which is generally considered to have high prediction performance. Moreover, Google Trends data proved not to be an effective factor in predicting the KOSPI 200 index prices in our framework. Furthermore, the ensemble methods did not improve the accuracy of the prediction.
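As a sketch of the kind of direction-of-change setup described above, the example below trains an RBF-kernel SVM on lagged log returns, with synthetic prices standing in for the KOSPI 200 series; it does not reproduce the study's feature set or hypothesis tests.

```python
# A minimal sketch of direction-of-change prediction with an RBF-kernel SVM
# on lagged returns, using synthetic prices in place of the KOSPI 200 series.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1500)))   # synthetic index
returns = np.diff(np.log(prices))

lags = 5
X = np.column_stack([returns[i:len(returns) - lags + i] for i in range(lags)])
y = (returns[lags:] > 0).astype(int)           # 1 = next-day rise, 0 = fall

X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False, test_size=0.2)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
print("directional accuracy:", round(accuracy_score(y_te, clf.predict(X_te)), 3))
```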

Accuracy of Medicare Claim-based Algorithm to Detect Breast, Prostate, or Lung Cancer Bone Metastases.

We had previously developed an algorithm for Medicare claims data to detect bone metastases associated with breast, prostate, or lung cancer. This study was conducted to examine whether this algorithm accurately documents bone metastases on the basis of diagnosis codes in Medicare claims data.

Validation of Molecular Pathology Codes for the Identification of Mutational Testing in Lung and Colon Cancer.

Targeted therapy for patients with lung and colon cancer based on tumor molecular profiles is an important cancer treatment strategy, but the impact of gene mutation tests on cancer treatment and outcomes in large populations is not clear. In this study, we assessed the accuracy of an algorithm to identify tumor mutation testing in administrative claims data during a period before test-specific Current Procedural Terminology codes were available.

Tree-based Claims Algorithm for Measuring Pretreatment Quality of Care in Medicare Disabled Hepatitis C Patients.

To help broaden the use of machine-learning approaches in health services research, we provide an easy-to-follow framework on the implementation of random forests and apply it to identify quality of care (QC) patterns correlated with treatment receipt among Medicare disabled patients with hepatitis C virus (HCV).

Detecting Lung and Colorectal Cancer Recurrence Using Structured Clinical/Administrative Data to Enable Outcomes Research and Population Health Management.

Recurrent cancer is common, costly, and lethal, yet we know little about it in community-based populations. Electronic health records and tumor registries contain vast amounts of data regarding community-based patients, but usually lack recurrence status. Existing algorithms that use structured data to detect recurrence have limitations.

An Electronic Health Record-based Algorithm to Ascertain the Date of Second Breast Cancer Events.

Studies of cancer recurrences and second primary tumors require information on outcome dates. Little is known about how well electronic health record-based algorithms can identify dates or how errors in dates can bias analyses.

Design of the smart home system based on the optimal routing algorithm and ZigBee network.

To improve the traditional smart home system, we study its electric wiring, networking technology, information transmission, and facility control. First, ZigBee is used to replace traditional electric wiring. Second, a network is built to connect many wireless sensors and facilities, thanks to the self-organizing capability of the ZigBee network and the Genetic Algorithm-Particle Swarm Optimization Algorithm (GA-PSOA), which searches for the optimal route. Finally, when the smart home system is connected to the internet through remote server technology, the home environment and facilities can be monitored and controlled remotely in real time. The experiments show that GA-PSOA reduces the system delay and decreases the energy consumption of the wireless system.
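For reference, the sketch below shows a plain particle swarm optimization loop minimizing a generic cost function; the genetic-algorithm hybridization (GA-PSOA) and the ZigBee-specific routing cost model from the paper are not reproduced, and `route_cost` is purely illustrative.

```python
# A minimal sketch of a plain PSO loop minimizing a generic cost function;
# the GA hybridization (GA-PSOA) and the ZigBee-specific cost model from the
# paper are not reproduced, and `route_cost` is illustrative.
import numpy as np

def route_cost(x):
    """Placeholder cost: distance from an arbitrary optimum (illustrative)."""
    return np.sum((x - 0.3) ** 2)

def pso(cost, dim=4, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.random((n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([cost(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        vals = np.array([cost(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

best, best_cost = pso(route_cost)
print("best solution:", best.round(3), "cost:", round(best_cost, 6))
```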

Economic evaluation of the one-hour rule-out and rule-in algorithm for acute myocardial infarction using the high-sensitivity cardiac troponin T assay in the emergency department.

The 1-hour (h) algorithm triages patients presenting to the emergency department (ED) with suspected acute myocardial infarction (AMI) towards "rule-out," "rule-in," or "observation," depending on baseline and 1-h levels of high-sensitivity cardiac troponin (hs-cTn). The economic consequences of applying the accelerated 1-h algorithm are unknown.

A community detection algorithm using network topologies and rule-based hierarchical arc-merging strategies.

The authors use four criteria to examine a novel community detection algorithm: (a) effectiveness, in terms of producing high values of normalized mutual information (NMI) and modularity, using well-known social networks for testing; (b) the ability to mitigate resolution limit problems, examined using NMI values and synthetic networks; (c) correctness, meaning the ability to identify useful community structure, in terms of NMI values on Lancichinetti-Fortunato-Radicchi (LFR) benchmark networks; and (d) scalability, or the ability to produce comparable modularity values with fast execution times when working with large-scale real-world networks. In addition to describing a simple hierarchical arc-merging (HAM) algorithm that uses network topology information, we introduce rule-based arc-merging strategies for identifying community structures. Five well-studied social network datasets and eight sets of LFR benchmark networks were employed to validate correctness against ground-truth communities, eight large-scale real-world complex networks were used to measure its efficiency, and two synthetic networks were used to determine its susceptibility to two resolution limit problems. Our experimental results indicate that the proposed HAM algorithm exhibited satisfactory performance and efficiency, and that HAM-identified and ground-truth communities were comparable in terms of social and LFR benchmark networks, while mitigating resolution limit problems.
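The two quality measures named in criterion (a), NMI against ground-truth labels and modularity of a partition, can be computed with standard libraries; a minimal sketch on the Zachary karate-club graph follows, using networkx's greedy modularity method as a stand-in for the paper's HAM algorithm.

```python
# A minimal sketch of the two evaluation criteria named above: NMI against
# ground-truth labels and modularity of a partition, shown on the karate
# club graph with a greedy stand-in algorithm (not the paper's HAM method).
import networkx as nx
from networkx.algorithms import community
from sklearn.metrics import normalized_mutual_info_score

G = nx.karate_club_graph()
truth = [G.nodes[n]["club"] for n in G.nodes]           # ground-truth labels

detected = community.greedy_modularity_communities(G)   # stand-in algorithm
labels = {n: i for i, c in enumerate(detected) for n in c}
pred = [labels[n] for n in G.nodes]

print("modularity:", round(community.modularity(G, detected), 3))
print("NMI vs. ground truth:", round(normalized_mutual_info_score(truth, pred), 3))
```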

Obituary: Toshio Fujita, QSAR pioneer.

This is the obituary for Toshio Fujita, pioneer of the quantitative structure-activity relationship (QSAR) paradigm.

An incremental anomaly detection model for virtual machines.

The Self-Organizing Map (SOM) algorithm, as an unsupervised learning method, has been applied in anomaly detection due to its capabilities of self-organization and automatic anomaly prediction. However, because the algorithm is initialized randomly, it takes a long time to train a detection model. Besides, cloud platforms with large numbers of virtual machines are prone to performance anomalies due to their highly dynamic and resource-sharing character, which leaves the algorithm with low accuracy and low scalability. To address these problems, an Improved Incremental Self-Organizing Map (IISOM) model is proposed for anomaly detection of virtual machines. In this model, a heuristic-based initialization algorithm and a Weighted Euclidean Distance (WED) algorithm are introduced into SOM to speed up the training process and improve model quality. Meanwhile, a neighborhood-based searching algorithm is presented to accelerate detection by taking into account the large scale and highly dynamic features of virtual machines on a cloud platform. To demonstrate the effectiveness, experiments on the common KDD Cup benchmark dataset and a real dataset have been performed. Results suggest that IISOM has advantages in accuracy and convergence velocity of anomaly detection for virtual machines on a cloud platform.
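As background for the SOM-based approach above, here is a minimal sketch of a plain SOM training loop with a weighted Euclidean distance (the WED idea), assuming NumPy only; the heuristic initialization and neighborhood-based search specific to IISOM are not reproduced.

```python
# A minimal sketch of a plain SOM training loop with a weighted Euclidean
# distance (the WED idea); the heuristic initialization and neighborhood-based
# search of IISOM are not reproduced here.
import numpy as np

def train_som(data, grid=(8, 8), epochs=10, lr0=0.5, sigma0=2.0,
              feature_weights=None, seed=0):
    rng = np.random.default_rng(seed)
    n_units, dim = grid[0] * grid[1], data.shape[1]
    w = rng.random((n_units, dim))
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])])
    fw = np.ones(dim) if feature_weights is None else np.asarray(feature_weights)
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 1e-3
            d = np.sum(fw * (w - x) ** 2, axis=1)        # weighted Euclidean
            bmu = d.argmin()                             # best-matching unit
            grid_d = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_d / (2 * sigma ** 2))       # neighborhood kernel
            w += lr * h[:, None] * (x - w)
            step += 1
    return w

# Anomaly score: distance of a sample to its best-matching unit.
data = np.random.default_rng(1).random((500, 4))
som = train_som(data)
score = np.min(np.sum((som - data[0]) ** 2, axis=1))
print("anomaly score for first sample:", round(float(score), 4))
```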

Network analysis for count data with excess zeros.

Undirected graphical models, or Markov random fields, have been a popular class of models for representing conditional dependence relationships between nodes. In particular, Markov networks help us to understand complex interactions between genes in the biological processes of a cell. Local Poisson models seem promising for modeling positive as well as negative dependencies in count data. Furthermore, when zero counts are more frequent than expected, excess zeros should be accounted for in the model.
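To make the local Poisson idea concrete, the sketch below runs a neighborhood-selection loop: each node's counts are regressed on all other nodes with a Poisson GLM, and coefficients with small p-values suggest candidate edges. It uses synthetic counts; handling the excess zeros the abstract emphasizes would mean swapping the plain GLM for a zero-inflated Poisson model, which is not shown here.

```python
# A minimal sketch of the neighbourhood-selection idea behind local Poisson
# graphical models: regress each node's counts on all other nodes and read
# candidate edges from the coefficients. The zero-inflated extension the
# paper emphasizes would swap the plain Poisson GLM for a zero-inflated one.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_samples, n_nodes = 300, 5
counts = rng.poisson(2.0, size=(n_samples, n_nodes))
counts[:, 1] += counts[:, 0]                     # induce a dependency 0 <-> 1

edges = []
for j in range(n_nodes):
    y = counts[:, j]
    X = sm.add_constant(np.delete(counts, j, axis=1))
    fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
    others = [k for k in range(n_nodes) if k != j]
    for coef, pval, k in zip(fit.params[1:], fit.pvalues[1:], others):
        if pval < 0.01:
            edges.append((j, k, round(float(coef), 3)))

print("candidate edges (node, neighbour, coefficient):", edges)
```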

Castration-Resistant Prostate Cancer: An Algorithmic Approach.

Since 2010, 5 new agents have been approved for the treatment of advanced prostate cancer. The American Urological Association (AUA) published guidelines for the management of castration-resistant prostate cancer in 2013. These guidelines identify 6 index patients to consider when selecting the most appropriate treatment. No comparative trials have provided an approach to optimize the sequencing of these drugs. For the urologist, incorporating the guidelines into clinical practice typically requires a multidisciplinary team. This article provides an algorithmic approach based on indication and mechanism of action that complements the AUA guidelines to ensure patients receive optimal care.

Machine-based classification of ADHD and nonADHD participants using time/frequency features of event-related neuroelectric activity.

Attention-deficit/hyperactivity disorder (ADHD) is the most frequent diagnosis among children who are referred to psychiatry departments. Although ADHD was first described at the beginning of the 20th century, its diagnosis still faces many problems.

Fuzzy-based propagation of prior knowledge to improve large-scale image analysis pipelines.

Many automatically analyzable scientific questions are well-posed and a variety of information about expected outcomes is available a priori. Although often neglected, this prior knowledge can be systematically exploited to make automated analysis operations sensitive to a desired phenomenon or to evaluate extracted content with respect to this prior knowledge. For instance, the performance of processing operators can be greatly enhanced by a more focused detection strategy and by direct information about the ambiguity inherent in the extracted data. We present a new concept that increases the result quality awareness of image analysis operators by estimating and distributing the degree of uncertainty involved in their output based on prior knowledge. This allows the use of simple processing operators that are suitable for analyzing large-scale spatiotemporal (3D+t) microscopy images without compromising result quality. On the foundation of fuzzy set theory, we transform available prior knowledge into a mathematical representation and extensively use it to enhance the result quality of various processing operators. These concepts are illustrated on a typical bioimage analysis pipeline comprised of seed point detection, segmentation, multiview fusion and tracking. The functionality of the proposed approach is further validated on a comprehensive simulated 3D+t benchmark data set that mimics embryonic development and on large-scale light-sheet microscopy data of a zebrafish embryo. The general concept introduced in this contribution represents a new approach to efficiently exploit prior knowledge to improve the result quality of image analysis pipelines. The generality of the concept makes it applicable to practically any field with processing strategies that are arranged as linear pipelines. The automated analysis of terabyte-scale microscopy data will especially benefit from sophisticated and efficient algorithms that enable a quantitative and fast readout.

Selection and classification of gene expression in autism disorder: Use of a combination of statistical filters and a GBPSO-SVM algorithm.

In this work, gene expression in autism spectrum disorder (ASD) is analyzed with the goal of selecting the most relevant genes and performing classification. This was achieved by combining various statistical filters with a wrapper-based geometric binary particle swarm optimization-support vector machine (GBPSO-SVM) algorithm. The use of different filters was augmented by a mean and median ratio criterion to remove highly similar genes. The results showed that the most discriminative genes identified in the first and last selection steps included a repetitive gene (CAPS2), which was assigned as the gene most highly related to ASD risk. The merged gene subset selected by the GBPSO-SVM algorithm was able to enhance the classification accuracy.
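The sketch below illustrates the wrapper idea: a binary PSO searches over feature masks, scoring each mask by cross-validated SVM accuracy. It uses synthetic data; the geometric variant (GBPSO) and the statistical pre-filters described in the paper are not reproduced.

```python
# A minimal sketch of binary PSO feature selection wrapped around an SVM,
# using synthetic data; the geometric variant (GBPSO) and the statistical
# pre-filters described in the paper are not reproduced here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=40, n_informative=6,
                           random_state=0)
rng = np.random.default_rng(0)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="linear"), X[:, mask.astype(bool)], y,
                           cv=3).mean()

n_particles, dim, iters = 15, X.shape[1], 30
pos = (rng.random((n_particles, dim)) < 0.5).astype(float)
vel = np.zeros((n_particles, dim))
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    prob = 1.0 / (1.0 + np.exp(-vel))            # sigmoid transfer function
    pos = (rng.random((n_particles, dim)) < prob).astype(float)
    fits = np.array([fitness(p) for p in pos])
    improved = fits > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fits[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("selected features:", int(gbest.sum()),
      "cv accuracy:", round(pbest_fit.max(), 3))
```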

Multi-categorical deep learning neural network to classify retinal images: A pilot study employing small database.

Deep learning is emerging as a powerful tool for analyzing medical images, and computer-aided detection of retinal disease from fundus images has emerged as a new approach. We applied a deep learning convolutional neural network, using MatConvNet, for automated detection of multiple retinal diseases with fundus photographs from the STructured Analysis of the REtina (STARE) database. The dataset was built by expanding data on 10 categories, including normal retina and nine retinal diseases. The optimal outcomes were acquired by using random forest transfer learning based on the VGG-19 architecture. The classification results depended greatly on the number of categories: as the number of categories increased, the performance of the deep learning models diminished. When all 10 categories were included, we obtained an accuracy of 30.5%, relative classifier information (RCI) of 0.052, and Cohen's kappa of 0.224. When only three categories were considered (normal, background diabetic retinopathy, and dry age-related macular degeneration), the multi-categorical classifier showed an accuracy of 72.8%, RCI of 0.283, and kappa of 0.577. In addition, several ensemble classifiers enhanced the multi-categorical classification performance; transfer learning combined with an ensemble classifier using a clustering-and-voting approach gave the best performance, with an accuracy of 36.7%, RCI of 0.053, and kappa of 0.225 on the 10-category problem. First, because of the small dataset, the deep learning techniques in this study were not effective enough to be applied in clinics, where numerous patients with various types of retinal disorders present for diagnosis and treatment. Second, we found that transfer learning combined with ensemble classifiers can improve classification performance for detecting multi-categorical retinal diseases. Further studies should confirm the effectiveness of the algorithms with large datasets obtained from hospitals.
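The transfer-learning idea above (a pretrained VGG-19 backbone feeding a separate classifier) can be sketched as follows, assuming `tensorflow.keras`, a random forest in place of the paper's ensemble, and placeholder arrays with three illustrative classes standing in for the ten-category STARE data.

```python
# A minimal sketch of the transfer-learning idea described above: frozen
# VGG-19 convolutional features feeding a separate classifier (a random
# forest here), with placeholder arrays standing in for the STARE images
# and only three illustrative classes instead of the study's ten.
import numpy as np
from tensorflow.keras.applications import VGG19
from tensorflow.keras.applications.vgg19 import preprocess_input
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(60, 224, 224, 3)).astype("float32")
labels = rng.integers(0, 3, size=60)

backbone = VGG19(weights="imagenet", include_top=False, pooling="avg")
features = backbone.predict(preprocess_input(images), verbose=0)  # 512-d per image

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("cv accuracy:", cross_val_score(clf, features, labels, cv=3).mean().round(3))
```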

Identification of RNA-binding domains of RNA-binding proteins in cultured cells on a system-wide scale with RBDmap.

This protocol is an extension to: Nat. Protoc. 8, 491-500 (2013); doi:10.1038/nprot.2013.020; published online 14 February 2013. RBDmap is a method for identifying, in a proteome-wide manner, the regions of RNA-binding proteins (RBPs) engaged in native interactions with RNA. In brief, cells are irradiated with UV light to induce protein-RNA cross-links. Following stringent denaturing washes, the resulting covalently linked protein-RNA complexes are purified with oligo(dT) magnetic beads. After elution, RBPs are subjected to partial proteolysis, in which the protein regions still bound to the RNA and those released to the supernatant are separated by a second oligo(dT) selection. After sample preparation and mass-spectrometric analysis, peptide intensity ratios between the RNA-bound and released fractions are used to determine the RNA-binding regions. As a Protocol Extension, this article describes an adaptation of an existing Protocol and offers additional applications. The earlier protocol (for the RNA interactome capture method) describes how to identify the active RBPs in cultured cells, whereas this Protocol Extension also enables the identification of the RNA-binding domains of RBPs. The experimental workflow takes 1 week plus 2 additional weeks for proteomics and data analysis. Notably, RBDmap presents numerous advantages over classic methods for determining RNA-binding domains: it produces proteome-wide, high-resolution maps of the protein regions contacting the RNA in a physiological context and can be adapted to different biological systems and conditions. Because RBDmap relies on the isolation of polyadenylated RNA via oligo(dT), it will not provide RNA-binding information on proteins interacting exclusively with nonpolyadenylated transcripts. Applied to HeLa cells, RBDmap uncovered 1,174 RNA-binding sites in 529 proteins, many of which were previously unknown.