
Department of Surgery Clinical Research


Cancer Database Design

The rapid progress of the information age has created a crisis on the medical frontier. New medical technologies are generating data at an exponentially increasing rate, but the tools needed to harness that information are lacking, so researchers cannot take full advantage of what is embedded in these growing datasets. Generating data is not the same as generating knowledge: knowledge comes from integrating data into an understandable and usable form. Databases are most often created prospectively, based on a priori assumptions, so the tools currently available for knowledge discovery are limited by this hypothesis-based approach. Medical researchers critically need tools designed specifically to turn this mass of information into relevant knowledge.

Information-science technologies for organizing, processing, and evaluating data are growing and improving dramatically. Assembling these tools and applying them to large datasets is the only way to exploit the flood of medical data meaningfully. The rapid growth of disease-specific data warehouses, full of diverse data types, requires the development and implementation of new approaches to knowledge discovery.

Our proposed OnLine Analytic Mining (OLAM) technology will expand the utility of the NCDB by enabling advanced analysis and overcoming limitations in its current analysis capabilities. For example, OnLine Analytic Processing (OLAP) lets users summarize data at many different hierarchical levels and view data across many factors in a single execution. Data mining can uncover hidden relationships and association patterns among factors, including relationships that researchers had not thought to explore. Most importantly, this technology will assist researchers with hypothesis generation: data mining can produce association rules of the form IF X THEN Y, which can help researchers formulate hypotheses they had not previously considered.
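
To make the IF X THEN Y form concrete, the following minimal Python sketch scores single-factor rules by support (how often X and Y occur together) and confidence (how often Y occurs when X does). It is an illustration only, not the OLAM implementation: the records, factor names, and thresholds are invented, and a real system would use a scalable algorithm rather than this exhaustive search.

```python
from itertools import permutations

# Hypothetical toy records (not NCDB data): each is a set of factor=value items.
records = [
    {"factor_a=high", "factor_b=yes", "factor_c=yes"},
    {"factor_a=high", "factor_b=yes"},
    {"factor_a=low", "factor_b=no"},
    {"factor_a=high", "factor_b=no", "factor_c=yes"},
    {"factor_a=low", "factor_b=no", "factor_c=yes"},
]

def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    return sum(itemset <= r for r in records) / len(records)

def find_rules(records, min_support=0.4, min_confidence=0.6):
    """Return (x, y, support, confidence) for each rule IF x THEN y
    whose support and confidence meet the given thresholds."""
    items = sorted(set().union(*records))
    found = []
    for x, y in permutations(items, 2):
        s_x = support({x}, records)
        s_xy = support({x, y}, records)
        if s_x > 0 and s_xy >= min_support and s_xy / s_x >= min_confidence:
            found.append((x, y, s_xy, s_xy / s_x))
    return found

if __name__ == "__main__":
    for x, y, s, c in find_rules(records):
        print(f"IF {x} THEN {y}  (support={s:.2f}, confidence={c:.2f})")
```

A rule with high confidence is a candidate hypothesis, not a finding; it still has to be tested in the usual way.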



Data Mining and Text Extraction

The medical community is constantly seeking new ways to conduct research in the battle against disease. One avenue frequently explored is chart review. This kind of study is often fruitful, but it requires great attention to detail and is extremely time-consuming. As a result, studies based on chart review are often limited to a small number of cases. A systematic means of examining patient charts would allow clinicians to examine a significantly larger set of cases.

Considering more records at once makes it possible to detect small variations, which may pinpoint important factors that were previously overlooked. Information scientists have the tools and capability to widen the research lens in this way. We are developing information extraction and mining techniques that accurately identify information of interest in transcribed consultation notes.

Clinical medical records contain a wealth of information, largely in free-text form, so extracting structured information from free-text records is an important research endeavor. We are evaluating a system that extracts three types of information from semi-structured patient records: numeric values, medical terms, and categorical values. A different approach is used to solve the problems posed by each of the three data types, and very good performance, measured by precision and recall, is achieved.
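
Precision and recall, the two measures named above, can be computed by comparing what the system extracted with a manually annotated reference set. The Python sketch below assumes each extraction is represented as a (field, value) pair; that representation and the sample values are hypothetical, chosen only to illustrate the calculation.

```python
def precision_recall(extracted, gold):
    """Compare extracted items with a gold-standard annotation.

    precision = correct extractions / all extractions
    recall    = correct extractions / all gold-standard items
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

if __name__ == "__main__":
    # Hypothetical (field, value) pairs for one record.
    system_output = {("weight", "70"), ("pulse", "72"), ("age", "45")}
    annotation = {("weight", "70"), ("pulse", "72"), ("age", "54"), ("height", "170")}
    p, r = precision_recall(system_output, annotation)
    print(f"precision={p:.2f} recall={r:.2f}")
```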

A novel link-grammar-based approach was developed to associate each feature with its number within a sentence, and it achieved extremely high accuracy. A simple but efficient approach, using part-of-speech (POS) patterns and a domain ontology, was adopted to extract medical terms of interest. Finally, an NLP-based feature extraction method coupled with an ID3-based decision tree is used to classify and extract categorical values. This preliminary approach to categorical fields has, so far, proven quite effective.
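
For readers unfamiliar with ID3, the core step is choosing, at each node of the tree, the attribute with the highest information gain. The Python sketch below shows that step only. The feature names ("negation", "section") and labels are hypothetical stand-ins for whatever features the NLP stage produces; they are not taken from the system described here.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in label entropy obtained by splitting on attr."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attrs):
    """ID3 split criterion: the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))

if __name__ == "__main__":
    # Hypothetical features extracted from four sentences.
    rows = [
        {"negation": "no", "section": "history"},
        {"negation": "no", "section": "exam"},
        {"negation": "yes", "section": "history"},
        {"negation": "yes", "section": "exam"},
    ]
    labels = ["positive", "positive", "negative", "negative"]
    print(best_attribute(rows, labels, ["negation", "section"]))
```

A full ID3 tree applies this selection recursively to each resulting subset until the labels in a subset are all the same or no attributes remain.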

 
