DataNet Federation Consortium
CCI Project Lead: William Regli
Project Leaders: Reagan W. Moore, Arcot Rajasekar, John Orcutt, David Maidment, William Regli.
Senior Personnel: Jose-Marie Griffiths, Alan Blatecky, Andrea Chiba, Ken Galluppi, Ryan Boyles, Peter Robinson, Michael Wan, Richard Marciano, Helen Tibbo, Cal Lee, Jonathan Crabtree, Kenneth Bollen, Julian Lombardi, Chitta Baral, Sudha Ram, Thomas Palmeri, Paul Sheldon, Art Pasquinelli, Jorge Lobo.
Lead Institution: University of North Carolina at Chapel Hill.
Supported by: University of North Carolina at Chapel Hill; University of Texas, Austin; Drexel University; University of California, San Diego; Duke University; Arizona State University; University of Arizona; Vanderbilt University; North Carolina State University; Sun Microsystems; IBM.
About the Project
The DataNet Federation Consortium (DFC) will build a policy-driven national data management infrastructure that addresses both the science and engineering data life cycle and the sustainability of data collections and repositories. The motivation for building the DFC comes from the data management requirements of the NSF Science of Learning Centers (EEG / MRI sensor data, video), the NSF Ocean Observatories Initiative (real-time data streams, simulation output, video), the NSF Consortium of Universities for the Advancement of Hydrologic Science (point data), the iPlant Collaborative (genome databases), the Odum Institute for Research in Social Science (statistics), and engineering projects in hydrology and CAD/CAM/CAE archives.
Our approach uses the integrated Rule-Oriented Data System (iRODS) to characterize management policies as rules that control the execution of remote procedures. The DFC federates six principal communities of practice:
- Science and engineering projects that drive requirements for construction of data collections
- Institutions that promote long-term sustainability through re-purposing of data for educational use as reference collections
- Data management technology providers that implement the services needed to enable advances in science
- Storage facility providers
- Data management communities that establish policies and data interchange standards
- Education initiatives that promote student access to active data collections
Each community of practice is itself a federation across academic institutions, regional consortia, state institutions, federal institutions, and international collaborations. We view federation as a socialization process that develops consensus on standards for data management policies. By characterizing each change in the data life cycle as an evolution of management policies, it is possible to build generic data management infrastructure that sustains all of the communities of practice. We are already building the data life cycle management infrastructure by integrating iRODS with workflow systems, digital libraries, preservation environments, social networking tools, and education tools. We will promote sustainability by enabling multiple communities to use and “own” a data collection. We are collaborating with federal agencies, vendors, and international projects on the development and application of reference implementations of generic data management infrastructure for use in data sharing, data analysis, data publication, and data preservation.
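The idea of policies expressed as rules, with life cycle stages differing only in the policy set applied, can be illustrated with a minimal Python sketch. This is not the iRODS rule language or any iRODS API; the names PolicySet, apply_policies, add_checksum, and replicate are hypothetical and chosen only for illustration, and the procedures run locally rather than at a remote storage location.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A "procedure" is any callable that acts on a record (hypothetical stand-in
# for an operation that a data grid would execute at the storage location).
Procedure = Callable[[dict], None]


@dataclass
class PolicySet:
    """Rules for one life cycle stage, keyed by the event that triggers them."""
    name: str
    rules: Dict[str, List[Procedure]] = field(default_factory=dict)

    def on(self, event: str, procedure: Procedure) -> "PolicySet":
        # Register a procedure to run whenever the named event occurs.
        self.rules.setdefault(event, []).append(procedure)
        return self


def apply_policies(policy: PolicySet, event: str, record: dict) -> dict:
    # The same generic engine serves every stage; only the rules differ.
    for procedure in policy.rules.get(event, []):
        procedure(record)
    return record


def add_checksum(record: dict) -> None:
    record["checksum"] = hashlib.sha256(record["data"]).hexdigest()


def replicate(record: dict) -> None:
    # Illustrative only: count replicas instead of copying data.
    record["replicas"] = record.get("replicas", 1) + 1


# A project collection only verifies integrity on ingest.
project = PolicySet("project collection").on("ingest", add_checksum)

# A preservation archive evolves the policy by also requiring a replica.
archive = (
    PolicySet("preservation archive")
    .on("ingest", add_checksum)
    .on("ingest", replicate)
)

project_record = apply_policies(project, "ingest", {"data": b"sensor reading"})
archive_record = apply_policies(archive, "ingest", {"data": b"sensor reading"})
```

In this sketch, moving a collection from the project stage to the preservation stage changes which rules fire on ingest, while the engine that evaluates them stays the same.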
We will integrate research and engineering collections into education classes through appropriate analysis, governance, and publication policies, and we will give workshops and summer school sessions on application of the technology. The ability to involve students in research on “live” data, reinforced with research results from the Science of Learning Centers, can revolutionize interest in science. The ability to federate across existing collections will make it possible to build collections that span institutions, regions, and agencies, enabling participation by local projects in national initiatives. The ability to compare archived results with real-time observations will drive new modes of research that dynamically control remote sensors. The integration across multiple communities of practice ensures representation from all stakeholders in the data life cycle. The DFC will implement a preservation environment to facilitate and encourage re-use of data for both research and education.