Integrity Constraints for XML and Beyond

IIS-0093168


Principal Investigator

Wenfei Fan
Internet Management Research Department
Bell Laboratories
600 Mountain Ave
Murray Hill, NJ 07974-0636
(908) 582-0424
(908) 582-1239
wenfei@research.bell-labs.com
http://www.bell-labs.com/user/wenfei

Keywords

XML constraints, static analyses, constraint propagation, data integration

Project Summary

The project aims to develop XML specifications with integrity constraints, to advance the understanding of consistency and implication of XML constraints, and to explore applications of constraints in XML data integration, schema normalization, information preservation and query processing. Results from the project are expected to answer fundamental questions about integrity constraints for hierarchically structured data, including but not limited to XML. They will also provide methods, techniques and tools for semantic specifications, constraint propagation, data integration, XML storage and query optimization.

Publications and Products

Project Impact

The project has produced specification languages for XML integrity constraints, complexity results for static analyses of XML specifications, and algorithms for reasoning about XML constraints. The project has also generated prototype systems for publishing, integrating and disseminating XML data based on or making use of XML constraints; these prototypes are to be extended and used by Lucent Technologies. The project has been supporting one Ph.D. student at Temple University.

Goals, Objectives and Targeted Activities

  • Accomplishments over the past year:
    • Integrity constraint languages for XML. We have proposed specification languages for a variety of XML constraints. These constraint languages have proved useful in data archiving, provenance, integration, schema normalization and query optimization. In particular, our key constraint specification language has been adopted by XML Schema, the W3C standard for XML specifications.
    • Static analyses of XML constraints and specifications. We have developed inference systems, complexity results and algorithms for reasoning about XML constraints; these results settle the central technical questions associated with the consistency and implication analyses of XML constraints in the presence and absence of XML types (DTDs).
    • Application of XML constraints. We have proposed a method for normalizing and improving relational storage of XML data based on constraint propagation from XML to relations. The method and algorithms have proved effective in practice. We have developed a schema-directed framework for publishing and integrating relational data in XML, which yields the first systematic and effective approach to ensuring both DTD-conformance and constraint satisfaction in XML integration and publishing. We have also developed middleware for schema-directed XML to XML transformations.
  • Objectives for the next year:
    • Explore techniques for using XML constraints in incremental maintenance of XML views, information preserving XML transformations, and security specifications and enforcement for XML data.
    • Investigate constraint propagation from relations to XML, and develop techniques for normalizing XML specifications based on constraint propagation.
    • Release the PRATA prototype, middleware for schema-directed integration of XML data.

Area Background

XML has become the prime standard for data exchange on the Web, and is increasingly used to represent data currently residing in databases. With this comes the need for a full treatment of integrity constraints for XML such as key, foreign key, functional, inclusion and inverse constraints, which are commonly found in databases to convey an essential part of the semantics of the data. These constraints continue to be important for XML data management in semantic specification, query optimization, data integration and transformation. Furthermore, the study of XML constraints will also shed light on management and integration of scientific data, which, like XML, is hierarchically structured. To take advantage of XML constraints, a number of technical problems need to be resolved, since the constraint theory for traditional databases is no longer applicable in the XML setting. We have settled a number of theoretical questions in connection with XML constraints and specifications. We have also developed constraint-based algorithms and systems for improving relational storage of XML data and for conducting schema-directed XML integration, transformations and dissemination. The focus of the next stage is to explore practical techniques for using constraints in, e.g., XML security specifications and enforcement, information preserving transformations, and incremental maintenance of XML views.
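
To make the notion of an XML key concrete, the following is a minimal illustrative sketch, not the project's specification language or algorithms. It assumes a key is given as a context path, a target path, and a list of key fields: within each context node, no two target nodes may agree on all key fields. The function name `find_key_violations`, the sample document, and the use of child-element text as key fields are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: ISBNs identify books globally,
# chapter numbers identify chapters only within a single book.
DOC = """
<library>
  <book><isbn>1</isbn>
    <chapter><num>1</num></chapter>
    <chapter><num>2</num></chapter>
  </book>
  <book><isbn>2</isbn>
    <chapter><num>1</num></chapter>
    <chapter><num>1</num></chapter>
  </book>
</library>
"""

def find_key_violations(root, context_path, target_path, key_fields):
    """Return (context index, duplicated key value) pairs.

    For each node selected by context_path, the target nodes reached via
    target_path must have pairwise distinct values on key_fields.
    Simplification (assumption): a missing field is read as None rather
    than being rejected.
    """
    violations = []
    for index, context in enumerate(root.iterfind(context_path)):
        seen = set()
        for node in context.iterfind(target_path):
            value = tuple(node.findtext(field) for field in key_fields)
            if value in seen:
                violations.append((index, value))
            seen.add(value)
    return violations

root = ET.fromstring(DOC)

# Absolute key: the context is the root itself (path ".").
print(find_key_violations(root, ".", "book", ["isbn"]))       # []

# Relative key: chapter numbers must be unique within each book.
print(find_key_violations(root, "book", "chapter", ["num"]))  # [(1, ('1',))]
```

The second call reports the second book (context index 1), whose two chapters share the number 1. The check only validates one document against one key; the consistency and implication analyses discussed above are static problems over specifications and are not addressed by this sketch.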

Area References

Potential Related Projects

The project is related to most projects on XML integration, transformation, publishing, and query processing.

Project Websites

http://www.bell-labs.com/user/wenfei/nsf_xml/career.html
Project description, publications, and reports.