Integrity Constraints for XML and Beyond
IIS-0093168
Principal Investigator
Wenfei Fan
Internet Management Research Department
Bell Laboratories
600 Mountain Ave
Murray Hill, NJ 07974-0636
(908) 582-0424
(908) 582-1239
wenfei@research.bell-labs.com
http://www.bell-labs.com/user/wenfei
Keywords
XML constraints, static analyses, constraint propagation, data integration
Project Summary
The project aims to develop XML specifications with
integrity constraints, to advance the understanding of the consistency
and implication problems for XML constraints, and to explore applications
of constraints in XML data integration, schema normalization,
information preservation and query processing.
Results from the project are expected to answer fundamental questions
in connection with integrity constraints for hierarchically structured
data, including but not limited to XML.
They will also provide methods, techniques and tools to deal with
semantic specifications, constraint propagation, data integration,
XML storage and query optimization.
Publications and Products
- Publications.
- Michael Benedikt, Chee Yong Chan,
Wenfei Fan, Juliana Freire, and Rajeev Rastogi.
Capturing both Types and Constraints in Data Integration.
ACM SIGMOD Conference on Management of Data (SIGMOD), 2003.
- Michael Benedikt, Wenfei Fan, and Gabriel Kuper.
Structural Properties of XPath Fragments.
The 9th International Conference on Database Theory (ICDT), 2003.
- Susan Davidson, Wenfei Fan, Carmem Hara, and Jing Qin.
Propagating XML Constraints to Relations.
The 19th International Conference on Data Engineering (ICDE), 2003.
- Wenfei Fan and Jérôme Siméon.
Integrity Constraints for XML.
Journal of Computer and System Sciences (JCSS), 66(1):254-291,
February 2003.
- Peter Buneman, Wenfei Fan, and Scott Weinstein.
Interaction between Path and Type Constraints.
ACM Transactions on Computational Logic (TOCL), 4(4),
October 2003.
- Peter Buneman, Susan Davidson, Wenfei Fan, Carmem Hara, and Wang-Chiew Tan.
Reasoning about Keys for XML.
Information Systems, in press.
- Aoying Zhou, Qing Wang, et al. and Wenfei Fan.
TREX: DTD-Conforming XML to XML Transformations.
SIGMOD demo, 2003.
- Wenfei Fan and Leonid Libkin.
On XML Integrity Constraints in the Presence of DTDs.
Journal of the ACM (JACM), 49(3):368-406, 2002.
- Michael Benedikt, Chee Yong Chan, Wenfei Fan, Rajeev Rastogi, Shihui Zheng,
and Aoying Zhou.
DTD-Directed Publishing with Attribute Translation Grammars.
The 28th International Conference on Very Large Data Bases (VLDB),
2002.
- Chee Yong Chan, Wenfei Fan, Pascal Felber, Minos Garofalakis,
and Rajeev Rastogi.
Tree Pattern Aggregation for Scalable XML Data Dissemination.
The 28th International Conference on Very Large Data Bases (VLDB), 2002.
- Marcelo Arenas, Wenfei Fan, and Leonid Libkin.
What's Hard about XML Schema Constraints?
The International Conference on Database and Expert Systems
Applications (DEXA), 2002.
- Prototype systems:
- PRATA: middleware for DTD-directed publishing of
relational data in XML.
- Schema-directed XML integration system.
- DTD-conformant XML-XML transformation
system.
- Content-based dissemination system
for XML data.
- System for computing constraint propagation from
XML to relations.
Project Impact
The project has produced specification languages for XML integrity
constraints, complexity results for static analyses of XML specifications,
and algorithms for reasoning about XML constraints. It has also
produced prototype systems that rely on XML constraints
for publishing, integrating and disseminating XML data; these prototypes
are to be extended and used by Lucent Technologies. The project has
supported one Ph.D. student at Temple University.
Goals, Objectives and Targeted Activities
- Accomplishments over the past year:
- Integrity constraint languages for XML. We have proposed
specification languages for a variety of XML constraints. These
constraint languages have proved
useful in data archiving, provenance, integration, schema normalization
and query optimization.
In particular, our specification language for key constraints has been
adopted by XML Schema, the W3C standard for XML specifications.
- Static analyses of XML constraints and specifications.
We have developed inference systems, complexity results and
algorithms for reasoning about XML constraints; these results
settle the central technical questions associated with
the consistency and implication analyses of XML constraints, both in the
presence and in the absence of XML types (DTDs).
- Application of XML constraints.
We have proposed a method for normalizing and improving
relational storage of XML data based on constraint propagation
from XML to relations. The method and algorithms have proved
effective in practice.
We have developed a schema-directed framework for
publishing and integrating relational data in XML, which yields
the first systematic and effective approach to ensuring both
DTD-conformance and constraint satisfaction in XML integration
and publishing. We have also developed middleware for schema-directed
XML-to-XML transformations.
- Objectives for the next year:
- Explore techniques for using XML constraints in incremental
maintenance of XML views, information preserving XML transformations,
and security specifications and enforcement for XML data.
- Investigate constraint propagation from relations to XML,
and develop techniques for normalizing XML specifications based
on constraint propagation.
- Release the PRATA prototype, middleware for schema-directed
integration of XML data.
Area Background
XML has become the prime standard for data exchange on the Web, and is
increasingly used to represent data currently residing in databases.
With this comes the need for a full treatment of integrity constraints
for XML such as key, foreign key, functional, inclusion and inverse
constraints, which are commonly found in databases to convey an
essential part of the semantics of the data. These constraints continue to be
important for XML data management in semantic specification, query
optimization, data integration and transformation. Furthermore, the study
of XML constraints will also shed light on management and integration of
scientific data, which, like XML, is hierarchically structured.
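As a rough illustration of what key and foreign key constraints mean for XML data, the following minimal Python sketch checks both on a small document. It is not the project's specification language or any of its algorithms; the sample document, the element and attribute names (library, book, isbn, loan) and the two helper functions are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative only.
DOC = """
<library>
  <book isbn="111"><title>XML Basics</title></book>
  <book isbn="222"><title>Constraints</title></book>
  <loan book="111"/>
  <loan book="333"/>
</library>
"""


def key_violations(root, target_path, key_attr):
    """Return key values that are missing or repeated among the target elements."""
    seen, bad = set(), []
    for elem in root.findall(target_path):
        value = elem.get(key_attr)
        if value is None or value in seen:
            bad.append(value)
        seen.add(value)
    return bad


def foreign_key_violations(root, src_path, src_attr, dst_path, dst_attr):
    """Return source values that do not match any target element."""
    targets = {e.get(dst_attr) for e in root.findall(dst_path)}
    return [e.get(src_attr) for e in root.findall(src_path)
            if e.get(src_attr) not in targets]


root = ET.fromstring(DOC)
# Key: the isbn attribute identifies each book (no violations in the sample).
print(key_violations(root, "book", "isbn"))
# Foreign key: every loan must reference an existing book; "333" does not.
print(foreign_key_violations(root, "loan", "book", "book", "isbn"))
```

Checking a given document against constraints, as above, is the easy part. The static analyses studied in the project ask harder questions, such as whether any document at all can satisfy a set of constraints together with a DTD.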
To take advantage of XML constraints, a number of technical problems
need to be resolved, since the constraint theory for traditional databases
is no longer applicable in the XML setting. We have settled a number of
theoretical questions in connection with XML constraints and specifications.
We have also developed constraint-based algorithms and systems for
improving relational storage of XML data and for conducting
schema-directed XML integration, transformations and dissemination.
The focus of the next stage is to explore practical techniques for
using constraints in, e.g., XML security specifications and
enforcement, information preserving transformations, and
incremental maintenance of XML views.
Potential Related Projects
The project is related to most projects
on XML integration, transformations, publishing, and query processing.
Project Websites
http://www.bell-labs.com/user/wenfei/nsf_xml/career.html
Project description, publications, and reports.