Abstract This project aims at the definition of techniques and the development of tools to validate several aspects of a database schema. In this way, we want to prevent specification errors being materialized into execution errors or in other inconveniencies during operation time.
This is an increasingly important problem in database engineering, particularly since database schemas are becoming more complex. We expect three different areas where we could apply the results of our research: More precisely, the main goals of our research in each of these areas can be summarized as follows: Relational database schema validation The main concern of this task is related to check whether a logical database relational schema correctly describes the requirements posed by the different users of the database.
Correctness and adequacy of the data managed by Data Base Management Systems DBMSs is vital to the more general aspect of quality of the data and thus of its usage by different applications.
Although a logical schema may be syntactically correct, as validated by current commercial DBMSs, it may contain several flaws which allow determining that the database is not semantically well- defined. These semantic flaws have been extensively reported in the literature and include the definition of properties such as satisfiability which ensures that the database may contain at least one tuple , table and view liveliness to guarantee that tables and views may be non-empty or 1 Email: Those are just some examples that illustrate the kind of flaws that may appear during the definition of a database schema.
Unfortunately, to our knowledge, no relational DBMS provides any tool to perform checks like the ones we have just seen. Our goal in this area is to develop a tool that would allow validating a logical database relational schema along the lines we have just outlined. Validation of the authorization rules of a relational database In order to protect the data stored in a database, the organizations must establish mechanisms to protect the information that may not be revealed to certain kinds of users.
Those strategies are usually based on a set of authorization rules that indicate, for each user profile, the actions that it may perform over the system objects. So, if a user wants to perform a query over a certain object, the appropriate authorization rule determines whether he is allowed to do that or, as it happens in Oracle VPD-Virtual Private Database, it rewrites the query such that it only retrieves the authorized data.
In this context, we plan to build a tool that helps to validate the definition of the authorization rules in a relational DBMS. More precisely, this tool should provide a means to the Data Base Administrator DBA to reason about the correctness of these rules. The approach is expected to be similar to the one proposed for logical database schema validation and we envisage the DBA to be able to express desirable properties and also to specify database states that stress appropriate requirements over the defined privileges.
Validation of multidimensional schemas A multidimensional schema is never defined from scratch and in an isolated way since it is always endowed to an underlying schema not multidimensional which contains all the data to be analyzed. Then, a multidimensional schema can always be seen as a view materialized or not defined over the operational systems of the company.
From all the available data, only a certain part would be of interest in a certain moment. Validation in this context can be seen from two different perspectives, which give raise to the different goals we have in this area. First, we plan to develop a tool that allows simplifying the work required to define multidimensional views. The main difference we have with regards to the goal of the previous areas is that now we will already start from a valid schema upon which we want to define a correct multidimensional schema.
Second, we plan to use validation techniques to detect and handle possible conflicts in the global schema resulting from the integration of different multidimensional schemas. Some of the expected results have already been achieved while unsolved problems are being tackled at the present moment. For these reasons, we do not foresee any problem to have a successful TIN accomplishment of the goals at the end of the project.
In the following we summarize the main achievements for each of the areas stated before. Relational database schema validation There are three significant contributions in this area. First, we have proposed a set of techniques to deal with conjunctive query containment checking in the presence of null values [RFT06, FNTU07]. An extension of those results has been developed to deal with logical database schema validation for databases that admit null values [Rull06].
This is not surprising since query containment is in the core of the method engine we use to check desirable properties of database schemas. Second, we have recently defined a method for computing explanations for unlively queries in databases [RFTU07a].
This method is independent of the particular method used to determine liveliness of a given query and it provides three levels of search: This method is useful for our purposes since it allows providing explanations when a certain desirable property of a database schema does not hold. Finally, we have also developed a web service folre. Validation of the authorization rules of a relational database Work in this area has been concerned with the study of the approach provided by the Oracle Virtual Private Database VPD to define the authorization rules in this DBMS, to understand the kind of rewritings that it performs to ensure that a given query retrieves only the data granted by the authorization rules, and to investigate how our techniques for database schema validation could be applied in this context.
Research in this area has begun, as scheduled, at the beginning of the present year. Validation of multidimensional schemas The first goal we have achieved in this direction has been the identification of valid dimensions to perform the analysis of a given fact. A technique that works over relational databases has been proposed [RA06] and an implementation of this technique has also been developed [RA07c]. We have also proposed a preliminary version of a technique that works over ontologies [RA07a], which is currently being implemented.
We are currently working also on the proposal of a guide to define valid hierarchies along a single dimension. This is being done on relational schemas by means of reconciling different star and snowflake relational schemas. In addition to the previous areas, which are the only ones we initially expected to address, we have identified a new area where we could provide new results as far as validation is concerned: Validation of mappings between data schemas Mappings between schemas are key elements in several contexts such as data exchange, data integration, peer data management systems, etc.
In all these contexts, the process of designing a mapping requires the participation of a mapping designer that needs a way to validate the mapping being defined, i. However, to date very little work has directly focused on the effective validation of schema mappings.
We have defined a new approach for validating schema mappings that allows the mapping designer to ask questions about the accomplishment of certain desirable properties of these mappings a preliminary version of this approach can be found in [Rull07, RFTU07b]. Four properties of mappings are defined in this approach: Then, any particular method for liveliness checking may be used to check whether the properties hold for a certain mapping. Moreover, we are also applying our techniques to an area, mappings between data schemas, which was not initially envisaged.
Publications We have already achieved several publications in top-level international conferences in databases and data warehouses. The quality of these conferences is a clear indicator of the relevance and originality of the results obtained so far. We have also obtained already several unpublished results that make us be optimistic regarding the number and quality of new publications in the remaining of the project.
Technology transfer Since the problems tackled in the project have a clear practical application, one of our main concerns is the technology transfer of the results of our research. In this sense, the work on mappings between database schemas is being partially supported by Microsoft Research Cambridge, who is financing a PhD student in this area.
Moreover, we have had several contacts with important companies that could be interested on the prototype tools we have already developed. Unfortunately, none of these contacts has been productive yet but we are still devoting our efforts towards this direction. Cooperation with other research groups As one may see from the publications below, we have collaborated with two international research groups: Moreover, the following students have developed their master thesis in the context of our project: Participation in academic events The members of the group have been and are being involved in the following academic events: Multidimensional Design by Examples.
Krakov Poland , September, Generating Multidimensional Schemas from the Semantic Web. Demo accepted, pending publication. Computing Explanations for Unlively Queries in Databases. Validation of Mappings between Schemas. Validation of Mappings between Data Models. Microsoft Research Summer School,