| Architect: Security
Authorizing Access to Dynamic Spatial-Temporal Data
Authorizing access to individual data objects based on spatial and temporal references is a complex task. Read this case study to learn one feasible approach.
by Bart van Velden
Published April 2008Authorizing access to individual data objects in a collection is a challenge when both the collection and the user group are large and dynamic. It becomes even more complex when the authorization policies are based on the spatial and temporal references of the data objects. Using the real-world scenario of my employer, CycloMedia Technology, this article will present a solution to this issue in the combination of two components of Oracle Database Enterprise Edition: the Spatial Option and the Virtual Private Database feature.
CycloMedia specializes in the large-scale and systematic visualization of environments based on 360-degree panoramic photographs, or cycloramas. To create a cyclorama, large areas are photographed and entered in an online database. With each recording, the location, orientation, and time are registered, which makes versatile applications possible, such as 3-D measurements and modeling. (See example below.)
Figure 1 Example cyclorama with geospatial data
DCR7 is CycloMedia's latest model in a chain of recording systems developed in-house and represents a quantum leap in visual quality, metric precision, and high-speed visual recording. With DCR7, which can produce cycloramas with a 5-meter interval with a speed of 80 km/hour, CycloMedia intends to visualize a large percentage of Europe's public space. Due to these developments, the cyclorama collection is expected to grow rapidly.
The Authorization Challenge
The CycloMedia case presents a scenario in which the access to a dynamic collection of objects with spatial as well as temporal references has to be authorized to a large group of users. The authorization parameters are based on the spatial and temporal dimensions of the collection.
The traditional approach to this issue is the creation of static data sets or an authorization matrix to describe each single relation between a subject (a client or user) and each object in the collection. The construction of these data sets and matrixes is commonly based on the use of a specialized tool to evaluate the spatial relation between objects and the authorized areas.
Building and maintaining ad hoc data sets or authorization matrixes to support access control is not suitable when both the user community and the collection are large and dynamic. Furthermore, ad hoc data does not support flexible protection granularities and dynamic changes in the access control policies.
Several recently proposed solutions show various disadvantages. The main cause of those disadvantages is related to the architecture of the proposed solutions: the enforcement of the authorization is performed outside the database or after a query has been executed, thereby restricting the usage of the data. Even the GeoXACML architecture, which is currently proposed as a new standard by the Open Geospatial Consortium, is found to be problematic. It is based on standards for spatial data (GML, WMS) and authorization (XACML) and provides a solution for access control of unprotected Web Map Services without modifying the existing infrastructure. For this, it intercepts messages to a WMS, executes the retrieval task on the targeted WMS, passes the retrieved results to a policy decision point, and creates a result set based on the derived authorization decisions.
This prototype architecture, on which the concept is largely based, shows some very inefficient consequences: all data is selected from the original database, is then converted to GML, and is split and evaluated on a feature-by-feature basis against the authorization policies by an external tool. Spatial indexes in the original database thus cannot be used, and additionally the spatial comparison functions have to be implemented in a separate component. Moreover, the architecture is incapable of executing complex analysis tasks, because the data is first selected and afterward filtered against the authorization policies. Even a simple nearest-neighbor query is problematic: the nearest object selected in the first place might turn out to be inaccessible afterward.
Database-Level Authorization Evaluation
Because mainstream databases have implemented spatial datatypes and spatial functions (commonly based on the SFS [spatial types and functions] standard from the Open Geospatial Consortium), it seems feasible to enforce the authorization policies on the database level. The current SQL authorization mechanism is, however, restricted to the level of tables, views, and columns. One could think of creating views for each single user, but again this approach is not scalable with a large number of users and changing policies.
Because of the above issues, current information systems typically bypass the database access control facilities and embed access control in the application program used to access the database. This can be the end-user application or a middleware application. It is problematic to embed the access control in the end-user application, if one has no control over the application source or if multiple applications are used. Changes in the authorization policies also have to be applied to multiple control mechanisms, but even then one depends on application update policies for the changes to become effective. Moreover, the risk of users or hackers submitting arbitrary queries is present. A second option is to construct the query in a middleware application. This application should then edit queries executed by the end-user applications to incorporate the authorization policies. This can, however, become tricky in the case of complex retrieval tasks and changing authorization policies. Another option is to provide several default retrieval tasks as functions to the end-user application. Although this would probably simplify the process, it directly limits the options for the users.
Database-Level Authorization Enforcement
For the above reasons, fine-grained access control should ideally be both specified and enforced at the database level. Shariq Rizvi of U.C. Berkeley and others have proposed the Truman model, which is based on query modification on the database level to include authorization policies. Although others have previously discussed the concept, the Truman model generalizes the approach of query modification on the database level using a parameterized view framework. The idea behind the Truman model is to provide each user with a personal and restricted view of the complete database. To realize this, user queries are modified to make sure that the user does not get to see anything more than is allowed. The authorization policies, which are in fact logical expressions, are added to posed queries as predicates. The figure below visualizes this model.
Because the modification of the query is done transparently to the user, the user may not even notice the existence of the access control mechanism. The architecture is thus perceived by the end user as if no restrictions are applied to the data, and that he has access to all objects in the table. This view of the Truman model is visualized below:
The Truman model is sometimes called fine-grained access control (FGAC) or row-level security (RLS). (The model's name was inspired by the artificial world spun around the character of Truman Burbank in the 1998 movie The Truman Show.)
The benefits of this model include a single point of authorization enforcement, the option of having a dynamic collection, end-user query abilities, and efficient data processing (which is a major aspect with spatial data).
Oracle Virtual Private Database
Oracle Virtual Private Database (VPD) has several other names within the Oracle documentation, including FGAC and RLS. Regardless of the name, VPD security provides a whole new way to control access to the data. It is based on the idea of having a defined security policy function attached to a database table or view, which is executed each time data in the table or view is queried or altered. This function returns an additional piece of SQL—called a predicate—that is attached to the original SQL's WHERE clause, before the SQL is used. It thus matches the concept of the Truman model. The query modification is done in the query optimizer and is actually performed when the SQL is parsed and executed. When the SQL is executed, it is actually the modified SQL that is executed on behalf of the user. This means that the policy function controls which rows of data are returned. The process can be thought of as a system trigger that is executed when a table is accessed that has a policy defined. An important characteristic is the dynamic nature of a VPD.
To investigate the suitability of the Truman model for the projected scenario, we set up a test implementation on Oracle Database 10g Enterprise Edition with the Spatial option enabled. The VPD component, which is included only in the Enterprise Edition of Oracle Database, is used for the query modification. The collection was relatively easy to construct in the database; a single table with columns for the image_id, recording_location, and recording_datetime was sufficient. The image_id is designated as the primary key. Because the location and datetime are used in the authorization predicates and retrieval tasks, indexes are created on both. The recording_location is of type SDO_GEOMETRY. This is a spatial datatype that can hold various kinds of spatial geometries.
Next, the raw data was imported into the database. As the original data was in the Dutch National Grid format, a conversion was made to the World Geodetic System (WGS84). The collection finally contains nearly 10 million images recorded in the Netherlands in a time period of about 10 years.
Next, we designed a data model to include the contracts, clients, users, and accessible ranges of each contract, being of spatial and temporal type. A public view called condition_sets was created that combined data from those tables, thus providing all "accessible" combinations of spatial and temporal ranges. Finally, some test data was inserted.
To restrict the user to only his own data, we defined the following predicate for the condition_sets:
(WHERE) client_id = SYS_CONTEXT('THE_CTX','THE_CLIENT_ID')This predicate prevents the user from accessing data that is not authorized to him, by removing all rows with a client_id, other than the one which was set in the the_client_id value in the session context (the_ctx). This session context is created with a login trigger that determines to which client this user belongs. As VPD also provides a function to set the context using a function, the session between the application server and the database can be made persistent, and the login trigger can be removed.
Now this predicate has to be attached to the condition_sets table. Oracle provides the function dbms_rls.add_policy for this. However, it cannot directly add the predicate but needs a function that returns the predicate. Such a function is called a policy function. I thus created a function client_id_security in a package I named exp_security:
CREATE OR REPLACE PACKAGE exp_security AS FUNCTION client_id_security(owner VARCHAR2, objname VARCHAR2) RETURN VARCHAR2; END exp_security;The body of the function is as follows:
CREATE OR REPLACE PACKAGE BODY exp_security IS FUNCTION client_id_security(owner VARCHAR2, objname VARCHAR2) RETURN VARCHAR2 IS predicate VARCHAR2(2000); BEGIN predicate := 'CLIENT_ID = sys_context(''THE_CTX'',''THE_CLIENT_ID'')'; RETURN predicate; END client_id_security; END;Now that we have a policy function, the special function dbms_rls.add_policy can be executed. This function attaches the policy function to a defined table or view. When data from the table is selected, the policy function is executed and returns the predicate. This predicate is used to modify the query before it is executed. The first parameter of the dbms_rls.add_policy function defines the user who owns the table (or view), which is defined as the second parameter. The third parameter gives this new policy a name, which might be used later to remove or alter it. The fourth and fifth parameters define which policy has to be added and where it can be found. The final parameter defines that the policy should be used only when data is selected.
CALL dbms_rls.add_policy('BART', 'condition_sets', 'condition_sets_policy', 'BART', 'exp_security.client_id_security', 'SELECT');Now each time the view bart.condition_sets is queried, the predicate that restricts the rows to the current user is returned from the policy function bart. exp_security.client_id_security. This predicate will now be used for query modification as described in the Truman model. In the following paragraphs, this view will be used to authorize the actual data. In the evaluation section, you will see the results of query modification for a user posing queries.
Images_authorized and images_unauthorized table
The images table should be authorized in two ways, by two different policies. The first policy should exclude all rows that do not qualify to the authorized ranges in condition_sets. The second policy should exclude all rows of images that are authorized and also hide the imageid of all remaining. This way a user can see where and when images were made that he currently cannot view. Later we will see that this can be used as a marketing instrument.
Two options are available to realize this: the use of two public synonyms and the use of views. But because the Oracle documentation states that a column-level policy, which is necessary for hiding the ImageId, cannot be applied to a synonym, the only available option is the creation of two views for the images table. Because the removal of the image_ids for the unauthorized images can as well be done in the definition of the view, as opposed to a specific column-level policy, I chose this option.
The spatial evaluation should check for the recording_location to be inside the specified area (the geo column). Oracle Spatial provides the function SDO_INSIDE(geometry1, geometry2) for this. The first parameter specifies a geometry column in a table, and the second parameter specifies a geometry from a table or a transient instance of a geometry. This means that this function cannot be used in a query such as this:
SELECT * FROM images WHERE sdo_inside(recording_location, SELECT geo FROM condition_sets) = 'TRUE';Instead, this query should be rewritten as follows:
SELECT * FROM images, condition_sets WHERE sdo_inside(recording_location,geo) = 'TRUE';This requirement makes it impossible to define a policy function that adds an authorization predicate to a spatial data table with the default spatial function. One could construct an alternative function that performs a check for each image on all spatial areas, but this would require extra work and would likely result in decreased performance. Therefore, performing complex spatial evaluations in a predicate added by the Truman model seems problematic. So I had to choose another option: instead of defining the views images_authorized and images_unauthorized as copies of the original images table, both are defined as a cross-join of the images table and the conditions_sets view. This way, a predicate based on the default spatial function can be used. The SQL statement for the images_authorized should be as follows:
CREATE VIEW images_authorized AS SELECT * FROM images, condition_setsWhen the temporal predicates are added, the complete predicate for the images_authorized view that would be defined in a policy function is this:
(WHERE) recording_datetime >= start_date AND recording_datetime�<= end_date AND SDO_INSIDE(recording_location,geo) = 'TRUE'But this solution contains a hitch: in case an object from the collection meets the conditions for multiple condition sets, it undesirably appears multiple times in the images_authorized view. To fix this problem, the user has to use the distinct selector in each query, which is a very unsuitable way out. A similar problem exists for the fields coming from the condition_sets view, which are also unwanted. Another new view accessible to the user could fix this problem:
CREATE VIEW images_authorized_fixed AS SELECT DISTINCT imageid, recording_datetime, recording_location FROM images_authorized;As you can see, the predicates added to the images_authorized (and also the images_unauthorized) view do not include a reference to the session context, because this was already present in the predicate for the condition_sets view. Because of this, the spatial and temporal predicates can also be included in the actual definition of the view. The fields from the condition_sets view can also be excluded. Here is the resulting SQL statement:
CREATE VIEW images_authorized AS SELECT images.* FROM images, authorized_sets WHERE recording_datetime >= start_date AND recording_datetime <= end_date AND SDO_INSIDE(recording_location,geo) = 'TRUE';Under this approach, no spatial and temporal predicates have to be defined in policy functions. An analysis of both designs showed no difference in performance. This indicates that the manipulation of queries is not a heavy job for the query optimizer.
The above solution fits the projected purpose: the data is indeed authorized on the spatial and temporal dimensions. The Truman model is used to restrict the rows in the condition_sets view to those of the current user, and based on this personalized set of conditions, the actual data is authorized by joining tables. The new views that are created this way only contain the data that matches the personalized conditions from the condition_sets. The single query modification performed by the Truman model is thus the basis for the full authorization mechanism.
The final architecture provides each user with three tables. One table contains the authorized ranges, one contains the accessible data, and one reveals what data is not accessible. The last table thus contains all data from the collection minus the data of the second table. The last table only describes the information objects by their spatial and temporal attributes.
Each of the tables is actually a public view based on a set of physical tables, which belong to a single database administrator and to which normal users do not have direct access. Although each view exists only once in the database, the contents vary for each user.
To utilize the architecture, a special software component has been developed. With the tool retrieved, results can be presented in an understandable manner. It is based on use of the 3-D GIS application Google Earth, which is able to dynamically retrieve data from a Web server in KML (Keyhole Markup Language) format. For authentication, Basic HTTP Authentication is used. As expected, this shows up in Google Earth after activating the network link, which targets the Web server:
The Web server uses the provided credentials to connect to the database. After a successful connection has been established, queries can be executed. The results are converted into KML format and returned to the GIS application. The query that the server poses does not include any data authorization mechanism, because this task is the sole responsibility of the VPD mechanism inside the database. The SQL statement to select all authorized images in the current viewport (the window) is therefore simply as follows:
SELECT imageid, recordingdate, recordinglocation FROM bart.images_authorized WHERE SDO_FILTER(recordinglocation, ?window ) = 'TRUE'
Visualizing Authorization Decisions
Figure 5 below shows the GIS application for an authenticated user. This user is authorized to the green area on the left-hand side, combined with a broad time range. Each green marker represents an authorized object (a cyclorama) as a result of the query specified above. Cycloramas outside the authorized area are removed from the result set by the database itself. As expected, all green markers are found inside the authorization polygons (and between the two layers representing the temporal range). The unauthorized objects are added to visualize the effectiveness of the authorization mechanism. To retrieve these unauthorized objects, a query similar to the one above was executed using the view images_unauthorized.
Nearest-Neighbor Query: Information Leakage
Besides window query, the nearest image to a location is important in the CycloMedia case scenario. Oracle Spatial provides the sdo_nn function for this:
SELECT| imageid, recordingdate, recordinglocation |FROM| bart.images_authorized |WHERE| |SDO_NN|(recordinglocation, ?geometry, |'sdo_batch_size=10'|) = |'TRUE'| and ROWNUM <2;|After executing this query, the first results looked satisfying. A nearest image was found in most cases. However, I stumbled upon the fact that at some locations no result was found. Because no constraints (such as a maximum distance) were defined, this was unexpected. An investigation using some test data revealed that the nearest-neighbor query may, as described in the manual, "need to be evaluated multiple times in order to return the desired number of results that also satisfy other conditions in the WHERE clause." However, the WHERE clause in the above statement doesn't contain any other conditions. After executing some queries on some test data, I found out what had happened: no result is returned if an unauthorized object is found nearer than the nearest authorized image. The consequence is that information is leaked about where unauthorized objects are located. To fix this, the sdo_nn function also needs to be evaluated multiple times in case the referenced table is created using a join that filters the data, as is the case in the image_authorized table. As a user might be unaware of this, this finding is surprising and signifies a serious problem because data is leaked by the database.
I was able to create a workaround for this problem by using the within_distance function to create an authorized subset, order the results, and return the first row. But this does not fix the issue with the normal nearest-neighbor function.
A Complex Spatial-Temporal Task
In the CycloMedia scenario, users are often interested in the newest cycloramas only. The current approach to realize this is based on the recording strategy and the use of data sets: areas are photographed completely and put in a new data set, and those replace any existing data set. The downside of this approach is that when a new data set is incomplete, a gap is created. This is of course undesirable for a solution in situational awareness. Besides this issue, the management of the data sets requires a lot of work.
To accomplish this task in the new architecture, a proper query has to be formulated. This is a rather complex task, as it involves both spatial and temporal relations between objects. For each object, the spatial distance to other older objects has to be analyzed, and when such an object is found, it should be excluded from the result set. However, when the temporal distance is small, it should not be excluded, because both objects are relevant. To understand this, one should realize that the cycloramas are recorded in a continuous sequence. Because of this, two cycloramas with a small spatial distance are also likely to have a small temporal distance. A certain temporal threshold is thus important. In SQL this task was relatively easy to define, and the results were as expected.
The figure below illustrates the results of this query in Google Earth. The vertical dimension, normally the altitude, is used to visualize the temporal dimension in the collection. The highest objects are the newest. The green cylinder around each newest object represents the spatial-temporal range it covers. Objects inside such a cylinder (depicted with a yellow marker) can be removed from the result set, because they are no longer relevant.
Besides correct authorization decisions, the performance of the architecture is important. The testing platform on which the architecture was built is not ideal: the Enterprise DBMS, the middleware software, and the GIS application are running on the same machine. However, the overall performance was sufficient, because the test implementation was able to satisfy the requirement of using a direct manipulation interface. Some scalability testing also confirmed this. Performance is not a problem for the Truman model, even with complex spatial and temporal authorization conditions.
This article proposed and confirmed that by utilizing two newly introduced concepts for a database management system—namely the Truman model and SFS—dynamic collections can dynamically be authorized on the spatial and temporal dimensions by the DBMS itself, while providing support for the retrieval.
A main advantage of this architecture is that the authorization mechanism doesn't obstruct the usage of the data in any way. Another important benefit of the architecture is that it is able to handle a dynamic data collection. A third benefit is that the common range of tools used for spatial data visualization and management can still be used without modification.
By visualizing the results of several retrieval tasks in the GIS application, we could monitor the correctness of the authorization mechanism was monitored. This shows that in general both spatial and temporal authorizations are correctly enforced. However, using the default function to find the nearest neighbor, we detected some information leakage in the authorization mechanism. This might be considered a major problem for some applications and affects the general confidence in the architecture. Additional research on this topic is necessary.
In an article recently published by Microsoft Research, the Truman model (which is not implemented in Microsoft SQL Server) is discussed. It presents a novel approach to assigning predicate grants by the design of a strict generalization of the current SQL authorization mechanism. Here is an example granting query:
GRANT SELECT employees WHERE emp_id = user_id() TO PUBLICPredicates are thus included in a normal grant statement. This is opposed to the Oracle VPD implementation, which decouples the policy specification from the SQL grant model. In my opinion, this new approach is the way fine-grained access control should be, and I would like to see this proposal become the basis of a full reference implementation.
Bart van Velden (email@example.com or firstname.lastname@example.org) is a junior product manager in the Technology Concepts and Propositions group within the Research and Developments department of CycloMedia Technology. He studied information science at Utrecht University in the Netherlands, with a thesis project on the authorized access of dynamic spatial-temporal collections. The CycloMedia case scenario was used in this research project.