2.17. Access Control

The access control feature of xGT adds security and can be used to restrict users’ permissions to view and modify data. To restrict access to data, the administrator will configure xGT as described in Access Control.

2.17.1. Security Labels

Access control in xGT is based on security labels. Each user has a set of security labels, which we call the user label set, configured by the administrator as described in Configuring Groups and Labels. Security labels are also attached to data. Data can only be accessed from a Python session if the authenticated user’s label set contains all the labels attached to that data. For example, if some data is protected with labels “apple”, “banana”, and “strawberry”, then it can only be accessed if the authenticated user has all those labels in their label set.
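The rule is a subset test: the labels on the data must be a subset of the user label set. The Python sketch below models it with plain sets; `can_access` is a hypothetical helper for illustration, not part of the xGT client API.

```python
from __future__ import annotations


def can_access(user_labels: set[str], data_labels: set[str]) -> bool:
    """Return True if every label attached to the data is in the user's label set."""
    # An empty data label set is a subset of any user label set.
    return data_labels <= user_labels


fruit_labels = {"apple", "banana", "strawberry"}
print(can_access({"apple", "banana", "strawberry", "kiwi"}, fruit_labels))  # True
print(can_access({"apple", "banana"}, fruit_labels))  # False
```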

Access control in xGT is applied at two levels: the frame and the row. For frame access control, each frame has a label set for each of four access types: create, read, update, and delete (CRUD). For row access control, labels are attached to individual row elements: vertices in a vertex frame, edges in an edge frame, and rows in a table frame.

Both frame and row security are checked when accessing data in xGT. Frame security restricts the type of operation that can be performed by a user on that frame, while row security controls which elements of the frame will be accessible during that operation.

The following sections describe both frame and row security in detail.

2.17.2. Frame Access Control

A frame has four sets of labels, one for each of the following access types:

  • Create: The create access type allows the insertion of new rows into the frame.

  • Read: The read access type allows reading operations on the data rows held by the frame.

  • Update: The update access type allows the updating of existing rows in the frame. Columns of existing rows can be modified if this access type is granted.

  • Delete: The delete access type allows the deletion of existing rows from a frame. Delete access on a frame is also required to delete the frame itself.

Having create, update, or delete access also requires having read access.
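The following Python sketch models the frame-level CRUD check described above, including the rule that create, update, and delete also require read access. The helper `has_frame_access` is hypothetical, and treating a missing access-type key as carrying no labels is an assumption of the sketch, not documented xGT behavior. The example labels are the ones used in the frame-creation example in the next section.

```python
ACCESS_TYPES = ("create", "read", "update", "delete")


def has_frame_access(user_labels, frame_labels, access_type):
    """Model a frame-level CRUD check (illustrative, not xGT code).

    frame_labels maps an access type to its list of labels. Because create,
    update, and delete also require read access, their check includes the
    read labels. A missing key is treated as "no labels" (an assumption).
    """
    if access_type not in ACCESS_TYPES:
        raise ValueError(f"unknown access type: {access_type!r}")
    required = set(frame_labels.get(access_type, []))
    if access_type != "read":
        required |= set(frame_labels.get("read", []))
    return required <= set(user_labels)


frame_labels = {"create": ["label1", "label2"],
                "read": ["label2"],
                "update": ["label1", "label2", "label3"],
                "delete": ["label1", "label2", "label3"]}
user = {"label1", "label2"}
print({t: has_frame_access(user, frame_labels, t) for t in ACCESS_TYPES})
# {'create': True, 'read': True, 'update': False, 'delete': False}
```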

xGT uses the term CRUD labels for the sets of labels attached to a frame.

2.17.2.1. Creating Frames with Frame Security

Security labels are attached to a frame by passing the dictionary parameter frame_labels to a frame creation method. The dictionary maps each access type to a list of labels:

import xgt

# Assumes `server` is an authenticated xgt.Connection object.
frame_labels = { 'create' : ['label1', 'label2'],
                 'read'   : ['label2'],
                 'update' : ['label1', 'label2', 'label3'],
                 'delete' : ['label1', 'label2', 'label3']
               }
vertex_frame = server.create_vertex_frame(name = 'VertexFrame',
                                          schema = [['id', xgt.INT]],
                                          key = 'id',
                                          frame_labels = frame_labels)

Any number of labels can be attached for each CRUD access type. However, once a frame is created, its access labels cannot be changed.

2.17.2.2. Namespace Labels

Namespaces can also have CRUD access control.

  • Create: Required to create new frames in the namespace.

  • Read: Required to view the frames in the namespace or perform any operation on them.

  • Update: Not currently used.

  • Delete: Required to delete frames in the namespace.

Labels are attached to a namespace when it is created, using the parameter frame_labels, whose keys are the same access types used for frames:

namespace_labels = { 'create' : ['label1'],
                     'delete' : ['label1']
                   }
server.create_namespace(name = 'graph',
                        frame_labels = namespace_labels)

If a namespace is implicitly created when a frame is created, the labels attached to the frame are also attached to the namespace.

2.17.2.3. Access Types for xGT Operations

In general, an xGT operation may require several access types on several frames at the same time.

  • Performing a TQL query on a frame requires the appropriate access to that frame but does NOT require any access to the enclosing namespace.

  • TQL MATCH operations require read access to the frames present in the structural part of the query (graph pattern) and require create access for the results table. In addition, create access is required for the enclosing namespace of a results table if the table doesn’t already exist.

  • CREATE and MERGE operations insert new rows into a frame and require create access on the relevant frame.

  • SET operations modify properties of existing elements of a frame and require update access on the relevant frame.

  • DELETE and DETACH DELETE operations remove existing elements of a frame and require delete access on the relevant frame.

  • load() and insert() operations read data into xGT from external sources and require create access on the relevant frame.

  • save(), get_data(), and get_data_pandas() operations write data from an xGT frame to an external target and require read access on the relevant frame.

  • create_table_frame(), create_vertex_frame(), and create_edge_frame() operations require create access on the enclosing namespace frame. For example, creating the frame graph__EdgeFrame requires create access on the namespace frame graph.

  • create_namespace() operations require create access on __, the top-level namespaces frame.

  • drop_frame() and drop_frames() operations require delete access on the enclosing namespace frame and on the frame being deleted. For example, deleting the table frame results__Result requires delete access on the enclosing results namespace and the Result frame.

  • drop_namespace() operations require delete access on __, the top-level namespaces frame, as well as read and delete access on all frames in that namespace.

  • get_frame() and get_frames() operations require read access on the frames being read. When no specific frames are requested for get_frames(), xGT returns the set of frames for which the user has read access.

  • get_namespaces() requires read access on __, the top-level namespaces frame. If read access on __ is granted, the names of all namespaces are returned regardless of the security labels on them.

  • schedule_job() and run_job() operations require create access on the xgt__Running_Jobs system frame (see System Frames).

  • wait_for_job() requires read access on the xgt__Running_Jobs system frame.

  • cancel_job() requires update access on the xgt__Running_Jobs system frame.

  • get_jobs() requires read access on the xgt__Running_Jobs and xgt__Job_History system frames.

  • get_config() operations require read access on the xgt__Config system frame.

  • set_config() operations require update access on the xgt__Config system frame.
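Part of the list above can be encoded as a lookup from an operation to the (frame, access type) pairs it needs. This is a hypothetical Python model, not how xGT checks permissions internally; it assumes namespace-qualified frame names of the form namespace__frame, as in graph__EdgeFrame, and covers only some of the listed operations.

```python
from __future__ import annotations


def namespace_of(frame_name: str) -> str:
    """Return the namespace part of a name such as 'graph__EdgeFrame'."""
    namespace, sep, _ = frame_name.partition("__")
    if not sep:
        raise ValueError(f"expected a namespace-qualified name: {frame_name!r}")
    return namespace


def required_access(operation: str, target: str) -> list[tuple[str, str]]:
    """Return (frame, access type) pairs an operation needs on `target`.

    Only a few operations from the list are modeled. The read access
    implied by create, update, and delete is not listed separately.
    """
    if operation in ("create_table_frame", "create_vertex_frame", "create_edge_frame"):
        return [(namespace_of(target), "create")]
    if operation in ("drop_frame", "drop_frames"):
        return [(namespace_of(target), "delete"), (target, "delete")]
    if operation in ("load", "insert"):
        return [(target, "create")]
    if operation in ("save", "get_data", "get_data_pandas", "get_frame"):
        return [(target, "read")]
    if operation == "create_namespace":
        return [("__", "create")]  # the top-level namespaces frame
    if operation in ("schedule_job", "run_job"):
        return [("xgt__Running_Jobs", "create")]
    if operation == "cancel_job":
        return [("xgt__Running_Jobs", "update")]
    if operation == "set_config":
        return [("xgt__Config", "update")]
    raise ValueError(f"operation not modeled in this sketch: {operation!r}")


print(required_access("drop_frame", "results__Result"))
# [('results', 'delete'), ('results__Result', 'delete')]
```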

2.17.3. Row Access Control

In addition to frame labels, security labels can be attached to individual rows of a frame. The word “row” means a frame element, such as a vertex in a vertex frame, an edge in an edge frame, or a row in a table frame. Each row can have up to 128 labels attached and permission to view the row is determined by matching its labels against those in the user label set of the authenticated user. Unlike frame security, row access control is binary and does not have CRUD types: a user can either access a row or not. Currently, row access control is not supported for namespaces.

In order to read, update, or delete a row of a frame, the user must have all the labels of the corresponding frame CRUD access type and must also have all the labels attached to that row. For example, only a user who has both read and update access on a frame can run a TQL query that uses the SET operation on elements of that frame, and the query will only affect those elements the user can access.
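The sketch below models the SET example: the frame check happens first, then row labels filter which elements the operation touches. `rows_affected_by_set` and the dictionary row layout are hypothetical illustrations, not xGT structures.

```python
def rows_affected_by_set(user_labels, frame_labels, rows):
    """Return the rows a SET would modify (illustrative model, not xGT code).

    The frame check needs both the read and update labels; each row is then
    included only if all of its row labels are in the user's label set.
    """
    user = set(user_labels)
    needed = set(frame_labels.get("read", [])) | set(frame_labels.get("update", []))
    if not needed <= user:
        raise PermissionError("read and update access on the frame are required")
    return [row for row in rows if set(row["labels"]) <= user]


frame_labels = {"read": ["ops"], "update": ["ops"]}
rows = [{"id": 1, "labels": []},
        {"id": 2, "labels": ["secret"]},
        {"id": 3, "labels": ["ops"]}]
print([row["id"] for row in rows_affected_by_set({"ops"}, frame_labels, rows)])  # [1, 3]
```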

2.17.3.1. Attaching Row Labels to Data

When the frame is created, the universe of all security labels that can be attached to any row in that frame is set using the parameter row_label_universe:

row_labels = [ 'label1', 'label2', 'label3', 'label4', 'label9' ]
vertex_frame = server.create_vertex_frame(name = 'VertexFrame',
                                          schema = [['id', xgt.INT]],
                                          key = 'id',
                                          row_label_universe = row_labels)

This universe cannot be changed after frame creation and is restricted to at most 128 unique labels. Note that the parameter row_label_universe is different from the parameter frame_labels. The former controls the universe of potential labels attached to future rows of the frame, while the latter controls the CRUD labels attached to the frame. Setting row_label_universe does not in itself restrict access to the data because access is only restricted when labels are attached to rows. Rows do not need to have any labels attached even if row_label_universe is non-empty. On the other hand, setting frame_labels does restrict access to the frame.

The security label of a row in a frame can only be set when that row is first created and cannot later be changed or removed. If the labels need to be changed, the data should be removed and re-created with new labels attached.
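A toy Python model of these constraints follows. The class `RowLabelUniverse` is hypothetical; it shows that the universe is fixed at frame creation, is capped at 128 labels, and bounds the labels a new row may carry. Returning a frozenset mirrors the rule that a row's labels cannot change after the row is created.

```python
MAX_ROW_LABELS = 128


class RowLabelUniverse:
    """Toy model of a frame's fixed row label universe (not xGT code)."""

    def __init__(self, labels):
        self.labels = frozenset(labels)
        if len(self.labels) > MAX_ROW_LABELS:
            raise ValueError("a row label universe holds at most 128 labels")

    def validate_new_row(self, row_labels):
        """Check a new row's labels against the universe and freeze them."""
        labels = frozenset(row_labels)
        outside = labels - self.labels
        if outside:
            raise ValueError(f"labels outside the universe: {sorted(outside)}")
        return labels  # immutable, like a row's labels after creation


universe = RowLabelUniverse(["label1", "label2", "label3", "label4", "label9"])
print(sorted(universe.validate_new_row(["label1", "label3"])))  # ['label1', 'label3']
```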

Security labels can be attached to rows when data is ingested into the frame by calling load() or insert() on a frame object. There are several ways to attach labels to the newly ingested data, as described in the section Setting Row Labels.

2.17.3.2. Security Labels Created by TQL Queries

Security labels may be attached to new rows that are created by running a TQL query. This includes result rows created with the RETURN... INTO... clause of a TQL query, new vertices created with a MERGE or CREATE clause, and new edges created with a CREATE clause. In these cases, the security labels attached are automatically computed based on the labels attached to the input data used to create the new rows.

If a query is run without the INTO keyword, the result rows are not stored in a frame on the server, as described in Storing Temporary Query Results in Jobs. In this case, the result rows do not have any row labels attached. The remainder of this subsection discusses row labels attached to new rows that are written to a frame on the server.

Each new row created through a TQL query and written to a frame is protected by the union of security labels found on any input rows used to produce that row. The input rows are those rows that match any element in the pattern in the MATCH clause of the TQL query.

Simple Queries

For example, in the simple query below, the pattern searched for is a single vertex satisfying certain property constraints. Each instance of the pattern that is found corresponds to one vertex and causes one result row to be written. The input row is the matched vertex and therefore the row labels on that vertex are attached to the result row.

MATCH (v:Vertex)
WHERE v.id > 100
RETURN v.id, v.name
INTO QueryResults

Note that only the row labels of the elements in a single matching instance of the pattern are assigned to the corresponding result row. For example, suppose the query above finds three matches: a vertex with row labels “label1” and “label3”, a vertex with no row labels, and a vertex with row labels “label1”, “label2”, and “label9”. Then three result rows are created: one with row labels “label1” and “label3”, one with no row labels, and one with row labels “label1”, “label2”, and “label9”. The row labels of the other matches do not affect the labels of a particular result row.
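The per-match rule is a union over the label sets of the elements in one match. The helper `result_row_labels` is a hypothetical sketch; the first three inputs reproduce the example above, and the last one stands for a match of the two-edge pattern in the next example, where the unnamed vertices still contribute labels.

```python
def result_row_labels(match):
    """Union of row labels over every element (vertex or edge) in one match."""
    labels = set()
    for element_labels in match:
        labels |= set(element_labels)
    return labels


# Three single-vertex matches, as in the example above.
matches = [
    [{"label1", "label3"}],
    [set()],
    [{"label1", "label2", "label9"}],
]
print([sorted(result_row_labels(m)) for m in matches])
# [['label1', 'label3'], [], ['label1', 'label2', 'label9']]

# One match of ()-[e1]->(v)-[e2]->(): unnamed, e1, v, e2, unnamed.
two_hop = [{"label2"}, {"label1"}, set(), {"label4"}, {"label3"}]
print(sorted(result_row_labels(two_hop)))  # ['label1', 'label2', 'label3', 'label4']
```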

In the next example, the graph pattern contains two edges and three endpoint vertices. The row labels attached to each result row are the union of the row labels on the matched elements: the edges e1 and e2, the vertex v, and the two vertices that have not been given a name. The row labels of all three vertices are assigned to the result because they were all required to produce it, even though two of them are unnamed.

MATCH ()-[e1:Edge]->(v:Vertex)-[e2:Edge]->()
WHERE v.id < 500
RETURN v.id, e1.duration
INTO QueryResults

A TQL query only finds pattern instances in which every element is viewable by the authenticated user initiating the query, that is, every element's row labels are in the user’s label set. Therefore, the row labels attached to any result produced by the query are always a subset of the user’s label set, and the user can view any results produced by a TQL query they run unless the user’s permissions change.

Degree Computation

When computing the indegree or outdegree of a vertex, the row labels of incoming or outgoing edges are checked before including them in the degree count. Many of the rows examined to compute the degree are not considered input rows as they do not match an element in the MATCH clause pattern for the current match, and the labels from these non-input rows are not attached to the result row. In the example below, an edge is matched and two degree computations are returned: one is the outdegree of v for edges in the OtherEdge frame and the other is the indegree of v for edges in Edge. The row labels of each result row contain the union of row labels of the matched vertices w and v and the matched edge e. However, labels of other edges in Edge connected to vertex v, other than e, are not added to the result row. Similarly, labels of any edges in OtherEdge connected to v are not added to the result row.

MATCH (w:Vertex)-[e:Edge]->(v:Vertex)
RETURN outdegree(v, OtherEdge), indegree(v, Edge)
INTO QueryResults
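The sketch below models one result row of a degree query, simplified to the indegree(v, Edge) part: edges are counted only if their labels are in the user’s label set, but only the matched w, e, and v contribute labels to the row. `degree_result` is a hypothetical helper, and it ignores the endpoint-visibility rule for edges described in Edge Access.

```python
def degree_result(user_labels, w_labels, e_labels, v_labels, edges_into_v):
    """Model (indegree, row labels) for one match of (w)-[e:Edge]->(v).

    edges_into_v lists the label sets of every Edge edge ending at v,
    including the matched edge e. Only visible edges are counted, but
    only the matched elements w, e, and v contribute row labels.
    """
    user = set(user_labels)
    indegree = sum(1 for labels in edges_into_v if set(labels) <= user)
    row_labels = set(w_labels) | set(e_labels) | set(v_labels)
    return indegree, row_labels


# The matched edge e has no labels; two other edges point into v.
print(degree_result({"label1", "label2"}, {"label1"}, set(), set(),
                    [set(), {"label2"}, {"label3"}]))
# (2, {'label1'})
```

Note that “label2” raises the count but does not appear in the row labels, because the edge carrying it is not an input row.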

Aggregation Computation

TQL queries that involve aggregation may use many rows to produce a single result row, and the labels of all rows involved in producing the result are applied to that result. The query shown below returns the number of pattern matches found. In this case, the row labels attached to the single result row are the union of the row labels on v, e, and w across all instances of the matched pattern.

MATCH (v:Vertex)-[e:Edge]->(w:Vertex)
WHERE e.duration = 1
RETURN count(*)
INTO QueryResults

The aggregation query shown below computes the minimum value of the duration property for each unique value of the port property across all pattern matches. There will be one result row for each unique e.port value that was found and the row labels of each result will be the union of v, e, and w labels for each pattern that had that e.port value.

MATCH (v:Vertex)-[e:Edge]->(w:Vertex)
RETURN e.port, min(e.duration)
INTO QueryResults
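A minimal model of the grouped aggregation follows. `aggregate_min_duration` is a hypothetical helper; each match is a dictionary holding e.port, e.duration, and the label sets of v, e, and w. A count(*) query behaves like the case where every match falls into a single group.

```python
from collections import defaultdict


def aggregate_min_duration(matches):
    """Model RETURN e.port, min(e.duration) with per-group row labels.

    Every match in a group contributes the labels of its v, e, and w.
    """
    groups = defaultdict(lambda: {"min": None, "labels": set()})
    for m in matches:
        group = groups[m["port"]]
        if group["min"] is None or m["duration"] < group["min"]:
            group["min"] = m["duration"]
        group["labels"] |= m["v"] | m["e"] | m["w"]
    return dict(groups)


matches = [
    {"port": 80, "duration": 5, "v": {"label1"}, "e": set(), "w": set()},
    {"port": 80, "duration": 2, "v": set(), "e": {"label2"}, "w": set()},
    {"port": 443, "duration": 7, "v": set(), "e": set(), "w": {"label3"}},
]
result = aggregate_min_duration(matches)
for port in sorted(result):
    print(port, result[port]["min"], sorted(result[port]["labels"]))
# 80 2 ['label1', 'label2']
# 443 7 ['label3']
```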

Creating Rows

The rules for automatically attached row labels are the same for new vertices created with the MERGE or CREATE clause, new edges created with the CREATE clause, and new result rows created with the RETURN clause. In the example below, the row label attached to each new vertex x will be the union of the row labels attached to the two edges and three vertices in the matched pattern.

MATCH (v:Vertex)-[:Edge]->(w:Vertex)-[:Edge]->()
WHERE v.country = 'US'
MERGE (x:Vertex {id: v.id + 1000 })

For the above query to succeed, the row label universe of the vertex frame Vertex must include every label that could be assigned to a newly created vertex. For this query, any label in the row label universe of Edge that is also in the label set of the user initiating the query must be in the row label universe of Vertex. For a discussion of the row label universe of a frame, see Attaching Row Labels to Data.

The general rule is that if a frame is the target of a CREATE, MERGE, or RETURN ... INTO ... clause, then its row label universe must contain every label that is both in the calling user’s label set and in the row label universe of any input frame in the MATCH clause. Because xGT checks the row label universe of every frame that may receive new rows before the query runs, this requirement depends only on the row label universes of the input frames and the user label set, not on the labels attached to actual matches of the pattern.

In the example above, “label9” may be in the row label universe of Edge even though no instance of the pattern carries it, so no new vertex would actually be created with that label. Nevertheless, “label9” must be in the row label universe of Vertex if it is in the calling user’s label set.
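Because the rule depends only on row label universes and the user label set, it can be checked before any matching happens. In the hypothetical sketch below, `check_target` reproduces the “label9” case with illustrative universes: Vertex is both an input frame and the MERGE target, and the check fails only for a user whose label set includes “label9”.

```python
def required_target_universe(user_labels, input_universes):
    """Labels a CREATE, MERGE, or RETURN ... INTO ... target must support.

    Uses only the row label universes of the MATCH frames and the user's
    label set; labels on actual matches play no part.
    """
    return set().union(*input_universes) & set(user_labels)


def check_target(target_universe, user_labels, input_universes):
    """Raise ValueError if the target frame's universe is missing a label."""
    missing = required_target_universe(user_labels, input_universes) - set(target_universe)
    if missing:
        raise ValueError(f"target row label universe lacks: {sorted(missing)}")


# Illustrative universes: Vertex is both an input frame and the MERGE target.
vertex_universe = {"label1", "label2"}
edge_universe = {"label2", "label9"}
inputs = [vertex_universe, edge_universe]

check_target(vertex_universe, {"label1", "label2"}, inputs)  # passes
try:
    check_target(vertex_universe, {"label1", "label2", "label9"}, inputs)
except ValueError as err:
    print(err)  # target row label universe lacks: ['label9']
```

The same query therefore succeeds for one user and fails for another, which is the behavior described in Row Label Universe Requirements.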

2.17.3.3. Performance

Row access control has a memory and performance cost compared to frame access control because labels must be stored for each row and validated each time the row is accessed. If row access control is not needed, it is recommended to create the frame without passing the row_label_universe parameter, or with an empty list. This allows xGT to optimize away both the storage for row labels and the row access control checks.

2.17.3.4. Result Frames with Row Labels

When running a TQL query, the INTO keyword may be used to specify a result TableFrame into which results are written. This result frame can either be created before the query is run or be automatically created by the query if it does not exist.

If the result frame is automatically created, it will be assigned the necessary row label universe, which includes every security label that might be attached to a result row. Specifically, the universe is the union of all labels in the row label universe of any frame used in the query that are also in the authenticated user’s label set.

In the example below, a query is run on the frames VertexFrame and EdgeFrame. The union of these frames’ row label universes is “label1”, “label2”, “label3”, and “label4”. This is the case even if some of these labels are not attached to any rows in the frames because the universe is determined at frame creation. If the label set of the authenticated user is “label1”, “label3”, and “label5”, then the result frame Results is automatically created with the universe containing “label1” and “label3”.

import xgt

# Assumes `server` is an authenticated xgt.Connection object.
vertex_frame = server.create_vertex_frame(
    name = 'VertexFrame',
    schema = [['id', xgt.INT]],
    key = 'id',
    row_label_universe = ["label1", "label2"])

edge_frame = server.create_edge_frame(
    name = 'EdgeFrame',
    schema = [['source', xgt.INT], ['target', xgt.INT]],
    source = 'VertexFrame',
    target = 'VertexFrame',
    source_key = 'source',
    target_key = 'target',
    row_label_universe = ["label2", "label3", "label4"])

vertex_frame.load('vertex_data_file')
edge_frame.load('edge_data_file')

# Results does not exist yet, so the query creates it automatically.
query = """
  MATCH (v:VertexFrame)-[e:EdgeFrame]->(w:VertexFrame)
  WHERE outdegree(v) < 10
  RETURN w.id
  INTO Results
"""
server.run_job(query)

If the result frame is created before running the query, all the required labels must be passed into the parameter row_label_universe when calling create_table_frame(). If a query is run using an already created results table, the results table’s row label universe must contain at least all the labels that would be assigned if it were automatically created by the query. Otherwise, xGT will report an error.

If the required row label universe exceeds the maximum of 128 labels, the query fails and xGT reports an error. Because of this restriction, if several frames are expected to be accessed together in a TQL query, the union of their row label universes should generally not exceed 128 unique labels.
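The following hypothetical Python model computes the universe an automatically created result frame would receive, enforces the 128-label limit, and checks whether a pre-created results table is large enough. With the universes and user label set from the VertexFrame and EdgeFrame example, it yields “label1” and “label3”. `auto_result_universe` and `precreated_frame_ok` are illustrative names, not xGT functions.

```python
MAX_ROW_LABELS = 128


def auto_result_universe(user_labels, frame_universes):
    """Universe an automatically created INTO frame would get (model only)."""
    universe = set().union(*frame_universes) & set(user_labels)
    if len(universe) > MAX_ROW_LABELS:
        raise ValueError("required row label universe exceeds 128 labels")
    return universe


def precreated_frame_ok(frame_universe, user_labels, frame_universes):
    """True if a pre-created results table covers the required universe."""
    return auto_result_universe(user_labels, frame_universes) <= set(frame_universe)


vertex_universe = {"label1", "label2"}
edge_universe = {"label2", "label3", "label4"}
user = {"label1", "label3", "label5"}
print(sorted(auto_result_universe(user, [vertex_universe, edge_universe])))
# ['label1', 'label3']
```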

2.17.3.5. Viewing Security Labels

The security labels attached to the rows of a frame can be viewed by setting the parameter include_row_labels to True when egesting data. This is described in section Saving Row Labels.

2.17.4. Additional Rules

2.17.4.1. Edge Access

In order to view an edge, the authenticated user initiating an operation must also have access to both its source and target vertices. This means that the user must have the frame labels attached to the frames containing the source and target vertices and must have the row labels attached to the source and target vertices themselves. When running a TQL query, an edge is only accessed if the user has the row labels of its endpoint vertices, even if these vertices are unnamed, as in the example below:

MATCH ()-[e:Edge]->()
RETURN count(*)

In order to ingest a new edge using load() or insert(), the user must have access to both the source and target vertices of this edge if they already exist:

  • If they do not exist, the ingest operation will implicitly create the vertices.

  • If they exist and the user has access to them, the ingest operation succeeds.

  • If they exist but the user does not have access to one or both of them, inserting that edge fails.

Because the user will not be able to view vertices they do not have access to, it may appear to the user that they are inserting an edge for which new vertices will be implicitly created, but instead the insert may fail.
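The ingest rules can be modeled as shown below. `ingest_edge` is hypothetical; it returns the endpoint keys that would be created implicitly or raises PermissionError. How xGT handles an edge with one existing and one missing endpoint is not stated above, so treating each endpoint independently is an assumption of this sketch.

```python
def ingest_edge(user_labels, vertices, source, target):
    """Model the outcome of ingesting one edge (illustrative only).

    vertices maps existing vertex keys to their row label sets. Missing
    endpoints are created implicitly; existing endpoints must be visible.
    """
    user = set(user_labels)
    to_create = []
    for key in (source, target):
        if key in vertices:
            if not set(vertices[key]) <= user:
                raise PermissionError(f"cannot ingest edge: no access to vertex {key!r}")
        elif key not in to_create:
            to_create.append(key)
    return to_create


vertices = {1: set(), 2: {"secret"}}
print(ingest_edge({"public"}, vertices, 1, 3))  # [3]
```

To this user, vertex 2 is invisible, so an edge from 1 to 2 looks like it would create vertex 2, yet `ingest_edge({"public"}, vertices, 1, 2)` raises PermissionError.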

2.17.4.2. Degree Computation

When computing the indegree or outdegree of a vertex, the result depends on the calling user’s label set. If a vertex has outgoing or incoming edges with security labels attached that are not in the user’s label set, they will not be included in the degree computation. In the example below, the outdegree returned may vary for users with different label sets:

MATCH (v:Vertex)
RETURN outdegree(v)

Even if a degree computation is included in the WHERE or RETURN clauses, the labels attached to any new row created by a TQL query will not include the row labels of edges accessed to compute the degree. In the example above, only the labels of v are attached to each result row.

Degree computations, both global and relative, can access frames that are not used anywhere else in a query. Consider the query above that computes the global outdegree of v. It sums the outdegree of v across all edge frames that have Vertex as the source vertex frame. It may be that the user doesn’t have read access to some of those frames. In this case the query succeeds, but the outdegree of v from the frames the user doesn’t have access to is not included in the outdegree result. This is analogous to not including rows with security labels that the user doesn’t have access to see.

However, the behavior is different when a user explicitly gives an edge frame to request a local degree computation. Consider the following query:

MATCH (v:Vertex)
RETURN outdegree(v, OtherEdge)

If the user doesn’t have permission to read OtherEdge, then the query will fail with a permission error.
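The contrast between the global and the local degree can be sketched in Python. `model_outdegree` is a hypothetical helper: the global form silently skips edge frames the user cannot read, while naming a frame the user cannot read raises PermissionError. It assumes every frame passed in has Vertex as its source frame and ignores endpoint visibility.

```python
def model_outdegree(vertex, edge_frames, readable, user_labels, frame=None):
    """Count visible outgoing edges of `vertex` (hypothetical model).

    edge_frames maps a frame name to a list of (source_key, row_labels) pairs,
    and readable is the set of frame names the user has read access to.
    frame=None models the global outdegree; a name models outdegree(v, Frame).
    """
    user = set(user_labels)
    if frame is not None:
        if frame not in readable:
            raise PermissionError(f"no read access on {frame}")
        names = [frame]
    else:
        # The global degree silently skips frames the user cannot read.
        names = [name for name in edge_frames if name in readable]
    return sum(
        1
        for name in names
        for source, labels in edge_frames[name]
        if source == vertex and set(labels) <= user
    )


edge_frames = {"Edge": [(1, set()), (1, {"secret"}), (2, set())],
               "OtherEdge": [(1, set()), (1, set())]}
print(model_outdegree(1, edge_frames, {"Edge"}, set()))  # 1
```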

2.17.4.3. Row Label Universe Requirements

As discussed in Result Frames with Row Labels, if a TQL query is run with the INTO keyword, the result frame specified must have the required row label universe. This ensures that the result frame supports all labels that might be attached to any new row the query could insert into it. A TQL query may fail due to this requirement. If the result frame was created before running the query, the query may fail if the frame's row label universe does not contain all required labels. If the result frame was not created before running the query, the query may fail when trying to create it if the required row label universe exceeds 128 labels.

Any frame that is the target of a CREATE or MERGE clause in a TQL query has the same requirements for its row label universe as a result frame does. Its row label universe must contain all labels that are both in the calling user’s label set and in the row label universe of any frame in the pattern of the MATCH clause. This ensures that the frames support all labels that might be attached to any new row the query could create. A TQL query may fail if any frame does not satisfy this requirement.

The requirements on the row label universes of frames that are targets of CREATE, MERGE, or RETURN ... INTO ... mean that the same query may run for one user and fail for another user if they have different user label sets. This can occur because when a frame’s row label universe is checked for correctness, labels that are not in the user’s label set are not required. Consider the query shown below, where the security labels of each new edge created will be the union of row labels of v, w, and e.

MATCH (v:Vertex)-[e:Edge]->(w:Vertex)
CREATE (v)-[new_edge:OtherEdge]->(v)

The frame OtherEdge’s row label universe must support the labels that will be attached to any edge created. This can include labels from the row label universe of Vertex and the row label universe of Edge. However, because a query can only access data the user has permissions for, it will only match v, w, and e whose labels are included in the user’s label set. Therefore, xGT knows that OtherEdge need only support row labels that are in the user label set.

Dropping a Frame

Dropping a frame only requires frame permissions: read and delete access on the frame and delete access on the enclosing namespace. A user can drop a frame even if they do not have access to all rows in the frame.