4.6. Introduction to the Trovares Query Language (TQL)

At this point you have loaded data into the system and defined vertex and edge frames with their associated schemas in order to represent your data as a graph. This document discusses how to phrase and ask interesting queries on graph data stored in the xGT system.

The Trovares Query Language (TQL) uses a subset of the Cypher language to express queries. The supported subset of Cypher enables powerful and expressive queries, while taking advantage of xGT’s strongly typed graph elements in order to achieve very high performance and scalability.

A TQL query consists of several components (some of which are optional):

  • Optional: structure (description of the “shape” of the pattern).

  • Optional: constraints on the graph elements’ properties.

  • Optional: modifications to the graph elements’ properties.

  • Optional: additions to the graph topology itself.

  • Optional: deletions from the graph topology itself.

  • Required: query section separator (if query contains multiple sections).

  • Optional: description of the final answer set.

  • Optional: solution modifiers (including ordering & limiting results).

  • Optional: results table to store the query result in.

TQL’s Cypher subset has specific syntax for each component of a query:

MATCH "structure"
WHERE "optional constraints on properties"
SET "optional property modifications"
MERGE "optional additions of vertices"
CREATE "optional additions of vertices and edges"
DETACH DELETE "optional deletions of vertices"
DELETE "optional deletions of edges"
WITH "query section separator"
RETURN "optional description of the answer set"
"optional solution modifiers"
INTO "optional results table name"

Note that if there are no constraints supplied, the WHERE keyword should not be present in the query.
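As a rough illustration of the clause order listed above, the following Python sketch assembles a query string from optional parts and leaves out the WHERE keyword when no constraints are given. The helper build_query and its parameter names are hypothetical and not part of xGT; only the keywords and their order are taken from the listing, and the modifying clauses (SET, CREATE, and so on) are left out for brevity.

```python
def build_query(match=None, where=None, returns=None, modifiers=None, into=None):
    """Assemble a TQL query string, emitting only the clauses that were supplied."""
    clauses = [
        ("MATCH", match),
        ("WHERE", where),      # skipped entirely when there are no constraints
        ("RETURN", returns),
        (None, modifiers),     # solution modifiers such as LIMIT have no shared keyword
        ("INTO", into),
    ]
    lines = []
    for keyword, body in clauses:
        if body:
            lines.append(f"{keyword} {body}" if keyword else body)
    return "\n".join(lines)


query = build_query(
    match="(a:career__People)-[:career__WorksFor]->(c:corp__Companies)",
    returns="a.name",
    modifiers="LIMIT 10",
)
print(query)
```

Because no constraints were passed, the printed query goes straight from the MATCH line to the RETURN line.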

We describe each of these components in detail.

4.6.1. Structure

The basic concept behind querying graph data in xGT is the representation of what to search for in the graph. A structure is a template of what to match in the large data graph stored in xGT’s system memory.

xGT uses the query structure as a template to match elements of a large data graph and extract them into an answer set.

More precisely, the structure describes the type and topology of the graph elements that are part of the answer.

The type of a graph element is its membership in a particular vertex or edge frame. Each instance of a vertex or edge in xGT can belong to only one frame.

A vertex frame is a collection of vertices that have the same property names and types. We can think of these vertices as representing the same type of entity. For example, in an employment graph, all vertices representing a person may belong to a Person frame, while all vertices representing a company may belong to a Company frame. Similarly, an edge frame is a collection of edges that share the same property names and types.

The topology of the graph elements describes how vertices and edges connect to each other. The topology is restricted by the types of the vertices and edges. The declaration of an edge frame includes the frames of vertices it connects (one frame for the source of the edge and one for the target).

TQL’s Cypher subset expresses the structure as a syntactic construct in its MATCH statements. The structure is described as a collection of connected vertices and edges. Each vertex is represented by a variable enclosed in parentheses (), while each edge is represented by a variable surrounded by brackets [].

Inside each vertex or edge component, there are two elements that can be provided. The first one is an annotation to indicate to which frame (type) the vertex or edge must belong. This is expressed by the use of a colon followed by a qualified frame name: (:career__People) or [:career__WorksFor]. In these examples, the vertex must belong to the People frame and the edge must belong to the WorksFor frame. Note that in both cases the frames are stored in the career namespace. The second element is an optional variable name, which binds that graph element to a particular name. Consider the examples of (a:career__People) and [b:career__WorksFor]. In these cases the graph elements that match can be referred to in the rest of the query by the names a and b. If a graph element is not given a variable name, it still constrains the structure but remains anonymous: it cannot be used in value constraints, it cannot take part in graph cycles in the query, and its properties cannot be extracted into the answer set.

In xGT, edges are directed, and the structure must indicate the edge direction desired in the resulting answer. Vertex components of the pattern used as sources are indicated by the use of a - character connecting the vertex to the edge: (:career__People)-[:career__WorksFor]. Vertex components of the pattern used as targets are indicated by the use of the -> character pair connecting the edge to the vertex: (:career__People)-[:career__WorksFor]->(:corp__Companies). This example describes a pattern that matches persons that work for a company. The WorksFor edge goes from a person to the company that person works for. The edge is said to be outgoing from the person vertex and incoming to the company vertex. It is also possible to write the same pattern in the reverse direction: (:corp__Companies)<-[:career__WorksFor]-(:career__People). Despite appearing reversed on paper, the query described is identical. Using the two different edge directions is a convenient facility when composing larger patterns with multiple edge and vertex frames.

The structure becomes optional when queries just want to modify a few components of the graph. It is possible to create new edges and vertices without specifying a structure. However, the properties of the newly created edges and vertices would be restricted to expressions over constants. More information about graph element creation is described in Topology Additions.

4.6.1.1. Putting It All Together

A structure in TQL is composed of one or more sequences of vertex-edge-vertex subsequences, where the types of the components must be compatible. An example of a longer pattern is as follows:

MATCH (:corp__Companies)<-[:career__WorksFor]-(a:career__People)
                         -[:relations__FriendOf]->(b:career__People)

In this example, we make use of several of TQL’s pattern facilities, including forward and reverse directions in the edges, multiple vertex frames (People and Companies) and multiple edge frames (WorksFor and FriendOf). We also bind the two people in the query to variables a and b.

4.6.1.2. Non-linear Patterns

So far, we have discussed linear patterns in the sense that all vertices and edges in the pattern follow a sequence in the graph’s topology. Describing more complex patterns is done using the , (comma) operator. The best way to think about these patterns is that they “branch off” from a linear pattern into another part of the graph.

For example:

MATCH (:career__People)-[:career__WorksFor]->(c:corp__Companies)
                       -[:corp__IsLocated]->(:corp__Cities),
      (c)-[:corp__CompetesAgainst]->(:corp__Companies)

In this case, the first linear pattern describes people working for companies of interest and where they are located. The second linear pattern is connected to the first one by the intermediate company vertex c. More complex structure patterns can consist of multiple connected linear patterns in this fashion.

xGT has a constraint on non-linear patterns that doesn’t exist in Cypher. A subsequent linear path after a comma operator must have a connecting anchor vertex occurring somewhere in the path. The following example is valid in Cypher but xGT cannot process it currently since the only named vertex b in the second path is not connected to the first path. xGT requires that all paths in a non-linear pattern be connected to each other via at least one anchor vertex. In graph theoretical terms, all paths must form a “connected component”. No path can be outside of the connected component.

MATCH (:career__People)-[:career__WorksFor]->(c:corp__Companies)
                       -[:corp__IsLocated]->(:corp__Cities),
      (:corp__Companies)<-[:corp__CompetesAgainst]-(b)
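The connectivity requirement can be pictured with a small Python sketch. It is a simplification made for illustration, not xGT's actual check: each linear path is reduced to the set of variable names bound to its vertices, and the function paths_are_connected (a hypothetical name) tests whether every path can be reached from the first one through shared names.

```python
def paths_are_connected(paths):
    """Return True if every path shares a named vertex, directly or
    transitively, with the first path."""
    if not paths:
        return True
    reached = set(paths[0])
    remaining = [set(p) for p in paths[1:]]
    progress = True
    while remaining and progress:
        progress = False
        for names in list(remaining):
            if names & reached:          # this path has an anchor vertex
                reached |= names
                remaining.remove(names)
                progress = True
    return not remaining


valid = [{"c"}, {"c"}]      # both paths mention the anchor vertex c
invalid = [{"c"}, {"b"}]    # the second path only names b
print(paths_are_connected(valid), paths_are_connected(invalid))  # True False
```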

4.6.1.3. Variable-length Edge Traversal

TQL supports specifying that a particular edge in the pattern could be traversed more than once. This is known as a variable-length edge. A variable-length edge is specified by indicating the minimum and maximum number of edges that can match this element in the path before the query moves on to the rest of the path.

[:corp__CompetesAgainst *1..5]

In this example, [:corp__CompetesAgainst] can involve from one up to five edge traversals before the query moves on to the rest of the path. The lower bound of the traversal must be strictly less than the upper bound. The lower bound must also be greater than or equal to one.

Of particular importance is that the edge frame corp__CompetesAgainst maps vertices of type corp__Companies to vertices of the same type. That is, the source and target vertex frames are the same. This is a requirement of xGT’s support for variable-length edge traversal.

Variables cannot be bound to a variable-length edge. The following is invalid in TQL: [c:corp__CompetesAgainst *1..2]. Bound variables are not allowed because it would be ambiguous which edge the variable c refers to. It could be the first edge, the second edge, or potentially either in different contexts.

During execution of the variable-length edge, the target vertex instance of the traversal is mapped back to the source of the next traversal, thus requiring the vertices to be part of the same vertex frame. This is a current limitation of xGT’s support for variable-length edges and not part of the Cypher language.

If traversals of more than one length allow the rest of the query to continue, then a single variable-length edge can produce several matches to the query. In the example above, a traversal of one edge could result in a match, and a traversal of four edges could result in another match. Answers can be produced by traversals of any of the lengths indicated by the variable edge’s bounds.

The minimum lower bound for a variable-length traversal is one and the upper bound must be specified. Unbounded traversals are not supported by xGT.
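The following Python sketch imitates a variable-length traversal on a tiny, made-up edge list to show how one target can be matched at more than one length. The function name, the edge data, and the breadth-first strategy are assumptions for illustration only; the bounds check mirrors the rules stated above.

```python
def variable_length_matches(edges, start, min_hops, max_hops):
    """Enumerate (target, hops) pairs reachable from start in min_hops..max_hops edges."""
    if not (1 <= min_hops < max_hops):
        raise ValueError("bounds must satisfy 1 <= min_hops < max_hops")
    matches = []
    frontier = [start]
    for hops in range(1, max_hops + 1):
        # Follow one more edge from every vertex reached so far.
        frontier = [dst for src in frontier for (s, dst) in edges if s == src]
        if hops >= min_hops:
            matches.extend((dst, hops) for dst in frontier)
    return matches


# Hypothetical CompetesAgainst edges between companies, as (source, target).
edges = [("A", "B"), ("B", "C"), ("A", "C")]
print(variable_length_matches(edges, "A", 1, 5))
# [('B', 1), ('C', 1), ('C', 2)]
```

Company C appears twice because it is reachable from A both directly and through B, which corresponds to two separate matches of the pattern.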

4.6.2. Constraints on Properties

We have discussed how to express the topological part of a TQL query using structures. We now discuss how to express constraints on the data stored in the components of the graph.

Recall that each graph component in xGT has an associated schema with named properties. Each property in the schema has an associated data type (see Data Types and Operators in xGT).

Constraints on the properties of the graph elements are described in the WHERE clause of a TQL query. The WHERE clause consists of expressions involving the properties of the graph elements, combined with constants (of the appropriate data type) and commonly used comparison, arithmetic, and boolean operators.

The combination of the structure and constraints in a query is called the graph pattern in TQL.

Properties are referred to by using the . (dot) operator in between the name of a bound variable and the name of a property: a.name would access the property named name of the graph element bound to the variable a.

Property expressions can be combined with other property expressions and constants using TQL’s Cypher operator subset.

Supported operators are as follows:

  • Arithmetic: +, -, *, /, % (modulus), (unary) -

  • Boolean: AND, OR, NOT

  • Comparison: = (equality), <> (inequality), <, >, <=, >=, IS NULL, IS NOT NULL

  • String: STARTS WITH, ENDS WITH, CONTAINS, + (concatenation operator).

  • Constant collection: IN

Constants are supported for the following data types:

  • Integer numbers

  • Floating-point numbers (32 bit)

  • Boolean true and false

  • String constants (surrounded by double or single quotes)

  • Null constant (NULL)

  • Dates

  • Times

  • Datetimes

  • IPv4 addresses

Parentheses () can be used to indicate precedence when dealing with nested expressions.

Examples of WHERE constraints are as follows:

WHERE p.name = "John" AND p.age < 40
WHERE c.value > 10.0 OR c.value < 2.5
WHERE (p.name STARTS WITH "D" OR p.name STARTS WITH "F") AND p.address IS NOT NULL
WHERE c.time > 20 AND (c.value > 10.0 OR c.value < d.value)

4.6.3. Functions

TQL provides a set of functions that can be used in queries. These functions include standard Cypher functions and a few that are unique to TQL.

4.6.3.1. Aggregation Functions

Sometimes it is useful to know cumulative information about results. Aggregation functions provide this ability by combining results from multiple rows into a single result. For instance, the following query returns a single row that has the total number of executives in all the companies in the graph:

MATCH (c:corp__Companies)
RETURN sum(c.num_executives)

In addition to a single cumulative result, rows can be grouped by user-specified keys. The grouping keys are non-aggregated expressions that are given alongside the aggregation function expressions. Input rows with the same value for the grouping keys are combined together.

Consider the example where the corp__Companies frame has a column state indicating which state the company headquarters is located in. The following query returns a single row for each unique value of state along with the total number of executives in each state:

MATCH (c:corp__Companies)
RETURN c.state, sum(c.num_executives)

There can be multiple grouping keys and aggregation functions. Consider this example that groups results by the unique combinations of city and state:

MATCH (c:corp__Companies)
RETURN c.state, c.city, sum(c.num_executives), avg(c.num_executives)
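To make the role of grouping keys concrete, here is a small Python sketch that groups made-up rows by (state, city) and sums num_executives per group. The data is invented, and only sum() is modeled; it is meant to show how rows with equal key values collapse into one output row, not how xGT computes the result.

```python
from collections import defaultdict

# Hypothetical rows of the corp__Companies frame: (state, city, num_executives).
companies = [
    ("WA", "Seattle", 5),
    ("WA", "Seattle", 7),
    ("WA", "Spokane", 2),
    ("CA", "Fresno", 4),
]

totals = defaultdict(int)
for state, city, num_executives in companies:
    totals[(state, city)] += num_executives   # grouping key: (state, city)

for (state, city), total in totals.items():
    print(state, city, total)
```

Four input rows produce three output rows, one per unique combination of state and city.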

The DISTINCT keyword can be used in conjunction with an aggregation function. Using DISTINCT with an aggregation function generates the unique set of input values for the aggregation function, and the function is applied to the set of distinct values instead of all the values.

For example, consider this query that returns the number of unique cities that company headquarters are located in:

MATCH (c:corp__Companies)
RETURN count(DISTINCT c.city)

The following aggregation functions are supported by TQL:

Aggregation Functions

Name         Description
count(*)     Returns the number of elements in the answer set.
count(expr)  Returns the number of non-null values of an expression in the answer set.
avg(expr)    Returns the numerical average of all values of an expression in the answer set.
sum(expr)    Returns the sum of all values of an expression in the answer set.
min(expr)    Returns the smallest of all values of an expression in the answer set.
max(expr)    Returns the largest of all values of an expression in the answer set.

The variant count(*) includes null values in the count while the variant count(expr) excludes expressions that evaluate to null. For all other aggregation functions, expressions that evaluate to null are excluded from the result. The functions avg() and sum() take the argument expr that is an expression that evaluates to either integer or float type and return an aggregate value of the same type as expr. Expressions that evaluate to any other type result in an error. The functions min() and max() take the argument expr that is an expression that evaluates to any type and return an aggregate value of the same type as expr.
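The null-handling rules can be mimicked in a few lines of Python, with None standing in for null. The column values are invented for the example, and this is only a model of the described behavior.

```python
values = [3, None, 5, None, 4]   # hypothetical column containing nulls (None)

non_null = [v for v in values if v is not None]

count_star = len(values)         # count(*): every row, nulls included
count_expr = len(non_null)       # count(expr): nulls excluded
total = sum(non_null)            # sum(expr): nulls excluded
smallest, largest = min(non_null), max(non_null)

print(count_star, count_expr, total, smallest, largest)  # 5 3 12 3 5
```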

The aggregation functions can only be used in RETURN and WITH expressions.

Examples of their usage are as follows:

RETURN count(*) AS total
RETURN avg(a.age)
WITH sum(p.sale_price - p.production_cost) AS profit
WITH max(a.dob) AS youngest, min(a.dob) AS oldest, count(DISTINCT a.dob) AS unique_dobs

4.6.3.2. Conversion Functions

Conversion functions allow converting one type into a different type, for instance converting an integer into a string. TQL supports the following conversion functions:

Conversion Functions

Name             Description
toBoolean(expr)  Converts the given expression to a boolean.
toInteger(expr)  Converts the given expression to an integer.
toFloat(expr)    Converts the given expression to an IEEE 754 float representation.
toString(expr)   Converts the given expression to a string.

The conversion functions take the single argument expr and can be applied to properties and expressions of any type. They can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.

The toBoolean() function converts strings to boolean values. The accepted strings are 0, 1, true, and false, where true and false are case-insensitive. Passing a value of any type other than boolean or string results in an error.

The toInteger() function will truncate floats and convert booleans to 0 or 1 for false and true, respectively. IPv4 addresses are converted to their integer representation. Date and datetime types are converted to seconds since the epoch where a date type is considered to occur at midnight. The function converts time types to seconds since midnight. Strings are parsed and converted to integers, if possible. When parsing a string, the function expects a format similar to Neo4j’s, where the string value can be written as a positive or negative integer with an optional decimal fractional part that will be truncated.

The toFloat() function works similarly to toInteger(), but will use 32-bit IEEE 754 float precision. For time and datetime types, the fractional precision of the seconds will be represented in the float. Keep in mind that the precision of a 32-bit float is relative to the magnitude of the value. Because the number of seconds since the epoch is large for recent dates, the spacing between representable values there is on the order of 10 to 10^2 seconds, so converted dates and datetimes are accurate only to within tens of seconds to roughly a hundred seconds. Similar precision can be expected for IP addresses in the upper range, since the leading octet is weighted by 256^3.
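The effect of 32-bit precision on epoch timestamps can be checked with Python's standard struct module, which is used here only as a stand-in for a 32-bit IEEE 754 conversion. The helper name to_float32 is hypothetical, and the timestamp is an arbitrary value from late 2023.

```python
import struct

def to_float32(value):
    """Round a number to the nearest 32-bit IEEE 754 float."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]

epoch_seconds = 1_700_000_001            # seconds since the epoch, late 2023
print(to_float32(epoch_seconds))         # 1700000000.0
print(to_float32(epoch_seconds + 99))    # 1700000128.0
```

At this magnitude adjacent 32-bit floats are 128 seconds apart, so timestamps a little over a minute and a half apart can land on neighboring representable values.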

If toBoolean(), toFloat(), or toInteger() fails to parse a string, it returns null.


Some examples are:

WHERE toString(a.property) = "5"
WITH toString(10.5 + a.float_property) AS aString
RETURN toInteger(10.5 + a.float_property) AS anInt
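The string-parsing behavior described for toInteger() and toBoolean() can be approximated in Python as follows. This is a sketch under assumptions: the exact grammar xGT accepts is not spelled out here, so the regular expression (optional sign, digits, optional fractional part) is a guess at a reasonable reading, and None stands in for null.

```python
import re

# Assumed grammar: optional sign, digits, optional fractional part.
_NUMBER = re.compile(r"[+-]?\d+(\.\d*)?")

def to_integer(text):
    """Return the integer part of a numeric string, or None if parsing fails."""
    if not _NUMBER.fullmatch(text):
        return None
    return int(text.split(".")[0])       # fractional part is truncated

def to_boolean(text):
    """Accept 0, 1, true, false (case-insensitive); None otherwise."""
    return {"0": False, "1": True, "false": False, "true": True}.get(text.lower())

print(to_integer("10.9"), to_integer("-10.9"), to_integer("ten"))  # 10 -10 None
print(to_boolean("TRUE"), to_boolean("0"), to_boolean("yes"))      # True False None
```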

4.6.3.3. Mathematical Functions

TQL supports the following mathematical functions:

Mathematical Functions

Name         Description
abs(expr)    Returns the absolute value of an expression.
ceil(expr)   Returns the ceiling of an expression.
floor(expr)  Returns the floor of an expression.
round(expr)  Returns the rounded value of an expression.

The mathematical functions take the argument expr that must be an expression that evaluates to either an integer or float type and return a value of the same type as the input expression. If expr evaluates to null, null is returned. Using these functions on any non-numeric type results in an error.

The mathematical functions can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.

4.6.3.4. String Functions

TQL supports the following string functions:

String Functions

Name                              Description
substring(orig, start [,length])  Returns a substring of the original string.

The substring() function has two required arguments and a third optional argument. The argument orig is an expression that evaluates to a string to operate on. The argument start is an expression that evaluates to a 0-based starting position in the original string. The optional argument length is an expression that evaluates to the length of the substring to extract. If length is not given, the function returns the substring beginning at start through the end of orig. If orig evaluates to null, null is returned. If orig evaluates to any non-string type, an error is thrown. If start or length evaluates to either a negative integer or null, an error is thrown. If start or length evaluates to any non-integer type, an error is thrown.
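A Python model of these rules is sketched below. It follows the description above (0-based start, optional length, null in gives null out, errors for negative or non-integer arguments). Python's None is used both for null and for "length not given", which is a simplification, since a null length is an error in TQL.

```python
def substring(orig, start, length=None):
    """Model of substring(orig, start [, length]) with a 0-based start."""
    if orig is None:
        return None                      # null input yields null
    if not isinstance(orig, str):
        raise TypeError("orig must be a string")
    for arg in (start,) if length is None else (start, length):
        # bool is a subclass of int in Python, so reject it explicitly
        if not isinstance(arg, int) or isinstance(arg, bool) or arg < 0:
            raise ValueError("start and length must be non-negative integers")
    end = None if length is None else start + length
    return orig[start:end]

print(substring("Trovares", 0, 4))   # Trov
print(substring("Trovares", 4))      # ares
```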

The string functions can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.

4.6.3.5. Degree Functions

TQL supports the following degree functions:

Degree Functions

Name                             Description
indegree(vertex [, edge_frame])  Returns an integer value with the number of incoming edges to the specified bound vertex variable. Can be global or relative to a particular edge frame.
outdegree(vertex [, edge_frame]) Returns an integer value with the number of outgoing edges from the specified bound vertex variable. Can be global or relative to a particular edge frame.

The degree functions have one required and one optional argument. The argument vertex must be a bound variable to a vertex component in the structure. The optional second argument edge_frame must be the qualified name of an edge frame in xGT. Any other value for vertex or edge_frame results in an error.

Degree computations without the optional edge frame name are global, in the sense that the degree of the bound vertex is computed across edge frames incident on the owning vertex frame in xGT. When using the optional edge frame name argument, the degree computation becomes relative to that particular edge frame. That is, the degree of the vertex is computed only for edges of the named frame.

The degree functions can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.

Examples are as follows:

WHERE indegree(a) = 10
WHERE outdegree(b) > 10
WHERE (outdegree(c, relations__FriendOf) + outdegree(c, career__WorksFor)) < 5
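The difference between a global degree and a degree relative to one edge frame can be seen in this Python sketch over an invented edge list. The data and the Python functions are illustrative stand-ins for the TQL functions of the same names.

```python
# Hypothetical edges as (edge_frame, source, target).
edges = [
    ("relations__FriendOf", "ann", "bob"),
    ("relations__FriendOf", "ann", "cat"),
    ("career__WorksFor", "ann", "acme"),
    ("relations__FriendOf", "bob", "ann"),
]

def outdegree(vertex, edge_frame=None):
    """Count outgoing edges, optionally restricted to one edge frame."""
    return sum(1 for frame, src, _ in edges
               if src == vertex and edge_frame in (None, frame))

def indegree(vertex, edge_frame=None):
    """Count incoming edges, optionally restricted to one edge frame."""
    return sum(1 for frame, _, dst in edges
               if dst == vertex and edge_frame in (None, frame))

print(outdegree("ann"), outdegree("ann", "career__WorksFor"), indegree("ann"))  # 3 1 1
```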

4.6.3.6. Function to Enforce Distinct Vertices

By default xGT and TQL do not impose restrictions on the identity of the vertices and edges in a structure. In particular, cycles are allowed (graph paths from one vertex to the same vertex) and will be reported if present in the data. There are cases where the identity of the vertices in a query is not important, but there are also cases in which at least some of the vertices must be different from each other.

It is easy to express distinct vertices with just two: a <> b, but if we have more – say a, b, c and d – then expressing all the pairwise constraints becomes tedious and error-prone: a <> b AND a <> c AND a <> d AND b <> c AND b <> d AND c <> d. For this reason, TQL provides a shortcut:

Unique Vertices Function

Name                          Description
unique_vertices(vertex_list)  Enforces that a set of vertices are all distinct.

The function unique_vertices() takes a list of bound vertex variables. Any other input value results in an error. It can only be used as part of a WHERE clause with its arguments being the bound variables for vertices that must be distinct from each other. xGT automatically generates all the necessary constraints and adds them to any other user-provided constraints in the query.
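The pairwise expansion that unique_vertices() saves the user from writing can be reproduced with a short Python sketch. The helper name is hypothetical; it only generates the constraint text and says nothing about how xGT implements the function internally.

```python
from itertools import combinations

def unique_vertices_constraints(variables):
    """Expand a list of vertex variables into pairwise <> constraints."""
    return " AND ".join(f"{a} <> {b}" for a, b in combinations(variables, 2))

print(unique_vertices_constraints(["a", "b", "c", "d"]))
# a <> b AND a <> c AND a <> d AND b <> c AND b <> d AND c <> d
```

Four variables already need six constraints, and the count grows quadratically with the number of vertices.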

4.6.4. Constant Expressions

xGT and TQL support expressing numerical and string constants directly in a query. TQL also provides mechanisms for expressing constraints based on date, time, datetime, and IPv4 address types. The functions date(), time(), datetime() and ipaddress() let the TQL query represent constants of the corresponding types built from string expressions in an appropriate format.

The following are the formats that are supported for each constant type:

  • date(): The string must be in the format YYYY-MM-DD.

  • time(): The string must be in the format HH:MM:SS or HH:MM:SS.S where the fractional part must be at least 1 digit, but no more than 6 digits. Time zones are supported as input but will be converted to Coordinated Universal Time (UTC) and the time zone will not be retained. Time zones are represented as either Z for UTC or by an offset containing an hour part and optional minute part. The hour part must be between -14 and 14 and the minute part must be in 15-minute increments. Valid formats are HH:MM:SSZ, HH:MM:SS+HH:MM, HH:MM:SS-HH:MM, HH:MM:SS+HH, and HH:MM:SS-HH.

  • datetime(): The string must be in the format of a date followed by a time separated by either a T or a space. The date and time must follow the formats given above. Examples of valid formats are YYYY-MM-DDTHH:MM:SS, YYYY-MM-DD HH:MM:SS.µµµµµµ, YYYY-MM-DDTHH:MM:SS.µµµµµµ+HH:MM.

  • ipaddress(): The string must be in the format NUM.NUM.NUM.NUM where each dot-separated value is a number between 0 and 255.

Examples of the use of these constant expressions are as follows:

WHERE a.date > date("2018-01-01")
WHERE a.time = time("16:00:00.000000")
WHERE b.datetime <> datetime("2017-12-31T00:00:00")
WHERE b.ipaddr = ipaddress("192.168.1.1")

Note that =, <>, >, <, >= and <= operators are supported for date, time, and datetime properties and constants. However, IPv4 addresses only support = and <> comparisons.
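The formats above are close to what Python's standard library accepts, so a short Python sketch can show what normalizing an offset to UTC and validating an IPv4 string look like. Python's parsers are used here purely as stand-ins; they are not xGT's parsers and may differ in edge cases.

```python
from datetime import datetime, timezone
import ipaddress

# An offset is accepted on input, but the value is normalized to UTC
# and the offset itself is not retained.
local = datetime.fromisoformat("2017-12-31T16:00:00.250000+02:00")
as_utc = local.astimezone(timezone.utc).replace(tzinfo=None)
print(as_utc.isoformat())   # 2017-12-31T14:00:00.250000

def is_valid_ipv4(text):
    """Check the NUM.NUM.NUM.NUM format with each part between 0 and 255."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False

print(is_valid_ipv4("192.168.1.1"), is_valid_ipv4("256.1.1.1"))   # True False
```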

4.6.5. CASE Expressions

xGT and TQL support the CASE clause which provides conditional expressions. There are two types of CASE expressions: simple and generic. The simple form compares values to a test expression while the generic form evaluates a series of conditional expressions.

CASE expressions can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.

4.6.5.1. Simple CASE Expressions

The simple CASE expression has the following form:

CASE test
  WHEN value THEN result
  [WHEN value THEN result]
  [ELSE default]
END

Each of test, value, result, and default is an expression. The values are compared against the test expression in order until one is found that is equal to it. If a value is found equal to the test expression, the result associated with that value is returned. If no value is equal to the test expression, the default is returned. If no value is equal to the test expression and no default is given, null is returned.

All of the result expressions and the default expression must be of the same type. The test expression and all of the value expressions must be of the same type.

4.6.5.2. Generic CASE Expressions

The generic CASE expression has the following form:

CASE
  WHEN predicate THEN result
  [WHEN predicate THEN result]
  [ELSE default]
END

Each of predicate, result, and default is an expression. The predicates are evaluated in order until one is found to be true. If a predicate is true, the result associated with that predicate is returned. If no predicate is true, the default is returned. If no predicate is true and no default is given, null is returned.

All of the result expressions and the default expression must be of the same type. All of the predicate expressions must be of boolean type.

4.6.5.3. Examples

This example uses a simple case expression in a RETURN clause:

MATCH (v:graph__VertexFrame)
WHERE v.ID < 10
RETURN v.ID,
       CASE v.ID
         WHEN 2 THEN 1
         ELSE 0
       END AS result

The example returns a row for each vertex with an ID < 10 containing two columns. The first column is the ID. The second column is 1 if the ID is 2 and 0 otherwise.

This example uses a generic case expression in a WITH clause:

MATCH (v:graph__VertexFrame)
WHERE v.ID < 10
WITH v,
     CASE
       WHEN v.ID % 2 = 1 THEN "odd"
       ELSE "even"
     END AS type
WHERE type = "even"
RETURN v.ID

The example returns a row for each vertex with an even ID less than 10 with a single column holding the ID.
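The evaluation order of both CASE forms can be modeled in Python as below. The functions simple_case and generic_case are hypothetical helpers written for this illustration, and the vertex IDs are assumed to be the integers 0 through 9.

```python
def simple_case(test, whens, default=None):
    """CASE test WHEN value THEN result ... [ELSE default] END"""
    for value, result in whens:
        if test == value:
            return result
    return default               # None (null) when no ELSE is given

def generic_case(whens, default=None):
    """CASE WHEN predicate THEN result ... [ELSE default] END"""
    for predicate, result in whens:
        if predicate:
            return result
    return default

ids = range(10)                  # stands in for vertices with v.ID < 10
print([simple_case(i, [(2, 1)], 0) for i in ids])
# [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print([i for i in ids if generic_case([(i % 2 == 1, "odd")], "even") == "even"])
# [0, 2, 4, 6, 8]
```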

4.6.6. Cypher Parameters

xGT and TQL support user-defined parameters indicated in a query by a variable name starting with $. The values of the parameters are supplied via a Python dictionary at the time of running or scheduling a job via run_job() or schedule_job().

Cypher parameters can be used in the following ways:

  • For literals and expressions:

MATCH (c:corp__Companies)
WHERE c.num_executives >= $param

conn.run_job(query, parameters = { "param": 10 })

Python boolean, float, integer, and string types are automatically converted to the appropriate Cypher type. If the parameters are intended to be date, time, datetime, or ipaddress values, the appropriate casting function must be applied to the parameter in the query itself:

MATCH (c:corp__Companies)
WHERE c.date_founded = DATE($date)

conn.run_job(query, parameters = { "date": "2019-02-10" })

  • String pattern matching:

MATCH (c:corp__Companies)
WHERE c.company_name STARTS WITH $name

conn.run_job(query, parameters = { "name": "Trov" })

  • In the creation of an object:

CREATE (c:corp__Companies { company_name : $name, date_founded: DATE($date) })

conn.run_job(query, parameters = { "name": "NewCompany", "date": "2021-01-01" })

  • In the setting of object properties:

MATCH (c:corp__Companies)
WHERE c.company_name = "NewCompany"
SET c.num_executives = $param

conn.run_job(query, parameters = { "param": 10 })

  • For skip and limit values:

MATCH (c:corp__Companies)
WHERE c.num_executives >= 10
RETURN c
SKIP $s
LIMIT $l

conn.run_job(query, parameters = { "s": 10, "l": 5 })
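A minimal Python sketch of how the query text and the parameter map fit together is shown below. To keep it runnable without a server, it uses a hypothetical RecordingConnection class that only records the call; with a real xGT connection, the same run_job(query, parameters=...) call shown in the examples above would be used instead. The parameter names are invented for the example.

```python
import re

class RecordingConnection:
    """Hypothetical stand-in for an xGT connection; it only records the call."""
    def __init__(self):
        self.calls = []

    def run_job(self, query, parameters=None):
        self.calls.append((query, parameters))


query = """
MATCH (c:corp__Companies)
WHERE c.num_executives >= $min_execs AND c.date_founded > DATE($since)
RETURN c.company_name
SKIP $s
LIMIT $l
"""
parameters = {"min_execs": 10, "since": "2019-02-10", "s": 10, "l": 5}

# Every $name used in the query should have a value in the map.
missing = set(re.findall(r"\$(\w+)", query)) - set(parameters)
print(missing)   # set()

conn = RecordingConnection()
conn.run_job(query, parameters=parameters)
```

Checking for missing names before submitting the job is a convenience of this sketch, not something the text above says xGT requires.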

4.6.7. Modifications, Additions and Deletions

These optional parts of a query allow for changes to happen to the graph as part of its execution.

4.6.7.1. Property Modifications

Modifications involve changing the values of properties in specific vertex and/or edge instances that match the graph pattern declared in the query. Note that the properties themselves must have been declared as part of the respective frame creation.

The power of these property modifications is that the new values can be computed within the query itself. That is, the user can programmatically determine what to change the values to as part of the query. A simple example of this would be computing a duration for each edge, assuming that all of the edges have a start_time and end_time property:

MATCH ()-[e:graph__EdgeFrame]->()
SET e.duration = e.end_time - e.start_time

In this case, all edges belonging to the frame EdgeFrame are visited (there are no WHERE constraints) and duration is computed from two of the other properties on that edge. Note that the schema for EdgeFrame must already include a duration property (even if the values are null).

Note that property modification requires a structure description (MATCH pattern) for the query since it must first find the elements in the graph before being able to modify them.

4.6.7.2. Topology Additions

Additions to the graph can also be done as part of query execution. It is possible to add new vertices and/or edges as part of a query.

Adding a new vertex is the simpler case: it requires the name of the vertex frame to add the vertex to, along with values for at least the key properties of the vertex:

MATCH ()-[e:graph__EdgeFrame]->()
CREATE (v:graph__VertexFrame { id: e.source_id + 1000, data: "test" })

In this simple example a new vertex is created for each source endpoint of an edge in the EdgeFrame frame. The value of the vertex key is computed from the key of the source endpoint of the edge. This example assumes that the computed vertex key (id) is outside the range of existing vertices in the vertex frame. Note that it is an error to try and create a vertex with the same key value as an existing vertex in the same frame.

In addition to supporting the CREATE keyword, TQL supports the MERGE keyword which indicates to the system that the vertex should be created if it does not exist or retrieved from the current data store if it does. As with the CREATE keyword the MERGE keyword requires the specification of at least the key properties for the merged vertex. The example below illustrates the use of the MERGE keyword when the non-existence of entries with the same key value on the vertex frame cannot be guaranteed.

MATCH ()-[e:graph__EdgeFrame]->()
MERGE (v:graph__VertexFrame { id: e.source_id + 1000, data: "test" })
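
The difference between CREATE and MERGE can be sketched outside of TQL with a toy model. The Python below is only an illustration of the behavior described above, not xGT code: the vertex frame is modeled as a dictionary keyed by the vertex key, and the helper names create_vertex and merge_vertex are hypothetical. The text does not say whether non-key properties passed to MERGE affect an existing vertex, so the sketch assumes the existing vertex is left untouched.

```python
# Toy model of a vertex frame: a dict mapping key value -> property dict.
# Illustration only; create_vertex and merge_vertex are hypothetical helpers.

def create_vertex(frame, key, **props):
    """Model of CREATE: adding a vertex with a duplicate key is an error."""
    if key in frame:
        raise KeyError(f"vertex with key {key!r} already exists")
    frame[key] = dict(props)
    return frame[key]

def merge_vertex(frame, key, **props):
    """Model of MERGE: reuse the vertex if the key exists, else create it.

    Assumption: an existing vertex is returned unchanged.
    """
    if key not in frame:
        frame[key] = dict(props)
    return frame[key]

frame = {1000: {"data": "original"}}

merged = merge_vertex(frame, 1000, data="test")    # existing vertex is retrieved
created = create_vertex(frame, 1001, data="test")  # new key, so creation succeeds

try:
    create_vertex(frame, 1000, data="test")        # duplicate key
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True
```

In this model the MERGE call never fails on an existing key, which is why it is the safer choice when the key may already be present.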

Adding an edge involves matching existing vertices to use as endpoints for the edge being created:

MATCH (a:graph__VertexFrame)-[:graph__EdgeFrame]->(b:graph__VertexFrame)
WHERE a.id > 0 AND b <> a
CREATE (a)-[new_edge:graph__EdgeFrame { float_count: -0.5, data: 10 }]->(a)

In this example, we create a self-edge from a to itself and add it to the edge frame EdgeFrame, for distinct pairs of vertices a and b, where a’s identifier is positive and there is an edge already in EdgeFrame from a to b. Note that the specification of the properties of the newly created edge cannot include the values of key properties – those are derived from values of the key properties of the endpoints.

The endpoints of the created edge can be any pair of vertices that is compatible with the declaration of the edge frame (that is, they are of the right type for the source and target of the edge frame). The only requirement is that the endpoints have to be matched as part of the query. For convenience, the direction of the edge can be specified in either sense: source-to-target or target-from-source, with the resulting created edge being the same.

While xGT does not currently support the creation of the vertex endpoints of an edge and the edge itself in the same query statement, combining MERGE on a vertex and CREATE on the edge can be used to achieve the same effect:

MATCH (a:graph__VertexFrame)
WHERE a.id >= 0
MERGE (b:graph__VertexFrame { id: -1, data: "negative_vertex" })
CREATE (b)-[new_edge:graph__EdgeFrame { float_count: -0.5, data: 10 }]->(a)

In this example, we match an existing vertex a with non-negative key property id, then create a vertex with key -1 (if necessary), and finally use that vertex to create a new edge from it to the originally matched vertex a. Note that MERGE is only supported for vertices because multiple edges can exist for the same key property values. It would be ambiguous if more than one edge matched.

The following example illustrates the optionality of the MATCH pattern (or structure description) to add elements to the graph:

MERGE (a:graph__VertexFrame { id: 0, data: "target vertex" })
MERGE (b:graph__VertexFrame { id: -1, data: "negative vertex" })
CREATE (b)-[new_edge:graph__EdgeFrame { float_count: -0.5, data: 10 }]->(a)

In this case, we MERGE two possibly new vertices with keys 0 and -1 and then use those vertices to create a new edge between them. The MATCH pattern is optional here because we know the exact constant key values (the id property in this example) that the vertices should have. The limitation of omitting the MATCH pattern is that only the vertices and edges literally written in the query text are created. When a MATCH pattern is used, the number of vertices and edges created is driven by the number of elements in the answer set of the query.

4.6.7.3. Topology Deletions

Deletions from the graph can also be done as part of query execution. It is possible to delete vertices and/or edges as part of a query.

Deleting edges is a simpler process than deleting vertices, because it is localized to the affected edge frame:

MATCH ()-[e:graph__EdgeFrame]->()
WHERE e.sid = 0 AND e.tid = -1
DELETE e
RETURN e

In this case, the matched edges are removed from the edge frame as part of the query execution. Note that it is still possible to return values from the deleted edges since deletion occurs after matches have already been recorded (as in this example where e is returned).

Deleting a vertex from a vertex frame is a more involved process since there could still be edges incident on that vertex across a collection of different frames.

MATCH (a:graph__VertexFrame)
WHERE a.id = 0
DETACH DELETE a

TQL uses DETACH DELETE (as opposed to just DELETE) to delete a vertex and all adjacent edges. In this case, even if just the vertex frame VertexFrame is specified in the query, xGT will have to look into all edge frames where VertexFrame is a source or a target and delete edges from them that are incident on the matching vertex a.

Consider the case where a user does not have visibility (see Access Control) to all of the incident edges of a vertex. By issuing DETACH DELETE on that vertex, the user has requested the deletion of at least one edge that they do not have permission to delete. Thus, the query will fail and the transaction will roll back with a security violation.
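
The all-or-nothing behavior of DETACH DELETE described above can be sketched with a small model. This is a plain-Python illustration under stated assumptions, not xGT's implementation: edge frames are lists of (source, target) pairs, visibility is a set of edges the user may see, the frame name OtherFrame is hypothetical, and a PermissionError stands in for the security violation.

```python
def detach_delete(vertices, edge_frames, key, visible_edges):
    """Remove vertex `key` and every incident edge, or change nothing at all.

    vertices:      set of vertex keys (one vertex frame)
    edge_frames:   dict of frame name -> list of (source, target) edges
    visible_edges: set of (frame name, source, target) the user may see
    """
    incident = [
        (name, s, t)
        for name, edges in edge_frames.items()
        for (s, t) in edges
        if key in (s, t)
    ]
    # Check everything first so that a failure leaves the graph untouched,
    # mimicking a transaction that rolls back.
    hidden = [e for e in incident if e not in visible_edges]
    if hidden:
        raise PermissionError(f"{len(hidden)} incident edge(s) not visible")
    for name, edges in edge_frames.items():
        edge_frames[name] = [(s, t) for (s, t) in edges if key not in (s, t)]
    vertices.discard(key)

vertices = {0, 1, 2}
edge_frames = {"EdgeFrame": [(0, 1), (1, 2)], "OtherFrame": [(2, 0)]}

# A user who can see only the edges in EdgeFrame cannot detach-delete vertex 0.
limited = {("EdgeFrame", 0, 1), ("EdgeFrame", 1, 2)}
try:
    detach_delete(vertices, edge_frames, 0, limited)
    failed = False
except PermissionError:
    failed = True
snapshot_after_failure = (set(vertices), {k: list(v) for k, v in edge_frames.items()})

# A user who can see every incident edge can.
everything = limited | {("OtherFrame", 2, 0)}
detach_delete(vertices, edge_frames, 0, everything)
```

Note how the deletion has to look at every edge frame, even though the query itself names only the vertex frame.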

Note that topology deletions do require a structure description (MATCH pattern) for a query since the query must first find the requested elements before deleting them from the graph. It is not possible to specify deletion of a vertex or edge without a MATCH pattern in the query.

4.6.7.4. Visibility of Changes to the Graph

Property modifications, additions, and deletions are not immediately visible within the same query section in which they are executing. They are applied to the graph and all its frames as part of the final steps in query execution after the pattern matching and query results have been recorded.

They are fully available in subsequent queries and, in queries with multiple sections, in later query sections (see Visibility of Changes under Multiple Query Sections). More information is provided in the document titled Transactions in xGT.
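
The two-phase behavior (record matches first, apply changes last) explains why the earlier DELETE e RETURN e example can still return the deleted edges. A minimal Python sketch of that ordering follows. It is a model only: the edge dictionaries, the duration values, and the helper name delete_and_return are hypothetical.

```python
# Hypothetical edge rows; sid and tid mirror the property names used above.
edges = [
    {"sid": 0, "tid": -1, "duration": 4},
    {"sid": 0, "tid": 7, "duration": 9},
    {"sid": 0, "tid": -1, "duration": 2},
]

def delete_and_return(edges, predicate):
    """Two-phase model: record the matches first, apply the deletion last."""
    matches = [e for e in edges if predicate(e)]        # phase 1: pattern matching
    results = [dict(e) for e in matches]                # rows for RETURN e
    edges[:] = [e for e in edges if not predicate(e)]   # phase 2: apply changes
    return results

rows = delete_and_return(edges, lambda e: e["sid"] == 0 and e["tid"] == -1)
```

After the call, rows still holds the two deleted edges while edges contains only the remaining one.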

4.6.8. What Is an Answer to a TQL Query?

Now that we have discussed how to describe a structure as well as property constraints on the graph elements, we can define what an answer is to a TQL query:

  • A match to a TQL query is a collection of graph elements that satisfy both the topological and type constraints in the structure as well as the property constraints in the WHERE clause. The graph elements must be connected in the manner specified by the structure.

  • An answer is described as a sequence of information gathered from a match. This can be thought of as a row in a table.

  • A partial answer set in a TQL query is the set of all answers found for a particular query section.

  • An answer set to a TQL query is the set of all answers found in the graph data for the entire query.

4.6.9. Describing the Answer to a TQL Query

The RETURN keyword in a TQL query describes what the answer set should contain for each match. Each answer will be recorded as a row in the results table, and the sequence of expressions in the RETURN clause will each be returned as a column.

The returned expressions can be any combination of properties from bound variables in the query, constants, TQL operators, and function results. Each column can also be renamed to an alias name using the AS keyword. A convenient shorthand for returning all properties in a graph element is to list its bound variable as part of the RETURN clause.

Examples of RETURN clauses are:

RETURN a.name, (a.age + 10) AS ageplus, indegree(a) AS indeg

This query returns three columns with the names name, ageplus, and indeg.

RETURN a, b, c, d

This query returns all the properties of graph element a followed by all the properties of b, c, and d.

If a structure description (MATCH pattern) is not provided by a query it is possible to use a RETURN clause to produce constant values in a results table:

RETURN 10 AS ten, NULL AS nothing, "abc" AS abc

Or to return properties from recently created vertices and edges:

MERGE (v0:graph__positiveVertex { id : 5 })
MERGE (v1:graph__negativeVertex { id : -1 })
CREATE (v0)-[e0:graph__forwardEdge {}]->(v1)
RETURN v0.id, v0.data, v1.id, e0.sourceID, e0.targetID

4.6.10. Modifying How the Answer Set Is Reported

As we have described in the previous sections, the result of a TQL query is a table with one row per answer in the answer set. By default, xGT inserts resulting rows into this table in the order that it finds them, which can be arbitrary and vary from execution to execution.

In many cases, this is satisfactory since the content of the answer set may be more important than the way it is represented in the results table. In other cases, a different representation of the results table is required.

TQL supports the following solution modifiers to the answer set as represented in the results table:

  • DISTINCT: Requests that only unique entries in the answer set be inserted as rows into the table.

  • ORDER BY: Requests a sorted representation of the answer set, where the rows in the table are sorted by a value. This value can be a combination of multiple expressions and need not be an existing expression in the RETURN clause. Each expression in the ORDER BY clause can indicate whether to sort in ascending (by default) or descending (by adding the keyword DESC) order.

  • LIMIT: Requests that the xGT server produce only the first k rows of the answer set.

  • SKIP: Requests that the xGT server skip over the first k rows of the answer set.

Solution modifiers are particularly useful in combination. For example, ORDER BY combined with LIMIT reports the top k rows according to some sorting criterion, which is helpful during initial exploration of a data set.
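
The effect of the four modifiers on a results table can be illustrated with a small Python model. The rows and the helper name apply_modifiers are hypothetical, and the order of application (DISTINCT, then ORDER BY, then SKIP, then LIMIT) is an assumption that follows the usual Cypher convention rather than anything stated above.

```python
# Hypothetical answer set with columns (name, age); one row is duplicated.
rows = [
    ("carol", 41), ("alice", 29), ("bob", 35),
    ("alice", 29), ("dave", 52), ("erin", 35),
]

def apply_modifiers(rows, distinct=False, order_by=None, descending=False,
                    skip=0, limit=None):
    """Apply the modifiers in the order DISTINCT, ORDER BY, SKIP, LIMIT."""
    if distinct:
        rows = list(dict.fromkeys(rows))   # drop duplicates, keep first occurrence
    if order_by is not None:
        rows = sorted(rows, key=order_by, reverse=descending)
    rows = rows[skip:]
    if limit is not None:
        rows = rows[:limit]
    return rows

# "Top 3 by age": the model's analogue of DISTINCT with ORDER BY age DESC LIMIT 3.
top3 = apply_modifiers(rows, distinct=True, order_by=lambda r: r[1],
                       descending=True, limit=3)

# The next page of the same ordering: SKIP 3 LIMIT 3.
page2 = apply_modifiers(rows, distinct=True, order_by=lambda r: r[1],
                        descending=True, skip=3, limit=3)
```

Without the sort, the rows selected by skip and limit would depend on the arbitrary order in which results were found, which is why LIMIT and SKIP are usually paired with ORDER BY.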

4.6.11. Specifying the Results Table

The answer to a query is returned in the form of a table frame. TQL requires specifying the qualified name of the table frame in which to return the query results using the INTO keyword. Here is a simple example that filters the edges in the edge frame graph__EdgeFrame and returns the results into the table frame results__QueryResults:

MATCH ()-[e:graph__EdgeFrame]->()
WHERE e.sid = 0 AND e.tid = -1
RETURN e
INTO results__QueryResults

If the table named in the INTO clause doesn’t already exist, it will be created and populated with the results of the query. If the table exists and the table schema types match the types of the implied schema for the RETURN clause, the query results will be appended to the already existing table. If the table exists and the table schema types don’t match the types of the implied schema for the RETURN clause, xGT will report an error.
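
The three cases for the INTO table (create, append, or report an error) can be summarized in a short Python model. This is only a sketch of the rules above: the helper name store_results is hypothetical, a schema is modeled as a tuple of Python types, and a TypeError stands in for the error xGT reports.

```python
def store_results(tables, name, schema, rows):
    """Model of INTO: create the table, append to it, or reject a mismatch.

    tables: dict of table name -> {"schema": tuple of types, "rows": list}
    schema: tuple of column types implied by the RETURN clause
    """
    if name not in tables:
        tables[name] = {"schema": tuple(schema), "rows": []}
    elif tables[name]["schema"] != tuple(schema):
        raise TypeError(f"schema of {name} does not match the RETURN clause")
    tables[name]["rows"].extend(rows)

tables = {}
store_results(tables, "results__QueryResults", (int, int), [(0, -1)])  # created
store_results(tables, "results__QueryResults", (int, int), [(0, -1)])  # appended

try:
    store_results(tables, "results__QueryResults", (int, str), [(0, "x")])
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True
```

Because matching schemas lead to an append, running the same query twice into the same table doubles its rows in this model.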

4.6.12. Querying over a Table Frame

TQL supports the additional functionality of querying against a table. This can be used, for instance, to do some additional exploration of a result table acquired from a graph query.

A table query is performed by using the MATCH keyword with just a single element (no edges) using the syntax for vertices: parentheses (). Any of the constraints or aggregation functions can be used in such a query. For example, we could look at the result of a previous query and count all results of persons over a certain age:

MATCH (row:graph__Result)
WHERE row.age > 50
RETURN count(*)

4.6.13. Multiple Query Sections

This section describes TQL queries that contain more than one query section. Note that most examples shown in the documentation are TQL queries with only one query section.

A query section can contain a structure (MATCH), constraints (WHERE), property modifications (SET), and topology modifications (MERGE, CREATE, DELETE, and DETACH DELETE). The last query section can also return results (RETURN and INTO).

In a query with multiple sections, the WITH keyword is used to separate query sections and can carry information from one section to the next. WITH and RETURN clauses function similarly. While a RETURN clause optionally follows the last query section to describe the results returned, a WITH clause is required after each query section except for the last one and describes the results produced by that query section. The WITH clause carries results to the next query section and for each result row carried, the next query section is executed. Solution modifiers and aggregation functions can also be applied in the WITH clause.

The example below shows a TQL query with three sections:

MATCH (v)-[e:graph__EdgeFrame]->()
WHERE e.duration < 10
SET e.duration = 0
WITH v
MATCH (u:graph__VertexFrame)
WHERE outdegree(u, graph__EdgeFrame) > 100
WITH v, u
LIMIT 300
MATCH (t)-[:graph__EdgeFrame]->()
RETURN v.id, u.id, t.id

The first query section consists of:

MATCH (v)-[e:graph__EdgeFrame]->()
WHERE e.duration < 10
SET e.duration = 0

This query section works the same as all the previously shown TQL queries with only one query section. The matches found are carried to the next query section using a WITH clause. For each result row carried by the WITH expression, the next query section is performed, which searches for all instances of the vertex pattern:

MATCH (u:graph__VertexFrame)
WHERE outdegree(u, graph__EdgeFrame) > 100

In this toy example, the patterns in each section do not depend on information carried in the WITH expression. Therefore, the same process is repeated in the second section for each result produced by the first section. For example, if the graph contains 20 edges of type graph__EdgeFrame with duration less than 10, then the second search is performed 20 times. Furthermore, if the graph also contains 30 vertices of type graph__VertexFrame with out degree greater than 100, then there will be 20 * 30 = 600 matched rows found by the second query section.

Because the WITH expression that follows the second query section has the modifier LIMIT 300 attached, the third query section is only executed for 300 of the 600 matches found in the second query section. For each of the 300 carried results, the third query section executes the search:

MATCH (t)-[:graph__EdgeFrame]->()
RETURN v.id, u.id, t.id

If there are 500 edges of type graph__EdgeFrame in total, then the third query section will produce 300 * 500 = 150,000 result rows, which are returned using the RETURN keyword.
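
The row counts in this walkthrough can be checked with a few lines of Python. The sketch below only models the counting, using the numbers from the text (20 short edges, 30 high-outdegree vertices, 500 edges in total). The variable names are illustrative, and which 300 rows survive the LIMIT is arbitrary in practice.

```python
from itertools import islice, product

short_edges = range(20)     # rows produced by the first section (one per match)
busy_vertices = range(30)   # vertices matched by the second section
all_edges = range(500)      # edges matched by the third section

# Section 2 runs once per row carried from section 1.
section2 = list(product(short_edges, busy_vertices))

# WITH v, u LIMIT 300 carries only 300 of those rows forward.
carried = list(islice(section2, 300))

# Section 3 runs once per carried row.
section3_count = sum(1 for _ in product(carried, all_edges))
```

The multiplication at each step is the reason a LIMIT on a WITH clause can reduce the work of later sections so sharply.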

Note that this query is a very simple example in which each query section does not actually depend on the previous one because the information carried by the WITH clause is not used. It is shown to illustrate the basic mechanics of multiple query sections: the query in each section is fully performed for each result produced by the previous query section. Using information carried by the WITH clause is discussed next.

4.6.13.1. Carrying Information Using WITH

In the example above, the WITH keyword connected the query sections, but none of the information carried over was actually used by the later sections. Typically, the WITH keyword is used to carry information that will be used in the next query section. For example, a variable bound to a vertex or edge may be carried and used in the next query section. In this example, it is used in the MATCH and RETURN clauses of the next section:

MATCH (v)-[e:graph__EdgeFrame]->()
WHERE e.duration < 10
WITH v
MATCH (v)-[:graph__EdgeFrame]->(u)
WHERE outdegree(u, graph__EdgeFrame) > 100
RETURN v.id, u.id

For each pattern matched by the first query section, the vertex v that is part of the pattern is carried in the WITH expression to the following MATCH statement. In the second query section, for each v carried, the search only looks for instances of the pattern that contain that instance of v.

The results returned by the query above are equivalent to those returned by the following query with only one query section:

MATCH (u)<-[:graph__EdgeFrame]-(v)-[e:graph__EdgeFrame]->()
WHERE e.duration < 10 AND outdegree(u, graph__EdgeFrame) > 100
RETURN v.id, u.id

Note that when a carried bound variable is referenced in the next query section, the frame name can be included or not and this does not change the behavior. The statement WITH v MATCH (v)-[:graph__EdgeFrame]->() is equivalent to WITH v MATCH (v:graph__VertexFrame)-[:graph__EdgeFrame]->().

If a variable occurs in multiple query sections, but was not carried using the WITH keyword, it refers to a new element unrelated to the previous use of the variable. This is shown in the example below for u:

MATCH (v)-[:graph__EdgeFrame]->(u)
WITH v
MATCH (v)-[:graph__EdgeFrame]->(u)

The variable u in the first MATCH statement and the variable u in the second MATCH statement do not refer to the same vertex. While reusing a variable name that isn’t carried by a WITH clause in a later query section is legal, it can make queries harder to understand and is not recommended.

In addition to a vertex or edge, a WITH clause can carry any value available in the previous query section. The example below shows carrying one edge entity and one vertex property, both of which are then referenced in the next section.

MATCH (v)-[e:graph__EdgeFrame]->()
WITH e, v.location
MATCH ()-[f:graph__EdgeFrame]->(u)
WHERE u.location = v.location AND f.duration < e.duration

An aggregate value computed in one query section can be carried to the next. In the example below, the matches found in the first query section are aggregated over each unique value of the vertex v. For each unique v, both v and the number of matches found for that v are carried into the next section using the WITH keyword. In this query, the second section will execute once per unique value of v found in the first section. This type of grouping operation is explained in Aggregation Functions. The second query section can then use both the vertex v in its MATCH structure and the corresponding value of count(*) in the constraints. Note that to carry an expression over to the next query section an alias is required. In this example, the alias freq is needed to be able to refer to the result of the count(*) operation.

MATCH (v)-[e:graph__EdgeFrame]->()
WITH v, count(*) AS freq
MATCH (v)-[:graph__EdgeFrame]->(u)
WHERE outdegree(u) > freq
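
The grouping step can be pictured with a small Python model. The edge list below is hypothetical and the code only mimics the semantics described above: the first section produces one (v, freq) row per unique source vertex, and the second section runs once per such row.

```python
from collections import Counter

# Hypothetical edges of a single edge frame, as (source, target) pairs.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("c", "a"), ("c", "b")]

# Section 1: WITH v, count(*) AS freq groups the matches by source vertex v.
freq = Counter(source for source, _ in edges)

# Model of outdegree(u); a Counter reports 0 for vertices with no outgoing edges.
outdegree = Counter(source for source, _ in edges)

# Section 2: executed once per (v, freq) row, keeping targets u of v
# whose outdegree exceeds the carried count.
rows = sorted(
    (v, u)
    for v, f in freq.items()
    for source, u in edges
    if source == v and outdegree[u] > f
)
```

Here three unique source vertices mean three executions of the second section, regardless of how many edges each of them has.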

4.6.13.2. Aliasing in the WITH Clause

Just like with the RETURN keyword, the values carried by a WITH can be aliased. A property of a vertex or edge can either be carried directly, as in WITH e.duration, or be aliased, as in WITH e.duration AS target_duration. If aliased, it can only be referenced in the next query section through the alias.

xGT does not allow aliasing of variables referring to a whole entity (vertex or edge). Those must be carried with the same name they have. The following is not valid due to the aliasing of v:

MATCH (v:graph__VertexFrame)
WITH v AS vother

Instead, the vertex v must be carried without an alias as shown below:

MATCH (v:graph__VertexFrame)
WITH v

An alias is required to refer to any expression that is not a whole entity (vertex or edge) or a property of an entity in the next query section. For example, it is not possible to carry the degree of a vertex using WITH outdegree(v) or to carry the sum of two values (WITH a + b). Instead, an alias must be used for these expressions, as shown below:

MATCH (v)-[e:graph__EdgeFrame]->()
WITH outdegree(v) AS min_degree, v.val + e.duration AS target_val
MATCH (u:graph__VertexFrame)
WHERE indegree(u) > min_degree AND u.val = target_val

4.6.13.3. Additional Filtering of WITH Partial Results

TQL’s WITH clause allows for filtering of results produced by a query section before starting the next section. To do this, include a WHERE clause after the WITH clause:

MATCH (v)-[e:graph__EdgeFrame]->()
WHERE e.duration < 10
WITH v, count(*) AS freq
WHERE freq > 100
MATCH (v)-[:graph__EdgeFrame]->(u)

Here the WHERE clause that follows the WITH clause further filters the results coming from the first query section: only rows whose freq value is greater than 100 are passed on to the next query section. Note that only variables (and aliases) carried by the WITH clause are available to filter on. Only those matching answers that pass the filter will be available to the second query section.

4.6.13.4. Visibility of Changes

If a query section contains property modifications or topology additions or deletions, these changes will be visible in any later query sections. In the query below, the new duration property value is visible in the second query section:

MATCH (v)-[e:graph__EdgeFrame]->()
WHERE e.duration < 10
SET e.duration = 0
WITH v
MATCH (v)-[f:graph__EdgeFrame]->(u)
WHERE f.duration = 0

In the following query, an edge is created in the first query section and will be visible in the next. This means that the degree computed in the second query section will include any new edges.

MATCH (v)-[e:graph__EdgeFrame]->()
WHERE e.duration < 10
CREATE (v)-[new_edge:graph__EdgeFrame { duration: 0 }]->(v)
WITH v
MATCH (v)
WHERE indegree(v) > 100

For any new vertices or edges created with MERGE or CREATE, xGT allows the properties of the new elements to be carried by the WITH keyword, but not the whole entity variable. This is currently a limitation of xGT’s implementation. For the example above in which new_edge is created, using WITH new_edge.duration is allowed, but not WITH new_edge.

4.6.13.5. Use Cases of Multiple Query Sections

One use of multiple query sections is to limit the branching of a pattern search. The example below searches for occurrences of a pattern with three edges, but only searches for the third edge for the 100 occurrences of the two-edge pattern with lowest duration property. This is done by adding solution modifiers ORDER BY and LIMIT to the WITH clause, causing only 100 results from the first query section to be carried to the next:

MATCH (v)-[e:graph__EdgeFrame]->(w)<-[:graph__EdgeFrame]-()
WITH v, e, w
ORDER BY e.duration LIMIT 100
MATCH (w)-[f:graph__EdgeFrame]->()

Another use is to compute aggregate functions in one query section and use them to filter results in a later query section, as shown here:

MATCH (v)-[e:graph__EdgeFrame]->(w)
WHERE e.duration < 10
WITH avg(e.duration) AS target_duration
MATCH ()-[f:graph__EdgeFrame]->()
WHERE f.duration < target_duration
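
The aggregate-then-filter idea can be modeled in a few lines of Python. The duration values are hypothetical. The point of the sketch is that the first section collapses to a single carried row (the average), which the second section then uses as a threshold.

```python
from statistics import mean

durations = [2, 4, 6, 12, 20]   # hypothetical duration values, one per edge

# Section 1: average duration over edges with duration < 10 (one carried row).
target_duration = mean(d for d in durations if d < 10)

# Section 2: all edges, filtered by the carried aggregate.
below_target = [d for d in durations if d < target_duration]
```

Since only one row is carried, the second section executes a single time in this model.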

4.6.13.6. Additional Rules

xGT has restrictions on some combinations of TQL clauses and the WITH keyword:

  • DISTINCT can be used for query sections producing properties, expressions and whole entity vertex variables. DISTINCT cannot be used currently for whole entity non-vertex variables.

  • DISTINCT can be used in RETURN clauses without this consideration.

  • The grouping idiom with a key given by a whole entity variable can only be used for vertex variables in a WITH clause: WITH v, count(*) AS cnt, min(v.property) AS small.

  • It is not currently possible to use a non-vertex whole entity variable in a grouping idiom as the key.

  • The grouping idiom does not have these restrictions for a RETURN clause.

Query sections and the WITH clause can be used with table queries:

MATCH (t:graph__MyTable)
WHERE t.property > 10
WITH t
MATCH (v:graph__VertexFrame)
WHERE v.id = t.property + 5

Note that table MATCH query sections can only have one element in the pattern (the table frame).

4.6.13.7. Performance Considerations

The use of the WITH clause and multiple query sections is a powerful tool to compose sophisticated queries in TQL. However, the user should be aware of certain considerations related to how those queries will perform on a system:

  • Each WITH clause is describing, in essence, a temporary table frame. It is possible to create large temporary tables that could take up resources in the xGT server (memory and compute time) for their processing.

  • Each query section induces an iteration over the rows produced in the preceding query section, thus any MATCH pattern in this query section will execute ONCE for each row produced.

4.6.14. Row Access Control in Queries

A TQL query will only access vertices, edges, or rows that the authenticated user initiating the query has permission to view. Frames that have row access control enabled may have security labels attached to each row and a row is only visible to a query if the user has the necessary security labels in their label set. The label set of a user will be configured by the administrator as described in Configuring Groups and Labels.

Row access means that the result of the same query may vary for different users. For example, when running a query to count the number of occurrences of a pattern, a user with more labels in their label set might see a higher count value than a user with fewer labels.

If a TQL query accessed frames with row security to produce a result, the result table will also have row security enabled. Each row of the result table is protected by the union of security labels found on any input row used to produce the resulting row. For example, for the simple query shown below, each output row has the same security labels as the corresponding input vertex row from the frame graph__VertexFrame.

MATCH (v:graph__VertexFrame)
WHERE v.id > 2000
RETURN v
ORDER BY v.age
INTO results__QueryResults

For the following query, each result row is obtained by accessing vertices v and w in graph__VertexFrame and an edge e in graph__EdgeFrame. Therefore, attached to each result row will be the union of row labels attached to the vertex v, the vertex w, and the edge e that contributed to the match.

MATCH (v:graph__VertexFrame)-[e:graph__EdgeFrame]->(w:graph__VertexFrame)
WHERE v.val = 2000 AND e.duration > 10
RETURN v.id AS vid, v.val AS vertex_value, e.duration AS duration
INTO results__QueryResults

For the aggregate query shown below, the labels of the single result row will be the union of labels attached to any edge in graph__EdgeFrame that the user has permission to see and the labels of any source or target vertex of such an edge in graph__VertexFrame. Note that in order to have permission to see an edge, the user must also have permission to see both its source and target vertex. If these frames have other elements not viewable by the user, those elements will not affect the labels attached to the result. For example, if the query accessed vertices in graph__VertexFrame with labels “label1”, “label3”, “label5” and accessed edges in graph__EdgeFrame with labels “label5”, “label6”, then the result row would have attached labels “label1”, “label3”, “label5”, “label6”.

MATCH (v:graph__VertexFrame)-[e:graph__EdgeFrame]->(w:graph__VertexFrame)
WHERE v.val = 2000 AND e.duration > 10
RETURN count(*)
INTO results__QueryResults
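
The label computation for the aggregate query amounts to a set union. The Python below reproduces the example from the text. How the labels are distributed over individual vertex and edge rows is a hypothetical choice made for illustration.

```python
# Labels on the rows the query accessed; the per-row split is hypothetical,
# but the overall label names match the example in the text.
vertex_row_labels = [{"label1"}, {"label3", "label5"}, {"label1", "label5"}]
edge_row_labels = [{"label5"}, {"label6"}]

# A count(*) result is a single row built from every accessed input row,
# so its label set is the union over all of them.
result_labels = set().union(*vertex_row_labels, *edge_row_labels)
```

Labels that appear on several input rows, such as label5 here, contribute only once to the result row.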

As described in section Access Control, the universe of possible labels that can be attached to any row of a frame is set during frame creation and cannot be changed. Therefore, if the result table has been created before running a TQL query, its row label universe must contain all labels that will be attached to any of its rows during the query. Otherwise, xGT will report an error. The maximum number of security labels in a frame’s row label universe is 128. Running a query that would produce a result table with more than 128 unique row labels will produce an error. This might occur, for example, in the count query above if there were 100 unique security labels attached to vertices of graph__VertexFrame accessed during the query and 100 different unique security labels attached to edges of graph__EdgeFrame accessed during the query. If the table named in the INTO clause doesn’t already exist, it will be created with the appropriate row label universe.
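
The arithmetic behind the limit can be sketched as follows. This is a plain-Python illustration using the numbers from the text. The label names and the helper fits_label_universe are hypothetical.

```python
MAX_ROW_LABELS = 128   # limit stated in the text

vertex_labels = {f"vertex_label_{i}" for i in range(100)}   # hypothetical names
edge_labels = {f"edge_label_{i}" for i in range(100)}       # hypothetical names

# The single result row of the count query needs the union of both sets.
needed = vertex_labels | edge_labels

def fits_label_universe(labels, limit=MAX_ROW_LABELS):
    """True if a result table could hold this many distinct row labels."""
    return len(labels) <= limit
```

With two disjoint sets of 100 labels the union has 200 entries, so the query in this scenario would exceed the limit even though each frame on its own stays below it.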