2.14. Query Results

Now that we have discussed how to describe a structure as well as property constraints on the graph elements, we can define what an answer to a TQL query is:

  • A match to a TQL query is a collection of graph elements that satisfy both the topological and type constraints in the structure as well as the property constraints in the WHERE clause. The graph elements must be connected in the manner specified by the structure.

  • An answer is described as a sequence of information gathered from a match. This can be thought of as a row in a table.

  • A partial results set in a TQL query is the set of all answers found for a particular query part.

  • A results set to a TQL query is the set of all answers found in the graph data for the entire query.

2.14.1. Describing the Query Results

The RETURN keyword in a TQL query describes what the results set should contain for each match. Each answer will be recorded as a row in the results table, and the sequence of expressions in the RETURN clause will each be returned as a column.

The returned expressions can be any combination of properties from bound variables in the query, constants, TQL operators, and function results. Each column can also be given an alias using the AS keyword. A convenient shorthand for returning all properties of a graph element is to list its bound variable in the RETURN clause.

Examples of RETURN clauses are:

RETURN a.name, (a.age + 10) AS ageplus, indegree(a) AS indeg

This query returns three columns with the names name, ageplus, and indeg.

RETURN a, b, c, d

This query returns all the properties of graph element a followed by all the properties of b, c, and d.
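The mapping from RETURN expressions to columns can be pictured with a small plain-Python model. This is only an illustrative sketch, not xGT code: the helper return_row and the sample data are hypothetical, and the model covers only the first RETURN example above.

```python
# Illustrative model only: each match becomes one row and each RETURN
# expression becomes one column. All names and data are hypothetical.

def return_row(a, indegree_of_a):
    """Model of: RETURN a.name, (a.age + 10) AS ageplus, indegree(a) AS indeg"""
    return {
        "name": a["name"],          # a.name is reported under the column name
        "ageplus": a["age"] + 10,   # expression renamed with AS
        "indeg": indegree_of_a,     # function result renamed with AS
    }

# Two hypothetical matches: (properties of a, in-degree of a)
matches = [
    ({"name": "Ann", "age": 30}, 2),
    ({"name": "Bob", "age": 41}, 0),
]

table = [return_row(a, deg) for a, deg in matches]
print(list(table[0].keys()))  # ['name', 'ageplus', 'indeg']
```

Each entry of table plays the role of one answer (row), and the dictionary keys play the role of the column names.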

If a query does not provide a structure description (MATCH pattern), a RETURN clause can still be used to produce constant values in a results table:

RETURN 10 AS ten, NULL AS nothing, "abc" AS abc

Or to return properties from recently created vertices and edges:

MERGE (v0:PositiveVertex { id : 5 })
MERGE (v1:NegativeVertex { id : -1 })
CREATE (v0)-[e0:ForwardEdge {}]->(v1)
RETURN v0.id, v0.data, v1.id, e0.source_id, e0.target_id

2.14.2. Modifying the Results

As we have described in the previous sections, the result of a TQL query is a table with one row per answer in the results set. By default, xGT inserts resulting rows into this table in the order that it finds them, which can be arbitrary and vary from execution to execution.

In many cases, this is satisfactory since the content of the results set may be more important than the way it is represented in the results table. In other cases, a different representation of the results table is required.

TQL supports the following solution modifiers to the results set as represented in the results table:

  • DISTINCT: Requests that only unique entries in the results set be inserted as rows into the table.

  • ORDER BY: Requests a sorted representation of the results set, where the rows in the table are sorted by a value. This value can be a combination of multiple expressions and need not be an existing expression in the RETURN clause. Each expression in the ORDER BY clause can indicate whether to sort in ascending order (the default) or in descending order (by adding the keyword DESC).

  • LIMIT: Requests that the xGT server produce only the first k rows of the results set.

  • SKIP: Requests that the xGT server skip over the first k rows of the results set.

Solution modifiers are especially useful in combination. For example, ORDER BY together with LIMIT is helpful during initial exploration of a data set, since it reports the top k rows according to some sorting criterion.
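The effect of the modifiers on a results table can be sketched in plain Python. This is a model for illustration, not xGT's implementation. The helper apply_modifiers and the sample rows are hypothetical, and the order of application (DISTINCT, then ORDER BY, then SKIP, then LIMIT) is an assumption borrowed from the usual Cypher convention.

```python
# Plain-Python model of the solution modifiers; not xGT's implementation.
# Assumption: modifiers apply in the order DISTINCT, ORDER BY, SKIP, LIMIT.

def apply_modifiers(rows, distinct=False, order_by=None, descending=False,
                    skip=0, limit=None):
    """rows is a list of tuples; order_by is a key function or None."""
    result = list(rows)
    if distinct:
        # dict.fromkeys keeps the first occurrence of each row, in order
        result = list(dict.fromkeys(result))
    if order_by is not None:
        result = sorted(result, key=order_by, reverse=descending)
    result = result[skip:]
    if limit is not None:
        result = result[:limit]
    return result

# Hypothetical (name, age) rows, including one duplicate
rows = [("ann", 30), ("bob", 41), ("ann", 30), ("cy", 25), ("dee", 52)]

# Model of: RETURN DISTINCT name, age ORDER BY age DESC LIMIT 2
top2 = apply_modifiers(rows, distinct=True, order_by=lambda r: r[1],
                       descending=True, limit=2)
print(top2)  # [('dee', 52), ('bob', 41)]
```

Without an ORDER BY, the rows that SKIP and LIMIT select depend on the arbitrary order in which answers are found, which is why the two are typically paired with a sort.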

2.14.3. Where the Results Are Placed

The answer to a query is returned in the form of a table frame. The table frame can either be stored as an explicitly named frame or in the job that executed the query.

In a Named Frame

TQL provides the INTO keyword that allows specifying the qualified name of the table frame in which to return the query results. Here is a simple example that filters the edges in the edge frame EdgeFrame and returns the results into the table frame QueryResults:

MATCH ()-[e:EdgeFrame]->()
WHERE e.sid = 0 AND e.tid = -1
RETURN e.sid, e.tid
INTO QueryResults

If the table named in the INTO clause doesn’t already exist, it will be created and populated with the results of the query. If the table exists and the table schema types match the types of the implied schema for the RETURN clause, the query results will be appended to the already existing table. If the table exists and the table schema types don’t match the types of the implied schema for the RETURN clause, xGT will report an error.

In a Job

If the INTO keyword is not given in a query, the results are placed in the job object that ran the query. In this case, a maximum of 1,000 rows is returned. The job object is returned from both run_job() and wait_for_job() and is stored in the job history.
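The placement rules for both cases can be summarized in a small plain-Python model. It is a sketch for illustration and does not use the xGT API: the function place_results, the representation of a table as a (schema, rows) pair, and the error type are all hypothetical.

```python
# Plain-Python model of where results are placed; not xGT's implementation.
# A table is modelled as (schema, rows); all names here are hypothetical.

JOB_ROW_CAP = 1000  # row cap for results kept in the job, as stated above

def place_results(tables, schema, rows, into=None):
    """Return the rows visible to the caller after placing the results."""
    if into is None:
        # No INTO: results are kept with the job, capped at 1,000 rows
        return rows[:JOB_ROW_CAP]
    if into not in tables:
        # Table does not exist yet: create it with the implied schema
        tables[into] = (schema, list(rows))
    else:
        existing_schema, existing_rows = tables[into]
        if existing_schema != schema:
            raise TypeError(f"schema mismatch for table {into!r}")
        existing_rows.extend(rows)  # matching schema types: append
    return tables[into][1]

tables = {}
place_results(tables, ("int", "int"), [(0, -1)], into="QueryResults")
place_results(tables, ("int", "int"), [(0, -1)], into="QueryResults")
print(len(tables["QueryResults"][1]))  # 2
```

Running the same query twice with INTO therefore grows the named table, which is worth keeping in mind when re-running exploratory queries.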

2.14.4. Querying a Table Frame

TQL supports the additional functionality of querying against a table. This can be used, for instance, to do some additional exploration of a results table acquired from a graph query.

A table query is performed by using the MATCH keyword with just a single element (no edges) using the syntax for vertices: parentheses (). Any of the constraints or aggregation functions can be performed in such a query. For example, we could look at the result of a previous query and count all results of persons over a certain age:

MATCH (row:Result)
WHERE row.age > 50
RETURN count(*)

2.14.5. Multiple Query Parts

This section describes TQL queries that contain more than one query part. Note that most examples shown in the documentation are TQL queries with only one query part.

A query part can contain a structure (MATCH), constraints (WHERE), property modifications (SET), and topology modifications (MERGE, CREATE, DELETE, and DETACH DELETE). The last query part can also return results (RETURN and INTO).

In a query with multiple parts, the WITH keyword is used to separate query parts and can carry information from one part to the next. WITH and RETURN clauses function similarly. While a RETURN clause optionally follows the last query part to describe the results returned, a WITH clause is required after each query part except for the last one and describes the results produced by that query part. The WITH clause carries results to the next query part and for each result row carried, the next query part is executed. Solution modifiers and aggregation functions can also be applied in the WITH clause.

The example below shows a TQL query with three parts:

MATCH (v)-[e:EdgeFrame]->()
WHERE e.duration < 10
SET e.duration = 0
WITH v
MATCH (u:VertexFrame)
WHERE outdegree(u, EdgeFrame) > 100
WITH v, u
LIMIT 300
MATCH (t)-[:EdgeFrame]->()
RETURN v.id, u.id, t.id

The first query part consists of:

MATCH (v)-[e:EdgeFrame]->()
WHERE e.duration < 10
SET e.duration = 0

This query part works the same as all the previously shown TQL queries with only one query part. The matches found are carried to the next query part using a WITH clause. For each result row carried by the WITH expression, the next query part is performed, which searches for all instances of the vertex pattern:

MATCH (u:VertexFrame)
WHERE outdegree(u, EdgeFrame) > 100

In this toy example, the patterns in each part do not depend on information carried in the WITH expression. Therefore, the same process is repeated in the second part for each result produced by the first part. For example, if the graph contains 20 edges in frame EdgeFrame with duration less than 10, then the second search is performed 20 times. Furthermore, if the graph also contains 30 vertices in frame VertexFrame with an out degree greater than 100, then there will be 20 * 30 = 600 matched rows found by the second query part.

Because the WITH expression that follows the second query part has the modifier LIMIT 300 attached, the third query part is only executed for 300 of the 600 matches found in the second query part. For each of the 300 carried results, the third query part executes the search:

MATCH (t)-[:EdgeFrame]->()
RETURN v.id, u.id, t.id

If there are 500 edges in frame EdgeFrame in total, then the third query part will produce 300 * 500 = 150,000 result rows, which are returned using the RETURN keyword.
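The row counts in this walkthrough can be checked with a short plain-Python model. The lists below are hypothetical stand-ins for the matches described in the text, and the nested iteration only models the "once per carried row" behavior, not how xGT executes the query.

```python
from itertools import islice, product

# Hypothetical stand-ins for the matches described in the text.
part1 = [f"v{i}" for i in range(20)]    # 20 edges with duration < 10
part2 = [f"u{i}" for i in range(30)]    # 30 vertices with out degree > 100
part3 = [f"t{i}" for i in range(500)]   # 500 edges in EdgeFrame

# The second part runs once per row carried from the first part.
after_part2 = list(product(part1, part2))

# LIMIT 300 on the WITH clause carries only 300 rows forward.
carried = list(islice(after_part2, 300))

# The third part runs once per carried row.
final_rows = [(v, u, t) for (v, u) in carried for t in part3]

print(len(after_part2), len(carried), len(final_rows))  # 600 300 150000
```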

Note that this query is a very simple example in which each query part does not actually depend on the previous one because the information carried by the WITH clause is not used. It is shown to illustrate the basic mechanics of multiple query parts: the query in each part is fully performed for each result produced by the previous query part. Using information carried by the WITH clause is discussed next.

Carrying Information Using WITH

In the example above, a WITH keyword connected two query parts, but the two parts were not connected by any information carried over. Typically, the WITH keyword is used to carry information that will be used in the next query part. For example, a variable bound to a vertex or edge may be carried and used in the next query part. In this example, it is used in the MATCH and RETURN clauses of the next part:

MATCH (v)-[e:EdgeFrame]->()
WHERE e.duration < 10
WITH v
MATCH (v)-[:EdgeFrame]->(u)
WHERE outdegree(u, EdgeFrame) > 100
RETURN v.id, u.id

For each pattern matched by the first query part, the vertex v that is part of the pattern is carried in the WITH expression to the following MATCH statement. In the second query part, for each v carried, the search only looks for instances of the pattern that contain that instance of v.

The results returned by the query above are equivalent to those returned by the following query with only one query part:

MATCH (u)<-[:EdgeFrame]-(v)-[e:EdgeFrame]->()
WHERE e.duration < 10 AND outdegree(u, EdgeFrame) > 100
RETURN v.id, u.id

Note that when a carried bound variable is referenced in the next query part, the frame name can be included or not and this does not change the behavior. The statement WITH v MATCH (v)-[:EdgeFrame]->() is equivalent to WITH v MATCH (v:VertexFrame)-[:EdgeFrame]->().

If a variable occurs in multiple query parts, but was not carried using the WITH keyword, it refers to a new element unrelated to the previous use of the variable. This is shown in the example below for u:

MATCH (v)-[:EdgeFrame]->(u)
WITH v
MATCH (v)-[:EdgeFrame]->(u)

The variable u in the first MATCH statement and the variable u in the second MATCH statement do not refer to the same vertex. While reusing a variable name that isn’t carried by a WITH clause in a later query part is legal, it can make queries harder to understand and is not recommended.

In addition to a vertex or edge, a WITH clause can carry any value available in the previous query part. The example below shows carrying one edge entity and one vertex property, both of which are then referenced in the next part.

MATCH (v)-[e:EdgeFrame]->()
WITH e, v.location
MATCH ()-[f:EdgeFrame]->(u)
WHERE u.location = v.location AND f.duration < e.duration

An aggregate value computed in one query part can be carried to the next. In the example below, the matches found in the first query part are aggregated over each unique value of the vertex v. For each unique v, both v and the number of matches found for that v are carried into the next part using the WITH keyword. In this query, the second part will execute once per unique value of v found in the first part. This type of grouping operation is explained in Aggregation Functions. The second query part can then use both the vertex v in its MATCH structure and the corresponding value of count(*) in the constraints. Note that to carry an expression over to the next query part an alias is required. In this example, the alias freq is needed to be able to refer to the result of the count(*) operation.

MATCH (v)-[e:EdgeFrame]->()
WITH v, count(*) AS freq
MATCH (v)-[:EdgeFrame]->(u)
WHERE outdegree(u) > freq
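The grouping behavior can be modelled in plain Python. The edge list below is hypothetical sample data, and the code is only a sketch of the semantics described above (one carried row per unique v, with its match count), not of xGT's execution.

```python
from collections import Counter

# Hypothetical edge list (source, target) standing in for EdgeFrame.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "a"), ("c", "b"), ("c", "d")]

# Out-degree of every vertex; Counter returns 0 for vertices with no out-edges.
outdeg = Counter(src for src, _ in edges)

# First part: MATCH (v)-[e:EdgeFrame]->()  WITH v, count(*) AS freq
# One carried row per unique v, holding the number of matches for that v.
# (In this pattern freq happens to equal the out-degree of v.)
carried = Counter(src for src, _ in edges)

# Second part: MATCH (v)-[:EdgeFrame]->(u) WHERE outdegree(u) > freq
# Runs once per carried (v, freq) row.
results = [
    (v, u)
    for v, freq in carried.items()
    for src, u in edges
    if src == v and outdeg[u] > freq
]
print(results)  # [('a', 'c'), ('b', 'c')]
```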

Lists can also be carried using a WITH clause:

MATCH (v:VertexFrame)
WITH collect(v.id) as list

Path variables can also be carried using a WITH clause:

MATCH p = (v:VertexFrame)-[:EdgeFrame *1..10]->()
WITH p

Aliasing in the WITH Clause

Just like with the RETURN keyword, the values carried by a WITH can be aliased. A property of a vertex or edge can either be carried directly, as in WITH e.duration, or aliased, as in WITH e.duration AS target_duration. If aliased, it can only be referenced in the next query part through the alias.

To refer in the next query part to any expression that is not an entity (vertex, edge, or table row) or a property of an entity, an alias is required. For example, it is not possible to carry the degree of a vertex using WITH outdegree(v) or to carry the sum of two values using WITH a + b. Instead, an alias must be used for these expressions, as shown below:

MATCH (v)-[e:EdgeFrame]->()
WITH outdegree(v) AS min_degree, v.val + e.duration AS target_val
MATCH (u:VertexFrame)
WHERE indegree(u) > min_degree AND u.val = target_val

Additional Filtering of WITH Partial Results

TQL’s WITH clause allows for filtering of results produced by a query part before starting the next part. To do this, include a WHERE clause after the WITH clause:

MATCH (v)-[e:EdgeFrame]->()
WHERE e.duration < 10
WITH v, count(*) AS freq
WHERE freq > 100
MATCH (v)-[:EdgeFrame]->(u)

The WHERE clause that follows the WITH clause further filters the results coming from the first query part. In this case, only rows with a count greater than 100 are passed on to the next query part. Note that only variables (and aliases) carried by the WITH clause are available to filter on. Only those answers that pass the filter will be available to the second query part.

Visibility of Changes

If a query part contains property modifications or topology additions or deletions, these changes will be visible in any later query parts. In the query below, the new duration property value is visible in the second query part:

MATCH (v)-[e:EdgeFrame]->()
WHERE e.duration < 10
SET e.duration = 0
WITH v
MATCH (v)-[f:EdgeFrame]->(u)
WHERE f.duration = 0

In the following query, an edge is created in the first query part and will be visible in the next. This means that the degree computed in the second query part will include any new edges.

MATCH (v)-[e:EdgeFrame]->()
WHERE e.duration < 10
CREATE (v)-[new_edge:EdgeFrame { duration: 0 }]->(v)
WITH v
WHERE indegree(v) > 100

For any new vertices or edges created with MERGE or CREATE, xGT allows the properties of the new elements to be carried by the WITH keyword, but not the entity variable. This is currently a limitation of xGT’s implementation. For the example above in which new_edge is created, using WITH new_edge.duration is allowed, but not WITH new_edge.

Use Cases of Multiple Query Parts

One use of multiple query parts is to limit the branching of a pattern search. The example below searches for occurrences of a pattern with three edges, but only searches for the third edge for the 100 occurrences of the two-edge pattern with lowest duration property. This is done by adding solution modifiers ORDER BY and LIMIT to the WITH clause, causing only 100 results from the first query part to be carried to the next:

MATCH (v)-[e:EdgeFrame]->(w)<-[:EdgeFrame]-()
WITH v, e, w
ORDER BY e.duration LIMIT 100
MATCH (w)-[f:EdgeFrame]->()
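The saving from limiting the branching can be illustrated with a plain-Python model. Everything below is hypothetical: the synthetic two-edge matches, the assumed three outgoing edges per vertex w, and the use of heapq.nsmallest as a stand-in for ORDER BY with LIMIT.

```python
import heapq

# Hypothetical matches of the two-edge pattern: (v, e_duration, w)
two_edge_matches = [(f"v{i}", (i * 37) % 1000, f"w{i % 50}")
                    for i in range(5000)]

# Model of: WITH v, e, w ORDER BY e.duration LIMIT 100
carried = heapq.nsmallest(100, two_edge_matches, key=lambda m: m[1])

# Assumed number of outgoing edges per w, explored by the second part
OUT_EDGES_PER_W = 3

# The second part, MATCH (w)-[f:EdgeFrame]->(), runs once per carried row.
expansions_with_limit = len(carried) * OUT_EDGES_PER_W
expansions_without_limit = len(two_edge_matches) * OUT_EDGES_PER_W

print(expansions_with_limit, expansions_without_limit)  # 300 15000
```

In this model the third edge is searched from 100 carried rows instead of 5,000, so the work in the second part shrinks by the same factor.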

Another use is to compute aggregate functions in one query part and use them to filter results in a later query part, as shown here:

MATCH (v)-[e:EdgeFrame]->(w)
WHERE e.duration < 10
WITH avg(e.duration) AS target_duration
MATCH ()-[f:EdgeFrame]->()
WHERE f.duration < target_duration

Additional Rules

Query parts and the WITH clause can be used with table queries:

MATCH (t:MyTable)
WHERE t.property > 10
WITH t
MATCH (v:VertexFrame)
WHERE v.id = t.property + 5

Note that table MATCH query parts can only have one element in the pattern (the table frame).

xGT has restrictions on some combinations of TQL clauses and the WITH keyword:

  • DISTINCT can be used for query parts producing properties, expressions and entity variables.

  • When using DISTINCT with an entity variable, every instance will be different from each other instance, independent of whether property values are the same or not.

  • The grouping idiom with a key given by an entity variable can be used in a WITH or RETURN clause: WITH v, count(*) AS cnt, min(v.property) AS small.

  • However, combining ORDER BY on a property with the grouping idiom keyed by an entity variable is not allowed in a WITH or RETURN clause: RETURN v, count(*) AS mycnt ORDER BY v.id.

Performance Considerations

The use of the WITH clause and multiple query parts is a powerful tool to compose sophisticated queries in TQL. However, the user should be aware of certain considerations related to how those queries will perform on a system:

  • Each WITH clause is describing, in essence, a temporary table frame. It is possible to create large temporary tables that could take up resources in the xGT server (memory and compute time) for their processing.

  • Each query part induces an iteration over the rows produced in the preceding query part, thus any MATCH pattern in this query part will execute once for each row produced.

2.14.6. Union Subqueries

TQL supports the use of the UNION and UNION ALL keywords to aggregate results from multiple subqueries into a single final answer. Each union subquery can contain multiple parts, topology and property modifications, solution modifiers and any other feature described in this chapter. The UNION keyword removes any duplicate rows from the final answer while the UNION ALL keyword returns all rows including any duplicates.

Each subquery in a union must have a RETURN keyword specifying its answer. The RETURN specifications of all subqueries must have the same number of columns, with the same names and in the same order.

Each subquery in a union is independent and variables are not shared between them. The subqueries are run successively, so subsequent subqueries do see any frame modifications made by earlier subqueries.
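The difference between the two keywords, and the column rule, can be sketched in plain Python. The function union and the sample rows are hypothetical, and the model says nothing about the row order xGT actually produces.

```python
# Plain-Python model of UNION and UNION ALL; not xGT's implementation.

def union(subquery_results, all_rows=False):
    """subquery_results: list of (column_names, rows) pairs in query order."""
    columns = subquery_results[0][0]
    combined = []
    for cols, rows in subquery_results:
        if cols != columns:
            # Same number of columns, same names, same order
            raise ValueError("every subquery must return the same columns")
        combined.extend(rows)
    if all_rows:
        return combined                    # UNION ALL keeps duplicates
    return list(dict.fromkeys(combined))   # UNION removes duplicate rows

cols = ("name", "company_name")
first = (cols, [("Ann", "Acme"), ("Bob", "Acme")])
second = (cols, [("Bob", "Acme"), ("Cy", "Globex")])

print(len(union([first, second], all_rows=True)))  # 4
print(len(union([first, second])))                 # 3
```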

The example below shows how to use the UNION ALL keyword to aggregate results from two subqueries. In this case, the first union subquery searches the graph for young employees of small companies and reports back the name of the employees and the companies they work for. The second union subquery searches for older employees of large companies, also reporting back the name of the employees and companies. The answer for the entire query is the aggregated results of each individual union subquery.

MATCH (p:People)-[:WorksFor]->(c:Companies)
WHERE indegree(c, WorksFor) < 10 AND p.age < 25
RETURN p.name, c.company_name
UNION ALL
MATCH (p:People)-[:WorksFor]->(c:Companies)
WHERE indegree(c, WorksFor) > 10000 AND p.age > 50
RETURN p.name, c.company_name

This second example illustrates the use of the UNION keyword to aggregate distinct results across two subqueries. The first union subquery searches the graph for young employees of small companies, while the second subquery searches the graph for young contractors of large companies. With the UNION keyword, each duplicate row is reported only once. That is, each unique combination of a person’s name, company name, and the boolean is_contractor status will appear only once in the final results set. By contrast, the UNION ALL keyword allows duplicates to appear multiple times in the results set.

MATCH (p:People)-[:WorksFor]->(c:Companies)
WHERE indegree(c, WorksFor) < 10 AND p.age < 25
RETURN p.name, c.company_name, false AS is_contractor
UNION
MATCH (p:People)-[:ContractorFor]->(c:Companies)
WHERE indegree(c, ContractorFor) > 10000 AND p.age < 25
RETURN p.name, c.company_name, true AS is_contractor

Visibility of Changes

Union subqueries are executed in the order in which they appear in the original query. This implies that changes to graph frames performed in earlier union subqueries are visible to subsequent subqueries. Query parts in each union subquery follow the rules specified in Visibility of Changes. Query parts will be able to see changes to graph frames performed in previous query parts within the same union subquery, as well as query parts executed for previous union subqueries.

2.14.7. Optional Matches

Each query part of a TQL query can contain one MATCH structure. If a MATCH structure is present, then one or more optional matches are allowed in the query part. An optional match is not allowed without a required match. In this section, the primary MATCH structure will be referred to as the required match to distinguish it from any optional matches.

An optional match has the same components as a required match. It has a structure with a graph pattern and may be followed by a WHERE clause of constraints. In the example below, the optional match has the structure (c)-[:CompetesAgainst]->(c2:Companies) and the constraint c.city = c2.city.

MATCH (p:People)-[:WorksFor]->(c:Companies)
WHERE indegree(c, WorksFor) < 10 AND p.age < 25
OPTIONAL MATCH (c)-[:CompetesAgainst]->(c2:Companies)
WHERE c.city = c2.city
RETURN p.name, c.company_name, c2.company_name

For each result returned by the required match, xGT will run the optional match. If there are no results for the optional match, a single row is added to the results frame with references to variables in the required match populated and any references to variables in the optional match set to null. If there are one or more matches to the optional structure, each match generates a new result row.
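The row-generation rule can be modelled in a few lines of plain Python. The helper optional_match and the sample data are hypothetical, and None stands in for null. This is a sketch of the semantics, not of xGT's execution.

```python
# Plain-Python model of OPTIONAL MATCH row generation; not xGT code.

def optional_match(required_rows, optional_matches_for):
    """Extend each required row with its optional matches, or with None."""
    results = []
    for row in required_rows:
        matches = optional_matches_for(row)
        if not matches:
            # Failed optional match: keep the row, fill the new column with null
            results.append(row + (None,))
        else:
            # One result row per match to the optional structure
            results.extend(row + (m,) for m in matches)
    return results

# Hypothetical data: results of the required match and known competitors
required = [("Ann", "CompanyA"), ("Bob", "CompanyB")]
competitors = {"CompanyB": ["CompanyC", "CompanyD"]}

rows = optional_match(required, lambda r: competitors.get(r[1], []))
for r in rows:
    print(r)
```

A required row is never dropped: it appears once with a null when the optional structure has no match, and once per match otherwise.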

In the query above, suppose that the required match structure and constraints matched three instances: the first with “John Smith” and “CompanyA”, the second with “Jane Doe” and “CompanyB”, and the third with “Amy Li” and “CompanyB”. For each of these results, the optional match is executed. Suppose that for the first result of the required match there are no matches to the optional structure, while for the second and third results there are three matches each (with “CompanyC”, “CompanyD”, and “CompanyE”). Then the set of results to the above query would be:

John Smith, CompanyA, null
Jane Doe, CompanyB, CompanyC
Jane Doe, CompanyB, CompanyD
Jane Doe, CompanyB, CompanyE
Amy Li, CompanyB, CompanyC
Amy Li, CompanyB, CompanyD
Amy Li, CompanyB, CompanyE

Valid Optional Structure

Each optional match structure must connect to the required match structure in its query part through a shared vertex. This can be done through a shared variable, by using variable equality, or by using key equality. Valid TQL queries include the following examples:

MATCH (p:People)
OPTIONAL MATCH (p)-[:WorksFor]->(c)

MATCH (p:People)
OPTIONAL MATCH (q)-[:WorksFor]->(c)
WHERE q = p

MATCH (p:People)
OPTIONAL MATCH (q)-[:WorksFor]->(c)
WHERE q.key = p.key

The following is invalid because the optional structure does not connect to the required structure:

MATCH (p:People)
OPTIONAL MATCH (q)-[:WorksFor]->(c)

The optional structure can define new variables and can contain multiple linear chains connected by commas:

MATCH (p:People)
OPTIONAL MATCH (p)-[:WorksFor]->(c1), (p)-[:WorksFor]->(c2)

The optional structure can also capture path variables, which could be NULL for failed matches:

MATCH (p:People)
OPTIONAL MATCH q = (p)-[:WorksFor]->(c1)
WITH q

Constraints

Note that both the required match and the optional match can have a constraint clause. There can be a WHERE clause for only the required match, for only the optional match, or for both. Each WHERE clause applies only to the match structure that immediately precedes it. In the following example, the constraint WHERE c.city = "London" follows the optional match structure and therefore only constrains it. This means that the required match may return results with a c that does not match the city constraint.

MATCH (p:People)-[:WorksFor]->(c:Companies)
OPTIONAL MATCH (c)-[:CompetesAgainst]->(c2:Companies)
WHERE c.city = "London"
RETURN p, c, c2

Multiple Optional Matches

A query part can have more than one optional match, as shown below.

MATCH (c:Companies)
OPTIONAL MATCH (p)-[e1:WorksFor]->(c)
WHERE p.age > 60 AND e1.duration > 30
OPTIONAL MATCH ()-[e2:WorksFor]->()-[:CompetesAgainst]->(c)
WHERE e2.duration > 40

TQL currently restricts the use of variables across optional matches. The same variable can only be used in multiple optional matches in the same query part if that variable was used in the required match. In the example above, the variable c is used in both optional matches because it is from the required match. However, the second optional match could not reuse the variable p introduced in the first optional match. This restriction also applies to variables used in an optional match’s WHERE clause: the second optional match could not use the variable p in its WHERE clause.

Carrying Information Using WITH

It is possible to carry properties and variables obtained from an optional match through a WITH boundary. However, for failed optional matches those values will be NULL.

MATCH (c:Companies)
OPTIONAL MATCH (p)-[e1:WorksFor]->(c)-[]->(d)
WITH c, d, p.age AS age

In this example, d and p.age could have NULL values for some instances of c where the optional match is not satisfied. c is guaranteed to not be NULL as it appears in the required match.

2.14.8. Returning Path Variables

Path variables can be returned as results from a query:

MATCH p = (:Companies)-[:CompetesAgainst *1..5]->(:Companies)
RETURN p AS mypath
INTO Results

In this case, each row of the results will contain a list of the graph elements that form each matched path.

The results can be used in further queries by unwinding the lists and matching them against elements of the frames:

MATCH (a:Results)
UNWIND a.mypath AS graph_elem
WITH graph_elem
MATCH (b:Companies)
WHERE b = graph_elem
RETURN count(*)

The previous example extracts all graph elements from each stored path and then matches only the ones that correspond to vertices of the Companies frame.
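The unwind-and-match step can be pictured with a plain-Python model. The representation of graph elements as (kind, frame, id) tuples and the sample paths are assumptions made for this sketch. The actual contents of a stored path in xGT are not modelled here.

```python
# Plain-Python model of unwinding stored paths; not xGT code.
# Graph elements are modelled as hypothetical (kind, frame, id) tuples.

paths = [
    [("vertex", "Companies", 1), ("edge", "CompetesAgainst", 10),
     ("vertex", "Companies", 2)],
    [("vertex", "Companies", 2), ("edge", "CompetesAgainst", 11),
     ("vertex", "Companies", 3), ("edge", "CompetesAgainst", 12),
     ("vertex", "Companies", 1)],
]

# The vertices of the Companies frame
companies = {("vertex", "Companies", i) for i in (1, 2, 3)}

# UNWIND a.mypath AS graph_elem: one row per element of each stored path
unwound = [elem for path in paths for elem in path]

# MATCH (b:Companies) WHERE b = graph_elem, followed by RETURN count(*)
count = sum(1 for elem in unwound if elem in companies)

print(len(unwound), count)  # 8 5
```

Edges in the unwound rows find no equal vertex in the Companies frame, so only the vertex elements contribute to the count.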