2.12. CALL Clause¶
The CALL clause executes procedures from a catalog of available procedures. Procedures optionally take arguments as inputs and either produce resulting arguments or modify the system through side effects. Procedures that have output arguments produce rows of results where the output arguments define the columns of the rows. The YIELD sub-clause defines which output parameters of the procedure are included in the result and their order. The syntax for the CALL clause is
CALL procedure_name([input_param][, input_param...])
YIELD output_param [, output_param...]
The YIELD sub-clause is usually required. The CALL clause must also usually be followed by a RETURN or WITH statement. Here are a couple of examples of valid CALL usage.
CALL procedure_name(input_param1, input_param2)
YIELD output_param1
RETURN output_param1
CALL procedure_name(input_param1)
YIELD output_param1, output_param2
WITH output_param1, output_param2
WHERE output_param1 > 7
RETURN output_param1, output_param2
The one exception to requiring YIELD and RETURN or WITH clauses is a stand-alone CALL clause. In this case, the RETURN clause or both the YIELD and RETURN clauses may be omitted, and all the outputs of the procedure are returned in the default order. Examples are
CALL procedure_name(input_param1, input_param2)
CALL procedure_name(input_param1, input_param2)
YIELD output_param1
A CALL can also be followed by a WHERE clause to filter the results.
CALL procedure_name(input_param1)
YIELD output_param1, output_param2
WHERE output_param1 > 7
RETURN output_param1, output_param2
2.12.1. Whole Graph Algorithms¶
xGT supports the following whole graph algorithms:
One common concept used in the graph algorithms is the graph on which an algorithm operates. In xGT a graph is a collection of vertex frames and edge frames that are connected. A vertex frame without any attached edge frames would just be a bunch of isolated vertices so can’t be in a graph. An edge frame can’t be in a graph if one of its source or target vertex frames is not included. Graphs are defined by a list of edge frames. The source and target vertex frames of each of the edge frames are included in the graph. The frames in the graph will often be all connected, but it is possible to have multiple separate components.
Another common concept used in the graph algorithms is a unique representation of a vertex across all frames. The pair of a frame name and a vertex key are used to uniquely represent a vertex.
All graph algorithms take two parameters.
The first describes the graph on which the algorithm operates, and the second is a map giving any additional parameters to the algorithm.
Here is an example call for page_rank()
:
CALL page_rank("graph_namespace",
{ tolerance : .0001 })
YIELD frame, key, rank
RETURN frame, key, rank
INTO Results
The first parameter can be either a string literal representing a namespace or a list of string literals representing a list of edge frames. When a namespace name is given, all the edge frames the user has access to in the namespace along with their source and target vertex frames the user has access to in the namespace are used to define the graph. If one of the source or target vertex frames of an edge frame is not in the given namespace or the user doesn’t have access to the source or target vertex frame, that edge frame is not included. The example above shows calling a whole graph algorithm giving a namespace to define the graph.
When a list of edge frames is given, those frames along with their source and target vertex frames are used to define a graph. If a user gives an edge frame they don’t have access to or if the user doesn’t have access to one of the source or target vertex frames, a security error is thrown. Passing an empty list of edge frames causes all edge frames the user has access to in all namespaces in the system along with their source and target vertex frames to be used to define the graph. As before if the user doesn’t have access to the source or target vertex frame of one of the edge frames, that edge frame is not included. This example shows calling a whole graph algorithm giving a list of edge frames to define the graph.
CALL breadth_first_search(["Graph__Edge1", "Graph__Edge2"],
{ sources : [["Graph__Vertex1", "0"]],
max_depth : 2 })
YIELD frame, key, depth, parent_frame, parent_key
RETURN frame, key, depth, parent_frame, parent_key
INTO Results
2.12.1.1. Breadth First Search¶
The algorithm breadth_first_search()
performs a breadth first search on a graph.
This algorithm starts from one or more source vertices and visits all the vertices reachable from the source vertices by traversing incident edges, expanding out a level of neighbors at a time.
The algorithm supports the following input parameters given in the map:
Name |
Type |
Default |
Description |
---|---|---|---|
sources |
List of lists of string |
N/A |
A list of frame name and vertex key pairs representing the source vertices. |
max_depth |
Integer |
No limit |
The depth at which to stop the search. |
direction |
String |
forward |
The direction to perform the search. Valid values are “forward”, “reverse”, or “undirected”. |
all_edges |
Boolean |
false |
False returns only edges in the BFS tree. True returns all edges traversed by the search. |
The “sources” parameter is required. All other parameters are optional. The “sources” parameter is a list of lists of string where each sublist has two elements: a frame name and a vertex key. Note that the vertex key must be given as a string regardless of the type of the key.
The “max_depth” parameter indicates the depth at which to stop the search. A depth of 1 indicates one level out from the sources. The default behavior is to continue searching until no new vertices are discovered.
The “direction” parameter indicates the direction the search traverses incident edges. The default is the “forward” direction which traverses outgoing edges of vertices. The “reverse” direction traverses incoming edges of vertices. The “undirected” direction traverses both outgoing and incoming edges of vertices essentially treating the graph as if it were undirected. For a forward search, the parent is always the source of the edge. For a reverse search, the parent is always the target of the edge. For an undirected search, the parent could be either the source or target of the edge.
The “all_edges” parameter determines which edges traversed by the search are included in the results. The default value of “false” returns only the edges in the BFS tree. These are the edges where the child vertex is newly discovered by the search. This behavior returns a row for each unique vertex discovered by the search. A value of “true” returns all the edges traversed by the search. This behavior returns a row for each unique edge visited by the search. Child vertices are not unique with this behavior.
For all combinations of options, the search traverses edges only once. Thus, edge uniqueness is guaranteed in the results. For an all_edges search, the same result row could appear multiple times in a search, but each entry represents a different edge between the same parent, child vertex pair. The duplicate entries could come from duplicate edges in a multigraph or from edges going in both directions for an undirected search.
Breadth first search has the following output parameters:
Name |
Type |
Description |
---|---|---|
frame |
String |
Name of the frame of a found vertex. |
key |
Original key type or string |
Key of a found vertex. |
depth |
Integer |
Depth at which the vertex was found. |
parent_frame |
String |
Name of the frame of the parent of a found vertex. |
parent_key |
Original key type or string |
Key of the parent of a found vertex. |
direction |
String |
Direction the search traversed the edge. Either “forward” or “reverse”. |
The breadth first search algorithm outputs the vertices it finds along with the depth the vertex was found at, the parent vertex, and the direction the edge between the child and parent vertices was traversed. The type of the keys in the results is dependent on the types of the keys in the input vertex frames. If the types of the keys of all the input vertex frames is the same, then that type is used for the output key type. Otherwise, string is used as the output key type.
When the direction output is “forward” that means the search traversed an edge with the parent as the source and the child as the target. When the direction output is “reverse” that means the search traversed an edge with the child as the source and the parent as the target. For a search in the “forward” direction, the direction output is always “forward”. For a search in the “reverse” direction, the direction output is always “reverse”. The direction output is more useful when the search is in the “undirected” direction as the search can traverse edges in both directions.
2.12.1.2. PageRank¶
The algorithm page_rank()
assigns a score to each vertex in the graph based on the number of incoming edges to the vertex and the scores of the sources of those edges.
This is the well-known PageRank algorithm developed by Larry Page.
The algorithm supports the following optional input parameters given in the map:
Name |
Type |
Default |
Description |
---|---|---|---|
damping_factor |
Float |
0.85 |
The damping factor used by PageRank. |
max_iterations |
Integer |
20 |
The maximum number of iterations before stopping the algorithm. |
tolerance |
Float |
0.0001 |
The algorithm stops after the norm of the difference between the score vectors of successive iterations is less than this value. |
norm |
String |
infinity |
The norm used to decide when to stop the algorithm. Valid values are “L1”, “L2”, and “infinity”. |
sources |
List of lists of string |
Empty list |
A list of frame name and vertex key pairs representing the set of personalized vertices. |
The “damping_factor” parameter is the standard PageRank damping factor that balances between visiting an adjacent vertex or teleporting to a random vertex. PageRank is an iterative algorithm. When the iteration stops is determined by a combination of the “max_iterations”, “tolerance”, and “norm” parameters. The iteration stops when either the maximum number of iterations has been reached or the norm of the difference between the score vectors of successive iterations is less than the tolerance. Each iteration generates a score vector that contains a score for each vertex in the graph. Taking the norm of the difference between successive score vectors gives a value to how much the last iteration improved the scores, and the algorithm stops if this value is less than a tolerance. The standard “L1”, “L2”, and “infinity” norms are supported. The “sources” parameter is used to perform personalized PageRank which restricts the vertices visited by a teleportation to the set of vertices given.
PageRank has the following output parameters:
Name |
Type |
Description |
---|---|---|
frame |
String |
Name of the frame of a vertex. |
key |
Original key type or string |
Key of a vertex. |
rank |
Float |
PageRank of a vertex. |
The PageRank algorithm outputs all the vertices in the graph along with the PageRank score for each vertex. The type of the keys in the results is dependent on the types of the keys in the input vertex frames. If the types of the keys of all the input vertex frames is the same, then that type is used for the output key type. Otherwise, string is used as the output key type.
2.12.1.3. Strongly Connected Components¶
The algorithm strongly_connected_components()
finds the strongly connected components of a directed graph.
A strong component is a subset of the vertices of a graph that are all reachable from each other both by traversing only forward edges and by traversing only reverse edges.
The algorithm has no input parameters beyond the graph.
Strongly connected components has the following output parameters:
Name |
Type |
Description |
---|---|---|
frame |
String |
Name of the frame of a vertex. |
key |
Original key type or string |
Key of a vertex. |
component |
Integer |
Integer representing the component. |
The strongly connected components algorithm outputs all the vertices in the graph along with an integer representing the component each vertex belongs to. Each component has a unique integer, and the integers are not contiguous. The type of the keys in the results is dependent on the types of the keys in the input vertex frames. If the types of the keys of all the input vertex frames is the same, then that type is used for the output key type. Otherwise, string is used as the output key type.
2.12.1.4. Weakly Connected Components¶
The algorithm weakly_connected_components()
finds the weakly connected components of a directed graph.
A weak component is a subset of the vertices of a graph that are all reachable from each other by traversing edges, ignoring direction.
The algorithm has no input parameters beyond the graph.
Weakly connected components has the following output parameters:
Name |
Type |
Description |
---|---|---|
frame |
String |
Name of the frame of a vertex. |
key |
Original key type or string |
Key of a vertex. |
component |
Integer |
Integer representing the component. |
The weakly connected components algorithm outputs all the vertices in the graph along with an integer representing the component each vertex belongs to. Each component has a unique integer, and the integers are not contiguous. The type of the keys in the results is dependent on the types of the keys in the input vertex frames. If the types of the keys of all the input vertex frames is the same, then that type is used for the output key type. Otherwise, string is used as the output key type.