5.2. Analytics Idioms

Many analytics are specific to a user’s dataset. Others are generic enough to be written as reusable code segments. This section shows several examples.

5.2.1. Compute Column Statistics

This example computes fundamental statistics (maximum, minimum, sum, and average) over a single property of an edge frame.

def column_stats(connection, edge_frame, property):
    """
    Compute fundamental statistics about a single property on an edge frame.

    This function returns data containing a single list of the computed statistics,
    or it raises an exception indicating the type of failure (e.g., the
    transaction rolled back).
    """
    query = """
    MATCH ()-[e:{edge_frame}]->()
    RETURN MAX(e.{property}) AS max, MIN(e.{property}) AS min,
           SUM(e.{property}) AS sum, AVG(e.{property}) AS avg
    """.format(edge_frame=edge_frame.name, property=property)
    data, _ = run_query(connection, query)
    return data

Note that this function relies on the run_query function described in Run Query.
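Because the RETURN clause lists its aggregates in the order max, min, sum, avg, the single result row can be labeled on the client side. The following is a minimal sketch, assuming that data has the shape [[max, min, sum, avg]] implied by the docstring above. The helper name stats_to_dict is hypothetical and is not part of xGT.

```python
def stats_to_dict(data):
    """
    Label the single row returned by column_stats.

    Assumes data looks like [[max, min, sum, avg]], matching the order of
    the RETURN clause. This helper is illustrative and not part of xGT.
    """
    if len(data) != 1:
        raise ValueError("expected exactly one row of statistics")
    names = ("max", "min", "sum", "avg")
    # Pair each statistic name with the value in the same position.
    return dict(zip(names, data[0]))


# Example with made-up values standing in for a query result.
print(stats_to_dict([[9, 1, 20, 5.0]]))
```

Keeping the labels in one place means that callers do not depend on the positional order of the RETURN clause.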

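5.2.2. Compute a Column Histogram

This example computes a histogram of the values in a single property of an edge frame, ordered from the most frequent value to the least frequent.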

def histogram(connection, edge_frame, property):
    """
    Compute a histogram of the values in one property of an edge frame.

    The result is a sequence of rows holding (value, frequency-count).

    The sequence begins with the most frequent value and proceeds in
    descending order of frequency.
    """
    query = """
    MATCH ()-[e:{edge_frame}]->()
    RETURN e.{property} AS {property}, COUNT(*) AS count
    ORDER BY count DESC
    """.format(edge_frame=edge_frame.name, property=property)
    data, _ = run_query(connection, query)
    return data
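The (value, frequency-count) rows can be post-processed on the client. As a minimal sketch, assuming each row is a two-element sequence as described in the docstring above, the following hypothetical helper (not part of xGT) converts counts into fractions of the total while preserving the descending order.

```python
def relative_frequencies(rows):
    """
    Convert (value, count) rows into (value, fraction-of-total) rows.

    The input order is preserved, so rows sorted by descending count
    remain sorted by descending fraction.
    """
    total = sum(count for _, count in rows)
    if total == 0:
        # No rows, or all counts are zero: nothing to normalize.
        return []
    return [(value, count / total) for value, count in rows]


# Example with made-up rows standing in for a histogram result.
rows = [["tcp", 2], ["udp", 1], ["icmp", 1]]
for value, fraction in relative_frequencies(rows):
    print(value, fraction)
```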

5.2.3. Generic Frame-type Idiom

In many situations, a generic function is desired that can work on properties of edges, properties of vertices, or columns of tables. The previous code examples support only an edge frame, but they could easily be extended to support any of the xGT frame types. One part of this abstraction is the computation of the MATCH clause. The idea is to formulate a generic MATCH clause and then pass it to any analytic function that accepts a connection to a server, a frame, and a MATCH clause string as parameters. Such an analytic function can refer to the query variable r, which represents a row in the frame.

import xgt

def compute_match_clause(connection, frame, analytic_function, **kwargs):
    """
    This function computes an abstract analytic function over some
    frame on the server.  This frame can be any type: vertex, edge, or table.

    A generic format for the MATCH clause is generated, based on frame type.
    Then this generic clause is passed on to the caller-supplied analytic
    function along with other keyword arguments.
    """
    if isinstance(frame, xgt.EdgeFrame):
        match_clause = "MATCH ()-[r:{edge}]->()".format(edge=frame.name)
    elif isinstance(frame, xgt.VertexFrame):
        match_clause = "MATCH (r:{vertex})".format(vertex=frame.name)
    else:
        match_clause = "MATCH (r:{table})".format(table=frame.name)
    return analytic_function(connection, frame, match_clause, **kwargs)
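An analytic function suitable for compute_match_clause might look like the following. This is a sketch only: it rewrites the column statistics example in terms of the query variable r, and the names build_stats_query and generic_column_stats are hypothetical. To keep the block self-contained, run_query is passed in as a parameter. It stands in for the helper described in Run Query and is assumed to return a (data, extra) pair, as the earlier examples suggest.

```python
def build_stats_query(match_clause, property):
    """Build the statistics query text around a generic MATCH clause."""
    return (
        "{match}\n"
        "RETURN MAX(r.{prop}) AS max, MIN(r.{prop}) AS min,\n"
        "       SUM(r.{prop}) AS sum, AVG(r.{prop}) AS avg"
    ).format(match=match_clause, prop=property)


def generic_column_stats(connection, frame, match_clause, property, run_query):
    """
    Compute column statistics for any frame type.

    The signature follows the (connection, frame, match_clause, **kwargs)
    shape that compute_match_clause expects. The frame argument is unused
    here because the MATCH clause already names the frame.
    run_query is an assumed callable: run_query(connection, query)
    returning a (data, extra) pair.
    """
    query = build_stats_query(match_clause, property)
    data, _ = run_query(connection, query)
    return data


# Show the query text generated for a vertex-style MATCH clause.
# "People" and "age" are made-up names for illustration.
print(build_stats_query("MATCH (r:People)", "age"))
```

Under these assumptions, a call would take the form compute_match_clause(connection, frame, generic_column_stats, property="age", run_query=run_query), with the keyword arguments forwarded unchanged to the analytic function.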