4.4. Frame Management

This section describes how to manage frames, which store data in xGT. A detailed description of the frame data model is found in Graph Data Model: Frames and Namespaces.

Frames in xGT can be created directly by specifying the frame schema. In this case, frames are empty after creation and data must be loaded as a separate step. The API calls are:

  • create_vertex_frame()

  • create_edge_frame()

  • create_table_frame()

Frames can also be created implicitly from a data source. In this case, the schema is inferred from the data, the frame is created, and the data is loaded in one step. The beta API calls are:

  • create_vertex_frame_from_data()

  • create_edge_frame_from_data()

  • create_table_frame_from_data()

4.4.1. Direct Frame Creation

Frames in xGT can be created by the client API calls create_vertex_frame(), create_edge_frame() and create_table_frame(). Each of these API calls requires passing in the frame name and schema.

If the name parameter is a fully qualified name, the frame is created in the namespace specified. If it contains only the name of the frame, the frame is created in the default namespace. For more information on namespaces, see Graph Data Model: Frames and Namespaces. If a frame of the specified name already exists in the namespace, an error is thrown and the frame is not created again.
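As a rough illustration of the naming rule above, the sketch below splits a frame name into a namespace and an unqualified name. It is plain Python and not part of the xGT API: the helper resolve_frame_name is hypothetical, and the "__" separator is an assumption that should be checked against Graph Data Model: Frames and Namespaces.

```python
def resolve_frame_name(name, default_namespace, separator="__"):
    """Return (namespace, frame_name) for a possibly qualified frame name.

    ASSUMPTION: a fully qualified name is written as
    <namespace><separator><frame>, with "__" as the separator.
    """
    namespace, found, frame = name.partition(separator)
    if found and namespace and frame:
        # The name carried its own namespace.
        return namespace, frame
    # Unqualified name: fall back to the default namespace.
    return default_namespace, name


qualified = resolve_frame_name("career__Employees", "default")
unqualified = resolve_frame_name("Employees", "default")
```

Under these assumptions, the first call resolves to the career namespace and the second to the default namespace.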

The schema describes the names and data types of each column of a frame. These columns correspond to the properties of each element of the frame. In the example below, the Employees frame has a schema with three columns, so every vertex that belongs to this frame has these three properties (the value of a property can be null if that column is not a key column):

import xgt

server = xgt.Connection()

person_frame = server.create_vertex_frame(name = 'Employees',
                                          schema = [['person_id', xgt.INT],
                                                    ['name', xgt.TEXT],
                                                    ['start_date', xgt.DATE]],
                                          key = 'person_id')

A list of supported data types is found in Data Movement.

To create a vertex frame, one column of the schema must be specified as the key with the key parameter and will be used to uniquely identify vertices in the frame.

To create an edge frame, additional parameters specifying the source and target vertices must be provided. The source and target parameters indicate which vertex frames the edge frame is connecting. Each of these is either the name of a vertex frame or a VertexFrame object. They can be the same or different frames. The source_key and target_key parameters indicate which columns of the edge frame’s schema are used to identify the source vertex and target vertex, respectively, of the edge. For each row of the edge frame, the value assigned to the source_key column corresponds to the key column of a row in the vertex frame given by the source parameter. Similarly, for each row of the edge frame, the value assigned to the target_key column corresponds to the key column of a row in the vertex frame given by the target parameter. The remaining columns are properties of the edge.

4.4.1.1. Example

import xgt

server = xgt.Connection()

person_frame = server.create_vertex_frame(name = 'Employees',
                                          schema = [['person_id', xgt.INT],
                                                    ['name', xgt.TEXT],
                                                    ['start_date', xgt.DATE]],
                                          key = 'person_id')

company_frame = server.create_vertex_frame(name = 'Companies',
                                           schema = [['company_id', xgt.INT],
                                                     ['name', xgt.TEXT],
                                                     ['profit', xgt.INT]],
                                           key = 'company_id')

friend_frame = server.create_edge_frame(name = 'FriendsWith',
                                        source = 'Employees',
                                        target = 'Employees',
                                        schema = [['source_id', xgt.INT],
                                                  ['target_id', xgt.INT]],
                                        source_key = 'source_id',
                                        target_key = 'target_id')

work_frame = server.create_edge_frame(name = 'WorksFor',
                                      source = 'Employees',
                                      target = 'Companies',
                                      schema = [['source_id', xgt.INT],
                                                ['target_id', xgt.INT],
                                                ['position', xgt.TEXT],
                                                ['years' , xgt.INT]],
                                      source_key = 'source_id',
                                      target_key = 'target_id')

The frame Employees contains vertices representing people. The key column person_id uniquely identifies each vertex, while the columns name and start_date give additional information, which need not be unique. Note that because the frame schema of Employees has three columns, each vertex within it must have three properties, one of them the unique key. The Companies frame contains vertices representing companies, with the company_id column uniquely identifying each vertex.

Frames FriendsWith and WorksFor define two edge frames, each connecting already defined vertex frames. With direct frame creation using create_edge_frame(), the source and target frames must be created before an edge frame is created. Note that an edge frame may connect vertices from the same vertex frame, as FriendsWith does, or may connect vertices from two different frames, as WorksFor does.
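The relationship between edge rows and vertex keys can be sketched in plain Python. The rows and key values below are made up for illustration, and the helper dangling_edges is hypothetical. It only mirrors the rule that each source_key and target_key value refers to a key in the corresponding vertex frame, and it makes no claim about how xGT itself handles such rows.

```python
def dangling_edges(edge_rows, source_keys, target_keys,
                   source_key="source_id", target_key="target_id"):
    """Return the edge rows that refer to a vertex key that does not exist."""
    source_keys = set(source_keys)
    target_keys = set(target_keys)
    return [row for row in edge_rows
            if row[source_key] not in source_keys
            or row[target_key] not in target_keys]


# Hypothetical key values for the Employees and Companies vertex frames.
employee_ids = [1, 2, 3]
company_ids = [100, 200]

# Hypothetical rows for the WorksFor edge frame.
works_for_rows = [
    {"source_id": 1, "target_id": 100, "position": "Engineer", "years": 4},
    {"source_id": 2, "target_id": 300, "position": "Analyst", "years": 1},
]

# The second row points at company 300, which is not a Companies key.
missing = dangling_edges(works_for_rows, employee_ids, company_ids)
```

A check like this can be run on the client before loading data into frames created with create_edge_frame().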

4.4.2. Implicit Frame Creation

xGT supports implicit frame creation from input data through the beta methods: create_vertex_frame_from_data(), create_edge_frame_from_data(), and create_table_frame_from_data(). These methods provide an easier way to create graphs in xGT, combining the frame creation and data loading steps, as well as automatically inferring the schema of the frame.

If the name parameter is a fully qualified name, the frame is created in the namespace specified. Otherwise, it is created in the default namespace. For more information on namespaces, see Graph Data Model: Frames and Namespaces. If a frame of the specified name already exists in the namespace, an error is thrown and the frame is not created again.

4.4.2.1. Implicit Vertex Frame Creation

To create a vertex frame from data using create_vertex_frame_from_data(), the data source, frame name, and key name must be passed in. The key parameter is the name of the column that contains the unique key identifying each vertex. If the data source is a CSV file with no header, then the key parameter should be an integer representing the position of the key column.

The example below shows creating a vertex frame named my_vertex from a pyarrow table, which must contain a column named id.

a_frame = conn.create_vertex_frame_from_data(pytab, name = 'my_vertex', key = 'id')

4.4.2.2. Implicit Edge Frame Creation

To create an edge frame from data using create_edge_frame_from_data(), the data, frame name, source vertex frame name, target vertex frame name, source key, and target key must be passed in.

The source and target parameters indicate which vertex frames the edge frame is connecting. Each of these is either the name of a vertex frame or a VertexFrame object. If either endpoint vertex frame does not already exist, it is created with a single column named “id”. Otherwise, the existing vertex frame is used if its schema is compatible.

The source_key and target_key parameters indicate which columns of the edge frame’s schema are used to identify the source vertex and target vertex, respectively, of the edge. These parameters should be string column names or, in the case of a CSV file with no header, integer column positions. If the source key or target key columns of the data refer to any vertices not already in the source or target vertex frames, those vertices are implicitly inserted into the vertex frames.

The example below shows creating an edge frame named “worksFor” along with its two endpoint vertex frames. The source vertex frame, Employees, is created beforehand, while the target vertex frame, Companies, is created automatically. For each edge, the employee_id column of data.csv identifies the source vertex in the Employees vertex frame and the department_id column of data.csv identifies the target vertex in the Companies vertex frame.

conn.create_vertex_frame_from_data('employees.csv', name = 'Employees', key = 'person_id')

conn.create_edge_frame_from_data('data.csv', name = 'worksFor',
                                 source = 'Employees', target = 'Companies',
                                 source_key = 'employee_id', target_key = 'department_id')
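The implicit insertion of vertices can be pictured with a small plain-Python sketch. It is only a model of the behavior described above, not xGT code. The helper implicit_vertex_keys and the sample rows are hypothetical.

```python
def implicit_vertex_keys(edge_rows, key_column, existing_keys):
    """Return the keys in key_column that are not yet in existing_keys.

    Each new key is reported once, in order of first appearance.
    """
    seen = set(existing_keys)
    new_keys = []
    for row in edge_rows:
        key = row[key_column]
        if key not in seen:
            seen.add(key)
            new_keys.append(key)
    return new_keys


# Hypothetical rows of data.csv.
rows = [
    {"employee_id": 1, "department_id": 10},
    {"employee_id": 4, "department_id": 10},
    {"employee_id": 4, "department_id": 20},
]

# Employees already holds keys 1, 2 and 3; Companies starts out empty.
new_employees = implicit_vertex_keys(rows, "employee_id", existing_keys=[1, 2, 3])
new_companies = implicit_vertex_keys(rows, "department_id", existing_keys=[])
```

In this model, employee 4 and companies 10 and 20 are the vertices that would be added as a side effect of loading the edges.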

4.4.2.3. Implicit Table Frame Creation

To create a table frame from data using create_table_frame_from_data(), the data source and table frame name must be passed in. The example below shows creating a table frame named my_table from a Parquet file on the server filesystem.

a_frame = conn.create_table_frame_from_data('xgtd://data.parquet', name = 'my_table')

4.4.2.4. Schema Inference

The methods create_vertex_frame_from_data(), create_edge_frame_from_data(), and create_table_frame_from_data() allow the xGT frame schema, including the column names and data types, to be automatically inferred from the source data.

The schema inference can also be done directly with get_schema_from_data(). This returns an xGT frame schema from the data without creating any frame. This method can be used to see what schema would be used before creating a graph frame from the data. It also allows the user to adjust the schema before it is used to create a frame.

While xGT will try to infer data types, users who want more control over specifying the data types or column names of the schema can pass in a schema.

The example below shows using get_schema_from_data() to get a schema from the data, modifying its third column, and then passing the schema into create_vertex_frame_from_data(). The vertex frame is then created with a schema that includes a column of type IPADDRESS named “source_ip”.

# Get a schema from a CSV file.
my_schema = conn.get_schema_from_data('data.csv')
# Change the name of the third column in the schema.
my_schema[2][0] = 'source_ip'
# Change the data type of the third column of the schema.
my_schema[2][1] = xgt.IPADDRESS

# Create a frame from the data, but pass in the manually adjusted schema.
a_frame = conn.create_vertex_frame_from_data('data.csv', name = 'v', key = 'source_ip',
                                             schema = my_schema)

Any schema can be passed in as long as it is compatible with the data source. It does not have to be a schema returned by get_schema_from_data().
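When a schema has many columns, adjusting it by column name can be less error-prone than indexing by position. The helper below is a hypothetical sketch. It assumes the schema is a list of [name, type, ...] entries, as the indexing in the example above suggests, and it uses placeholder strings where real code would use xGT types such as xgt.IPADDRESS.

```python
def adjust_column(schema, old_name, new_name=None, new_type=None):
    """Return a copy of schema with one column renamed and/or retyped.

    The input schema is left unchanged. Raises KeyError if no column
    named old_name exists.
    """
    adjusted = [list(column) for column in schema]
    for column in adjusted:
        if column[0] == old_name:
            if new_name is not None:
                column[0] = new_name
            if new_type is not None:
                column[1] = new_type
            return adjusted
    raise KeyError(f"no column named {old_name!r} in schema")


# Placeholder strings stand in for xGT types in this sketch.
inferred = [["id", "INT"], ["when", "DATETIME"], ["f2", "TEXT"]]
adjusted = adjust_column(inferred, "f2", new_name="source_ip", new_type="IPADDRESS")
```

The adjusted list could then be passed as the schema parameter in the same way as my_schema in the example above.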

Currently, only scalar data types are supported in the input data for automatic schema inference. The input data cannot contain lists or other composite data types. The following xGT data types can be automatically inferred from data:

  • BOOLEAN

  • INTEGER

  • FLOAT

  • DATE

  • TIME

  • DATETIME

  • TEXT

Schema inference will fail and raise an XgtTypeError if the data contains any unsupported types.

4.4.2.5. Data Sources Supported

Frames can be automatically created from the following data sources:

  • Arrow table

  • Pandas dataframe

  • CSV file

  • Parquet file

Arrow tables

To create an xGT frame from an Arrow table, use pyarrow. The data_source parameter must be a pyarrow table.

The pyarrow table schema must contain only the following data types:

  • boolean: maps to xGT BOOLEAN

  • signed integer: maps to xGT INTEGER

  • unsigned integer: maps to xGT INTEGER

  • float32: maps to xGT FLOAT

  • float64: maps to xGT FLOAT

  • decimal128: maps to xGT FLOAT

  • decimal256: maps to xGT FLOAT

  • time32: maps to xGT TIME

  • time64: maps to xGT TIME

  • date32: maps to xGT DATE

  • date64: maps to xGT DATETIME

  • timestamp: maps to xGT DATETIME

  • string: maps to xGT TEXT
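The mapping above can be written down as a lookup table for a quick client-side check before creating a frame. This is a sketch only: the keys are the type labels used in the list above rather than pyarrow objects, and the helper unsupported_columns is hypothetical.

```python
# Arrow type category (as labeled in the list above) -> xGT type name.
ARROW_TO_XGT = {
    "boolean": "BOOLEAN",
    "signed integer": "INTEGER",
    "unsigned integer": "INTEGER",
    "float32": "FLOAT",
    "float64": "FLOAT",
    "decimal128": "FLOAT",
    "decimal256": "FLOAT",
    "time32": "TIME",
    "time64": "TIME",
    "date32": "DATE",
    "date64": "DATETIME",
    "timestamp": "DATETIME",
    "string": "TEXT",
}


def unsupported_columns(columns):
    """Return the names of columns whose type category has no xGT mapping.

    columns is an iterable of (column_name, arrow_category) pairs.
    """
    return [name for name, category in columns
            if category not in ARROW_TO_XGT]


# Hypothetical table description: "tags" holds a list type.
problems = unsupported_columns([("id", "signed integer"), ("tags", "list")])
```

Note that date32 and date64 map to different xGT types, which is easy to overlook when preparing input data.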

Pandas frames

To create an xGT frame from Pandas, the data_source parameter must be a pandas dataframe. The data types are first translated to pyarrow types, which then map to xGT types as described above.

CSV files

To create an xGT frame from a CSV file, the data source parameter must be a string containing the file path. The file can be on the file system local to the client or on the file system local to the server. As described in Data Movement, a file on the server file system is indicated with the xgtd:// protocol.

When creating from a CSV file, the delimiter and header_mode parameters can be passed in to parse the file. Note that if the CSV file has no header, the column names of the inferred schema will be named by default: “f0”, “f1”, etc. Otherwise, the header is used to assign schema column names.

The data in the file is first translated to pyarrow types, which then map to xGT types as described above.
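The column naming rule for CSV files can be previewed on the client with the standard csv module. The helper below is a hypothetical sketch that mimics the naming described above ("f0", "f1", etc. when there is no header). It does not call xGT and does not infer data types.

```python
import csv
import io


def preview_columns(csv_text, has_header, delimiter=","):
    """Return the column names expected for the given CSV text."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    first_row = next(reader, [])
    if has_header:
        # The header row supplies the column names.
        return first_row
    # No header: default names are f0, f1, ... by position.
    return [f"f{position}" for position in range(len(first_row))]


with_header = preview_columns("person_id,name\n1,Ann\n", has_header=True)
without_header = preview_columns("1|Ann\n", has_header=False, delimiter="|")
```

For a file without a header, the position of a column in this preview is the integer that would be passed as the key, source_key, or target_key parameter.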

Parquet files

To create an xGT frame from a Parquet file, the data source parameter must be a string containing the file path. The file can be on the file system local to the client or on the file system local to the server. A file on the server file system is indicated with the xgtd:// protocol. For any additional restrictions on loading into xGT from a Parquet file, see Reading Parquet Files.

The data in the file is first translated to pyarrow types, which then map to xGT types as described above.

4.4.3. Retrieving Frames

If a frame has been previously created and already exists on the server, the client can be used to retrieve a proxy object to that frame. This is done with get_vertex_frame(), get_edge_frame(), and get_table_frame(). Additionally, client API calls are provided to list the existing table, vertex and edge frames in a running xGT instance: get_vertex_frames(), get_edge_frames() and get_table_frames(). Those calls can be restricted to list the frames resident in particular namespaces.
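A common use of these calls is to reuse a frame if it already exists and create it otherwise. The sketch below is one hedged way to do that. The helper names are hypothetical, conn is assumed to be an xgt.Connection, and the code assumes that get_vertex_frames() can be called without arguments and that each returned frame exposes a name attribute comparable to the name passed in (this may not hold when names are namespace-qualified).

```python
def vertex_frame_exists(conn, name):
    """Return True if the server lists a vertex frame with this name.

    ASSUMPTION: frame.name is comparable to the name given here.
    """
    return any(frame.name == name for frame in conn.get_vertex_frames())


def get_or_create_vertex_frame(conn, name, schema, key):
    """Return a proxy to the named vertex frame, creating it if absent."""
    if vertex_frame_exists(conn, name):
        return conn.get_vertex_frame(name)
    return conn.create_vertex_frame(name=name, schema=schema, key=key)
```

This avoids the error that create_vertex_frame() raises when a frame of the same name already exists, at the cost of one extra listing call.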

4.4.4. Dropping Frames

All frame types can be deleted using the client API call drop_frame(). Note that a frame cannot be dropped if doing so creates an invalid graph. In order to drop a vertex frame, it must not be the source or target of any edge frame. This is the case even if the frames are empty. Frame Drop provides additional information about dropping frames.
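Because a vertex frame cannot be dropped while an edge frame still refers to it, a teardown routine needs to drop edge frames first. The helper below is a hypothetical sketch. It assumes conn is an xgt.Connection and that drop_frame() accepts a frame name.

```python
def drop_graph(conn, edge_frame_names, vertex_frame_names):
    """Drop edge frames first, then the vertex frames they connect."""
    for name in edge_frame_names:
        conn.drop_frame(name)
    # No edge frame refers to these vertex frames any longer.
    for name in vertex_frame_names:
        conn.drop_frame(name)
```

For the frames created in the direct creation example, a call might look like drop_graph(server, ['FriendsWith', 'WorksFor'], ['Employees', 'Companies']).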