4.4. Frame Management
This section describes how to manage frames, which are used to store data in xGT. A detailed description of the frame data model is found in Graph Data Model: Frames and Namespaces.
Frames in xGT can be created directly by specifying the frame schema. In this case, frames are empty after creation and data must be loaded as a separate step. The API calls are create_vertex_frame(), create_edge_frame(), and create_table_frame().
Frames can also be created implicitly from a data source. In this case, the schema is inferred from the data, the frame is created, and the data is loaded in one step. The beta API calls are create_vertex_frame_from_data(), create_edge_frame_from_data(), and create_table_frame_from_data().
4.4.1. Direct Frame Creation
Frames in xGT can be created by the client API calls create_vertex_frame(), create_edge_frame(), and create_table_frame().
Each of these API calls requires passing in the frame name and schema.
If the name parameter is a fully qualified name, the frame is created in the namespace specified.
If it contains only the name of the frame, the frame is created in the default namespace.
For more information on namespaces, see Graph Data Model: Frames and Namespaces.
If a frame of the specified name already exists in the namespace, an error is thrown and the frame is not created again.
The schema describes the names and data types of each column of a frame. These columns correspond to the properties of each element of the frame. In the example below, the Employees frame has a schema with three columns, so any vertex that belongs to this frame has these three properties (the value of a property can be null if that column is not a key column):
import xgt
server = xgt.Connection()
person_frame = server.create_vertex_frame(name = 'Employees',
                                          schema = [['person_id', xgt.INT],
                                                    ['name', xgt.TEXT],
                                                    ['start_date', xgt.DATE]],
                                          key = 'person_id')
A list of supported data types is found in Data Movement.
To create a vertex frame, one column of the schema must be specified as the key with the key parameter; this column is used to uniquely identify vertices in the frame.
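The relationship between a schema and its key column can be checked in plain Python before any frame is created. The sketch below is not part of the xGT API: the helper name check_vertex_schema is hypothetical, and plain strings stand in for xgt type constants so that it runs without a server.

```python
def check_vertex_schema(schema, key):
    """Return the column names if the schema shape and the key look consistent."""
    names = []
    for column in schema:
        # Each entry is expected to start with a column name and a data type.
        if not isinstance(column, (list, tuple)) or len(column) < 2:
            raise ValueError(f"malformed schema entry: {column!r}")
        names.append(column[0])
    if key not in names:
        raise ValueError(f"key {key!r} is not a column of the schema")
    return names

# Strings stand in for xgt type constants such as xgt.INT (assumption for illustration).
employees_schema = [["person_id", "INT"], ["name", "TEXT"], ["start_date", "DATE"]]
columns = check_vertex_schema(employees_schema, key="person_id")
print(columns)
```

A check like this only catches structural mistakes on the client side; the server remains the authority on whether a schema is accepted.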
To create an edge frame, additional parameters specifying the source and target vertices must be provided.
The source and target parameters indicate which vertex frames the edge frame is connecting.
Each of these is either the name of a vertex frame or a VertexFrame object.
They can be the same or different frames.
The source_key and target_key parameters indicate which columns of the edge frame’s schema are used to identify the source vertex and target vertex, respectively, of the edge.
For each row of the edge frame, the value assigned to the source_key column corresponds to the key column of a row in the vertex frame given by the source parameter.
Similarly, for each row of the edge frame, the value assigned to the target_key column corresponds to the key column of a row in the vertex frame given by the target parameter.
The remaining columns are properties of the edge.
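The key correspondence described above can be modeled in plain Python, independent of xGT. The sketch below is only an illustration: the function name dangling_endpoints and the sample rows are hypothetical, and how xGT itself treats such rows during loading is not covered here.

```python
def dangling_endpoints(edge_rows, source_key, target_key, source_ids, target_ids):
    """Return the edge rows whose source or target value matches no vertex key."""
    missing = []
    for row in edge_rows:
        if row[source_key] not in source_ids or row[target_key] not in target_ids:
            missing.append(row)
    return missing

# Keys of the (hypothetical) Employees and Companies vertex frames.
employee_ids = {1, 2, 3}
company_ids = {10, 20}

# Rows of a WorksFor-style edge frame; 'position' and 'years' are edge properties.
works_for_rows = [
    {"source_id": 1, "target_id": 10, "position": "engineer", "years": 4},
    {"source_id": 4, "target_id": 20, "position": "analyst", "years": 1},
]

unmatched = dangling_endpoints(works_for_rows, "source_id", "target_id",
                               employee_ids, company_ids)
print(unmatched)
```

Here the second row is reported because no employee has key 4, while the first row matches a key in both vertex frames.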
4.4.1.1. Example
import xgt
server = xgt.Connection()
# Vertex frames: one column of each schema is the key.
person_frame = server.create_vertex_frame(name = 'Employees',
                                          schema = [['person_id', xgt.INT],
                                                    ['name', xgt.TEXT],
                                                    ['start_date', xgt.DATE]],
                                          key = 'person_id')
company_frame = server.create_vertex_frame(name = 'Companies',
                                           schema = [['company_id', xgt.INT],
                                                     ['name', xgt.TEXT],
                                                     ['profit', xgt.INT]],
                                           key = 'company_id')
# Edge frames: source_key and target_key name the endpoint columns.
friend_frame = server.create_edge_frame(name = 'FriendsWith',
                                        source = 'Employees',
                                        target = 'Employees',
                                        schema = [['source_id', xgt.INT],
                                                  ['target_id', xgt.INT]],
                                        source_key = 'source_id',
                                        target_key = 'target_id')
work_frame = server.create_edge_frame(name = 'WorksFor',
                                      source = 'Employees',
                                      target = 'Companies',
                                      schema = [['source_id', xgt.INT],
                                                ['target_id', xgt.INT],
                                                ['position', xgt.TEXT],
                                                ['years', xgt.INT]],
                                      source_key = 'source_id',
                                      target_key = 'target_id')
The frame Employees contains vertices representing people.
The key column person_id uniquely identifies each vertex, while the columns name and start_date give additional information, which need not be unique.
Note that because the frame schema of Employees has three columns, each vertex within it must have three properties, one of them the unique key.
The Companies frame contains vertices representing companies, with the company_id column uniquely identifying each vertex.
Frames FriendsWith and WorksFor define two edge frames, each connecting already defined vertex frames.
With direct frame creation using create_edge_frame(), the source and target frames must be created before an edge frame is created.
Note that an edge frame may connect vertices from the same vertex frame, as FriendsWith does, or may connect vertices from two different frames, as WorksFor does.
4.4.2. Implicit Frame Creation
xGT supports implicit frame creation from input data through the beta methods create_vertex_frame_from_data(), create_edge_frame_from_data(), and create_table_frame_from_data().
These methods provide an easier way to create graphs in xGT, combining the frame creation and data loading steps, as well as automatically inferring the schema of the frame.
If the name parameter is a fully qualified name, the frame is created in the namespace specified.
Otherwise, it is created in the default namespace.
For more information on namespaces, see Graph Data Model: Frames and Namespaces.
If a frame of the specified name already exists in the namespace, an error is thrown and the frame is not created again.
4.4.2.1. Implicit Vertex Frame Creation
To create a vertex frame from data using create_vertex_frame_from_data(), the data source, frame name, and key name must be passed in.
The key parameter is the name of the column that contains the unique key identifying each vertex.
If the data source is a CSV file with no header, then the key parameter should be an integer representing the position of the key column.
The example below shows creating a vertex frame named my_vertex from a pyarrow table, which must contain a column named id.
a_frame = conn.create_vertex_frame_from_data(pytab, name = 'my_vertex', key = 'id')
4.4.2.2. Implicit Edge Frame Creation
To create an edge frame from data using create_edge_frame_from_data(), the data, frame name, source vertex frame name, target vertex frame name, source key, and target key must be passed in.
The source and target parameters indicate which vertex frames the edge frame is connecting.
Each of these is either the name of a vertex frame or a VertexFrame object.
If either endpoint vertex frame does not already exist, it is created with a single column named “id”.
Otherwise, the existing vertex frame is used if its schema is compatible.
The source_key and target_key parameters indicate which columns of the edge frame’s schema are used to identify the source vertex and target vertex, respectively, of the edge.
These parameters should be string column names or, in the case of a CSV file with no header, integer column positions.
If the source key or target key columns of the data refer to any vertices not already in the source or target vertex frames, these vertices are implicitly inserted into the vertex frames.
The example below shows creating an edge frame named “worksFor” along with two endpoint vertex frames.
The source vertex frame named Employees is created beforehand, but the target vertex frame named Companies is automatically created.
For each edge, the column of data.csv named employee_id identifies the source vertex in the Employees vertex frame and the column of data.csv named department_id identifies the target vertex in the Companies vertex frame.
conn.create_vertex_frame_from_data('employees.csv', name = 'Employees', key = 'person_id')
conn.create_edge_frame_from_data('data.csv', name = 'worksFor',
                                 source = 'Employees', target = 'Companies',
                                 source_key = 'employee_id', target_key = 'department_id')
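The implicit insertion of endpoint vertices can be pictured with a small plain-Python model. This is only a sketch of the rule stated above, not xGT code: the function name implicit_vertex_keys and the sample rows are hypothetical.

```python
def implicit_vertex_keys(edge_rows, key_column, existing_keys):
    """Return keys referenced by the edges but absent from the vertex frame,
    in the order they are first seen."""
    seen = set(existing_keys)
    added = []
    for row in edge_rows:
        key = row[key_column]
        if key not in seen:
            seen.add(key)
            added.append(key)
    return added

# Hypothetical contents of data.csv.
rows = [
    {"employee_id": 1, "department_id": 10},
    {"employee_id": 3, "department_id": 10},
    {"employee_id": 3, "department_id": 20},
]

# Employees already holds keys 1 and 2; Companies starts out empty.
new_employees = implicit_vertex_keys(rows, "employee_id", {1, 2})
new_companies = implicit_vertex_keys(rows, "department_id", set())
print(new_employees, new_companies)
```

In this model, employee 3 and companies 10 and 20 are the vertices that would be added alongside the edges.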
4.4.2.3. Implicit Table Frame Creation
To create a table frame from data using create_table_frame_from_data(), the data source and table frame name must be passed in.
The example below shows creating a table frame named my_table from a Parquet file on the server filesystem.
a_frame = conn.create_table_frame_from_data('xgtd://data.parquet', name = 'my_table')
4.4.2.4. Schema Inference
The methods create_vertex_frame_from_data(), create_edge_frame_from_data(), and create_table_frame_from_data() allow the xGT frame schema, including the column names and data types, to be automatically inferred from the source data.
The schema inference can also be done directly with get_schema_from_data().
This returns an xGT frame schema from the data without creating any frame.
This method can be used to see what schema would be used before creating a graph frame from the data.
It also allows the user to adjust the schema before it is used to create a frame.
While xGT will try to infer data types, users who want more control over specifying the data types or column names of the schema can pass in a schema.
The example below shows using get_schema_from_data() to get a schema from the data, modifying its third column, and then passing the schema into create_vertex_frame_from_data().
The vertex frame is then created with a schema that includes a column of type IPADDRESS named “source_ip”.
# Get a schema from a CSV file.
my_schema = conn.get_schema_from_data('data.csv')
# Change the name of the third column in the schema.
my_schema[2][0] = 'source_ip'
# Change the data type of the third column of the schema.
my_schema[2][1] = xgt.IPADDRESS
# Create a frame from the data, but pass in the manually adjusted schema.
a_frame = conn.create_vertex_frame_from_data('data.csv', name = 'v', key = 'source_ip',
                                             schema = my_schema)
Any schema can be passed in as long as it is compatible with the data source.
It does not have to be a schema returned by get_schema_from_data().
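The adjustment step can be wrapped in a small helper that leaves the inferred schema untouched. The sketch below assumes, as in the example above, that a schema is a list of [column_name, data_type] pairs; the helper name override_column is hypothetical, and strings stand in for xgt type constants such as xgt.IPADDRESS so that it runs without a server.

```python
import copy

def override_column(schema, index, name=None, dtype=None):
    """Return a copy of the schema with one column renamed and/or retyped."""
    adjusted = copy.deepcopy(schema)
    if name is not None:
        adjusted[index][0] = name
    if dtype is not None:
        adjusted[index][1] = dtype
    return adjusted

# Stand-in for a schema returned by get_schema_from_data() (assumed shape).
inferred = [["id", "INTEGER"], ["when", "DATETIME"], ["f2", "TEXT"]]
adjusted = override_column(inferred, 2, name="source_ip", dtype="IPADDRESS")
print(adjusted)
```

Working on a copy keeps the inferred schema available for comparison with the adjusted one.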
Currently, only scalar data types are supported in the input data for automatic schema inference. The input data cannot contain lists or other composite data types. The following xGT data types can be automatically inferred from data:
BOOLEAN
INTEGER
FLOAT
DATE
TIME
DATETIME
TEXT
Schema inference fails with an XgtTypeError if the data contains any unsupported types.
4.4.2.5. Data Sources Supported
Frames can be automatically created from the following data sources:
Arrow table
Pandas dataframe
CSV file
Parquet file
Arrow tables
To create an xGT frame from an Arrow table, use pyarrow.
The data_source parameter must be a pyarrow table.
The pyarrow table schema must contain only the following data types:
boolean: maps to xGT BOOLEAN
signed integer: maps to xGT INTEGER
unsigned integer: maps to xGT INTEGER
float32: maps to xGT FLOAT
float64: maps to xGT FLOAT
decimal128: maps to xGT FLOAT
decimal256: maps to xGT FLOAT
time32: maps to xGT TIME
time64: maps to xGT TIME
date32: maps to xGT DATE
date64: maps to xGT DATETIME
timestamp: maps to xGT DATETIME
string: maps to xGT TEXT
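The mapping above can be written as a lookup table, which makes it easy to spot unsupported columns before attempting to create a frame. This is a plain-Python sketch and not part of xGT or pyarrow: the keys are the type labels from the list above, and the names ARROW_TO_XGT and unsupported_columns are hypothetical.

```python
# Arrow type labels (as listed above) mapped to xGT type names.
ARROW_TO_XGT = {
    "boolean": "BOOLEAN",
    "signed integer": "INTEGER",
    "unsigned integer": "INTEGER",
    "float32": "FLOAT",
    "float64": "FLOAT",
    "decimal128": "FLOAT",
    "decimal256": "FLOAT",
    "time32": "TIME",
    "time64": "TIME",
    "date32": "DATE",
    "date64": "DATETIME",
    "timestamp": "DATETIME",
    "string": "TEXT",
}

def unsupported_columns(column_kinds):
    """Return the names of columns whose Arrow kind has no listed xGT mapping."""
    return [name for name, kind in column_kinds.items() if kind not in ARROW_TO_XGT]

# Hypothetical table description: column name -> Arrow type label.
example = {"id": "signed integer", "joined": "date64", "tags": "list"}
bad = unsupported_columns(example)
print(bad)
```

Note that date64 maps to DATETIME rather than DATE, and that both decimal types map to FLOAT.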
Pandas frames
To create an xGT frame from Pandas, the data_source parameter must be a pandas dataframe.
The data types are first translated to pyarrow types, which then map to xGT types as described above.
CSV files
To create an xGT frame from a CSV file, the data_source parameter must be a string with the file path.
The file can be on the file system local to the client or on the file system local to the server.
As described in Data Movement, a file on the server file system is indicated with the xgtd:// protocol.
When creating from a CSV file, the delimiter and header_mode parameters can be passed in to control how the file is parsed.
Note that if the CSV file has no header, the columns of the inferred schema are named by default: “f0”, “f1”, etc.
Otherwise, the header is used to assign schema column names.
The data in the file is first translated to pyarrow types, which then map to xGT types as described above.
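The column naming rule for CSV files can be illustrated with the Python standard library alone. The sketch below models only the naming behavior described above and is not xGT's parser: the function name column_names and its has_header flag are hypothetical stand-ins, and the flag is not the header_mode parameter itself.

```python
import csv
import io

def column_names(csv_text, has_header, delimiter=","):
    """Return the column names a schema would get under the rule described above."""
    first_row = next(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    if has_header:
        # The header row supplies the column names.
        return first_row
    # Without a header, columns are named by position: f0, f1, ...
    return [f"f{i}" for i in range(len(first_row))]

with_header = column_names("person_id,name\n1,alice\n", has_header=True)
without_header = column_names("1|alice\n2|bob\n", has_header=False, delimiter="|")
print(with_header, without_header)
```

With a header-less file, the positional names also explain why the key parameters are given as integer column positions in that case.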
Parquet files
To create an xGT frame from a Parquet file, the data_source parameter must be a string with the file path.
The file can be on the file system local to the client or on the file system local to the server.
A file on the server file system is indicated with the xgtd:// protocol.
For any additional restrictions on loading into xGT from a Parquet file, see Reading Parquet Files.
The data in the file is first translated to pyarrow types, which then map to xGT types as described above.
4.4.3. Retrieving Frames
If a frame has been previously created and already exists on the server, the client can be used to retrieve a proxy object to that frame.
This is done with get_vertex_frame(), get_edge_frame(), and get_table_frame().
Additionally, client API calls are provided to list the existing vertex, edge, and table frames in a running xGT instance: get_vertex_frames(), get_edge_frames(), and get_table_frames().
Those calls can be restricted to list the frames resident in particular namespaces.
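The listing calls can be combined into a single inventory of what exists on the server. The sketch below assumes that each returned frame proxy exposes a name attribute and that the listing calls can be invoked without arguments; the helper name list_frame_names is hypothetical, and a stand-in connection object is used so that the example runs without an xGT server.

```python
from types import SimpleNamespace

def list_frame_names(conn):
    """Collect the names of all frames, grouped by frame kind."""
    return {
        "vertex": sorted(frame.name for frame in conn.get_vertex_frames()),
        "edge": sorted(frame.name for frame in conn.get_edge_frames()),
        "table": sorted(frame.name for frame in conn.get_table_frames()),
    }

# Stand-in for an xgt.Connection (assumption for illustration only).
fake_conn = SimpleNamespace(
    get_vertex_frames=lambda: [SimpleNamespace(name="Employees"),
                               SimpleNamespace(name="Companies")],
    get_edge_frames=lambda: [SimpleNamespace(name="WorksFor")],
    get_table_frames=lambda: [],
)

names = list_frame_names(fake_conn)
print(names)
```

With a real connection, the same function would be called with the object returned by xgt.Connection().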
4.4.4. Dropping Frames
All frame types can be deleted using the client API call drop_frame().
Note that a frame cannot be dropped if doing so creates an invalid graph.
In order to drop a vertex frame, it must not be the source or target of any edge frame.
This is the case even if the frames are empty.
Frame Drop provides additional information about dropping frames.
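One way to respect the constraint above is to drop edge frames before the vertex frames they connect. The sketch below assumes that drop_frame() accepts a frame name; the helper name drop_graph is hypothetical, and RecordingConnection is a stand-in that only records calls so the ordering can be seen without a server.

```python
def drop_graph(conn, edge_frame_names, vertex_frame_names):
    """Drop edge frames first so no vertex frame is still an endpoint when dropped."""
    for name in edge_frame_names:
        conn.drop_frame(name)
    for name in vertex_frame_names:
        conn.drop_frame(name)

class RecordingConnection:
    """Stand-in for an xgt.Connection that records the order of drop requests."""
    def __init__(self):
        self.dropped = []

    def drop_frame(self, frame):
        self.dropped.append(frame)

conn = RecordingConnection()
drop_graph(conn, ["FriendsWith", "WorksFor"], ["Employees", "Companies"])
print(conn.dropped)
```

Reversing the two loops would request the Employees vertex frame first, while FriendsWith and WorksFor still refer to it.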