Data management intro
The xGT tool implements a strongly-typed property graph data model.
Users must define the schema of a graph before doing anything with
any data.
Data is loaded into the appropriate components of the graph (vertex or edge) by
getting a proxy object from xgt.
For example, if the schema has an edge type called ReportsTo, the proxy
object can be retrieved with this Python statement:
repTo = myGraph.edges.ReportsTo
The rest of this document assumes you have set up a graph component in a Python
variable in a way similar to repTo above.
Getting data into xGT
There are two ways to get data into xGT: across a network and from a filesystem.
All forms of getting data into xGT can be expressed by calling the load()
method in the xgt Python module.
The signature of the load method is:
entity.load(paths, headerMode=xgt.HeaderMode.NONE)
where entity may be a table, vertex, or edge.
The paths parameter describes where xgt can find the CSV file or files.
The headerMode parameter is a flag indicating whether the CSV data sources
contain a first line that is a header (i.e., column names) and how that header
should be handled. The available modes are:
xgt.HeaderMode.NONE: there is no header.
xgt.HeaderMode.IGNORE: the header is ignored.
xgt.HeaderMode.NORMAL: the header is mapped to the schema in a relaxed way. File
columns that can't be mapped are ignored, and schema columns that aren't mapped
are filled with null.
xgt.HeaderMode.STRICT: an error is raised if the schema isn't fully mapped or if
the file contains additional columns that are not in the schema.
Naming a header column IGNORE skips that column without causing an error.
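The four header modes can be summarized as a plan for matching file columns to schema columns. The following is a minimal sketch of the behavior described above, not xgt's implementation: HeaderMode here is a hypothetical local stand-in for xgt.HeaderMode, plan_columns is a hypothetical helper, and it assumes the IGNORE header keyword only matters for the strict check.

```python
from enum import Enum


class HeaderMode(Enum):
    # Hypothetical stand-in for the four modes named above; not xgt.HeaderMode.
    NONE = "none"
    IGNORE = "ignore"
    NORMAL = "normal"
    STRICT = "strict"


def plan_columns(schema, header, mode):
    """Return (skip_first_line, mapping) for one CSV file.

    mapping sends each schema column to the index of the file column that
    supplies it, or to None when that schema column is filled with null.
    """
    if mode is HeaderMode.NONE:
        # No header: the first line is data and columns line up by position.
        return False, {name: i for i, name in enumerate(schema)}
    if mode is HeaderMode.IGNORE:
        # A header exists but is skipped; columns still line up by position.
        return True, {name: i for i, name in enumerate(schema)}
    position = {name: i for i, name in enumerate(header)}
    if mode is HeaderMode.STRICT:
        missing = [c for c in schema if c not in position]
        # Columns named IGNORE are tolerated instead of counting as extras.
        extra = [c for c in header if c not in schema and c != "IGNORE"]
        if missing or extra:
            raise ValueError(f"missing columns {missing}, extra columns {extra}")
    # NORMAL (and STRICT once validated): match by name; unknown file columns
    # are dropped and unmatched schema columns map to None (filled with null).
    return True, {name: position.get(name) for name in schema}
```

For a schema of src, dst, since and a file header of dst, src, note, NORMAL mode maps src and dst by name, drops note, and fills since with null, whereas STRICT mode rejects the same file.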
There are four variants of a path value based on where to find the CSV file.
1. Reading from the client filesystem
This method is the most straightforward. Each path is simply an absolute or
relative path on the client's local filesystem.
graphcomponent.load("../data/path/my.CSV")
repTo.load("../../company/reportsTo.csv")
graphcomponent.load(("../data/path/myfirst.csv", "c:/absolute/path/mysecond.csv"))
These load calls request xgt to search the local filesystem for the files.
Note that if the xGT server is running on a remote system, the file contents
must be sent from the client to the server over the network, which adds
communication I/O to these calls.
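To catch a missing or mistyped client file before calling load(), a local check along these lines may help. This is a minimal sketch; check_client_paths is a hypothetical helper, not part of xgt. It accepts either a single path string or a tuple of paths, like the examples above, and assumes relative paths are resolved against the client's current working directory.

```python
import os


def check_client_paths(paths):
    """Resolve client-side CSV paths and report any that are not files.

    Accepts one path string or a tuple of path strings. Relative paths are
    made absolute against the current working directory.
    """
    if isinstance(paths, str):
        paths = (paths,)
    resolved = tuple(os.path.abspath(p) for p in paths)
    missing = [p for p in resolved if not os.path.isfile(p)]
    return resolved, missing
```

Running this before repTo.load(...) surfaces a bad client path locally, before any data is transferred.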
2. Reading from the server filesystem
This method is essentially a request for the xGT server to find and ingest the
CSV files itself. Each path is preceded by the xgtd:// protocol specification,
telling xgt to pass the request to the server (the xGT daemon).
graphcomponent.load("xgtd://../data/path/my.CSV")
graphcomponent.load(("xgtd://../data/path/myfirst.csv", "xgtd://../data/path/mysecond.csv"))
3. Reading from a URL
This method asks the xGT server to retrieve CSV-formatted data from a URL.
The protocol can be either http or https.
graphcomponent.load("http://www.example.com/data/path/my.CSV")
graphcomponent.load(("http://www.example.com/data/myfirst.csv", "https://www.example.com/data/mysecond.csv"))
4. Reading from an AWS S3 bucket
This method asks the xGT server to pull CSV-formatted data directly from an AWS S3 bucket.
graphcomponent.load("s3://my-s3-bucket/data/path/my.CSV")
graphcomponent.load(("s3://my-s3-bucket/data/myfirst.csv", "s3://my-s3-bucket/data/mysecond.csv"))
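Across the four variants, the prefix of each path value selects where the data is read from. The sketch below models that rule in plain Python; classify_path and classify_paths are hypothetical helpers, not xgt functions, and they recognize only the prefixes used in the examples above.

```python
def classify_path(path):
    """Return which of the four load variants a single path value selects."""
    if path.startswith("xgtd://"):
        return "server filesystem"
    if path.startswith(("http://", "https://")):
        return "url"
    if path.startswith("s3://"):
        return "s3"
    # No recognized protocol prefix: treated as a path on the client.
    return "client filesystem"


def classify_paths(paths):
    """Classify one path string or a tuple of paths, as accepted by load()."""
    if isinstance(paths, str):
        paths = (paths,)
    return [classify_path(p) for p in paths]
```

A path with no protocol prefix, including a Windows-style absolute path such as c:/absolute/path/mysecond.csv, falls through to the client filesystem case.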
Getting data out of xGT
The save() method is used to request xGT to write data into a CSV file on
either the client or server filesystem.
The signature of the save method is:
entity.save(path, offset=0, length=None, headers=False)
where entity may be a table, vertex, or edge.
This is the preferred method for extracting the result of a query (saving from the result table of a MATCH query).
There are only two variants of the path for a save method.
1. Saving to the client filesystem
This method simply provides an absolute or relative path on the client's local filesystem.
graphcomponent.save("../data/path/myoutput.CSV")
searchResult.save("../data/interesting.data.csv", offset=10, length=100, headers=True)
Note that the searchResult.save call writes at most 100 rows, starting at offset 10 of the result table, and creates a CSV file that includes the column names. This file can be read directly into another analytic tool such as Microsoft Excel.
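The combined effect of offset, length, and headers can be modeled locally with Python's csv module. This is a minimal sketch, not xgt's implementation: write_slice is a hypothetical helper, and it assumes offset is a zero-based row index and that length=None means "through the last row".

```python
import csv


def write_slice(columns, rows, out, offset=0, length=None, headers=False):
    """Write rows[offset:offset + length] to a text stream as CSV.

    If headers is True, the column names are written first. Returns the
    number of data rows written, which is at most length.
    """
    end = None if length is None else offset + length
    writer = csv.writer(out)
    if headers:
        writer.writerow(columns)
    selected = rows[offset:end]
    writer.writerows(selected)
    return len(selected)
```

Because slicing stops at the end of the data, a request for 100 rows near the end of a short result simply writes fewer rows, which matches the "at most" wording above.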
2. Saving to the server filesystem
This method provides an absolute or relative path on the server's filesystem,
as indicated by the xgtd:// protocol.
graphcomponent.save("xgtd://../data/path/myoutput.CSV")
searchResult.save("xgtd://../data/interesting.data.csv", offset=10, length=1000000, headers=True)
Use this method when you know the data is prohibitively large to send back to the client. The saved file can still be copied elsewhere afterward from the server filesystem.