Introduction
Beginning in version 8.0, AllegroGraph added a second distributed triple store implementation. The new implementation, called Fedshard Triple Store, is more flexible than the original. With Fedshard you can define distributed repositories while the server is running. You can also split shards that have grown too large.
Fedshard is described in New Dynamic Cluster Setup and New Dynamic Cluster Tutorial.
In AllegroGraph version 9.0 the original distributed triple store implementation has been removed. In this document we describe how to convert from the old distributed triple store to the new Fedshard triple store.
The conversion has four parts:
- export all triples into a text file
- write a Fedshard definition of the repository
- upload the Fedshard definition to the server
- load the exported triples from step 1 into the Fedshard repository
You can't migrate a distributed repository using backup/restore, because the two implementations name the shard repositories differently and divide the triples between shards differently. You must reload the repository from a text file listing all the triples.
Exporting triples
You can use agtool export to export the triples to a text file. You have to do this from a version of AllegroGraph before version 9.0, since the old-style distributed repository cannot be opened in version 9.0 of AllegroGraph. If you have a distributed repository, it is very likely that you are actually storing quads in it, so you'll want to do something like this:
agtool export -o nquads myrepo myrepo.nq
If the repo is large you'll want to use multiple workers when doing the export:
agtool export -o nquads --workers 10 myrepo myrepo.nq
In this case the triples are written to files myrepo-1.nq, myrepo-2.nq, etc., and you can combine the output files with:
cat myrepo-*.nq > myrepo.nq
Write a Fedshard definition file
The repositories in the old distributed triple store implementation were defined in the agcluster.cfg file (in the lib directory of the server). At this point you have a text file containing all the triples. You can choose to load them into a single repository and not make the repository distributed. However, if you want to continue using a distributed repository, you will need to define a Fedshard repository, which can match your previous design or differ from it.
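When writing the new definition it helps to have the old one in front of you. The following is a minimal sketch, not part of agtool, that prints the block for one repository from agcluster.cfg. The function name show_old_db is made up for illustration, and it assumes each repository block starts with a line whose first word is db and runs until the next db line or the end of the file.

```shell
# show_old_db CONFIG_FILE REPO_NAME
# Hypothetical helper: print the "db REPO_NAME" block from an old
# agcluster.cfg so its key, shardsPerServer and kb directives can be
# carried over into the Fedshard definition.
show_old_db() {
    awk -v name="$2" '
        # A line starting with "db" opens a block; print only the one asked for.
        $1 == "db" { printing = ($2 == name) }
        printing
    ' "$1"
}
```

Called with the path to the server's agcluster.cfg and the name myrepo, it prints the db myrepo line and the directives that follow it. Any unrelated top-level directives placed after the block would be printed as well, so treat the output as a reading aid rather than a parser.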
Next we show two examples of writing a Fedshard definition that matches the design of the old distributed triple store.
Example 1
Here is a simple definition of a distributed repo with three shards:
User test
Password xyzzy
Catalog tests
Port 10035
Scheme http
server shardmachine.com local
db myrepo
  key part graph
  server local
  shardsPerServer 3
The Fedshard equivalent definition is:
fedshard
  repo myrepo
  key part
  secondary-key graph
  shards-per-server 3
  scheme http
  port 10035
  user test
  password xyzzy
  server
    host shardmachine.com
    catalog tests
Example 2
One can federate each shard with one or more normal repositories, which we call knowledgebases. The reason for this is to supply triples that can be used by SPARQL queries over the shard data.
Here is a definition of a distributed repo with a knowledgebase (assuming all the top-level directives from Example 1):
db myrepo-with-kb
  key part graph
  server local
  shardsPerServer 3
  kb tests:distdb-kb
and here is the Fedshard definition:
fedshard
  repo myrepo-with-kb
  key part
  secondary-key graph
  shards-per-server 3
  scheme http
  port 10035
  user test
  password xyzzy
  server
    host 127.0.0.1
    catalog tests
  kb
    repo myrepo-kb
    catalog tests
    host 127.0.0.1
Upload the Fedshard repo definition to the server
The Fedshard definition is put in a file (e.g. myrepo.def) and then sent to the AllegroGraph server (e.g. mymachine:10077) using:
agtool fedshard define --server username:password@mymachine:10077 myrepo.def
In the old distributed triple store implementation the distributed repo could be in any catalog. The Fedshard implementation puts all Fedshard repos in the fedshard catalog. The shard repositories are put in the catalog specified by the server declaration.
Loading triples into the new Fedshard repo
To load our quads into this repo we would do the following. Note that the name of the Fedshard repo is fedshard:myrepo since all Fedshard repos are in the fedshard catalog.
agtool load username:password@mymachine:10077/fedshard:myrepo myrepo.nq
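Steps 1, 3 and 4 can be strung together in a small shell script. This is a sketch under stated assumptions: the function names export_quads and define_and_load are hypothetical, the agtool invocations are copied from the commands shown in this document, and step 2 (writing the definition file) is still done by hand. The export has to run against a pre-9.0 server, and the define and load against the server that will host the Fedshard repo.

```shell
# Stop at the first failing command or unset variable.
set -eu

# export_quads REPO WORKERS
# Step 1: export quads with several workers, then merge the numbered
# part files (REPO-1.nq, REPO-2.nq, ...) into REPO.nq.
export_quads() {
    agtool export -o nquads --workers "$2" "$1" "$1.nq"
    cat "$1"-*.nq > "$1.nq"
}

# define_and_load SERVER DEF_FILE REPO
# Steps 3 and 4: upload the definition, then load the exported quads.
# SERVER has the form username:password@host:port.
# Fedshard repos always live in the fedshard catalog, hence the prefix.
define_and_load() {
    agtool fedshard define --server "$1" "$2"
    agtool load "$1/fedshard:$3" "$3.nq"
}
```

With the names used in this document, that would be export_quads myrepo 10 on the old server, followed by define_and_load username:password@mymachine:10077 myrepo.def myrepo once myrepo.def has been written.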