Pure Danger Tech



SPARQL datasets and named graphs

09 Nov 2011

I’ve been working to understand exactly how the RDF datasets used in SPARQL 1.1 are defined and how named graphs and GRAPH patterns are evaluated. This write-up is based solely on the SPARQL 1.1 specification (mostly section 13), and intentionally not on how existing SPARQL engines or stores actually behave.

Datasets

A SPARQL query is always answered in the context of an RDF dataset. An RDF dataset consists of exactly one default graph and zero or more named graphs, each named graph being identified by an IRI. Every graph holds RDF triples. The same triple may appear in the default graph and in any number of named graphs, but each occurrence is considered independently (for example, blank nodes are not comparable across graphs).
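To make the shape of a dataset concrete, here is a minimal Python sketch. The `Dataset` class, the `ex:` terms and the graph names are all made up for illustration and are not part of any SPARQL library; triples are modelled as plain string tuples.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# A triple is modelled as a (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


@dataclass
class Dataset:
    """Hypothetical model: one default graph plus zero or more named graphs."""
    default_graph: Set[Triple] = field(default_factory=set)
    named_graphs: Dict[str, Set[Triple]] = field(default_factory=dict)


triple = ("ex:alice", "ex:knows", "ex:bob")

# The same triple sits in the default graph and in one named graph.
ds = Dataset(
    default_graph={triple},
    named_graphs={"ex:g1": {triple}, "ex:g2": set()},
)

# Each graph is inspected on its own; membership in one says nothing about another.
graphs_containing = [name for name, g in ds.named_graphs.items() if triple in g]
```

Here `graphs_containing` lists only the named graphs that hold the triple, regardless of what the default graph contains.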

There are three places a dataset can be defined:

  1. Query processor – by default, the query processor will interpret the query in terms of the default dataset. The contents of the default dataset are determined by the query processor and the SPARQL query does not affect it.
  2. In the query – FROM and FROM NAMED in the SPARQL query define a dataset.
    • Default graph = RDF-MERGE of all of the FROM graphs (empty if none).
    • Named graphs = set of graphs specified in FROM NAMED (empty if none).
  3. In the protocol – if the SPARQL query is executed via the SPARQL protocol, the dataset information (default and named graphs) may be specified in the protocol instead.
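The query-defined case (item 2) can be sketched in Python as follows. This is only an illustration under simplifying assumptions: `rdf_merge`, `dataset_from_query` and the `store` dictionary are hypothetical names, blank nodes are represented as strings starting with `_:`, and the merge keeps blank nodes from different source graphs apart by renaming them.

```python
def rdf_merge(graphs):
    """Union of the graphs, with blank nodes ("_:x") renamed apart per source graph."""
    merged = set()
    for i, graph in enumerate(graphs):
        for triple in graph:
            merged.add(tuple(
                f"{term}_g{i}" if term.startswith("_:") else term
                for term in triple
            ))
    return merged


def dataset_from_query(from_iris, from_named_iris, fetch):
    """Build (default_graph, named_graphs) from FROM and FROM NAMED IRIs.

    `fetch` is an assumed callable that maps an IRI to that graph's triples.
    """
    default_graph = rdf_merge([fetch(iri) for iri in from_iris])
    named_graphs = {iri: fetch(iri) for iri in from_named_iris}
    return default_graph, named_graphs


# Hypothetical graph contents, keyed by IRI.
store = {
    "graph1": {("_:b", "ex:p", "ex:o1")},
    "graph2": {("_:b", "ex:p", "ex:o2")},
    "graph3": {("ex:s", "ex:p", "ex:o3")},
}

default_graph, named_graphs = dataset_from_query(
    ["graph1", "graph2"], ["graph3"], store.__getitem__
)
```

With no FROM clauses the default graph comes out empty, and with no FROM NAMED clauses the set of named graphs is empty, matching the two bullets above.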

When executing a query, exactly one of these datasets is chosen, in this order of precedence:

Protocol > query (FROM / FROM NAMED) > query processor default
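As a rough sketch of that precedence rule, assuming a dataset that was not specified is represented by `None` (the function name and the placeholder strings are invented for the example):

```python
def choose_dataset(protocol_dataset, query_dataset, processor_default):
    """Return the highest-precedence dataset that was actually specified."""
    for candidate in (protocol_dataset, query_dataset):
        if candidate is not None:
            return candidate
    # Nothing in the protocol or the query: fall back to the processor's dataset.
    return processor_default


chosen_all = choose_dataset("protocol-ds", "query-ds", "default-ds")
chosen_query = choose_dataset(None, "query-ds", "default-ds")
chosen_default = choose_dataset(None, None, "default-ds")
```

Note that the datasets are never combined: the winner replaces the others entirely.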

For the rest of this discussion, “the dataset” refers to the dataset chosen in the prior step.

Active graph

When matching triple patterns in the query, the “active graph” is used to determine the scope of matching:

  1. By default, the active graph is set to the default graph of the dataset.
  2. The GRAPH graph pattern can be used to alter the active graph. If the GRAPH graph pattern is used, only the named graphs are considered.
    1. Fixed GRAPH patterns specify an IRI to use as a named graph. In this case, only the specified named graph in the dataset will be used. If the IRI is not one of the named graphs in the dataset, the active graph will be the empty graph.
    2. Variable GRAPH patterns specify a variable that is bound to the graph of each solution. In this case, the whole graph pattern is matched against each named graph in the dataset in turn, the graph variable is bound to that graph's IRI in each solution, and the results are unioned.
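The rules above can be modelled with a small Python sketch. It follows my reading as stated in this post, including treating an unknown IRI as the empty graph; the function name, the `graph_term` convention (`None` for no GRAPH, a string starting with `?` for a variable, anything else an IRI) and the sample graphs are assumptions made for the example.

```python
def active_graphs(default_graph, named_graphs, graph_term=None):
    """Return (graph-variable binding, active graph) pairs for a pattern."""
    if graph_term is None:
        # No GRAPH keyword: match against the dataset's default graph.
        return [({}, default_graph)]
    if graph_term.startswith("?"):
        # Variable: one evaluation per named graph, binding the variable each time.
        return [({graph_term: iri}, graph) for iri, graph in named_graphs.items()]
    # Fixed IRI: that named graph, or the empty graph if it is not in the dataset.
    return [({}, named_graphs.get(graph_term, set()))]


DG = {("ex:s", "ex:p", "ex:default")}
NG1 = {("ex:s", "ex:p", "ex:one")}
NG2 = {("ex:s", "ex:p", "ex:two")}
named = {"NG1": NG1, "NG2": NG2}

plain = active_graphs(DG, named)
fixed = active_graphs(DG, named, "NG1")
variable = active_graphs(DG, named, "?g")
missing = active_graphs(DG, named, "NG3")
```

The variable case returns one entry per named graph, so a dataset with no named graphs produces an empty list and therefore no solutions.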

Examples

In all examples, assume there is a default dataset defined by the query processor with default graph = DG and named graphs NG1 and NG2. No dataset is provided in the protocol.

Example 1

SELECT ?a ?b ?c ?g
WHERE {
  { ?a … }
  GRAPH <NG1> {
    ?b …
  }
  GRAPH ?g {
    ?c …
  }
}
  • Dataset: There is no FROM or FROM NAMED so the processor’s default dataset is used.
  • Default graph: DG
  • Named graphs: NG1, NG2
  • Active graph
    • Containing ?a: DG
    • Containing ?b: NG1
    • Containing ?c: NG1, NG2

Example 2

SELECT ?a ?b ?c ?g
FROM <graph1>
FROM <graph2>
WHERE {
  { ?a … }
  GRAPH <graph1> {
    ?b …
  }
  GRAPH ?g {
    ?c …
  }
}
  • Dataset: There is a FROM, so the default dataset is discarded.
  • Default graph: RDF-MERGE(graph1, graph2)
  • Named graphs: None
  • Active graph
    • Containing ?a: RDF-MERGE(graph1, graph2)
    • Containing ?b: empty graph
    • Containing ?c: none (the dataset has no named graphs to match against, so this pattern yields no solutions)

Example 3

SELECT ?a ?b ?c ?g
FROM NAMED <graph1>
FROM NAMED <graph2>
WHERE {
  { ?a … }
  GRAPH <graph1> {
    ?b …
  }
  GRAPH ?g {
    ?c …
  }
}
  • Dataset: There is a FROM NAMED, so the default dataset is discarded.
  • Default graph: empty
  • Named graphs: graph1, graph2
  • Active graph
    • Containing ?a: empty graph
    • Containing ?b: graph1
    • Containing ?c: graph1, graph2

Example 4

SELECT ?a ?b ?c ?g
FROM <graph1>
FROM <graph2>
FROM NAMED <graph3>
FROM NAMED <graph4>
WHERE {
  { ?a … }
  GRAPH <graph3> {
    ?b …
  }
  GRAPH ?g {
    ?c …
  }
}
  • Dataset: There is a FROM and FROM NAMED, so the default dataset is discarded.
  • Default graph: RDF-MERGE(graph1, graph2)
  • Named graphs: graph3, graph4
  • Active graph
    • Containing ?a: RDF-MERGE(graph1, graph2)
    • Containing ?b: graph3
    • Containing ?c: graph3, graph4

Questions

  1. What happens if you specify a fixed IRI in a GRAPH pattern that is not one of the named graphs in the dataset? The specification does not explicitly cover this, but I believe the following statement in the spec implies that only named graphs in the dataset will return data in a GRAPH: “The GRAPH keyword is used to make the active graph one of all of the named graphs in the dataset for part of the query.”