Pure Danger Tech


Revelytix announces open source Spark and Sherpa projects

07 Jun 2011

Revelytix is pleased to announce the release of two new open source projects: Spark and Sherpa.


Spark is a Java client API for accessing remote SPARQL processors. It is similar in style to JDBC (the Java Database Connectivity API), which plays the same role for remote relational databases. SPARQL is a query language for RDF, commonly used to access data in semantic web applications.

Jena and Sesame are popular open-source Java frameworks for working with RDF and SPARQL. However, these frameworks tend to be both too much and not enough for some common needs. Both frameworks provide the ability to model RDF datasets, import/export a variety of formats, plug in custom storage engines, and execute queries over models with those storage engines.

After working with Jena, Sesame, a variety of triple stores, and our own SPARQL products, we concluded that there is a missing spot in the stack for a JDBC-style, connection-oriented library for accessing remote SPARQL processors. Jena and Sesame both open up storage APIs (for potentially remote stores), but those sit at a lower level; the client API is assumed to work at the level of graphs. The SPARQL HTTP protocol is widely used and supported, but it offers no client-side programming API or server-side library, does not support connection-oriented use cases, and relies on results sent in a text form (usually XML or JSON), so it is not as performant as triple-store-specific APIs.
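To make the "no client-side programming API" point concrete, here is a minimal sketch of what a caller has to do by hand to query a SPARQL endpoint over HTTP: URL-encode the query into a `query` parameter and ask for a text result format. The endpoint URL and the class and method names are assumptions for illustration, and the sketch uses the JDK's `java.net.http` classes (Java 11+) rather than anything from Spark. It only builds the request and does not send it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Illustrative helper, not part of Spark: builds a SPARQL Protocol GET request by hand.
final class SparqlHttpSketch {

    // endpoint is a hypothetical URL such as http://example.org/sparql
    static HttpRequest buildQueryRequest(String endpoint, String sparql) {
        // The query text travels as a URL-encoded "query" parameter.
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
                // Results come back as text; here we ask for the XML results format.
                .header("Accept", "application/sparql-results+xml")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildQueryRequest(
                "http://example.org/sparql",
                "SELECT * WHERE { ?s ?p ?o } LIMIT 1");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Everything after this step (sending the request, parsing the XML or JSON, mapping values to Java types) is also left to the caller, which is the kind of gap a client API can fill.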

In light of this gap, Spark is:

  • Connection-oriented, so connections can hold the state of an interaction
  • Client-server to leverage either ubiquitous SPARQL endpoints or custom SPARQL processor APIs
  • Interface-oriented and system-agnostic so one API can be used for many SPARQL processors and communication protocols
  • A query API focusing on cursored access to results
  • A lightweight RDF data API, and NOT a graph API (you should still use Jena, Sesame, etc. to work with in-memory graphs)
  • [work in progress] A metadata API, defining a common way to retrieve metadata from SPARQL processors
  • [future] Able to support updates and transactions
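As a rough illustration of the connection-oriented, cursored style described in the list above, the following sketch defines two small interfaces and an in-memory stand-in for a remote processor. All of the names here (`SketchConnection`, `SketchCursor`, `InMemoryConnection`, `CursorSketch`) are hypothetical and are not the actual Spark interfaces; see the spark-api javadoc for the real ones.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical interfaces for illustration only; NOT the real Spark API.
interface SketchConnection extends AutoCloseable {
    SketchCursor executeQuery(String sparql);
    @Override void close();
}

interface SketchCursor extends AutoCloseable {
    boolean next();                      // advance to the next solution, false when exhausted
    String getBinding(String variable);  // value bound to a variable in the current solution
    @Override void close();
}

// In-memory stand-in for a remote SPARQL processor; it ignores the query text.
final class InMemoryConnection implements SketchConnection {
    private final List<Map<String, String>> rows;
    private boolean open = true;

    InMemoryConnection(List<Map<String, String>> rows) {
        this.rows = rows;
    }

    @Override
    public SketchCursor executeQuery(String sparql) {
        if (!open) {
            throw new IllegalStateException("connection is closed");
        }
        final Iterator<Map<String, String>> it = rows.iterator();
        return new SketchCursor() {
            private Map<String, String> current;

            @Override
            public boolean next() {
                if (!it.hasNext()) {
                    current = null;
                    return false;
                }
                current = it.next();
                return true;
            }

            @Override
            public String getBinding(String variable) {
                if (current == null) {
                    throw new IllegalStateException("cursor is not positioned on a solution");
                }
                return current.get(variable);
            }

            @Override
            public void close() {
                current = null;
            }
        };
    }

    @Override
    public void close() {
        open = false;
    }
}

final class CursorSketch {
    static List<String> collectNames() {
        List<Map<String, String>> data = List.of(Map.of("name", "Alice"), Map.of("name", "Bob"));
        List<String> names = new ArrayList<>();
        // The connection holds the state; the cursor walks results one solution at a time.
        try (SketchConnection conn = new InMemoryConnection(data);
             SketchCursor cursor = conn.executeQuery(
                     "SELECT ?name WHERE { ?s <http://xmlns.com/foaf/0.1/name> ?name }")) {
            while (cursor.next()) {
                names.add(cursor.getBinding("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(collectNames());
    }
}
```

The shape mirrors JDBC's connection and result-set pattern: the caller never holds a whole graph in memory, only the current solution.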

Spark consists of the following artifacts:

  • spark-api – the Spark API, only interfaces (javadoc) (example)
  • spark-spi – the Spark SPI, classes helpful for building an implementation of the API (javadoc)
  • spark-protocol – an implementation of the Spark API for accessing SPARQL endpoints over HTTP (javadoc)


Sherpa is a high-performance, language-agnostic, binary protocol for SPARQL processor communication. It seeks to mitigate some costs associated with the SPARQL Protocol and provide an alternative. Sherpa uses Apache Avro to define the protocol and provide interop for multiple languages. Avro currently supports languages including Java, C, C++, C#, Ruby, Python, and PHP (this list does not imply Sherpa implementations in those languages).
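One source of the cost difference between a text protocol and a binary one is how individual values are encoded. The sketch below hand-rolls the zig-zag, variable-length encoding that the Avro specification uses for `long` values and compares its size with the same integer written as a literal in the SPARQL XML results format. It is a stand-alone illustration using only the JDK: it does not use the Avro library and says nothing about Sherpa's actual message schema, and the class and method names are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: re-implements Avro's long encoding by hand to compare sizes.
final class AvroLongSketch {

    // Zig-zag maps small negative and positive numbers to small unsigned values,
    // then the value is written 7 bits at a time, low-order group first.
    static byte[] encodeLong(long n) {
        long z = (n << 1) ^ (n >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // high bit set: more bytes follow
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long value = 64;
        String xml = "<literal datatype=\"http://www.w3.org/2001/XMLSchema#integer\">"
                + value + "</literal>";
        System.out.println("binary bytes: " + encodeLong(value).length);
        System.out.println("XML bytes:    " + xml.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

A real binary protocol also has to carry variable names, datatypes, and framing, so the overall savings depend on the schema, but the per-value difference gives a feel for why a text form is more expensive to send and parse.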

Sherpa consists of the following artifacts:

  • sherpa-protocol – the protocol definition and generated Java bindings (javadoc)
  • sherpa-java – an implementation of the Spark client API using Sherpa as the protocol (javadoc)
  • sherpa-clojure – a lightweight query API in Clojure using Sherpa, a framework for writing a Sherpa server in Clojure, and some utilities for working with Avro data from the Java binding in native Clojure forms

Open source

Both Spark and Sherpa are open source projects hosted on GitHub in the spark project under the Revelytix organization account. They are released under the Apache License, Version 2.0.

Both Spark and Sherpa are being developed for use within the Revelytix product suite. However, we felt that many users of SPARQL processors could benefit from such an API. We welcome participation in defining the Spark API, building implementations of Spark for other SPARQL processors, building implementations of the Sherpa tools in other languages, and more.

If you’re interested in using or working on Spark or Sherpa, please share your ideas on the revelytix-oss Google group.