Pure Danger Tech



Hello Terracotta

08 Aug 2007

I’m getting spooled up at Terracotta this week, and I thought it would be helpful to others if I blogged a simple HelloTerracotta program that illustrates some useful points about Terracotta and how it works.

Below is a simple class that illustrates a couple of interesting things about Terracotta:

[source:java]
package test;

import java.util.HashSet;
import java.util.Set;

public class HelloTerracotta {

    // Not final: the constructor reads root before assigning it,
    // which javac rejects for a blank final field.
    private Set<Integer> root;

    public HelloTerracotta() {
        if (root != null) { // Item #1
            System.err.println("map is already non-null, size " + root.size());
        }

        Set<Integer> newSet = new HashSet<Integer>();
        root = newSet; // Item #2

        if (root != newSet) { // Item #3
            System.err.println("root assignment was ignored");
        }
    }

    private void go() throws InterruptedException {
        for (int i = 0; i < 30; i++) {
            synchronized (root) { // Item #4
                root.add(root.size());
                System.err.println("root is now size " + root.size());
                Thread.sleep(500);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        new HelloTerracotta().go();
    }
}
[/source]

When you run this as a normal unclustered program without Terracotta, it creates a set and adds a bunch of numbers to it:

root is now size 1
root is now size 2
root is now size 3
…etc

Now let’s make the root field shared between VMs. Terracotta does this by using a configuration file in combination with a JVM wrapper (not by modifying the source code). So, we’ll create a configuration file named tc-config.xml to set up sharing and locking:

[source:java]

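<con:tc-config xmlns:con="http://www.terracotta.org/config">
  <servers>
    <server host="%i" name="localhost">
      <dso-port>9510</dso-port>
      <jmx-port>9520</jmx-port>
      <data>terracotta/server-data</data>
      <logs>terracotta/server-logs</logs>
    </server>
  </servers>
  <clients>
    <logs>terracotta/client-logs</logs>
  </clients>
  <application>
    <dso>
      <instrumented-classes>
        <include>
          <class-expression>test.HelloTerracotta</class-expression>
        </include>
      </instrumented-classes>
      <roots>
        <root>
          <field-name>test.HelloTerracotta.root</field-name>
        </root>
      </roots>
      <locks>
        <autolock>
          <method-expression>* test.*.*(..)</method-expression>
          <lock-level>write</lock-level>
        </autolock>
      </locks>
    </dso>
  </application>
</con:tc-config>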

[/source]

We then need to start up a Terracotta server using a Terracotta distribution (assumed to be in the path):

start-tc-server.sh

and start up a client using the Terracotta client wrapper which works just like your favorite Java executable. Here we’ll assume our project is set up with src, bin, and config directories and we are executing from the root of the project. The one extra parameter we need is to indicate the location of the Terracotta config file.

dso-java.sh -Dtc.config=config/tc-config.xml \
            -classpath bin \
            test.HelloTerracotta

When we do this, we’ll see the same output as before, but there will be an important difference. The root we are adding to is now shared and managed by the server process. Because of this, the objects are managed across clients. So, we can actually run the client again and we’ll see something much different:

map is already non-null, size 30
root assignment was ignored
root is now size 31
root is now size 32
...etc

So that was different! Let’s look back at the code to see what happened. If you look at “Item #1” in the code, you’ll notice that in the second run of the program, root is non-null here before anything has been done to set it. That’s because we declared the test.HelloTerracotta.root field to be a root field in the tc-config.xml configuration file. This makes this field “super-static”, which means that it is managed by the server and changes are shared across all clients and across restarts. So, here we see that not only is the root non-null, but it still contains all the data from our first run of the program!

Similarly, in “Item #2”, we note that only the first non-null assignment to a root field takes effect. So, when the program is run the first time, assigning root actually has an effect, but on subsequent executions this line will have no effect. The root has already been set, so it is not replaced with a new empty set. This also means that in “Item #3”, we see that the assignment was ignored (note the line in the output). These can be strange things to see if you are not familiar with what Terracotta is doing.
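If the “first assignment wins” rule feels odd, here is a tiny single-JVM analogy I find helpful. To be clear, this is only a sketch and not how Terracotta implements roots: the class name, the static `ROOT` holder, and the `assignRoot` helper are all made up for illustration, and a static `AtomicReference` merely stands in for the server-managed root.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Single-JVM analogy only; this does not use Terracotta.
class RootAssignmentSketch {

    // Hypothetical stand-in for the clustered root field.
    private static final AtomicReference<Set<Integer>> ROOT =
            new AtomicReference<>();

    // Mimics "root = candidate": the assignment only sticks while the
    // root is still null. Returns whatever the root refers to afterwards.
    static Set<Integer> assignRoot(Set<Integer> candidate) {
        ROOT.compareAndSet(null, candidate);
        return ROOT.get();
    }

    public static void main(String[] args) {
        Set<Integer> first = new HashSet<>();
        Set<Integer> second = new HashSet<>();

        Set<Integer> afterFirst = assignRoot(first);   // takes effect
        afterFirst.add(42);
        Set<Integer> afterSecond = assignRoot(second); // ignored

        System.err.println("first assignment kept: " + (afterFirst == first));
        System.err.println("second assignment ignored: " + (afterSecond != second));
        System.err.println("data still there, size " + afterSecond.size());
    }
}
```

The second call plays the role of the second program run: the “new empty set” is dropped and the caller ends up holding the original set with its data intact.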

In Terracotta, objects are shared across the cluster without using Java serialization. Instead, the bytecode instructions that modify fields are intercepted so that only field level changes are transmitted across the wire. This has a number of benefits: classes don’t have to be Serializable/Externalizable, object identity is maintained, and performance is greatly enhanced by sending far less data over the wire. Right now, the important thing to know is that classes need to be instrumented for Terracotta to intercept the bytecodes and take the appropriate actions. Due to this, you will see in the tc-config.xml file above that the HelloTerracotta class must be instrumented.
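To get a rough feel for why sending only the change matters, the sketch below compares the size of a whole set under standard Java serialization with the size of the single element that was just added. This is purely an illustration of payload sizes: it does not use Terracotta and says nothing about Terracotta’s actual wire format. The class and the `serializedSize` helper are names I made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashSet;

// Illustration only: compares payload sizes using plain Java serialization.
class ChangeSizeSketch {

    // Number of bytes standard Java serialization produces for the value.
    static int serializedSize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        HashSet<Integer> set = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            set.add(i);
        }

        // The "change": one more element goes into the set.
        Integer added = 1000;
        set.add(added);

        System.err.println("whole set:   " + serializedSize(set) + " bytes");
        System.err.println("one element: " + serializedSize(added) + " bytes");
    }
}
```

Re-sending the whole set after every add grows with the size of the set, while the change itself stays small no matter how large the set gets.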

The next interesting thing to look at is what we need to do to modify data in the root. The root is shared across all clients, and because Terracotta preserves Java semantics with respect to the Java Memory Model, modifying the root requires proper Java synchronization. So we must synchronize on the root object before modifying it. At execution time, locking on the root object will synchronize access to root across all threads in the cluster. That means that if you run two clients at the same time, they will compete for access to root’s monitor, just like two threads in a single VM would. Due to the sleep() in the code, you can see the locking occur if you run two clients. The clients will ping-pong back and forth. (Obviously, you don’t normally want to put sleep() calls in your synchronized blocks!)
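For comparison, here is the plain single-VM version of that contention: two threads taking the same monitor before adding to one set. It is a sketch without Terracotta, and the class and method names are made up; the point is only that the clustered case is meant to behave like this, with clients in place of threads.

```java
import java.util.HashSet;
import java.util.Set;

// Single-VM comparison only; this does not use Terracotta.
class MonitorContentionSketch {

    // Adds `count` elements, taking the set's monitor for each add,
    // just like the synchronized block in HelloTerracotta.go().
    static void addUnderLock(Set<Integer> shared, int count) {
        for (int i = 0; i < count; i++) {
            synchronized (shared) {
                shared.add(shared.size());
            }
        }
    }

    // Runs two writer threads against one set and returns the set
    // once both have finished.
    static Set<Integer> runTwoWriters(int perWriter) {
        Set<Integer> shared = new HashSet<>();
        Thread a = new Thread(() -> addUnderLock(shared, perWriter));
        Thread b = new Thread(() -> addUnderLock(shared, perWriter));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return shared;
    }

    public static void main(String[] args) {
        Set<Integer> result = runTwoWriters(30);
        System.err.println("root is now size " + result.size());
    }
}
```

Because every add happens while holding the monitor, each one sees the current size and adds a new value, so no adds are lost regardless of how the two writers interleave.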

We should probably also look back at tc-config.xml to see how the locking behavior was established. To make the synchronized block on root behave like a normal Java lock, but cluster-wide, we specified a method expression. Terracotta automatically creates a “write” lock on shared objects for synchronization in any method matching that expression. There are a variety of other lock levels that give higher performance or other behavior in other scenarios.

I hope that was a useful first introduction to illustrate a few key points about Terracotta – shared root fields, shared objects without serialization, and cluster-wide locking. Of course, I’m just scratching the surface of all of these topics (and learning them myself). As I learn more, I’ll share it with you as well!

If you have questions or want more info, please post to the comments and I’ll do my best to answer or find someone who can.

Also, many thanks to Tim Eck for writing the first version of this simple program and walking me through the details!!