Pure Danger Tech



Terracotta replacing the database?

07 Dec 2008

There was a fantastic article out this week about databases that also talks at length about Terracotta as an alternative. And this question just came up on Stack Overflow:

Would it be a good idea to use Terracotta as a persistence solution (replacing a database)? I’m specifically wondering about data integrity issues and support for transactional systems.

I took the time to write a lengthy reply so I’m reposting here.

Terracotta is transactional (synchronized blocks form transactions of modified objects) but is not and doesn’t want to be JTA-compliant. There is a fairly lengthy discussion of transactions and some common misconceptions about Terracotta here.
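To make the "synchronized blocks form transactions" point concrete, here is a minimal sketch in plain Java. The `Cart` class and its fields are hypothetical names of my own, and the clustering configuration that would make an instance shared is deliberately left out. The only thing it shows is that both modifications sit inside one synchronized block, so they form one unit of change.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not taken from any Terracotta sample.
public class Cart {
    private final List<String> items = new ArrayList<>();
    private long totalCents = 0;

    // Both modifications happen inside one synchronized block, so other
    // threads never observe the item without the updated total. Under the
    // model described above, that block is also the transaction boundary.
    public void addItem(String sku, long priceCents) {
        synchronized (this) {
            items.add(sku);
            totalCents += priceCents;
        }
    }

    public synchronized int itemCount() {
        return items.size();
    }

    public synchronized long totalCents() {
        return totalCents;
    }
}
```

Note that nothing here resembles JTA: there is no begin, commit, or rollback call, only ordinary Java locking.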

I wrote a blog post about data lifetimes and how that should frame your thinking about identifying opportunities for the use of Terracotta. In short, Terracotta’s sweet spot is the use case where you need persistence and availability (your app could crash but you still need the data) but where the data is not necessarily critical long term.

A canonical example is data important in the context of a user session in a web app, such as shopping cart info. You want to keep that data persistent so that if your web app crashes, you maintain the shopping cart. But the cart itself may or may not ever be purchased. So, you store it in Terracotta till it’s purchased, then save to the database as “system of record” data.

Historically, the data you stored in a database was always “system of record” data that was critical to the long-term success of your business: customers, orders, etc. With today’s “stateless” architectures (which really aren’t stateless), we shove all the medium-term data down to the database. This means we are needlessly punishing our database (with extra work and storage) and our developers (who have to handle the object-relational impedance mismatch, even if using ORM). A better approach is to leave it in objects and cluster it with Terracotta. Some recent Terracotta users have used this technique to reduce their database footprint (saving them millions of dollars) while simultaneously increasing their ability to scale.

There is the question of the integration point with the database and how to make the hand-off reliably. We saw this as a use case in the recently released Examinator (a Spring / Terracotta / Tomcat / MySQL reference web application). While an exam is in progress, its state (answers to questions, randomized choice orderings, questions marked for review) is stored in Terracotta. When the exam completes, the resulting score is calculated and stored long-term in the database.

To do this safely, we use a Hibernate key strategy that generates the database row ID in the object in Terracotta first, then save the data to the db, then remove the object from Terracotta. This sequence has a potential race condition if the app crashes after saving to the database but before removing the object from Terracotta. In that case, the application could try to re-save the data to the db, possibly creating two rows. But because the ID is pre-generated, we can tell whether the row was already written and avoid the duplicate.
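The hand-off can be sketched with two in-memory maps standing in for the Terracotta-held state and the database table. This is not the Examinator code: `ExamHandoff`, its method names, and the use of a random UUID as the pre-generated key are all assumptions made for illustration. The point is only that a retry after a crash finds the existing ID and does not create a second row.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for the hand-off; both maps are in-memory substitutes.
public class ExamHandoff {
    // Plays the role of exam results still held in the clustered heap.
    private final Map<String, Integer> clusteredResults = new HashMap<>();
    // Plays the role of the database table, keyed by primary key.
    private final Map<String, Integer> database = new HashMap<>();

    // Step 1: assign the row ID while the result lives only in the cluster.
    // (Assumption: a UUID is used here; the real key strategy may differ.)
    public String finishExam(int score) {
        String id = UUID.randomUUID().toString();
        clusteredResults.put(id, score);
        return id;
    }

    // Step 2: write the row only if that ID is not already present.
    public void saveToDatabase(String id) {
        Integer score = clusteredResults.get(id);
        if (score != null) {
            database.putIfAbsent(id, score);
        }
    }

    // Step 3: drop the clustered copy once the row is safely stored.
    public void removeFromCluster(String id) {
        clusteredResults.remove(id);
    }

    // The full hand-off; safe to run again after a crash between steps 2 and 3.
    public void handOff(String id) {
        saveToDatabase(id);
        removeFromCluster(id);
    }

    public int rowCount() {
        return database.size();
    }

    public boolean isPending(String id) {
        return clusteredResults.containsKey(id);
    }
}
```

To simulate the crash, call `saveToDatabase(id)` on its own and then call `handOff(id)` as the recovery path. The second save is a no-op because the ID already exists, and the clustered copy is then removed.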

In summary, I don’t think Terracotta will replace your db anytime soon. It’s too new operationally to even be considered as such in most shops, and the usage model is very different: there is no query or SQL capability into the heap (your querying capability is defined by your object model). I think it can replace, and is starting to replace, the database for mid-term data, where it’s a far cheaper and easier alternative. That said, some people are starting to experiment with it for long-term storage.
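As a rough illustration of "your querying capability is defined by your object model", consider this plain-Java sketch. `OrderIndex` and its methods are hypothetical names. The only question this structure can answer cheaply is the one it was built for, orders by customer; anything else (orders by date, say) would need another structure or a full scan.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example: the map itself is the only "index" available.
public class OrderIndex {
    private final Map<String, List<String>> ordersByCustomer = new HashMap<>();

    public synchronized void add(String customer, String orderId) {
        ordersByCustomer.computeIfAbsent(customer, k -> new ArrayList<>()).add(orderId);
    }

    // The one lookup this object model supports directly.
    // Returns a copy so callers cannot modify the internal list.
    public synchronized List<String> ordersFor(String customer) {
        List<String> found = ordersByCustomer.get(customer);
        return found == null ? new ArrayList<String>() : new ArrayList<String>(found);
    }
}
```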