Pure Danger Tech


Open source bargain

28 Jul 2010

This post started as a Twitter conversation between me and @realjenius about Ehcache, but I needed a little more room to make my point.

Ehcache is the most widely used open source Java caching library. Terracotta (my former employer) bought Ehcache last year, and I was intimately involved in tending it while I worked there. I mention this to make my relationship to the project clear.

Every so often someone notices that Ehcache pings a Terracotta server when it is instantiated, and periodically thereafter, sending it the following information:

  • Operating system name (os.name)
  • Java VM name (java.vm.name)
  • Java version (java.version)
  • Platform (os.arch)
  • Terracotta version (if applicable)
  • Terracotta product name/version (if applicable)
  • Uptime
  • Hash of IP address – used as a fingerprint for correlation
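
To make that list concrete, here is a rough sketch of how a Java library could gather the non-Terracotta fields. It is not the Ehcache source: the class name, the map keys for uptime and the IP hash, and the choice of hash function are all assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Ehcache.
public class UpdateCheckPayload {

    /** Collects the environment fields from the list above, in order. */
    public static Map<String, String> collect() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("os.name", System.getProperty("os.name"));
        params.put("java.vm.name", System.getProperty("java.vm.name"));
        params.put("java.version", System.getProperty("java.version"));
        params.put("os.arch", System.getProperty("os.arch"));
        // JVM uptime in milliseconds (key name is an assumption)
        params.put("uptime-ms",
                Long.toString(ManagementFactory.getRuntimeMXBean().getUptime()));
        // Only a hash of the address is kept, never the address itself
        params.put("ip-hash", hashOfLocalAddress());
        return params;
    }

    /** Hashes the local address; String.hashCode is a stand-in, not the real algorithm. */
    static String hashOfLocalAddress() {
        try {
            String address = InetAddress.getLocalHost().getHostAddress();
            return Integer.toHexString(address.hashCode());
        } catch (UnknownHostException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Running the main method prints one key=value line per field, which is a quick way to see how little is in there.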

The Terracotta update check server sends back information about whether a newer version of Ehcache exists. If so, a message is displayed to the console.

This of course is no different in operation than what most of the other software on your desktop or most open-source app servers do these days. I’ll admit it’s a little unusual for a non-server library to perform this kind of check but I think that line is pretty gray if you ponder it for a minute.

Generally, people seem upset when they find this out and feel like the library is spying on them or in some way intruding on their application. I’ve listed above the information that is actually being sent and it’s nothing nefarious (being open source, you’re welcome to peruse the code yourself).

The update check runs in a separate background thread. It times out if there is no response due to network setup, and it safely handles environments where thread creation is not allowed (Google App Engine most notably). You can turn the update checker off in the Ehcache XML configuration (with updateCheck="false" on the root <ehcache> element), programmatically if you create caches dynamically, or VM-wide with the system property -Dnet.sf.ehcache.skipUpdateCheck=true.
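
For the curious, the general shape of that design can be sketched in a few lines of Java. This is an illustration under my own assumptions, not the Ehcache implementation: the class and method names are hypothetical, and reading the flag with Boolean.getBoolean and catching RuntimeException for restricted environments are guesses at one reasonable approach. Only the system property name comes from Ehcache itself.

```java
// Hypothetical sketch, not part of Ehcache.
public class BackgroundCheck {

    // The VM-wide opt-out property mentioned above
    static final String SKIP_PROPERTY = "net.sf.ehcache.skipUpdateCheck";

    /**
     * Starts the given check on a daemon thread unless the opt-out is set.
     * Returns true if a thread was started, false if the check was skipped
     * or the environment refused to create a thread.
     */
    public static boolean startIfEnabled(Runnable check) {
        if (Boolean.getBoolean(SKIP_PROPERTY)) {
            return false; // user opted out
        }
        try {
            Thread thread = new Thread(check, "update-check");
            thread.setDaemon(true); // never keeps the JVM alive
            thread.start();
            return true;
        } catch (RuntimeException e) {
            // Restricted environments may forbid thread creation; give up quietly
            return false;
        }
    }

    public static void main(String[] args) {
        // Equivalent to passing -Dnet.sf.ehcache.skipUpdateCheck=true on the command line
        System.setProperty(SKIP_PROPERTY, "true");
        boolean started = startIfEnabled(() -> System.out.println("checking..."));
        System.out.println("started=" + started);
    }
}
```

If you set the property in code rather than on the command line, it seems safest to do so before the first cache manager is created, since the check is triggered at instantiation.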

Terracotta expects and recommends that any production deployment of Ehcache turn off the update check, just as you would likely turn it off in GlassFish or any other such software.

I think it might be helpful to consider why Terracotta/Ehcache would want such a check in the first place. This information tells Terracotta how widely Ehcache is being used, which serves as a metric of adoption. That information can be fed through the marketing and business sides of Terracotta. Those numbers let Terracotta convince investors that people use the library and consequently get funding to pay the salaries of the world-class team at Terracotta and the machines in the giant perf lab that makes Ehcache awesome.

The information about Ehcache versions and OS/JVM environments tells Terracotta how to place emphasis during QA. If 80% of users run on Linux, then it makes sense to focus testing efforts on that platform. Similarly, if a small but significant number are running JDK 1.5 then that might keep it in the QA and support matrix for longer. Again, this information lets Terracotta put limited financial resources to the most efficient use to make Ehcache awesome.

While these features may initially feel intrusive, I think on some reflection that they are not really doing anything evil or scary, that they are easy to turn off, and that while they provide value to Terracotta, they also provide value to the user, both in version information in the short term and in an awesome product in the long term.

Sometimes I think people underestimate the amount of engineering work that goes into an open source product like Ehcache or Terracotta or Quartz, especially one backed by an actual company. Terracotta as a company employs a team of a couple dozen people who are creating truly world-class products, equal in innovation and quality to any number of commercial, non-open source, non-free products. But to make that work financially, there must be some part of the products that actually provides revenue. Small things like an update check actually make a big difference on the business side of the equation, both in growth and efficiency. I think if you consider it in those terms, you’ll find that the trade-off of information for engineering value is still weighted heavily in favor of the user.

The first question people always ask about the update check is why it can’t default to off. The answer should be obvious: no one would take extra steps to turn it on and provide that information (even if it is harmless). There is no point in having the update check code if it is not on by default. You are welcome to read that rationalization as evil if you want, but I think that’s naive.

Google shows you ads when you use their free search engine. This trades your attention (and occasional clicks) for a valuable service. Every other “free” Internet service asks you to participate in a similar trade of something (often attention to ads or personal information) in exchange for a valuable “free” service. The Ehcache update check is really no different: an exchange of information for free use of a great piece of software. I personally see no moral issue with trading this information for great (free) software. Seems like a bargain to me.