Dashboard > Terracotta Public Wiki > Home > lsb > TechnicalFAQ
  Terracotta Public Wiki Log In   View a printable version of the current page.  
  TechnicalFAQ
Added by Sreenivasan Iyer, last edited by ROHIT REJA on Aug 22, 2008  (view change)
Labels: 
(None)

TechnicalFAQ

June 20, 2008—The notes below are based on the experiences of the Field Engineering Team at Terracotta. We are currently also working on integrating this information into other related content on the terracotta.org site.

Contents

The page Header does not exist.

LIVING WITH TERRACOTTA (POST DEPLOYMENT)

UPGRADES:

Can I, and if so, how can I upgrade my cluster with a new Terracotta release but with no downtime?

Yes - you can minimize cluster downtime to almost 0.

  • Subdivide the cluster into 2 (or more).
  • Take 1 Half of L1s out of rotation.
  • Kill the primary Terracotta Server (L2)
  • Remainder of L1s, that are still taking live traffic fail over to the standby Terracotta Server, which assumes ACTIVE state.
  • Upgrade the "out-of-rotation" L1s and L2 to the new version of Terracotta.
  • Restart the "out-of-rotation" Terracotta Server (L2) and the "out-of-rotation" L1s. Of course, if there was a new tc-config.xml, then L1s load them via the usual mechanism from the L2.
  • NOW THINGS DIVERGE:
    • If using Network Active-Passive Terracotta HA , then:
      • Reintroduce the upgraded Terracotta Server back into rotation.
      • It will join as PASSIVE and re-synch its state from the currently ACTIVE Terracotta Server.
      • On completion of the synch:
        • Take the currently Live L1s out-of-rotation
        • Kill the currently ACTIVE Terracotta Server. The currently PASSIVE L2 (i.e. the upgraded one takes over as ACTIVE)
        • Simultaneously introduce the upgraded L1s back into-rotation.
    • If using Disk-Based Terracotta HA, the:
      • Take the 2nd Half of L1s and the standby L2 (which is now the Primary) out of rotation.
      • Simultaneously, introduce the upgraded L1s + L2 back into rotation.
  • Then upgrade the 2nd Half of L1s and the standby-server and introduce the standby-server and L1s back into the cluster. The standby will re-synch its state in case of NAP.
  • NOTE:
    • THIS ASSUMES THAT THE BerkeleyDB DATA FILES STAYED COMPATIBLE BETWEEN THE 2 RELEASES.
    • If it dosen't (under some circumstances) - you will have to blow away the BDB data-files before restarting the upgraded TC Server (or point it to an empty directory location in the tc-config.xml) and the above process does change, in that you start from a fresh slate...
Can I, and if so, how can I accomplish new Application release roll-out with no downtime?
  • Basically yes, since Terracotta does not rely on Java Serialization for replication, there is no worry of running into Serialization Version exceptions.
  • But under some circumstances, rolling upgrades are not possible - See Troubleshooting Guide
How can I upgrade JDK versions across the cluster with no downtime?
  • On L1s:
    • Take the L1 out of rotation
    • As a best practice you have a soft-link to the right JDK version. So install the JDK and recreate the soft-link
    • Recreate the Terracotta boot-jar (Physically verify to see if the boot-jar file created has the right time-stamp etc.)
    • Restart the L1 and restore it back into rotation.
    • Based on what is being shared, there may be some restrictions in terms of compatibility of multiple JDK versions within the same cluster.
  • On L2s:
    • Take the Primary Terracotta Server out of rotation
    • Clients fail over to the Secondary Terracotta Server.
    • Upgrade the JDK by modifying the soft-link.
    • No need to recreate Boot-Jar on the Terracotta-Server (It is only needed on the client JVMs).
How can I install an OS Patch (or UPGRADE HARDWARE), with no downtime?
How do I upgrade tc-config.xml across the cluster?
  • Modify tc-config.xml.
  • Kill Primary Tc-server AND Take it out of rotation - Clients Fail over to Standby TC-Server.
  • Restart the Primary TC-server with the new config.
  • Take a set of L1s out of Rotation and Restart them - They will now connect to the Primary Tc-Server to get their configuration information
  • Take the remainder of the L1s and the Secondary-L2 out of rotation.
  • Simultaneously, put the first set of servers, back into rotation
  • Restart the Secondary-L2 with the new tc-config.xml and introduce it into the cluster.
  • Restart the second-set of L1s and re-introduce them into the cluster.
Can I add additional Terracotta Servers into the an existing cluster?
  • Of course, unless you are using Partitioned-EHCache and you are an Enterprise Subscription Customer, you can have only a single Active Terracotta Server. But you could have multiple Standby Terracotta Servers.
  • So, if you lose an existing passive, you could reintroduce another box with the IP of the Lost Passive.
  • If however, you cannot preserve the IP of the lost passive, or you wish to add additional Terracotta-servers, then you do need to modify tc-config.xml and follow the above process described around the tc-config.xml upgrade.

BACKUP/OTHERS:

How do I backup the state persisted on Terracotta Server's Disk?

Look at the attachment entitled BerkelyDbBackup.zip - it contains code, a script file and a readMe file which contains the instructions to run the script. Here is a brief overview:

  • This utility opens the Environment in the Read-Only mode, while the TC Server is running with opened Environment in Read-Write mode.
  • This utility gets the snap shot of all the log files the moment it attaches to the Environment and copies those files into the backup location.
  • As these files are frozen by this Read-Only process, the Cleaner process cant delete any files, that way there wont be any inconsistency in the data. It is designed for taking a full backup.
  • Note that this does not backup the tc-config.xml

TROUBLE-SHOOTING: See the Troubleshooting Guide. In addition, see below for some common occurrences (relatively speaking) and associated recovery

My Application appears deadlocked - how do I recover and determine what is going on?
  • Are you sure it is deadlocked or painfully slow?
  • To determine, take a series of thread dumps 5s apart for 1-2 minutes. If the relevant threads' stack-trace appear stationary, then there is a high likelihood that you have a deadlock (either in application code or in Terracotta).
  • To ANALYZE: Based on the thread-dump and the Lock Profiler (Admin Console), determine where, if in application-code, this might be occurring. Also code-review to ensure that locks are being obtained in the same sequence to eliminate possibility of application deadlock. Also consider tools to help deadlock detection.
  • TO RECOVER: Kill 1 L1 at a time - in the luckiest case, the very first L1 restart relieves the cluster of the deadlock and in the worst case the last L1 restart relieves it of the deadlock.
Client JVM threw an Out of Memory Exception - can you enumerate the usual suspects?
  • Heap allocated is too low - i.e. is Xms, Xmx too low. EXAMINE JAVA-STARTUP OPTIONS.
  • A non-clustered portion of your application increased its memory consumption significantly.
  • The rate of inserts into the clustered data-structures, exceeded the rate at which the Virtual Memory Manager was able to evict. TUNE VIRTUAL MEMORY MANAGER, in this case.
  • The keys in the partial associative array (Hashmap, Hashtable, ConcurrentHashMap) are all faulted in when the partial data-structure is referenced. If the cumulative size of keys exceeds VM Heap allocations, then an OOME can result. In such a case, (rare e.g. 1/2 Billion Keys in the Map), PARTITION DATA-STRUCTURE or make a composite data-structure (e.g. Map of Maps) - so that VMM can page out sub-maps.
  • You are using a non-partial data-structure with literals hanging off the very first level of the value-object - in which case VMM cannot evict and reclaim space associated with the Literals. (e.g.
    myLinkedBlockingQueue.put(Foo); // and Foo has int x, String y

    Both Foo.x and Foo.y are faulted in when myLinkedBlockingQueue is referenced. You are fine though if Foo is wrapped in a FooWrapper and

    myLinkedBlockingQueue.put(FooWrapper));
  • Terracotta does not support "partial" Terracotta Transactions - so you could OOME the L1 with a huge "Terracotta Transaction" - batch into suitable chunks (e.g. Batch into groups of 10,000 the code below, instead of trying to insert all Integer.MAX_VALUE objects in 1 shot.
    synchronized (myHashMap) { 
       for (int i=0; i < Integer.MAX_VALUE; i++) 
          myHashMap.put(key[i], value[i]);
       }
    }
Server JVM threw an Out Of Memory Exception - can you enumerate the usual suspects?
  • See above - similar arguments apply.
I believe I have run into a Terracotta Bug - How do I report an issue?
  • See if TroubleShooting Guide and this FAQ help you recover.
  • Artifacts Typically Needed to investigate complex issues (Deadlocks, Performance Problems etc.):
    • Problem Summary
    • Problem Description
    • Terracotta Client Logs
    • Stdout Logs of client JVM.
    • Terracotta Server Logs
    • Thread Dumps on L2 and atleast the offending L1 (if not all L1s) - 1 set every 15s, at least 4 times.
    • Output of SVT Run (i.e. record for 2 minutes and stop recording)
    • Output of Lock Profiling Session.
    • Admin Console Tabs (Classes, DGC, Roots)
    • Data file (in some cases, assuming it is of manageable size)
  • If OSS Customer:
    • Post on Forums on Mailing List with above Artifacts.
  • If Enterprise-Subscription Customer:
    • Supply artifacts via Web or email for P3, P4.
    • Call Support Hotline for P1, P2 (and email artifacts needed).
When do I have to restart client JVMs (outside of planned upgrades to APPLICATION / TERRACOTTA/ JDK/ OS/ HARDWARE, that is)?
  • When the client JVM gets disconnected off the cluster. i.e. it took longer than l2.l1reconnect.timeout.millis (see tc.properties) to reestablish its connectivity to the Terracotta server or attempted to connect with the same connection_id after after client-reconnect-window
  • You can grep the log for specific exceptions or BETTER STILL, trap the JMX Node Disconnected event and take appropriate action, PROGRAMMATICALLY (see http://www.terracotta.org/confluence/display/docs1/JMX+Guide)

DEPLOYING TERRACOTTA

HIGH AVAILABILITY:

How do I make sure that there is no Single Point of Failure in my architecture?
    • You need to deploy one of Terracotta certified configurations.
    • See http://www.terracotta.org/confluence/display/docs1/Network+Active-Passive+Deployment+Guide for Networked Active/Passive HA. Setting up your network as specified in the document, would imply that there is no Single Point of Failure in your architecture.
    • Other option is Shared-Disk-Based Active/Passive (you still need to ensure that there is no SPoF in your network architecture):
      • Block Storage (accessed via Fibre-Channel or iSCSI)
        • Use Vertias Cluster Services (VCS) to facilitate failover - See Document Terracotta_SharedDisk_HA_withVCS.pdf
        • Use Redhat Cluster Services to facilitate failover.
        • Use Vertias VxCFS (Veritas Cluster File System) ==> not certified yet
        • Use RedHat GFS (not fully tested or certified yet) ==> not certified yet.
      • Network File System (not certified on 2.6.x and beyond, given that there is a known failure not covered in this case) - See Document Terracotta_SharedDisk_HA_withNFS.pdf
        • NFS mount point as shared disk
        • NAS appliance as shared disk
What are the advantages/disadvantages of Networked-HA over Shared-disk-based-HA?
  • So many advantages, that Networked HA is the RECOMMENDED HA solution (unless there is a very compelling reason in your use-case or IT environment to use disk-based HA).
  • Hot-Hot Standby (i.e. no dip in TPS on failover) whereas a Shared Disk based solution is a Hot-Cold Standby (i.e. expect a dip in TPS on failover and eventual recovery).
  • HA in a box (no need for any special VCS or RedHat Clustering setup or a NAS device etc.)
  • Can use local Disk on each Terracotta server and cheaper in terms of disk setup (no virtualized storage needed, no HBA cards, no FibreChannel etc.)
  • Fail-over can be tuned to be sub-10seconds via modifying tc.properties.
  • 2 physical copies of the Data being persisted to disk.
  • etc.
Understand that Network-Active-Passive is recommended. Even so, can you describe how I achieve Terracotta Server Failover via Shared-Disk-based Active/Passive setup?
  • Block storage (accessed via Fibre-Channel or iSCSI)
    • Use Vertias Cluster Services (VCS) to facilitate failover - See Document Terracotta_SharedDisk_HA_withVCS.pdf
      • Define IP, Volume, Application (Terracotta Server) in a group [IP is not strictly needed, since tc-config allows you to specify multiple IPs - for the active TC Server and standby TC Servers.]
      • VCS will monitor these resources and in case of any failure, will call shutdown services on the node and failover IP, remount volume and start TCServer on the standby node. You need to supply shutdown/startup/monitoring Scripts.
    • Use Redhat Cluster Services to facilitate failover.
    • Use Vertias VxCFS (Veritas Cluster File System) ==> not certified yet
    • Use RedHat GFS (not fully tested or certified yet) ==> not certified yet.
  • Network File Storage (not recommended anymore)
    • NFS Mount
    • NAS Device (which basically offers you a turn-key HA NFS mount)

INFRASTRUCTURE-PROVISIONING:

How do I size my Production Terracotta Servers?

Consider the following factors:

  • Wrt Memory Sizing:
    • What is the size of the clustered object graph?
      • Does it fit in Heap or does it far exceed size of available RAM.
    • Do Application Latency SLAs allow for occasional faulting from Disk?
      • If No, then, in such a case, there is a requirement that the entire Data structure fit in JAVA-HEAP (or in Machine RAM). When HEAP Xmx gets too large, a pauseful Full-GC can wreak havoc in terms of cluster TPS. We "typically" do not recommend a heap larger than 6-8G, although one still needs to provision a box with a large amount of RAM so BDB (BerkeleyDB, the database that runs in-process with Terracotta) blocks are cached by the OS.
  • Wrt CPU Sizing:
    • Is the machine dedicated to the Terracotta Server Process or does it run other processes.
    • Does the application create a fair amount of garbage - If yes, then one can expect a fair amount of GC and DGC activity (and hence higher CPU needs).
    • How many Writes/Reads against the Terracotta Server.
    • If there is a fair bit of clustered I/O or extensive GC/DGC, then a 4-core box, at the very least, would be a requirement.
  • Other Considerations:
    • Dual NIC'ed boxes for L1 and L2 are required to avoid SpoFs (Single Points of Failure).
    • Disk/Network - see next set of issues below.
Can I deploy multiple active Terracotta servers - for scale and HA?
    • Typically Terracotta supports only a Single Active Server and multiple Standbys.
    • However we do support a TIM (Terracotta Integration Module) - named Partitioned-EHCache (only available to Enterprise Subscription Customers), which transparently stripes an EHCache Instance across multiple Active Terracotta Servers. One can thus scale out the Terracotta Server Tier as well and support a much larger TPS requirement, until other factors (such as partial key limitations constrain). See details attached as partitioned-EHCache-readme.txt
    • Note that using this mechanism, no matter what associative array represents the application interface (e.g. Hashmap, Hashtable, ConcurrentHashmap etc.) - one can, with an extra level of indirection, point to a Cache-Entry in an EHCache instance, which can then be striped across multiple Active Terracotta Servers.
    • Some restrictions apply: Assumed that the EHCache key, values in each Terracotta Server are completely contained in a given silo. i.e. there is no overflow of object graphs allowed from one TC server to another (if so, a runtime exception is thrown on the client during object-graph traversal).
Please enumerate disk storage choices with Terracotta.
  • Write to local Disk (fast, cheap)
  • Write to shared Storage (BLOCK STORAGE) - (Separate LUNs in case of Network-Active/Passive, Same LUN in case of Disk-Based-Active/Passive) e.g.
    • Block Storage via HBA talking Fibre-Channel (fast, expensive, centralized storage implies greater manageability).
    • Block storage via iSCSI (slower, cheaper)
  • Write to shared storage (NETWORK FILE SYSTEM) - (Not recommended, anymore).
    • NFS Mount
    • NAS Device.
  • Disk RAID Levels - at least Mirrored, although Striped and Mirrored is ideal.
Please enumerate Network Provisioning Considerations.
  • Gigabit full duplex between L1s and L2 (for any sizeable use case). There might be an occasional use-cases, where 10/100 might suffice.
  • Same thing between L2s, if it is going through the same switches. If it is a direct trunc (i.e. route does not pass through switch) - then even 10/100 should be ok. Latency, not bandwidth, is of utmost importance between L2s, as we do not transfer a lot of data for election, heartbeating etc. Although, in case of failures, there is a DB-Sync operation that is bandwidth intensive.
  • Make sure that the network interfaces are configured properly in /etc/host.conf (or similar) for utilizing full capacity and full duplex. In many deployments, we have seen incorrect setups (i.e. setup for 10/100).
  • Redundant NICs, redundant Switches with automatic failover (via VRRP, HSRP). See NAP diagram.
  • Configure the NIC failover time on the OS level appropriately, VRRP failover time appropriately - so your overall failover-time SLA is met.
  • With dual NICs - make sure they are in failover mode.
  • With dual redundant NICs, make sure that each mirrored pair is set up for failover.

MONITORING:

What should I monitor when I deploy a Terracotta Integration into Production?
    • See the attached document entitled : "Terracotta_Operations_Runbook.doc"

TESTING WITH TERRACOTTA

For some general do's/don't see
http://dsoguy.blogspot.com/2007/06/why-your-distributed-tests-are-lying-to.html
http://dsoguy.blogspot.com/2007/07/distributed-performance-testing-anti.html
http://dsoguy.blogspot.com/2007/07/more-lies-distributed-performance.html and
http://dsoguy.blogspot.com/2007/07/what-were-those-results-again.html

How does my functional testing change in the presence of Terracotta?

It dosen't in the sense that BlackBox and WhiteBox testing should proceed as normal. We however require that you ensure maximal code-coverage, so that there are no situations untested where an object reference may join the clustered object graph and then throw a runtime UnlockedSharedException or a NonPortableExcepetion.

How should I performance/stress test with Terracotta. What metrics am I looking for?

  • Measure overall Throughput and Latency of at least a basket of transactions
  • Note to run the test in the presence of extraneous usage such as Admin Console, Occasional SVT recording session and Monitoring infrastrucutre (e.g. JMX Ping etc.)
  • You probably need to monitor class creation, locks, DGC activity and CPU/Memory/Disk/Network on L1s and L2s. Most of these are available via the Terracotta Admin Console. You can augment via nmon and similar tools - see http://www.terracotta.org/confluence/display/wiki/Testing+Distributed+Software

How do I execute on Availability Testing in QA/ Stage?

  • You may or may not want to repeat the infrastructure failure tests, especially if you are mirroring the certified config.

Have clients on the WAN and the LAN - can they both connect to the same terracotta servers?

  • Yes they can
  • However, a WAN client may not relinquish Locks in a timely fashion as compared to a LAN client (depending on what kind of WAN latencies you experience) - therefore LAN clients get awarded more latencies and/or contention.
  • Simiarly, a WAN client may not have such a reliable connection to the Terracotta server. The Terracotta server allows a L1 (client) to reconnect within a certain window (You specify that in tc.properties: Defaults ==> l1.reconnect.enabled=false; l1.reconnect.timeout.millis=5000) after the persistent TCP/IP connection between itself and the L1 is interrupted. Therefore if the WAN connectivity is interrupted for 4s - the remote WAN client may still connect back to the TCServer without issue, but then resources held by the WAN client are blocked across the cluster for 4s.
  • Therefore, allowing WAN clients connectivity ought to be treated with caution.
  • For ReadOnly cases, that allow a bit of asynchronicity - you might be able to clone the state and have WAN clients pick up a copy of state on a periodic basis off a shared datastructure (e.g. a Queue).

TESTING TOOLS:

How can I test WAN clients?
  • Terracotta distribution bundles a primitive proxy+LoadBalancer in the product.
  • Look at com.net.tc.proxy.TCPProxy.proxy in tc.jar (or on SVN if you want to look at the source).
  • Usage is something like this:
    • cd to Terracotta-install/lib
    • host-siyer$ java -classpath tc.jar com.tc.net.proxy.TCPProxy
    • usage: TCPProxy <listen port> <endpoint[,endpoint...]> [delay]
      • <listen port> - The port the proxy should listen on
      • <endpoint> - Comma separated list of 1 or more <host>:<port> pairs to round robin requests to
      • [delay] - Millisecond delay between network data (optional,default: 0)
    • host-siyer$ java -classpath tc.jar com.tc.net.proxy.TCPProxy 9000 <tc-server-host>:9510 500
      • Thu Sep 13 11:04:53 PDT 2007: Starting listener on port 9000, proxying to [server:9510] with 500ms delay
    • So now you point the tc-config server element to the host and port where TCPProxy is running.
    • proxy> help
      • h - this help message
      • s - print proxy status
      • d <num> - adjust the delay time to <num> milliseconds
      • c - close all active connections
      • l - toggle debug logging
      • q - quit (shutdown proxy)
    • You can change the WAN delay at will (e.g. to 7s) after the proxy is running, with something like below.
      • proxy> d 7000
      • proxy> quit
Do you have an automated framework for Testing?
  • Droid is a scripted distributed testing framework.
    • Droid is a minimal framework for starting up, configuring, synchronizing, and collecting statistics about a test run on multiple machines. There are two concepts - the agent, and the worker (both just Java programs running in a JVM). The activities of both are scripted using Groovy, which is a scripting language that runs on the JVM and has many syntax similarities to Java.
    • An agent must be started on every machine that participates in the test. The agent runs a script that can coordinate with other agents (via Terracotta of course) and can choose how and when to start workers. The worker also starts with a Groovy script and performs the actual work of the test. The agent/worker split is necessary to allow things like a test that starts up a cluster, then later adds or removes nodes of the cluster. In that case, the agents run throughout the test but workers may be started or stopped according to the agent's script.
  • Cachetest are the set of tests which use droid as a framework to run across JVMs.
    • To run the cache tests, you will need to grab and build both projects. At runtime, you will be running the droid framework, using wrapper scripts from cache test to properly invoke it using the cache testing code. You can download the tests from the following location http://svn.terracotta.org/svn/forge/projects/cachetest
    • You will need to modify agent.sh according to your environment. The cache tests are designed for distributed testing so you should be testing on multiple machines, however you can run everything on a single machine while in development.
    • The agents use a clustered CyclicBarrier to wait for each other, so the tests don't have to be started at exactly the right time or anything. They will start when the number of agents as specified in the AGENTS variable in agent.sh have arrived at the barrier.

TUNING TERRACOTTA

What are the main components of tuning Terracotta.

For Scale and Latency, one needs to typically look at:

  • Instrumentation Scope
  • Distributed Locking
    • Lock Type
    • Lock Striping
    • Lock Scoping/Batching
    • Injection of Synchronization within your application.
    • Be aware of Terracotta Limitations.
  • Memory Management
    • Garbage Collection
      • on Client JVM
      • on Terracotta Server JVM
    • Virtual Memory Manager
      • on client JVM
      • on Terracotta Server JVM
    • Distributed Garbage Collection on Terracotta Server JVM
  • Application Code
    • Choice of Data Structures that offer more concurrency and play well with the Terracotta Virtual Memory Manager.
    • Synchronizing on the right objects
  • Thrash/Faulting Behavior
    • Fault Count in tc-config.xml
    • Fault Threads on the L2.
    • Locality of Reference
  • Transaction Batching
    • l1.transactionmanager properties in tc.properties.
    • Commit Threads on the L2.

For HA, one needs to look at:

  • Choice of HA strategy
  • Infrastructure Design
  • L1 <-> L2 HeartBeating
  • L2 <-> L2 HeartBeating

How do I tune Instrumentation Scope?

So how do I tune distributed Locks.

  • Lock Type.
    • In order of most pessimistic to optimistic, Lock Types supported by Terracotta are:
      • Synchronous Write
      • Write
      • Read
      • Concurrent
    • So make sure that we are using the right lock-type. e.g.
      • if a method is getXXX () and is only reading shared state, make sure it is not marked in tc-config.xml as lock-type of write.
      • Synchronous-write gives you greater guarantees, but comes at a high cost in terms of scale/latency - so use with caution.
      • Concurrent locks only demarcate transaction boundaries - so again be sure that your application is fine with broken semantic correctness
  • Lock Striping:
    • If you notice too much contention on a lock, you may be able to work around it via striping the lock.
    • The most popular example is when executing Hashmap.put and Hashmap.remove , you presumably are locking on the whole Hashmap. So by replacing a Hashmap with a ConcurrentHashMap (which by default support 16 buckets - and can be overridden via passing the concurrency-level as a constructor parameter) - you stripe the single Hashmap lock 16 ways. SEE http://unserializableone.blogspot.com/2007/04/performance-comparision-between.html
  • Lock Scoping/Batching:
    • Batching implies executing a bunch of state mutations in the context of a single distributed lock as against multiple-distributed locks. It usually is a trade-ff between concurrency (the more fine grained the locking, the more concurrent the application) and latency (the more coarse grained the lock, the more mutatations can be "batched" as a transaction reducing overheads around lock acquisistion and "Terracotta Transaction" committing).
    • As an example consider for (i=0; i< 1000; i++) { synchronized(foo) { hashmap.put(i, new Object()); }}. Executing the whole for loop in a single transaction (i.e. synchronizing outside the for-loop) is better than synchronizing within the for-loop.
  • Terracotta limitations:
    • Of course, one needs to be careful of not creating too huge of a "Terracotta Transaction" since as of now, Terracotta does not support transparent partial- transaction-commits or transaction-streaming, hence you could OOME the L1 with a giant "Terracotta-transaction".
    • Also be careful of what data-structures, since Terracotta is a little inconsistent today - e.g. ConcurrentHashMaps are implicitly locked, but Vectors/Hashtables and other thread-safe Collection implementations are not explicitly locked.
    • Volatiles are not supported.
  • Injection of Synchronization into the Application:
    • Use ReentrantReadWriteLocks.readLocks and ReentrantReadWriteLocks.writeLocks instead of synchronization.
    • Use a ThreadSafe DataStructure that Terracotta implicitly locks (e.g. ConcurrentHashMap) or a data structure that Terracotta provides a TIM for (e.g. Hashtable) - if you want locking at the datastructure level (as against higher in the application)
    • <auto-synchronized=true> in the lock-definition section of the tc-config.xml - implies that Terracotta will inject a ReentrantReadWriteLock.readLock and .writeLock based on your lock-type definition.
  • Others:
    • You will essentially need to analyze your code to identify what level of locking coarse-ness works best.
    • Please use the LOCK-PROFILER (admin Console) to gain visibility into the distributed locking behavior of your integrated application. See the Lock Profiler Guide at for details.

MEMORY-MANAGEMENT:

Can I get details of Garbage Collection Tuning with Terracotta.
  • It is no different than GC tuning on any standard JVM. On both client-JVM and TerracottaServer-JVM, the methodology is the same.
    • Profile GC characteristics - several tools available.
      • jstat [(e.g. $JAVA_HOME/bin/jstat -gcutil -h 10 -t <pid> 1s) will print GC stats every 1s]
      • Visual-GC/ JConsole / Visual-VM etc.
      • Parameters passed to java startup will result in more GC output in logs.
        • -verbose:gc
        • -XX:+PrintGCDetails
        • -XX:+PrintGCTimeStamps
        • -XX:+HeapDumpOnOutOfMemory (to get a heap-dump when JVM encounters Out Of Memory Exception)
    • Characterise the observations.
      • Are there frequent full GCs?
      • Are some of the full GC pauses unacceptably long (> double-digit seconds for example)?
      • Are Eden and Old appropriately sized - or is Old predominantly empty, while Eden is constantly getting filled up?
    • Improvements to GC characteristics typcially entail:
      • Heap Sizing (e.g. bump -Xms, -Xmx from 256m to 2048m)
      • Ratio of New/Young/Eden to Old/Tenured - [e.g. Modify via -XX:NewRatio (e.g. NewRatio=2, means Young is 1/3rd the Heap and Old is 2/3rd the Heap) or specify New Sizes directly - -XX:NewSize=512m -XX:MaxNewSize=512m implies size of Eden is 512m).
      • Choice of Collector (e.g. use -XX:+UseConcMarkSweepGC for the ConcurrentCollector. You could use parallel for Young via -XX:+UseParNewGC -XX:ParallelThreads=2, assuming you have atleast 2 cores on the box).
      • PermSize calibration (e.g. -XX:PermSize=256m -XX:MaxPermSize=256m)
    • For a more detailed article, that discusses GC tuning, see http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html
How do I tune Virtual Memory Manager on the Client JVM and Terracotta Server JVM
  • What is Virtual Memory - see http://www.terracotta.org/confluence/display/docs1/Concept+and+Architecture+Guide#ConceptandArchitectureGuide-VirtualHeap and http://javamuse.blogspot.com/2007/10/why-is-rat-better-mouse-trap.html
  • It is also important to understand that you have to choose data-structures that play well with the Virtual Memory Manager implementation. e.g. know the difference between Partial Collections and Non-Partial Collections. Partial Collections are currently only implemented for HashMap, Hashtable, ConcurrentHashMap, LinkedBlockingQueue, Arrays).
  • Typical parameters to tune on the client JVM are (following makes the VMM run more aggressively).
    • l1.cachemanager.threshold (Decrease it - e.g. to 50 from default)
    • l2.cachemanager.criticalThreshold (Decrease it - e.g. to 50 from default)
    • l1.cachemanager.percentageToEvict (Increase it - e.g. to 20 from default)
    • l1.cachemanager.enabled, l1.cachemanager.logging.enabled (i.e. is Virtual Memory manager enabled at all and can I log its activity)
  • Typical parameters to tune on the Terracotta server are
    • l2.cachemanager.threshold
    • l2.cachemanager.criticalThreshold
    • l2.cachemanager.percentageToEvict
    • l2.cachemanager.enabled, l2.cachemanager.logging.enabled. (i.e. is Virtual Memory manager enabled at all and can I log its activity).
What is DGC and why should I tune it. And if I need to, how should I tune DGC.
  • How do you know that you have to tune DGC.
    • If left untuned, cluster wide garbage will remain in the distributed system for longer than needed. This will manifest itself as increases in the Live Managed Object Count (on the DGC Tab in the Admin Console) which will result in performance/scale degradation. It will also manifest itself as increased disk consumption on the Terracotta Server. Hence it is an important step in the Terracotta integration SDLC.
  • To tune DGC, you need to tune:
    • DGC Interval
      • In the tc-config.xml, you could tune DGC to run more frequently than the default 60 minutes.
      • We have seen applications (those with high churn - e.g. caches with low TTLs, VOIP Sessions) that need DGC to run as frequently as every 3 minutes
    • GC on the L1
      • For the Terracotta Server to consider a reference as garbage, no L1 in the cluster can have a reference to the object. To be sure of it, the reference has to get GC'ed on the L1.
      • To do so in a timely fashion, you have to tune OccupanyFraction. i.e. set XX:CMSInitiatingOccupancyFraction in the Java Options. (e.g. If you set -XX:UseCMSInitiatingOccupancyOnly=true and -XX:CMSInitialtingOccupanyFraction=50, then GCs will happen when Old gets to 50%. Otherwise, the threshold at which is collected is calculated by the formula ==> intiatingOccupancy = 100-MinHeapFreeRatio + MinHeapFreeRatio * (CMSTriggerRatio/100)). We have tuned it to as low as 10% - i.e. CMSIntitatingOccupancyFraction=10, to ensure that when a reference becomes garbage, the L2 is aware of it very soon.
    • BerkeleyDB
      • Sometimes, DGC is fine in terms of keeping up with Garbage churn, but BerkeleyDB Cleaner can't keep up. Hence, there may be a need to tune je.properties or tc.properties.
      • je.properties
        • l2.berkeleydb.je.cleaner.bytesInterval=100000000 (decrease - e.g. 20Million) - more aggressive cleaning.
        • l2.berkeleydb.je.checkpointer.bytesInterval=100000000 (decrease it e.g. ~ 20MB) - this forces more frequent checkpoints.
        • l2.berkeleydb.je.cleaner.lookAheadCacheSize=32768 (increase it - e.g. 65536) - lookahead cachesize for cleaning. This will reduce #Btree lookups.
        • l2.berkeleydb.je.cleaner.minAge=5 (decrease it - e.g. 1) - files get considered for cleaning sooner.
        • l2.berkeleydb.je.cleaner.maxBatchFiles=100 (e.g. set to 100) - Upper bounds the cleaner's backlog
        • l2.berkeleydb.je.cleaner.rmwFix=true disable this e.g. false) -
        • l2.berkeleydb.je.cleaner.threads=4 (increase this e.g. 8) - more threads cleaning.
      • tc.properties
        • l2.objectmanager.deleteBatchSize = 5000 (increase it - e.g. 40000) - more batched deletes
        • l2.objectmanager.loadObjectID.checkpoint.maxlimit = 1000 (increase it - e.g. 4Million) - product will default to this value in the next release.

THRASH:

How do I tune Faulting Behavior
What if I see many cache-misses on the Admin Console?
  • In addition, if the bottleneck is on the L2 (You see this as Cache-Misses on the Admin Console) - l2.seda.faultstage.threads=4 (increase this - e.g. 8)
What is the impact of request distribution?
  • Performance will get impacted if here is no locality of reference - hence necessiating round trips to the Terracotta Server to fault data in.
  • Good locality of reference is needed (e.g. sticky-load-balancing for sessions) to minimize thrash.
  • If data does not fit in a given JVM, consider partitioning with a router with intelligence to route to the right partition,
I see fault and flush rate as almost identical on the Terracotta Admin Console. How do I fix it?
  • Flush means that pre-fetched objects that were pulled in as raw-DNA from the Terracotta server are not being requested (which is when they get converted from raw-DNA to Object references) - in such a case, we "flush" those DNAs, after a certain time-out, to reclaim memory on the L1.
  • In such a case you probably need to tune the fault-count (reduce it) and l1.objectmanager.remote.maxDNALRUSize, in tc.properties - (increase it e.g. to 120 from default of 60)

IMPACT OF PAYLOAD:

Write-Heavy Payload - The bottleneck is in committing "Terracotta transactions" to the Terracotta server. What can I do?
  • Terracotta Transactions are written to a commit-buffer on the L1 from where a Terracotta thread writes to the Terracotta server and waits for an ACK from the server.
  • So one could increase the size of this commit buffer and "batch" Transaction writes to the Terracotta Server. One can do so via the following tc.properties:
    • l1.transactionmanager.logging.enabled=false (enable it e.g. true - to get more details)
    • l1.transactionmanager.maxOutstandingBatchSize=4 (increase it e.g. 8)
    • l1.transactionmanager.maxBatchSizeInKiloBytes=128 (increase it e.g. 256)
    • l1.transactionmanager.maxPendingBatches=88 (increase it e.g. 176)
  • If the bottleneck is on the L2 committing transactions, consider:
    • l2.seda.commitstage.threads=4 (increase it e.g. 8)

FAILOVER:

How do I tune Network Active/Passive for faster failover from Active to Passive.

By default with Network-Active-Passive, failover will occur within 45s. You will have to tune the following properties to get it lower (we have seen sub 5s in certain cases)

  • l2.healthcheck.l2.ping.enabled = true
  • l2.healthcheck.l2.ping.idletime = 5000
  • l2.healthcheck.l2.ping.interval = 1000
  • l2.healthcheck.l2.ping.probes = 3
  • l2.healthcheck.l2.socketConnect = true
  • l2.healthcheck.l2.socketConnectTimeout = 2
  • l2.healthcheck.l2.socketConnectCount = 10

TERRACOTTA INTEGRATION MODULES (TIMs) Tuning:

EHCache - Am using the EHCache TIM - but performance is poor.
  • Tune following tc.properties
    • Read and Write Concurrency: ehcache.concurrency = 1 (Increase it - e.g. to a prime number like 47)
    • Eviction properties: ehcache.global.eviction* properties
    • Lock Levels: ehcache.lock.readLevel = READ & ehcache.lock.writeLevel = WRITE to something more permissive if the application allows it.

DEVELOPING WITH TERRACOTTA

DISTRIBUTED CACHE:

What are my choices with regards to Terracotta-clustered distributed caches ?

Terracotta provides:

Do you have a replacement for JBossCache?

Yes,

How do I implement a reaper for my cache?
  • TTL Based Mechanism:
    • You have to set a TTL and update _ExpirationTime for your cache-entry when it is created.
    • Periodically run a reaper thread that compares _ExpirationTime to CurrentTime and removes cache-entries if needed (or annotates them as expired)
I understand that for semantic correctness, I need to read with a READ Lock. But what if I am ok with Read-committed - when will references on any JVM become eventually correct.

Just as in the JVM Memory Model, you get no guarantees as to when Thread Memory gets flushed to Main Memory, if the 2nd thread dosen't request the same monitor - similarly, there are no guarantees (although it could range from sub-seconds to few seconds). Some applications could allow you to read without a Read-Lock (some cost savings in that no lock need be acquired), but you may read state committed prior or partially - given that ReentrantReadWriteLocks.readLocks can be doled out multiple concurrent readers, it seems to us that it is almost better to ALWAYS READ with a READ LOCK since the cost of acquiring one is significantly lower.

CLASSLOADERS:

Does Terracotta support OSGI?

Not out of the box. But one can get around it by naming/registering classloaders. We have examples of Eclipse RCP applications (based on the Equinox implementation of the OSGI framework) being clustered via this technique with Terracotta. See http://forums.terracotta.org/forums/posts/list/15/131.page

When and why do I need to explicitly name and register ClassLoaders?

Object identity in Terracotta is the name of the ClassLoader + the fully qualified field reference. Hence the need to name/register classloaders, as of now to ensure that identity gets preserved when necessary in multiple-classloader environments.

How to share a root between web appplication and java application

Users can use "-Dcom.tc.loader.system.name=Any.class.loader.name" while starting their standalone java app to give a name to the standard class loader so that shared roots can have the same loader name between the java app and the webapp.

For Tomcat - -Dcom.tc.loader.system.name=Tomcat.Catalina:localhost:/webapp_name

If the classes are loaded through ExtensionClassLoader (java/ext) please use "-Dcom.tc.loader.ext.name=classloader.name".

MISCELLANEOUS:

What are semantics associated with the Concurrent Lock ?

A Concurrent Lock in Terracotta defines a transaction boundary but gives no visibility or correctness guarantees.
There is no conversation with the Terracotta server - but the "Terracotta transaction" begins when the concurrent lock is acquired and ends when the concurrent lock is released and all mutations within are batched and committed to the Terracotta server. So with a concurrent lock, 2 threads could read or write to a shared object graph concurrently (i.e no correctness guarantees). Defining no lock implies GETS/READS might work but there is no guarantee of correctness (in that one might be able to get stale or partially committed data) - but then you would not be able to write to it. With Concurrent Lock in place, GET/READS/WRITES can all work with no guarantees around correctness. (USAGE NOT RECOMMENDED)...

Does Terracotta support annotations?

Yes - see http://www.terracotta.org/confluence/display/labs/Annotations

How do I share large DOM tree with terracotta?

As a DOM poses certain performance challenges when stored as a Terracotta DSO it is not advisable to
share DOM directly as DSO.

Here I have attached a sample that illustrated of how to share the xml associated with the Document and then
propagate the changes to other nodes via this xml.

To avoid recreating the entire Document everytime it is accessed from the backed xml, a version of the xml and the Document is kept which is compared to check if the Document corresponds to the xml or is stale. In the latter case Document is recreated from the xml and the Document version is updated. Note that Both Document and documentVersion are transient (with respect to TC).

Anytime a Document is modified in any node the xml is updated as well as the xml version is incremented. While accessing the Document on the other node, the Document version is compared with the xml version.

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.ElementHandler;

/**
 * Wrapper over Document to incorporate clustering of document 
 * with Terracotta in the form of xmlString.
 *
 */
public class XMLDocument {

	/** Underlying document */
	private transient Document xmlDocument;
	/** Version of dom  */
	private transient long domVersion;
	/** XML String (clustered) */
	private String xmlDocString;
	/** Version of xml string (clustered)*/
	private long xmlVersion;

	public XMLDocument(String xml) throws DocumentException {
		createDocumentFromXml(xml);
	}

	public XMLDocument(Document tabletemplateDoc) {
		this.xmlDocument = tabletemplateDoc;
		updateXmlString();
	}

	public Element getRootElement() {
		return this.getXmlDocument().getRootElement();
	}

	public void setXmlDocument(Document xmlDoc) {
		this.xmlDocument = xmlDoc;
		updateXmlString();
	}

	/** This method is called when a dom is modified outside the scope of this class 
	 * to update the xmlString associated with the dom.
	 */
	public synchronized void updateXmlString() {
		this.xmlDocString = this.xmlDocument.asXML();
		this.domVersion = ++this.xmlVersion;
	}

	/**
	 * Creates the DOM from the xml provided and increments and updates the version.
	 * @param xml
	 * @throws DocumentException
	 */
	private synchronized void createDocumentFromXml(String xml)
			throws DocumentException {
		this.xmlDocument = DocumentHelper.parseText(xml);
		this.xmlDocString = xml;
		this.domVersion = ++this.xmlVersion;
	}

	public synchronized void updateOnlyXML(String xml) {
		this.xmlDocString = xml;
		++this.xmlVersion;
	}

	/**
	 * Returns the xml document, reconstructing it from the xml string if the dom version is older, i.e.
	 * the dom has been modified in another VM and the xml string being shared by TC represents the latest
	 * Document. After reconstructing the dom the domVersion is updated to the xmlVersion.
	 * @return 
	 */
	public synchronized Document getXmlDocument() {
		if ((this.xmlDocument == null || (this.xmlVersion > this.domVersion))
				&& (this.xmlDocString != null)) {
			try {
				this.xmlDocument = DocumentHelper.parseText(this.xmlDocString);
				this.domVersion = this.xmlVersion;
			} catch (Exception e) {
				// TODO: handle exception
			}
		}
		return this.xmlDocument;
	}

	public synchronized String getXmlDocumentString() {
		return this.xmlDocString;
	}

	public String asXML() {
		return this.getXmlDocumentString();
	}
}

Powered by Atlassian Confluence, the Enterprise Wiki. (Version: 2.5.5 Build:#811 Jul 25, 2007) - Bug/feature request - Contact Administrators