
Concurrency in Java – Part 1, The Stepping Stones


What are Threads?

Conceptually, a thread is a distinct flow of control within a program. A thread is similar to the more familiar notion of a process, except that multiple threads within the same application share much of the same state; in particular, they run in the same address space.

Sharing an address space between threads has a big advantage: it lets threads reuse the same data without copying. If each thread instead kept a separate, private data space, sharing and synchronizing data between threads would become a big problem. This is essentially the same situation we encounter with session-state sharing across JVM clusters in production clustered environments: distinct address spaces would require a complex mechanism to propagate mutable state to the other spaces as soon as one thread mutated its local copy. In essence, it would require address-space synchronization mechanisms built into the processor platforms that run Java.

So, to prevent this problem, threads don't hold their own state (except when you use ThreadLocal); instead, they rely on the program's shared address space for all their data needs. Though this avoids the problem of synchronizing data across separate spaces, it creates another set of issues, called race conditions, in which a thread can act on a stale view of the shared data. This problem, however, is much easier to solve: we do it by building thread-safe classes.
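To make the race condition concrete, here is a minimal sketch (the class name and counts are illustrative, not from the original post) of two threads incrementing a shared counter without any coordination. The increment is a read-modify-write, so updates from the two threads can interleave and get lost:

```java
// Illustrative sketch: a race condition on shared state in one address space.
public class RaceDemo {
    static int counter = 0; // shared mutable state, visible to both threads

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++; // read-modify-write: NOT atomic
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Frequently prints a value below 200000, because some increments
        // from one thread overwrite (and thereby lose) increments from the other.
        System.out.println("counter = " + counter);
    }
}
```

Run it a few times: the result varies from run to run, which is the hallmark of a race condition.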

What is Thread Safety

Thread safety is, invariably, an attempt to design classes so that their objects or instances, when accessed from multiple threads, still behave correctly, exactly as they would if accessed from a single thread.

Since single-threaded access to an object tells us nothing about its behavior under a multithreaded paradigm, no matter how well an object or component set behaves in a single-threaded setup, it cannot automatically be called 'thread-safe'. Whether an object needs to be thread-safe depends on how that object is used, not on what it does. In some situations complete thread safety is a hard requirement, while in other cases a partial guarantee, covering most but not all access patterns, may be sufficient.

Thread-safety problems arise when two or more threads access the same shared data and at least one of them modifies it. When threads start reading and writing a shared variable without any coordination, data corruption starts showing up. Since threads share the same address space, this is bound to happen, especially because we have to rely on the operating system's scheduler to decide when threads are swapped in and out. Application design should therefore strive to build classes and components to be thread safe from the very beginning; the age-old concepts of encapsulation and abstraction are our best friends here. Debugging and refactoring a poorly written component set to be thread safe can be a herculean task and sometimes even a nightmare, so it's important to understand and employ correct thread-safe class design principles from the very start.

How to make classes Thread Safe

At a high level, thread safety is theoretically a very simple concept. If we allow free read access to an object's state but control the write access, we can make the object thread-safe. This is the most logical way to do it. We need to take care of two things here:

  • As long as the state variables cannot be modified (e.g., a final int field), it is safe to allow multiple threads to read them freely.
  • When an action involves a write operation, we acquire and hold a lock on the data and don't allow any read operations to proceed until the write is finished. (Note that volatile alone does not do this: it guarantees visibility of writes across threads, but it does not block concurrent access or make compound actions atomic.)
To make this strategy a success, the only thing that we need to do, as programmers, is make sure that the read and write operations use the same lock, so that a read can tell whether a write is in progress. If we can do this, we get correct thread safety. This is exactly what the synchronized keyword, backed by intrinsic locks (also called monitor locks), does. Our programs would then start behaving correctly in a multithreaded setup; however, this approach has a big problem.
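Since the bullet above corrects a common misconception about volatile, a short sketch may help (the class name is illustrative): volatile makes a write promptly visible to other threads, which is exactly what a stop flag needs, but it would not make something like counter++ safe.

```java
// Illustrative sketch: volatile gives VISIBILITY, not mutual exclusion.
public class StopFlag {
    // Without volatile, the worker thread's loop might never observe stop().
    private volatile boolean running = true;

    public void stop() {
        running = false; // this write becomes visible to all threads
    }

    public boolean isRunning() {
        return running;
    }

    public void work() {
        while (running) {
            // do some work; the loop exits promptly once stop() is called
        }
    }
}
```

A flag like this is a legitimate use of volatile; guarding a compound read-modify-write still requires a lock, as the synchronized example below the next section shows.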

Threads would block each other: when a write operation happens, reads block, since we have synchronized the write path. Likewise, when a read happens, writes and other reads also have to block, because in a multithreaded setup we really don't know exactly when a read or a write will occur, and so the reads must use the same lock to prevent them from interleaving with writes. In the end, we have essentially made the program single-threaded: only one thread is ever actually doing the read/write, so even if we have hundreds of threads, they just wait for a chance. Economically, we are not making effective use of those 8 CPUs with 4 cores each (that we have in the production environment), since at any time only 1 thread is active when we could harness about 32 threads in parallel. This is the problem in any fully synchronized Java class, including the synchronized wrapper collections (e.g., Collections.synchronizedList) of the java.util API.
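One standard way to ease the reader-blocking described above is a read-write lock from java.util.concurrent.locks, which lets many readers proceed in parallel while still giving writers exclusive access. A hedged sketch (the class name and fields are illustrative):

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch: readers share the lock; writers hold it exclusively.
public class RwCounter {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private long value = 0; // guarded by lock

    public void increment() {
        lock.writeLock().lock(); // exclusive: blocks all readers and writers
        try {
            value++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public long get() {
        lock.readLock().lock(); // shared: many readers may hold it at once
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

In read-heavy workloads, this keeps those otherwise-idle cores busy, since concurrent reads no longer serialize behind a single monitor.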

The following example shows a typical use of the synchronized thread-safe idiom.
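Here is a minimal sketch of that idiom (the class name is illustrative): every access to the shared state, reads included, goes through the same intrinsic lock on "this", so the lost updates from the earlier race disappear.

```java
// Illustrative sketch of the synchronized idiom: reads and writes
// use the SAME intrinsic lock, so they can never interleave.
public class SynchronizedCounter {
    private long value = 0; // guarded by the intrinsic lock on "this"

    public synchronized void increment() {
        value++; // the read-modify-write happens while holding the lock
    }

    public synchronized long get() {
        return value; // reads take the same lock, so they never see a stale value
    }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedCounter c = new SynchronizedCounter();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                c.increment();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(c.get()); // always prints 200000
    }
}
```

Contrast this with the unsynchronized race earlier in the post: correctness is restored, but the two threads now serialize on one lock, which is exactly the blocking cost discussed above.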

Conclusion

This post covered the basic stepping stones of writing a simple concurrent class in Java. Most Java programmers tend to make their classes thread safe in this way. In the next installment of this series, we will dig deeper into object behavior heuristics and understand how they impact the thread safety of your Java applications.

Happy Coding!! 👍
