
Concurrency in Java – Part 1, The Stepping Stones


What are Threads?

Conceptually, a thread is a distinct flow of control within a program. A thread is similar to the more familiar notion of a process, except that multiple threads within the same application share much of the same state; in particular, they run in the same address space.

Sharing an address space between threads has a big advantage: it lets threads reuse the same data without copying. If each thread instead kept a separate, private data space, sharing and synchronizing data between threads would become a big problem. This is essentially the same situation we encounter with session-state sharing across JVM clusters in production clustered environments: distinct address spaces would require a complex mechanism to propagate mutable state to the other spaces as soon as one thread mutated its local copy. In essence, it would require address-space synchronization mechanisms built into the processor platforms that run Java.

So, to prevent this problem, threads don't hold their own state (except when you use ThreadLocal); instead, they rely on the program's shared address space for all their data needs. Though this avoids the problem of synchronizing data across separate spaces, it creates another set of issues, called race conditions, in which a thread can act on a stale view of the shared data. This problem, however, is much easier to solve: we do it by building thread-safe classes.
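To make the race condition concrete, here is a minimal sketch (the class name and counts are illustrative, not from the original post) of two threads incrementing a shared counter without any coordination. The increment is a read-modify-write, so updates from the two threads can interleave and get lost:

```java
// Illustrative sketch: a race condition on shared state in one address space.
public class RaceDemo {
    static int counter = 0; // shared mutable state, visible to both threads

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter++; // read-modify-write: NOT atomic
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Frequently prints a value below 200000, because some increments
        // from one thread overwrite (and thereby lose) increments from the other.
        System.out.println("counter = " + counter);
    }
}
```

Run it a few times: the result varies from run to run, which is the hallmark of a race condition.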

What is Thread Safety

Thread safety is, invariably, an attempt to design classes so that their objects or instances, when accessed from multiple threads, still behave correctly, exactly as they would if accessed from a single thread.

Since single-threaded access to an object tells us nothing about its behavior under a multithreaded paradigm, no matter how well an object or component set behaves in a single-threaded setup, it cannot automatically be called 'thread-safe'. Whether an object needs to be thread-safe depends on how that object is used, not on what it does. In some situations complete thread safety is a hard requirement, while in other cases a partial guarantee, covering most but not all access patterns, may be sufficient.

Thread-safety problems arise when two or more threads access the same shared data and at least one of them modifies it. When threads start reading and writing a shared variable without any coordination, data corruption starts showing up. Since threads share the same address space, this is bound to happen, especially because we have to rely on the operating system's scheduler to decide when threads are swapped in and out. Application design should therefore strive to build classes and components to be thread safe from the very beginning; the age-old concepts of encapsulation and abstraction are our best friends here. Debugging and refactoring a poorly written component set to be thread safe can be a herculean task and sometimes even a nightmare, so it's important to understand and employ correct thread-safe class design principles from the very start.

How to make classes Thread Safe

At a high level, thread safety is theoretically a very simple concept. If we allow free read access to an object's state but control the write access, we can make the object thread-safe. This is the most logical way to do it. We need to take care of two things here:

  • As long as the state variables cannot be modified (e.g., a final int field), it is safe to allow multiple threads to read them freely.
  • When an action involves a write operation, we acquire and hold a lock on the data and don't allow any read operations to proceed until the write is finished. (Note that volatile alone does not do this: it guarantees visibility of writes across threads, but it does not block concurrent access or make compound actions atomic.)
To make this strategy a success, the only thing that we need to do, as programmers, is make sure that the read and write operations use the same lock, so that a read can tell whether a write is in progress. If we can do this, we get correct thread safety. This is exactly what the synchronized keyword, backed by intrinsic locks (also called monitor locks), does. Our programs would then start behaving correctly in a multithreaded setup; however, this approach has a big problem.
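Since the bullet above corrects a common misconception about volatile, a short sketch may help (the class name is illustrative): volatile makes a write promptly visible to other threads, which is exactly what a stop flag needs, but it would not make something like counter++ safe.

```java
// Illustrative sketch: volatile gives VISIBILITY, not mutual exclusion.
public class StopFlag {
    // Without volatile, the worker thread's loop might never observe stop().
    private volatile boolean running = true;

    public void stop() {
        running = false; // this write becomes visible to all threads
    }

    public boolean isRunning() {
        return running;
    }

    public void work() {
        while (running) {
            // do some work; the loop exits promptly once stop() is called
        }
    }
}
```

A flag like this is a legitimate use of volatile; guarding a compound read-modify-write still requires a lock, as the synchronized example below the next section shows.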

Threads would block each other: when a write operation happens, reads block, since we have synchronized the write path. Likewise, when a read happens, writes and other reads also have to block, because in a multithreaded setup we really don't know exactly when a read or a write will occur, and so the reads must use the same lock to prevent them from interleaving with writes. In the end, we have essentially made the program single-threaded: only one thread is ever actually doing the read/write, so even if we have hundreds of threads, they just wait for a chance. Economically, we are not making effective use of those 8 CPUs with 4 cores each (that we have in the production environment), since at any time only 1 thread is active when we could harness about 32 threads in parallel. This is the problem in any fully synchronized Java class, including the synchronized wrapper collections (e.g., Collections.synchronizedList) of the java.util API.
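One standard way to ease the reader-blocking described above is a read-write lock from java.util.concurrent.locks, which lets many readers proceed in parallel while still giving writers exclusive access. A hedged sketch (the class name and fields are illustrative):

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch: readers share the lock; writers hold it exclusively.
public class RwCounter {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private long value = 0; // guarded by lock

    public void increment() {
        lock.writeLock().lock(); // exclusive: blocks all readers and writers
        try {
            value++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public long get() {
        lock.readLock().lock(); // shared: many readers may hold it at once
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

In read-heavy workloads, this keeps those otherwise-idle cores busy, since concurrent reads no longer serialize behind a single monitor.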

The following example shows a typical use of the synchronized thread-safe idiom.
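Here is a minimal sketch of that idiom (the class name is illustrative): every access to the shared state, reads included, goes through the same intrinsic lock on "this", so the lost updates from the earlier race disappear.

```java
// Illustrative sketch of the synchronized idiom: reads and writes
// use the SAME intrinsic lock, so they can never interleave.
public class SynchronizedCounter {
    private long value = 0; // guarded by the intrinsic lock on "this"

    public synchronized void increment() {
        value++; // the read-modify-write happens while holding the lock
    }

    public synchronized long get() {
        return value; // reads take the same lock, so they never see a stale value
    }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedCounter c = new SynchronizedCounter();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                c.increment();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(c.get()); // always prints 200000
    }
}
```

Contrast this with the unsynchronized race earlier in the post: correctness is restored, but the two threads now serialize on one lock, which is exactly the blocking cost discussed above.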

Conclusion

This post covered the basic stepping stones of writing a simple concurrent class in Java. Most Java programmers tend to make their classes thread safe in this way. In the next installment of this series, we will dig deeper into object behavior heuristics and understand how they impact the thread safety of your Java applications.

Happy Coding!! 👍
