Wednesday, 25 May 2016

Volatile vs Static


Many of us confuse volatile and static variables. Both seem to maintain a single copy, so why do we need a volatile variable when static can do the job? But we cannot compare static and volatile variables just by the number of copies they store. The difference is all about how they store the value (the process they follow while storing and reading it).

Declaring a static variable in Java means that there will be only one copy, no matter how many objects of the class are created. The variable is accessible even when no objects have been created at all. However, threads may hold locally cached values of it.

When a variable is volatile and not static, there is one variable for each object. So on the surface it seems no different from a normal field, and totally different from a static one. However, a thread may also cache the value of an ordinary (non-volatile) object field locally.
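The difference in the number of copies can be sketched as follows. This is a minimal illustration only; the class and field names (CopiesDemo, Counter, shared, perObject) are made up for this example.

```java
public class CopiesDemo {

    // Hypothetical class used only for illustration.
    static class Counter {
        static int shared = 0;       // one copy for the whole class
        volatile int perObject = 0;  // one copy per Counter instance
    }

    public static void main(String[] args) {
        // The static field can be used before any object exists.
        Counter.shared = 5;

        Counter a = new Counter();
        Counter b = new Counter();
        a.perObject = 1;
        b.perObject = 2; // does not touch a.perObject

        System.out.println(Counter.shared); // prints 5
        System.out.println(a.perObject);    // prints 1
        System.out.println(b.perObject);    // prints 2
    }
}
```

Note that volatile here says nothing about how many copies exist; it only affects how threads see writes to each object's field.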

This means that if two threads update a variable of the same object concurrently, and the variable is not declared volatile, one of the threads may still hold an old value in its cache.
Even when a static value is accessed by multiple threads, each thread can have its own locally cached copy! To avoid this, declare the variable as static volatile, which forces each thread to read the global value every time.
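A common place where static volatile shows up is a stop flag shared between threads. The sketch below assumes a hypothetical class StopFlagDemo with a helper runAndStop(); the names and the timings are illustrative, not taken from any library.

```java
public class StopFlagDemo {

    // One copy for the class (static); writes are visible to other threads (volatile).
    private static volatile boolean stopRequested = false;

    /** Starts a worker that spins until the flag is set; returns true if it stopped in time. */
    static boolean runAndStop() {
        stopRequested = false;
        Thread worker = new Thread(() -> {
            while (!stopRequested) {
                // busy-wait; each iteration re-reads the volatile flag
            }
        });
        worker.start();
        try {
            Thread.sleep(100);      // let the worker spin for a moment
            stopRequested = true;   // written by the calling thread, seen by the worker
            worker.join(5000);      // wait up to 5 seconds for the worker to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            stopRequested = true;   // make sure the worker is not left spinning
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("Worker stopped: " + runAndStop());
    }
}
```

If the volatile keyword is removed from the flag, the worker is no longer guaranteed to see the update and the loop may keep spinning on some JVMs.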


Example:

Static variable: If two threads (say T1 and T2) access the same object and update a variable declared as static, T1 and T2 can each keep their own local copy of the object (including its static variables) in their respective caches, so an update made by T1 to the static variable in its local cache may not be reflected in the copy held in T2's cache.


Volatile variable: If two threads (say T1 and T2) access the same object and update a variable declared as volatile, T1 and T2 can each keep their own local cache of the object, except for the variable declared volatile. The volatile variable has only one main copy, which is updated by the different threads, and an update made by one thread is immediately visible to the other thread.


Here is a diagram for better explanation:







Monday, 23 May 2016

ConcurrentModificationException Explained


In this post we will try to understand why ConcurrentModificationException is thrown.

Yes, ConcurrentModificationException is thrown when the structure of a collection is changed while iterating over it without using the iterator's remove method. But there is a catch, and this post explains it.

Now a code sample from the Iterator returned by ArrayList:
    public boolean hasNext() {
        return cursor != size;
    }

    public E next() {
        checkForComodification();
        <stuff>
        return <things>;
    }

    <more methods>

    final void checkForComodification() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
    }
From this we can see that ConcurrentModificationException is thrown in the next() method, not in the hasNext() method.
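To isolate that point, here is a small sketch that modifies a list right after creating its iterator and reports which call throws. The class and method names (NextThrowsDemo, whichCallThrows) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class NextThrowsDemo {

    /** Returns "hasNext" or "next" depending on which call throws, or "none". */
    static String whichCallThrows() {
        List<String> list = new ArrayList<String>();
        list.add("a");
        list.add("b");

        Iterator<String> it = list.iterator();
        list.add("c"); // structural change made after the iterator was created

        try {
            it.hasNext(); // only compares cursor with size
        } catch (ConcurrentModificationException e) {
            return "hasNext";
        }
        try {
            it.next(); // calls checkForComodification() first
        } catch (ConcurrentModificationException e) {
            return "next";
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(whichCallThrows()); // prints next
    }
}
```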

Take for example (removing an element):

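import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ConcModificationException {

    public static void main(String[] args) {

        List<String> strList = new ArrayList<String>();
        strList.add("Sandeep");
        strList.add("Raju");

        Iterator<String> listItr = strList.iterator();
        int i = 0;
        while (listItr.hasNext()) {
            System.out.println("Loop count " + i++);
            String str = listItr.next();
            System.out.println("Name " + str);
            // Structural change made on the list itself, not through the iterator
            strList.remove(str);
        }
    }
}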
As per the above code, the structure of the ArrayList is changed inside the iteration, so it should throw ConcurrentModificationException, but it does not.

Output:

Loop count 0
Name Sandeep

Explanation :

When the program starts it inserts 2 elements into the list. In the while loop of the iterator we print the loop count, then we print the string value, and then the loop removes that same string value from the list. Since the value has been removed, the list size is now 1, which equals the iterator's cursor position. So when control comes around for the second round, hasNext() returns false and control does not enter the loop again. No ConcurrentModificationException is thrown because we never even reach the next() method.


Now consider the same case with the following example (adding an element):

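import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ConcModificationException {

    public static void main(String[] args) {

        List<String> strList = new ArrayList<String>();
        strList.add("Sandeep");
        strList.add("Raju");

        Iterator<String> listItr = strList.iterator();
        int i = 0;
        while (listItr.hasNext()) {
            System.out.println("Loop count " + i++);
            String str = listItr.next();
            System.out.println("Name " + str);
            // Structural change made on the list itself, not through the iterator
            strList.add(str);
        }
    }
}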

Output:

Loop count 0
Name Sandeep
Loop count 1
Exception in thread "main" java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
at java.util.ArrayList$Itr.next(ArrayList.java:851)
at com.sandeep.java7.ConcModificationException.main(ConcModificationException.java:20)



Explanation :

When the program starts it inserts 2 elements into the list. In the while loop of the iterator we print the loop count, then we print the string value, and then the loop adds the same string value to the list again. Now that a new element has been added, when control comes around for the second round, hasNext() returns true because there are still elements to traverse, and control enters the loop. When control reaches the next() method, it detects that the structure of the collection has been modified and throws the ConcurrentModificationException.
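For comparison, removing through the iterator's own remove method avoids the exception, because the iterator keeps its expected modification count in step with the list. Below is a minimal sketch; the class and method names (SafeRemovalDemo, removeAll) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SafeRemovalDemo {

    /** Returns a copy of source with every element equal to target removed via Iterator.remove(). */
    static List<String> removeAll(List<String> source, String target) {
        List<String> copy = new ArrayList<String>(source);
        Iterator<String> it = copy.iterator();
        while (it.hasNext()) {
            String current = it.next();
            if (current.equals(target)) {
                it.remove(); // removal goes through the iterator, so next() stays valid
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        names.add("Sandeep");
        names.add("Raju");
        System.out.println(removeAll(names, "Sandeep")); // prints [Raju]
    }
}
```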





Wednesday, 18 May 2016

Distributed Deletes in Cassandra



A Cassandra cluster defines a ReplicationFactor that determines how many nodes each key and its associated columns are written to. In Cassandra, the client controls how many replicas to block for on writes, which includes deletions. In particular, the client may, and typically will, specify a ConsistencyLevel lower than the cluster's ReplicationFactor. That is, the coordinating server node should report the write as successful even if some replicas are down or otherwise not responsive to the write.

A delete operation can't just wipe out all traces of the data being removed immediately. If it did, and a replica did not receive the delete operation, then when that replica becomes available again it would treat the replicas that did receive the delete as having missed a write update, and repair them! So, instead of wiping out data on delete, Cassandra replaces it with a special value called a tombstone. The tombstone can then be propagated to replicas that missed the initial remove request.
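The reasoning above can be sketched with a toy model. This is not Cassandra's implementation: the Cell class, the repair method and the "newest timestamp wins" rule below are simplifying assumptions made only for illustration, with each replica modelled as a plain in-memory map.

```java
import java.util.HashMap;
import java.util.Map;

public class TombstoneSketch {

    /** A stored cell in this toy model; a null value marks a tombstone. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    /** Toy repair: both replicas end up with the newest cell either of them holds for the key. */
    static void repair(Map<String, Cell> a, Map<String, Cell> b, String key) {
        Cell ca = a.get(key);
        Cell cb = b.get(key);
        Cell newest;
        if (ca == null) {
            newest = cb;
        } else if (cb == null) {
            newest = ca;
        } else {
            newest = ca.timestamp >= cb.timestamp ? ca : cb;
        }
        if (newest != null) {
            a.put(key, newest);
            b.put(key, newest);
        }
    }

    /** Returns true if the deleted key is readable again on replica A after repair. */
    static boolean readableAfterRepair(boolean useTombstone) {
        Map<String, Cell> a = new HashMap<String, Cell>();
        Map<String, Cell> b = new HashMap<String, Cell>();
        a.put("k", new Cell("v", 1));
        b.put("k", new Cell("v", 1));

        // The delete reaches replica A only; replica B is down and misses it.
        if (useTombstone) {
            a.put("k", new Cell(null, 2)); // keep a marker that records the delete
        } else {
            a.remove("k");                 // wipe all traces immediately
        }

        repair(a, b, "k");
        Cell result = a.get("k");
        return result != null && !result.isTombstone();
    }

    public static void main(String[] args) {
        System.out.println("Hard delete, data back after repair: " + readableAfterRepair(false)); // true
        System.out.println("Tombstone, data back after repair: " + readableAfterRepair(true));    // false
    }
}
```

With the hard delete, replica A has nothing to compare against, so the old value from replica B is copied back. With the tombstone, the newer marker wins and the delete is propagated to replica B.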

Tombstones exist for a period of time defined by gc_grace_seconds (a table property). After data is marked with a tombstone, it is removed automatically during the normal compaction process once that period has elapsed.
Facts about deleted data to keep in mind are:
  • Cassandra does not immediately remove data marked for deletion from disk.
  • The deletion occurs during compaction. If you use the size-tiered or date-tiered compaction strategy, you can drop data immediately by manually starting the compaction process. Before doing so, understand the documented disadvantages of the process.
  • A deleted column can reappear if you do not run node repair routinely.

Why deleted data can reappear

Marking data with a tombstone signals Cassandra to retry sending a delete request to a replica that was down at the time of the delete. If the replica comes back up within the grace period, it eventually receives the delete request. However, if a node is down longer than the grace period, the node can miss the delete because the tombstone disappears after gc_grace_seconds. Cassandra always attempts to replay missed updates when the node comes back up again. After a failure, it is a best practice to run node repair to fix inconsistencies across all of the replicas when bringing a node back into the cluster. If the node doesn't come back within gc_grace_seconds, remove the node, wipe it, and bootstrap it again.
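The effect of the grace period can be sketched with another toy model. Again, this is not how Cassandra is implemented: the maps, the single-key scenario and the dataReappears helper are assumptions made for illustration. The value 864000 used in main is the default gc_grace_seconds (10 days).

```java
import java.util.HashMap;
import java.util.Map;

public class GcGraceSketch {

    /**
     * Toy scenario: a delete reaches replica A at t = 0 while replica B is down.
     * B comes back after downtimeSeconds and a repair runs.
     * Returns true if the deleted key is live again on replica A afterwards.
     */
    static boolean dataReappears(long downtimeSeconds, long gcGraceSeconds) {
        Map<String, String> liveA = new HashMap<String, String>();
        Map<String, String> liveB = new HashMap<String, String>();
        Map<String, Long> tombstonesA = new HashMap<String, Long>(); // key -> deletion time

        liveA.put("k", "v");
        liveB.put("k", "v");

        // t = 0: the delete is applied on A only and recorded as a tombstone.
        liveA.remove("k");
        tombstonesA.put("k", 0L);

        // By the time B returns, compaction on A may already have purged the tombstone.
        long now = downtimeSeconds;
        if (now - tombstonesA.get("k") > gcGraceSeconds) {
            tombstonesA.remove("k");
        }

        // Toy repair between A and B.
        if (tombstonesA.containsKey("k")) {
            liveB.remove("k");               // tombstone still exists: B learns about the delete
        } else if (liveB.containsKey("k")) {
            liveA.put("k", liveB.get("k"));  // no record of the delete: old value is copied back
        }
        return liveA.containsKey("k");
    }

    public static void main(String[] args) {
        long gcGraceSeconds = 864000; // 10 days
        System.out.println("Down for 1 hour: " + dataReappears(3600, gcGraceSeconds));     // false
        System.out.println("Down for 11 days: " + dataReappears(950400, gcGraceSeconds));  // true
    }
}
```

This is why a node that stayed down longer than gc_grace_seconds should be wiped and bootstrapped again instead of being repaired back into the cluster.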