HBASE-3677: Create globally unique cluster identifier

Review Request #1669 - Created March 28, 2011 and submitted

Gary Helmling
This patch generates a unique identifier for an HBase cluster during master filesystem initialization (or during active master startup for existing installations), and synchronizes the value to all clients using ZK. This gives both clients and servers a persistent, unique, agreed-upon value to identify cluster instances.

One example where this will be important is for token-based authentication of clients (HBASE-3615), when running in multi-cluster environments.  In this case a separate authentication token may be obtained by the client for each cluster, and the HBase client and RPC servers need to agree upon which token should be used for authentication in each instance.  This is done by keying the client tokens by cluster ID, allowing the token to be selected based on the ID of the cluster being connected to.
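
To make the keying idea concrete, here is a minimal sketch of a client-side token cache indexed by cluster ID. It is not code from the patch or from HBASE-3615: the class name TokenCacheSketch, its methods, and the use of plain strings for tokens are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical client-side token cache keyed by cluster ID (not an HBase API). */
class TokenCacheSketch {
    // One token per cluster; a real token would be a structured object, not a String.
    private final Map<String, String> tokensByClusterId = new HashMap<>();

    /** Remember the token obtained for the given cluster. */
    void addToken(String clusterId, String token) {
        tokensByClusterId.put(clusterId, token);
    }

    /** Pick the token for the cluster being connected to, if one was obtained. */
    Optional<String> selectToken(String clusterId) {
        return Optional.ofNullable(tokensByClusterId.get(clusterId));
    }

    public static void main(String[] args) {
        TokenCacheSketch cache = new TokenCacheSketch();
        cache.addToken("cluster-a-id", "token-for-a");
        cache.addToken("cluster-b-id", "token-for-b");

        // The client learns the cluster ID before authenticating, then looks up the token.
        System.out.println(cache.selectToken("cluster-b-id").orElse("<no token>"));
        System.out.println(cache.selectToken("unknown-id").orElse("<no token>"));
    }
}
```

The lookup only works if the client can learn the cluster ID before it authenticates over RPC, which is why the ID is published outside of HBase RPC.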

Key parts of the implementation:
 1. The unique ID is just a random UUID stored in HDFS as "HBASE_ROOT/hbase.id"
 2. ClusterIdTracker, a new ZooKeeperNodeTracker implementation, publishes the ID value as "/hbase/hbaseid"
 3. Cluster ID is ultimately passed down to HBaseClient as a config property, using "hbase.cluster.id"
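
As a rough illustration of steps 1 and 3, the sketch below creates the ID file on first use, reuses it afterwards, and hands the value to a client through a config property. It is not the patch's code: it uses the local filesystem and java.util.Properties as stand-ins for HDFS and the HBase Configuration, the class and method names are hypothetical, and the ZK publication in step 2 is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.UUID;

/** Hypothetical sketch: local files stand in for HDFS, Properties for Configuration. */
class ClusterIdSketch {
    static final String ID_FILE = "hbase.id";          // file name from the description
    static final String CONF_KEY = "hbase.cluster.id"; // config key from the description

    /** Return the stored cluster ID, generating and persisting a random UUID if absent. */
    static String loadOrCreateClusterId(Path rootDir) {
        Path idFile = rootDir.resolve(ID_FILE);
        try {
            if (Files.exists(idFile)) {
                return new String(Files.readAllBytes(idFile), StandardCharsets.UTF_8).trim();
            }
            String id = UUID.randomUUID().toString();
            Files.write(idFile, id.getBytes(StandardCharsets.UTF_8));
            return id;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Pass the ID down to a client as a plain config property. */
    static Properties clientConf(String clusterId) {
        Properties conf = new Properties();
        conf.setProperty(CONF_KEY, clusterId);
        return conf;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("hbase-root");
        String id = loadOrCreateClusterId(root);
        // A second call reads the existing file instead of generating a new ID.
        System.out.println(id.equals(loadOrCreateClusterId(root)));
        System.out.println(clientConf(id).getProperty(CONF_KEY));
    }
}
```

Because the ID is written once and only read afterwards, a test or a non-standard client could set the property directly and skip the lookup entirely.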

Originally I thought that passing the cluster ID to HBaseClient through Configuration was just a convenient hack, but now I think it may provide some useful flexibility, either in testing or in allowing non-standard clients to make use of the value without being forced to handle the ZK watcher (since the value should be set once at creation time and then never change).
FWIW, I've tested this implementation in conjunction with a token authentication implementation that makes use of it.
  1. +1
    However Gary and I discussed this and I believe there will be a v2 patch soon that makes the watcher of the cluster ID one-shot only. It's a little weird to have a watcher on an immutable value beyond when it is needed at startup.
  1. Just one comment below; otherwise the patch looks great. But I can't help wondering why we need a tracker for this. Isn't it set once and then kept forever? When would it change?
    1. Yes, no real need for a tracker.  Andy and I had discussed that previously.  ZK is just used to broadcast the ID to clients and servers without the need for HBase RPC (which for token authentication requires selection of the right token, but we don't know the token without the ID...).  I'll post an update that just reads the ID from ZK without using a tracker.
  2. Will we need to up the RPC version for this?
    1. Yes, I suppose we will.  And up the ClusterStatus version.
      As an aside, it would be nice if VersionedWritable/VersionMismatchException allowed something a little more nuanced than just throwing an exception and bailing out.  Some more motivation to look at other serialization frameworks...
    2. ... because HServerInfo carries HServerLoad and HServerInfo is mentioned in a few of our proxy Interfaces.
Review request changed
  1. +1