This patch generates a unique identifier for an HBase cluster during master filesystem initialization (or during active master startup for existing installations), and synchronizes the value to all clients using ZooKeeper (ZK). This gives both clients and servers a persistent, unique, agreed-upon value to identify cluster instances.
One example where this will be important is for token-based authentication of clients (HBASE-3615), when running in multi-cluster environments. In this case a separate authentication token may be obtained by the client for each cluster, and the HBase client and RPC servers need to agree upon which token should be used for authentication in each instance. This is done by keying the client tokens by cluster ID, allowing the token to be selected based on the ID of the cluster being connected to.
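As a rough illustration of keying client tokens by cluster ID, here is a minimal sketch. `ClusterTokenStore` and its methods are hypothetical names invented for this example, and tokens are plain strings rather than real authentication tokens; this is not the HBASE-3615 implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not an HBase class: holds one token per cluster ID.
class ClusterTokenStore {
    private final Map<String, String> tokensByClusterId = new HashMap<>();

    // Remember the token obtained for the given cluster.
    void addToken(String clusterId, String token) {
        tokensByClusterId.put(clusterId, token);
    }

    // Pick the token for the cluster being connected to, if one was obtained.
    Optional<String> selectToken(String clusterId) {
        return Optional.ofNullable(tokensByClusterId.get(clusterId));
    }
}
```

A client holding tokens for two clusters would call `selectToken` with the ID of the cluster it is connecting to, and get an empty result for a cluster it has no token for.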
Key parts of the implementation:
1. The unique ID is just a random UUID stored in HDFS as "HBASE_ROOT/hbase.id"
2. ClusterIdTracker, a new ZooKeeperNodeTracker implementation, publishes the ID value as "/hbase/hbaseid"
3. Cluster ID is ultimately passed down to HBaseClient as a config property, using "hbase.cluster.id"
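The flow in the list above could be sketched roughly as follows. This is an illustration only: it uses the local filesystem via `java.nio.file` as a stand-in for HDFS, `java.util.Properties` as a stand-in for the HBase `Configuration`, and omits the ZK publishing step. `ClusterIdSketch` and its method names are hypothetical; only the file name `hbase.id` and the key `hbase.cluster.id` come from the patch description.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.UUID;

// Hypothetical sketch, not the patch's code.
class ClusterIdSketch {
    static final String ID_FILE_NAME = "hbase.id";
    static final String CONF_KEY = "hbase.cluster.id";

    // Return the ID stored under rootDir, creating it from a random UUID if absent.
    static String readOrCreateClusterId(Path rootDir) throws IOException {
        Path idFile = rootDir.resolve(ID_FILE_NAME);
        if (Files.exists(idFile)) {
            // Existing installation: reuse the persisted ID.
            return new String(Files.readAllBytes(idFile), StandardCharsets.UTF_8).trim();
        }
        String id = UUID.randomUUID().toString();
        Files.write(idFile, id.getBytes(StandardCharsets.UTF_8));
        return id;
    }

    // Hand the ID to a client through a config property.
    static Properties clientConfigFor(String clusterId) {
        Properties conf = new Properties();
        conf.setProperty(CONF_KEY, clusterId);
        return conf;
    }
}
```

Calling `readOrCreateClusterId` twice on the same root directory returns the same value, which is the property that makes the ID usable as a persistent cluster identity.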
Originally I thought that passing cluster ID to HBaseClient through Configuration was just a convenient hack, but now I think it may provide some useful flexibility, either in testing, or allowing non-standard clients to make use of the value without being forced to handle the ZK watcher (since this should be set once at create time and then never change).
FWIW, I've tested this implementation in conjunction with a token authentication implementation that makes use of it.
However, Gary and I discussed this, and I believe there will be a v2 patch soon that makes the watcher on the cluster ID one-shot only. It's a little weird to keep a watcher on an immutable value beyond the point where it is needed at startup.
Just one comment below; otherwise the patch looks great. But I can't help wondering why we need a tracker on this. Isn't it set once and then kept forever? When would it change?
Will we need to up the RPC version for this?
Review request changed
New version of patch. Changes:
* Increments version numbers for ClusterStatus and HMasterInterface.
* Pares down ClusterIdTracker -> ClusterId. Now just reads the cluster ID from ZK and stores the first non-null response.
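To illustrate the "read once, keep the first non-null response" behavior described for the pared-down class, here is a minimal sketch. `CachedClusterId` is a hypothetical name, and the `Supplier<String>` is a stand-in for the actual ZK read; this is not the code from the patch.

```java
import java.util.function.Supplier;

// Hypothetical sketch: caches the first non-null ID returned by the reader.
class CachedClusterId {
    private final Supplier<String> reader; // stand-in for reading the ID from ZK
    private String id;                     // null until a value has been seen

    CachedClusterId(Supplier<String> reader) {
        this.reader = reader;
    }

    // Re-read only while no ID has been seen; afterwards return the cached value.
    synchronized String getId() {
        if (id == null) {
            id = reader.get(); // may still be null if the ID is not yet published
        }
        return id;
    }

    synchronized boolean hasId() {
        return id != null;
    }
}
```

Because the value is immutable once published, no ongoing watcher is needed in this sketch: callers that see `null` simply try again later.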