distributed log splitting now with better testing and configurable splitting

Review Request #370 - Created July 22, 2010 and discarded

Alex Newman
This builds on the previous work. It does some smarter stuff with testing and now splitting is configurable.
Ran on our private Hudson.
  1. First pass on this patch. Lots of cleanup that needs to be done, and it's a bit hard to follow the flow of events without any clear documentation that gives an overview of distributed splitting. Nothing big, just some use cases that could be put in the class javadoc of LogSplitter?
    1. I'll add the story and get these changes in.
  2. I'm sure you have a good reason for putting that there, but at least one issue I'm seeing is that this code is also in init() (which will be run just after that) and it's almost the same thing.
    Also, fs.automatic.close is handled by the ShutdownHook class, you shouldn't be setting it.
  3. Fix those long lines.
  4. Why are those static?
  5. remove that white space and all the others in that class at the same place
  6. both process and run call this method, can there be a race?
  7. don't need to declare this here
  8. What does that mean?
  9. confusing name when looking at what's returned, fix that
  10. Why two lines for nodes? Also, if nodes is null for any reason, won't that throw an NPE?
  11.  most of that stuff can be removed and put into the 
  12. so you create a lock with data=null?
  13. Or you were just disconnected, could mean a lot of things right?
  14. JavaBean convention, don't start parameter names with upper case
  15. So we log here and we log in LogSplitter, remove one of them?
  16. again, name confusing WRT returned type
  17. don't start with upper case
  18. Usually ppl check that the other way around
  19. use HConstants.EMPTY_BYTE_ARRAY
  20. third ERROR line if splitPath is null, keep only one
  21. pull the next lines on this one with a ternary operator
  22. copy pasta, we're in 2010 now! :P
  1. Hey Alex, this is looking good.  The master rewrite branch has a refactoring of ZooKeeperWrapper and general ZK usage inside HBase that conflicts with this pretty significantly.
    Do you think you could pull the new methods and classes nested in ZooKeeperWrapper into a separate class of static methods?  If you need the instantiated instance of ZKW, pass it in as the first argument to the static methods?  That will make my life WAY easier when I have to merge the branch back into trunk.
    Also gives an opportunity to have a class comment in the new class explaining the overall usage of zk.
    Stuff like the names of the nodes can be left in the instantiated ZKW class since it makes sense to pull those in from the confs on instantiation.
    Cool?  Let me know if you want an example.
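
For reference, here is a rough sketch of the shape that refactoring could take. It is not code from the patch or from the master rewrite branch: ZkWrapperStandIn, LogSplitZkUtil, and every method name below are hypothetical, and the stand-in wrapper only mimics an instantiated ZKW that read its node names from the confs. The ZK-related helpers become static methods that take the wrapper as their first argument, and the new class carries the comment that explains the overall usage of zk.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the instantiated ZKW; the real ZooKeeperWrapper API is not reproduced here.
class ZkWrapperStandIn {
    // Node name stays on the instance because it is pulled from the confs on instantiation.
    private final String splitLogNode;

    ZkWrapperStandIn(String splitLogNode) {
        this.splitLogNode = splitLogNode;
    }

    String getSplitLogNode() {
        return splitLogNode;
    }

    // Placeholder for a child listing; returns null here to model a missing node.
    List<String> listChildren(String path) {
        return null;
    }
}

/**
 * Static helpers for distributed log splitting (hypothetical class name).
 * The class comment is the place for the overview: which znodes exist,
 * who creates them, and who watches them.
 */
final class LogSplitZkUtil {
    private LogSplitZkUtil() {
        // Static methods only, no instances.
    }

    // The wrapper instance is passed in as the first argument.
    static String taskPath(ZkWrapperStandIn zkw, String logName) {
        return zkw.getSplitLogNode() + "/" + logName;
    }

    // Guards against a null child list so callers cannot hit an NPE.
    static List<String> listTasks(ZkWrapperStandIn zkw) {
        List<String> nodes = zkw.listChildren(zkw.getSplitLogNode());
        return nodes == null ? Collections.<String>emptyList() : nodes;
    }
}
```

Keeping the helpers free of instance state means a later change to how the wrapper is constructed only touches the call sites' first argument, which is presumably what keeps the merge back into trunk small.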