HBASE-5360 [uberhbck] Add options for how to handle offline split parents.

Review Request #2122 - Created June 7, 2012 and submitted

Jimmy Xiang
Added -fixSplitParents to hbck to fix lingering split parents, and so on.
TestHBaseFsck is green
  • 0
  • 0
  • 20
  • 6
  • 26
Description From Last Updated
    1. Splitting info is in meta. We do fix both meta and hdfs. fixSplitParents should be fine.
    1. I will ignore this one and check the return for the next one.
      If this one fails, the next one will fail too. Good catch.
  2. regionInfoMap is a object field. do we need to add a parameter here?  
  3. hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java (Diff revision 2)
    mention meta in function name?
    for trunk and 0.94 can we used the atomic delete/put from HBASE-3584?
    For 0.92/0.90 this implementation is fine.
    1. Fixed.  How could I forget this?  I spent quite some time on mutateRow API before!
  4. fix == materialize.  
    What is the fix in the comment? Why this approach?
    1. I will fix the comment.
      The other approach is to ignore this split parent and wait till the reference file is resolved.  Now sure when will this happen?
      What do you suggest?
    2. My intuition says that we should have the default behavior be ignore regions marked split unless the new flags are specified (which I believe is the previous behavior).  If the new splitparents flag is specified, wouldn't it be easier to make sure that if the parent is marked offline in meta that that is fixed, and then use use the rest of hbck's repair logic to decide on merges or sidelining for children and parents?  Then we don't have to copy paste the half-region code.
    3. After a second thought, I agree it is easier and simpler to let the existing hbck's repair logic to handle it and ignore the split regions.
      We will fix the meta only if this flag is specified.
  5. Why was this removed? 
    I think it is ok -- IIRC all splits are offline but not all offline are splits (merger uses this too I believe are)
    1. Yes, all splits should be offline.  If split is true, offline is not, it is still splitting.
  6. have message say something about in Meta but not in HFDS?
  7. "not enough" or no children regions overlapping?
    1. no children regions overlapping, will fix.
  8. Isn't this case something that will "naturally" get handled by the CatalogJanitor and the compaction process?
    Feels like this could be race inducing.
    1. You are right.  It should be handled by the CatalogJanitor and the compaction process.  But sometimes, CatalogJanitor can't fix it.
      Should we just ignore it?
    2. This is related to the why question I asked earlier.  Why not just restore the parent and get ride of the daughters? if the daughters have data, why not just merge that data back in and then let hbase try to split it again with a separate attempt?
      Higher level, I think we are going to need some sort of safe mode for when hbck does operations to make it more robust by avoiding meta data change races.  This might allow it to be usable all the time.
    3. Ok, will simplify it as mentioned above to avoid meta data change races.
  9. This is a legit error and probably only present on hbases before HBASE-4562 (pre 0.90.5)
  10. include meta in ORPHAN_SPLIT_PARENT
  11. these two should mention HDFS.
    1. They are in both HDFS and META.
  12. This is begging for a simple unit test that explains by example what you expect and covers edge cases like NULL endkey.
  13. style nit: flip so that you exit early due when it is empty, reduces the amount of indentation
  14. This is stricter than the comment's description.  this has to exactly cover (what if the startRange's start key is before the parents's start key - seems like it should be ok.)  Fix comment or fix code?
    stylenit: also, flip this to be !=0 return false;  and then have the rest.  this limits nesting and makes it easier to read.
    1. Fixed the comment and the style.
  15. return BYTES_COMPARATOR.compare(...)==0;
    also, this is stricter than you comment.  My guess is that the comment will change. :)
  16. maybe rename copyReference to materializeReference?
  17. rename path to refFilePath and/or fix javadoc.
  18. Is the DataBlockEncoding.NONE going to work everywhere? 
    -- nm, this is how data is encoded in the cache.
    1. That is used for data in cache. Added comment.
  19. should this be a raw scanner instead of a default scanner (to get versions and deletes?)  Not sure how to pipe that parameter into here.
    hm.. this is a copy and paste.  Let's answer the higher level question first (why do we do this copy?)
    1. You are right.  It is coped from LoadIncrementalHFiles.  Ted raised this question too.
      I think it is better to put all the store file related utilities together.
    2. Did you check to see if it does a raw scan already? (I didn't actually check but it seems like the does the not-raw version).
      If the normal splitting code or loadincrementalhfiles does this as well, these operations may potentially break some of the assumptions  raw scans have. 
      Consider a raw scan returning some set of information, and then a split happening including reference copying, and then a new raw scan happening.  It wouldn't have any of the old versions or delete tombstones.  Do you agree?  if so we should probably bring this up in the mailing list.  
  20. surprised this doesn't exist elsewhere.
    1. There are several createHRegion in class HRegion. We could not use any though.
  21. rename to include Meta in function name?
  22. probably want offline as well - see https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java#L168
  1. lgtm.  One minor nit, your call.
  2. Maybe we should make this change the name or comments with this?  This is something one normally doesn't do.  Maybe we make it "forced online of offline split parent"?
    Would be great if this was added to the docs.  At the least please file a follow on jira.
    1. Cool, will change the comment and file a doc jira.
Review request changed

Status: Closed (submitted)