HBASE-1923. Bulk load into existing tables

Review Request #87 - Created May 25, 2010 and submitted

Todd Lipcon
jgray, stack
Here's a first patch that implements bulk import into existing tables. This applies on top of HBASE-2586 and HBASE-2588 - I've pushed the series of the three to my github: http://github.com/toddlipcon/hbase/tree/hfof-review

I have some TODOs left that I want to take care of before this gets committed, but since it's a pretty large patch, I figured I'd get it out for review ASAP.

The stuff in the hadoopbackport package is essentially copypaste from Hadoop trunk, so you can ignore that in the review.
Primary unit/functional testing, a bit of pseudo-distributed testing. Plan on doing full system tests before commit as well.
  2. this cant go here, HFile cannot know about KeyValues. It will go into StoreFile
  3. you might also say the last key in the HFile means the last "KeyValue"
  4. this won't work for an existing table - if the current time millis > the current sequence id in the live server, we could end up losing edits during a log recovery
  2. Oops, left this in.
  2. @Ryan: This shouldn't block commit of Todd's patch though, right?  We can move it out after your refactoring?
  3. Is this patch for trunk or branch?  If branch, this addition might break RPC.   It might be ok on the end here but we should test.
    1. I will be backporting to branch for our customer, but as a major new feature I didn't expect it would be committed to Apache branch. If people want it, though, I'll put the patch up.
      In general, adding a new RPC doesn't cause incompatibility - if a client calls it and it's not there on the server side, the client will just get an exception that the method doesn't exist. (eg see my backport of HDFS-630 to hadoop 20)
  4. Is this a good description? Its underneath the Import.class which imports by going against the API.  You might add a work about it going behind the API, writing hfiles?
    Hmmm... later I see that it can do both, write hfile or write to table
  5. Man, we never enable asserts.... throw an exception!
    1. Switched to using google Preconditions
  6. Just asking, the comparator for sure is going to do the right thing here?
    1. yea, ImmutableBytesWritable implements Comparable
  7. Won't javadoc mangle your nice formatting here?  Use ul/li or wrap w/ <pre>?
    1. good call, fixed to ul/li
  8. Why two types?  Is this legacy?  KV has advantage of being able to carry a Delete
    1. yea, it's for compatibility with TableOutputFormat which can take either. Kind of a pain, maybe we should just get rid of it and only accept KV?
    2. Doing both is a pain yes.  If Delete and Put had a common ancestor/Interface, we could use that but it ain't there yet.  File an issue to undo Put option?
  9. In KV, there is a parseColumns function that might help here
    1. doesn't seem much more convenient (since I already have String here)
      One question, though: if I want to insert into a family that has no qualifiers, should I be using EMPTY_BYTE_ARRAY when I construct the KV, or passing null?
    2. It looks like either works.  It looks like nulls are handled properly.
  10. Is this safe?  Is there escaping you should be handling here?
    1. Plan to address this in docs for importtsv - it's not a good TSV parser that supports quoting, etc.
    2. ok
  11. I suppose in rare circumstances, it could split in here and mess you up.  You need a no-split flag per region out in zk or something?  We'd need such a thing for snapshotting methinks.  But, thats for a later...
    1. if there's been a split, the new daughter region will detect that the HFile doesn't "fit" and throw WrongRegionException. This then triggers the retry on ServerCallable, which fetches the new region info from meta, and realizes it has to split the hfile before trying again.
    2. Nice.
  12. Yeah, there is probably other metadata you'd want to bring over beyond blocksize -- like whether its compressed?  For later.
  13. Class comment.  Whats this thing do?
    1. fixed - copied from KeyValueSortReducer
  14. We need this even though you wrote out partition edges earlier when you read region boundaries?
    1. You're right that it's unused, but I figured I'd put it in for convenience - when doing a bulk load into a new table, for example, you may want to use this in order to figure out where to set region splits.
    2. Ok.  Sounds good.
  15. This is old cruddy stuff you are asking about.  It looks like I added splitsAndClosesLock to stop either happening during critical periods (HBASE-588) and the splits lock looks now like a half measure from way back in 2007 -- HBASE-217.  Lets make an issue to clean it up.
    1. Which one should I be taking? Do I need both? Which order?
    2. Seems like it depends on what you need.  I like your issue on rethinking these locks.  Sounds like we might need to do stuff like break apart the splittingAndClose lock into a splitting and closing.
  16. There is FSUtils in hbase util
    1. On second thought actually this should be somewhat store-specific, so took out the TODO
  17. What you going to do here?  Get the last key from the map of storefiles in current region and make your file be one less than it?  Your timestamp is current though so edits in here could overshadow those in another storefile.  Thats ok?
    1. going to make storefiles be a list instead of map... new review coming in a bit
  18. You were going to remove this
  19. Whats this about?
    1. added comment - idea is to restore back to local-mode job runner when cluster shuts down
  20. Its not removing it itself?  Its registered to cleanup on exit.
    1. was seeing cases where in between different tests in the same test case, it wouldn't get cleaned up, etc.
  21. How?  If nullwritables, don't I just get nulls?
    1. Changed to:
       * Input format that creates as many map tasks as configured in
       * mapred.map.tasks, each provided with a single row of
       * NullWritables. This can be useful when trying to write mappers
       * which don't have any real input (eg when the mapper is simply
       * producing random data as output)
  1. Looks good to me.  I don't have any major issues so go ahead and commit addressing any comments of mine you think prompt fixup.
Review request changed

Change Summary:

One more day of hacking on this, adds some docs, cleans up the command line parameters, fixes a couple bugs identified in cluster testing.

Calling this one final, I swear :)

I don't know how to build the forrest docs in trunk, but the new xml file does pass xmllint.


Revision 4 (+3115 -241)

Show changes

  1. This is a killer feature.  Ship it.  There is one nitpick in the below but can be done on commit (if its legit)
  2. Where is the javadoc that the tsv parse doesn't do escaping?
  3. This is excellent.  We do have this though already: http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#bulk
    We should scrap the above if we're going to go w/ this or update the former to include this.
  4. This is a killer feature
  1. Reviewed changes between v2 and v3.  It looks like they were lost -- some failed communication with the server.  The storefile loading stuff is good.  I like the new Comparator.  Oh, one thing is that we have MD5Hash class now.. maybe your md5'ing could go there.
  2. Ignore