HBASE-2468: Improvements to prewarm META cache on clients.

Review Request #98 — Created May 28, 2010 and submitted

stack, todd
HBASE-2468: Improvements to prewarm META cache on clients.

1. Add new HTable methods which support region info de/serialation, and region cache prewarm: 
- void serializeRegionInfo(): clients could perform a large scan for all the meta for the table, serialize the meta to a file. MR job can ship a copy of the meta for the table in the DistributedCache
- Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): MR job can deserialize the region info from the DistributedCache 
- prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): MR job can prewarm local region cache by the deserialized region info.

2. For each client, each region cache read-miss could trigger read-ahead some number of rows in META. This option could be turned on and off for one particular table. 
Unit tests passed locally for me. 
  2. Properly document this argument.
  3. Please use a concurrent collection and remove the synchronized blocks.  For guidance, see around slide 30 of this presentation:
    1. Belay that order, I need to inspect this code review... unfortunately a concurrent collection is not the answer to all multi threading woes.
  4. Coding style: put the catch on the previous line.
  5. We don't typically call methods using `this.methodname(args)' – remove the `this.'
  6. Add a space before the `:'.
  7. Use `Boolean.TRUE' instead of `new Boolean(true)'.
    1. Please use the autoboxing features of java6 (our target platform). Ie: just 'true' and 'false' for this and all others.
  8. Use `Boolean.FALSE' instead of `new Boolean(false)'.
  9. Use `Boolean.TRUE' instead of `new Boolean(true)'.
  10. Remove the outer parentheses.
  11. Add a space before the `:'.
  12. If this fits on the previous line while staying under 80 columns, please wrap it around.
  13. Remove the space after `cacheLocation'
  14. Use {@link #readFields readFields} instead of <code>readFields</code>
  15. You can remove this <p>
  16. Use <pre> not <code>
  17. Add a space before the `:'.
  18. Use {@link #getRegionsInfo getRegionsInfo}
  19. Use {@link ...} here and below.
  20. Wrap this around with the previous line.
  21. I believe you can remove this block.
  2. What does this method actually do? If its a predicate start it with 'is'.  Also precaching should be off by default...
     Is it?
    1. Will add ``is'' prefix to this method. 
      Pre-caching (in case of a cache miss) is set to ``on'' by default right now, according to stack:
      ``I'm for default behavior being grabbing more than just the one row – ten or something. Regards full-table scan as default, I think it should be an option (Its a nice option to have). ''
  2. I think any of getCachedRegionCount, countCachedRegions, or getNumCachedRegions would be a clearer name.
  3. style-wise, why "final" here?
  4. remove these @param javadocs - they just take up space if the param names are self-explanatory (which these are)
    same thing above
  5. this needs a Comparator argument, otherwise it does object identity rather than bytewise comparison of the table names
  6. javadoc out of date - it prefetches the region for the given row, and prefetchLimit regions ahead
  7. should return false to stop scanning at this point, right?
  8. // cache this meta entry
    (it's not caching all)
  9. this block should go after the cache check below, right?
    1. I reimplemented MetaScanner so it will start scanning Meta from the region where the given user table row resides, instead of scanning from the next region row in Meta.
      In this case, prefetchRegionCache() also fetches the queried table+row to the cache. Here I put it before the cache check block, so it can load the result from cache directly. Otherwise it may do an extra meta scan if cache prefetch is enabled. 
      However if multiple threads accessing this block concurrently, this way will cause cache-prefetch executed twice. But I think this case is pretty rare, so the penalty should be relatively smaller. 
      What do you think? 
  10. this function needs to synchronized on cachedRegionLocations which is an unsynchronized HashMap
  11. return location != null;
    1. I don't like it either. I will kill all ``is...Disabled()'' methods. 
  12. you should prefix the file with writeInt(allRegions.size()) so you don't have to use EOFException catching below
  13. yuck, see above
  14. I feel like it would be cleaner to have the following methods be non-static, so they don't need any arguments. that would reduce the number of functions by factor of two
    1. That was what I wanted to do in the very beginning. I don't like so many functions either.  
      But there is one existing method: 
        public static boolean isTableEnabled(String tableName) ...
        public static boolean isTableEnabled(Configuration conf, String tableName) ...
      It's one of the reason that I just want to keep the original coding style to be consistent. 
      In addition, I agree we can make isRegionCachePrefetchEnabled() etc. to be non-static. However, enableRegionCachePrefetch(), and disableRegionCachePrefetch() must to be static since we want to enable/disable a table without instantiate HTable. In another word, we cannot really dis/enable cache prefetch for each HTable instance who have the same table name. While we can only enable/disable based on table name. It is pretty similar as table enable/disable. 
  15. incorrect javadoc, also a few places below
  16. why do we need a separate function for enabled and disabled? aren't they always inverse of each other?
  17. should make clear this is the row name in the user table, not the meta table.
    should also be clarified that it will start scanning with the region *after* row, because it doesn't use getClosestRowBefore
  18. you should use util.getTestDir here
  19. import java.io.File
  20. I really like these unit tests, but I think you should use a row key for the Get that isn't also a region start key, as it may expose different behavior. eg I think you will end up with 11 cached regions instead of 10
    1. As mentioned above, I reimplemented MetaScanner so it will start scanning Meta from the region row where the given table row resides, instead of scanning from the next region row in Meta. HTable.getRowOrBefore() is called in MetaScanner to achieve it, (however I'm not very sure whether it is the most efficient way to do it or not). 
      So for this unit test, no matter the passed row is a region start key or not, we will always get 10 here. 
  1. I think this good to go.  Seem my comments below.  See what you think.  My one concern is the number of calls to getRowOrBefore... hopefully this patch cuts down overall on our need to use this function.  I'd like to hear your opinion on that.
  2. This is code duplicated from elsewhere.  Can I help make it so we don't have to do this duplication?  Or, for now, since this your fist patch, we can put it off IF you file a JIRA to fix the duplication (smile).
  3. So we start scanning at 'row'?  Is this the 'row' the user asked for? No, it needs to be the row in the .META. table, right?  We need to find the row in .META. that contains the asked for row first?  NM, I see below how the row here is made.. .this looks right.
  4. This is a nice little facility.
  5. OK.  This looks right.
  6. getRowOrBefore is an expensive call.  Are we sure we are not calling this too often?
    1. I agree it is an expensive call. 
      However I don't think it would bring any performance penalty for existing and potential use cases:
      Use case 1 -- existing MetaScanner users: since this method is newly added, existing users won't be affected; 
      Use case 2 -- hbase clients when locating a region :  
      1) if prefetch is on, it calls this MetaScanner with [table + row combination], which calls getRowOrBefore() to get current region info, then number of following regions from meta. After that, the client can get the region info directly from cache. 
      2) if prefetch is disabled (current behavior), it eventually calls similar method getClosestRowBefore() to get desired region. 
      So no matter prefetch is on or not, getRowOrBefore(or getClosestRowBefore) eventually is called. The only difference is whether to scan following regions from meta or not. 
      For future MetaScanner users which scan from one region with desired use table row, it has to take the effort since it is the expected behavior. 
  1. Looking good! Just a few notes.
  2. I thought we were collapsing these two calls into setRegionCachePrefetchEnabled(tableName, enabled)?
  3. I don't entirely understand why we key these hashes by integer, but it seems like you're following the status quo, so doesn't need to be addressed in this patch.
  4. I still don't quite understand the logic about why these should be static. Previously you pointed to the enable/disable calls, but those are more like admin calls, not calls that affect client behavior. Anyone else have an opinion?
  5. I think this should be Math.max(rowLimit, configuration.getInt(...)) - if we only want to scan 5 rows, we don't want the scanner to prefetch 100 for us.
  1. This looks great!  I think we should think more about where to expose this, and also think about using more of a get/set API to reduce the method calls and make it look more like the other client APIs.
  2. +1 on collapsing with a boolean.  setRegionCachePrefetch(table, enable)?  Seems self descriptive and with a couple lines of javadoc should be clear.
    1. yes sir. will modify to use set...() and get...(). 
  3. I guess these are static because of how HTables all share a single HCM per conf.  The setting of prefetching is set at the HCM level not the HTable level, however clients are usually not exposed to HCM and only deal with HTable.
    We should probably make it clear in the javadoc for these methods that they apply to all HTable instances, though that may be clear from being static.
    Maybe since these are more advanced calls, they shouldn't be in HTable?  If we provide proper documentation, it should be easy enough for a user to grab the HCM and apply the config at that level?
    1. > I guess these are static because of how HTables all share a single HCM per conf...
      > Maybe since these are more advanced calls, they shouldn't be in HTable?
      Two alternatives:
      1) HCM: as jgray said, ``however clients are usually not exposed to HCM and only deal with HTable.''
      2) HBaseAdmin: it is a more reasonable design choice since these operation are at HCM level. 
      3) or, make it a configuration. It would be one global configuration applied to all tables, and cannot be changed dynamically. 
      I like 2) better, but not really sure whether we want to expose it there or not. 
      What do you think? 
    2. Adding it to HBaseAdmin could make sense.  This one is a bit of an odd one because it's a client-side configuration parameter done at the per-client-jvm level.  Typically we have per-query or per-htable-instance configs.  HBaseAdmin is generally made up of remote administration commands not local client config.
      If we provide sufficient javadoc (including as a class comment on HTable) it doesn't matter so much where we put it.  Since it's distinct from what's currently in HTable and HBaseAdmin, maybe it does make sense as a static in HCM?
    3. I think keeping HConnectionManager an internal interface is a good idea, so kind of -0 there. -1 on HBaseAdmin, since we should keep that for administrative functions that really change something on the cluster. So I'd prefer HTable, but wouldn't cry over HCM.
    4. Since we can knock it down to just two methods, get/set, let's just put it in HTable.
      But it will be static, right, so people understand it's not per-instance.  Let's also make sure there is javadoc that also explains that it is for all HTable instances for the tables you configure.
  4. Should we also make these methods (if we even leave it in HTable) more like setRegionCachePrefetch / getRegionCachePrefetch?  HTable is the core client class so the less noise we add the better.  We have other methods in client APIs of the get/set form like Scan.setCacheBlocks and Put.setWriteToWAL
  5. Hmm, wouldn't that be Math.min then?  If rowLimit = 5 and scanner.caching = 100, we want to only do 5?
Review request changed

Change Summary:

1) resolved conflicts from the latest code commits
2) added get/setRegionCachePrefetch() in HTable, removed isRegionCachePrefetchEnabled(), enableRegionCachePrefetch(), etc. 
3) in MetaScanner, added Math.min(rowLimit, configuration.getInt(...))


Revision 6 (+518 -3)

Show changes

  1. +1
    I did a quick skim.  All looks good.  This is a weird one in that to prewarm we are going to do a getClosestOrBefore (which we'll be doing anyway) and then we'll open scanner, and scan next 10 rows... then close scanner; i.e. extra rpcs inline w/ first lookup against tale.  This latter will actually slow down first lookup some but we can make it faster by making open scanner return results so we don't have to then go call next (already an issue to do this) and also, making it so we scan forward always by changing keys in .META. such that region names are keyed by endkey rather than startkey... also another issue to do this.