HUE-4051 Have lighter Impala and Hive check configs call than list DBs

Review Request #7744 - Created June 9, 2016 and discarded

weixia xu
hue
20160609-HUE4051
HUE-4051
hue
enricoberti, jennykim, johan, krish, romain

commit 9b733fa593d5dd764f0f343bed83e39cc6f1120d
Author: Weixia Xu <weixia@cloudera.com>
Date: Thu Jun 9 11:45:47 2016 -0700

    HUE-4051 Have lighter Impala and Hive check configs call than list DBs

:100644 100644 be891ce... e607ee5... M apps/beeswax/src/beeswax/conf.py
:100644 100644 750b82b... 42ea534... M apps/impala/src/impala/conf.py

Manual: checked the response time (TGetSchemasResp) for the TGetSchemasReq request with and without the name filter (schemaName='__NonExistingDB!'). We see a performance improvement from 22-29 ms down to 13-17 ms for the page:
http://localhost:8000/about/admin_wizard
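
For reference, a comparison like the one above could be reproduced with a small timing harness along these lines. This is only a sketch: FakeServer and its get_schemas method are hypothetical stand-ins for the real TGetSchemasReq round trip (they are not Hue or HiveServer2 code), the filter is simplified to an exact match, and the numbers it prints say nothing about a real cluster.

```python
import statistics
import time


class FakeServer(object):
    """Hypothetical in-memory stand-in for HiveServer2 (assumption, not Hue code)."""

    def __init__(self, schemas):
        self.schemas = list(schemas)

    def get_schemas(self, schema_name=None):
        # With no filter, every schema name is returned (like listing all DBs).
        if schema_name is None:
            return list(self.schemas)
        # With a filter, only exact matches come back; a name that does not
        # exist yields an empty result.
        return [s for s in self.schemas if s == schema_name]


def median_latency_ms(fn, repeats=20):
    """Call fn() `repeats` times and return the median latency in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == '__main__':
    server = FakeServer('db_%d' % i for i in range(10000))
    unfiltered = median_latency_ms(lambda: server.get_schemas())
    filtered = median_latency_ms(lambda: server.get_schemas('__NonExistingDB!'))
    print('unfiltered: %.3f ms, filtered: %.3f ms' % (unfiltered, filtered))
```

Using the median rather than a single sample keeps one slow round trip from skewing the comparison; against a real server, fn would wrap the actual Thrift call.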

  1. My concern is that we can't guarantee that this is going to be faster in very large scenarios (1K+ and 10K+ databases), even when we're using a filter that we are confident doesn't exist. This is b/c this operation ultimately does a SELECT * FROM WHERE on the HMS, and even when the Database Name field is indexed and the value doesn't exist, this may still be slow for certain databases. It would be better if we could avoid querying databases or tables altogether.

    1. This is true, do we have any calls, e.g. GetInfo() that would work for both Hive/Impala
      https://github.com/cloudera/hue/blob/master/apps/beeswax/thrift/TCLIService.thrift#L1144 ?

      (Note: that might then become too much for a starter JIRA.)

  1. The improvements seem negligible; I would expect the transfer of thousands of DBs to take more time.

    Were you able to execute the list DBs call directly from the Hue shell? e.g. http://gethue.com/how-to-fix-the-multipleobjectsreturned-error-in-hue/
    Also could you execute this command from your local Hue machine to the remote HiveServer2?

  2. apps/beeswax/src/beeswax/conf.py (Diff revision 1)
     
     

    Rename

    '__NonExistingDB**!'

    -->

    'HUE_CANARY_DB'

    ?

  1. Wouldn't open_session() be sufficient in checking that we can connect to HS2? Then we don't need to add a new wrapper method for GetInfo.
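
     A connectivity-only check of the kind suggested here might look roughly like the sketch below. FakeHS2Client, check_connectivity, and the (name, message) return shape are all assumptions made for illustration; they are not Hue's actual client or config_validator code.

```python
class FakeHS2Client(object):
    """Hypothetical stand-in for a HiveServer2 client (assumption, not Hue code)."""

    def __init__(self, reachable=True):
        self.reachable = reachable
        self.session_open = False

    def open_session(self):
        # Simulate a failed connection when the server is unreachable.
        if not self.reachable:
            raise IOError('Could not connect to HiveServer2')
        self.session_open = True
        return 'session-handle'

    def close_session(self, handle):
        self.session_open = False


def check_connectivity(client, nice_name='Hive'):
    """Return a list of (name, message) problems; an empty list means OK."""
    problems = []
    try:
        # Opening a session is the only round trip; no databases or tables
        # are queried.
        handle = client.open_session()
    except Exception as e:
        problems.append((nice_name, 'Could not open a session: %s' % e))
    else:
        # Release the session so the check does not leak server resources.
        client.close_session(handle)
    return problems
```

     The check touches no metadata at all, so its cost would not depend on how many databases exist, which is the concern raised earlier in this thread.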

Review request changed

Status: Discarded
