My concern is that we can't guarantee this will be faster in very large deployments (1K+ and 10K+ databases), even when we filter on a name that we are confident doesn't exist. This is because the operation ultimately does a SELECT * FROM WHERE on the HMS, and even when the Database Name field is indexed and the value doesn't exist, this may still be slow for certain databases. It would be better if we could avoid querying databases or tables altogether.
HUE-4051 Have lighter Impala and Hive check configs call than list DBs
Review Request #7744 - Created June 9, 2016 and discarded
enricoberti, jennykim, johan, krish, romain
Author: Weixia Xu <email@example.com>
Date: Thu Jun 9 11:45:47 2016 -0700
HUE-4051 Have lighter Impala and Hive check configs call than list DBs
:100644 100644 be891ce... e607ee5... M apps/beeswax/src/beeswax/conf.py
:100644 100644 750b82b... 42ea534... M apps/impala/src/impala/conf.py
Manual: checked the response time (TGetSchemasResp) for the TGetSchemasReq request with and without the name filter (schemaName='__NonExistingDB!'). With the filter, the response time improved from 22-29 ms to 13-17 ms for page:
The improvement seems negligible. I would expect transferring thousands of DBs to take more time.
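To compare the filtered and unfiltered calls over more than a single measurement, a small timing harness along these lines could help. This is only a sketch: `time_call`, `list_all_schemas`, and `list_filtered_schemas` are hypothetical names, and the two stand-in functions would need to be replaced by the real TGetSchemasReq calls against a server with a realistic number of databases.

```python
import statistics
import time


def time_call(fn, repeats=20):
    """Return the median wall-clock time of fn() in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Hypothetical stand-ins: replace these with the real calls that issue
# TGetSchemasReq without and with the schemaName filter.
def list_all_schemas():
    return ["db_%d" % i for i in range(10000)]


def list_filtered_schemas():
    return []


def compare(repeats=20):
    """Return (unfiltered_ms, filtered_ms) medians for the two calls."""
    return (
        time_call(list_all_schemas, repeats),
        time_call(list_filtered_schemas, repeats),
    )
```

Using the median over repeated calls reduces the effect of a single slow or fast request, which matters when the two ranges are only a few milliseconds apart.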
Were you able to execute the list DBs call directly from the Hue shell? e.g. http://gethue.com/how-to-fix-the-multipleobjectsreturned-error-in-hue/
Also could you execute this command from your local Hue machine to the remote HiveServer2?
Wouldn't open_session() be sufficient in checking that we can connect to HS2? Then we don't need to add a new wrapper method for GetInfo.
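A rough sketch of the session-only check suggested above. It assumes a client object that exposes `open_session()` and `close_session()`; `check_hs2_connectivity` and `FakeClient` are hypothetical names used for illustration, not existing Hue code, and the returned list of `(name, message)` tuples is an assumed result shape.

```python
def check_hs2_connectivity(client, name="hive"):
    """Return a list of (name, message) problems; empty if a session opens.

    Only a session is opened, so no databases or tables are queried.
    """
    try:
        session = client.open_session()
    except Exception as e:
        return [(name, "Failed to open a session on the server: %s" % e)]
    try:
        # Assumed cleanup call; a failure here does not mean the
        # server is unreachable, so it is ignored.
        client.close_session(session)
    except Exception:
        pass
    return []


class FakeClient(object):
    """Hypothetical stand-in for a HiveServer2 client, for demonstration."""

    def __init__(self, fail=False):
        self.fail = fail

    def open_session(self):
        if self.fail:
            raise IOError("connection refused")
        return object()

    def close_session(self, session):
        pass
```

With this shape, `check_hs2_connectivity(FakeClient())` reports no problems, while `check_hs2_connectivity(FakeClient(fail=True))` reports one entry describing the connection failure.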