HUE-7639 [dashboard] Support median & percentile in SQl

Review Request #12091 - Created Nov. 27, 2017 and submitted

Jean-Francois Desjeans Gauthier
hue
master
HUE-7639
hue
jgauthier
commit e2717a2fd2d4d535596fd278943bf2dccb9a4368
Author: jdesjean <jgauthier@cloudera.com>
Date:   Mon Nov 27 12:46:51 2017 -0800

    HUE-7639 [dashboard] Support median & percentiles in SQL

:100644 100644 9cd777a24a... 3534e1cde2... M	apps/impala/src/impala/dashboard_api.py
:100644 100644 f401941fe2... 240f95b3d9... M	desktop/libs/dashboard/src/dashboard/static/dashboard/js/search.ko.js
:100644 100644 114f9f170f... b41d59f9d8... M	desktop/libs/dashboard/src/dashboard/templates/common_search.mako
:100644 100644 daec317b1d... b680b1d56b... M	desktop/libs/notebook/src/notebook/dashboard_api.py


  • 0
  • 0
  • 11
  • 0
  • 11
Description From Last Updated
  1. 
      
  2. Could we avoid this one?

  3. This should not be needed (only in the dimensions)?

    Should also break existing dashboard without facet.properties.aggregate

  4. Same as auto update for mincount.

    Do we want to autorefresh only in case of Solr and not SQL?

  5. Should we move to only 1 percentile value?

    (might be bit less efficient for Solr but actually more practical so better)

  6. Upper case, SELECT etc..?

  7. desktop/libs/notebook/src/notebook/dashboard_api.py (Diff revision 1)
     
     
     
     
     
     
     

    Any way to avoid too many external ifs?

    (as it will grow out of control soon)

    e.g.

    if facet['aggregate']['function'] == 'median':
    if cls._supports_percentile():
    ...
    elif cls._supports_cume_dist()

  8. space: -1 * value

  9. 
      
  1. Nice!

    2 points about the design of SQL compatibilites and simplfying a bit more the percentiles

  2. Can be in a follow-up, but Hive supports it via a builtin percentile UDF.

    percentile
    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

    Would be a good example to validate the design of using 'def _supports_median(self):' functions.

    Maybe those functions should return the good SQL fragement when supported, e.g. 'percentile(col, 0.5) and None when it means that it is not supported?

    1. Maybe the current implentation is not clear, but while Hive does not support the keyword median it does support the keyword percentile and this is how it currently returns the value.

  3. {'value': 50} --> 50?

  4. zero_to_one is a book :)

  5. Should we keep it simple and just divide by 100 and that's it?

    (backend will bubble up the error anyway)

  6. 
      
  1. I see, Hive does not support median but supports percentile!

    We can see in next iteration if a DB supports the same UDF but it has a different name.

  2. 
      
Review request changed

Status: Closed (submitted)

Loading...