Adding celery to the system

Review Request #13627 - Created Feb. 15, 2019 and updated

Prakash Ranade
hue
master
hue
jgauthier, johan, ranade, romain, subrata, weixia, yingc
commit 00df0f92afdc85d9bf4b9e7d369ed26343d40c3b
Author: Prakash Ranade <ranade@cloudera.com>
Date:   Wed Feb 13 18:16:32 2019 -0800

    Adding celery to the Hue app
    
    (cherry picked from commit 0f14407ec7c826a00fad57195864cf2f21ba7615)

:000000 100644 0000000000 467bc335a6 A	apps/useradmin/src/useradmin/tasks.py
:100644 100644 3f38a273b7 9af4914a3f M	apps/useradmin/src/useradmin/views.py
:100644 100644 25dfeca8e2 d7a17fb869 M	desktop/core/src/desktop/settings.py
:000000 100644 0000000000 bfb2056f7c A	desktop/libs/notebook/src/notebook/tasks.py
:100644 100644 aea629f4d6 b3216cc133 M	desktop/libs/notebook/src/notebook/views.py

Tested file download through a Celery worker.

Description | From | Last Updated
Could all of this be configurable via hue.ini? | Romain Rigaux |
Here seems like you connect to Hue to download the query result, which defeats the whole purpose of doing the ... | Romain Rigaux |
Instead of a __main__, could we have a tasks_tests.py instead? | Romain Rigaux |
Could we avoid full duplication of the methods? e.g. if task server is enabled, we return the task_id, if not, ... | Romain Rigaux |
Review request changed

Testing Done:

Tested file download through a Celery worker.

  2. desktop/core/src/desktop/settings.py (Diff revision 1)

    Could all of this be configurable via hue.ini?
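
To make the hue.ini suggestion concrete, here is a minimal sketch of reading task-server options from an ini file and mapping them onto Celery-style settings. The `[task_server]` section, its key names, and the `load_celery_settings` helper are assumptions made for illustration; they are not taken from the diff, and Hue's real configuration layer may look quite different.

```python
import configparser

# Hypothetical ini fragment; section and key names are illustrative only.
SAMPLE_INI = """
[task_server]
enabled = true
broker_url = redis://localhost:6379/0
result_backend = redis://localhost:6379/1
"""


def load_celery_settings(ini_text):
    """Build a dict of Celery-style settings from ini text.

    Missing sections or keys fall back to defaults, so the task
    server stays disabled unless the ini file turns it on.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {
        "TASK_SERVER_ENABLED": parser.getboolean(
            "task_server", "enabled", fallback=False),
        "CELERY_BROKER_URL": parser.get(
            "task_server", "broker_url", fallback="redis://localhost:6379/0"),
        "CELERY_RESULT_BACKEND": parser.get(
            "task_server", "result_backend", fallback=""),
    }


settings = load_celery_settings(SAMPLE_INI)
defaults = load_celery_settings("")
```

With this shape, settings.py would only need to merge the returned dict into the Django settings instead of hard-coding the broker values.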

  3. Here it seems like you connect to Hue to download the query result, which defeats the whole purpose of doing the download in a task.

    I would recommend focusing on HDFS file compression/extraction instead. The full notebook API (exec, fetch status, download, profile...) will be ported to the task system by JF.

    1. tasks.py runs in a separate process. It does not connect to Hue; it simply reuses the code. We need to be careful about which methods we call from Celery because of side effects, but this particular method looks OK.

    2. Nice, I thought it was the notebook API. It is also interesting if there is no issue connecting to an Impala session via another

    3. Currently this method will reuse the session / query ID. I think we need to consider whether we want to create a new one.

  4. desktop/libs/notebook/src/notebook/tasks.py (Diff revision 1)

    Instead of a __main__, could we have a tasks_tests.py instead?
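
As a rough illustration of what a tasks_tests.py could look like, the sketch below tests a task body with `unittest` instead of a `__main__` block. The `write_rows_to_file` function is a hypothetical stand-in for the real task code, which is not shown in this review; the `run_tests` helper exists only so the sketch can be run on its own.

```python
import csv
import os
import tempfile
import unittest


def write_rows_to_file(rows, path):
    """Hypothetical task body: write rows to a CSV file, return the row count."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return len(rows)


class TasksTest(unittest.TestCase):
    def test_write_rows_to_file(self):
        # Use a temporary directory so the test leaves no files behind.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "result.csv")
            count = write_rows_to_file([["a", 1], ["b", 2]], path)
            self.assertEqual(count, 2)
            with open(path, newline="") as f:
                # csv.reader yields every field as a string.
                self.assertEqual(list(csv.reader(f)), [["a", "1"], ["b", "2"]])


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TasksTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In a project with a test runner, the `TasksTest` class would normally be discovered automatically, so `run_tests` would not be needed.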
  5. desktop/libs/notebook/src/notebook/views.py (Diff revision 1)

    Could we avoid full duplication of the methods?

    e.g. if the task server is enabled, we return the task_id; if not, we do as before?
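
One possible shape for the single-method approach is sketched below: the same view helper either enqueues the work and returns a task_id, or runs it inline as before. `InMemoryTaskQueue`, `download`, and `handle_download` are hypothetical names used only for illustration; a real deployment would presumably call a Celery task's `delay()` rather than this in-memory stand-in.

```python
import uuid


class InMemoryTaskQueue:
    """Stand-in for a task server, so the sketch runs without Celery."""

    def __init__(self):
        self.pending = {}

    def enqueue(self, func, *args, **kwargs):
        # Record the call under a fresh id; a worker would pick it up later.
        task_id = str(uuid.uuid4())
        self.pending[task_id] = (func, args, kwargs)
        return task_id


def download(query_id):
    """Hypothetical shared implementation used by both code paths."""
    return "rows for %s" % query_id


def handle_download(query_id, task_queue=None):
    """Return a task_id when a task queue is given, else the result itself."""
    if task_queue is not None:
        return {"status": 0, "task_id": task_queue.enqueue(download, query_id)}
    return {"status": 0, "result": download(query_id)}


queue = InMemoryTaskQueue()
queued = handle_download("q1", queue)
inline = handle_download("q1")
```

Both branches call the same `download` function, so only the dispatch decision lives in the view.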

  1. I would recommend splitting the reviews into:
    - Celery task infra and ini configuration
    - any other task (e.g. compress, etc.)
