HUE-8726 [core] Switch file downloads to a separate process

Review Request #13662 - Created Feb. 28, 2019 and updated

Jean-Francois Desjeans Gauthier
hue
master
HUE-8726
hue
jgauthier
commit b72ac4ea973d5e8ddee1b39f60270d908e2e5362
Author: jdesjean <jgauthier@cloudera.com>
Date:   Thu Feb 28 16:49:15 2019 -0800

    HUE-8726 [core] Switch file downloads to a separate process

:100644 100644 858fc2bba2... 113a018e54... M	apps/filebrowser/src/filebrowser/views.py
:100644 100644 8ea7aa9006... 57c6d6245d... M	desktop/core/src/desktop/lib/rest/resource.py
:100644 100644 c752ae1784... 6ad85ea945... M	desktop/core/src/desktop/lib/wsgiserver.py
:100644 100644 1e938b8d1d... 1ddab9ac50... M	desktop/core/src/desktop/settings.py
:100644 100644 3306b032c3... 3c687ae69f... M	desktop/libs/hadoop/src/hadoop/fs/webhdfs.py

This is a basic commit that needs some additional work.
All file fetches and downloads are executed in a separate process, which does not block the main Django process.
Tested loading Hue and downloading files from HDFS. Downloading five 1 GB files concurrently while browsing Hue seems to work well.
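A minimal sketch of the "separate process" idea, under stated assumptions: a local file stands in for the HDFS/WebHDFS fetch, and the worker is a short-lived child started with `subprocess`, which is not necessarily how this commit spawns its process. The parent only relays bytes from a pipe (pipe reads release the GIL), while the file I/O runs in the child. `stream_from_child` and `_CHILD_SRC` are hypothetical names, not Hue code.

```python
import subprocess
import sys
from typing import Iterator

# Program run in the child process. It stands in for the real fetch
# (e.g. reading from HDFS); here it just copies a local file to stdout.
_CHILD_SRC = """
import shutil, sys
with open(sys.argv[1], "rb") as src:
    shutil.copyfileobj(src, sys.stdout.buffer, int(sys.argv[2]))
"""


def stream_from_child(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the bytes of `path` as produced by a separate Python process.

    The calling process only reads from a pipe; the file I/O runs in the child.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", _CHILD_SRC, path, str(chunk_size)],
        stdout=subprocess.PIPE,
    )
    finished = False
    try:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:  # EOF: the child closed its stdout
                break
            yield chunk
        finished = True
    finally:
        proc.stdout.close()
        if not finished:
            proc.kill()  # consumer stopped early, e.g. the client disconnected
        proc.wait()
    if proc.returncode != 0:
        raise RuntimeError(f"download worker exited with code {proc.returncode}")
```

A Django view could return `StreamingHttpResponse(stream_from_child(path))`. Starting one child per download keeps cleanup simple (the `finally` block kills the child if the stream is abandoned) but pays interpreter start-up on every request; a pool of long-lived workers avoids that cost at the price of more bookkeeping.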

What's needed:
Implement S3 and ADLS support.
Run a full load test.
Clean up threads and processes.
Make the number of threads and processes configurable via the config file.
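The last two items could be approached together: a configurable cap on concurrent downloads whose slot is always released when a stream ends, fails, or is abandoned. A minimal sketch, assuming the limit is read from the config file elsewhere; `DownloadSlots`, `guarded`, and `max_downloads` are hypothetical names, not Hue APIs, and this caps concurrent streams rather than sizing a thread or process pool directly.

```python
import threading
from typing import Iterable, Iterator


class DownloadSlots:
    """Caps the number of downloads that may stream at the same time."""

    def __init__(self, max_downloads: int) -> None:
        # max_downloads is assumed to come from a (hypothetical) config setting.
        if max_downloads < 1:
            raise ValueError("max_downloads must be at least 1")
        self._sem = threading.BoundedSemaphore(max_downloads)

    def guarded(self, chunks: Iterable[bytes], timeout: float = 30.0) -> Iterator[bytes]:
        """Stream `chunks` while holding a slot.

        The slot is taken when iteration starts and released when the stream
        is exhausted, raises, or is closed early (e.g. on client disconnect).
        Closing this generator also closes `chunks` if it is a generator.
        """
        if not self._sem.acquire(timeout=timeout):
            raise RuntimeError("too many concurrent downloads")
        try:
            yield from chunks
        finally:
            self._sem.release()
```

Failing with an error after `timeout` rather than queueing forever keeps a burst of large downloads from tying up every request worker.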

  1. Could we list the pros and cons of the technical options for this? Are there any sources of best practices? (e.g. https://stackoverflow.com/questions/1156246/having-django-serve-downloadable-files
    https://www.allbuttonspressed.com/projects/django-filetransfers)
    Especially since we are now bringing in a task server and moving into a container world, bringing in a brand-new dedicated process seems tricky.

    1. 1) AFAIK, django-filetransfers only works on public files. I'll investigate whether it's possible to make it secure.
      2) The other concern: is the storage always going to be web-accessible to the client (e.g. not hidden behind a VPN)?
