HUE-7258 [jb] Properly fetch YARN Spark job logs

Review Request #11800 - Created Oct. 18, 2017 and submitted

Ying Chen
hue
master
HUE-7258
hue
enricoberti, jgauthier, johan, krish, romain, weixia
commit dc05d52e78531f02851968796c35977264886cdf 
Author: Ying Chen <yingchen@cloudera.com>
Date:   Wed Oct 18 12:08:36 2017 -0700

    HUE-7258 [jb] Properly fetch YARN Spark job logs

:100644 100644 ad51e1f298... 330e039fb5... M    apps/jobbrowser/src/jobbrowser/api.py
:100644 100644 06a8c070f0... 91e594eea2... M    apps/jobbrowser/src/jobbrowser/apis/job_api.py
:100644 100644 906ec7df32... a3910c0ccc... M    apps/jobbrowser/src/jobbrowser/templates/job_browser.mako
:100644 100644 3623cb8ed7... 56869ceec8... M    apps/jobbrowser/src/jobbrowser/tests.py
:100644 100644 4a14c62080... cd2eb22804... M    apps/jobbrowser/src/jobbrowser/views.py
:100644 100644 106f32ecf6... 24595ae20e... M    apps/jobbrowser/src/jobbrowser/yarn_models.py
:100644 100644 aa63b8d937... 4bf92b806d... M    desktop/libs/hadoop/src/hadoop/yarn/resource_manager_api.py
:100644 100644 81440ce6dc... 272969e405... M    desktop/libs/hadoop/src/hadoop/yarn/spark_history_server_api.py


Description | From
  • Need to monkey patch actually too? if not hasattr(job_api, 'old_SparkExecutorApi'): job_api.old_NativeYarnApi = job_api.SparkExecutorApi, then revert in the tearDown | Romain Rigaux
  • Would this work instead? from jobbrowser import api; api.YarnApi = MockYarnApi | Romain Rigaux
  1. Nice!

    Works for running and finished jobs?
    Do we have a link to the Spark UI on the 'Properties' page?

    1. When a Spark job is still running, the Spark history server doesn't have it yet? I fixed a few exceptions. Should I hide the logs page and the executors tab?

      I still need to implement the breadcrumbs link in the executor tab.
    2. In the Properties page, I updated the URL to the actual Spark UI link, but it doesn't display as a hyperlink.
    3. Added the breadcrumbs link in the executor tab.
  2. Import on top of file?

  3. Remove comment and add spaces after commas?

  4. if job.metrics['executors']:

  5. 
      
  1. 
      
  2. Would it make more sense to use the Spark REST API actually?

    https://spark.apache.org/docs/1.6.2/monitoring.html#rest-api

    We already have https://github.com/cloudera/hue/blob/master/desktop/libs/hadoop/src/hadoop/yarn/spark_history_server_api.py

    Try plugging it instead?
    (then we can rename it SparkServerApi and instantiate it with the JHS URL or running application URL to make it work both when it runs and it finished)

    1. This is for Spark executors. Will there be something for Spark stages or Spark attempts later?
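The REST API suggestion in this thread could look roughly like the sketch below. The `/api/v1/applications/[app-id]/executors` endpoint comes from the Spark monitoring documentation linked above; the helper names, the base URL, and the sample payload are assumptions made for illustration and are not code from this change.

```python
import json


def executors_url(base_url, app_id):
    """Build the Spark REST endpoint listing executors for one application.

    base_url is whichever server currently serves the app (the history
    server for finished apps, the running application's UI otherwise).
    """
    return '%s/api/v1/applications/%s/executors' % (base_url.rstrip('/'), app_id)


def parse_executor_ids(payload):
    """Return the executor ids from the endpoint's JSON list response."""
    return [executor['id'] for executor in json.loads(payload)]


# Hypothetical, trimmed response body; real responses carry many more fields.
SAMPLE_PAYLOAD = '[{"id": "driver"}, {"id": "1"}]'

url = executors_url('http://localhost:18080/', 'application_1513315932654_0002')
executor_ids = parse_executor_ids(SAMPLE_PAYLOAD)
```

Passing the base URL in, rather than hard-coding it, is what would let one client cover both the running and the finished case, as the comment proposes.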
  3. 
      
  1. Nice!

    Mostly just some little nits.

    What would help a lot would be to add a new test class a bit similar to:
    https://github.com/cloudera/hue/blob/master/apps/jobbrowser/src/jobbrowser/tests.py#L358

    e.g.
    class TestSparkNoHadoop:

    and call the Spark API with some mocks to test getting a mocked running Job info + logs + log download?
    (to test the code paths of SparkExecutorApi, download_executors_logs, executors(self, job):)

    e.g. self.c.post('/jobbrowser/api/jobs/jobs', {'app_id': 'application_1513315932654_0002', 'interface': 'jobs'})
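A minimal sketch of the kind of test class requested here, under heavy assumptions: `MockSparkApi` and `list_executor_ids` are hypothetical stand-ins for the real Spark client and for the code path under test, and the canned executor data is invented. A real Hue test would go through the test client instead.

```python
import unittest


class MockSparkApi(object):
    """Hypothetical stand-in for the Spark client; returns canned data."""

    def executors(self, app_id):
        # Invented data for a job that is still running.
        return [{'id': 'driver', 'isActive': True}, {'id': '1', 'isActive': True}]


def list_executor_ids(api, app_id):
    """Hypothetical code under test: collect executor ids for one application."""
    return [executor['id'] for executor in api.executors(app_id)]


class TestSparkNoHadoop(unittest.TestCase):

    def setUp(self):
        self.api = MockSparkApi()

    def test_running_job_executors(self):
        ids = list_executor_ids(self.api, 'application_1513315932654_0002')
        self.assertEqual(['driver', '1'], ids)
```

Because the mock is injected in `setUp`, no Hadoop or Spark service needs to be reachable for the test to run.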

  2. nit:
    return { --> return {

  3. nit: appId --> app_id

  4. Actually needed?
    (normally here we only want to open the good job on page refresh via the URL id)

    1. It is intended to give the user a link back to the Spark job after clicking into an executor, to stay consistent with how other job types navigate between a job and its attempts.
  5. Why not state here? (state of YARN app, not Spark job itself)

    https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Application_State_API

    1. https://github.com/cloudera/hue/blob/master/apps/jobbrowser/src/jobbrowser/yarn_models.py#L59

      Most of the time, state and status are the same, except that state = Finished when status = SUCCEEDED, which is what the Spark UI shows. Using status feels more consistent.

  6. nit:
    +self.attempt_id --> + self.attempt_id

  7. Same in Spark 2? (more used than 1.6 now)

  8. attampt list show --> attempt list shows

  9. nit:
    +job.attempt_id --> + job.attempt_id

  10. nit:

    if not job_filtered_json:

  11. Any way to merge the common code with above? (common _helper function maybe)
  12. 
      
  1. 
      
  2. apps/jobbrowser/src/jobbrowser/tests.py (Diff revision 6)
     
     

    json.dumps({app_id=......})

    ?

  3. 
      
  1. Really nice, almost there!

  2. apps/jobbrowser/src/jobbrowser/apis/job_api.py (Diff revisions 5 - 7)
     
     

    x --> executor?

    (easier to read)

  3. %s' % (smart_string(e))

    ?

  4. len(job.metrics['executors']) > 0 -->

    job.metrics['executors']
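The nit above relies on Python treating an empty list as false. A small sketch, where the function names and sample dictionaries are made up for illustration:

```python
def show_tab_verbose(metrics):
    """Original form: explicit length comparison."""
    return len(metrics['executors']) > 0


def show_tab_idiomatic(metrics):
    """Suggested form: rely on the truthiness of the list itself."""
    return bool(metrics['executors'])


with_executors = {'executors': [{'id': 'driver'}]}
without_executors = {'executors': []}
```

Both forms agree for lists. The truthiness form also tolerates a `None` value, where `len(None)` would raise a `TypeError`.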

  5. apps/jobbrowser/src/jobbrowser/tests.py (Diff revision 7)
     
     

    nit: extra space

  6. apps/jobbrowser/src/jobbrowser/tests.py (Diff revision 7)
     
     

    Need to monkey patch actually too?

    if not hasattr(job_api, 'old_SparkExecutorApi'):
        job_api.old_NativeYarnApi = job_api.SparkExecutorApi

    then revert in the tearDown

    1. SparkExecutorApi is not a mock api.
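For reference, the save-patch-restore pattern being discussed can be sketched as follows. Here `job_api` is a `SimpleNamespace` standing in for the real module, and `RealYarnApi`, `MockYarnApi`, and the test class name are hypothetical.

```python
import types
import unittest


class RealYarnApi(object):
    """Hypothetical stand-in for the real API class."""


class MockYarnApi(object):
    """Hypothetical mock used only by the tests."""


# Stand-in for the module whose attribute gets patched.
job_api = types.SimpleNamespace(NativeYarnApi=RealYarnApi)


class TestWithMockYarnApi(unittest.TestCase):

    def setUp(self):
        # Save the original once, then swap in the mock.
        if not hasattr(job_api, 'old_NativeYarnApi'):
            job_api.old_NativeYarnApi = job_api.NativeYarnApi
        job_api.NativeYarnApi = MockYarnApi

    def tearDown(self):
        # Restore the original so later tests see the real class.
        job_api.NativeYarnApi = job_api.old_NativeYarnApi

    def test_mock_is_active(self):
        self.assertIs(job_api.NativeYarnApi, MockYarnApi)
```

The `hasattr` guard keeps a second `setUp` from overwriting the saved original with the mock.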
  7. apps/jobbrowser/src/jobbrowser/tests.py (Diff revision 7)
     
     

    It works like this?

    (FYI, it is usually easier to use json.dumps({python dict}))

    1. This is the most troublesome part. I believe it is related to how data is encoded in multipart content. I captured the format of query_executor_data in debug mode. With that format, the data can be read from request.post.get('interface') in the jobs function.
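The `json.dumps` suggestion can be illustrated with the sketch below. It assumes each form field carries a JSON-encoded value that the view decodes with `json.loads`; that decoding step and the variable names are assumptions, not something confirmed by this thread.

```python
import json

# Build the POST body from a plain Python dict, JSON-encoding each value.
params = {'app_id': 'application_1513315932654_0002', 'interface': 'jobs'}
post_data = dict((key, json.dumps(value)) for key, value in params.items())

# What a view would recover, assuming it json.loads() every field.
decoded = dict((key, json.loads(value)) for key, value in post_data.items())
```

Encoding from a dict avoids hand-copying a captured multipart string, which is brittle when field names or values change.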
  8. apps/jobbrowser/src/jobbrowser/tests.py (Diff revision 7)
     
     

    Add asserts that look for some ids in the list of jobs?

  9. apps/jobbrowser/src/jobbrowser/tests.py (Diff revision 7)
     
     

    Any chance to have another call to a mocked Spark job that is in the RUNNING state?

  10. apps/jobbrowser/src/jobbrowser/tests.py (Diff revision 7)
     
     
     
  11. apps/jobbrowser/src/jobbrowser/tests.py (Diff revision 7)
     
     
     
     
  12. 
      
  1. 
      
  2. apps/jobbrowser/src/jobbrowser/tests.py (Diff revision 8)
     
     

    Would this work instead?

    from jobbrowser import api

    api.YarnApi = MockYarnApi

    1. In jobbrowser/apis/job_api.py, "from jobbrowser.api import YarnApi as NativeYarnApi"
      In jobbrowser/views.py, "from jobbrowser.api import get_api"
      The intent is to replace NativeYarnApi with MockYarnApi at runtime.
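The reply turns on how `from ... import ... as` binds names. The sketch below reproduces the situation with two throwaway module objects standing in for `jobbrowser.api` and `jobbrowser.apis.job_api`; the class bodies are hypothetical.

```python
import types


class YarnApi(object):
    """Hypothetical stand-in for the real class."""


class MockYarnApi(object):
    """Hypothetical mock."""


# Stand-in for jobbrowser.api.
api = types.ModuleType('api')
api.YarnApi = YarnApi

# Stand-in for jobbrowser.apis.job_api, after it has executed:
#   from jobbrowser.api import YarnApi as NativeYarnApi
job_api = types.ModuleType('job_api')
job_api.NativeYarnApi = api.YarnApi

# Patching only the source module does not reach the already-bound name.
api.YarnApi = MockYarnApi
seen_after_source_patch = job_api.NativeYarnApi

# Patching the name in the importing module is what takes effect there.
job_api.NativeYarnApi = MockYarnApi
seen_after_target_patch = job_api.NativeYarnApi
```

This is why the test patches the name inside `job_api` rather than `api.YarnApi`: the import copied a reference at import time, so later rebinding in the source module is invisible to the importer.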
  3. apps/jobbrowser/src/jobbrowser/tests.py (Diff revision 8)
     
     
     
     

    Move below

    def setUp(self):

    ?

  4. apps/jobbrowser/src/jobbrowser/tests.py (Diff revision 8)
     
     

    nit: new line before

  5. 
      
  1. Ship It!
  2. 
      
Review request changed

Status: Closed (submitted)
