[core] Spawning server should not be blocked by network calls

Review Request #2081 — Created May 16, 2012 and submitted

commit 2b7b8a7d87937c1beaac206342bff8ab20ecb432
Author: Romain Rigaux <romain@cloudera.com>
Date:   Wed May 16 11:03:24 2012 -0700

    [core] Spawning server should not be blocked by network calls

:100755 100755 10e54d8... ec52db3... M  apps/filebrowser/src/filebrowser/views.py
:100644 100644 515a7ae... 8962c09... M  apps/jobbrowser/src/jobbrowser/models.py
:000000 100644 0000000... 615ed21... A  desktop/core/src/desktop/lib/eventlet_util.py
:100644 100644 e69de29... 0ec7119... M  desktop/core/src/desktop/lib/rest/__init__.py
:100644 100644 446c67e... 3fedd36... M  desktop/core/src/desktop/lib/thrift_util.py
:100644 100644 580c7cc... 5375fac... M  desktop/core/src/desktop/management/commands/runspawningserver.py
Manual tests for WebHdfs calls
Unit tests ran

Testing Thrift client:
Not sure yet how to easily setup a fake compatible ThriftServer that can be called from a local Hue? (reuse SimpleThriftServer with JobTracker client maybe)

Patching urllib fails the tests and is not needed (urllib is used for quoting).
Removed unused urllib in filebrowser/view.py.
  • 2
  • 0
  • 1
  • 0
  • 3
Description From Last Updated
I don't understand the 2nd sentence. "We need to avoid this magic..." Does it mean that we haven't avoided it? ... bcwalrus bcwalrus
Since you're calling monkey_patch(), perhaps we can just monkey_patch everything and forget about the import_patch? Would that work? bcwalrus bcwalrus
  1. The fight is not over yet:
    * Let's also import the green version of socket and friends in runspawningserver.py. If Django uses a mysql database, every database lookup is a socket call.
    * Does this run correctly if we go back to the threaded cherrypy server? (desktop.conf.use_cherrypy_server = true)
    1. It might be safer to put the whole list of modules patched by eventlet?
      Yes cherrypy does not mind the patched libraries so far.
      And how about testing the ThiftClient?
    2. Right. We may want to put that in a central lib. SDK apps need to patch their imports as well.
      For thrift testing, you can set a breakpoint in the BeeswaxServer and watch the beeswax app stall.
    3. So by the point we arrive at the top of runspawningserver.py we already have these modules below. So we need to plug our import somewhere before at the very beginning or use a magic eventlet.monkey_patch(socket=True):
      ['ConfigParser', 'Queue', 'SocketServer', 'UserDict', '__builtin__', '__future__', '__main__', '__original_module_thread', '__original_module_threading', '_abcoll', '_bisect', '_codecs', '_collections', '_functools', '_hashlib', '_heapq', '_locale', '_random', '_socket', '_sre', '_ssl', '_struct', '_warnings', '_weakref', '_weakrefset', 'abc', 'about', 'about.settings', 'atexit', 'base64', 'beeswax', 'beeswax.conf', 'beeswax.desktop', 'beeswax.os', 'beeswax.settings', 'beeswax.sys', 'binascii', 'bisect', 'cPickle', 'cStringIO', 'codecs', 'collections', 'compiler', 'compiler.ast', 'compiler.cStringIO', 'compiler.compiler', 'compiler.consts', 'compiler.dis', 'compiler.future', 'compiler.imp', 'compiler.marshal', 'compiler.misc', 'compiler.os', 'compiler.parser', 'compiler.pyassem', 'compiler.pycodegen', 'compiler.struct', 'compiler.symbol', 'compiler.symbols', 'compiler.syntax', 'compiler.sys', 'compiler.token', 'compiler.transformer', 'compiler.types', 'compiler.visitor', 'compiler.warnings', 'configobj', 'copy', 'copy_reg', 'daemon', 'daemon.atexit', 'daemon.daemon', 'daemon.errno', 'daemon.lockfile', 'daemon.os', 'daemon.pidlockfile', 'daemon.resource', 'daemon.signal', 'daemon.socket', 'daemon.sys', 'daemon.version', 'daemon.version.version_info', 'datetime', 'decimal', 'desktop', 'desktop.appmanager', 'desktop.conf', 'desktop.daemon', 'desktop.desktop', 'desktop.django', 'desktop.exceptions', 'desktop.grp', 'desktop.lib', 'desktop.lib.codecs', 'desktop.lib.conf', 'desktop.lib.configobj', 'desktop.lib.daemon_utils', 'desktop.lib.desktop', 'desktop.lib.django', 'desktop.lib.grp', 'desktop.lib.i18n', 'desktop.lib.logging', 'desktop.lib.os', 'desktop.lib.paths', 'desktop.lib.pwd', 'desktop.lib.re', 'desktop.lib.sys', 'desktop.lib.textwrap', 'desktop.log', 'desktop.log.cStringIO', 'desktop.log.collections', 'desktop.log.desktop', 'desktop.log.log_buffer', 'desktop.log.logging', 'desktop.log.os', 'desktop.log.re', 'desktop.log.sys', 'desktop.logging', 'desktop.manage_entry', 'desktop.management', 'desktop.management.commands', 'desktop.management.commands.runspawningserver', 'desktop.management.commands.sys', 'desktop.optparse', 'desktop.os', 'desktop.pkg_resources', 'desktop.pwd', 'desktop.re', 'desktop.settings', 'desktop.signal', 'desktop.socket', 'desktop.stat', 'desktop.subprocess', 'desktop.supervisor', 'desktop.sys', 'desktop.threading', 'desktop.time', 'desktop.traceback', 'dis', 'distutils', 'distutils.ConfigParser', 'distutils.debug', 'distutils.dep_util', 'distutils.dist', 'distutils.distutils', 'distutils.email', 'distutils.errors', 'distutils.fancy_getopt', 'distutils.getopt', 'distutils.log', 'distutils.os', 'distutils.re', 'distutils.spawn', 'distutils.stat', 'distutils.string', 'distutils.sys', 'distutils.sysconfig', 'distutils.util', 'distutils.warnings', 'django', 'django.conf', 'django.conf.django', 'django.conf.global_settings', 'django.conf.os', 'django.conf.re', 'django.conf.time', 'django.core', 'django.core.exceptions', 'django.core.management', 'django.core.management.base', 'django.core.management.color', 'django.core.management.commands', 'django.core.management.commands.django', 'django.core.management.commands.os', 'django.core.management.commands.startapp', 'django.core.management.django', 'django.core.management.imp', 'django.core.management.optparse', 'django.core.management.os', 'django.core.management.sys', 'django.django', 'django.utils', 'django.utils.codecs', 'django.utils.datetime', 'django.utils.decimal', 'django.utils.django', 'django.utils.encoding', 'django.utils.functional', 'django.utils.importlib', 'django.utils.locale', 'django.utils.os', 'django.utils.re', 'django.utils.sys', 'django.utils.termcolors', 'django.utils.types', 'django.utils.urllib', 'django.utils.version', 'email', 'email.Charset', 'email.Encoders', 'email.Errors', 'email.FeedParser', 'email.Generator', 'email.Header', 'email.Iterators', 'email.MIMEAudio', 'email.MIMEBase', 'email.MIMEImage', 'email.MIMEMessage', 'email.MIMEMultipart', 'email.MIMENonMultipart', 'email.MIMEText', 'email.Message', 'email.Parser', 'email.Utils', 'email.base64MIME', 'email.email', 'email.mime', 'email.quopriMIME', 'email.sys', 'encodings', 'encodings.__builtin__', 'encodings.aliases', 'encodings.codecs', 'encodings.encodings', 'encodings.utf_8', 'errno', 'eventlet', 'eventlet.Queue', 'eventlet.collections', 'eventlet.convenience', 'eventlet.errno', 'eventlet.event', 'eventlet.eventlet', 'eventlet.green', 'eventlet.green._socket_nodns', 'eventlet.green.errno', 'eventlet.green.eventlet', 'eventlet.green.os', 'eventlet.green.select', 'eventlet.green.socket', 'eventlet.green.ssl', 'eventlet.green.sys', 'eventlet.green.time', 'eventlet.green.warnings', 'eventlet.greenio', 'eventlet.greenlet', 'eventlet.greenpool', 'eventlet.greenthread', 'eventlet.heapq', 'eventlet.hubs', 'eventlet.hubs.eventlet', 'eventlet.hubs.os', 'eventlet.hubs.sys', 'eventlet.hubs.timer', 'eventlet.imp', 'eventlet.itertools', 'eventlet.os', 'eventlet.patcher', 'eventlet.queue', 'eventlet.semaphore', 'eventlet.socket', 'eventlet.support', 'eventlet.support.__builtin__', 'eventlet.support.eventlet', 'eventlet.support.greenlet', 'eventlet.support.greenlets', 'eventlet.support.sys', 'eventlet.sys', 'eventlet.time', 'eventlet.timeout', 'eventlet.traceback', 'eventlet.warnings', 'exceptions', 'fcntl', 'filebrowser', 'filebrowser.settings', 'fnmatch', 'functools', 'gc', 'genericpath', 'getopt', 'gettext', 'greenlet', 'grp', 'hadoop', 'hadoop.conf', 'hadoop.desktop', 'hadoop.fnmatch', 'hadoop.logging', 'hadoop.os', 'hashlib', 'heapq', 'help', 'help.conf', 'help.settings', 'imp', 'itertools', 'jobbrowser', 'jobbrowser.conf', 'jobbrowser.desktop', 'jobbrowser.os', 'jobbrowser.settings', 'jobbrowser.sys', 'jobsub', 'jobsub.conf', 'jobsub.desktop', 'jobsub.os', 'jobsub.settings', 'keyword', 'linecache', 'locale', 'lockfile', 'logging', 'logging.ConfigParser', 'logging.SocketServer', 'logging.atexit', 'logging.cPickle', 'logging.cStringIO', 'logging.codecs', 'logging.config', 'logging.handlers', 'logging.logging', 'logging.os', 'logging.re', 'logging.socket', 'logging.stat', 'logging.struct', 'logging.sys', 'logging.thread', 'logging.threading', 'logging.time', 'logging.traceback', 'logging.types', 'logging.warnings', 'logging.weakref', 'logilab', 'marshal', 'math', 'modulefinder', 'numbers', 'opcode', 'operator', 'optparse', 'os', 'os.path', 'parser', 'paste', 'paste.modulefinder', 'paste.pkg_resources', 'pickle', 'pkg_resources', 'pkgutil', 'posix', 'posixpath', 'proxy', 'proxy.conf', 'proxy.desktop', 'proxy.os', 'proxy.re', 'proxy.settings', 'proxy.sys', 'pwd', 'random', 're', 'resource', 'select', 'shell', 'shell.conf', 'shell.constants', 'shell.desktop', 'shell.eventlet', 'shell.logging', 'shell.settings', 'shell.shell', 'shell.utils', 'signal', 'site', 'sitecustomize', 'socket', 'sre_compile', 'sre_constants', 'sre_parse', 'ssl', 'stat', 'string', 'strop', 'struct', 'subprocess', 'symbol', 'sys', 'tempfile', 'textwrap', 'thread', 'threading', 'time', 'token', 'traceback', 'types', 'urllib', 'urlparse', 'useradmin', 'useradmin.conf', 'useradmin.desktop', 'useradmin.settings', 'warnings', 'weakref', 'zipimport', 'zlib', 'zope', 'zope.pkg_resources']
  2. desktop/core/src/desktop/lib/rest/__init__.py (Diff revision 1)
    Should these become "from eventlet.green import ..."? Would that still work?
    1. Yes it works but actually if I don't put them it works too.
      eventlet.import_patched('urllib2') seems to do all the work (and it won't work without it). 
      Maybe we can drop them.
  2. apps/jobbrowser/src/jobbrowser/models.py (Diff revision 2)
    This location is a bit more random than I'd like. Would it work if we put it in jobbrowser/__init__.py?
    1. Yes, it is cleaner.
  3. Copyright header missing.
  4. desktop/core/src/desktop/lib/eventlet_util.py (Diff revision 2)
    I don't understand the 2nd sentence. "We need to avoid this magic..."
    Does it mean that we haven't avoided it? And why is that? Note that the very first file that gets executed when you do `build/env/bin/hue ...' is desktop/core/src/desktop/manage_entry.py. Perhaps we can patch it there?
    1. I reworded it. I tried to say that it would be a better practice to avoid the monkey patching.
      I tried in desktop/core/src/desktop/manage_entry.py / desktop/core/__init__.py but it did not work. It probably loads the __init__.py of other modules before.
  5. Since you're calling monkey_patch(), perhaps we can just monkey_patch everything and forget about the import_patch? Would that work?
    1. So I tried but I could not make it work...
  1. I'm no longer familiar with the implementation, but a quick, non-binding comment: the only reason we have spawning in here at all is to support the shell app.  The shell app requires long-lived requests that are deemed too expensive for a thread.  Otherwise, we were happy with multi-threaded behavior and performance.  Originally, we'd planned to enable 'spawning' to be in 'mixed' mode, wherein it used eventlet stuff for the select in the shell app, but was otherwise doing threaded stuff.  The documentation (quoted below) suggests that this is possible.  It's possible that eventlet patches enough that Thrift, urllib (HTTP), and MySQL all behave correctly, but it might be just as easy to allow more than one thread too.  Anyway, something to keep in mind as a possibility.
    From: http://pypi.python.org/pypi/Spawning
    Single or Multiple Worker Thread
    If your wsgi applications perform a certain subset of blocking calls which have been monkeypatched by eventlet to cooperate instead (such as operations in the socket module), you can configure each process to run only a single main thread and cooperate using eventlet's green threads instead. This can be useful if your application needs to scale to a large number of simultaneous open connections, such as a COMET server or an application which uses AJAX polling. However, most existing wsgi applications will probably perform blocking operations (for example, calling database adapter libraries which perform blocking socket operations). Therefore, for most wsgi applications a combination of multiple processes and multiple threads will be ideal.
    1. Thanks for the recommendation. We tried that one -- multithreading but not multi-processing. (Multi-processing is a rather risky change that requires us to re-evaluate all shared states. So I ruled it out for now.) With multi-threading, there is no more cooperative scheduling. It's back to the old-school one-thread-per-request mode, which is no good for Shell. This is confirmed by Aditya and by empirical results.
      For the short term fix, I'm ok with patching. When we have time to thoroughly test it, I'd like to turn on multi-processing as you pointed out.
Review request changed

Status: Closed (submitted)