FLUME-292: Refactor OutputFormat and OutputFormatBuilder to better support plugin authoring

Review Request #1085 - Created Oct. 24, 2010 and submitted

Eric Sammer
jon, phunt
*WARNING* This patch requires FLUME-195 and assumes it is applied. Since FLUME-195 has not yet been committed to master, I've pushed my flume-195 branch for reference to the Cloudera repo.

This is a refactoring / cleaning of OutputFormat and OutputFormatBuilder to better facilitate output format plugin development and how FormatFactory registers and deals with plugins.

* OutputFormatBuilder is now the factory and "descriptor" of the output format. It is responsible for describing the instances it creates. The only property that all OutputFormats have in common currently is their name. To prevent duplication and "drift" of OF naming, one should now get this from the builder. If the builder instance is not available...
* OutputFormat now supports having a reference to its constituent OutputFormatBuilder injected into it during construction. By convention, all OFBs now invoke format.setBuilder(this).
* FormatFactory now keeps core formats (builtins) in a list rather than a map and builds the map as a cache. This deals with some of the double instantiation issues in FLUME-195.
* AbstractOutputFormat has been introduced to make OutputFormat implementation authoring easier. It simply implements OutputFormat, provides the {get,set}Builder() methods, and marks build() abstract to force derived classes to deal with it.
* Existing OFs have been retrofitted to extend AbstractOutputFormat. We could also do the same for input formats and an input / output delegating hybrid, but there aren't many input formats right now.
* Ran all tests; no new failures.
Review request changed
  2. Should this be writing to System.out or to a log?
  3. Same question about using System.out
  4. This should be using the SLF4J API now ... please?
  5. Can you try making this use SLF4J as well and also private static final instead of final public static?
  1. I meant to also suggest in my last review:
     * It would be good to have examples of this in the plugin directory along side the current HelloWorld sink and source.
     * Similarly, some documentation would be good (although that is somewhat related to FLUME-315 about a Developer Guide).
  1. A few nits and a few questions.
    I like the move of getName into the builder for the naming drift issues and becuae it still allows for synonyms from the user point of view.  
    But adding the reference to the builder seems like overkill since at the moment.  It essentailly just wants a name string adds a minor circular dependency.  Are there thoughts about the OFB having more fields?  If not, would just adding a name field to the OF constructor suffice?
    Not sure I see the benefit of chaining the baked in formats into a list.  What was the issue in FLUME-195?
  2. combine lines 90,94?
  3. combine lines 89, 93
  4. combine?
    1. This is a stylistic habit adopted due to years of C I'm willing to break if you object to it. I strongly dislike inline declaration as I think it makes it difficult to read. Let's discuss off line.
    2. yes, let's discuss off line.
    1. It's a duplication of naming. OutputFormatBuilders are self describing so by having this be a map, we create another place where names can drift. By simply adding the builder, we force the rest of the code to consult the builder's idea of the name.
    2. ok, I buy this.
  5. combine 46, 50
  6. Combine 74,78
  7. combine 46,50
  8. nit: combine lines 90, 94
  9. combine 89,93
  10. combine 88,92
    1. I actually didn't add this - this is how it is today. I'm happy to change it while I'm in here, though.
    2. yup, there are a handful of places from the early days where we weren't consistent about this.  fix broken windows (or out of style windows) in this case.. 
  11. I'm fine with removing this.
  12. combine 60,62
  13. src/java/com/cloudera/flume/handlers/text/SyslogEntryFormat.java (Diff revision 2)
    combine 150, 159
  14. why have this?  The dependency feels backwards.
    1. This is more of a nomenclature issue. I've effectively hijacked the Builder and made it an "OutputFormatType" or "OutputFormatDescriptor." If you think about it that way, instances understand their type and it doesn't feel backwards. I could also just inject the name into the output format if that's preferred but I saw this as an (admittedly theoretical) advantage in that now, given an output format, you can create similar OFs.
    2. I think I'd prefer just copying the name (fewer dependencies, slightly simpler) but I could see some potential uses in the future for this (python style keyword args metadata when we add user specifiable output format parameters).
      Your call.
  15. combine 46,50
  1. Fix the nits, your call on the setOutputFormatBuilder().
    Please add jira's based on Bruce's comments!