HUE-7252 [importer] CSV files with quoted newlines in fields are not handled properly
Review Request #11514 - Created Sept. 11, 2017 and submitted
enricoberti, jgauthier, johan, krish, ranade, weixia, yingc
commit 779dddb4c5b7b5f370a6c2d19018d308283fc747
Author: Romain Rigaux <firstname.lastname@example.org>
Date:   Mon Aug 28 09:08:52 2017 +0300

    HUE-7252 [importer] CSV with quoted new lines in fields are not handled propertly

:100644 100644 58f17fc... 090a93e... M  desktop/libs/indexer/src/indexer/file_format.py
:100644 100644 fe2dae4... 936f9f9... M  desktop/libs/indexer/src/indexer/indexers/morphline.py
The idea is to use the CSV sniffer the first time we read the data, to guess the separators (get_format).
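A minimal sketch of that first pass using Python's standard `csv.Sniffer`. The helper name `guess_format` and the returned keys are hypothetical, not Hue's actual `get_format` code, and restricting the candidate delimiters to a few common characters is an assumption made to keep the guess stable.

```python
import csv


def guess_format(sample_text):
    """Hypothetical helper: guess the field separator and quote character."""
    # Only consider common separators so the sniffer does not pick an
    # arbitrary punctuation character found in the data.
    dialect = csv.Sniffer().sniff(sample_text, delimiters=",;\t|")
    return {
        "field_separator": dialect.delimiter,
        "quote_char": dialect.quotechar,
    }


# The quoted field spans two physical lines.
sample = 'id,comment\n1,"hello\nworld"\n2,bye\n'
print(guess_format(sample))
```

The guessed values would then prefill the separator fields shown in the UI.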
The second time it is called (get_sample), we override the sniffed separators with the ones picked in the UI (those fields were prefilled with the guessed values in step 1).
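A sketch of the second pass, assuming the separators chosen in the UI arrive as plain one-character strings. `read_sample` and `max_rows` are hypothetical names. The sniffer is not consulted at all here, so the user's choice always wins, while `csv.reader` still handles quoted fields that span several lines.

```python
import csv
import io
import itertools


def read_sample(text, field_separator, quote_char, max_rows=5):
    """Hypothetical helper: parse the first rows with UI-selected separators."""
    # newline="" keeps line endings inside quoted fields intact, as the csv
    # module documentation recommends for file-like input.
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=field_separator,
        quotechar=quote_char,
    )
    # islice stops after max_rows logical records, not physical lines.
    return list(itertools.islice(reader, max_rows))


text = 'id;comment\n1;"a\nb"\n2;c\n'
print(read_sample(text, ";", '"'))
```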
Previously we simply split on the record separator (newline), so quoted CSV fields containing newlines could not be supported.
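To illustrate the failure, the snippet below compares a naive split on the record and field separators (an approximation of the old behavior, not the original code) with `csv.reader`, which tracks quote state across lines.

```python
import csv
import io

data = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

# Old-style approach (illustrative): split records on "\n", then fields on ",".
# The quoted field is cut in two and produces an extra, broken record.
naive = [line.split(",") for line in data.split("\n") if line]
# -> [['id', 'comment'], ['1', '"first line'], ['second line"'], ['2', 'plain']]

# csv.reader keeps a newline that appears inside quotes as part of the field.
parsed = list(csv.reader(io.StringIO(data, newline="")))
# -> [['id', 'comment'], ['1', 'first line\nsecond line'], ['2', 'plain']]
```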