HUE-7252 [importer] CSV with quoted new lines in fields are not handled properly

Review Request #11514 - Created Sept. 11, 2017 and submitted

Information
Romain Rigaux
hue
master
HUE-7252
Reviewers
hue
enricoberti, jgauthier, johan, krish, ranade, weixia, yingc
commit 779dddb4c5b7b5f370a6c2d19018d308283fc747
Author: Romain Rigaux <romain@cloudera.com>
Date:   Mon Aug 28 09:08:52 2017 +0300

    HUE-7252 [importer] CSV with quoted new lines in fields are not handled propertly

:100644 100644 58f17fc... 090a93e... M	desktop/libs/indexer/src/indexer/file_format.py
:100644 100644 fe2dae4... 936f9f9... M	desktop/libs/indexer/src/indexer/indexers/morphline.py

manual

The idea is to use the CSV sniffer when reading the data the first time, to guess the separators (get_format).
The second time we read it (get_sample), we override the guessed separators with the ones picked in the UI (they were prefilled by step 1).
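
A minimal sketch of this two-pass approach, using only Python's standard csv module. The function names guess_format and read_sample, the returned dict keys, and the sample data are hypothetical and chosen for illustration; this is not the code from file_format.py.

```python
import csv
import io


def guess_format(sample_text):
    """First pass: let csv.Sniffer guess the separators from the raw text."""
    dialect = csv.Sniffer().sniff(sample_text)
    # Hypothetical key names, used here to prefill the UI.
    return {"field_separator": dialect.delimiter, "quote_char": dialect.quotechar}


def read_sample(sample_text, field_separator, quote_char):
    """Second pass: parse with the separators chosen in the UI, not the sniffed ones."""
    # newline="" lets the csv module handle line endings inside quoted fields itself.
    stream = io.StringIO(sample_text, newline="")
    return list(csv.reader(stream, delimiter=field_separator, quotechar=quote_char))


sample = 'id;comment\n1;"hello\nworld"\n2;"plain"\n'

guessed = guess_format(sample)
# Here the user keeps the prefilled values; they could equally be edited in the UI.
rows = read_sample(sample, guessed["field_separator"], guessed["quote_char"])
```

Keeping the guess and the parse separate means a wrong guess from the sniffer can be corrected by the user without changing how the sample is read.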

Previously we just split on the record separator (new line), so we could not support quoted CSV with new lines inside some fields.
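
The difference can be illustrated with a short sketch, assuming a comma-separated input with one quoted field that contains a new line. The helper names split_on_newline and parse_with_csv_reader are hypothetical and only approximate the old and new behaviour described above.

```python
import csv
import io

# One header row and two records; the first record has a quoted field spanning two lines.
data = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'


def split_on_newline(text, field_separator=","):
    """Old-style approach: treat every new line as a record boundary."""
    return [line.split(field_separator) for line in text.splitlines()]


def parse_with_csv_reader(text, field_separator=",", quote_char='"'):
    """Quote-aware approach: new lines inside quoted fields stay in the field."""
    stream = io.StringIO(text, newline="")
    return list(csv.reader(stream, delimiter=field_separator, quotechar=quote_char))


naive_rows = split_on_newline(data)      # the quoted field is broken into two rows
csv_rows = parse_with_csv_reader(data)   # the quoted field stays in one row
```

With this input the naive split produces four rows, with the quoted field cut in half, while csv.reader returns the expected three.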

Enrico Berti
Romain Rigaux
Review request changed

Status: Closed (submitted)
