univocity

Spark java.lang.NoSuchMethodError for Univocity CSV Parser setDelimiter method

南楼画角 submitted on 2020-01-16 09:48:47
Question: I'm trying to run a Scala Spark job that uses the Univocity CSV parser. After upgrading it to support a String delimiter (rather than only a single character), I get the following error when I run my jar on the cluster. Running it locally in my IDEA IDE produces the expected results with no errors.

ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: com.univocity.parsers.csv.CsvFormat.setDelimiter(Ljava/lang/String;)V
java.lang.NoSuchMethodError: com.univocity.parsers.csv
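
A likely direction (an educated guess, not a confirmed answer from this post): Spark distributions typically bundle their own, older copy of univocity-parsers, and on the cluster that copy can shadow the newer one packaged with the application. Older releases only expose CsvFormat.setDelimiter(char), so the String overload fails at runtime even though it compiled fine locally. A quick diagnostic sketch to see which jar wins on the cluster classpath:

// Diagnostic sketch: print the location of the jar that CsvFormat was loaded from.
// If it points at a Spark-provided jar rather than your own dependency, the older
// bundled univocity-parsers is shadowing the version your code was compiled against.
System.out.println(
        com.univocity.parsers.csv.CsvFormat.class
                .getProtectionDomain().getCodeSource().getLocation());

If the Spark-provided jar shows up, the usual workarounds are shading/relocating univocity-parsers inside the fat jar or experimenting with spark.driver.userClassPathFirst / spark.executor.userClassPathFirst; both are general techniques for dependency conflicts, not something taken from this thread.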

Univocity CSV parser multiple beans with multiple rows in single CSV

大憨熊 submitted on 2020-01-05 08:32:34
Question: Given the following classes

public class Inventory {
    private InventoryHeader header;
    private List<InventoryLine> lines;
}

public class InventoryHeader {
    private String date;
    private boolean isCurrent;
}

public class InventoryLine {
    private String itemName;
    private int quantity;
}

and the following CSV (using ',' as the delimiter, but spaces are used here for visibility's sake):

IH 2007-06-05 false
IL Watch 7
IL Flower Pot 9
IL Chicken Wing 29
IH 2010-07-30 true
IL Cable 200
IL Fish Tank 87

In
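
One way to approach this master/detail layout (a minimal sketch, not the accepted answer): register a custom AbstractRowProcessor that starts a new Inventory whenever the first column is "IH" and attaches the following "IL" rows to it. The InventoryHeader/InventoryLine constructors and the Inventory setters used below are assumptions layered on top of the fields shown in the question.

// Sketch of a row processor that groups "IH"/"IL" rows into Inventory objects.
// The constructors and setters on Inventory/InventoryHeader/InventoryLine are assumed.
import com.univocity.parsers.common.ParsingContext;
import com.univocity.parsers.common.processor.AbstractRowProcessor;
import java.util.ArrayList;
import java.util.List;

public class InventoryProcessor extends AbstractRowProcessor {
    private final List<Inventory> inventories = new ArrayList<>();
    private Inventory current;

    @Override
    public void rowProcessed(String[] row, ParsingContext context) {
        if ("IH".equals(row[0])) {                              // header row starts a new inventory
            current = new Inventory();
            current.setHeader(new InventoryHeader(row[1], Boolean.parseBoolean(row[2])));
            current.setLines(new ArrayList<>());
            inventories.add(current);
        } else if ("IL".equals(row[0]) && current != null) {    // line row belongs to the current inventory
            current.getLines().add(new InventoryLine(row[1], Integer.parseInt(row[2])));
        }
    }

    public List<Inventory> getInventories() {
        return inventories;
    }
}

Wire it up with CsvParserSettings.setProcessor(new InventoryProcessor()) and call CsvParser.parse(reader); after parsing, getInventories() holds one Inventory per "IH" block. univocity's InputValueSwitch can achieve something similar by routing rows to per-type bean processors.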

Univocity - How to return one bean per row using iterator style?

左心房为你撑大大i submitted on 2019-12-29 05:36:07
Question: Introduction: I am building a process to merge a few big sorted CSV files. I am currently looking into using Univocity to do this. The way I set up the merge is to use beans that implement the Comparable interface. Given: the simplified file looks like this:

id,data
1,aa
2,bb
3,cc

The bean looks like this (getters and setters omitted):

public class Address implements Comparable<Address> {
    @Parsed
    private int id;
    @Parsed
    private String data;

    @Override
    public int compareTo(Address o) {
        return Integer
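
For the iterator-style part, univocity's CsvRoutines can hand beans back one at a time through an Iterable, which fits a merge of pre-sorted inputs without loading whole files into memory. A minimal sketch, assuming the Address bean above and a header row in each file (file names are placeholders):

// Sketch: iterate beans lazily so two sorted inputs can be merged by repeatedly
// emitting the smaller current Address.
import com.univocity.parsers.csv.CsvParserSettings;
import com.univocity.parsers.csv.CsvRoutines;
import java.io.FileReader;

public class IterateExample {
    public static void main(String[] args) throws Exception {
        CsvParserSettings settings = new CsvParserSettings();
        settings.setHeaderExtractionEnabled(true);   // files start with an "id,data" header
        CsvRoutines routines = new CsvRoutines(settings);

        for (Address address : routines.iterate(Address.class, new FileReader("left.csv"))) {
            // compare with the head of the other file's iterator and emit the
            // smaller Address here; the merge logic itself is up to the caller
        }
    }
}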

Error while reading very large files with spark csv package

佐手、 submitted on 2019-12-23 18:36:42
Question: We are trying to read a 3 GB file that has multiple newline characters in one of its columns, using spark-csv and the univocity 1.5.0 parser, but some rows are getting split apart at the embedded newline characters. This only happens with large files. We are using Spark 1.6.1 and Scala 2.10. The following code is used to read the file:

sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("mode",
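
For what it's worth, newer Spark versions handle this case natively: since Spark 2.2 the built-in CSV reader has a multiLine option that keeps quoted newlines inside a single record, which is usually the cleanest fix if upgrading from 1.6.1 is an option (with Spark 1.6 + spark-csv there is, as far as I know, no reliable equivalent because the input is split at line breaks). A sketch using the Java API; the path and escape settings are assumptions:

// Sketch assuming Spark 2.2+ and a quoted CSV where embedded newlines sit inside quotes.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MultiLineCsvRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("multiline-csv").getOrCreate();

        Dataset<Row> df = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .option("multiLine", "true")   // keep quoted newlines inside one record
                .option("escape", "\"")        // assumption: quotes escaped by doubling
                .csv("path/to/large_file.csv");

        df.show(5, false);
    }
}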

Handling “”, “-” CSV with Univocity

女生的网名这么多〃 submitted on 2019-12-13 17:15:15
Question: Any idea how I can get proper lines? Some lines are getting glued together, and I can't figure out how to stop it or why.

col. 0: Date
col. 1: Col2
col. 2: Col3
col. 3: Col4
col. 4: Col5
col. 5: Col6
col. 6: Col7
col. 7: Col7
col. 8: Col8

col. 0: 2017-05-23
col. 1: String
col. 2: lo rem ipsum
col. 3: dolor sit amet
col. 4: mcdonalds.com/online.html
col. 5: null
col. 6: "","-""-""2017-05-23"
col. 7: String
col. 8: lo rem ipsum
col. 9: dolor sit amet
col. 10: burgerking.com
col. 11: https://burgerking
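
A guess at the cause (not taken from this thread): values like "","-""-""2017-05-23" leave the parser hunting for a closing quote, so it keeps reading past the line break and two records end up glued into one. univocity exposes UnescapedQuoteHandling to control that behaviour; a sketch, with the strategy value as something to experiment with rather than a known-correct setting:

// Sketch: tell the parser how to treat unescaped/unbalanced quotes instead of
// letting it scan across line breaks for a closing quote.
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import com.univocity.parsers.csv.UnescapedQuoteHandling;
import java.io.FileReader;

public class QuoteHandlingExample {
    public static void main(String[] args) throws Exception {
        CsvParserSettings settings = new CsvParserSettings();
        settings.setHeaderExtractionEnabled(true);
        // STOP_AT_DELIMITER, STOP_AT_CLOSING_QUOTE and BACK_TO_DELIMITER are the
        // usual candidates; which one yields "proper lines" depends on the data.
        settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER);

        CsvParser parser = new CsvParser(settings);
        for (String[] row : parser.parseAll(new FileReader("input.csv"))) {
            System.out.println(row.length + " columns");   // should stay at 9 per record
        }
    }
}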

uniVocity doesn't parse the first column into beans

人走茶凉 submitted on 2019-12-12 09:46:32
Question: I'm trying to read CSV files from GTFS.zip with the help of uniVocity-parsers and have run into an issue that I can't figure out. For some reason it seems the first column of some CSV files won't be parsed correctly. For example, the "stops.txt" file looks like this:

stop_id,stop_name,stop_lat,stop_lon,location_type,parent_station
"de:3811:30215:0:6","Freiburg Stübeweg","48.0248455941735","7.85563688037231","","Parent30215"
"de:8311:30054:0:1","Freiburg Schutternstraße","48.0236251356332","7
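
One thing worth ruling out (an assumption, not a confirmed answer for this post): GTFS feeds are often exported with a UTF-8 byte order mark, and when the BOM sticks to the first header it arrives as "\uFEFFstop_id" and no longer matches a bean field mapped to "stop_id", so only the first column fails. A quick diagnostic sketch using only the JDK; the zip entry name is taken from the question, the rest is a guess:

// Diagnostic sketch: check whether stops.txt inside the zip starts with a UTF-8 BOM.
import java.io.InputStream;
import java.util.zip.ZipFile;

public class BomCheck {
    public static void main(String[] args) throws Exception {
        try (ZipFile zip = new ZipFile("GTFS.zip");
             InputStream in = zip.getInputStream(zip.getEntry("stops.txt"))) {
            int b1 = in.read(), b2 = in.read(), b3 = in.read();
            // The UTF-8 BOM is the byte sequence EF BB BF
            System.out.println("UTF-8 BOM present: " + (b1 == 0xEF && b2 == 0xBB && b3 == 0xBF));
        }
    }
}

If the BOM is there, stripping it before handing the reader to the parser is one way to get the first column mapping again.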

Univocity - parse each TSV file row to different Type of class object

狂风中的少年 submitted on 2019-12-08 07:48:54
Question: I have a TSV file with a fixed set of rows, but each row maps to a different Java class. For example:

recordType recordValue1
recordType recordValue1 recordValue2

For the first row I have the following class:

public class FirstRow implements ItsvRecord {
    @Parsed(index = 0)
    private String recordType;
    @Parsed(index = 1)
    private String recordValue1;

    public FirstRow() {
    }
}

and for the second row I have:

public class SecondRow implements ItsvRecord {
    @Parsed(index = 0)
    private String recordType;
    @Parsed(index
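
If routing by the first column is acceptable, univocity's InputValueSwitch can send each row to a different bean processor based on the value of recordType. A sketch under assumptions: the switch values "FIRST"/"SECOND" and the file name are placeholders, not values from the post, and the API details are from memory.

// Sketch: route each TSV row to the processor registered for its recordType value.
import com.univocity.parsers.common.processor.BeanListProcessor;
import com.univocity.parsers.common.processor.InputValueSwitch;
import com.univocity.parsers.tsv.TsvParser;
import com.univocity.parsers.tsv.TsvParserSettings;
import java.io.FileReader;
import java.util.List;

public class RowTypeSwitchExample {
    public static void main(String[] args) throws Exception {
        BeanListProcessor<FirstRow> firstRows = new BeanListProcessor<>(FirstRow.class);
        BeanListProcessor<SecondRow> secondRows = new BeanListProcessor<>(SecondRow.class);

        InputValueSwitch rowTypeSwitch = new InputValueSwitch(0); // switch on column 0 (recordType)
        rowTypeSwitch.addSwitchForValue("FIRST", firstRows);
        rowTypeSwitch.addSwitchForValue("SECOND", secondRows);

        TsvParserSettings settings = new TsvParserSettings();
        settings.setProcessor(rowTypeSwitch);
        new TsvParser(settings).parse(new FileReader("records.tsv"));

        List<FirstRow> first = firstRows.getBeans();
        List<SecondRow> second = secondRows.getBeans();
        System.out.println(first.size() + " FirstRow beans, " + second.size() + " SecondRow beans");
    }
}

Because the beans map columns with @Parsed(index = ...), no header row is needed; each processor only receives the rows whose first column matches its registered value.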
