Which data structure should I use to search a string from CSV?

醉酒成梦 2021-01-16 00:05

I have a CSV file with nearly 200,000 rows and two columns: name and job. The user then inputs a name, say user_name, and I have to search the entire CSV to find the

2 answers
  •  醉梦人生
    2021-01-16 01:05

    If you are unable to use a commercial database then you are going to have to write code to mimic some of a database's functionality.

    To search the entire dataset sequentially in O(n) time, you just read the file and check each line. If you write a program that loads the data into an in-memory Map, you could search the Map in amortized O(1) time, but you'd still be loading the whole file into memory on every run, which is itself an O(n) operation, so you gain nothing.
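
As a rough illustration of the sequential scan, here is a minimal Python sketch. It assumes the name is the first column and the job the second, that the file is UTF-8, and that an exact match on the name is what's wanted; `find_jobs` and the file name in the usage line are made-up names, not anything from the question.

```python
import csv

def find_jobs(csv_path, user_name):
    """Return the job from every row whose name equals user_name.

    This is a plain O(n) scan: every row is read once per search.
    Assumes column 0 is the name and column 1 is the job.
    """
    matches = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            # Skip blank or malformed rows with fewer than two fields.
            if len(row) >= 2 and row[0] == user_name:
                matches.append(row[1])
    return matches
```

Calling `find_jobs("people.csv", "alice")` would return a list of jobs, empty if the name never appears. If the file has a header row, it is treated like any other row here.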

    So the next approach is to build a disk-based index of some kind that you can search efficiently without reading the entire file, and then use the index to tell you where the record you want is located. A lookup would then be O(log n), but you take on significant complexity: building, maintaining, and managing the disk-based index. This is what database systems are optimized to do.
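
To make the index idea concrete, here is a simplified Python sketch: it records the byte offset of each row, sorts the (name, offset) pairs, and then uses binary search plus `seek` to jump straight to matching rows. The big simplification is that the index lives in memory here; a real disk-based index would be written to its own file and kept in sync with the CSV, which is where the complexity comes from. It also assumes simple rows with no quoted fields or commas inside names. `build_index` and `lookup` are hypothetical names.

```python
import bisect

def build_index(csv_path):
    """Return a sorted list of (name, byte_offset) pairs, one per data row.

    Assumes simple "name,job" lines with no quoting or embedded commas.
    """
    index = []
    with open(csv_path, "rb") as f:
        while True:
            offset = f.tell()          # where this row starts in the file
            line = f.readline()
            if not line:
                break
            if b"," not in line:       # skip blank or malformed lines
                continue
            name = line.split(b",", 1)[0].decode("utf-8")
            index.append((name, offset))
    index.sort()
    return index

def lookup(csv_path, index, user_name):
    """Binary-search the index (O(log n)), then seek to each matching row."""
    jobs = []
    # Offsets are never negative, so (user_name, 0) sorts at or before
    # every entry for that name.
    i = bisect.bisect_left(index, (user_name, 0))
    with open(csv_path, "rb") as f:
        while i < len(index) and index[i][0] == user_name:
            f.seek(index[i][1])
            line = f.readline().decode("utf-8").rstrip("\r\n")
            jobs.append(line.split(",", 1)[1])
            i += 1
    return jobs
```

Building the index is still an O(n) pass, so this only pays off if the index is built once and reused across many searches.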

    If you had 200 MILLION rows, then the only feasible solution would be to use a database. For 200 THOUSAND rows, my recommendation is to just scan the file each time (i.e. use grep or, if that's not available, write a simple program that does something similar).

    BTW, if your allusion to finding a "pattern" means you need to search for a regular expression, then you MUST scan the entire file every time, since you cannot build an index without knowing the pattern in advance.
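
If the search really is a regular expression, a full scan might look roughly like the sketch below. It assumes the pattern should be matched against the name column only and uses Python's `re` syntax, which differs in places from grep's; `grep_names` is a made-up name.

```python
import csv
import re

def grep_names(csv_path, pattern):
    """Return (name, job) for every row whose name matches the regex.

    Every row has to be checked, so this is always a full O(n) scan.
    Assumes column 0 is the name and column 1 is the job.
    """
    regex = re.compile(pattern)  # compile once, reuse for every row
    hits = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            # search() matches anywhere in the name, like grep does.
            if len(row) >= 2 and regex.search(row[0]):
                hits.append((row[0], row[1]))
    return hits
```

For example, `grep_names("people.csv", r"^al")` would return the rows whose name starts with "al".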

    In summary: use grep
