How can I analyse ~13GB of data?

梦谈多话 2021-02-07 06:26

I have ~300 text files that contain data on trackers, torrents and peers. Each file is organised like this:

tracker.txt

time torrent
            


        
4 Answers
  •  我在风中等你
    2021-02-07 07:09

    You say that your MySQL queries took too long. Have you made sure that proper indices are in place to support the kind of query you ran? In your example, that would be an index on Peer.ip (or even a composite index on (Peer.ip, Peer.id)) and an index on TorrentAtPeer.peer.
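
A minimal sketch of those indices, in case it helps. It uses Python's built-in sqlite3 as a stand-in for MySQL so it runs without a server; the CREATE INDEX statements have the same basic form in MySQL. Only Peer.id, Peer.ip and TorrentAtPeer.peer are taken from the suggestion above. The torrent column, the index names and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here so the sketch is runnable as-is.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed schema: the torrent column is hypothetical.
cur.execute("CREATE TABLE Peer (id INTEGER PRIMARY KEY, ip TEXT)")
cur.execute("CREATE TABLE TorrentAtPeer (peer INTEGER, torrent INTEGER)")

# Composite index: a lookup by ip can read the matching id from the index.
cur.execute("CREATE INDEX idx_peer_ip_id ON Peer (ip, id)")
# Index on the column used to join back to Peer.
cur.execute("CREATE INDEX idx_tap_peer ON TorrentAtPeer (peer)")

# Made-up sample rows.
cur.executemany("INSERT INTO Peer (id, ip) VALUES (?, ?)",
                [(1, "10.0.0.1"), (2, "10.0.0.2")])
cur.executemany("INSERT INTO TorrentAtPeer (peer, torrent) VALUES (?, ?)",
                [(1, 100), (1, 101), (2, 100)])


def torrents_for_ip(ip):
    """Return the torrent ids seen at the peer with the given ip."""
    rows = cur.execute(
        "SELECT t.torrent FROM Peer p "
        "JOIN TorrentAtPeer t ON t.peer = p.id "
        "WHERE p.ip = ? ORDER BY t.torrent",
        (ip,),
    ).fetchall()
    return [r[0] for r in rows]


def index_names():
    """List the user-created indices, sorted by name."""
    rows = cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]
```

Running the same SELECT under EXPLAIN in MySQL should show whether the indices are actually picked up for your real query.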

    As I understand your Java results, you have a lot of data but not that many distinct strings. So you could perhaps save some time by assigning a unique number to each tracker, torrent and peer: use one table for each, with an indexed column holding the string and a numeric primary key as the id. That way, all tables relating these entities would only have to deal with those numbers, which could save a lot of space and make your operations a lot faster.
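
A rough sketch of that layout, again with Python's sqlite3 standing in for MySQL. The column name `name`, the helper `get_id` and the sample values are hypothetical, and the relation table is just one possible shape. Note that `INSERT OR IGNORE` is SQLite syntax; MySQL spells it `INSERT IGNORE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One lookup table per entity: a numeric primary key plus the string,
# which is UNIQUE and therefore indexed.
for table in ("Tracker", "Torrent", "Peer"):
    cur.execute(
        f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
    )

# The relation table stores only the numeric ids (hypothetical layout).
cur.execute(
    "CREATE TABLE TorrentAtPeer (torrent INTEGER NOT NULL, peer INTEGER NOT NULL)"
)


def get_id(table, value):
    """Return the numeric id for a string, inserting it on first sight."""
    # table is always one of the fixed names above, never user input.
    cur.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (value,))
    row = cur.execute(
        f"SELECT id FROM {table} WHERE name = ?", (value,)
    ).fetchone()
    return row[0]


# Usage: map the strings to ids once, then store only the ids.
peer_id = get_id("Peer", "10.0.0.1")
torrent_id = get_id("Torrent", "example-torrent")
cur.execute(
    "INSERT INTO TorrentAtPeer (torrent, peer) VALUES (?, ?)",
    (torrent_id, peer_id),
)
```

While loading the files, each string is looked up (or inserted) once per occurrence, and every relation row then costs two integers instead of two strings.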
