I have ~300 text files that contain data on trackers, torrents and peers. Each file is organised like this:
tracker.txt
time torrent
I would give MySQL another try, but with a different schema:
Use natural primary keys:
Peer: ip, port
Torrent: infohash
Tracker: url
TorrentAtPeer: peer_ip, peer_port, torrent_infohash, time
TorrentTracker: tracker_url, torrent_infohash, time
Use the InnoDB engine for all tables (MyISAM accepts FOREIGN KEY clauses but does not enforce them).
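Here is a minimal sketch of that schema as DDL. It runs through Python's built-in sqlite3 module, so it can be tried without a MySQL server. Several details are assumptions and not part of the answer:

- the column types: VARCHAR(45), long enough for textual IPv6; CHAR(40) for a hex SHA-1 infohash; INTEGER for a Unix timestamp
- the composite primary keys on the two linking tables
- the helper name `create_schema`

In MySQL you would append ENGINE=InnoDB to each CREATE TABLE. sqlite has no storage engines, and it only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# Assumed DDL for the natural-key schema. In MySQL, add ENGINE=InnoDB
# after each closing parenthesis; sqlite3 has no engine clause.
SCHEMA = """
CREATE TABLE Peer (
    ip   VARCHAR(45) NOT NULL,
    port INTEGER     NOT NULL,
    PRIMARY KEY (ip, port)
);
CREATE TABLE Torrent (
    infohash CHAR(40) NOT NULL PRIMARY KEY
);
CREATE TABLE Tracker (
    url VARCHAR(255) NOT NULL PRIMARY KEY
);
CREATE TABLE TorrentAtPeer (
    peer_ip          VARCHAR(45) NOT NULL,
    peer_port        INTEGER     NOT NULL,
    torrent_infohash CHAR(40)    NOT NULL,
    time             INTEGER     NOT NULL,  -- assumed: Unix timestamp
    PRIMARY KEY (peer_ip, peer_port, torrent_infohash, time),
    FOREIGN KEY (peer_ip, peer_port) REFERENCES Peer (ip, port),
    FOREIGN KEY (torrent_infohash)   REFERENCES Torrent (infohash)
);
CREATE TABLE TorrentTracker (
    tracker_url      VARCHAR(255) NOT NULL,
    torrent_infohash CHAR(40)     NOT NULL,
    time             INTEGER      NOT NULL,
    PRIMARY KEY (tracker_url, torrent_infohash, time),
    FOREIGN KEY (tracker_url)      REFERENCES Tracker (url),
    FOREIGN KEY (torrent_infohash) REFERENCES Torrent (infohash)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical helper: create all five tables on `conn`."""
    # sqlite ignores foreign keys unless this pragma is on;
    # InnoDB enforces them by default.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ):
        print(name)
```

The linking tables have no surrogate id. Their primary key is built from the natural keys they reference, so every column a typical query needs is already in the row.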
This has several advantages:
TorrentAtPeer directly contains the peer's ip and port as a foreign key to the Peer table. If you need to query the torrents used by peers in a subnetwork, you can now do this without a join, because all relevant data is in the linking table. If you want the torrent count per peer, with the peer's ip in the results, natural primary/foreign keys again have an advantage.
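For example, the subnetwork query reduces to a filter on a single table. This sketch makes the following assumptions:

- IPv4 addresses are stored as dotted text.
- Subnets end on an octet boundary, so 10.0.0.0/24 becomes the prefix "10.0.0.".
- The helper names and sample rows are hypothetical.

If peer_ip is the leading column of an index, MySQL can usually answer a constant-prefix LIKE with an index range scan.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical sample data: three sightings, two of them in 10.0.0.0/24."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TorrentAtPeer (peer_ip TEXT NOT NULL, peer_port INTEGER NOT NULL,"
        " torrent_infohash TEXT NOT NULL, time INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO TorrentAtPeer VALUES (?, ?, ?, ?)",
        [
            ("10.0.0.5", 6881, "aa" * 20, 1),
            ("10.0.0.9", 6881, "bb" * 20, 2),
            ("10.0.1.7", 6881, "cc" * 20, 3),
        ],
    )
    return conn


def torrents_in_subnet(conn: sqlite3.Connection, prefix: str) -> list:
    """Distinct infohashes seen at peers whose ip starts with `prefix`.

    `prefix` must end on an octet boundary, e.g. "10.0.0." for 10.0.0.0/24;
    without the trailing dot, "10.0.1" would also match 10.0.10.x.
    """
    # Everything needed is in TorrentAtPeer, so Peer is never joined.
    rows = conn.execute(
        "SELECT DISTINCT torrent_infohash FROM TorrentAtPeer"
        " WHERE peer_ip LIKE ? ORDER BY torrent_infohash",
        (prefix + "%",),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(torrents_in_subnet(make_demo_db(), "10.0.0."))
```

Subnets that do not fall on an octet boundary, such as a /20, cannot be expressed as a text prefix. Storing the address as an integer and filtering with BETWEEN would handle those cases, but that changes the column type.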
With your schema you have to join to retrieve the ip:
SELECT Peer.ip, COUNT(DISTINCT torrent)
FROM TorrentAtPeer
JOIN Peer ON TorrentAtPeer.peer = Peer.id
GROUP BY Peer.ip;
With natural primary/foreign keys:
SELECT peer_ip, COUNT(DISTINCT torrent_infohash)
FROM TorrentAtPeer
GROUP BY peer_ip;
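To check that the two queries return the same counts, this sketch loads the same hypothetical sightings into one in-memory sqlite3 database per schema and runs each query on its own database. The surrogate-key tables are a guess at the question's schema: Peer has an id, ip and port, and TorrentAtPeer stores the peer id. For brevity, its torrent column holds the infohash text instead of a Torrent id. That change does not affect COUNT(DISTINCT ...).

```python
import sqlite3

SURROGATE_QUERY = """
SELECT Peer.ip, COUNT(DISTINCT torrent)
FROM TorrentAtPeer
JOIN Peer ON TorrentAtPeer.peer = Peer.id
GROUP BY Peer.ip ORDER BY Peer.ip
"""

NATURAL_QUERY = """
SELECT peer_ip, COUNT(DISTINCT torrent_infohash)
FROM TorrentAtPeer
GROUP BY peer_ip ORDER BY peer_ip
"""


def counts_both_ways(sightings):
    """sightings: list of (ip, port, infohash) tuples (hypothetical data)."""
    # Surrogate-key schema: the link table only knows the Peer id.
    sur = sqlite3.connect(":memory:")
    sur.executescript("""
        CREATE TABLE Peer (id INTEGER PRIMARY KEY, ip TEXT, port INTEGER,
                           UNIQUE (ip, port));
        CREATE TABLE TorrentAtPeer (peer INTEGER REFERENCES Peer (id),
                                    torrent TEXT);
    """)
    for ip, port, infohash in sightings:
        sur.execute("INSERT OR IGNORE INTO Peer (ip, port) VALUES (?, ?)", (ip, port))
        (peer_id,) = sur.execute(
            "SELECT id FROM Peer WHERE ip = ? AND port = ?", (ip, port)
        ).fetchone()
        sur.execute("INSERT INTO TorrentAtPeer VALUES (?, ?)", (peer_id, infohash))

    # Natural-key schema: the link table carries ip and port itself.
    nat = sqlite3.connect(":memory:")
    nat.execute(
        "CREATE TABLE TorrentAtPeer (peer_ip TEXT, peer_port INTEGER,"
        " torrent_infohash TEXT)"
    )
    nat.executemany("INSERT INTO TorrentAtPeer VALUES (?, ?, ?)", sightings)

    return sur.execute(SURROGATE_QUERY).fetchall(), nat.execute(NATURAL_QUERY).fetchall()


if __name__ == "__main__":
    data = [("10.0.0.1", 6881, "a"), ("10.0.0.1", 6882, "b"), ("10.0.0.2", 6881, "a")]
    print(counts_both_ways(data))
```

Both queries group by ip alone, as in the answer. Two ports on the same ip therefore count as a single row. To count per (ip, port), add peer_port to the natural-key query's SELECT and GROUP BY.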
EDIT
Well, the originally posted schema was not the real one: the Peer table now has a port field. I would suggest using (ip, port) as the primary key here and still dropping the id column. This also means the linking table needs a multi-column foreign key (peer_ip, peer_port). I have adjusted the answer accordingly.
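Here is a minimal sketch of that multi-column foreign key. sqlite3 again stands in for MySQL/InnoDB, which enforces foreign keys without any pragma. Only Peer and TorrentAtPeer are created, and the helper names and sample rows are hypothetical. A sighting is accepted only if its (peer_ip, peer_port) pair exists in Peer, so the same ip with a different port is rejected.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: Peer with one known peer, plus the linking table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in sqlite
    conn.executescript("""
        CREATE TABLE Peer (
            ip   TEXT    NOT NULL,
            port INTEGER NOT NULL,
            PRIMARY KEY (ip, port)
        );
        CREATE TABLE TorrentAtPeer (
            peer_ip          TEXT    NOT NULL,
            peer_port        INTEGER NOT NULL,
            torrent_infohash TEXT    NOT NULL,
            time             INTEGER NOT NULL,
            -- composite FK: both columns must match one Peer row
            FOREIGN KEY (peer_ip, peer_port) REFERENCES Peer (ip, port)
        );
    """)
    conn.execute("INSERT INTO Peer VALUES ('10.0.0.1', 6881)")
    return conn


def try_link(conn: sqlite3.Connection, ip: str, port: int) -> bool:
    """Return True if the sighting is stored, False if the FK rejects it."""
    try:
        conn.execute(
            "INSERT INTO TorrentAtPeer VALUES (?, ?, ?, 0)", (ip, port, "aa" * 20)
        )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    print(try_link(conn, "10.0.0.1", 6881), try_link(conn, "10.0.0.1", 6882))
```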