Merge sort in Python

栀梦 2021-02-06 19:26

Basically, I have a bunch of files containing domains. I've sorted each individual file by its TLD using .sort(key=func_that_returns_tld)

Now that I've done that

3 answers

梦毁少年i (OP)

2021-02-06 20:22

If your files are not very large, then simply read them all into memory (as S. Lott suggests). That would definitely be simplest.
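A minimal sketch of the in-memory approach, assuming the files fit comfortably in RAM. The names `tld` and `merge_in_memory` are hypothetical, and `tld` naively takes the last dot-separated label, so it would not handle suffixes such as co.uk; substitute your own key function.

```python
def tld(domain):
    # Hypothetical key function: the last dot-separated label, e.g. "com".
    return domain.rsplit('.', 1)[-1]


def merge_in_memory(paths, key=tld):
    """Read every line from every file, then sort the combined list once."""
    lines = []
    for path in paths:
        with open(path) as fh:
            # Strip newlines and skip blank lines.
            lines.extend(line.strip() for line in fh if line.strip())
    # list.sort is stable, so entries with equal keys keep their read order.
    lines.sort(key=key)
    return lines
```

Because everything is sorted again at the end, this version does not depend on the individual files already being sorted.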

However, you mention collation creates one "massive" file. If it's too massive to fit in memory, then perhaps use heapq.merge. It may be a little harder to set up, but it has the advantage of not requiring that all the iterables be pulled into memory at once.

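import heapq
from contextlib import ExitStack

class Domain(object):
    """Wraps a domain string so heapq.merge can compare entries by key."""
    def __init__(self, domain):
        self.domain = domain
    @property
    def tld(self):
        # Put your function for calculating TLD here.
        # This placeholder returns the first label (e.g. "google"),
        # which is the key the sample output below is ordered by.
        return self.domain.split('.', 1)[0]
    def __lt__(self, other):
        # Must be a strict comparison: equal keys are not "less than".
        return self.tld < other.tld
    def __str__(self):
        return self.domain

def domains(fh):
    # Lazily yield one Domain per line, so only one line per file is held.
    # (Subclassing the built-in file type only works on Python 2.)
    for line in fh:
        yield Domain(line.strip())

filenames = ('data1.txt', 'data2.txt')
# ExitStack stands in for contextlib.nested, which Python 3 no longer has.
with ExitStack() as stack:
    fhs = [stack.enter_context(open(filename, 'r')) for filename in filenames]
    for elt in heapq.merge(*(domains(fh) for fh in fhs)):
        print(elt)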

with data1.txt:

google.com
stackoverflow.com
yahoo.com

and data2.txt:

standards.freedesktop.org
www.imagemagick.org

yields:

google.com
stackoverflow.com
standards.freedesktop.org
www.imagemagick.org
yahoo.com
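On Python 3.5 and later, heapq.merge also accepts a key argument, so the wrapper class is not strictly needed. The following is a sketch under that assumption, not a drop-in replacement: `tld`, `stripped_lines` and `merge_sorted_files` are hypothetical names, and `tld` naively takes the last dot-separated label. Each input file must already be sorted by the same key.

```python
import heapq
from contextlib import ExitStack


def tld(domain):
    # Hypothetical key function: the last dot-separated label, e.g. "org".
    return domain.rsplit('.', 1)[-1]


def stripped_lines(fh):
    # Yield non-empty lines without the trailing newline.
    for line in fh:
        line = line.strip()
        if line:
            yield line


def merge_sorted_files(paths, key=tld):
    """Lazily merge files that are each already sorted by `key`."""
    with ExitStack() as stack:
        # The files stay open until the merge is exhausted or closed.
        streams = [stripped_lines(stack.enter_context(open(p))) for p in paths]
        # heapq.merge holds only one pending line per input stream.
        yield from heapq.merge(*streams, key=key)


if __name__ == '__main__':
    for domain in merge_sorted_files(['data1.txt', 'data2.txt']):
        print(domain)
```

With the two sample files above and this key, all the .com entries come out before the .org entries, because the key is the last label rather than the first.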
