Most efficient and most Pythonic way to deep copy class instance and perform additional manipulation

Question

Say I have a Python class instance, here's a really simple example:

class Book:
    def __init__(self, genre):
        self.genre = genre
        self.author = self.title = None

    def assign_author(self, author_name):
        self.author = author_name

    def assign_title(self, title_name):
        self.title = title_name

western_book_template = Book("Western")

For various reasons, I want each class instance to be able to generate a deep copy of itself, and then perform some additional manipulation on the new object.

Here's what I'd like to know:

1. What would be the most efficient way to do that?
2. What about the most Pythonic one (best practice)?
3. Does the amount of data stored in the instance, as well as the amount of computation performed to get to that data, influence the previous answers? If so, how?

I know I can use western_book_1 = copy.deepcopy(western_book_template), and then perform all additional manipulation I want on western_book_1 directly, but wouldn't it be better to do something like:

class Book:
    def __init__(self, genre):
        self.genre = genre
        self.author = self.title = None

    def assign_author(self, author_name):
        self.author = author_name

    def assign_title(self, title_name):
        self.title = title_name

    def generate_specific_book(self, author_name, title_name):
        specific_book = Book(self.genre)
        specific_book.assign_author(author_name)
        specific_book.assign_title(title_name)
        return specific_book

western_book_template = Book("Western")
western_book_1 = western_book_template.generate_specific_book("John E. Williams", "Butcher's Crossing")

This would give me more control over what gets copied and what doesn't, as well as allowing me to perform all additional manipulation in one place. I am assuming the above is inefficient in cases where a lot of computation was needed to get to the data stored in the instance.

Answer 1:

You may be looking for the cls.__new__() method. It lets you control instance creation: you can call it to create a blank instance without running __init__, then update the __dict__ of that instance with the __dict__ of your original instance.

Here's an example:

class A:
    def __init__(self, a):
        self.a = a

    def newA(self):
        b = type(self).__new__(type(self))
        b.__dict__.update(self.__dict__)
        return b


x = A(42)
x.a # 42
y = x.newA()
y.a # 42

This is very different from a deepcopy, but I'm suggesting it because of your example and the way you worded your question. No copying of the underlying data is actually being done: the new instance holds references to the same objects as the old instance, which makes it susceptible to side effects. I offer it because, in the example you gave, the new instance was built from self.genre, which suggests that you are less concerned about whether the data in the new instance is a copy than about the computational cost of generating the new instance.
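To make the side-effect caveat concrete, here is a small sketch. The mutable `tags` attribute is hypothetical and added only for illustration, and `copy.deepcopy` is included purely for comparison.

```python
import copy


class A:
    def __init__(self, a, tags):
        self.a = a
        self.tags = tags  # hypothetical mutable attribute, for illustration only

    def newA(self):
        # Same technique as above: skip __init__, then share attribute references.
        b = type(self).__new__(type(self))
        b.__dict__.update(self.__dict__)
        return b


x = A(42, ["draft"])
y = x.newA()          # new instance whose attributes point at the same objects
z = copy.deepcopy(x)  # fully independent copy, for comparison

y.tags.append("reviewed")  # mutates the list that x and y both reference
y.a = 7                    # rebinding a name only touches y's own __dict__

print(x.tags)  # ['draft', 'reviewed']
print(z.tags)  # ['draft']
print(x.a)     # 42
```

Rebinding an attribute on the new instance is safe, but mutating a shared object in place shows up in both instances.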

Answer 2:

1. Usage patterns matter a lot. When I'm just starting to think about efficiency for deep copies of large objects in a project, I rather enjoy lenses. The idea is that instead of interacting with a Book directly, you interact with something that acts like a Book and internally keeps track of immutable deltas and changesets when you modify it.
2. This is up for debate, but copy.deepcopy from the standard library is hard to beat in terms of Pythonic code.
3. Absolutely, no question.

3a. As a somewhat pathological example, if your instance only stores a few bytes, then copying the entire instance isn't any more expensive than any modification you were about to make to it. Your deepcopy strategy would be to just copy it and do what you need to do; the Pythonic solution would also be the "best" solution by most metrics.

3b. If you're storing 10**10 bytes per instance and only modifying 5, then a full copy is probably incredibly expensive compared to the work you're doing. Lenses can look really attractive (since they'll just hold a reference to the original object and the 5-byte change).

3c. You mentioned data access times as a potential factor. I'll use a long linked list as an example here: both a full deepcopy and lenses will perform poorly, since any naive implementation needs to walk the entire list. If you knew you needed to deepcopy linked lists, though, you could come up with some kind of immutable tree structure bisecting the list and use that to adopt a quasi-lenses strategy, only swapping out the leaves/branches of the tree that you change.

3*. Elaborating on the examples a little more, you're choosing a representation for a collection of objects in your code that can be conceptualized as a chain of deep copies and modifications. There are some classic approaches (copy all the data and make changes to the copies, or leave the data alone and record all the changes) that work on arbitrary objects and have well-understood performance characteristics. But if you have additional information about the problem (e.g., these are all big arrays and only one coordinate changes), you can use that to design a representation that best fits your problem and, more importantly, your desired performance trade-offs.
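The two classic approaches mentioned above can be sketched side by side. This is only a minimal illustration under stated assumptions, not a real lens library: `copy_and_modify`, `BookView`, and `with_changes` are hypothetical names, and the "record the changes" side is built on `collections.ChainMap` from the standard library.

```python
import copy
from collections import ChainMap


class Book:
    def __init__(self, genre):
        self.genre = genre
        self.author = self.title = None


# Approach 1: copy all the data, then change the copy.
def copy_and_modify(book, **changes):
    new_book = copy.deepcopy(book)
    for name, value in changes.items():
        setattr(new_book, name, value)
    return new_book


# Approach 2: leave the data alone and record only the changes.
class BookView:
    """Hypothetical read-only view: the base object's attributes plus a delta."""

    def __init__(self, base, **changes):
        # Lookups hit `changes` first, then fall back to the base's attributes.
        # vars(base) is a live reference, so nothing is copied here.
        self._attrs = ChainMap(changes, vars(base))

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for the Book attributes.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._attrs[name]
        except KeyError:
            raise AttributeError(name) from None

    def with_changes(self, **changes):
        # Stack another delta on top without touching the earlier layers.
        view = BookView.__new__(BookView)
        view._attrs = self._attrs.new_child(changes)
        return view


template = Book("Western")
book_1 = copy_and_modify(template, author="John E. Williams", title="Butcher's Crossing")
view_1 = BookView(template, author="John E. Williams", title="Butcher's Crossing")

print(book_1.genre, book_1.title)  # Western Butcher's Crossing
print(view_1.genre, view_1.title)  # Western Butcher's Crossing
print(template.author)             # None
```

The view costs only as much as the delta, but it shares state with the template, so later changes to the template show through. The deep copy pays the full copying cost up front and is independent afterwards.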

Source: https://stackoverflow.com/questions/62213488/most-efficient-and-most-pythonic-way-to-deep-copy-class-instance-and-perform-add

Tags

python

oop

deep-copy