Python, NumPy, and memory efficiency (pass by reference vs. value)

陌清茗 2021-02-05 13:26

I've recently been using Python more and more in place of C/C++ because it cuts my coding time by a factor of a few. At the same time, when I'm processing large amounts of

2 answers
  • 2021-02-05 13:36

    So I'm going to have to quote EOL on this, because I think his answer is very relevant:

    3) The last point is related to the question title: "passing by value" and "passing by reference" are not concepts that are relevant in Python. The relevant concepts are instead "mutable object" and "immutable object". Lists are mutable, while numbers are not, which explains what you observe. Also, your Person1 and bar1 objects are mutable (that's why you can change the person's age). You can find more information about these notions in a text tutorial and a video tutorial. Wikipedia also has some (more technical) information. An example illustrates the difference of behavior between mutable and immutable - answer by EOL
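    The mutable/immutable distinction in the quote can be sketched as follows. This is only an illustration: the function names append_item and increment are hypothetical and do not come from EOL's answer.

```python
def append_item(items):
    # Mutates the list object the caller passed in; no copy is made.
    items.append(99)


def increment(n):
    # ints are immutable: "n += 1" rebinds the local name n to a new int,
    # so the caller's variable is left untouched.
    n += 1
    return n


numbers = [1, 2, 3]
count = 10

append_item(numbers)
result = increment(count)

print(numbers)  # [1, 2, 3, 99]
print(count)    # 10
print(result)   # 11
```

    Both calls hand the function a reference to the caller's object; the difference is only whether that object can be changed in place.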

    In general I've found that NumPy/SciPy follow these conventions; more importantly, the docs tell you explicitly what is happening.

    For example, np.random.shuffle takes an input array, shuffles it in place, and returns None, while np.random.permutation returns a new array. The docs state clearly which one returns a value and which does not.
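    A small sketch of that difference, assuming a reasonably recent NumPy (np.shares_memory is used only as a check and is not mentioned in the answer; the seed is there only to make the run repeatable):

```python
import numpy as np

np.random.seed(0)  # only so the run is repeatable

deck = np.arange(10)
returned = np.random.shuffle(deck)   # reorders deck in place
print(returned)                      # None

original = np.arange(10)
shuffled = np.random.permutation(original)  # returns a new, shuffled array

# The input to permutation is left untouched and shares no memory
# with the result.
print(np.array_equal(original, np.arange(10)))  # True
print(np.shares_memory(original, shuffled))     # False
```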

    Similarly, arrays have pass-by-reference semantics, and in general I find NumPy/SciPy to be very efficient.

    I think it's fair to say that if it's faster to use pass-by-reference, they will. As long as you use the functions the way the docs say, you shouldn't have significant problems with regard to speed.
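    To make the point concrete, here is a minimal sketch of what passing an array to your own function does. The helper names scale_in_place and scale_copy are hypothetical, chosen only for this illustration.

```python
import numpy as np


def scale_in_place(arr, factor):
    # Writes into the caller's buffer; no new array is allocated.
    arr *= factor


def scale_copy(arr, factor):
    # Allocates and returns a new array; the caller's data is unchanged.
    return arr * factor


data = np.ones(5)

scaled = scale_copy(data, 2.0)
print(data.tolist())    # [1.0, 1.0, 1.0, 1.0, 1.0]
print(scaled.tolist())  # [2.0, 2.0, 2.0, 2.0, 2.0]

scale_in_place(data, 3.0)
print(data.tolist())    # [3.0, 3.0, 3.0, 3.0, 3.0]
```

    For large arrays the in-place form avoids a second allocation of the same size, which is usually what matters for memory use.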


    Is there a specific type you are asking about?

  • 2021-02-05 13:37

    Objects in Python (as in most mainstream languages) are passed by reference.

    Take NumPy, for example: "new" arrays created by slicing existing ones are only views of the original. For example:

    >>> import numpy as np
    >>> vec_1 = np.array(range(10))
    >>> vec_1
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> vec_2 = vec_1[3:]  # vec_2 is vec_1 from index 3 to the end
    >>> vec_2
    array([3, 4, 5, 6, 7, 8, 9])
    >>> vec_2[3] = 10000  # writes through to vec_1[6]
    >>> vec_2
    array([3, 4, 5, 10000, 7, 8, 9])
    >>> vec_1
    array([0, 1, 2, 3, 4, 5, 10000, 7, 8, 9])
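    Not every indexing operation gives a view, though. As a sketch (the variable names are mine, and np.shares_memory is used only as a check): basic slices are views, while indexing with a list of integers or calling .copy() produces an independent array.

```python
import numpy as np

a = np.arange(10)

view = a[3:]            # basic slice: a view on a's buffer
picked = a[[3, 4, 5]]   # integer-array ("fancy") indexing: a new array
dup = a[3:].copy()      # explicit copy

print(view.base is a)               # True
print(np.shares_memory(a, picked))  # False
print(np.shares_memory(a, dup))     # False

view[0] = -1            # writes through to a[3]
print(a[3])             # -1
print(picked[0])        # 3
print(dup[0])           # 3
```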
    

    NumPy has a handy function to help with questions like this, np.may_share_memory(a, b):

    >>> np.may_share_memory(vec_1, vec_2)
    True
    

    Just be careful: by default the function only compares memory bounds, so it can return false positives (although I have never seen one).
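    A false positive is easy to provoke with two interleaved slices. The sketch below also uses np.shares_memory, which is not mentioned above; it performs an exact (and potentially slower) check.

```python
import numpy as np

a = np.arange(10)
evens = a[::2]   # elements 0, 2, 4, 6, 8
odds = a[1::2]   # elements 1, 3, 5, 7, 9

# The two views cover overlapping address ranges but have no element
# in common, so the bounds-only check reports a possible overlap.
print(np.may_share_memory(evens, odds))  # True
print(np.shares_memory(evens, odds))     # False

# Against the parent array both checks agree.
print(np.shares_memory(evens, a))        # True
```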

    At SciPy 2013 there was a tutorial on NumPy (http://conference.scipy.org/scipy2013/tutorial_detail.php?id=100). Near the end, the presenter talks a little about how NumPy handles memory. It is worth watching.

    As a rule of thumb, objects are almost never passed by value by default, even when they are encapsulated in another object. Another example, in which a list makes a round trip through an object:

    class SomeClass:
    
        def __init__(self, a_list):
            # Stores a reference to the caller's list, not a copy.
            self.inside_list = a_list
    
        def get_list(self):
            # Returns that same reference.
            return self.inside_list
    
    >>> original_list = list(range(5))
    >>> original_list
    [0, 1, 2, 3, 4]
    >>> my_object = SomeClass(original_list)
    >>> output_list = my_object.get_list()
    >>> output_list
    [0, 1, 2, 3, 4]
    >>> output_list[4] = 10000
    >>> output_list
    [0, 1, 2, 3, 10000]
    >>> my_object.inside_list
    [0, 1, 2, 3, 10000]
    >>> original_list
    [0, 1, 2, 3, 10000]
    

    Creepy, huh? Assigning with "=" or returning an object from a function never copies it; you always get another reference to the same object, or to a portion of it. Objects are only duplicated when you explicitly ask for a copy, for example with some_dict.copy() or some_list[:]. (For a NumPy array, array[:] is still a view; use array.copy() there.) For example:

    >>> original_list = list(range(5))
    >>> original_list
    [0, 1, 2, 3, 4]
    >>> my_object = SomeClass(original_list[:])  # pass a copy of the list
    >>> output_list = my_object.get_list()
    >>> output_list
    [0, 1, 2, 3, 4]
    >>> output_list[4] = 10000
    >>> output_list
    [0, 1, 2, 3, 10000]
    >>> my_object.inside_list
    [0, 1, 2, 3, 10000]
    >>> original_list
    [0, 1, 2, 3, 4]
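    One caveat worth sketching: a slice copy of a list is shallow, and a full slice of a NumPy array is not a copy at all. The example below uses the standard-library copy module, which does not appear above; the variable names are only illustrative.

```python
import copy

import numpy as np

nested = [[1, 2], [3, 4]]
alias = nested                # same object, no copy
shallow = nested[:]           # new outer list, but the same inner lists
deep = copy.deepcopy(nested)  # fully independent structure

nested[0][0] = 99
print(alias[0][0])    # 99
print(shallow[0][0])  # 99, because the inner list is shared
print(deep[0][0])     # 1

arr = np.arange(5)
arr_view = arr[:]      # for NumPy arrays a full slice is still a view
arr_copy = arr.copy()  # explicit, independent copy

arr[0] = 99
print(arr_view[0])  # 99
print(arr_copy[0])  # 0
```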
    

    Got it?
