AKA - What's this obsession with pointers?
Having only really used modern, object-oriented languages like ActionScript, Java and C#, I don't really understand the
I'm currently waist-deep in designing some high-level enterprise software in which chunks of data (stored in an SQL database, in this case) are referenced by one or more other entities. If a chunk of data remains when no more entities reference it, we're wasting storage. If a reference points to data that's not present, that's a big problem too.
There's a strong analogy to be made between our issues, and those of memory management in a language that uses pointers. It's tremendously useful to be able to talk to my colleagues in terms of that analogy. Not deleting unreferenced data is a "memory leak". A reference that goes nowhere is a "dangling pointer". We can choose explicit "frees", or we can implement "garbage collection" using "reference counting".
So here, understanding low-level memory management is helping design high-level applications.
In Java you're using pointers all the time. Most variables are pointers to objects - which is why:
StringBuffer x = new StringBuffer("Hello");
StringBuffer y = x;
x.append(" boys");
System.out.println(y);
... prints "Hello boys" and not "Hello".
The only difference in C is that it's common to add and subtract from pointers - and if you get the logic wrong you can end up messing with data you shouldn't be touching.
You need them if you want to create "objects" at runtime without pre-allocating memory on the stack.
I am always distressed by the focus on such things as pointers or references in high-level languages. It's really useful to think at a higher level of abstraction in terms of the behavior of objects (or even just functions) as opposed to thinking in terms of "let me see, if I send the address of this thing to there, then that thing will return me a pointer to something else".
Consider even a simple swap function. If you have
void swap(int & a, int & b)
or
procedure Swap(var a, b : integer)
then interpret these to mean that the values can be changed. The fact that this is being implemented by passing the addresses of the variables is just a distraction from the purpose.
Same with objects --- don't think of object identifiers as pointers or references to "stuff". Instead, just think of them as, well, OBJECTS, to which you can send messages. Even in primitive languages like C++, you can go a lot further a lot faster by thinking (and writing) at as high a level as possible.
Write more than 2 lines of C or C++ and you'll find out.
They are "pointers" to the memory location of a variable. Using one is somewhat like passing a variable by reference.
Since you have been programming in object-oriented languages, let me put it this way.
Object A instantiates Object B and passes it as a method parameter to Object C. Object C modifies some values in Object B. When you are back in Object A's code, you can see the changed values in Object B. Why is this so?
Because you passed a reference to Object B to Object C rather than making another copy of Object B. So Object A and Object C both hold references to the same Object B in memory, and changes made in one place can be seen in the other. This is loosely called By Reference (strictly speaking, languages like Java pass the reference itself by value, but the shared object is what matters here).
Now, if you use primitive types instead, like int or float, and pass them as method parameters, changes made in Object C cannot be seen by Object A, because Object A passed only a copy of its variable's value rather than a reference to the variable. This is called By Value.
You probably already knew that.
Coming back to the C language: Function A passes some variables to Function B. By default these function parameters are copies, passed By Value. For Function B to manipulate the variable belonging to Function A, Function A must pass a pointer to that variable, which makes it effectively a pass By Reference.
"Hey, here's the memory address of my integer variable. Put the new value at that address and I will pick it up later."
Note that the concept is similar but not 100% analogous. Pointers can do a lot more than just pass "by reference". Pointers allow functions to write whatever value is required to arbitrary locations in memory. Pointers can also point to executable code (function pointers), so the logic to run can be chosen dynamically, not just the data. Pointers may even point to other pointers (double pointers). That is powerful, but it also makes it easy to introduce hard-to-detect bugs and security vulnerabilities.
References in C++ are fundamentally different from references in Java or .NET languages; .NET languages have special types called "byrefs" which behave much like C++ "references".
A C++ reference or .NET byref (I'll use the latter term, to distinguish from .NET references) is a special type which doesn't hold a variable, but rather holds information sufficient to identify a variable (or something that can behave as one, such as an array slot) held elsewhere. Byrefs are generally only used as function parameters/arguments, and are intended to be ephemeral. Code which passes a byref to a function guarantees that the variable which is identified thereby will exist at least until that function returns, and functions generally guarantee not to keep any copy of a byref after they return (note that in C++ the latter restriction is not enforced). Thus, byrefs cannot outlive the variables identified thereby.
In Java and .NET languages, a reference is a type that identifies a heap object; each heap object has an associated class, and code in the heap object's class can access data stored in the object. Heap objects may grant outside code limited or full access to the data stored therein, and/or allow outside code to call certain methods within their class. Using a reference to call a method of its class will cause that reference to be made available to that method, which may then use it to access data (even private data) within the heap object.
What makes references special in Java and .NET languages is that they maintain, as an absolute invariant, that every non-null reference will continue to identify the same heap object as long as that reference exists. Once no reference to a heap object exists anywhere in the universe, the heap object will simply cease to exist, but there is no way a heap object can cease to exist while any reference to it exists, nor is there any way for a "normal" reference to a heap object to spontaneously become anything other than a reference to that object. Both Java and .NET do have special "weak reference" types, but even they uphold the invariant. If no non-weak references to an object exist anywhere in the universe, then any existing weak references will be invalidated; once that occurs, there won't be any references to the object and it can thus be invalidated.
Pointers, like both C++ references and Java/.NET references, identify objects, but unlike the aforementioned types of references they can outlive the objects they identify. If the object identified by a pointer ceases to exist but the pointer itself does not, any attempt to use the pointer will result in Undefined Behavior. If a pointer isn't known either to be null or to identify an object that presently exists, there's no standard-defined way to do anything with that pointer other than overwrite it with something else. It's perfectly legitimate for a pointer to continue to exist after the object identified thereby has ceased to do so, provided that nothing ever uses the pointer, but it's necessary that something outside the pointer indicate whether or not it's safe to use because there's no way to ask the pointer itself.
The key difference between pointers and references (of either type) is that references can always be asked if they are valid (they'll either be valid or identifiable as null), and if observed to be valid they will remain so as long as they exist. Pointers cannot be asked if they are valid, and the system will do nothing to ensure that pointers don't become invalid, nor allow pointers that become invalid to be recognized as such.