Let's assume that we have a data frame x which contains the columns job and income. Referring to the data in the frame normally requi
If you execute attach(data) multiple times, e.g. 5 times, you can see (with the help of search()) that your data has been attached 5 times on the search path. So if you detach it once (detach(data)), data will still be present 4 times on the search path. Hence, with() and within() are better options: they evaluate your code in a local environment built from that object, so you can use its columns without causing any confusion.
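To make the repeated-attach problem concrete, here is a minimal sketch. The data frame name dat and its values are made up for illustration (the columns job and income follow the description above); the counts simply tally how often "dat" appears in search().

```r
# Hypothetical data frame with the columns mentioned above
dat <- data.frame(job = c("clerk", "nurse", "pilot"),
                  income = c(30, 45, 90))

attach(dat)
attach(dat)  # second attach: R prints a message about masked objects

# Count how many times dat sits on the search path
n_attached <- sum(search() == "dat")

detach(dat)  # removes only one of the two copies
n_after_one_detach <- sum(search() == "dat")

detach(dat)  # clean up the remaining copy

# with() evaluates the expression using dat's columns
# without putting anything on the search path
mean_income <- with(dat, mean(income))
n_after_with <- sum(search() == "dat")
```

After the first detach(dat) one copy is still attached, which is exactly the kind of leftover state that with() avoids.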
When to use it:
I use attach() when I want the environment you get in most stats packages (e.g. Stata, SPSS) of working with one rectangular dataset at a time.
When not to use it:
However, it gets very messy, and code quickly becomes unreadable, when you have several different datasets. This is particularly true if you are in effect using R as a crude relational database, where different rectangles of data, all relevant to the problem at hand and perhaps matched against each other in various ways, have variables with the same name.
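A small sketch of the same-name problem, using two made-up data frames (customers and orders are hypothetical names, not from the question) that both have a column called id. Once both are attached, a bare id silently resolves to whichever data frame was attached last.

```r
# Two hypothetical "rectangles" that share the column name id
customers <- data.frame(id = c(1, 2, 3), region = c("N", "S", "E"))
orders <- data.frame(id = c(10, 11), amount = c(5.5, 7.25))

attach(customers)
attach(orders)  # R prints a message: id from customers is now masked

# A bare id now means orders$id, the most recently attached copy
id_seen <- id

detach(orders)
detach(customers)

# Explicit references stay unambiguous no matter what is attached
customer_ids <- customers$id
order_ids <- orders$id
```

Nothing in the line that uses id tells the reader which table it came from, which is why this style gets hard to follow once several datasets are in play.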
The with() function, or the data= argument to many functions, are excellent alternatives in many instances where attach() is tempting.
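As a rough illustration of both alternatives, here is a sketch on the built-in cars data set (columns speed and dist). The variable names on the left-hand side are my own.

```r
# with(): refer to columns by bare name, scoped to this one expression
mean_dist_with <- with(cars, mean(dist))

# The same value with explicit $ indexing, for comparison
mean_dist_dollar <- mean(cars$dist)

# data=: modelling functions such as lm() look up formula variables in cars
fit <- lm(dist ~ speed, data = cars)
coef_names <- names(coef(fit))
```

In both cases the source of dist and speed is visible on the line that uses them, and nothing is left on the search path afterwards.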
Another reason not to use attach: it gives access to the values of a data frame's columns for reading only, and as they were when the data frame was attached. It is not a shorthand for the current value of that column. Two examples:
> head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
> attach(cars)
> # convert stopping distance to meters
> dist <- 0.3048 * dist
> # convert speed to meters per second
> speed <- 0.44707 * speed
> # compute a meaningless time
> time <- dist / speed
> # check our work
> head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
No changes were made to the cars data set even though dist and speed were assigned to. The assignments created new variables named dist and speed in the global environment instead.
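One way to check where the assignment actually went, sketched on the built-in cars data. The helper variable names (in_global and so on) are mine; datasets::cars is used only as a reference copy of the original data.

```r
attach(cars)

# Reads dist from the attached copy, but the assignment creates
# a new variable called dist in the global environment
dist <- 0.3048 * dist

# Is there now a dist defined directly in the global environment?
in_global <- exists("dist", envir = globalenv(), inherits = FALSE)

# The data frame itself still holds the original values
cars_unchanged <- identical(cars$dist, datasets::cars$dist)

# The global dist and the column cars$dist have diverged
global_differs <- !identical(dist, cars$dist)

# Clean up so later code is not affected
rm(dist)
detach(cars)
```

From this point on, a bare dist would mean the global variable, which masks the attached column.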
If explicitly assigned back to the data set...
> head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
> attach(cars)
> # convert stopping distance to meters
> cars$dist <- 0.3048 * dist
> # convert speed to meters per second
> cars$speed <- 0.44707 * speed
> # compute a meaningless time
> cars$time <- dist / speed
> # compute meaningless time being explicit about using values in cars
> cars$time2 <- cars$dist / cars$speed
> # check our work
> head(cars)
    speed   dist      time     time2
1 1.78828 0.6096 0.5000000 0.3408862
2 1.78828 3.0480 2.5000000 1.7044311
3 3.12949 1.2192 0.5714286 0.3895842
4 3.12949 6.7056 3.1428571 2.1427133
5 3.57656 4.8768 2.0000000 1.3635449
6 4.02363 3.0480 1.1111111 0.7575249
the dist and speed that are referenced in computing time are the original (untransformed) values: the values of cars$dist and cars$speed when cars was attached.
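For comparison, here is a sketch of the same conversion done with within() instead of attach(). This is one possible alternative, not the only one. The name cars_metric is made up, and the conversion factors are copied from the transcript above.

```r
# within() evaluates the block using cars's columns and returns a modified copy
cars_metric <- within(cars, {
  dist <- 0.3048 * dist     # stopping distance, factor from the transcript
  speed <- 0.44707 * speed  # speed, factor from the transcript
  time <- dist / speed      # uses the converted values assigned just above
})

# Same quantity computed explicitly from the untouched original columns
expected_time <- (0.3048 * cars$dist) / (0.44707 * cars$speed)
```

Inside the block, each assignment is visible to the lines after it, so time is computed from the converted values (matching the time2 column above), and the original cars is left untouched.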
I think there's nothing wrong with using attach. I myself don't use it (then again, I love animals, but don't keep any, either). When I think of attach, I think long term. Sure, when I'm working with a script I know it inside and out. But when I go back to the script in a week, a month or a year, I find the overhead of searching for where a certain variable comes from just too expensive. A lot of methods have a data argument, which makes calling variables pretty easy (e.g. lm(x ~ y + z, data = mydata)). If not, I find that with works to my satisfaction.
In short, in my book, attach is fine for short, quick data exploration, but for developing scripts that I or others might want to use, I try to keep my code as readable (and transferable) as possible.