R - How to plot correct pie charts in haploNet haplotype networks {pegas} {ape} {adegenet}

Question

I am using haploNet() from the pegas package to plot a haplotype network, following a script I found on the internet that is based on the woodmouse example. However, I think something is wrong. The code I used is:

library(pegas) # also attaches ape, which provides read.dna()
x <- read.dna(file="Masto.fasta",format="fasta")
h <- haplotype(x)
net <- haploNet(h)
plot(net)

plot(net, size = attr(net, "freq"), fast = TRUE)
plot(net, size = attr(net, "freq"))
plot(net, size=attr(net, "freq"), scale.ratio = 2, cex = 0.8)

table(rownames(x))

ind.hap<-with(
    stack(setNames(attr(h, "index"), rownames(h))), 
    table(hap=ind, pop=rownames(x)[values])
)
ind.hap 

plot(net, size=attr(net, "freq"), scale.ratio = 2, cex = 0.8, pie=ind.hap)
legend(50,50, colnames(ind.hap), col=rainbow(ncol(ind.hap)), pch=20)

legend(x=7,y=10,c("Baeti ero","Felege weyni","Golgole naele","Hagare selam","Ruba feleg","Ziway"),c("red","yellow","green","turquoise","blue","magenta"))

However, when printing ind.hap, you can see that some rows are not in the right place:

      pop
hap    Baetiero ETH022 ETH742 Felegeweyni Golgolenaele Rubafeleg
  I           0      0      1           0            0         0
  II          0      1      0           0            0         0
  III         1      0      0           1            0         1
  IV          2      0      0           0            0         3
  IX          0      0      0           1            0         0
  V           4      0      0           0            2         0
  VI          4      0      0           1            0         4
  VII         2      0      0           1            0         0
  VIII        0      0      0           1            0         1
  X           3      0      0           0            1         0
  XI          0      0      0           0            1         1
  XII         0      0      0           1            0         0
  XIII        0      0      0           0            0         1

You can see that row IX is out of place. This would not be much of a problem, except that the program uses row 9 of the table to draw the pie for haplotype IX, and row 9 holds the data for VIII. This is the result: (I could not insert the image since my reputation is below 10; you can get the image by running the whole script anyway)

You can see that for V through IX the pies are not as they should be (these are the shifted rows). For instance, IX occurs in only one population, but its pie chart shows two populations (50% each), because it is drawn from the VIII data. The rows are sorted alphabetically rather than in numerical order, and since this seems inherent to the package, I don't know what to do. I'm far from a master in R, so please try not to be too abstract and provide code instead.

If there is someone who knows this package very well, please explain also why there are these weird extra lines behind the real charts (these with the numbers on them), as they were not visible in the woodmouse example (maybe that's because of what's wrong too?)

Thanks in advance

Answer 1:

I've struggled with the same issue, but believe I came up with a solution.

The problem is that the step making the table of haplotype counts per "population" orders the haplotypes alphabetically. So, for example, haplotype "IX" comes before "V". On the other hand, the function haplotype() sorts the haplotypes by their "numerical" order. And this is what creates a discrepancy when plotting.
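The ordering mismatch can be reproduced in base R without any sequence data. This is only a minimal sketch: it assumes the labels are the Roman numerals I to IX, as in the tables in this thread, and relies on the fact that factor() sorts the levels of a character vector as text by default.

```r
# Haplotype labels in numerical order (I, II, ..., IX).
labs <- as.character(as.roman(1:9))

# factor() on a character vector sorts its levels alphabetically by default,
# which is the row order seen in the count tables in this thread.
alpha <- levels(factor(labs))
print(alpha)

# Compare position by position: from row 5 onwards the two orders disagree,
# so row 5 of the table ("IX") would be drawn on the 5th haplotype ("V").
mismatch <- data.frame(network = labs, table_row = alpha)
print(mismatch[mismatch$network != mismatch$table_row, ])
```

Rows 1 to 4 agree in both orders, which is why only the haplotypes from V onwards get the wrong pies.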

This can be solved by sorting the haplotype object by its labels (what = "labels"), as explained in the ?haplotype help page.

I'll use the woodmouse example data to exemplify:

# Draw 100 sequences (with replacement) from the first 9 woodmouse sequences
library(pegas)
data(woodmouse)
x <- woodmouse[sample(9, 100, replace = TRUE), ]

To simplify, I define a function that builds the count table of haplotypes (based on this post):

countHap <- function(hap = h, dna = x){
    with(
        stack(setNames(attr(hap, "index"), rownames(hap))),
        table(hap = ind, pop = attr(dna, "dimnames")[[1]][values])
    )
}

Now, let's see the result without sorting haplotypes:

h <- haplotype(x) # create haplotype object
net <- haploNet(h) # create haploNet object

plot(net, pie = countHap(), size = attr(net, "freq"), legend = TRUE)

Now, let's look at our count table, to check these results:

countHap(h, x)

      pop
hap    No0906S No0908S No0909S No0910S No0912S No0913S No304 No305 No306
  I          0       0       0       0       0       0     0     8     0
  II         0       0       0       0       0       0     9     0     0
  III        0       0       0       0       0       0     0     0    10
  IV        16       0       0       0       0       0     0     0     0
  IX         0       0       0       0       0       8     0     0     0
  V          0      12       0       0       0       0     0     0     0
  VI         0       0      10       0       0       0     0     0     0
  VII        0       0       0      13       0       0     0     0     0
  VIII       0       0       0       0      14       0     0     0     0

Things do not match: for example, haplotype "V" should occur in individual "No0908S", but instead is coloured as individual "No0913S" (which should be the label for haplotype "IX").

Now, let's sort haplotypes:

h <- haplotype(x)
h <- sort(h, what = "labels") # This is the extra step!!
net <- haploNet(h)

plot(net, pie = countHap(), size = attr(net, "freq"), legend = TRUE)

And all is well now!
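As an alternative that is not part of the original answer, the count table itself can be reordered so that its rows follow the labels stored in the network. The sketch below uses made-up data: net_labels is a hypothetical stand-in for attr(net, "labels"), and tab, popA and popB are invented for illustration.

```r
# Hypothetical stand-in for attr(net, "labels"): labels in network order.
net_labels <- as.character(as.roman(1:9))

# Hypothetical count table whose rows are sorted alphabetically.
tab <- matrix(0, nrow = 9, ncol = 2,
              dimnames = list(hap = sort(net_labels), pop = c("popA", "popB")))
tab["V", "popA"] <- 4
tab["IX", "popB"] <- 1

# Index the rows by name so that row i belongs to the i-th haplotype
# of the network, whatever order the table was built in.
tab_fixed <- tab[net_labels, , drop = FALSE]
stopifnot(identical(rownames(tab_fixed), net_labels))
print(tab_fixed)
```

With real data this would amount to something like ind.hap <- ind.hap[attr(net, "labels"), ] before passing pie = ind.hap to plot(), assuming every network label appears as a row name of the table.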

Extra:

Although this is not requested by the OP, I thought of leaving it here if it is of interest for anyone else. Sometimes, I find it convenient to label haplotypes by their frequency. This can be done by changing the haplotype labels to be equal to their frequencies:

# The plot reads its labels from the haploNet object, so change them there
attr(net, "labels") <- attr(net, "freq")
plot(net, pie = countHap(), size = attr(net, "freq"), legend = TRUE)

Source: https://stackoverflow.com/questions/31220586/r-how-to-plot-correct-pie-charts-in-haplonet-haplotyp-networks-pegas-ape-a

Tags

dna-sequence

phylogeny

genetics