How does character comparison work in R?

Question

I used to use string comparisons directly in my code without giving it much thought. I recently noticed that something is off. It worked for, say,

> "1" < "2"
[1] TRUE

It failed for

> "6" < "10"
[1] FALSE

I think there is a very simple explanation for this and I am just being dumb. Maybe it compares their ASCII values or something. Any insights?

The reason I ask is I am planning on sorting a vector of timestamps which are of class character.

Example :

> timeStamps <- c("2013/10/30 12:12:17","2013/10/30 12:12:38","2013/10/30 12:10:32","2013/10/30 12:09:42")
> sort(timeStamps)
[1] "2013/10/30 12:09:42" "2013/10/30 12:10:32" "2013/10/30 12:12:17" "2013/10/30 12:12:38"

Is it safe to do this? Or are there cases where it will fail, so that I should convert the strings into a proper timestamp format and then sort them?

Answer 1:

Comparisons between strings depend on the locale and the encoding of the strings. The ?Comparison help page describes the process in detail.
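To see the locale dependence in practice, here is a minimal sketch. The vector x is made up for illustration. The result of the plain sort() call is deliberately not shown, since it depends on your LC_COLLATE setting; method = "radix" is documented in ?sort to use C-locale ordering for character vectors.

```r
# Show which collation rules the current session uses.
print(Sys.getlocale("LC_COLLATE"))

# Hypothetical mixed-case input.
x <- c("b", "A", "a", "B")

# Locale-dependent: many locales interleave upper and lower case.
locale_sorted <- sort(x)
print(locale_sorted)

# Locale-independent: C-locale order puts all upper case first.
radix_sorted <- sort(x, method = "radix")
print(radix_sorted)  # "A" "B" "a" "b"
```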

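Strings are compared character by character rather than as numbers, which is why "6" < "10" is FALSE: the first characters are compared, and "6" sorts after "1". Most (probably all) locales and encodings will consider "0" < "1" < "2" < ... < "9", so as long as your date-times are in the format %Y/%m/%d %H:%M:%S, with every field zero-padded, they will be sorted correctly.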
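As a small illustration of the digit ordering at work, the sketch below contrasts comparing the values from the question as strings and as numbers. The variable names are only for illustration.

```r
# Character comparison goes left to right, one character at a time.
as_strings <- "6" < "10"                          # FALSE: "6" sorts after "1"
as_numbers <- as.numeric("6") < as.numeric("10")  # TRUE: 6 is less than 10
print(c(as_strings, as_numbers))

# The same effect shows up when sorting.
sorted_strings <- sort(c("6", "10", "2"))              # "10" "2" "6"
sorted_numbers <- sort(as.numeric(c("6", "10", "2")))  # 2 6 10
print(sorted_strings)
print(sorted_numbers)
```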

This is a really dangerous approach, though, since:

Single-digit days or months (e.g. 3 instead of 03 for March) will break the sort order.
Inconsistent separators (hyphens or other punctuation instead of slashes in some entries) can break the sort order.
You won't be able to identify non-existent date-times.
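Two of the pitfalls above can be reproduced with a short sketch. The dates here are hypothetical, and as.Date() is used as one base R way to parse them; it is not the only option.

```r
# Hypothetical dates: the second one lacks zero padding.
dates <- c("2013/10/30", "2013/3/1")

# As strings, October sorts before March because "1" < "3".
sorted_strings <- sort(dates)
print(sorted_strings)  # "2013/10/30" "2013/3/1"

# Parsed as dates, the order is chronological.
sorted_dates <- sort(as.Date(dates, format = "%Y/%m/%d"))
print(sorted_dates)    # "2013-03-01" "2013-10-30"

# A non-existent date parses to NA instead of passing silently.
bad <- as.Date("2013/02/30", format = "%Y/%m/%d")
print(is.na(bad))      # TRUE
```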

All in all, the time you'll spend debugging problems from using strings as dates will far outweigh the time to write one line of code to convert to a date format.

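# Option 1, base R: parse the strings into date-times (returns POSIXlt)
timeStamps <- strptime(timeStamps, "%Y/%m/%d %H:%M:%S")

# Option 2, the lubridate package (returns POSIXct, UTC by default)
library(lubridate)
timeStamps <- ymd_hms(timeStamps)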
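Putting it together for the timestamps in the question, a minimal base R sketch might look like the following. It uses as.POSIXct() rather than strptime() so that the result is a plain vector, and tz = "UTC" is an assumption; substitute the time zone your data were actually recorded in.

```r
timeStamps <- c("2013/10/30 12:12:17", "2013/10/30 12:12:38",
                "2013/10/30 12:10:32", "2013/10/30 12:09:42")

# Parse to POSIXct; the time zone is assumed, not taken from the data.
parsed <- as.POSIXct(timeStamps, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# Malformed or non-existent date-times become NA, so check before sorting.
stopifnot(!anyNA(parsed))

# Sort chronologically, then format back to the original layout if needed.
sorted <- sort(parsed)
sorted_text <- format(sorted, "%Y/%m/%d %H:%M:%S")
print(sorted_text)
```

Checking for NA right after parsing surfaces bad input early, which plain string sorting cannot do.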

Source: https://stackoverflow.com/questions/19833826/how-does-character-comparison-work-in-r

Tags

posixct