How does character comparison work in R?

Submitted by 点点圈 on 2020-01-04 06:05:37

Question


I used to use string comparisons directly in my code without giving them much thought. Recently, though, I noticed that something is off: although it worked for, say,

> "1" < "2"
[1] TRUE

It failed for

> "6" < "10"
[1] FALSE

I think there is a very simple explanation for this and I am just being dumb. Maybe it compares their ASCII values or something. Any insights?

The reason I ask is that I am planning to sort a vector of timestamps that are of class character.

Example:

> timeStamps <- c("2013/10/30 12:12:17","2013/10/30 12:12:38","2013/10/30 12:10:32","2013/10/30 12:09:42")
> sort(timeStamps)
[1] "2013/10/30 12:09:42" "2013/10/30 12:10:32" "2013/10/30 12:12:17" "2013/10/30 12:12:38"

Is it safe to do this? Or are there cases where it will fail, so that I should convert the strings into a proper timestamp format and then sort them?


Answer 1:


Comparisons between strings depend on the locale and the encoding of the strings. The ?Comparison help page describes the process in detail.

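Strings are compared character by character, and the first differing character decides the result. That is why "6" < "10" is FALSE: the first characters are "6" and "1", and "1" < "6". Most (probably all) locales and encodings order the digits as "0" < "1" < "2" < ... < "9", so as long as your date-times are all in the format %Y/%m/%d %H:%M:%S, they will be sorted correctly.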
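A minimal sketch of that character-by-character rule, using illustrative vectors (not from the question). It also shows why zero-padding, which the %Y/%m/%d %H:%M:%S format provides, makes string order agree with numeric order:

```r
# The first characters differ ("6" vs "1"); since "1" < "6", "10" is the smaller string.
"6" < "10"                             # FALSE
sort(c("6", "10", "2"))                # "10" "2" "6"

# Zero-padding to a fixed width makes string order agree with numeric order,
# which is why zero-padded %m, %d, %H, %M and %S fields sort correctly.
"06" < "10"                            # TRUE
sort(sprintf("%02d", c(6L, 10L, 2L)))  # "02" "06" "10"
```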

This is a really dangerous approach though, since

  1. Single digit days or months (e.g. 3 instead of 03 for March) will break the sort order.

  2. Mixing hyphens or other punctuation with slashes will break the sort order.

  3. You won't be able to identify non-existent date-times.
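The three failure modes above can be reproduced in a few lines. This sketch uses made-up date-times, and it forces byte-wise "C" collation with `Sys.setlocale()` so the separator example is reproducible; other locales may weight punctuation differently, so that particular result is an assumption tied to the C locale.

```r
# Use byte-wise (C) collation so the outcome does not depend on the user's locale.
old_collate <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))

# 1. Unpadded month: "3" > "1", so 5 March sorts after 30 October.
"2013/3/5 08:00:00" > "2013/10/30 12:12:17"    # TRUE, although March comes first

# 2. Mixed separators: "-" (0x2D) sorts before "/" (0x2F) in the C locale,
#    so 1 November is placed before 30 October.
"2013-11-01 00:00:00" < "2013/10/30 12:12:17"  # TRUE, although November comes later

# 3. An impossible date-time still compares happily as a string ...
"2013/02/30 12:00:00" < "2013/03/01 12:00:00"  # TRUE
# ... whereas parsing it exposes the problem as NA.
strptime("2013/02/30 12:00:00", "%Y/%m/%d %H:%M:%S")  # NA

# Restore the previous collation setting.
invisible(Sys.setlocale("LC_COLLATE", old_collate))
```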

All in all, the time you'll spend debugging problems from using strings as dates will far outweigh the time to write one line of code to convert to a date format.

timeStamps <- strptime(timeStamps, "%Y/%m/%d %H:%M:%S")

Or

library(lubridate)
timeStamps <- ymd_hms(timeStamps)
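If you want to keep the original strings but put them in chronological order, one possible base-R sketch is to parse once and reorder with `order()`. Here `as.POSIXct()` is swapped in for `strptime()` because it returns a plain `POSIXct` vector, and `tz = "UTC"` is an assumption chosen to sidestep daylight-saving gaps in the local time zone.

```r
timeStamps <- c("2013/10/30 12:12:17", "2013/10/30 12:12:38",
                "2013/10/30 12:10:32", "2013/10/30 12:09:42")

# Parse once into POSIXct (tz = "UTC" is an assumption, see above).
parsed <- as.POSIXct(timeStamps, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

# Any string that failed to parse (wrong format, impossible date) shows up as NA.
stopifnot(!anyNA(parsed))

# Sort the parsed times directly ...
sorted_times <- sort(parsed)

# ... or keep the original strings and reorder them by their parsed values.
sorted_strings <- timeStamps[order(parsed)]
sorted_strings
```

The `anyNA()` check is what buys you the validation that plain string sorting cannot give: a malformed or non-existent date-time stops the script instead of silently landing in the wrong position.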


Source: https://stackoverflow.com/questions/19833826/how-does-character-comparison-work-in-r
