I need to sort a file using sort from the shell on Linux. The sort needs to be based on timestamp values contained in each of the file's rows. The timestamps are in an irregular format.
sort is a nice tool, but it doesn't have enough bells and whistles to take pseudo-XML apart, convert an attribute to a sensible time value, and then sort on it.
However, such tools do exist. While the best way to do this would probably be an XSLT transform, if the file is really as consistent as your example command expects, you could extract the time values with cut -d'"' -f4, and you can convert each one to a more sensible format with date. For example (needs GNU date, and a shell with process substitution such as bash):
paste <(cut -d'"' -f4 file.txt | date -f- +%s) file.txt | sort -n | cut -f2-
This pipeline extracts the date-times, one per line; feeds them to date to convert them to seconds since the epoch; pastes each timestamp onto the beginning of its line; sorts the pasted result numerically, now that a numeric timestamp leads each line; and finally removes the timestamp to get the original lines back.
Test:
$ cat >file.txt <<'EOF'
<r id="abcd" t="10/12/2012 12:16:17 AM"><d><nv n="name" v="868" /><nv n="name0" v="73" /><nv n="name1" v="13815004" /></d></r>
<r id="defg" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="0" /></d></r>
<r id="abcd" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="0" /></d></r>
<r id="zxy" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="59542676" /></d></r>
<r id="fghj" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="38" /><nv n="name0" v="0" /><nv n="name1" v="3004537" /></d></r>
<r id="defg" t="7/24/2012 12:16:18 AM"><d><nv n="name" v="177" /><nv n="name0" v="0" /><nv n="name1" v="5888870" /></d></r>
EOF
$ paste <(cut -d'"' -f4 file.txt | date -f- +%s) file.txt | sort -n | cut -f2-
<r id="defg" t="7/24/2012 12:16:18 AM"><d><nv n="name" v="177" /><nv n="name0" v="0" /><nv n="name1" v="5888870" /></d></r>
<r id="abcd" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="0" /></d></r>
<r id="defg" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="0" /></d></r>
<r id="fghj" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="38" /><nv n="name0" v="0" /><nv n="name1" v="3004537" /></d></r>
<r id="zxy" t="7/24/2012 12:16:17 PM"><d><nv n="name" v="0" /><nv n="name0" v="0" /><nv n="name1" v="59542676" /></d></r>
<r id="abcd" t="10/12/2012 12:16:17 AM"><d><nv n="name" v="868" /><nv n="name0" v="73" /><nv n="name1" v="13815004" /></d></r>
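If rows with equal timestamps should keep their original relative order (the test output above reorders them by the rest of the line), a stable sort on just the first field is one option. Here is a minimal sketch, assuming GNU date and GNU sort. sort_by_timestamp is a made-up helper name, and the pipeline is rearranged to feed paste from stdin so it doesn't rely on process substitution:

```shell
# Sketch only: sort lines by the timestamp in the 4th double-quote-delimited field.
# Assumes GNU date (for -f-) and GNU sort (for -s).
# sort_by_timestamp is a hypothetical helper name, not part of any tool.
sort_by_timestamp() {
    cut -d'"' -f4 "$1" |      # extract one timestamp per line
        date -f- +%s |        # convert each to seconds since the epoch
        paste - "$1" |        # prefix each original line with its epoch value and a tab
        sort -s -k1,1n |      # stable numeric sort on the epoch value only
        cut -f2-              # drop the epoch prefix again
}
```

Usage would be sort_by_timestamp file.txt. With -s and the key restricted to the first field, rows that share a timestamp come out in the order they went in.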
The Linux date command does a fine job of parsing dates like this, and it can translate them into more sortable things, like simple Unix-time integers.
Example:
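while IFS= read -r line; do
    # Extract the value of the t="..." attribute
    datestring=$(sed -e 's/^.* t="\([^"]*\)".*$/\1/' <<<"$line")
    # Prefix the line with its Unix timestamp
    printf '%s %s\n' "$(date -d "$datestring" +%s)" "$line"
done < file | sort -n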
You could then pass that through an appropriate cut invocation, such as cut -d' ' -f2-, if you want the Unix timestamp removed again.
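Putting the loop and the cut step together might look something like the sketch below. It assumes GNU date (for -d) and that each line carries exactly one t="..." attribute. sort_by_t_attr is a hypothetical name, and a tab is used as the separator so that cut's default delimiter applies:

```shell
# Sketch only: decorate each line with its epoch time, sort, then undecorate.
# Assumes GNU date and one t="..." attribute per line.
# sort_by_t_attr is a hypothetical helper name.
sort_by_t_attr() {
    while IFS= read -r line; do
        # Pull the value of the t="..." attribute out of the line.
        datestring=$(printf '%s\n' "$line" | sed -e 's/^.* t="\([^"]*\)".*$/\1/')
        # Prefix the line with its epoch time, separated by a tab.
        printf '%s\t%s\n' "$(date -d "$datestring" +%s)" "$line"
    done < "$1" | sort -n | cut -f2-
}
```

This starts one date process per line, so on large files it will likely be noticeably slower than a single date -f- call over all the timestamps.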