Sorting a hash in Perl when the keys are dynamic

I have a hash as follows:

my %data = (
    'B2' => {
        'one' => {
            timestamp => '00:12:30'
        },
        'two' => {


                      
1 Answer

天涯浪人
2021-01-21 04:04

This type of question might be better suited to the Programmers Stack Exchange site or the Code Review one. Since it is asking about implementation, I think it's fine to ask here. The sites tend to have some overlap.



As @DondiMichaelStroma pointed out, and as you already know, your code works great! However, there is more than one way to do it. For me, if this was in a small script, I would probably leave it as is and move on to the next part of the project. If this was in a more professional code base, I would make some changes.

For me, when writing for a professional code base, I try to keep a few things in mind.


Readability
Efficiency when it matters
Not gold-plating it
Unit Testing


So let's take a look at your code:

my %data = (
    'B2' => {
        'one' => {
            timestamp => '00:12:30'
        },
        'two' => {
            timestamp => '00:09:30'
        }
    },
    'C3' => {
        'three' => {
            timestamp => '00:13:45'
        },
        'adam' => {
            timestamp => '00:09:30'
        }
    }
);


The way data is defined is excellent and nicely formatted. This may not be how %data is built in your code, but maybe a unit test would have a hash like that.

my @flattened;
for my $outer_key (keys %data) {
    for my $inner_key (keys %{$data{$outer_key}}) {
        push @flattened, [
            $data{$outer_key}{$inner_key}{timestamp}
            , $outer_key
            , $inner_key
        ];
    }
}
for my $ary (sort { $a->[0] cmp $b->[0] || $a->[2] cmp $b->[2] } @flattened) {
    print join ',' => @$ary;
    print "\n";
}


The variable names could be more descriptive, and the @flattened array has some redundant data in it. Printing it with Data::Dumper, you can see we have C3 and B2 in multiple places.

$VAR1 = [
          '00:13:45',
          'C3',
          'three'
        ];
$VAR2 = [
          '00:09:30',
          'C3',
          'adam'
        ];
$VAR3 = [
          '00:12:30',
          'B2',
          'one'
        ];
$VAR4 = [
          '00:09:30',
          'B2',
          'two'
        ];


Maybe this isn't a big deal, or maybe you want to keep the functionality of getting all the data under the key B2.

Here's another way we could store that data:

my %flattened = (
    'B2' => [['one', '00:12:30'],
             ['two', '00:09:30']],
    'C3' => [['three','00:13:45'],
             ['adam', '00:09:30']]
);


It may make the sorting more complicated, but it makes the data structure simpler! Maybe this is getting closer to gold-plating, or maybe you'd benefit from this data structure in another part of the code. My preference is to keep data structures simple, and add extra code if needed when processing them. If you decide you need to dump %flattened to a log file, you might appreciate not seeing duplicate data.
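
As a rough illustration of what this simpler structure allows, here is a minimal sketch that pulls out everything stored under one outer key, ordered by timestamp and then by name. The helper name entries_for_key is hypothetical and not part of the original code. Comparing the timestamps with cmp is only safe because they are zero-padded HH:MM:SS strings.

```perl
use strict;
use warnings;

# Same structure as %flattened above: outer key => list of [name, timestamp] pairs.
my %flattened = (
    'B2' => [ [ 'one',   '00:12:30' ], [ 'two',  '00:09:30' ] ],
    'C3' => [ [ 'three', '00:13:45' ], [ 'adam', '00:09:30' ] ],
);

# Hypothetical helper (not in the original code): return the entries stored
# under one outer key, ordered by timestamp and then by name.
sub entries_for_key {
    my ( $flattened_ref, $outer_key ) = @_;
    my $entries = $flattened_ref->{$outer_key} || [];    # unknown key => empty list
    return [ sort { $a->[1] cmp $b->[1] || $a->[0] cmp $b->[0] } @{$entries} ];
}

my $b2_entries = entries_for_key( \%flattened, 'B2' );
print join( ',', @{$_} ), "\n" for @{$b2_entries};
```

Keeping the per-key lookup this cheap is the main thing the extra sorting code buys.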



Implementation

Design: I think we want to keep this as two distinct operations. This will help code clarity and we could test each function individually. The first function would convert between the data formats we want to use, and the second function would sort the data. These functions should be in a Perl module, and we can use Test::More to do the unit testing. I don't know where we are calling these functions from, so let's pretend we are calling them from main.pl, and we can put the functions in a module called Helper.pm. These names should be more descriptive, but again I'm not sure what the application is here! Great names lead to readable code.



main.pl

This is what main.pl could look like. Even though there are no comments, the descriptive names can make it self-documenting. These names could still be improved too!

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use Utilities::Helper qw(sort_by_times_then_names convert_to_simple_format);

my %data = populate_data();    # not shown here; assumed to return the nested hash from above

my @sorted_data = @{ sort_by_times_then_names( convert_to_simple_format( \%data ) ) };

print Dumper(@sorted_data);




Utilities/Helper.pm

Is this readable and elegant? I think it could use some improvements. More descriptive variable names would help in this module as well. However, it is easily testable, and keeps our main code clean and data structures simple.

package Utilities::Helper;
use strict;
use warnings;

use Exporter qw(import);
our @EXPORT_OK = qw(sort_by_times_then_names convert_to_simple_format);

# We could put a comment here explaining the expected input and output formats.
sub sort_by_times_then_names {

    my ( $data_ref ) = @_;

    # Here we can use the Schwartzian Transform to sort it
    # Normally, we would just be sorting an array. But here we
    # are converting the hash into an array and then sorting it.
    # Maybe that should be broken up into two steps to make it more clear!
    # my @sorted = map  { $_ }   # we don't actually need this final map
    my @sorted = sort {
                        $a->[2] cmp $b->[2] # sort by timestamp
                                 ||
                        $a->[1] cmp $b->[1] # then sort by name
                      }
                 map  { my $outer_key=$_;       # convert $data_ref to an array of arrays
                        map {                    # first element is the outer_key
                             [$outer_key, @{$_}] # second element is the name
                            }                    # third element is the timestamp
                            @{$data_ref->{$_}}
                      }
                      keys %{$data_ref};
    # If you want the elements in a different order in the array,
    # you could modify the above code or change it when you print it.
    return \@sorted;
}


# We could put a comment here explaining the expected input and output formats.
sub convert_to_simple_format {
    my ( $data_ref ) = @_;

    my %reformatted_data;

    # $outer_key and $inner_key could be renamed to more accurately describe the data they represent.
    # Are they names? IDs? Places? License plate numbers?
    # Maybe we want to keep it generic so this function can handle different kinds of data.
    # I still like the idea of using nested for loops for this logic, because it is clear and intuitive.
    for my $outer_key ( keys %{$data_ref} ) {
        for my $inner_key ( keys %{$data_ref->{$outer_key}} ) {
            push @{$reformatted_data{$outer_key}},
                 [$inner_key, $data_ref->{$outer_key}{$inner_key}{timestamp}];
        }
    }

    return \%reformatted_data;
}

1;
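
The comment in sort_by_times_then_names suggests breaking the conversion and the sort into two steps. A minimal sketch of that split, assuming the same input format as above, could look like this. The helper names flatten_simple_format and sort_rows_by_time_then_name are hypothetical and not part of the original module.

```perl
use strict;
use warnings;

# Step 1 (hypothetical helper): turn { outer_key => [ [name, timestamp], ... ] }
# into a flat list of [outer_key, name, timestamp] rows.
sub flatten_simple_format {
    my ($data_ref) = @_;
    my @rows;
    for my $outer_key ( keys %{$data_ref} ) {
        for my $pair ( @{ $data_ref->{$outer_key} } ) {
            push @rows, [ $outer_key, @{$pair} ];
        }
    }
    return \@rows;
}

# Step 2 (hypothetical helper): order the rows by timestamp, then by name.
# String comparison is enough because the timestamps are zero-padded HH:MM:SS.
sub sort_rows_by_time_then_name {
    my ($rows_ref) = @_;
    return [ sort { $a->[2] cmp $b->[2] || $a->[1] cmp $b->[1] } @{$rows_ref} ];
}

my %simple = (
    'B2' => [ [ 'one',   '00:12:30' ], [ 'two',  '00:09:30' ] ],
    'C3' => [ [ 'three', '00:13:45' ], [ 'adam', '00:09:30' ] ],
);

my $sorted = sort_rows_by_time_then_name( flatten_simple_format( \%simple ) );
print join( ',', @{$_} ), "\n" for @{$sorted};
```

Each step can then get its own unit test, at the cost of building one intermediate array.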




run_unit_tests.pl

Finally, let's implement some unit testing. This might be more than you were looking for with this question, but I think clean seams for testing are part of elegant code, and I want to demonstrate that. Test::More is really great for this. I'll even throw in a test harness and formatter so we can get some elegant output. You can use TAP::Formatter::Console if you don't have TAP::Formatter::JUnit installed.

#!/usr/bin/env perl
use strict;
use warnings;
use TAP::Harness;

my $harness = TAP::Harness->new({
    formatter_class => 'TAP::Formatter::JUnit',
    merge           => 1,
    verbosity       => 1,
    normalize       => 1,
    color           => 1,
    timer           => 1,
});

$harness->runtests('t/helper.t');
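
If you are not sure whether TAP::Formatter::JUnit is installed, one option is to pick the formatter at run time. This is only a sketch: the lib setting assumes Utilities/Helper.pm lives under the current directory, which the original does not state.

```perl
use strict;
use warnings;
use TAP::Harness;

# Prefer the JUnit formatter when it can be loaded; otherwise fall back to
# TAP::Formatter::Console, which is distributed with Test::Harness.
my $formatter_class =
    eval { require TAP::Formatter::JUnit; 1 }
    ? 'TAP::Formatter::JUnit'
    : 'TAP::Formatter::Console';

my $harness = TAP::Harness->new({
    formatter_class => $formatter_class,
    merge           => 1,
    verbosity       => 1,
    lib             => ['.'],    # assumption: modules are found relative to the current directory
});

# Only run the test file if it is actually present.
$harness->runtests('t/helper.t') if -e 't/helper.t';
```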




t/helper.t

#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;
use Utilities::Helper qw(sort_by_times_then_names convert_to_simple_format);

my %data = (
    'B2' => {
        'one' => {
            timestamp => '00:12:30'
        },
        'two' => {
            timestamp => '00:09:30'
        }
    },
    'C3' => {
        'three' => {
            timestamp => '00:13:45'
        },
        'adam' => {
            timestamp => '00:09:30'
        }
    }
);

my %formatted_data = %{ convert_to_simple_format( \%data ) };

my %expected_formatted_data = (
    'B2' => [['one', '00:12:30'],
             ['two', '00:09:30']],
    'C3' => [['three','00:13:45'],
             ['adam', '00:09:30']]
);

is_deeply(\%formatted_data, \%expected_formatted_data, "convert_to_simple_format test");

my @sorted_data = @{ sort_by_times_then_names( \%formatted_data ) };

my @expected_sorted_data = ( ['C3','adam', '00:09:30'],
                             ['B2','two',  '00:09:30'],
                             ['B2','one',  '00:12:30'],
                             ['C3','thee','00:13:45'] # intentional typo to demonstrate the failure output
                            );

is_deeply(\@sorted_data, \@expected_sorted_data, "sort_by_times_then_names test");

done_testing;
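
One caveat about the first test: Perl does not guarantee the order in which keys returns hash keys, so the rows under each outer key may come out of convert_to_simple_format in a different order than the expected structure lists them. A minimal sketch of an order-independent comparison follows. The helper normalize_row_order is hypothetical, and %got is a hand-written stand-in for the function's output so the example runs on its own.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: sort each outer key's [name, timestamp] rows by name,
# so two structures can be compared regardless of hash iteration order.
sub normalize_row_order {
    my ($data_ref) = @_;
    my %normalized;
    for my $outer_key ( keys %{$data_ref} ) {
        $normalized{$outer_key} =
            [ sort { $a->[0] cmp $b->[0] } @{ $data_ref->{$outer_key} } ];
    }
    return \%normalized;
}

# Stand-in for the output of convert_to_simple_format, rows in some arbitrary order.
my %got = (
    'B2' => [ [ 'two',  '00:09:30' ], [ 'one',   '00:12:30' ] ],
    'C3' => [ [ 'adam', '00:09:30' ], [ 'three', '00:13:45' ] ],
);

my %expected = (
    'B2' => [ [ 'one',   '00:12:30' ], [ 'two',  '00:09:30' ] ],
    'C3' => [ [ 'three', '00:13:45' ], [ 'adam', '00:09:30' ] ],
);

is_deeply(
    normalize_row_order( \%got ),
    normalize_row_order( \%expected ),
    'convert_to_simple_format rows match regardless of order'
);

done_testing;
```

The second test does not need this, because sort_by_times_then_names fixes the order itself.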




Test Output

The nice thing about testing this way is that it will tell you what is wrong when a test fails.

<testsuites>
  <testsuite failures="1"
             errors="1"
             time="0.0478239059448242"
             tests="2"
             name="helper_t">
    <testcase time="0.0452120304107666"
              name="1 - convert_to_simple_format test"></testcase>
    <testcase time="0.000266075134277344"
              name="2 - sort_by_times_then_names test">
      <failure type="TestFailed"
               message="not ok 2 - sort_by_times_then_names test"><![CDATA[not ok 2 - sort_by_times_then_names test

#   Failed test 'sort_by_times_then_names test'
#   at t/helper.t line 45.
#     Structures begin differing at:
#          $got->[3][1] = 'three'
#     $expected->[3][1] = 'thee']]></failure>
    </testcase>
    <testcase time="0.00154280662536621" name="(teardown)" />
    <system-out><![CDATA[ok 1 - convert_to_simple_format test
not ok 2 - sort_by_times_then_names test

#   Failed test 'sort_by_times_then_names test'
#   at t/helper.t line 45.
#     Structures begin differing at:
#          $got->[3][1] = 'three'
#     $expected->[3][1] = 'thee'
1..2
]]></system-out>
    <system-err><![CDATA[Dubious, test returned 1 (wstat 256, 0x100)
]]></system-err>
    <error message="Dubious, test returned 1 (wstat 256, 0x100)" />
  </testsuite>
</testsuites>


In summary, I prefer readable and clear over concise. Sometimes you can make less efficient code that is easier to write and logically simpler. Putting ugly code inside functions is a great way to hide it! It isn't worth messing around with code to save 15ms when you run it. If your data set is large enough that performance becomes an issue, Perl might not be the right tool for the job.  If you are really looking for some concise code, post a challenge over at the Code Golf Stack Exchange.
                                                                        
                                                        
            
            
              
                