Regex: How to remove extra spaces between strings in Perl

不知归路 2021-01-18 10:03

I am working on a program that takes user input for two file names. Unfortunately, the program can easily break if the user does not follow the specified input format.

5 Answers
  • 2021-01-18 10:20

    The standard way to deal with this kind of problem is to use command-line options rather than gathering input from STDIN. Getopt::Long ships with Perl and is serviceable:

    use strict; use warnings FATAL => 'all';
    use Getopt::Long qw(GetOptions);
    my %opt;
    GetOptions(\%opt, 'qseq=s', 'barcode=s') or die;
    die <<"USAGE" unless exists $opt{qseq} and $opt{qseq} =~ /^sample\d[.]qseq$/ and exists $opt{barcode} and $opt{barcode} =~ /^barcode.*\.txt$/;
    Usage: $0 --qseq sample1.qseq --barcode barcode.txt
           $0 -q sample1.qseq -b barcode.txt
    USAGE
    printf "q==<%s> b==<%s>\n", $opt{qseq}, $opt{barcode};
    

    The shell deals with any extraneous whitespace between arguments; try it and see. You still need to validate the file names yourself; the regexes in the example are made up for illustration. Use Pod::Usage for a fancier way to print helpful documentation to users who are likely to get the invocation wrong.

    There are dozens of more advanced Getopt modules on CPAN.
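
    To try the validation without a real command line, the same checks can be wrapped in a function, sketched here with GetOptionsFromArray from Getopt::Long. The helper name parse_args is hypothetical, and the two patterns are the made-up ones from the example above, not a real naming rule:

    ```perl
    use strict;
    use warnings;
    use Getopt::Long qw(GetOptionsFromArray);

    # Hypothetical helper: parse an argument list and return (qseq, barcode),
    # or an empty list if parsing or validation fails.
    sub parse_args {
        my @args = @_;
        my %opt;
        GetOptionsFromArray(\@args, \%opt, 'qseq=s', 'barcode=s') or return;
        # The patterns below are illustrative assumptions copied from the example.
        return unless defined $opt{qseq}    and $opt{qseq}    =~ /^sample\d[.]qseq$/;
        return unless defined $opt{barcode} and $opt{barcode} =~ /^barcode.*\.txt$/;
        return ($opt{qseq}, $opt{barcode});
    }

    my @ok = parse_args('--qseq', 'sample1.qseq', '--barcode', 'barcode.txt');
    print "q==<$ok[0]> b==<$ok[1]>\n" if @ok;
    ```

    Returning an empty list on failure lets the caller decide whether to print a usage message or exit.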

  • 2021-01-18 10:20

    You'll need to trim spaces before handling the filename data in your routine. You could then check the file extension with another regular expression, as described in "Is there a regular expression in Perl to find a file's extension?". If it's the actual type of the file that matters to you, it might be more worthwhile to check for that instead with File::LibMagicType.
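
    As a rough sketch of the extension check, the core File::Basename module can split off the suffix for comparison. The helper name has_extension is hypothetical, and the input is assumed to be already trimmed:

    ```perl
    use strict;
    use warnings;
    use File::Basename qw(fileparse);

    # Hypothetical helper: true if $name ends in ".$ext".
    sub has_extension {
        my ($name, $ext) = @_;
        # fileparse returns (basename, directory, suffix); the pattern
        # captures the last dot and everything after it as the suffix.
        my (undef, undef, $suffix) = fileparse($name, qr/\.[^.]*/);
        return $suffix eq ".$ext";
    }

    print has_extension('sample1.qseq', 'qseq') ? "ok\n" : "wrong extension\n";
    ```

    This only inspects the name; it says nothing about what the file actually contains.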

  • 2021-01-18 10:28

    First, put use strict; at the top of your code and declare your variables.

    Second, this:

    # remove the ',' and put the files into an array separated by spaces; indexes the files
    push @filename, join(' ', split(',', $filenames))
    

    is not going to do what you want. split() takes a string and returns a list of pieces; join() takes a list and returns a single string. Joining the pieces back together therefore pushes one space-separated string onto @filename. You just want to split:

    my @filenames = split(',', $filenames);
    

    That will create an array like you expect.

    This function will safely trim white space from the beginning and end of a string:

    sub trim {
        my $string = shift;
        $string =~ s/^\s+//;
        $string =~ s/\s+$//;
        return $string;
    }
    

    Access it like this:

    my $file = trim(shift @filenames);
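
    Putting the two pieces together, a minimal sketch might look like this. The helper name parse_filenames is hypothetical, and the input is assumed to be a single comma-separated line:

    ```perl
    use strict;
    use warnings;

    # Trim leading and trailing whitespace (same as the trim() above).
    sub trim {
        my $string = shift;
        $string =~ s/^\s+//;
        $string =~ s/\s+$//;
        return $string;
    }

    # Hypothetical helper: split a comma-separated line and trim each piece.
    sub parse_filenames {
        my ($line) = @_;
        return map { trim($_) } split /,/, $line;
    }

    my @filenames = parse_filenames("  sample1.qseq ,   barcode.txt \n");
    print join('|', @filenames), "\n";    # prints "sample1.qseq|barcode.txt"
    ```

    Because \s also matches a newline, the trailing newline from reading STDIN is removed along with the spaces.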
    

    Depending on your script, it might be easier to pass the strings as command-line arguments. You can access them through the @ARGV array, but I prefer to use Getopt::Long:

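    use strict;
    use warnings;
    use Getopt::Long;
    # With bundling, single-dash options are single letters; long names need --
    Getopt::Long::Configure("bundling");
    
    my ($qseq_filename, $barcode);
    
    # Each option takes a string value (=s); die if the arguments cannot be parsed.
    GetOptions(
        'q|qseq=s' => \$qseq_filename,
        'b|bar=s'  => \$barcode,
    ) or die "Usage: $0 -q FILE -b FILE\n";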
    

    You can then call this as:

    ./script.pl -q sample1.qseq -b barcode.txt
    

    And the variables will be properly populated without a need to worry about trimming white space.

  • 2021-01-18 10:41

    While I think your design is a little iffy, the following should work:

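    # Assumes $filenames has been chomp()ed; a trailing newline counts as whitespace.
    my @fileNames = split(',', $filenames);
    foreach my $fileName (@fileNames) {
      # Reject any name that contains whitespace.
      if ($fileName =~ /\s/) {
        print STDERR "Invalid filename.\n";
        exit 1;
      }
    }
    my ($qseq, $barcode) = @fileNames;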
    
  • 2021-01-18 10:43

    And here is one more way you could do it with regex (if you are reading the input from STDIN):

    # read a line from STDIN
    my $filenames = <STDIN>;
    
    # parse the line with a regex or die with an error message
    my ($qseq_filename, $barcode) = $filenames =~ /^\s*(\S.*?)\s*,\s*(\S.*?)\s*$/
        or die "invalid input '$filenames'";
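
    To see what that pattern accepts, here is a small sketch that wraps it in a function and runs it on a few sample lines. The wrapper name parse_pair and the sample inputs are assumptions for illustration:

    ```perl
    use strict;
    use warnings;

    # Hypothetical wrapper around the regex above: returns the two names,
    # or an empty list when the line does not match.
    sub parse_pair {
        my ($line) = @_;
        return $line =~ /^\s*(\S.*?)\s*,\s*(\S.*?)\s*$/;
    }

    for my $line ("sample1.qseq,barcode.txt\n",
                  "  sample1.qseq  ,  barcode.txt  \n",
                  "sample1.qseq\n") {
        my @pair = parse_pair($line);
        print @pair ? "<$pair[0]> <$pair[1]>\n" : "no match\n";
    }
    ```

    The first two lines both yield <sample1.qseq> <barcode.txt>, and the third prints "no match" because it has no comma. Note that the pattern still allows spaces inside a name, since only the surrounding whitespace is excluded from the captures.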
    