Scripting for file management with a very large number of files

佐手, submitted on 2019-12-24 09:39:33

Question


I have a setup of three OS X machines that was using Syncthing to keep shared drives synchronized remotely. Someone made some mistakes, and a lot of files ended up getting renamed.

So throughout this drive I have situations where there is a 0 KB file named, for example, file.jpg, and another file with the real size named file.sync-conflict201705-4528.jpg. I need to search the entire drive recursively and, whenever I find a file with the sync-conflict string in its name, check whether the same file name without the sync-conflict part exists with a size of 0 KB. If it does, I need to rename the sync-conflict file so that it overwrites the 0 KB file.

I have considered tackling this with a bash script or a Perl script. With bash, I think the find command with -regex would get me started, but I don't really know how to process the results and run the next test on each one. I am studying this and working on it.

Same problem with Perl. I can get through the first step using File::Find::find and a regex to select the files I need, but again I am stuck on the next step: finding the original file in the same directory and performing the necessary move.

In both cases I am willing to put in the time to figure it out, but I wonder what the caveats will be. Can both approaches recurse through a large number of files without errors? Is there perhaps a better approach anyone can recommend?
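For the bash route, the step of processing find's results and running the next test can be handled by streaming find's output into a while read loop, one file at a time, which also avoids building huge argument lists when there are very many files. A minimal dry-run sketch, assuming conflict files contain sync-conflict and end in .jpg; the function name list_candidates is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: for every *sync-conflict*.jpg under $1 (default .),
# print "conflict -> original" when the original exists and is empty.
# Dry run only: nothing is renamed.
list_candidates() {
    local top=${1:-.} conflict original
    # -print0 with read -d '' keeps names with spaces or newlines intact,
    # and files are handled one at a time as find produces them
    find "$top" -type f -name '*sync-conflict*.jpg' -print0 |
        while IFS= read -r -d '' conflict; do
            # dir/file.sync-conflict201705-4528.jpg -> dir/file.jpg
            original="${conflict%.sync-conflict*}.jpg"
            # -f: regular file exists; ! -s: its size is zero
            if [ -f "$original" ] && [ ! -s "$original" ]; then
                printf '%s -> %s\n' "$conflict" "$original"
            fi
        done
}
```

Running list_candidates /path/to/drive prints each pair it would fix; once the output looks right, the printf line can be replaced by mv -- "$conflict" "$original".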


Answer 1:


One good tool in Perl for this is File::Find::Rule.

Find all sync-conflict files, then test whether the corresponding files exist and have zero size:

use warnings;
use strict;
use File::Copy qw(move);
use File::Find::Rule;

my $dir = shift || '.';  # top of hierarchy to search (from command line, or ./)

my @conflict_files = File::Find::Rule
    ->file->name('*sync-conflict*.jpg')->in($dir);

foreach my $conflict (@conflict_files) {
    # Strip ".sync-conflict..." and restore the extension: file.jpg
    my ($file) = $conflict =~ m|(.*)\.sync-conflict|;
    next unless defined $file;    # name has no ".sync-conflict" part
    $file .= '.jpg';

    # -z is true only if the file exists and has zero size
    if (-z $file) {
        print "Rename $conflict to $file\n";
        #move($conflict, $file) or warn "Can't move $conflict to $file: $!";
    }
}

For each file.sync-conflict... file, this builds the original name file.jpg and applies the -z file test (see perldoc -f -X), which is true only if the file exists and has zero size. It then renames the conflict file with move from the core module File::Copy.

Note that the paths File::Find::Rule returns include the $dir it searched (they are relative when $dir is relative), so they can be passed to the file test and to move as they are, from the directory the script is run in.

Uncomment the move line after sufficient testing (and only after making a backup).

The code makes some assumptions about file names (a .jpg extension, and ".sync-conflict" directly after the base name); please adjust as needed. The $dir supplied on the command line can be absolute or relative to the current directory.
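For comparison with the shell answers, Perl's -z on a regular file corresponds to two shell tests, since [ ! -s file ] on its own is also true when the file does not exist at all. A small sketch; is_empty_file is a hypothetical name:

```bash
# Hypothetical helper mirroring Perl's -z for regular files:
# succeeds only if "$1" exists, is a regular file, and has size 0.
is_empty_file() {
    [ -f "$1" ] && [ ! -s "$1" ]
}
```

Used as if is_empty_file "$original"; then ...; fi, it rejects both missing files and files that already contain data.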




Answer 2:


find is great. But as you've noted, you need more.

What find gets you in this scenario is the ability to search recursively and match certain patterns. As it happens, as of Bash version 4 you can do that right in the shell.

(Note that macOS ships with bash version 3, so for this solution you'll need to install bash 4 or later from MacPorts, Homebrew or Fink.)
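Since /bin/bash on macOS is version 3.2, a script that relies on globstar can check the running version before enabling it. A minimal sketch using bash's BASH_VERSINFO array; the function name require_bash4 is hypothetical:

```bash
# Hypothetical guard: enable recursive globbing only on bash 4 or newer.
require_bash4() {
    if [ "${BASH_VERSINFO[0]}" -lt 4 ]; then
        echo "globstar needs bash 4+, this is bash $BASH_VERSION" >&2
        return 1
    fi
    # ** recurses into subdirectories; unmatched patterns expand to nothing
    shopt -s globstar nullglob
}
```

A script would call require_bash4 || exit 1 before the for loop, so that running it with the system bash fails loudly instead of silently matching only the top directory.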

$ shopt -s globstar nullglob
$ for file in **/*sync-conflict2017*.*; do echo mv -v "$file" "${file%sync-conf*}${file##*.}"; done
mv -v file.sync-conflict201705-4528.jpg file.jpg
mv -v foo/bar.sync-conflict201705-4528.ext foo/bar.ext

You can remove the echo to actually run the mv command.

The way this works is that bash treats the double asterisk, **, like a * that recurses into subdirectories. Parameter expansion then builds the "target" filename: ${file%sync-conf*} removes the shortest trailing part that starts with sync-conf (leaving file.), and ${file##*.} keeps only what follows the last dot (jpg).
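As written, the loop renames every match, even when the plain name already holds real data. A sketch of the same parameter expansions with the question's zero-size check added; fix_one is a hypothetical helper, and unlike the echo version it really moves files:

```bash
# Hypothetical helper: derive the target name with the same parameter
# expansions as the loop above, and rename only if the target is empty.
fix_one() {
    local file=$1
    # ${file%sync-conf*}: drop the shortest suffix starting at "sync-conf"
    #   (keeps the trailing dot, e.g. "foo/bar.")
    # ${file##*.}: keep only the extension, e.g. "ext"
    local target="${file%sync-conf*}${file##*.}"
    # Only overwrite an existing, zero-size regular file
    if [ -f "$target" ] && [ ! -s "$target" ]; then
        mv -v -- "$file" "$target"
    fi
}
```

Inside the bash 4 loop above, calling fix_one "$file" in place of echo mv -v ... applies it to every match.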




Answer 3:


Create a function to fix the name:

$ function fixname() { local file="$1" newname; newname=$( echo "$file" | sed 's/\.sync-conflict.*\.jpg$/.jpg/' ); if [ -f "$newname" ] && [ ! -s "$newname" ]; then mv -- "$file" "$newname"; fi; }

Or, spread out a bit:

function fixname() {
    local file="$1" newname
    # file.sync-conflict...jpg -> file.jpg (the dot before sync-conflict goes too)
    newname=$( echo "$file" | sed 's/\.sync-conflict.*\.jpg$/.jpg/' )
    # If an empty file with the original name exists, overwrite it
    if [ -f "$newname" ] && [ ! -s "$newname" ]; then
        mv -- "$file" "$newname"
    fi
}

Export the function:

$ export -f fixname

Run find to execute the function:

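$ find . -type f -name \*sync-conflict\*.jpg -exec bash -c 'fixname "$1"' bash {} \;

Passing the file name to bash -c as a positional argument ("$1"), rather than pasting {} into the command string, keeps names containing spaces or shell special characters intact.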
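With a very large number of files, -exec ... \; starts one bash process per file. Ending -exec with {} + instead makes find pass many names to each invocation, and the inline script can then loop over "$@". A sketch that inlines the logic so export -f is not needed; fix_conflicts is a hypothetical name, and it assumes .jpg files as above:

```bash
# Hypothetical wrapper: rename each *sync-conflict*.jpg under $1 onto its
# original name, but only when that original exists and is empty.
fix_conflicts() {
    # {} + hands find's results to bash in batches; bash sees them as "$@"
    find "${1:-.}" -type f -name '*sync-conflict*.jpg' -exec bash -c '
        for file in "$@"; do
            newname="${file%.sync-conflict*}.jpg"
            if [ -f "$newname" ] && [ ! -s "$newname" ]; then
                mv -- "$file" "$newname"
            fi
        done
    ' bash {} +
}
```

The bash after the closing quote becomes $0 of the inline script, so the file names start at $1, just as in the per-file version.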



Source: https://stackoverflow.com/questions/44424514/scripting-for-file-management-with-a-very-large-amount-of-files
