Question
I'm trying to write an application that looks through a directory and flags all files (whether in the directory itself or in subdirectories) that contain the TODO keyword (the one that is highlighted in color in a code editor; I am using Visual Studio Code).
I have most of the code running; only the last bit is puzzling me. Because my regex accepts 'TODO' as a whole word, it also picks up files that have TODO as a variable name or as string content, e.g.
var todo = 'TODO'
or
var TODO = 'abcdefg'
so it is messing up my test cases. How do we write a robust TODO regex that picks up just the TODO keyword (e.g. //TODO or // TODO) and ignores the other use cases (in variables, strings, etc.)? I don't want to hardcode // or anything similar in the regex either, as I would prefer it to be as cross-language as possible (e.g. // (single-line) or /* (multi-line) for JavaScript, # for Python, etc.).
Here is my code:
import * as fs from 'fs';
import * as path from 'path';

const args = process.argv.slice(2);
const directory = args[0];

// Using recursion, we find every file with the desired extension, even if it's deeply nested in subfolders.
// Returns a list of file paths
const getFilesInDirectory = (dir, ext) => {
  if (!fs.existsSync(dir)) {
    console.log(`Specified directory: ${dir} does not exist`);
    return []; // return an empty list so callers can safely concat the result
  }
  let files = [];
  fs.readdirSync(dir).forEach(file => {
    const filePath = path.join(dir, file);
    // lstat describes a symbolic link itself rather than its target, so links to directories are not followed
    const stat = fs.lstatSync(filePath);
    // If we hit a directory, recurse into it. If we hit a file (base case), add it to the array of files
    if (stat.isDirectory()) {
      const nestedFiles = getFilesInDirectory(filePath, ext);
      files = files.concat(nestedFiles);
    } else {
      if (path.extname(file) === ext) {
        files.push(filePath);
      }
    }
  });
  return files;
};

const checkFilesWithKeyword = (dir, keyword, ext) => {
  if (!fs.existsSync(dir)) {
    console.log(`Specified directory: ${dir} does not exist`);
    return [];
  }
  const allFiles = getFilesInDirectory(dir, ext);
  const checkedFiles = [];
  allFiles.forEach(file => {
    // Read as text so the regex is tested against a string rather than a Buffer
    const fileContent = fs.readFileSync(file, 'utf8');
    // We want full words, so we use full word boundary in regex.
    const regex = new RegExp('\\b' + keyword + '\\b');
    if (regex.test(fileContent)) {
      // console.log(`Your word was found in file: ${file}`);
      checkedFiles.push(file);
    }
  });
  console.log(checkedFiles);
  return checkedFiles;
};

checkFilesWithKeyword(directory, 'TODO', '.js');
Help is greatly appreciated!!
Answer 1:
I don't think there is a reliable way to exclude TODO in variable names or string values across languages. You'd need to parse each language properly, and scan for TODO in comments.
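As a rough illustration of scanning only comments without a full parser, here is a minimal sketch that requires a comment marker directly before the keyword. The COMMENT_MARKERS table and the function names are hypothetical, made up for this example. It is still fooled by a marker inside a string (for example 'http://TODO'), and it does not handle continuation lines of block comments.

```javascript
// Hypothetical table of comment openers per file extension.
// These entries are assumptions for illustration; extend them as needed.
const COMMENT_MARKERS = {
  '.js': ['//', '/*'],
  '.ts': ['//', '/*'],
  '.py': ['#'],
  '.sh': ['#'],
};

// Escape characters that have a special meaning inside a RegExp.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Build a regex that matches the keyword only when it directly follows
// a comment marker (optionally separated by whitespace) on the same line.
const buildCommentKeywordRegex = (keyword, ext) => {
  const markers = COMMENT_MARKERS[ext];
  if (!markers) return null; // unknown extension: the caller decides what to do
  const alternation = markers.map(escapeRegExp).join('|');
  return new RegExp('(?:' + alternation + ')\\s*' + escapeRegExp(keyword) + '\\b');
};

// Example usage
const jsTodo = buildCommentKeywordRegex('TODO', '.js');
console.log(jsTodo.test('// TODO fix this'));      // comment: matches
console.log(jsTodo.test("var TODO = 'abcdefg'"));  // variable: no match
```

Because the markers come from a lookup keyed by extension, the regex itself stays free of hardcoded comment syntax, at the cost of maintaining the table.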
You can do an approximation that you can tweak over time:
- for variable names you'd need to exclude TODO = assignments, and any type of use, such as TODO.length
- for string values you could exclude 'TODO' and "TODO", and even "Something TODO today" while looking for matching quotes. What about a multi-line string with backticks?
This is a start using a bunch of negative lookaheads:
const input = `Test Case:
// TODO blah
// TODO do "stuff"
/* stuff
* TODO
*/
let a = 'TODO';
let b = 'Something TODO today';
let c = "TODO";
let d = "More stuff TODO today";
let TODO = 'stuff';
let l = TODO.length;
let e = "Even more " + TODO + " to do today";
let f = 'Nothing to do';
`;
let keyword = 'TODO';
const regex = new RegExp(
  // exclude TODO in string value with matching quotes:
  '^(?!.*([\'"]).*\\b' + keyword + '\\b.*\\1)' +
  // exclude TODO.property access:
  '(?!.*\\b' + keyword + '\\.\\w)' +
  // exclude TODO = assignment
  '(?!.*\\b' + keyword + '\\s*=)' +
  // final TODO match
  '.*\\b' + keyword + '\\b'
);
input.split('\n').forEach((line) => {
  let m = regex.test(line);
  console.log(m + ': ' + line);
});
Output:
false: Test Case:
true: // TODO blah
true: // TODO do "stuff"
false: /* stuff
true: * TODO
false: */
false: let a = 'TODO';
false: let b = 'Something TODO today';
false: let c = "TODO";
false: let d = "More stuff TODO today";
false: let TODO = 'stuff';
false: let l = TODO.length;
false: let e = "Even more " + TODO + " to do today";
false: let f = 'Nothing to do';
false:
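One caveat when plugging this into the question's checkFilesWithKeyword: the regex is anchored with ^ and has no m flag, so testing it against a whole file's content only ever examines the first line. A minimal sketch of a line-by-line wrapper follows. The names escapeRegExp, buildTodoRegex and findKeywordLines are hypothetical helpers, not part of the code above.

```javascript
// Escape characters that have a special meaning inside a RegExp,
// so that keywords such as 'C++' do not break the pattern.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Same negative-lookahead structure as the regex above, wrapped in a function.
const buildTodoRegex = (keyword) => {
  const k = escapeRegExp(keyword);
  return new RegExp(
    '^(?!.*([\'"]).*\\b' + k + '\\b.*\\1)' + // keyword between matching quotes
    '(?!.*\\b' + k + '\\.\\w)' +             // keyword.property access
    '(?!.*\\b' + k + '\\s*=)' +              // keyword = assignment
    '.*\\b' + k + '\\b'                      // final keyword match
  );
};

// Return the 1-based numbers of the lines that pass the line-level check.
const findKeywordLines = (content, keyword) => {
  const regex = buildTodoRegex(keyword);
  const hits = [];
  content.split(/\r?\n/).forEach((line, i) => {
    if (regex.test(line)) hits.push(i + 1);
  });
  return hits;
};

// Example usage
const sample = "// TODO blah\nlet a = 'TODO';\nlet TODO = 'stuff';";
console.log(findKeywordLines(sample, 'TODO'));
```

In checkFilesWithKeyword, a file could then be flagged when findKeywordLines(fs.readFileSync(file, 'utf8'), keyword) returns a non-empty array. Since each line is checked on its own, a TODO inside a multi-line backtick string is still reported.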
Source: https://stackoverflow.com/questions/66079810/regex-for-todo-keyword-when-passing-through-a-list-of-directories-to-get-a-list