Regex for TODO keyword when passing through a list of directories to get a list of files with TODO keyword (eg. //TODO) but not as variable / string

自作多情 submitted on 2021-02-11 12:24:21

Question


I'm trying to write an application that looks through a directory and flags every file (in the directory itself or in any of its subdirectories) that contains the TODO keyword, the one that code editors highlight in color as you type (I am using Visual Studio Code).

I have most of the code running; only the last bit is puzzling me. Because my regex matches 'TODO' as a whole word, it also picks up files that use TODO as a variable name or as string content, e.g.

var todo = 'TODO' or var TODO = 'abcdefg'

so it is messing up my test cases. How do we write a robust TODO regex that picks up only the TODO keyword (e.g. //TODO or // TODO) and ignores the other uses (in variables, strings, etc.)? I also don't want to hardcode // or anything similar in the regex, because I would prefer it to be as cross-language as possible (e.g. // for single-line and /* for multi-line comments in JavaScript, # for Python, etc.).

Here is my code:

import * as fs from 'fs'; 
import * as path from 'path';

const args = process.argv.slice(2);
const directory = args[0];

// Using recursion, we find every file with the desired extension, even if it's deeply nested in subfolders.
// Returns a list of file paths
const getFilesInDirectory = (dir, ext) => {
  if (!fs.existsSync(dir)) {
    console.log(`Specified directory: ${dir} does not exist`);
    return []; // keep the return type an array so callers can concat safely
  }

  let files = [];
  fs.readdirSync(dir).forEach(file => {
    const filePath = path.join(dir, file);
    const stat = fs.lstatSync(filePath); // lstat describes the entry itself and does not follow symbolic links

    // If we hit a directory, recurse into it. If we hit a file (base case), add it to the array of files
    if (stat.isDirectory()) {
      const nestedFiles = getFilesInDirectory(filePath, ext);
      files = files.concat(nestedFiles);
    } else {
      if (path.extname(file) === ext) {
        files.push(filePath);
      }
    }
  });

  return files;
};



const checkFilesWithKeyword = (dir, keyword, ext) => {
  if (!fs.existsSync(dir)) {
    console.log(`Specified directory: ${dir} does not exist`);
    return []; // always return an array, even when nothing can be scanned
  }

  const allFiles = getFilesInDirectory(dir, ext);
  const checkedFiles = [];

  allFiles.forEach(file => {
    const fileContent = fs.readFileSync(file, 'utf8'); // read as text rather than a Buffer

    // We want full words, so we use full word boundary in regex.
    const regex = new RegExp('\\b' + keyword + '\\b');
    if (regex.test(fileContent)) {
      // console.log(`Your word was found in file: ${file}`);
      checkedFiles.push(file);
    }
  });

  console.log(checkedFiles);
  return checkedFiles;
};

checkFilesWithKeyword(directory, 'TODO', '.js');



Help is greatly appreciated!!


Answer 1:


I don't think there is a reliable way to exclude TODO in variable names or string values across languages. You'd need to parse each language properly, and scan for TODO in comments.
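To illustrate what "scan for TODO in comments" could look like for a single language, here is a minimal sketch for JavaScript-like sources. The names extractComments and hasTodoComment are hypothetical, and the scanner is deliberately simplified: it assumes only //, /* */ and the three quote styles matter, and it does not handle regex literals, ${} expressions inside template literals, or JSX. A real parser for each language would be more reliable.

```javascript
// Collect the text of every comment while skipping over string literals.
function extractComments(source) {
  const comments = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (ch === '/' && next === '/') {
      // Line comment: runs to the end of the line.
      let end = source.indexOf('\n', i);
      if (end === -1) end = source.length;
      comments.push(source.slice(i + 2, end));
      i = end;
    } else if (ch === '/' && next === '*') {
      // Block comment: runs to the closing */ (or to the end if unterminated).
      let end = source.indexOf('*/', i + 2);
      if (end === -1) end = source.length;
      comments.push(source.slice(i + 2, end));
      i = end + 2;
    } else if (ch === '"' || ch === "'" || ch === '`') {
      // String literal: skip ahead to the matching unescaped quote.
      i++;
      while (i < source.length && source[i] !== ch) {
        if (source[i] === '\\') i++; // skip the escaped character
        i++;
      }
      i++; // step past the closing quote
    } else {
      i++;
    }
  }
  return comments;
}

// True when the keyword appears as a whole word inside at least one comment.
function hasTodoComment(source, keyword = 'TODO') {
  const regex = new RegExp('\\b' + keyword + '\\b');
  return extractComments(source).some((text) => regex.test(text));
}
```

With this sketch, a file containing only let TODO = 'stuff'; or let a = 'TODO'; is not flagged, because neither occurrence sits inside a comment. Other languages would need their own comment and string rules (for example # for Python).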

You can do an approximation that you can tweak over time:

  • for variable names you'd need to exclude TODO = assignments, and any type of use, such as TODO.length
  • for string values you could exclude 'TODO' and "TODO", and even "Something TODO today", by looking for matching quotes around the keyword. Multi-line strings with backticks are a further case this does not cover.

This is a start using a bunch of negative lookaheads:

const input = `Test Case:
// TODO blah
// TODO do "stuff"
/* stuff
 * TODO
 */
let a = 'TODO';
let b = 'Something TODO today';
let c = "TODO";
let d = "More stuff TODO today";
let TODO = 'stuff';
let l = TODO.length;
let e = "Even more " + TODO + " to do today";
let f = 'Nothing to do';
`;
let keyword = 'TODO';
const regex = new RegExp(
  // exclude TODO in string value with matching quotes:
  '^(?!.*([\'"]).*\\b' + keyword + '\\b.*\\1)' +
  // exclude TODO.property access:
  '(?!.*\\b' + keyword + '\\.\\w)' +
  // exclude TODO = assignment
  '(?!.*\\b' + keyword + '\\s*=)' +
  // final TODO match
  '.*\\b' + keyword + '\\b'
);
input.split('\n').forEach((line) => {
  let m = regex.test(line);
  console.log(m + ': ' + line);
});

Output:

false: Test Case:
true: // TODO blah
true: // TODO do "stuff"
false: /* stuff
true:  * TODO
false:  */
false: let a = 'TODO';
false: let b = 'Something TODO today';
false: let c = "TODO";
false: let d = "More stuff TODO today";
false: let TODO = 'stuff';
false: let l = TODO.length;
false: let e = "Even more " + TODO + " to do today";
false: let f = 'Nothing to do';
false: 
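Because the regex above is anchored with ^ and written for one line at a time, it has to be applied per line rather than to a whole file's content, as the question's code does. A minimal sketch of that wiring follows; escapeRegExp, buildTodoRegex and findTodoLines are hypothetical helper names, and the escaping step is an added assumption so that a keyword containing regex metacharacters is matched literally.

```javascript
// Escape regex metacharacters so the keyword is always matched literally.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Build the same lookahead-based approximation once per keyword.
const buildTodoRegex = (keyword) => {
  const k = escapeRegExp(keyword);
  return new RegExp(
    // exclude the keyword inside a string value with matching quotes:
    '^(?!.*([\'"]).*\\b' + k + '\\b.*\\1)' +
    // exclude property access such as TODO.length:
    '(?!.*\\b' + k + '\\.\\w)' +
    // exclude assignment such as TODO = ...:
    '(?!.*\\b' + k + '\\s*=)' +
    // final keyword match:
    '.*\\b' + k + '\\b'
  );
};

// Return the 1-based numbers of the lines that the regex flags.
const findTodoLines = (content, keyword) => {
  const regex = buildTodoRegex(keyword);
  const hits = [];
  content.split(/\r?\n/).forEach((line, index) => {
    if (regex.test(line)) hits.push(index + 1);
  });
  return hits;
};
```

In the question's checkFilesWithKeyword, a file could then be kept when findTodoLines(fs.readFileSync(file, 'utf8'), keyword).length > 0. This remains the same approximation as the regex it wraps, so it inherits the same blind spots.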


Source: https://stackoverflow.com/questions/66079810/regex-for-todo-keyword-when-passing-through-a-list-of-directories-to-get-a-list
