Find direct and indirect subclasses by scanning filesystem

Submitted by 微笑、不失礼 on 2020-01-22 17:42:10

Question


I'm having trouble writing an algorithm that scans a file system and finds all subclasses of a certain class.

Details:

I have an app that scans an external application using NIO's Files.walk(). While reading each file, I check for "extends SuperClass"; if the phrase exists, I add the class name to my list as follows:

List<String> subclasses = new ArrayList<>();
Files.walk(appPath)
     .filter(p -> Files.isRegularFile(p) && p.toString().endsWith(".java"))
     .forEach(path -> {
         try {
             List<String> lines = Files.readAllLines(path);
             Pattern pattern = Pattern.compile("\\bextends SuperClass\\b");
             // join all lines so the match also works across line breaks
             Matcher matcher = pattern.matcher(String.join(" ", lines));
             boolean isChild = matcher.find();
             if (isChild) subclasses.add(path.getFileName().toString());
         } catch (IOException e) {
             // handle IOE
         }
     });

The problem with the above is that it only finds direct subclasses of SuperClass, but I need to retrieve all direct and indirect subclasses. I thought about recursion, since I have no idea how many subclasses of SuperClass there are, but I couldn't come up with a reasonable implementation.

NOTES:

  • I'm scanning more than 600 thousand files
  • I have no idea how many direct/indirect subclasses of SuperClass there are
  • The application that I'm scanning is external and I can't modify its code, so I can only read its files and see where "extends" occurs
  • A non-recursive solution would be great, but if there's no other way, I'll be more than happy to accept a recursive one, since I care about the solution more than performance.

Edit:

I use the following regex to match both the package and the class name, so that the output is correct even when classes in different packages share the same name:

Pattern pattern = Pattern.compile("("+superClass.getPackage()+")[\\s\\S]*(\\bextends "+superClass.getName()+"\\b)[\\s\\S]");

I also tried:

Pattern pattern = Pattern.compile("\\bextends "+superClass.getName()+"\\b");

But some subclasses are still missing. I believe that the code below skips some checks and doesn't fully work:

public static List<SuperClass> getAllSubClasses(Path path, SuperClass parentClass) throws IOException{
    classesToDo.add(baseClass);
    while(classesToDo.size() > 0) {
        SuperClass superClass = classesToDo.remove(0);
        List<SuperClass> subclasses = getDirectSubClasses(parentPath,parentClass);
        if(subclasses.size() > 0)
            classes.addAll(subclasses);
        classesToDo.addAll(subclasses);
    }
    return classes;
}

Any help is truly appreciated!

Edit 2: I also noticed another problem. When I detect a subclass, I take the file name from currentPath.getFileName(), which might or might not be the subclass name, as the subclass may be a nested or non-public class in the same file.


Answer 1:


I strongly recommend parsing compiled class files instead of source code. Since class files are already optimized for being processed by machines, a lot of the complexity and corner cases of source file processing have been eliminated.

So a solution to build a complete class hierarchy tree using the ASM library would look like this:

public static Map<String, Set<String>> getClassHierarchy(Path root) throws IOException {
    // Files.walk keeps directories open, so close the stream when done
    try (Stream<Path> files = Files.walk(root)) {
        return files
             .filter(p->Files.isRegularFile(p) && isClass(p.getFileName().toString()))
             .map(p -> getClassAndSuper(p))
             .collect(Collectors.groupingBy(Map.Entry::getValue,
                    Collectors.mapping(Map.Entry::getKey, Collectors.toSet())));
    }
}
private static boolean isClass(String fName) {
    // skip package-info and module-info
    return fName.endsWith(".class") && !fName.endsWith("-info.class");
}
private static Map.Entry<String,String> getClassAndSuper(Path p) {
    final class CV extends ClassVisitor {
        Map.Entry<String,String> result;
        public CV() {
            super(Opcodes.ASM5);
        }
        @Override
        public void visit(int version, int access,
                String name, String signature, String superName, String[] interfaces) {
            result = new AbstractMap.SimpleImmutableEntry<>(
                Type.getObjectType(name).getClassName(),
                superName!=null? Type.getObjectType(superName).getClassName(): "");
        }
    }
    try {
        final CV visitor = new CV();
        new ClassReader(Files.readAllBytes(p)).accept(visitor, ClassReader.SKIP_CODE);
        return visitor.result;
    } catch (IOException ex) {
        throw new UncheckedIOException(ex);
    }
}
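
If pulling in ASM is not an option, the two names that the visitor above extracts can also be read straight from the class file header. This is only a minimal sketch, assuming a well-formed class file laid out as the JVM specification describes; `ClassHeader` and `readClassAndSuper` are hypothetical names, not part of ASM or the JDK, and the names are returned in internal form (with slashes) rather than converted as `Type.getObjectType(...).getClassName()` does.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.AbstractMap;
import java.util.Map;

public class ClassHeader {
    /** Returns (class name, superclass name) in internal form, e.g. java/lang/Integer. */
    public static Map.Entry<String, String> readClassAndSuper(byte[] classFile) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile));
        if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        in.readUnsignedShort(); // minor version
        in.readUnsignedShort(); // major version
        int count = in.readUnsignedShort();
        // Keep only what is needed: String for Utf8 entries, Integer (name index) for Class entries
        Object[] pool = new Object[count];
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: pool[i] = in.readUTF(); break;            // Utf8 (modified UTF-8)
                case 7: pool[i] = in.readUnsignedShort(); break;  // Class -> index of its name
                case 5: case 6: in.skipBytes(8); i++; break;      // Long/Double occupy two slots
                case 15: in.skipBytes(3); break;                  // MethodHandle
                case 8: case 16: case 19: case 20:                // String, MethodType, Module, Package
                    in.skipBytes(2); break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    in.skipBytes(4); break;                       // all remaining 4-byte entries
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }
        in.readUnsignedShort(); // access flags
        String name = (String) pool[(Integer) pool[in.readUnsignedShort()]];
        int superIndex = in.readUnsignedShort(); // 0 means there is no superclass entry
        String superName = superIndex == 0 ? "" : (String) pool[(Integer) pool[superIndex]];
        return new AbstractMap.SimpleImmutableEntry<>(name, superName);
    }

    public static void main(String[] args) throws IOException {
        // Demo input: the class file of java.lang.Integer from the running JDK (Java 9+)
        try (InputStream is = Integer.class.getResourceAsStream("Integer.class")) {
            System.out.println(readClassAndSuper(is.readAllBytes()));
        }
    }
}
```

ASM remains the more robust choice, since it tracks class file format changes for you; the sketch merely illustrates how little parsing the superclass lookup needs compared to source code.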

As a bonus, or to create some test cases, the following method adds the ability to build the hierarchy for the location a runtime class was loaded from:

public static Map<String, Set<String>> getClassHierarchy(Class<?> context)
                                        throws IOException, URISyntaxException {
    Path p;
    URI clURI = context.getResource(context.getSimpleName()+".class").toURI();
    if(clURI.getScheme().equals("jrt")) p = Paths.get(URI.create("jrt:/modules"));
    else {
        if(!clURI.getScheme().equals("file")) try {
            FileSystems.getFileSystem(clURI);
        } catch(FileSystemNotFoundException ex) {
            FileSystems.newFileSystem(clURI, Collections.emptyMap());
        }
        String qn = context.getName();
        p = Paths.get(clURI).getParent();
        for(int ix = qn.indexOf('.'); ix>0; ix = qn.indexOf('.', ix+1)) p = p.getParent();
    }
    return getClassHierarchy(p);
}

Then, you can do

Map<String, Set<String>> hierarchy = getClassHierarchy(Number.class);
System.out.println("Direct subclasses of "+Number.class);
hierarchy.getOrDefault("java.lang.Number", Collections.emptySet())
         .forEach(System.out::println);

and get

Direct subclasses of class java.lang.Number
java.lang.Float
java.math.BigDecimal
java.util.concurrent.atomic.AtomicLong
java.lang.Double
java.lang.Long
java.util.concurrent.atomic.AtomicInteger
java.lang.Short
java.math.BigInteger
java.lang.Byte
java.util.concurrent.atomic.Striped64
java.lang.Integer

or

Map<String, Set<String>> hierarchy = getClassHierarchy(Number.class);
System.out.println("All subclasses of "+Number.class);
printAllClasses(hierarchy, "java.lang.Number", "  ");

// helper method, declared at class level
private static void printAllClasses(
        Map<String, Set<String>> hierarchy, String parent, String i) {
    hierarchy.getOrDefault(parent, Collections.emptySet())
        .forEach(x -> {
            System.out.println(i+x);
            printAllClasses(hierarchy, x, i+"  ");
    });
}

to get

All subclasses of class java.lang.Number
  java.lang.Float
  java.math.BigDecimal
  java.util.concurrent.atomic.AtomicLong
  java.lang.Double
  java.lang.Long
  java.util.concurrent.atomic.AtomicInteger
  java.lang.Short
  java.math.BigInteger
  java.lang.Byte
  java.util.concurrent.atomic.Striped64
    java.util.concurrent.atomic.LongAdder
    java.util.concurrent.atomic.LongAccumulator
    java.util.concurrent.atomic.DoubleAdder
    java.util.concurrent.atomic.DoubleAccumulator
  java.lang.Integer
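
Since the question asked for a non-recursive variant, here is a minimal sketch that collects all direct and indirect subclasses from a parent-to-children map of the shape `getClassHierarchy` returns, using a work queue. The class name `AllSubclasses` and the method `allSubclasses` are hypothetical, and the small map in `main` is made-up sample data rather than real scan output.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class AllSubclasses {
    /** Collects direct and indirect subclasses of root with a queue instead of recursion. */
    public static Set<String> allSubclasses(Map<String, Set<String>> hierarchy, String root) {
        Set<String> found = new LinkedHashSet<>();
        Deque<String> toDo = new ArrayDeque<>();
        toDo.add(root);
        while (!toDo.isEmpty()) {
            String parent = toDo.poll();
            for (String child : hierarchy.getOrDefault(parent, Collections.emptySet())) {
                // only enqueue classes not seen before, so each one is processed once
                if (found.add(child)) toDo.add(child);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hand-written sample data (Java 9+ for Map.of/Set.of)
        Map<String, Set<String>> hierarchy = Map.of(
            "java.lang.Number",
                Set.of("java.lang.Long", "java.util.concurrent.atomic.Striped64"),
            "java.util.concurrent.atomic.Striped64",
                Set.of("java.util.concurrent.atomic.LongAdder"));
        System.out.println(allSubclasses(hierarchy, "java.lang.Number"));
    }
}
```

Because the whole hierarchy is already in memory, each lookup is a map access, so the file system only has to be walked once no matter how deep the inheritance chain goes.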



Answer 2:


DISCLAIMER: This solution might not work if you have several classes with the same name, as it does not take package names into account.

I think you can do it by keeping track of the classes still to look up in a List and running a while loop until every entry in the list has been explored.

Here is a bit of code which creates a Map<String, List<String>>, where the key is a class name and the value is the list of its direct subclasses.

public class Test {

    private static Path appPath = //your path

    private static Map<String, List<String>> classes = new HashMap<>();
    private static List<String> classesToDo = new ArrayList<>();

    public static void main(String[] args) throws IOException {

        classesToDo.add("AnswerValueValidatorBase");

        while(classesToDo.size() > 0) {
            String className = classesToDo.remove(0);
            List<String> subclasses = getDirectSubclasses(className);
            if(subclasses.size() > 0)
                classes.put(className, subclasses);
            classesToDo.addAll(subclasses);
        }

        System.out.println(classes);
    }

    private static List<String> getDirectSubclasses(String className) throws IOException {
        List<String> subclasses = new ArrayList<>();
        Files.walk(appPath)
             .filter(p -> Files.isRegularFile(p) && p.toString().endsWith(".java"))
             .forEach(path -> {
                 try {
                      List<String> lines = Files.readAllLines(path);
                      Pattern pattern = Pattern.compile("\\bextends "+className+"\\b");
                      Matcher matcher = pattern.matcher(lines.stream().collect(Collectors.joining(" ")));
                      boolean isChild = matcher.find();
                      if(isChild) {
                          String fileName = path.getFileName().toString();
                          String clazzName = fileName.substring(0, fileName.lastIndexOf("."));
                          subclasses.add(clazzName);
                      }
                  } catch(IOException e) {
                  //handle IOE
                  }
        });

        return subclasses;
    }
}

Running it on my project returns something that looks correct:

{
    AnswerValueValidatorBase=[SingleNumericValidator, DefaultValidator, RatingValidator, ArrayValidatorBase, DocumentValidator],
    ArrayValidatorBase=[MultiNumericValidator, StringArrayValidator, IntegerArrayValidator, MultiCheckboxValidator], 
    DefaultValidator=[IntegerValidator, DateValidator, StringValidator, CountryValidator, PercentageValidator], 
    IntegerArrayValidator=[MultiPercentageValidator, RankValidator, MultiDropValidator, MultiRadioValidator, CheckboxValidator], 
    SingleNumericValidator=[SliderValidator], 
    MultiNumericValidator=[MultiSliderValidator], 
    StringArrayValidator=[MultiTextValidator, ChecklistValidator]
}

EDIT

A recursive way of doing it would be:

public class Test {

    private static Path appPath = // your path 

    public static void main(String[] args) throws IOException {

        List<String> classesToDo = new ArrayList<>();
        classesToDo.add("AnswerValueValidatorBase");

        Map<String, List<String>> classesMap = getSubclasses(new HashMap<>(), classesToDo);

        System.out.println(classesMap);
    }

    private static Map<String, List<String>> getSubclasses(Map<String, List<String>> classesMap, List<String> classesToDo) throws IOException {
        if(classesToDo.size() == 0) {
            return classesMap;
        } else {
            String className = classesToDo.remove(0);
            List<String> subclasses = getDirectSubclasses(className);
            if(subclasses.size() > 0)
                classesMap.put(className, subclasses);
            classesToDo.addAll(subclasses);
            return getSubclasses(classesMap, classesToDo);
        }
    }

    private static List<String> getDirectSubclasses(String className) throws IOException {
        // same as above
    }

}
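
Both versions above walk the whole file tree once per class name, which can get slow on a large code base. Here is a rough sketch that reads every source file only once and records each "class X extends Y" declaration it finds, so nested and non-public classes are picked up by their declared name rather than the file name. `SourceHierarchy`, `scan`, and the regex are my own assumptions, not a tested parser: the pattern will also match inside comments and string literals, and like the code above it ignores packages.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class SourceHierarchy {
    // Matches "class Child extends Parent"; type parameters after Child are skipped.
    private static final Pattern EXTENDS = Pattern.compile(
        "\\bclass\\s+(\\w+)\\s*(?:<[^{]*?>)?\\s+extends\\s+([\\w.]+)");

    /** Walks root once and maps each parent's simple name to the classes extending it. */
    public static Map<String, Set<String>> scan(Path root) throws IOException {
        Map<String, Set<String>> hierarchy = new HashMap<>();
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(p -> Files.isRegularFile(p) && p.toString().endsWith(".java"))
                 .forEach(p -> {
                     try {
                         String source = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
                         Matcher m = EXTENDS.matcher(source);
                         while (m.find()) {
                             String parent = m.group(2);
                             // reduce a qualified name such as pkg.Parent to its simple name
                             parent = parent.substring(parent.lastIndexOf('.') + 1);
                             hierarchy.computeIfAbsent(parent, k -> new TreeSet<>()).add(m.group(1));
                         }
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        }
        return hierarchy;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary directory with one made-up source file
        Path dir = Files.createTempDirectory("src");
        Files.write(dir.resolve("A.java"),
            "public class A extends Base {}\nclass Hidden extends A {}\n"
                .getBytes(StandardCharsets.UTF_8));
        System.out.println(scan(dir));
    }
}
```

The resulting map can then be traversed in memory with the same while loop or recursion as above, starting from the base class name, without touching the file system again.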


Source: https://stackoverflow.com/questions/49896798/find-direct-and-indirect-subclasses-by-scanning-filesystem
