I am working on a code generation task that uses ML to generate GUI code from screenshots. After generating the DSL code successfully, I find it difficult to measure the accura