I have to write a program that takes a user's chemical equation as input, like 12 CO2 + 6 H2O -> 2 C6H12O6 + 12 O2, and checks whether the number of atoms is the same on both sides.
The way a proper parser, such as one generated by ANTLR, works is to 1) convert the text into a stream of lexical tokens, then 2) parse the tokens with lookahead into a parse tree.
Lookahead is useful to know when to "end" a particular structural level of parsing.
For your requirements, you might be able to skip the distinction between lexing and parsing and just parse from the text directly -- however, an appreciation and use of lookahead would potentially be useful.
In particular, a buffer that holds the upcoming (remaining) text, tests matches (e.g. regex) against it, and consumes matches from the front could be useful. This could be implemented either by modifying the remaining string or by advancing an index within it.
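As a rough illustration of such a buffer, here is a minimal sketch that takes the "advance an index" route. The method names lookaheadMatch and consumeMatch are taken from the pseudocode in this answer; everything else (the atEnd helper, the choice to skip whitespace before every match, the exception type) is an assumption made for the example, not part of the original design.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical lookahead buffer: tests regexes against the front of the remaining text. */
final class ParseBuffer {
    private final String text;
    private int pos = 0; // index of the first character that has not been consumed yet

    ParseBuffer(String text) {
        this.text = text;
    }

    // Assumption: whitespace between tokens is never significant, so skip it before matching.
    private void skipWhitespace() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    // Builds a matcher restricted to the unconsumed part of the text.
    private Matcher matcherAtFront(String regex) {
        skipWhitespace();
        Matcher m = Pattern.compile(regex).matcher(text);
        m.region(pos, text.length());
        return m;
    }

    /** Returns true if the regex matches at the front of the remaining text; consumes nothing but whitespace. */
    boolean lookaheadMatch(String regex) {
        return matcherAtFront(regex).lookingAt();
    }

    /** Consumes the match at the front of the remaining text and returns the matched string. */
    String consumeMatch(String regex) {
        Matcher m = matcherAtFront(regex);
        if (!m.lookingAt()) {
            throw new IllegalStateException("expected " + regex + " at position " + pos);
        }
        pos = m.end(); // end() is an absolute index into text, even when a region is set
        return m.group();
    }

    /** Returns true once only whitespace (or nothing) remains. */
    boolean atEnd() {
        skipWhitespace();
        return pos == text.length();
    }
}
```

Because lookingAt() anchors the match at the start of the region, a pattern such as \\d+ only succeeds when the digits are the very next token, which is exactly the lookahead behaviour described above.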
Given such a buffer, your pseudocode might look like:
class EquationParser {
protected ParseBuffer buffer;
// parse Equation;
// -- of form "Sum -> Sum".
// -- eg. "12 CO2 + 6 H2O -> 2 C6H12O6 + 12 O2"
public Equation parseEquation() {
Sum left = parseSum();
matchAndConsume("->");
Sum right = parseSum();
return new Equation( left, right);
}
// parse Sum;
// -- eg. "12 CO2 + 6 H2O"
public Sum parseSum() {
// parse 1 or more Product terms;
Sum result = new Sum();
result.add( parseProduct());
while (buffer.lookaheadMatch("\\+")) {
buffer.consumeMatch("\\+");
result.add( parseProduct());
}
return result;
}
// parse Product term;
// -- of form "N formula", or just "formula".
// -- eg. "12 CO2" or "CO2"
public Product parseProduct() {
int quantity = 1;
if (buffer.lookaheadMatch("\\d+")) {
quantity = Integer.parseInt( buffer.consumeMatch("\\d+"));
}
Formula formula = parseFormula();
return new Product( quantity, formula);
}
// parse Formula;
// -- eg. "C6H12O6" or "CO2"
public Formula parseFormula() {
Formula result = new Formula();
result.add( parseTerm());
while (buffer.lookaheadMatch("[A-Z][a-z]?\\d*")) {
result.add( parseTerm());
}
return result;
}
// parse Term;
// -- eg. "C6", "C", "Co6", or "Co"
public Term parseTerm() {
// ... reader exercise to implement...
}
protected void matchAndConsume (String patt) {
if (! buffer.lookaheadMatch( patt))
throw new ParseFailed("parse failed: expected "+patt);
buffer.consumeMatch( patt);
}
}
This is conceptual example code, not tested & does not include the buffer or complete parser -- it is the reader's job to flesh these out to a complete solution.
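For anyone who wants a starting point for the parseTerm exercise, one possible shape is sketched below. It is a standalone simplification: it works on a plain String instead of the buffer, and the Term value class and its fields are hypothetical stand-ins for whatever your own Term looks like. Inside the parser, the same pattern could be passed to buffer.consumeMatch instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TermSketch {
    // Element symbol (one uppercase letter, optional lowercase letter) followed by an optional count.
    private static final Pattern TERM = Pattern.compile("([A-Z][a-z]?)(\\d*)");

    /** Hypothetical value type standing in for the Term class of the pseudocode. */
    static final class Term {
        final String element;
        final int count;

        Term(String element, int count) {
            this.element = element;
            this.count = count;
        }
    }

    /** Parses one term at the start of the input; a missing count is treated as 1. */
    static Term parseTerm(String input) {
        Matcher m = TERM.matcher(input);
        if (!m.lookingAt()) {
            throw new IllegalArgumentException("expected an element symbol at: " + input);
        }
        int count = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
        return new Term(m.group(1), count);
    }
}
```

Only the front of the input is examined, so parseTerm("H12O6") returns the term for H with count 12 and leaves the rest for the next call.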
This question is asking for a simple parser for a simple type of equation. I am assuming that you do not need to support all kinds of irregular equations with parentheses and weird symbols.
Just to be safe, I would use a lot of String.split() instead of regexes.
A (relatively) simple solution would do the following:
Split the equation into its two sides on ->
Split each side into molecules on +
Each level of parsing can be handily done in a separate method. Using regex is probably the best way to parse the individual molecules, so I borrowed the expression from here: https://codereview.stackexchange.com/questions/2345/simplify-splitting-a-string-into-alpha-and-numeric-parts. The regex is pretty much trivial, so please bear with me:
import java.util.Map;
import java.util.HashMap;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class SimpleChemicalEquationParser
{
// Counts of elements on each side
private Map<String, Integer> left;
private Map<String, Integer> right;
public SimpleChemicalEquationParser(String eqn)
{
this.left = new HashMap<>();
this.right = new HashMap<>();
parse(eqn);
}
public boolean isBalanced()
{
return left.equals(right);
}
public boolean isSimpleBalanced()
{
return leftCount() == rightCount();
}
public int leftCount()
{
return left.values().stream().mapToInt(Integer::intValue).sum();
}
public int rightCount()
{
return right.values().stream().mapToInt(Integer::intValue).sum();
}
private void parse(String eqn)
{
String[] sides = eqn.split("->");
if(sides.length != 2) {
throw new RuntimeException("Check your equation. There should be exactly one -> symbol somewhere");
}
parseSide(sides[0], this.left);
parseSide(sides[1], this.right);
}
private void parseSide(String side, Map<String, Integer> counter)
{
String[] molecules = side.split("\\+");
for(String molecule : molecules) {
parseMolecule(molecule, counter);
}
}
private void parseMolecule(String molecule, Map<String, Integer> counter)
{
molecule = molecule.trim();
Matcher matcher = Pattern.compile("([a-zA-Z]+)\\s*([0-9]*)").matcher(molecule);
int multiplier = 1;
int endIndex = 0;
while(matcher.find()) {
String separator = molecule.substring(endIndex, matcher.start()).trim();
if(!separator.isEmpty()) {
// Check if there is a premultiplier before the first element
if(endIndex == 0) {
String multiplierString = molecule.substring(0, matcher.start()).trim();
try {
multiplier = Integer.parseInt(multiplierString);
} catch(NumberFormatException nfe) {
throw new RuntimeException("Invalid prefix \"" + multiplierString +
"\" to molecule \"" + molecule.substring(matcher.start()) + "\"");
}
} else {
throw new RuntimeException("Nonsensical characters \"" + separator +
"\" in molecule \"" + molecule + "\"");
}
}
parseElement(multiplier, matcher.group(1), matcher.group(2), counter);
endIndex = matcher.end();
}
if(endIndex != molecule.length()) {
throw new RuntimeException("Invalid end to side: \"" + molecule.substring(endIndex) + "\"");
}
}
private void parseElement(int multiplier, String element, String atoms, Map<String, Integer> counter)
{
if(!atoms.isEmpty())
multiplier *= Integer.parseInt(atoms);
if(counter.containsKey(element))
multiplier += counter.get(element);
counter.put(element, multiplier);
}
public static void main(String[] args)
{
// Collect all command line arguments into one equation
StringBuilder sb = new StringBuilder();
for(String arg : args)
sb.append(arg).append(' ');
String eqn = sb.toString();
SimpleChemicalEquationParser parser = new SimpleChemicalEquationParser(eqn);
boolean simpleBalanced = parser.isSimpleBalanced();
boolean balanced = parser.isBalanced();
System.out.println("Left: " + parser.leftCount());
for(Map.Entry<String, Integer> entry : parser.left.entrySet()) {
System.out.println(" " + entry.getKey() + ": " + entry.getValue());
}
System.out.println();
System.out.println("Right: " + parser.rightCount());
for(Map.Entry<String, Integer> entry : parser.right.entrySet()) {
System.out.println(" " + entry.getKey() + ": " + entry.getValue());
}
System.out.println();
System.out.println("Atom counts match: " + simpleBalanced);
System.out.println("Elements match: " + balanced);
}
}
All the work is done by the parse method and its subordinates, which form a sort of virtual call tree. Since this approach makes it especially easy to check that the atoms of each element are actually balanced, I have gone ahead and done that here. This class prints the atom counts on each side of the equation, whether or not the raw totals match, and whether or not the counts match element by element. Here are a couple of example runs:
$ java -cp . SimpleChemicalEquationParser '12 C O2 + 6 H2O -> 2 C6H12O6 + 12 O2'
Left: 54
C: 12
H: 12
O: 30
Right: 72
C: 12
H: 24
O: 36
Atom counts match: false
Elements match: false
$ java -cp . SimpleChemicalEquationParser '12 C O2 + 6 H2O + 6 O3 -> 2 C6H12O6 + 12 O2'
Left: 72
C: 12
H: 12
O: 48
Right: 72
C: 12
H: 24
O: 36
Atom counts match: true
Elements match: false
$ java -cp . SimpleChemicalEquationParser '12 C O2 + 12 H2O -> 2 C6H12O6 + 12 O2'
Left: 72
C: 12
H: 24
O: 36
Right: 72
C: 12
H: 24
O: 36
Atom counts match: true
Elements match: true
Notice that I added a space between C and O in CO2. This is because my current regex for molecules, ([a-zA-Z]+)\\s*([0-9]*), allows any combination of letters to represent an element, so CO would be read as a single element. If your elements are always going to be simple one-letter elements, change this to ([a-zA-Z])\\s*([0-9]*) (remove the + quantifier). If they are going to be properly written symbols of one or two letters, with the second letter always lowercase, use this instead: ([A-Z][a-z]?)\\s*([0-9]*). I recommend the latter option. With either modified version, the space in C O2 is no longer necessary.
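To see what the recommended pattern does on its own, here is a small standalone sketch. The class name ElementRegexDemo and the countAtoms helper are made up for this illustration and are not part of the parser above; a TreeMap is used only so that the printed order is predictable.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ElementRegexDemo {
    // One uppercase letter, an optional lowercase letter, optional spaces, then an optional count.
    private static final Pattern ELEMENT = Pattern.compile("([A-Z][a-z]?)\\s*([0-9]*)");

    /** Counts the atoms of each element in a single formula such as "C6H12O6". */
    static Map<String, Integer> countAtoms(String formula) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = ELEMENT.matcher(formula);
        while (m.find()) {
            // An empty second group means no explicit count, i.e. one atom.
            int n = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
            counts.merge(m.group(1), n, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countAtoms("CO2"));     // {C=1, O=2}
        System.out.println(countAtoms("C6H12O6")); // {C=6, H=12, O=6}
        System.out.println(countAtoms("NaCl"));    // {Cl=1, Na=1}
    }
}
```

Unlike parseMolecule, this sketch does not validate the text between matches: find() silently skips anything it cannot match, so the checks in the full class are still worth keeping.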
Whenever I need to parse some text in Java, I mostly end up just using regular expressions, so I'd recommend you do the same.
You can test regular expressions at regex101.com.
They are also easy to use in Java:
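import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String inputText = ...
final Pattern pattern = Pattern.compile("Some regex code");
final Matcher matcher = pattern.matcher(inputText);
// find() looks for the next match anywhere in the input
if (matcher.find()) {
System.out.println(matcher.group(0));
}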
Inside a regex you can define capturing groups with ( and ) and then grab the results with matcher.group(int).
For example, you could first split the equation into its two sides using (.*) -> (.*).
Then loop over the left and right groups using find with: (\d+) (\w+)(?: \+| -|$).
After that you can use group(1) for the amount and group(2) for the formula.
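Put together, the two steps might look like the sketch below. The class name EquationSplitDemo and the terms helper are invented for the example; the two patterns are the ones given above, written with doubled backslashes as Java string literals require. One caveat that follows from the pattern itself: (\d+) is mandatory, so a term written without a leading amount (plain CO2) is skipped rather than treated as 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EquationSplitDemo {
    private static final Pattern SIDES = Pattern.compile("(.*) -> (.*)");
    private static final Pattern TERM = Pattern.compile("(\\d+) (\\w+)(?: \\+| -|$)");

    /** Returns each "amount formula" pair found on one side, formatted as "amount x formula". */
    static List<String> terms(String side) {
        List<String> result = new ArrayList<>();
        Matcher m = TERM.matcher(side);
        while (m.find()) {
            result.add(m.group(1) + " x " + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher eq = SIDES.matcher("12 CO2 + 6 H2O -> 2 C6H12O6 + 12 O2");
        if (eq.matches()) {
            System.out.println(terms(eq.group(1))); // [12 x CO2, 6 x H2O]
            System.out.println(terms(eq.group(2))); // [2 x C6H12O6, 12 x O2]
        }
    }
}
```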
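If needed, you can also iterate over the second group (the formula) to get the exact element distribution using (\w)(\d?). The first group is then the element. For example, the text CO2 yields two hits: the first hit has group(1) -> C and an empty group(2); the second hit has group(1) -> O and group(2) -> 2. Note that this pattern only handles one-letter symbols and single-digit counts; for a formula like C6H12O6 you would need something like ([A-Z][a-z]?)(\d*).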
Test your regex here: regex101#Q6KMJo