I am looking to parse iCalendar files using C. I have an existing structure set up, am already reading everything in, and want to parse the file line by line with components.
For example I
This answer assumes that you want to roll your own parser in Standard C. In practice it is usually better to use an existing parser, because its authors have already thought of and handled all the weird cases that can come up.
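My high-level approach would be:

- Write a function `parse_line` that handles one content line, such as `DTSTART;TZID=America/New_York:20130802T103400`.
- In it, use `strcspn` on the pointer to identify the location of the first `:` or `;` (aborting if no marker is found). The text before the marker is the property name.
- If the marker is `;`, the line has parameters: call `extract_name_value_pair`, passing the address of your parsing pointer. That function extracts one `NAME=value` pair and moves the pointer to the `;` or `:` following the entry. Of course this function must handle quote marks in the value, and the fact that there might be `;` or `:` inside a quoted value. Repeat until you reach the `:`.
- Everything after that `:` is the property value (which may itself contain `:`, as in `mailto:` addresses). Pass it to `parse_csv`, which will look for comma-separated values (again, being aware of quote marks, and of backslash escapes such as `\,` in text values) and store the results it finds in the right place.

The functions `parse_csv` and `extract_name_value_pair` should in fact be developed and tested first. Make a test suite and check that they work properly. Then write your overall parser function, which calls those functions as needed.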
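A minimal sketch of `parse_line` and `extract_name_value_pair`, under some stated assumptions: the line has already been unfolded (iCalendar folds long lines with CRLF plus a space) and stripped of its line ending; the results point into the caller's buffer so nothing is allocated; and a fixed `MAX_PARAMS` limit is used. `struct ical_param`, `struct ical_line` and `MAX_PARAMS` are hypothetical names. Because this version writes `'\0'` terminators into the buffer, it overwrites the delimiter, so `extract_name_value_pair` returns the delimiter it found and leaves the pointer just past it, rather than on it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PARAMS 8   /* hypothetical fixed limit, to avoid allocating here */

/* Hypothetical result types: every pointer points into the caller's line
   buffer, so parsing itself never allocates memory. */
struct ical_param {
    char *name;    /* e.g. "TZID" */
    char *value;   /* e.g. "America/New_York", surrounding quotes removed */
};

struct ical_line {
    char *name;                            /* e.g. "DTSTART" */
    struct ical_param params[MAX_PARAMS];
    size_t nparams;
    char *value;                           /* text after the first unquoted ':' */
};

/* Parses one "NAME=value" parameter starting at *pp, writing '\0'
   terminators into the buffer. That overwrites the ';' or ':' that ends
   the entry, so the delimiter is returned (0 on malformed input) and *pp
   is left just past it. */
char extract_name_value_pair(char **pp, struct ical_param *out)
{
    char *p = *pp;
    size_t n = strcspn(p, "=;:");
    if (p[n] != '=')
        return 0;                          /* parameter without '=' */
    p[n] = '\0';
    out->name = p;
    p += n + 1;

    if (*p == '"') {                       /* quoted value may contain ; : , */
        char *end = strchr(p + 1, '"');
        if (end == NULL)
            return 0;                      /* unterminated quote */
        *end = '\0';
        out->value = p + 1;
        p = end + 1;
    } else {
        out->value = p;
        p += strcspn(p, ";:");
    }

    char delim = *p;
    if (delim != ';' && delim != ':')
        return 0;                          /* end of line, or junk after a quote */
    *p = '\0';
    *pp = p + 1;
    return delim;
}

/* Splits one already-unfolded content line in place.
   Returns 0 on success, -1 on malformed input. */
int parse_line(char *line, struct ical_line *out)
{
    assert(line != NULL && out != NULL);
    size_t n = strcspn(line, ";:");
    if (line[n] == '\0')
        return -1;                         /* no ':' or ';' marker: abort */

    char delim = line[n];
    line[n] = '\0';
    out->name = line;
    out->nparams = 0;

    char *p = line + n + 1;
    while (delim == ';') {                 /* one parameter per iteration */
        if (out->nparams == MAX_PARAMS)
            return -1;
        delim = extract_name_value_pair(&p, &out->params[out->nparams]);
        if (delim == 0)
            return -1;
        out->nparams++;
    }
    out->value = p;                        /* may itself contain ':' */
    return 0;
}
```

For example, `ATTENDEE;ROLE=REQ-PARTICIPANT;CN="Doe; John":mailto:jdoe@example.com` yields the name `ATTENDEE`, two parameters (the quoted `CN` value keeps its `;`), and the value `mailto:jdoe@example.com`. Each function can be tested on its own before the two are combined.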
Also, write all the memory allocation code as separate functions. Think of what data structure you want to store your parsed result in. Then code up that data structure, and test it, entirely independently of the parsing code. Only then, write the parsing code and call functions to insert the resulting data in the data structure.
You really don't want to have memory management code mixed up with parsing code. That makes it far harder to debug.
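A minimal sketch of keeping the storage separate: a growable list of name/value pairs with its own init/add/free functions, which can be tested without any parsing code at all. The parser would only ever call `prop_list_add`. `struct prop`, `struct prop_list`, the `prop_list_*` functions and `copy_string` are hypothetical names; `copy_string` stands in for `strdup`, which is POSIX and only became part of ISO C in C23.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical storage for parsed properties. */
struct prop {
    char *name;
    char *value;
};

struct prop_list {
    struct prop *items;
    size_t count;
    size_t capacity;
};

/* Copies a NUL-terminated string into newly allocated memory. */
static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

void prop_list_init(struct prop_list *list)
{
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}

/* Appends copies of name and value, doubling the array when it is full.
   Returns 0 on success, -1 if memory could not be allocated. */
int prop_list_add(struct prop_list *list, const char *name, const char *value)
{
    assert(list != NULL && name != NULL && value != NULL);
    if (list->count == list->capacity) {
        size_t newcap = list->capacity ? list->capacity * 2 : 8;
        struct prop *grown = realloc(list->items, newcap * sizeof *grown);
        if (grown == NULL)
            return -1;
        list->items = grown;
        list->capacity = newcap;
    }
    char *n = copy_string(name);
    char *v = copy_string(value);
    if (n == NULL || v == NULL) {
        free(n);
        free(v);
        return -1;
    }
    list->items[list->count].name = n;
    list->items[list->count].value = v;
    list->count++;
    return 0;
}

/* Releases everything the list owns and leaves it empty and reusable. */
void prop_list_free(struct prop_list *list)
{
    for (size_t i = 0; i < list->count; i++) {
        free(list->items[i].name);
        free(list->items[i].value);
    }
    free(list->items);
    prop_list_init(list);
}
```

Because the list copies its inputs, the parser is free to write `'\0'` terminators into its own line buffer and reuse that buffer for the next line.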
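When writing a function that accepts a string (e.g. all three functions named above, plus any other helpers you decide you need), you have a couple of options for its interface:

- take a null-terminated string, which may mean writing a `'\0'` into your buffer at the end of the piece you want to pass; or
- take a pointer plus a length, leaving the buffer untouched.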
Each way has its pros and cons: it's annoying to write null terminators everywhere and then undo them later if need be; but it's also annoying when you want to use `strcspn` or other string functions and you have received a length-counted piece of string.
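As a sketch of the pointer-plus-length style, here is one possible `parse_csv` that splits a length-counted slice on commas without modifying or copying it. Since the slice need not end in `'\0'`, it walks a known length instead of calling `strcspn`. Commas inside double quotes, or escaped as `\,` (the escape iCalendar uses in TEXT values), do not split. `struct slice` and this exact signature are hypothetical; fields come back as pointer/length pairs into the original text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pointer+length view of part of a string; not NUL-terminated. */
struct slice {
    const char *ptr;
    size_t len;
};

/* Splits s[0..len) on commas. Stores at most max fields in out and returns
   the total number of fields found, or -1 if a double quote is left open.
   Quotes and backslashes are kept in the returned fields, not removed. */
int parse_csv(const char *s, size_t len, struct slice *out, size_t max)
{
    assert(s != NULL);
    size_t count = 0;
    size_t start = 0;
    int in_quotes = 0;

    for (size_t i = 0; i < len; i++) {
        if (s[i] == '\\' && i + 1 < len) {
            i++;                           /* skip the escaped character */
        } else if (s[i] == '"') {
            in_quotes = !in_quotes;
        } else if (s[i] == ',' && !in_quotes) {
            if (count < max)
                out[count] = (struct slice){ s + start, i - start };
            count++;
            start = i + 1;
        }
    }
    if (in_quotes)
        return -1;
    if (count < max)                       /* the last field has no comma after it */
        out[count] = (struct slice){ s + start, len - start };
    return (int)(count + 1);
}
```

The trade-off is visible in the caller: it can hand over the value part of a line without terminating it first, but any field it wants to pass to `strcmp` or `printf("%s", ...)` must first be copied out or printed with `%.*s`.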
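Also, when the function needs to let the caller know how much text it consumed, you have two options:

- take the address of the caller's parsing pointer and advance it (as with `extract_name_value_pair` above); or
- return the number of characters consumed and let the caller advance its own pointer.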
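The two styles side by side, as a small sketch on a hypothetical helper that scans a property name up to the first `;` or `:`. `scan_name` and `skip_name` are illustrative names, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Style 1: return how many characters were consumed;
   the caller advances its own pointer. */
size_t scan_name(const char *s)
{
    assert(s != NULL);
    return strcspn(s, ";:");
}

/* Style 2: advance the caller's pointer directly. The return value is then
   free to report something else, here the character that stopped the scan
   (';', ':' or '\0' if no marker was found). */
char skip_name(const char **pp)
{
    assert(pp != NULL && *pp != NULL);
    *pp += strcspn(*pp, ";:");
    return **pp;
}
```

Style 1 composes neatly as `p += scan_name(p);`, while style 2 keeps the bookkeeping inside the helper and frees up the return value, which is handy when a caller needs to know which delimiter it hit.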
There's no single right answer; with experience you will get better at deciding which option leads to the cleanest code.