I have a big text file structured in blocks like:
Student = {
PInfo = {
ID = 0001;
Name.First = "Joe";
Na
To parse the file you could define a grammar that describes your input format and use it to generate a parser.
There are many parser generators for Python. For example, you could use Grako, which takes grammars in a variation of EBNF as input and outputs memoizing PEG parsers in Python.
To install Grako, run pip install grako.
Here's a grammar for your format, written in Grako's flavor of EBNF syntax:
(* a file is zero or more records *)
file = { record }* $;
record = name '=' value ';' ;
name = /[A-Z][a-zA-Z0-9.]*/ ;
value = object | integer | string ;
(* an object contains one or more records *)
object = '{' { record }+ '}' ;
integer = /[0-9]+/ ;
string = '"' /[^"]*/ '"';
To generate the parser, save the grammar to a file, e.g. Structured.ebnf, and run:
$ grako -o structured_parser.py Structured.ebnf
This creates the structured_parser module, which can be used to extract the student information from the input:
#!/usr/bin/env python
from structured_parser import StructuredParser

class Semantics(object):
    def record(self, ast):
        # record = name '=' value ';' ;
        # value = object | integer | string ;
        return ast[0], ast[2]  # name, value

    def object(self, ast):
        # object = '{' { record }+ '}' ;
        return dict(ast[1])

    def integer(self, ast):
        # integer = /[0-9]+/ ;
        return int(ast)

    def string(self, ast):
        # string = '"' /[^"]*/ '"';
        return ast[1]

with open('input.txt') as file:
    text = file.read()

parser = StructuredParser()
ast = parser.parse(text, rule_name='file', semantics=Semantics())
students = [value for name, value in ast if name == 'Student']
d = {'{0[Name.First]} {0[Name.Last]}'.format(s['PInfo']):
     dict(School=s['School'], Zip=s['Address']['Zip'])
     for s in students}

from pprint import pprint
pprint(d)
{'Joe Burger': {'School': u'West High', 'Zip': 12345},
'John Smith': {'School': u'East High', 'Zip': 12346}}
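If installing a parser generator is not an option, the same grammar is small enough to translate by hand into a recursive-descent parser. The following is only a sketch in Python 3, not what Grako generates: the function names (tokenize, parse_records, parse_value, parse_file) and the sample input are made up for illustration, and the sample only assumes the fields that the code above reads (PInfo, School, Address).

```python
import re

# One token per match: name, integer, quoted string, or punctuation.
TOKEN = re.compile(r'''
    \s*(?:
        (?P<name>[A-Z][a-zA-Z0-9.]*)
      | (?P<integer>[0-9]+)
      | "(?P<string>[^"]*)"
      | (?P<punct>[={};])
    )
''', re.VERBOSE)

def tokenize(text):
    """Yield (kind, value) tuples; raise SyntaxError on unknown input."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError('unexpected input at offset %d' % pos)
        pos = m.end()
        yield m.lastgroup, m.group(m.lastgroup)

def expect(tokens, pos, token):
    """Check that tokens[pos] is `token` and return the next position."""
    if pos >= len(tokens) or tokens[pos] != token:
        raise SyntaxError('expected %r at token %d' % (token[1], pos))
    return pos + 1

def parse_records(tokens, pos, closing=None):
    """record = name '=' value ';' ; repeated until `closing` or end of input."""
    pairs = []
    while pos < len(tokens) and tokens[pos] != ('punct', closing):
        kind, name = tokens[pos]
        if kind != 'name':
            raise SyntaxError('expected a name, got %r' % name)
        pos = expect(tokens, pos + 1, ('punct', '='))
        value, pos = parse_value(tokens, pos)
        pos = expect(tokens, pos, ('punct', ';'))
        pairs.append((name, value))
    return pairs, pos

def parse_value(tokens, pos):
    """value = object | integer | string ;"""
    if pos >= len(tokens):
        raise SyntaxError('unexpected end of input')
    kind, text = tokens[pos]
    if kind == 'integer':
        return int(text), pos + 1
    if kind == 'string':
        return text, pos + 1
    if (kind, text) == ('punct', '{'):
        pairs, pos = parse_records(tokens, pos + 1, closing='}')
        if not pairs:
            raise SyntaxError('an object needs at least one record')
        pos = expect(tokens, pos, ('punct', '}'))
        return dict(pairs), pos
    raise SyntaxError('expected a value, got %r' % text)

def parse_file(text):
    """file = { record }* $ ; returns a list of (name, value) pairs."""
    pairs, _ = parse_records(list(tokenize(text)), 0)
    return pairs

# Hypothetical sample input, shaped like the blocks in the question.
sample = '''
Student = {
    PInfo = {
        ID = 0001;
        Name.First = "Joe";
        Name.Last = "Burger";
    };
    School = "West High";
    Address = {
        Zip = 12345;
    };
};
'''
students = [value for name, value in parse_file(sample) if name == 'Student']
```

The top level is kept as a list of pairs rather than a dict because the file repeats the Student key once per block; nested objects become ordinary dicts, as in the Semantics class above.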
For this kind of task, I use Marpa::R2, a Perl interface to Marpa, a general BNF parser. It lets you describe the text as grammar rules and parse it into a tree of arrays (a parse tree). You can then traverse the tree to save the results as a hash of hashes (a hash is Perl's equivalent of Python's dictionary) or use the tree as is.
I cooked up a working example using your input: parser, result tree.
Hope this helps.
P.S. Example of ast_traverse(): Parse values from a block of text based on specific keys
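For readers who prefer Python, the "traverse the tree into a hash of hashes" step might look roughly like the sketch below. Note that the tree shape here (nested [name, value] lists) is an assumption made for illustration, not the actual output format of Marpa::R2, and tree_to_dict is a hypothetical helper name.

```python
def tree_to_dict(node):
    """Turn a list of [name, value] pairs into a dict, recursing into nested lists."""
    result = {}
    for name, value in node:
        if isinstance(value, list):
            value = tree_to_dict(value)  # nested block -> nested dict
        result[name] = value
    return result

# Hypothetical parse tree for one Student block (assumed shape).
tree = [
    ['PInfo', [['ID', 1], ['Name.First', 'Joe'], ['Name.Last', 'Burger']]],
    ['School', 'West High'],
    ['Address', [['Zip', 12345]]],
]
student = tree_to_dict(tree)
```

After the conversion, fields can be looked up by key, e.g. student['Address']['Zip'].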
It's not JSON, but it is similarly structured. You should be able to reformat it into JSON.
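One way to do that reformatting is to rewrite the text token by token and then hand the result to the json module. This is a minimal sketch in Python 3 under a few assumptions: the sample input is made up to resemble the question's blocks, to_json_text is a hypothetical helper, strings never contain a double quote, and characters the token pattern does not recognize are silently skipped rather than reported.

```python
import json
import re

# Quoted strings first, so punctuation inside a string is never rewritten.
TOKEN = re.compile(r'"[^"]*"|[A-Za-z][A-Za-z0-9.]*|[0-9]+|[={};]')

def to_json_text(text):
    """Rewrite 'Name = value;' blocks as a JSON array of one-key objects."""
    out, depth = [], 0
    for tok in TOKEN.findall(text):
        if tok == '{':
            depth += 1
            out.append('{')
        elif tok == '}':
            depth -= 1
            if out and out[-1] == ',':
                out.pop()                    # JSON forbids trailing commas
            out.append('}')
        elif tok == '=':
            out.append(':')
        elif tok == ';':
            # A top-level ';' closes the wrapper object opened at the name.
            out.append('},' if depth == 0 else ',')
        elif tok.startswith('"'):
            out.append(json.dumps(tok[1:-1]))  # re-escape the string for JSON
        elif tok.isdigit():
            out.append(str(int(tok)))        # 0001 -> 1 (no leading zeros in JSON)
        else:
            # A name: keys must be quoted; top-level names open a wrapper object.
            out.append(('{' if depth == 0 else '') + json.dumps(tok))
    if out and out[-1].endswith(','):
        out[-1] = out[-1][:-1]               # drop the comma after the last block
    return '[' + ''.join(out) + ']'

# Hypothetical sample input, shaped like the blocks in the question.
sample = '''
Student = {
    PInfo = { ID = 0001; Name.First = "Joe"; Name.Last = "Burger"; };
    School = "West High";
};
Student = {
    PInfo = { ID = 0002; Name.First = "John"; Name.Last = "Smith"; };
    School = "East High";
};
'''
data = json.loads(to_json_text(sample))
```

Each top-level block becomes its own single-key object inside an array, because a JSON object cannot sensibly hold the repeated Student key.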