IFC is a variation of the STEP file format used for construction projects. An IFC file contains information about the building being constructed. The file is text based and easy to read. I am trying to parse this information into a Python dictionary. The general format of each line will be similar to the following:
Ideally this should be parsed into #2334, IFCMATERIALLAYERSETUSAGE, #2333,.AXIS2.,.POSITIVE.,-180. I found a solution for part of the problem in "Regex includes two matches in first match" (https://regex101.com/r/RHIu0r/10). However, in some cases the data contains arrays instead of single values, as in the example below:
This case needs to be parsed as #2335, IFCRELASSOCIATESMATERIAL, '2ON6$yXXD1GAAH8whbdZmc', #5,$,$, [#40,#221,#268,#281],#2334, where [#40,#221,#268,#281] is stored in a single variable as an array. The array can appear in the middle or as the last value.
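To illustrate why the array case is awkward, here is a minimal sketch of what a plain comma split does to such an attribute list. The raw form with parentheses around the array is an assumption based on how STEP files write lists, and the names `attrs`, `naive` and `wanted` are only illustrative.

```python
# Assumed raw attribute list: STEP/IFC writes an array as a parenthesised list.
attrs = "'2ON6$yXXD1GAAH8whbdZmc',#5,$,$,(#40,#221,#268,#281),#2334"

# A plain split cuts the array into separate pieces.
naive = attrs.split(",")

# What is wanted instead: the array kept together as one item.
wanted = ["2ON6$yXXD1GAAH8whbdZmc", "#5", "$", "$",
          ["#40", "#221", "#268", "#281"], "#2334"]

print(len(naive), len(wanted))
```

The plain split yields nine pieces, with fragments such as `(#40` and `#281)`, whereas the desired result has six items.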
Would you be able to assist in creating a regular expression to obtain the desired results? I have created https://regex101.com/r/mqrGka/1 with cases to test.
Here's a solution that continues from the point you reached with the regular expression in the test cases:

    # Sample input; splitlines() turns the text into a list of lines
    file = """\
    #2=IFCSPACE(';;);',#1,$);some text);
    """.splitlines()

    import re

    d = dict()
    for line in file:
        m = re.match(r"^#(\d+)\s*=\s*([a-zA-Z0-9]+)\s*\(((?:'[^']*'|[^;'])+)\);", line, re.I|re.M)
        if not m:
            continue                        # skip lines that are not entity definitions
        attr = m.group(3)                   # attribute list string
        values = [m.group(2)]               # first value is the entity type name
        while attr:
            start = 1
            if attr[0] == "'": start += attr.find("'", 1)  # don't split at comma within string
            if attr[0] == "(": start += attr.find(")", 1)  # don't split item within parentheses
            end = attr.find(",", start)     # search for a comma / end of item
            if end < 0: end = len(attr)
            value = attr[1:end-1].split(",") if attr[0] == "(" else attr[:end]
            if value[0] == "'": value = value[1:-1]        # remove quotes
            values.append(value)            # collect the parsed item
            attr = attr[end+1:]             # remove current attribute item
        d[m.group(1)] = values              # store into dictionary
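As a rough check, the same splitting logic can be wrapped in functions and run against the two cases from the question. This is only a sketch: the names `LINE_RE`, `split_attributes` and `parse_line` are hypothetical, and the two input lines are assumed raw forms reconstructed from the desired output in the question, not copied from a real IFC file.

```python
import re

# Same pattern as above: entity id, entity type name, raw attribute string.
LINE_RE = re.compile(
    r"^#(\d+)\s*=\s*([a-zA-Z0-9]+)\s*\(((?:'[^']*'|[^;'])+)\);", re.I
)

def split_attributes(attr):
    """Split a raw attribute string at top-level commas (hypothetical helper)."""
    values = []
    while attr:
        start = 1
        if attr[0] == "'":
            start += attr.find("'", 1)   # jump past the closing quote
        if attr[0] == "(":
            start += attr.find(")", 1)   # jump past the closing parenthesis
        end = attr.find(",", start)      # next comma ends the current item
        if end < 0:
            end = len(attr)
        if attr[0] == "(":
            value = attr[1:end - 1].split(",")   # list items, parentheses dropped
        else:
            value = attr[:end]
            if value.startswith("'"):
                value = value[1:-1]              # drop the surrounding quotes
        values.append(value)
        attr = attr[end + 1:]            # continue with the remaining items
    return values

def parse_line(line):
    """Return (id, [type, attributes...]) or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return m.group(1), [m.group(2)] + split_attributes(m.group(3))

# Assumed inputs, reconstructed from the question's desired output.
sample = [
    "#2334= IFCMATERIALLAYERSETUSAGE(#2333,.AXIS2.,.POSITIVE.,-180.);",
    "#2335= IFCRELASSOCIATESMATERIAL('2ON6$yXXD1GAAH8whbdZmc',#5,$,$,(#40,#221,#268,#281),#2334);",
]
parsed = dict(parse_line(line) for line in sample)
print(parsed["2335"])
```

Note that, as in the code above, the dictionary keys are the entity numbers without the leading `#`. The scan looks only for the first closing parenthesis, so nested lists would probably need a different approach.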