This is the input I have generated; it shows the versions of the courses for both Jany and Marco at different times.
With GNU awk for true multi-dimensional arrays and sorted_in:
$ cat tst.awk
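BEGIN { RS=""; FS="[[:space:]:]+" }    # paragraph mode; split fields on whitespace or colons
{
    # step through the fields 3 at a time from field 11:
    # $i is the course type and $(i+1) is its value
    for (i=11; i<=NF; i+=3) {
        sched[$7" "$8][$2":"$3][$i] = $(i+1)    # sched[name][time][course]
        courses[$i]                             # remember every course type seen
    }
}
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"      # visit indices in ascending string order
    for (name in sched) {
        # header row: the name followed by each of its times
        printf "%s", name
        for (time in sched[name]) {
            printf ",%s", time
        }
        print ""
        # one row per course type, with one value per time
        for (course in courses) {
            printf "%s", course
            for (time in sched[name]) {
                printf ",%s", sched[name][time][course]
            }
            print ""
        }
        print ""
    }
}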
$ gawk -f tst.awk file
Marco 1,10:00,14:00
applicaton,halfhour,onehours
theory,geo,programmation
Marco 2,10:00,14:00
applicaton,nothing,nothing
theory,history,philosophy
jany 1,10:00,14:00
applicaton,onehour,twohours
theory,nothing,nothing
jany 2,10:00,14:00
applicaton,twohour,twohours
theory,math,music
It doesn't produce exactly your posted expected output, but I think that's because your posted expected output is wrong. For example, compare the output for jany 1 application 14:00 with your input: the input is twohours, which is what my script produces, but you say the expected output is halfhour.