This is the input I have generated, which shows the versions of the courses for both Jany and Marco at different times.
Try this:
BEGIN {
    # records are separated by empty lines
    RS = ""
    # fields are separated by newlines, so each record has 3 fields
    FS = "\n"
}
{
    # remove undesired parts of the first line of every record
    sub("the course of ", "", $1)
    sub(" is :", "", $1)
    sub("on ", "", $1)
    # what remains is "<time> <course title>"; store it in time and course
    time = $1
    course = $1
    # remove the time (first word) to extract the course title
    sub("^[^ ]* ", "", course)
    # remove everything after the first word to retrieve the time
    # (this avoids using the course title as a regex and leaves no trailing space)
    sub(" .*$", "", time)
    # get theory info from the second line of the record
    sub("course:theory:", "", $2)
    # get application info from the third line
    # (the spelling of the prefix must match the input exactly)
    sub("course:applicaton:", "", $3)
    # if this is a new course
    if (!(course in header)) {
        # save header information (first word of each line in the output)
        header[course] = course
        theory[course] = "theory"
        app[course] = "application"
    }
    # append the relevant info to the output strings
    header[course] = header[course] "," time
    theory[course] = theory[course] "," $2
    app[course] = app[course] "," $3
}
END {
    # for each course found (awk does not guarantee the iteration order)
    for (key in header) {
        # print the strings constructed
        print header[key]
        print theory[key]
        print app[key]
        print ""
    }
}
I hope the comments are self-explanatory. If you have questions about the script, be sure to ask them.
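
In case the record handling is unclear, here is a minimal sketch of the two ideas the script relies on: reading blank-line-separated records (RS="" with FS="\n") and appending values to one string per key. The sample data (keys x and y) and the names row, order and n are made up for illustration and are not part of your input. The order array is an optional extra that prints the keys in the order they were first seen, which a plain for (key in array) loop does not promise.

```shell
# Made-up sample: three records separated by blank lines,
# each with a key on line 1 and a value on line 2.
out=$(printf 'x\n1\n\ny\n2\n\nx\n3\n' | awk '
BEGIN {
    RS = ""      # a blank line ends a record
    FS = "\n"    # each line of a record is one field
}
{
    key = $1
    # first time this key is seen: start its row and remember its position
    if (!(key in row)) {
        row[key] = key
        order[++n] = key
    }
    # append the value from the second line to the row of this key
    row[key] = row[key] "," $2
}
END {
    # print the rows in first-seen order
    for (i = 1; i <= n; i++)
        print row[order[i]]
}')
printf '%s\n' "$out"
# prints:
# x,1,3
# y,2
```

The same bookkeeping could be added to the script above if the courses need to come out in input order.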