These terms were used in my data structures textbook, but the explanation was very terse and unclear. I think it has something to do with how much knowledge the algorithm has about the input.
An online algorithm processes its input piece by piece and does not know the complete input (or even its size) when it starts.
An often-used example is scheduling: you have a set of machines, each with a specific speed, and an unknown workload. You want to finish the workload as fast as possible. But since you don't know the whole input in advance (you can often see only the next item in the queue), you can only estimate which machine is best for the current item. This can lead to a non-optimal distribution of the workload, since you cannot make any assumptions about future input.
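The online strategy described above can be sketched as a greedy scheduler: each job is assigned, as it arrives, to the machine that would finish it earliest, with ties broken toward the faster machine. This is a minimal illustrative sketch, not a standard library API; the function name and tie-breaking rule are my own choices.

```python
def schedule_online(speeds, jobs):
    """Greedy online scheduler.

    speeds[i] = weight the i-th machine processes per hour.
    jobs is consumed one item at a time; future jobs are unknown.
    Returns the makespan (time at which all work is done)."""
    finish = [0.0] * len(speeds)  # current finish time of each machine
    for w in jobs:
        # Pick the machine that finishes this job earliest;
        # on a tie, prefer the faster machine (as in the example below).
        i = min(range(len(speeds)),
                key=lambda i: (finish[i] + w / speeds[i], -speeds[i]))
        finish[i] += w / speeds[i]
    return max(finish)
```

With one machine of speed 1 and one of speed 2, and jobs of weight 1, 1, 3, this greedy rule sends both light jobs to the fast machine and ends up with a makespan of 2.5 hours.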
An offline algorithm, on the other hand, works with complete input data: the whole workload must be known before the algorithm starts processing.
Workload:
  1. Unit (weight: 1)
  2. Unit (weight: 1)
  3. Unit (weight: 3)

Machines:
  1. Machine (1 weight/hour)
  2. Machine (2 weights/hour)

Possible result (online):
  1. Unit -> 2. Machine  // 2. Machine now has a workload of 30 minutes
  2. Unit -> 2. Machine  // 2. Machine now has a workload of one hour
  either
  3. Unit -> 1. Machine  // 1. Machine now has a workload of three hours
  or
  3. Unit -> 2. Machine  // 2. Machine now has a workload of 2.5 hours
  ==> the work is done after 2.5 hours

Possible result (offline):
  1. Unit -> 1. Machine  // 1. Machine now has a workload of one hour
  2. Unit -> 1. Machine  // 1. Machine now has a workload of two hours
  3. Unit -> 2. Machine  // 2. Machine now has a workload of 1.5 hours
  ==> the work is done after 2 hours
Note that the better result in the offline case is only possible because we deliberately avoid the faster machine at the start: we already know that a heavy unit (unit 3) is coming, so that unit should be processed by the fastest machine.
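Because an offline algorithm sees all jobs up front, the example's instance is small enough to try every assignment of jobs to machines and keep the best makespan. This brute-force sketch (function name is my own) confirms the 2-hour result above; it is exponential in the number of jobs, so it only illustrates the idea.

```python
from itertools import product

def best_offline_makespan(speeds, jobs):
    """Offline baseline: the whole job list is known in advance,
    so enumerate every job-to-machine assignment and return the
    smallest achievable makespan."""
    best = float("inf")
    for assignment in product(range(len(speeds)), repeat=len(jobs)):
        finish = [0.0] * len(speeds)
        for w, m in zip(jobs, assignment):
            finish[m] += w / speeds[m]
        best = min(best, max(finish))
    return best
```

For the example instance (speeds 1 and 2, jobs 1, 1, 3) the optimum is 2.0 hours, beating the online result of 2.5 hours.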