I already know what a self-join does. Thank you, I also read all the other computerised operational descriptions on stack overflow, so I know this is not actually a
The reason why the employee-manager example is so common is that it hits the nail on the head. A self-join on a table looks for pairs of rows, like any join, but with both rows coming from the same table. Nothing special really.
The database designer gives each base table a predicate (sentence template parameterized by column names).
Parent(person, child) -- person PERSON is parent of person CHILD
Likes(person, food) -- person PERSON likes food FOOD
Relational algebra is designed so that the value of a relational expression (base table name or operator call) holds the rows that make a true proposition (statement) from its predicate.
/* (PERSON, CHILD) rows where
person PERSON is parent of person CHILD
*/
Parent
The predicate of an expression that is a call to operator NATURAL JOIN is the AND of the predicates of its inputs.
/* (PERSON, CHILD, FOOD) rows where
person PERSON is parent of person CHILD AND person PERSON likes food FOOD
*/
Parent NATURAL JOIN Likes
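To make the "NATURAL JOIN is AND" reading concrete, here is a minimal Python sketch. It assumes a relation can be modelled as a set of rows, each row a frozenset of (column, value) pairs. The helpers `rel` and `natural_join` and the sample people and foods are hypothetical, invented for illustration.

```python
def rel(columns, rows):
    """Build a relation: a set of rows, each a frozenset of (column, value) pairs."""
    return {frozenset(zip(columns, row)) for row in rows}

def natural_join(r, s):
    """Merge every pair of rows that agree on all columns the two inputs share."""
    out = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

# Hypothetical sample data.
parent = rel(("person", "child"), [("Ann", "Bob"), ("Bob", "Cy")])
likes = rel(("person", "food"), [("Ann", "figs"), ("Bob", "figs"), ("Cy", "rice")])

# (PERSON, CHILD, FOOD) rows where
# person PERSON is parent of person CHILD AND person PERSON likes food FOOD
joined = natural_join(parent, likes)
```

With this data, `joined` holds exactly two rows, (Ann, Bob, figs) and (Bob, Cy, figs): the rows that make both input predicates true at once.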
Likewise for the other operators: UNION corresponds to OR, MINUS to AND NOT, PROJECT of column(s) to EXISTS over the other column(s), RESTRICT by a condition to AND of that condition, and RENAME of a column to renaming a parameter.
/* (CHILD, FOOD) rows where
there EXISTS a value for PERSON such that
person PERSON is parent of person CHILD AND person CHILD likes food FOOD
*/
PROJECT child, food (Parent NATURAL JOIN (RENAME person:=child Likes))
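The PROJECT/RENAME expression above can be traced with a small Python sketch under the same assumption that a relation is a set of rows of (column, value) pairs. All helper names (`rel`, `natural_join`, `rename`, `project`) and the sample data are hypothetical.

```python
def rel(columns, rows):
    """Build a relation: a set of rows, each a frozenset of (column, value) pairs."""
    return {frozenset(zip(columns, row)) for row in rows}

def natural_join(r, s):
    """AND: merge rows that agree on all shared columns."""
    out = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

def rename(r, old, new):
    """Rename a parameter: column `old` becomes column `new`."""
    return {frozenset((new if k == old else k, v) for k, v in row) for row in r}

def project(r, columns):
    """EXISTS: keep only `columns`; the dropped columns become 'there exists a value'."""
    return {frozenset((k, v) for k, v in row if k in columns) for row in r}

# Hypothetical sample data.
parent = rel(("person", "child"), [("Ann", "Bob"), ("Bob", "Cy")])
likes = rel(("person", "food"), [("Ann", "figs"), ("Bob", "figs"), ("Cy", "rice")])

# (CHILD, FOOD) rows where there EXISTS a PERSON such that
# person PERSON is parent of person CHILD AND person CHILD likes food FOOD
likes_by_child = rename(likes, "person", "child")
result = project(natural_join(parent, likes_by_child), {"child", "food"})
```

Here `result` contains (Bob, figs) and (Cy, rice). Ann likes figs too, but no row of `parent` names her as a child, so no PERSON exists to satisfy the predicate for her.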
So every query expression's value holds the rows that make its predicate into a true statement.
Suppose we define an algebraic self-join of a table as the NATURAL JOIN of two tables obtained from one original via sequences of zero or more renamings. Per the above, we NATURAL JOIN to get the rows that satisfy the AND of the input predicates. A self-join arises when we want the rows that satisfy a result predicate expressed via predicates that differ only in their parameters/columns.
/* (PERSON, FOOD, CHILD) rows where
person PERSON likes food FOOD AND person CHILD likes food FOOD
*/
Likes NATURAL JOIN (RENAME person:=child Likes)
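As a rough illustration, the same self-join can be written directly as a Python set comprehension whose filter is the predicate itself. Rows here are plain `(person, food)` tuples, and the data is hypothetical.

```python
# Hypothetical sample data: (person, food) rows where person PERSON likes food FOOD.
likes = {("Ann", "figs"), ("Bob", "figs"), ("Cy", "rice")}

# (PERSON, FOOD, CHILD) rows where
# person PERSON likes food FOOD AND person CHILD likes food FOOD.
# Both loops range over the same table; only the variable names differ,
# which is what the RENAME expresses in the algebra.
self_join = {
    (person, food, child)
    for (person, food) in likes
    for (child, other_food) in likes
    if food == other_food
}
```

Note that the result includes rows such as (Ann, figs, Ann): the predicate does not say PERSON and CHILD are different people, so each person is paired with themselves as well.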
There's nothing special about a self-join arising in a given query in a given application other than that.
SQL SELECT DISTINCT statements can be described via algebraic operators, so they also calculate query predicates. First, the columns of each FROM table are RENAMEd by prefixing a table alias (correlation name) and a dot. (SQL NATURAL JOIN doesn't dot common columns.) The renamed tables are NATURAL JOINed. ON and WHERE RESTRICT per a condition. Then the SELECT DISTINCT clause RENAMEs to remove dots from returned columns and PROJECTs away unwanted dotted columns.
We can convert SQL to predicates directly: Dotting input columns renames. NATURAL/CROSS/INNER JOIN, ON & WHERE give AND. Each dot-free result column gives an AND that it equals its dotted version. Finally dropping all dotted columns gives EXISTS.
/* same as above */
/* (PERSON, FOOD, CHILD) rows where
there EXISTS values for P.* & C.* such that
PERSON = P.PERSON AND CHILD = C.PERSON AND FOOD = P.FOOD
AND person P.PERSON likes food P.FOOD
AND person C.PERSON likes food C.FOOD
AND P.FOOD = C.FOOD
*/
SELECT DISTINCT p.person AS person, c.person AS child, p.food AS food
FROM Likes p INNER JOIN Likes c
ON p.food = c.food
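One way to check that this query computes the predicate in its comment is to run it on an in-memory SQLite database from Python and compare the result with a direct evaluation of the predicate. This is only a sketch: the `Likes` table definition and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample data: (person, food) rows where person PERSON likes food FOOD.
likes = [("Ann", "figs"), ("Bob", "figs"), ("Cy", "rice")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Likes (person TEXT, food TEXT)")
conn.executemany("INSERT INTO Likes VALUES (?, ?)", likes)

# The self-join from the text: two aliases, p and c, over the same table value.
rows = set(conn.execute("""
    SELECT DISTINCT p.person AS person, c.person AS child, p.food AS food
    FROM Likes p INNER JOIN Likes c
    ON p.food = c.food
"""))
conn.close()

# Direct evaluation of the predicate, in the query's column order (person, child, food):
# person PERSON likes food FOOD AND person CHILD likes food FOOD.
expected = {
    (person, child, food)
    for (person, food) in likes
    for (child, other_food) in likes
    if food == other_food
}

print(rows == expected)
```

On this data both sides contain the same five rows, which is the sense in which the SQL "calculates the predicate".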
Again: in SQL we say there is a self-join when multiple table aliases of a JOIN are associated with the same table value. In application terms, that means we can express the query's meaning via predicates that differ only in some parameters/columns. Nothing about the application or the tables' meanings needs to be special for this to arise.
See this re query semantics which happens to include a link to this re self-join semantics in particular.