Modeling set-based code-listings of SQL data manipulation operations

一个人的身影 2021-01-19 19:36

Technologies like LINQ do a good job being able to describe relational data queries, with types such as IQueryable, IGrouping, and IOrderedQu

1 answer
  •  余生分开走
    2021-01-19 20:11

    Unlikely as it might seem, the one thing you don't need is "something beyond" relational algebra. It's not a theoretical problem at all, but one of imagination and engineering. The problem you're talking about crosses several domains: programming language, library support, and DBMS. It could be done (and should). But first it needs to be commonly understood as realistic and desirable, and we're not there yet.

    As far as the algebra is concerned, all that's missing is assignment. If you've read Date and Darwen's The Third Manifesto, you may recall that insert/update/delete are just variations on assignment:

    S += f(R)        -- insert
    S += f(R) - g(S) -- update
    S -= f(R)        -- delete
    

    (Python does a fair job of demonstrating that with its built-in set type, btw, except that you don't get relational operators for sets of tuples out of the box.)
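
    To make the Python remark concrete, here is a minimal sketch using plain sets of tuples. The relation contents and attribute layout are made up for illustration, and the update is spelled as "remove the old tuples, then add the new ones", which is only one way to read the shorthand above.

```python
# Relations modeled as Python sets of tuples: (id, name, city).
# All names and rows here are invented for illustration.
S = {(1, "Ann", "Oslo"), (2, "Bob", "Rome")}
R = {(3, "Cyd", "Lima")}

# insert: S += f(R), with f as the identity
S |= R

# update: drop the old versions of the affected tuples, add the new ones
old = {t for t in S if t[0] == 2}                    # tuples to replace
new = {(i, name, "Paris") for (i, name, _) in old}   # their new values
S = (S - old) | new

# delete: S -= (the tuples matching some condition)
S -= {t for t in S if t[2] == "Lima"}

result = sorted(S)
```

    Every step is an assignment to S built from set union and difference; nothing is modified "row by row".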

    So it's not a theoretical problem; the algebra is fine. And you're not asking purely about syntax, either. What you want, it seems to me, is a DBMS that you can manipulate functionally, without SQL -- and SQL generators -- acting as an intermediary. Wouldn't it be nice if the tables in your database appeared as variables in your programming language, and there was a relational-algebra library (for that language) that supported select, project, and join?
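
    As a rough idea of what such a library could look like for purely local data, here is a small sketch. The Relation class and its method names are hypothetical, not an existing package, and it ignores types, keys, and performance entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    heading: tuple      # attribute names, e.g. ("emp", "dept")
    body: frozenset     # set of tuples, one value per attribute

    def rows(self):
        """Yield each tuple as a dict keyed by attribute name."""
        for t in self.body:
            yield dict(zip(self.heading, t))

    def select(self, pred):
        """Restriction: keep the tuples for which pred(row) is true."""
        keep = {t for t in self.body if pred(dict(zip(self.heading, t)))}
        return Relation(self.heading, frozenset(keep))

    def project(self, *attrs):
        """Projection: keep only the named attributes; duplicates collapse."""
        idx = [self.heading.index(a) for a in attrs]
        body = frozenset(tuple(t[i] for i in idx) for t in self.body)
        return Relation(tuple(attrs), body)

    def join(self, other):
        """Natural join on the attributes the two headings share."""
        common = [a for a in self.heading if a in other.heading]
        extra = [a for a in other.heading if a not in common]
        out = set()
        for r in self.rows():
            for s in other.rows():
                if all(r[a] == s[a] for a in common):
                    out.add(tuple(r[a] for a in self.heading)
                            + tuple(s[a] for a in extra))
        return Relation(self.heading + tuple(extra), frozenset(out))


# Usage with invented data: who works in a department located in Oslo?
emp = Relation(("emp", "dept"), frozenset({("Ann", 10), ("Bob", 20), ("Cyd", 10)}))
dept = Relation(("dept", "city"), frozenset({(10, "Oslo"), (20, "Rome")}))
oslo = emp.join(dept).select(lambda r: r["city"] == "Oslo").project("emp")
names = sorted(oslo.body)
```

    Each operator takes relations and returns a relation, so expressions compose the way arithmetic does. What this sketch cannot do is make emp and dept refer to tables that live in a database.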

    For that matter, why not incorporate relational operators into the language proper? Why, 40 years after relational theory was invented, is its use limited to databases? That in fact has been a lament of the database community for decades. Although it's been done -- cf. Datalog, for example -- the surfeit of new languages we've seen in recent years has been notable for continuing the C tradition of no support for set-theoretic operations.

    As it happens, though, just having relations and relational operators built into the language wouldn't be enough. Programming languages generally expect to define their variables, and to own them exclusively. That's practically the definition of a programming language: something that defines and manipulates chunks of memory, the lifetime of which is bounded by the execution of the program. And the interesting data usually starts "out there", somewhere, not in program memory.

    So, what you really, really want is to manipulate data "in the database" as though those tables were program variables (otherwise known as action at a distance), and then some super-convenient, ideally transparent, way to move the results into program memory. Like, oh, assignment. And to make any headway at all in that direction, you need the cooperation of the DBMS.

    To interact with a typical DBMS these days, you formulate your question in its language (usually SQL) and fetch the output row by row into program memory. It's an I/O model: write string, read results. To take that I/O out of the programming model, you need a different API, something more like RPC. If the programming language and the DBMS use the same data model (relations) and functions (relational algebra) and data types, then you have a fighting chance at operating on both remote and local data in the same way.
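
    For reference, this is what the "write string, read results" model looks like with Python's standard sqlite3 module. The in-memory database and the emp table are stand-ins invented for the example; any SQL DBMS client has the same shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # in-memory stand-in for a real DBMS
conn.execute("CREATE TABLE emp (name TEXT, dept INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("Ann", 10), ("Bob", 20), ("Cyd", 10)])

# "Write string": the question is sent as SQL text, in the DBMS's language.
cursor = conn.execute("SELECT name FROM emp WHERE dept = ? ORDER BY name", (10,))

# "Read results": rows come back one at a time and are copied into program memory.
names = []
for (name,) in cursor:
    names.append(name)

conn.close()
```

    The host language never sees a relation here, only a string going out and a stream of rows coming back.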

    That's the suite:

    • language support for relations and relational operations
    • language recognition of local and out-of-machine variables
    • DBMS support to programmatically expose table definitions, such that a compiler/interpreter can "link" to them, as library symbols
    • DBMS support for remote invocation of relational operators, function by function, not statement by statement
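
    To give a feel for the last two items, here is a toy sketch in which an in-process object plays the DBMS. Everything in it (ToyServer, RemoteRelation, the select_eq operator, the handle naming) is hypothetical; no real DBMS exposes this API. The point is only that each operator call is one remote invocation that returns a handle to a relation, and that data moves into program memory in a single explicit step.

```python
class ToyServer:
    """In-process stand-in for a DBMS that exposes relational operators."""

    def __init__(self, tables):
        self.relations = dict(tables)   # name -> (heading, set of tuples)
        self.calls = []                 # log of operator invocations received
        self._next = 0

    def catalog(self):
        """Expose table definitions so a client could 'link' to them."""
        return {name: heading for name, (heading, _) in self.relations.items()}

    def invoke(self, op, handle, *args):
        """Run one relational operator and return a handle to its result."""
        heading, body = self.relations[handle]
        if op == "select_eq":
            attr, value = args
            i = heading.index(attr)
            result = (heading, {t for t in body if t[i] == value})
        elif op == "project":
            idx = [heading.index(a) for a in args]
            result = (tuple(args), {tuple(t[i] for i in idx) for t in body})
        else:
            raise ValueError(f"unknown operator: {op}")
        self.calls.append(op)
        self._next += 1
        new_handle = f"_tmp{self._next}"
        self.relations[new_handle] = result
        return new_handle

    def fetch(self, handle):
        return set(self.relations[handle][1])


class RemoteRelation:
    """Client-side variable bound to a relation that lives on the server."""

    def __init__(self, server, handle):
        self.server, self.handle = server, handle

    def select_eq(self, attr, value):
        h = self.server.invoke("select_eq", self.handle, attr, value)
        return RemoteRelation(self.server, h)

    def project(self, *attrs):
        h = self.server.invoke("project", self.handle, *attrs)
        return RemoteRelation(self.server, h)

    def to_local(self):
        """The 'assignment' step: copy the result into program memory."""
        return self.server.fetch(self.handle)


server = ToyServer({"emp": (("name", "dept"),
                            {("Ann", 10), ("Bob", 20), ("Cyd", 10)})})
emp = RemoteRelation(server, "emp")          # a table seen as a program variable
local_names = emp.select_eq("dept", 10).project("name").to_local()
```

    No SQL text is assembled anywhere; the client sends operator names and arguments, and the intermediate results stay on the server until to_local() is called.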

    You may have noticed that, to a reasonable approximation, no one is trying to do the above. Language designers universally ignore set theory and predicate logic. DBMS vendors -- and popular free projects -- are shackled to SQL, utterly uninterested in fixing SQL's set-theoretic flaws or exposing their systems through a logical-function API. The furthest thing from anyone's mind is developing a congruent set of types and operators.

    So what do we have instead? LINQ is a good example of a dancing bear, glopping together SQL from strings and primitive types, squirting it over a pipe, and exposing database tables to row-by-row operations supplied by the host language. It's a pretty good show, given the reality of today's environment. But, as your question suggests, the novelty wears off, and the work doesn't get any easier. You might want to hold onto your ticket, though: judging progress by its current speed and direction, you'll be in the same seat for another 40 years.
