Blending SQL and Python with Sqlorm

66 points by emixam 6 days ago

somat 2 days ago

Here is my rather naive take on the same subject. But I had a very different motivation than the author. See I actually quite like SQL and enjoy programming in it, but what I don't like is mixing sql and python. So one night in a flash of inspiration or perhaps a fever dream I wrote this thing that lets you have stand alone parameterized sql queries and you call them like a python function or generator. It is one of those overly clever things where I sort of hate the magic, but I find myself using it more and more which I will probably regret one day.

https://nl1.outband.net/fossil/query/file?name=query.py&ci=t...

In short you have your query in file sql/dept_personal.sql and you call it like

    for row in q.dept_personal(db_cursor, department='manpower'):

atombender 21 hours ago
That's basically the idea behind Sqlc [1]. By letting SQL be SQL, you avoid the many awkward mechanisms ORMs need to integrate SQLisms into the native language, and you define the query only in terms of its inputs and outputs, which can be made type-safe since they're declaratively defined.
The downside is that parameterized queries are a bit of a chore; for example, if a query should support an optional filter on user_id, you need to craft it like this:
```
    WHERE ...
      AND CASE
        WHEN sqlc.narg('user_id') IS NOT NULL THEN sqlc.narg('user_id')
        ELSE true
      END
```
This is not too bad, though, and the conditionals get optimized away by the database planner.
[1] https://sqlc.dev/
truculent 2 days ago

You may enjoy https://pugsql.org
- jumski 2 days ago
  
  You reminded me of https://github.com/krisajenkins/yesql which inspired HugSQL which was an inspiration for PugSQL. Fun times!
- getnormality 2 days ago
  
  This is a nice package, and a great illustration of how languages other than R suffer from the lack of an aesthetically elegant way to select list elements with bare words, like R's $ operator.
  Because their lists don't have selection by bare words, they have to go one of several other specialized, distinct, built-in Abstract Data Types to get it. They have to create whole so-called "Classes" and "Modules", when all they really needed was a list whose elements can be accessed with a dot and a bare word.
  The pandas package for tabular data manipulation requires even more complicated workarounds. It has a DataFrame Class composed of objects of Column Class. Then it makes an arbitrary bunch of common functions, so common that many are built into Python itself, Methods of said Columns. (In R, a table is just a list of vectors, and no Methods are needed.)
  So now you've got a thing that's supposedly a real Class, but it's really just a container of completely arbitrary fields and data types. These fields are themselves instances of another Class that is supposedly specific to pandas, but is really just a vector, and a vector doesn't necessarily have anything to do with being part of a table. And that Class has some random methods that give you additional ways to do basic things the language already does, and are often not the functions you actually need to work with the data therein.
  All that just so that we can write stuff like df.col.max(), and... gosh, what is that even supposed to mean? Can we all just admit that we like writing code in chains separated by dots, and stop tying that capability to hierarchies of Official Abstract Data Types?
  These non-R languages make you utter such strange incantations just to put something in a key-value container and access that thing with nice-looking code. I feel like this makes it harder to realize that very often this is the best way of doing things.
  R has a bit more varied and sometimes mildly ugly syntax than other languages, but once you get used to the building blocks it gives you, it has all these powers to do very dynamic things in very easy ways, without a bunch of ponderous specialized concepts.
  - cluckindan 2 days ago
    
    What do you mean? Many languages allow accessing named properties like that. Even JavaScript :)
    The strange thing here seems to be R’s use of ”list” as a name for a map-like key-value structure. The word ”list” is commonly understood to refer to a data structure which needs to be linearly (linked list) or partially (skiplist) iterated through to access a value at a particular index.
    
    nevertoolate 2 days ago
    
    Total nitpick - you say list is commonly understood to be linearly iterated. I’d expect a list to refer to an ordered sequence - default implementation of access and mutation varies wildly between languages. E.g. java code usually defaults to ArrayList, lisps to cons cells, C++ doubly linked list, etc.
    Sql has “tuples” for the rows of a result-set which are neither tuples nor lists in the “general sense” and are of a “record” type - names with values.
    So what is a list? Depends on the context.
    
    getnormality 2 days ago
    
    I guess I don't know enough about enough other languages to make broad generalizations. Oh well, it's too late to edit now.
    My impression is that JavaScript is another language like R that values flexibility a lot.
    And yeah, I agree that R is rather casual about lists vs maps. It doesn't really care that maps are a great data structure in their own right. It just wants to slap names on list elements when it's convenient to access elements of the list by name.
  - getnormality 2 days ago
    
    Correction, the Column object is called Series, but the same points apply.
digdugdirk 2 days ago

I honestly have no opinion in this discussion, but I will 100% upvote the first Fossil repository I've seen shared on here! How do you find developing on the Fossil platform?
- somat 2 days ago
  
  It has a sane ui/syntax, all the additional infrastructure you might want is built in (webpage, wiki, forum, tickets) and is trivial to set up. It is nearly the perfect version control system. Especially for the small independent amateur developer.
  - digdugdirk a day ago
    
    Awesome. Does it have any sort of CI/CD capabilities? And do you have any recommendations for learning resources for someone to get started?
    
    somat a day ago
    
    For getting started there is the quick start guide https://fossil-scm.org/home/doc/trunk/www/quickstart.wiki and after that the built in help gets you pretty far. Really, most distributed version control systems have the same workflow the devil is in the exact syntax used.
    As for CI there are pre-commit hooks I suspect could be used for that process. https://fossil-scm.org/home/help/hook and https://fossil-scm.org/home/doc/trunk/www/hooks.md

Rajni07 6 days ago

Really like the idea of keeping SQL explicit while still getting ORM conveniences. The @sqlfunc syntax feels clean, and the no-session approach makes cross DB work simpler. Curious how it handles async or pooling. Seems like a solid middle ground between raw SQL and heavy ORMs.

gmac 2 days ago

Agreed. I have a TypeScript library that attempts something similar: https://jawj.github.io/zapatos/

itopaloglu83 2 days ago

A common problem I found myself in is that I have to develop the query in one file and frequently run it to verify the data accuracy. I define the variables at top reminding me types and limitations etc. int vs varchar(10) vs varchar(50)

So I made a very simple module that takes those sql files and turns them into SQLAlchemy text objects with variables in them.

Would it be possible to add something like this to the project or does it require many sql parsing libraries etc. to ensure sql validity to find variables in the sql file?

tcdent 2 days ago

Kudos for making the leap.

Your pattern of re-interpreting __doc__ is kinda weird though. Why not just add a `return` statement?

duncanfwalker 2 days ago

I guess it's more clear that it should be a to statically readable value? eg you shouldn't do things like use arguments to build the str
- jcz_nz a day ago
  
  I would def use this if there was a return “select …” option. There are heaps of scenarios where sql is modified based on parameters. If no doc string just use the return value maybe?
  Our queries are typically large, not 3-5 liners.
  (Filter view queries where you might add additional CTA’s to provide the necessary filter conditions, but aren’t desirable if particular filter parameter is nill, etc.)
  - tcdent 12 hours ago
    
    Just keep in mind best practice is to use the built-in parameter interpolation that comes with your db library, since it handles escaping SQL injection for you.
    Be very careful if you ever use bare string formatting to construct your queries.
  - emixam a day ago
    
    Hello, author here. It is actually possible to use return instead with a different set of decorators: check out the first "tip" block on this page: https://hyperflask.github.io/sqlorm/sql-functions/

harvey9 2 days ago

The author says "the name isn't great", but I like it. Very close to 'squirm' or even 'slurm'.

whinvik 2 days ago

I find the following way of expressing args awkward.

with engine: fn(args)

I would rather have

with engine: fn(args, engine)

This makes testing way easier.

develatio 2 days ago

I think this is very similar to Django’s ORM.

JodieBenitez 2 days ago

> However, I've always felt some of the design choices didn't fit how I like to use an ORM. Notably:

I feel the same, hence why I prefer a Django-like ORM to SQLAlchemy in spite of all the praises it gets. The author says "SQLAlchemy is the best. I don't like the API or codebase of the others", but actually what he describes feels like the Django ORM (or Tortoise, or many others).

Also, sometimes just a thin layer above SQL is fine. For small personal projects I use my own wrapper above sqlite like so:

    import oora
    from dataclasses import dataclass

    db = oora.DB(
        db_path=":memory:",  # or /path/to/your/db.sqlite3
        # migrations are just pairs of key=>val where key is an arbitrary (but unique) label and val is a SQL script or a callable.
        # If val is a callable, it must take a sqlite3.Cursor as first parameter.
        # migrations are executed in order
        migrations={
            # here's an initial migration:
            "0000": "CREATE TABLE IF NOT EXISTS user(id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);",
            # simulating a schema evolution, let's add a field:
            "0001": "ALTER TABLE user ADD COLUMN email TEXT NULL;",
        },
    )
    db.migrate()

    db.insert("user", {"name": "John"})
    db.insert("user", {"name": "Jack"})
    db.insert("user", {"name": "Jill"})


    # dataclasses are perfect to represent rows
    # while still allowing custom behaviour
    @dataclass
    class User:
        id: int
        name: str
        email: str

        def __str__(self):
            return self.name


    # fetch a random instance
    user = db.hydrate(User, db.execute("select * from user ORDER BY RANDOM() limit 1").fetchone())
    print(f"User(id {user.id}), original name: {user}")

    # change name and email
    user.name = "Richard"
    user.email = "richard@acme.tld"
    db.save(user) # name of table is infered from the dataclass name
    print(f"User(id {user.id}), updated name: {user} <{user.email}>")

    # persist changes
    db.commit()import oora
    from dataclasses import dataclass

    db = oora.DB(
        db_path=":memory:",  # or /path/to/your/db.sqlite3
        # migrations are just pairs of key=>val where key is an arbitrary (but unique) label and val is a SQL script or a callable.
        # If val is a callable, it must take a sqlite3.Cursor as first parameter.
        # migrations are executed in order
        migrations={
            # here's an initial migration:
            "0000": "CREATE TABLE IF NOT EXISTS user(id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);",
            # simulating a schema evolution, let's add a field:
            "0001": "ALTER TABLE user ADD COLUMN email TEXT NULL;",
        },
    )
    db.migrate()

    db.insert("user", {"name": "John"})
    db.insert("user", {"name": "Jack"})
    db.insert("user", {"name": "Jill"})


    # dataclasses are perfect to represent rows
    # while still allowing custom behaviour
    @dataclass
    class User:
        id: int
        name: str
        email: str

        def __str__(self):
            return self.name


    # fetch a random instance
    user = db.hydrate(User, db.execute("select * from user ORDER BY RANDOM() limit 1").fetchone())
    print(f"User(id {user.id}), original name: {user}")

    # change name and email
    user.name = "Richard"
    user.email = "richard@acme.tld"
    db.save(user) # name of table is infered from the dataclass name
    print(f"User(id {user.id}), updated name: {user} <{user.email}>")

    # persist changes
    db.commit()