SQLAlchemy commit(), flush(), expire(), refresh(), merge() - what's the difference?

Nov 2nd, 2017 in  by Michael Cho

I've always struggled to remember between these very similar methods in the SQLALchemy ORM - so I took some time out to research these and write it down.

SQLAlchemy is an object-relational mapper widely used in the Python world, making it easier (usually!) for developers to interface with their database of choice. I have always fudged my way around with the various database methods for syncing state between the in-memory object instances and the database, without fully understanding the differences. It's time to settle once and for all when to use session commit() vs flush() vs expire() vs refresh()! 

Note: Altough my experience is with SQLAlchemy's Flask variant, most (all?) of these principles should apply to all flavours of SQLAlchemy.

Tell me about the Session

Key to the rest of this article is the concept of the Session in the SQLAlchemy world. Sessions are an in-memory "limbo" state for objects associated with database records. 

Let's break down what this means:

  1. As an ORM, SQLAlchemy enables you to manipulate database records as Python objects. For example, a row in your users table would be represented as a <User> object, which has attributes, methods, and so on.
  2. These objects are held in memory and need to be synchronised with its representation in your database at some interval, otherwise the in-memory representation differs from your persistent database record.
  3. Sessions are a scope or context within which you can change these objects. Note that this does not necessarily mean any changes you make to the objects are (yet) synchronised back to the database.
  4. Sessions have a natural lifecyle in which objects are first instantiated from the database record, changes are made to these objects, and then the changes are either persisted back to the database or discarded. 

The various methods mentioned in this article (commit / flush / expire / refresh / merge) are all slightly different ways to accomplish that last step in the lifecycle of persisting changes back to the database.

You may be wondering at this stage what determines when the Session lifecycle begins and ends. SQLAlchemy has taken an opinionated stance where it is usually up to the developer to decide when this begins and ends, with the availability of methods such as db.session.begin() and db.session.commit(). However in most web applications, the established pattern is to begin and end the Session with each http request.

Expiring objects

Let's start with the most straightforward of the methods we are investigating.

The methods for db.session.expire(some_object) and db.session.expire_all() expires one or all objects in the current Session respectively. This means that:

  1. Expiring marks all attributes for that object as being stale or out of date.
  2. Subsequently, the next time that object is accessed, a new query will be issued to update the object with the current database record.

A key behaviour of expiring is that all un-flushed changes to the object is discarded and not persisted to the database. For example:


user.name = 'user2'
db.session.expire(user)

user.name  # ==> 'user1'

Note: Objects are automatically expired already whenever the Session ends. This means if there is a call to db.session.commit() or db.session.rollback() (automatically at the end of a HTTP request in the case of a web application), all objects are expired.

So when do you actually need to explicitly expire objects? You do so when you want to force an object to reload its data, because you know its current state is possibly stale. This is commonly when:

  • Some SQL has been emitted outside the scope of the ORM, eg if you executed some raw SQL statements.
  • Changes are made outside the Session context, eg a concurrent Celery task has been executed

Refreshing objects

Now that we understand expiring objects, the methods for db.session.refresh(some_object) becomes much easier to understand.

Basically, refreshing means to expire and then immediately get the latest data for the object from the database. It involves an immediate database query, which you may consider unnecessarily expensive.

Here's how I decide when to use expire vs refresh?

  • Expire - I persisted some changes for an object to the database. I don't need this updated object anymore in the current method, but I don't want any subsequent methods to accidentally use the wrong attributes.
  • Refresh - I persisted some changes for an object to the database. I need to use this updated object within the same method.

Flushing objects

Remember earlier in this article we mentioned that expiring objects will discard all un-flushed changes? Flushing means to push all object changes to the database. Note that this does not necessarily mean that changes have been made to the database records - you must still call db.session.commit() to update the database or db.session.rollback() to discard your changes.

Pushing object changes to the database means your database now holds the changes in its transaction buffer. This means there are 2 common gotchas with using flush():

  1. If you configured your Session with autocommit: True:
    • you are essentially requesting SQLAlchemy to call db.session.commit() whenever a transaction is not present
    • therefore, db.session.flush() will automatically call db.session.commit() unless you explicitly started a transaction with db.session.begin().
      
      # With autocommit: False
      user.name  # ==> 'user1'
      user.name = 'user2'
      db.session.flush()
      user.name  # ==> 'user2', returns the in-memory representation. If you view your db with another application, it will still show 'user1'
      db.session.rollback()
      user.name # ==> 'user1'
      
      # With autocommit: True
      user.name  # ==> 'user1'
      user.name = 'user2'
      db.session.flush()  # ==> db.session.commit() is automatically called
      user.name # ==> 'user2'. If you view your db with another application, it will already show 'user2'
      db.session.rollback()  # ==> too late!
      user.name # ==> 'user2'
      


  2. Without calling db.session.commit(), the changes remain in the database transaction buffer and any calls to refresh will get the unchanged values. That is (assuming autocommit:False) :
    
    user.name  # ==> 'user1'
    user.name = 'user2'
    db.session.flush()
    user.name  # ==> 'user2', returns the in-memory representation. If you view your db with another application, it will still show 'user1'
    db.session.refresh(user)
    user.name  # ==> 'user2'  # ==> SQLAlchemy assumes the database has been changed, even if it hasn't committed! If you view your db with another application it, it will still show 'user1'
    db.session.rollback()
    user.name  # ==> 'user1'  # ==> Rollback discards the database transaction buffer
    

Committing

With this understanding of flush, it's now easier to understand committing. Conceptually db.session.commit() is basically 2 steps in one:

  1. Flushing - push changes from SQLAlchemy's in-memory representation to the database transaction buffer
  2. Commit - persist changes from your database's transaction buffer into the database, ie inserting / updating / deleting.

Note if your Session is configured with autocommit: True, calling flush() will automatically call commit() if outside a transaction.

Merging

Merging is a less common scenario, where you may have more than one in-memory representation what is essentially the same object. Being the "same object" is usually based on the database's primary key. 

Here's an example:


user1 = User.query.get(1)
user1.name  # ==> 'user1'

new_user = User(user_id=1)  # ==> a second in-memory object with the same key!
new_user.name = 'user2'
user1.name  # ==> 'user1'. Without merging, user1 doesn't know it is the same as new_user

db.session.merge(new_user)
user1.name  # ==> 'user2'. Now updated in memory. Note not yet updated in db, needs flush() and commit()

Conclusion

Here's how I decide what to use:

  1. Expire
    • I've made some changes to an object and don't need it immediately but don't want any subsequent methods to use stale values.
  2. Refresh
    • I've made some changes to an object and need its updated values immediately.
    • Costs extra database call as it expires and reads from database immediately.
  3. Flush
    • Push changes from memory to your database's transaction buffer. No database statements are issued yet.
    • If Session has autocommit: False, must still call commit() to persist the changes or rollback() to discard changes.
    • If Session has autocommit: True and you are not explicitly in a transaction, commit() will be automatically called.
  4. Commit
    • Persist changes in your database's transaction buffer to the database. Database statements are issued.
    • Automatically expires objects.
  5. Merge
    • Used when you may have more than 1 in-memory objects which map to the same database record with some key.
    • Merging causes the in-memory objects to be synchronised with each other, does not necessarily persist to the database.

Other articles you may like

Prioritized Code Review Checklist - what to look for first, second, and last
Sep 21st, 2017
Many to many relationships in SQLAlchemy models (Flask)
Jul 28th, 2017
Bash script to relink alembic migrations
Jun 12th, 2017
Using model callbacks in SQLAlchemy to generate slugs
Feb 7th, 2017
Overriding default Jinja dictionary attributes in Python view templates
Jan 29th, 2017