Wednesday, August 29, 2012

Larger large objects

Large objects (BLOBs) have been there since PostgreSQL was born. The size limit of a large objects has been 2GB(assuming default block size) . Now I decide to expand the limit for PostgreSQL 9.3: 4TB is the target. Actually PostgreSQL backend could hold up to 4TB large objects. It has been just limitation of API: for example, lo_lseek() and lo_tell cannot return over 2GB offset. Those function's return type is "int". So you might wonder why not they cannot return over 2GB value? Well, the secret is frontend/backend protocol for large object.

The underlying protocol is called "fast path interface". It's similar to RPC(Remote Procedure Call). Client sends "Function call" packet along with target function OID(Object Id). The called function within backend is executed and the result is returned back through "Function call response".

The functions called in large object interface are:
  • lo_open
  • lo_close
  • lo_creat
  • lo_create
  • lo_unlink
  • lo_lseek
  • lo_tell
  • lo_truncate
  • loread
  • lowrite
Those functions OIDs are retrieved from backend at the first time when accessing large objects and they are cached in the connection handle(I'm talking about libpq and other interfaces such as JDBC's implementation might be different).

Problem is, lo_lseek and lo_tell as I said earlier. First, their offset parameter is defined 4 bytes long. Second their result length is defined as 4 byte long. So we can handle only up 2^31-1 = 2GB. What shall we do? Well, we will add new function in backend namely, lo_lseek64 and lo_tell64. Libpq will check if those 64-bit functions exist. If yes, then use them. Otherwise (that means backend is likely pre-9.3 version) we use plain old 32-bit limited lo_lseek and lo_tell. This way, we do not break backward compatibility. Of course you need to use 9.3 libpq to enjoy "larger large objects".

I hope I'm going to post the first cut of patch by in September.


4 comments:

  1. This sounds great Tatsuo thanks for you hard work!

    ReplyDelete
  2. Ishii-san,

    Might it be worth revisiting the varlena limit, too?

    ReplyDelete
  3. > Might it be worth revisiting the varlena limit, too?
    Is this on the PostgreSQL TODO list?

    ReplyDelete
  4. Tatsuo, thanks for your contributions (re: https://github.com/postgres/postgres/commit/461ef73f0977c95c9452680495bc161618db9227). Is there a reason the lo_write() function doesn't have a complementary lo_write64() function?

    ReplyDelete

Dynamic spare process management in Pgpool-II

Pre-fork architecture in Pgpool-II Pgpool-II uses fixed number of pre-forked child process which is responsible for accepting and handling e...