Hibernate, PostgreSQL, and @Lob String

Suppose we have the following entity which represents a persistent note — just some arbitrary text. When we use @Lob on an attribute of type String, the JPA specification says that the persistence provider should use a large object of character type to store the attribute value.

[java]
@Entity
@Table(name = “note”)
@Access(AccessType.FIELD)
class NoteEntity {

@Id
private Long id;

@Lob
@Column(name = “note_text”)
private String noteText;

public NoteEntity() { }

public NoteEntity(String noteText) { this.noteText = noteText }
}
[/java]

The challenge for the persistence provider is that large object support in databases is not standardized, and thus the means by which this feature is implemented tends to be rather database specific.

When using Hibernate with the PostgreSQL dialect, the manner in which a string attribute annotated with @Lob is recorded in the database is unusual. If you’re combining Hibernate-based persistence on a PostgreSQL database with direct SQL access (e.g. for data migration, warehousing, etc), you need to be aware of how Hibernate persists @Lob String attribute values in order to avoid losing information.

In PostgreSQL 9.x, all large objects are stored in a single table. Each large object has a unique identifier (UID). PostgreSQL provides many functions that can be used in SQL DML statements to perform CRUD operations on large objects.

The Hibernate PostgreSQL9Dialect stores @Lob String attribute values by explicitly creating a large object instance, and then storing the UID of the object in the column associated with attribute. If we ask Hibernate to generate the schema for our NoteEntity, we’d see that it gets mapped to table like this:

  Column   |  Type  | Modifiers 
-----------+--------+-----------
 id        | bigint | not null
 note_text | text   | 

The thing to notice here is that it uses text type for the column associated with our @Lob String. This column can store the text of the comment itself, and it allows text of arbitrary length, but Hibernate doesn’t use the column in that way. Let’s persist a couple of notes and see what happens.

[java]
entityManager.persist(new NoteEntity(“This is a note!”));
entityManager.persist(new NoteEntity(“This is another note!”));
[/java]

If we then query the note table directly we’ll see something like this:

  id   | note_text  
-------+-----------
 16675 | 64419
 16677 | 64420

Obviously, the text of our notes isn’t really in the column. So where is it? The answer is that Hibernate explicitly created a large object for each note, and stored the UID of the object in the column. If we use some PostgreSQL large object functions, we can retrieve the text itself.

[sql]
SELECT id,
convert_from(loread(
lo_open(note_text::int, x’40000′::int), x’40000′::int), ‘UTF-8’)
AS note_text
FROM note
[/sql]

  id   | note_text  
-------+-----------------------
 16675 | This is a note!
 16677 | This is another note!

Okay, so why do you care? So long as Hibernate is the only thing that is manipulating your database, you probably don’t. Well, you might care if you wanted to do ad-hoc queries of the notes, since the query for the note text is not exactly intuitive. You certainly will care if you plan to use SQL queries to extract the data from the database and put it somewhere else — you probably don’t want to copy the string containing the UID for the large object that contains the text into another database.