How to choose between Relational and Non-Relational Databases

When you stumbled onto this post, you were probably hoping for some simple advice that would help you choose correctly every single time, but there really isn’t an absolute answer to this question. The decision generally depends on the application and how that application will be developed and used. I know, developers really hate answers like that. “Just tell me how to choose!” has been a refrain I’ve heard over and over again in my time as a technology architect. My goal is to help you learn how to choose, not to make the choice for you or to give you a step-by-step decision tree, because it’s never that easy. Let’s get started.

It helps a lot to really invest the time to understand the architectural differences between relational and non-relational database products, which go well beyond the rather superficial distinction conveyed by the terms themselves. There are some general differences in approach that characterize the SQL vs NoSQL worlds, but you’ve really got to compare specific technologies (e.g. PostgreSQL compared to MongoDB, or MySQL compared to DynamoDB), because as these products continue to evolve there is more and more overlap between them in areas that used to be key differentiators.

In comparing, be sure you’re thinking about the needs of your application and not just comparing the capabilities in the abstract. Think about the patterns of data query and update in your application and consider how this will be accomplished with a given database. A significant amount of your thought and effort in developing and maintaining an application will go into the work of getting your data into and out of your database of choice reasonably efficiently (relative to the expectations of your users), and the time you invest to learn how that will work for different database choices will pay off handsomely.

Learn the CAP theorem and how it applies to the database products you’re considering. Learn about transactions: how they support consistency, and over what scope of changes to the structures stored in the database. Learn how a particular database choice scales to accommodate a growing data set. As a developer you don’t need a deep understanding of all of these different aspects of the choice, but you really do need a basic understanding in order to make good choices for yourself or to help your team make good choices.

All of that said, there are some key considerations that often help in making a decision.

  1. Relational databases have been around for a long time and there are many excellent and technically comparable implementations, which are well understood and supported. There’s a lot of collective know-how that you can leverage in getting things done with relational databases. There are some great communities of support around NoSQL databases, but knowledge of SQL and relational databases is unquestionably more pervasive.
  2. Non-relational databases tend to be very flexible about data modeling, allowing you to store almost any data structure without having to statically define that structure through a schema. This allows the database structure to evolve somewhat more easily and naturally with the evolution of your application, and is often cited as the reason to choose a non-relational approach. That said, there are lots of nice approaches available these days to support schema migration as your application evolves, which eases the burden if your choice is to use a relational database.
  3. It used to be the case that transaction support was a key differentiator. Because non-relational databases tend to be really good at (horizontal) scaling and partitioning, the CAP theorem implies that they can’t be quite as good at consistency, which is where transactions come in. You can’t just say that database A supports transactions but database B does not, because there are shades of support across most database products. You have to think about the transactions your application needs to perform on your data and consider your consistency requirements. A very large set of applications that perform relatively basic CRUD operations, even on rather sophisticated data structures, can be implemented quite successfully with the “less capable” transaction support of almost any NoSQL database.
  4. In enterprise situations, there are a host of other factors to consider because no application database is an island in an enterprise. Here you’ll want to chat with the person who plays the role of enterprise architect to make sure that all of those larger enterprise-y considerations (like distributed transactions, data integration, enterprise-perspective consistency across databases, and many many more) are properly addressed. Also some enterprises tend to favor singular technology choices, and that choice is rarely left to development teams.
  5. If your database is going to be very large or distributed to more than one location, or needs to support a very large set of concurrent users, you should spend a lot more time consulting with experienced database architects before making any decision about database choice.

Like so many architecture decisions, choices about database technologies call on good technology architects to draw on a broad base of knowledge, experience, and subtle nuance. As a developer, it will serve you well to collect experiences that better inform your understanding of how these choices really work in practice. When you have the freedom to choose for yourself, be sure to take the opportunity to expand your experience by making a different choice every once in a while.

Lastly, don’t get too hung up on making the “correct” choice. Except in those extremes mentioned in item 5 above, few of the decisions that you might make about database choice will be categorically wrong, because most database products that you might choose can successfully support a very wide range of applications. There might be some significant advantages in one choice versus another, but it can often be difficult to appreciate those advantages until you’ve had a lot of experience working in many different camps.

Hibernate, PostgreSQL, and @Lob String

Suppose we have the following entity which represents a persistent note — just some arbitrary text. When we use @Lob on an attribute of type String, the JPA specification says that the persistence provider should use a large object of character type to store the attribute value.

[java]
@Entity
@Table(name = "note")
@Access(AccessType.FIELD)
public class NoteEntity {

  @Id
  private Long id;

  @Lob
  @Column(name = "note_text")
  private String noteText;

  public NoteEntity() { }

  public NoteEntity(String noteText) {
    this.noteText = noteText;
  }
}
[/java]

The challenge for the persistence provider is that large object support in databases is not standardized, and thus the means by which this feature is implemented tends to be rather database specific.

When using Hibernate with the PostgreSQL dialect, the manner in which a string attribute annotated with @Lob is recorded in the database is unusual. If you’re combining Hibernate-based persistence on a PostgreSQL database with direct SQL access (e.g. for data migration, warehousing, etc), you need to be aware of how Hibernate persists @Lob String attribute values in order to avoid losing information.

In PostgreSQL 9.x, all large objects are stored in a single system table (pg_largeobject). Each large object is identified by a unique object identifier (OID). PostgreSQL provides many functions that can be used in SQL DML statements to perform CRUD operations on large objects.

The Hibernate PostgreSQL9Dialect stores @Lob String attribute values by explicitly creating a large object instance, and then storing the OID of that object in the column associated with the attribute. If we ask Hibernate to generate the schema for our NoteEntity, we’d see that it gets mapped to a table like this:

  Column   |  Type  | Modifiers 
-----------+--------+-----------
 id        | bigint | not null
 note_text | text   | 

The thing to notice here is that Hibernate uses the text type for the column associated with our @Lob String. This column could store the text of the note itself, at arbitrary length, but Hibernate doesn’t use the column in that way. Let’s persist a couple of notes and see what happens.

[java]
entityManager.persist(new NoteEntity("This is a note!"));
entityManager.persist(new NoteEntity("This is another note!"));
[/java]

If we then query the note table directly we’ll see something like this:

  id   | note_text  
-------+-----------
 16675 | 64419
 16677 | 64420

Obviously, the text of our notes isn’t really in the column. So where is it? The answer is that Hibernate explicitly created a large object for each note, and stored the OID of that object in the column. If we use some of PostgreSQL’s large object functions, we can retrieve the text itself.

[sql]
SELECT id,
       convert_from(
         loread(lo_open(note_text::int, x'40000'::int), x'40000'::int),
         'UTF-8') AS note_text
FROM note;
[/sql]

  id   | note_text  
-------+-----------------------
 16675 | This is a note!
 16677 | This is another note!

Okay, so why do you care? So long as Hibernate is the only thing manipulating your database, you probably don’t. You might care if you want to run ad-hoc queries against the notes, since the query for the note text is not exactly intuitive. You certainly will care if you plan to use SQL queries to extract the data from the database and put it somewhere else — you probably don’t want to copy the string containing the OID of the large object into another database and lose the note text in the process.
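
To make the extraction concrete, here is a minimal JDBC sketch of what a migration query might look like. The class and method names are hypothetical, and it assumes an open java.sql.Connection to the PostgreSQL database; the query is the one shown above, where x'40000' (the INV_READ flag) opens each large object read-only and also serves as the maximum number of bytes passed to loread.

[java]
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class NoteExtractor {

  // Same query as above: open each large object read-only (x'40000' is
  // INV_READ) and read up to 0x40000 bytes of its content as UTF-8 text.
  private static final String QUERY =
      "SELECT id, convert_from(loread("
      + "lo_open(note_text::int, x'40000'::int), x'40000'::int), 'UTF-8') "
      + "AS note_text FROM note";

  public static void extract(Connection connection) throws SQLException {
    try (Statement statement = connection.createStatement();
         ResultSet rs = statement.executeQuery(QUERY)) {
      while (rs.next()) {
        long id = rs.getLong("id");
        String noteText = rs.getString("note_text");
        // ... copy (id, noteText) into the target database here ...
      }
    }
  }
}
[/java]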

Hibernate Envers, @OneToMany List, and @OrderColumn

Using Hibernate Envers with a @OneToMany association for a List property that uses @OrderColumn requires a little extra glue. Without this extra glue, Envers will throw a NullPointerException in a lazy initializer when you try to load revisions.

Suppose our domain model has a Sponsor entity. A Sponsor can have multiple business names, each represented by a SponsorName entity. The names are ordered by preference, so we use a @OneToMany association with a List. Here’s how we might map the SponsorEntity.

[java]
@Entity
@Audited
@Access(AccessType.FIELD)
public class SponsorEntity {

  @OneToMany(mappedBy = "sponsor", fetch = FetchType.LAZY, orphanRemoval = true)
  @OrderColumn(name = "name_index")
  private List<SponsorNameEntity> names = new ArrayList<>();

}
[/java]

And here’s the SponsorNameEntity.

[java]
@Entity
@Audited
@Access(AccessType.FIELD)
public class SponsorNameEntity {

  @Column(nullable = false)
  private String name;

  @ManyToOne(optional = false, fetch = FetchType.LAZY)
  private SponsorEntity sponsor;

}
[/java]

If we use these entities with Hibernate Envers, we’ll get an NPE when we try to load revisions of the SponsorEntity. It happens because the name_index column specified by the @OrderColumn annotation is not included in the audit table for SponsorNameEntity.

We can fix this easily, with an additional field on SponsorNameEntity and an extra annotation on SponsorEntity. The extra field on SponsorNameEntity is used to expose the value of the order column as a field:

[java]
@Entity
@Audited
@Access(AccessType.FIELD)
public class SponsorNameEntity {

  // This field is used to capture the value of the column named
  // in the @OrderColumn annotation on the referencing entity.
  @Column(insertable = false, updatable = false)
  private int name_index;

  @Column(nullable = false)
  private String name;

  @ManyToOne(optional = false, fetch = FetchType.LAZY)
  private SponsorEntity sponsor;

}
[/java]

In SponsorEntity we use an additional annotation to inform Envers about the field that contains the order column value:

[java]
@Entity
@Audited
@Access(AccessType.FIELD)
public class SponsorEntity {

  @OneToMany(mappedBy = "sponsor", fetch = FetchType.LAZY, orphanRemoval = true)
  @OrderColumn(name = "name_index")
  @AuditMappedBy(mappedBy = "sponsor", positionMappedBy = "name_index")
  private List<SponsorNameEntity> names = new ArrayList<>();

}
[/java]

The positionMappedBy attribute of the @AuditMappedBy annotation informs Envers that the position of each entry in the list is given by the value of the name_index field that we added to SponsorNameEntity. It seems a little redundant, but we’re required to also specify the value of the mappedBy attribute, which should be the same as the value given in the @OneToMany annotation.

If you discover that you need this fix after you’ve already got some revisions out there in your audit tables, don’t forget to

  1. Add the column specified by @OrderColumn to the appropriate audit table. In our example, this column goes into the audit table for SponsorNameEntity.
  2. Initialize the column value in each of the existing rows. You can usually query the table associated with the entity to get the appropriate index value — e.g. in our case we’d query the table for SponsorNameEntity to get the name_index column value and use it to update the corresponding rows in the audit table, as in the sketch after this list.
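
Here is a hedged sketch of step 2 as a single SQL statement issued over JDBC. The table names are assumptions based on the JPA and Envers defaults (an entity table named SponsorNameEntity and an audit table named SponsorNameEntity_AUD), and note that this can only backfill audit rows whose entities still exist in the live table.

[java]
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class AuditBackfill {

  // Copies name_index from each live SponsorNameEntity row into the
  // corresponding rows of the audit table (default table names assumed).
  private static final String BACKFILL =
      "UPDATE SponsorNameEntity_AUD aud "
      + "SET name_index = (SELECT live.name_index "
      + "FROM SponsorNameEntity live WHERE live.id = aud.id)";

  public static void backfill(Connection connection) throws SQLException {
    try (Statement statement = connection.createStatement()) {
      statement.executeUpdate(BACKFILL);
    }
  }
}
[/java]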

Property Injection with CDI

Over the last several months, I’ve been working on a large enterprise web application development effort using Java EE 7. As a long-time user of the Spring Framework for Java web application development, one of the things I immediately missed was the simple property injection supported by Spring’s PropertyOverrideConfigurer and PropertyPlaceholderConfigurer.

Out of the box, CDI provides no good solution for property injection. However, CDI’s outstanding extension mechanism makes it easy to get CDI to do things that it doesn’t inherently support. I wrote a CDI extension named Pinject that provides all of the property injection support needed by the typical Java web application.

It’s as simple as this:

[java]
package com.example.myapp;

import java.net.URL;
import javax.inject.Inject;
// plus the @Property qualifier annotation provided by Pinject

public class MyBean implements MyService {

  @Inject
  @Property
  private URL location;

  @Inject
  @Property
  private int maxRetries;

}
[/java]

And we place a properties file named beans.properties on the classpath containing the values to inject:

[java]
com.example.myapp.MyBean.location=http://www.google.com
com.example.myapp.MyBean.maxRetries=3
[/java]

Pinject provides a great deal of flexibility in resolving property values, converting them, and injecting them into your beans. It even provides its own extension points so that you can easily cover use cases I didn’t think to support. Please check it out and let me know if you find it useful.

Wildfly Datasource Deployment

One of the things I really like about Wildfly and its immediate predecessors is the ability to deploy JDBC drivers and datasource configurations without needing to edit the configuration or restart the container.

JDBC Driver Deployments

The ability to treat a JDBC driver as a deployment depends on the Java service loader mechanism that was introduced with JDK 6 and incorporated in JDBC 4. Virtually every database vendor has a JDBC 4-compliant driver that can be auto-deployed in Wildfly by simply dropping the JAR file for the driver into the container’s deployments directory. Once the JDBC driver has been deployed, one or more datasources can be configured that make use of it.
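
As a quick, container-independent illustration of the mechanism, the following little program lists every JDBC driver that the service loader can discover. A JDBC 4 driver JAR advertises its implementation class in a META-INF/services/java.sql.Driver entry, which is what makes this kind of auto-deployment possible.

[java]
import java.sql.Driver;
import java.util.ServiceLoader;

public class ListJdbcDrivers {

  public static void main(String[] args) {
    // ServiceLoader reads META-INF/services/java.sql.Driver from each JAR
    // on the classpath and instantiates the named driver classes.
    for (Driver driver : ServiceLoader.load(Driver.class)) {
      System.out.println(driver.getClass().getName());
    }
  }
}
[/java]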

DataSource Deployments

The ability to deploy datasource configurations is enabled by IronJacamar, which Wildfly uses for its Java EE Connector Architecture (JCA) support. An IronJacamar datasource configuration is specified in an XML file that defines the JNDI name for the datasource along with all of the configuration needed to access and manage the underlying database resource.

Here’s an example (non-XA) datasource configuration for a MySQL datasource. As you can see, most of the element and attribute values in the file are fairly intuitive.

[xml]
<?xml version="1.0" encoding="UTF-8"?>
<datasources>
  <datasource jndi-name="java:jboss/datasources/MyDS"
              pool-name="my_datasource">
    <connection-url>jdbc:mysql://localhost:3306/my_database</connection-url>
    <driver-class>com.mysql.jdbc.Driver</driver-class>
    <driver>mysql-connector-java-5.1.29.jar</driver>
    <security>
      <user-name>my_username</user-name>
      <password>my_password</password>
    </security>
  </datasource>
</datasources>
[/xml]

You can deploy a datasource descriptor such as the previous example by placing it in a file whose name ends with -ds.xml and dropping that file into the deployments directory. The example datasource shown above might be named my_datasource-ds.xml.

When you copy my_datasource-ds.xml into the deployments directory, IronJacamar uses the database driver specified in the descriptor to create a datasource object and bind it into the container’s JNDI registry under the given JNDI name. Once the datasource is deployed, your applications can access it just as they would any other JNDI resource.
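
For example, a bean in a deployed application might obtain the datasource through resource injection. This is only a sketch: the class is hypothetical, and the JNDI name must match the jndi-name attribute in your -ds.xml descriptor (here I assume the name used in the example above).

[java]
import java.sql.Connection;
import java.sql.SQLException;
import javax.annotation.Resource;
import javax.sql.DataSource;

public class CustomerRepository {

  // Must match the jndi-name attribute in the -ds.xml descriptor.
  @Resource(lookup = "java:jboss/datasources/MyDS")
  private DataSource dataSource;

  public void doSomeWork() throws SQLException {
    try (Connection connection = dataSource.getConnection()) {
      // ... use the connection ...
    }
  }
}
[/java]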

XA DataSource Deployments

IronJacamar also supports XA datasources, which are important if you need to make use of transactions that involve more than one database and/or a database and a JMS message broker.

The configuration descriptor for an XA datasource is quite similar to the basic datasource, as you can see in the following example for an XA datasource using a PostgreSQL driver.

[xml]
<?xml version="1.0" encoding="UTF-8"?>
<datasources>
  <xa-datasource jndi-name="java:jboss/datasources/MyXADS"
                 pool-name="my_xa_datasource">
    <xa-datasource-property name="ServerName">localhost</xa-datasource-property>
    <xa-datasource-property name="PortNumber">5458</xa-datasource-property>
    <xa-datasource-property name="DatabaseName">my_database</xa-datasource-property>
    <xa-datasource-class>org.postgresql.xa.PGXADataSource</xa-datasource-class>
    <driver>postgresql-9.1-902.jdbc4.jar</driver>
    <security>
      <user-name>my_username</user-name>
      <password>my_password</password>
    </security>
  </xa-datasource>
</datasources>
[/xml]

Note that the property names specified by <xa-datasource-property> are driver specific — a driver from another database vendor would use different property names.


MySQL, Java EE, and Transactions

This probably isn’t news to anyone but me, but I thought I’d post it anyway for anyone else who reads the documentation only as a last resort.

I was recently debugging a Java EE application that was performing some errant database updates at times when it should have been rolling back the transaction. My first thought was that the (checked) exception being thrown wasn’t properly marked for rollback. After some investigation, I determined that the transaction was indeed being rolled back, but the database seemed to be ignoring the rollback and updating the database anyway.

Digging in a bit more, I found out that the application was using MySQL 5.1.x. I spent some time reviewing the user manual and discovered that the default storage engine in that version of MySQL (MyISAM) does not support transactions. I found this a bit surprising (hasn’t transaction support been around for a couple of decades now?), but switching to a different storage engine in the MySQL configuration was simple enough. I found the instructions for doing so in the section of the manual entitled Setting the Storage Engine.

Basically it all boiled down to adding this line to the my.cnf configuration file:

default_storage_engine = innodb

It was necessary to drop the application schema and recreate it, so that the tables would be rebuilt using the new storage engine.
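
If you want to confirm the result, a quick query against information_schema shows which engine each table actually uses. This is just a sketch over JDBC; the class is hypothetical and the schema name is a placeholder for your application’s schema.

[java]
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class EngineCheck {

  // Lists each table in the given schema along with its storage engine;
  // after the fix, every table should report InnoDB.
  public static void listEngines(Connection connection, String schema)
      throws SQLException {
    String sql = "SELECT table_name, engine FROM information_schema.tables "
        + "WHERE table_schema = ?";
    try (PreparedStatement statement = connection.prepareStatement(sql)) {
      statement.setString(1, schema);
      try (ResultSet rs = statement.executeQuery()) {
        while (rs.next()) {
          System.out.println(rs.getString(1) + " -> " + rs.getString(2));
        }
      }
    }
  }
}
[/java]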

Apparently, later versions of MySQL (5.5 and later) use the InnoDB storage engine by default.

JSF Resources and Security Constraints in web.xml

If your JSF application uses the standard Java Servlet security mechanisms (<security-role>, <security-constraint>, <login-config>, et al), and your application allows a mixture of public and non-public access, you’ll probably want to make the JSF resource library available to the browsers of both public and non-public users.

Assuming that you’re using the JSF resource library mechanisms (like <h:outputStylesheet>), you’ll need this security constraint:

[xml]
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Public Resources</web-resource-name>
    <url-pattern>/javax.faces.resource/*</url-pattern>
  </web-resource-collection>
</security-constraint>
[/xml]

If (like me) you’re mixing the use of JSF tags like <h:outputStylesheet> with some direct references to resources, you’ll also want to include a URL pattern that allows that direct access:

[xml]
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Public Resources</web-resource-name>
    <url-pattern>/javax.faces.resource/*</url-pattern>
    <url-pattern>/resources/*</url-pattern>
  </web-resource-collection>
</security-constraint>
[/xml]

Since these security constraints don’t specify an auth constraint, the resources they cover are accessible to any browser that requests them. You can still include a <user-data-constraint> to require SSL, if you like.

UNIX-compatible Password Encryption for Java

I’m working on a project in which there is a need to encrypt passwords in a format that is compatible with the crypt(3) function from the GNU C Library. I looked around a bit and found a couple of alternatives, but none that implemented the SHA-512 and SHA-256 variants which are now commonplace on Linux systems.

Lacking any better alternatives, I produced my own Crypt4j module which provides a crypt(3) compatible implementation in Java, by using the open-source C implementations as a reference.

Crypt4j is open source, and uses the ASL 2.0 license. If you use it in your own project, I’d like to hear about it.

Bootstrap, Glyphicons, and JavaServer Faces 2.x

Having trouble getting Glyphicons to work with JavaServer Faces (JSF) 2.x?

The Problem

You’re probably using the JSF resource mechanism to include the Bootstrap stylesheets and scripts:

[xml]
<!-- resource paths are assumed; adjust to match your resource layout -->
<h:outputStylesheet name="css/bootstrap.min.css" />
<h:outputScript name="js/bootstrap.min.js" />
[/xml]

When you use these tags, the resulting link that is included in the page looks like this:

[html]
<link type="text/css" rel="stylesheet"
      href="/myapp/javax.faces.resource/css/bootstrap.min.css" />
[/html]

Note that the href doesn’t contain the actual path to the stylesheet resource. When the browser requests this resource, the Faces servlet is going to interpret the request URI and query parameters to figure out which resource is needed.

So now what happens if this stylesheet contains relative URLs? Glyphicons are defined by font resources that need to be loaded by the browser. If we look at the non-minified bootstrap.css, we can see that it uses relative URLs to reference the necessary font resources:

[css]
@font-face {
  font-family: 'Glyphicons Halflings';
  src: url('../fonts/glyphicons-halflings-regular.eot');
  src: url('../fonts/glyphicons-halflings-regular.eot?#iefix')
         format('embedded-opentype'),
       url('../fonts/glyphicons-halflings-regular.woff')
         format('woff'),
       url('../fonts/glyphicons-halflings-regular.ttf')
         format('truetype'),
       url('../fonts/glyphicons-halflings-regular.svg#glyphicons_halflingsregular')
         format('svg');
}
[/css]

These URLs are relative to the URL of the stylesheet itself, so when the browser requests the first font resource here, it’s going to use this URL:

[html]
/myapp/javax.faces.resource/fonts/glyphicons-halflings-regular.eot
[/html]

Of course, the Faces servlet doesn’t know how to interpret this request. In fact, depending on how you’ve configured the <servlet-mapping/> for the Faces servlet, it might not even be asked to handle this request.

Since the URL for the font resource doesn’t correspond to the path of an actual resource in your Faces application, the browser gets a 404 when it makes this request, and consequently the Glyphicons in your application are all broken.

Solution 1

The easy solution is to use an ordinary <link> element instead of <h:outputStylesheet>:

[html]
<link type="text/css" rel="stylesheet"
      href="#{request.contextPath}/resources/css/bootstrap.min.css" />
[/html]

Note that I’m using an EL expression to get the context path, and simply appending the path to the actual stylesheet resource in my application. Now, the relative URLs in the stylesheet will work just fine (assuming that I put Bootstrap’s fonts folder alongside the css folder).

Solution 2

If you really want to use the JSF resource mechanism, you’re going to need to modify the URLs in Bootstrap’s font face definition. Obviously, you’re going to want to work with the non-minified version of the stylesheet here.

You might be surprised that you can use an EL expression inside of a stylesheet, in the same way that you would in a facelet page. The idea here is to replace each of the relative URLs in the font face definition with an EL expression that will produce a Faces-compatible URL:

[css]
@font-face {
  font-family: 'Glyphicons Halflings';
  src: url("#{resource['fonts:glyphicons-halflings-regular.eot']}");
  src: url("#{resource['fonts:glyphicons-halflings-regular.eot?#iefix']}")
         format('embedded-opentype'),
       url("#{resource['fonts:glyphicons-halflings-regular.woff']}")
         format('woff'),
       url("#{resource['fonts:glyphicons-halflings-regular.ttf']}")
         format('truetype'),
       url("#{resource['fonts:glyphicons-halflings-regular.svg#glyphicons_halflingsregular']}")
         format('svg');
}
[/css]

Each expression uses the resource EL implicit object, which is a Map whose keys are composed of a JSF library name and a resource name, separated with a colon character.

Conclusion

The JSF resource mechanism doesn’t necessarily play well with third-party stylesheets and scripts. In my opinion, the need to modify a third-party stylesheet is sufficient reason to avoid using <h:outputStylesheet/> in such cases. However, if you’re really set on using the JSF resource mechanism, you can use EL expressions to replace relative URLs and make it all work.