Friday, August 26, 2016

Grails JMS plugin 1.2 hands off messages too early during the plugin relay of application startup . . .

The Rio 2016 Olympics concluded recently with the traditional last event of track and field: the 4x100 meter relay. In a relay, the teams that qualify are the ones that gracefully complete the race. This takes great coordination among the runners. One important rule of this race is handing off the baton within a changeover box. Dropping the baton or failing to transfer it within the box disqualifies the team. The baton has to be carried all the way to the finish line. It is a synchronous race.

Messaging in software applications is like a relay, except that it is asynchronous. It is a great way to integrate modern applications asynchronously, achieving good performance and scalability with guaranteed delivery of messages. But it comes with certain complexities in handling messages gracefully. Java-based applications using the JMS API are no exception. The Spring framework's JMS support greatly simplifies the use of JMS. The Grails JMS plugin, which builds on Spring JMS, makes it even better.

Recently I had to investigate and fix a known issue in the Grails JMS plugin that we use in our Grails applications. The known issue is this: messages get queued up while the application is down (expected behavior, of course), but when the app comes back up, listeners grab messages from the queues too early, before the application is fully started. If message processing involves accessing the domain model or the database, it errors out, forcing you to deal with this situation.

We have two Grails applications integrated asynchronously through JMS messaging, with ActiveMQ as the underlying JMS implementation. Messages flow back and forth between the two applications. The more recent of the two is a RESTful API app developed using the Grails 3.1.6 rest-api profile and JMS plugin version 2.0.0.M1, whereas the other is a somewhat older Grails web application using Grails 2.2.1 and Grails JMS plugin 1.2. The Grails 3.1.6 RESTful API app is a gateway for client web/mobile apps: it exposes resources in a RESTful way and hands off requests to the Grails 2.2.1 web app for further processing. The communication channel between these two apps is JMS queues, so it was important to make sure we don't lose any messages on either side due to application downtime.

During testing, everything on the Grails 3.1.6 side looked good. After it came up, it successfully processed the messages that had been waiting in the queues during its downtime. The Grails 2.2.1 app, in the same situation, simply errored out while processing the messages that had arrived and were waiting in the queues when it came back up. The following is a high-level architectural diagram depicting this.
Applications integrated through JMS

Known Issue with Grails JMS plugin 1.2

As the application starts up, the listener containers get started a bit early in the plugin race, before dataSource and Hibernate are fully up and ready. Hence, if message processing involves domain-model or database access, it errors out.
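To make the failure mode concrete, here is a minimal sketch of the kind of service listener that is affected. The service name, queue name, message shape, and the Order domain class are all hypothetical; only the `static exposes = ['jms']` / `onMessage` convention comes from the JMS plugin.

```groovy
// grails-app/services/OrderEventService.groovy (hypothetical service)
class OrderEventService {

    // Expose this service as a JMS listener through the Grails JMS plugin.
    static exposes = ['jms']

    // Hypothetical queue name, used here only for illustration.
    static destination = 'orders.incoming'

    def onMessage(message) {
        // Domain access needs dataSource and Hibernate to be initialized.
        // If the listener container starts before they are ready, this is
        // the kind of call that fails for messages already waiting in the queue.
        def order = Order.get(message.orderId as Long) // Order: hypothetical domain class
        if (order) {
            order.status = 'RECEIVED'
            order.save()
        }
        return null // explicitly return nothing
    }
}
```

A listener that never touches the domain model or the database would not notice the early start, which is probably why the problem only shows up in some applications.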

Options tried

Upgrading the plugin to ver 1.3

Plugin version 1.3 seemed to have a fix for this issue when I looked into the plugin's GitHub source code repository. Judging from the source code change, the fix seemed convincing and pretty simple: three more plugin dependencies were added to the loadAfter list of plugins: 'dataSource', 'hibernate', and 'hibernate4'.
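Going by that description, the relevant line in the 1.3 plugin descriptor would read roughly as follows. This is a reconstruction rather than a copy of the plugin source: the three added names come from the change described above, while the order of the elements is an assumption.

```groovy
// JmsGrailsPlugin.groovy in 1.3, as described above (element order assumed)
def loadAfter = ['services', 'controllers', 'dataSource', 'hibernate', 'hibernate4']
```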

Steps to upgrade plugin:

1. Change the dependency in BuildConfig.groovy from compile ":jms:1.2" to compile ":jms:1.3"

plugins {
    ...
    compile ":jms:1.3"
    ...
}

2. Refresh dependencies using the following command:
grails -Dgrails.env=<your-env> refresh-dependencies

3. Grails build system prompts as below for a confirmation:
> You currently already have a version of the plugin installed [jms-1.2]. Do you want to update to [jms-1.3]? [y,n]

Press y to upgrade.

However, with this upgrade the issue became a bit worse: all listeners completely stopped listening for messages. Not only the messages that were waiting in the queue while the app was down, but also any new messages that arrived after the application came up, were not picked up by the application's listeners. After looking at the source code of JmsGrailsPlugin.groovy and comparing it with that of the 1.2 plugin, I noticed that the method startListenerContainer, which starts up all listener container beans, was missing entirely.

Upgrading the plugin to 1.3-SNAPSHOT

Then I tried the 1.3-SNAPSHOT version, following the above steps to update the plugin and its dependencies. This ran into a Spring class-loading error (java.lang.NoClassDefFoundError: org/springframework/core/type/classreading/AnnotationMetadataReadingVisitor) and the application wouldn't even come up. When I looked into the plugin source code, it had the three additional plugin dependencies in the loadAfter list, and the method startListenerContainer was also in place. When I compared this source with that of version 1.2, the only change I noticed was the three additional plugins in the loadAfter list.

That was puzzling, so I read the Grails documentation carefully, word by word, to understand how loadAfter works. One sentence accompanying an example caught my attention: "Here the plugin will be loaded after the controllers plugin if it exists, otherwise it will just be loaded". Then I went back to check whether we had the hibernate4 plugin. We didn't, as the hibernate4 plugin was for Grails 2.5.0 or higher. That gave me a clue.

The fix

Having looked at 1.3, 1.3-SNAPSHOT and 1.2, and since we had the 1.2 plugin checked into our source repository along with the application's source code, we decided to edit the plugin code and add just two dependencies, 'dataSource' and 'hibernate', to the loadAfter list. That worked as expected, and we decided to go with that fix.

Here is the final fix. In JmsGrailsPlugin.groovy, we changed the code from:
def loadAfter = ['services', 'controllers']

to:
// Load jms plugin after the following plugins.
// BEWARE, if any of the listed plugins don't exist, the list is ignored
// and this jms plugin gets loaded and executed
// Ref: http://docs.grails.org/latest/guide/plugins.html#understandingPluginStructure
def loadAfter = ['services', 'controllers', 'dataSource', 'hibernate']

With this fix, once the application came up, it gracefully handled all pending messages that had been waiting while it was down. It completed the message relay by handing off messages gracefully, without dropping them or erroring out of the plugin execution race.

TIP

If for any reason this doesn't work, check the application log file (or find out by other means) which plugin is the last one in the loading process, and set loadAfter to a list containing that plugin. That should push this plugin to the end of the plugin loading process ;)
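As a sketch of that tip, assuming the last plugin to load in your application is called 'someLatePlugin' (a placeholder, not a real plugin name), the descriptor line would become:

```groovy
// JmsGrailsPlugin.groovy
// 'someLatePlugin' is a placeholder: replace it with the name of the plugin
// that actually loads last in your application, and list only plugins that
// are installed in the project.
def loadAfter = ['services', 'controllers', 'dataSource', 'hibernate', 'someLatePlugin']
```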

There seems to be no easy way to list all the plugin dependencies of a Grails project.

Running the command grails dependency-report will print the dependency graph.

If you are using IntelliJ, looking at Project Settings > Modules > Dependencies or Project View > External Libraries, or even trying to open a file (Mac: CMD + Shift + O) matching *GrailsPlugin.groovy, should give the list of plugins for the project.
