Recently I was assigned a task that at first looked like something a unit test could solve, but it turned out to be far more complex than that.
Almost a third of our codebase consists of POJOs generated from XSD descriptors scattered across our projects. The classes are generated during the build by a Maven plugin that parses the schemas and generates the code.
These generated classes are nothing more than simple value objects descending from a common root class, which contains some well-known methods, such as toString(), that all classes should share. Unfortunately, this particular toString() does not know about its descendants’ inner structure, so it relies heavily on reflection to do its job. This gave rise to the idea of generating each class with its own toString() method, so that the resulting code is more efficient.
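To make the difference concrete, here is a minimal Java sketch of the two styles. It is an illustration under assumptions, not our actual generated code: the class names (ReflectiveRoot, Address, ToStringDemo), the method toStringViaReflection() and the Name[field=value, ...] output format are all hypothetical. Fields are listed alphabetically so that both implementations produce the same text, because reflection does not guarantee any field order.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical example: class names and output format are illustrative only.
class ToStringDemo {
    public static void main(String[] args) {
        Address a = new Address();
        a.street = "Main St";
        a.zip = 1234;
        System.out.println(a.toStringViaReflection()); // Address[street=Main St, zip=1234]
        System.out.println(a);                          // Address[street=Main St, zip=1234]
    }
}

// Stand-in for the common root class. It cannot know its subclasses' fields,
// so it discovers them at run time through reflection on every call.
// Inherited fields are ignored in this sketch.
abstract class ReflectiveRoot {
    public final String toStringViaReflection() {
        Field[] fields = getClass().getDeclaredFields();
        // getDeclaredFields() returns fields in no guaranteed order, so sort by name.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        StringBuilder sb = new StringBuilder(getClass().getSimpleName()).append('[');
        String separator = "";
        for (Field f : fields) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // only instance state belongs in the output
            }
            try {
                f.setAccessible(true);
                sb.append(separator).append(f.getName()).append('=').append(f.get(this));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            separator = ", ";
        }
        return sb.append(']').toString();
    }

    // The inherited default that every generated class currently falls back on.
    @Override
    public String toString() {
        return toStringViaReflection();
    }
}

// What a generated value class could look like: the generator knows the fields,
// so it writes toString() out directly and no reflection is needed at run time.
class Address extends ReflectiveRoot {
    String street;
    int zip;

    @Override
    public String toString() {
        return "Address[street=" + street + ", zip=" + zip + "]";
    }
}
```

The point of the design is that the generated method does the same work with plain string concatenation, so the per-call cost of field lookup and access checks disappears.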
To keep this backwards compatible, I would have to generate classes from all known XSDs, invoke both their old (inherited) and their generated toString() implementations, and compare the results. I would also have to fill these objects with values, so that there would be actual data to compare.
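The comparison step can be sketched in Java as follows, under the same assumptions as before. ValueRoot stands in for the reflective root, Node is a hypothetical self-referencing generated class (like a recursive schema type), and ValueFiller is a hypothetical helper that populates fields with deterministic, type-based values down to a given depth. It only handles a few field types; real generated classes would also need collections, dates, enums and so on.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: fill a generated object, then check that the inherited
// (reflective) and the generated toString() produce identical text.
class CompatCheckDemo {
    public static void main(String[] args) throws ReflectiveOperationException {
        Node node = ValueFiller.fill(Node.class, 4);
        boolean same = node.toString().equals(node.toStringViaReflection());
        System.out.println(same ? "compatible" : "MISMATCH:\n" + node + "\n" + node.toStringViaReflection());
    }
}

abstract class ValueRoot {
    // Old behaviour: discover fields by reflection, recursing into nested value objects.
    public final String toStringViaReflection() {
        Field[] fields = getClass().getDeclaredFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName)); // order is unspecified otherwise
        StringBuilder sb = new StringBuilder(getClass().getSimpleName()).append('[');
        String separator = "";
        for (Field f : fields) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            try {
                f.setAccessible(true);
                Object value = f.get(this);
                String text = value instanceof ValueRoot
                        ? ((ValueRoot) value).toStringViaReflection()
                        : String.valueOf(value);
                sb.append(separator).append(f.getName()).append('=').append(text);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            separator = ", ";
        }
        return sb.append(']').toString();
    }
}

// Hypothetical generated class that can contain an instance of itself.
class Node extends ValueRoot {
    Node child;
    String label;
    int weight;

    @Override
    public String toString() {
        return "Node[child=" + child + ", label=" + label + ", weight=" + weight + "]";
    }
}

final class ValueFiller {
    private ValueFiller() {}

    // Creates an instance and fills its own declared fields; nested value objects
    // are created recursively while depth > 1, otherwise they stay null.
    static <T extends ValueRoot> T fill(Class<T> type, int depth) throws ReflectiveOperationException {
        T instance = type.getDeclaredConstructor().newInstance();
        for (Field f : type.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            f.setAccessible(true);
            Class<?> ft = f.getType();
            if (ft == String.class) {
                f.set(instance, f.getName() + "#" + depth);
            } else if (ft == int.class || ft == Integer.class) {
                f.set(instance, depth);
            } else if (ValueRoot.class.isAssignableFrom(ft) && depth > 1) {
                f.set(instance, fill(ft.asSubclass(ValueRoot.class), depth - 1));
            }
            // Other field types are left at their defaults in this sketch.
        }
        return instance;
    }
}
```

Because the filled values are derived from the field name and depth, a mismatch is reproducible and the two strings can be diffed directly in the test report.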
So much for the task, but what’s the fuss about? Well, I wanted the testing kept inside Maven, so that if either the generated or the inherited code changes, we see it in the build.
There seem to be a vast number of ways to test mojos, ranging from the blunt “get a mojo instance, fill it with values, and run execute()” to running a separate Maven instance and using the test results it produces. Given that in this particular case I had to run the compiler and then Surefire, I (eventually) went for the latter approach.
The plugin testing harness is an example of the former approach, although its usage example might suggest the latter. Avoid it if you need to invoke anything more than your own mojo. It only reads the pom.xml you provide; it doesn’t set up the rest of the Maven environment, such as builders, the project, etc. Nor does it set up the default values of your mojo’s parameters (not that it could resolve them without that environment), so what you end up with is a helper method that configures your mojos and executes them from your test case. I got as far as generating and compiling my classes, but when it came to actually testing them with the Surefire mojo, I gave up, as setting that up would really have been impossible.
I then realized that, from a testing point of view, this is in fact an integration test, not a “mere” unit test, as it also involves interaction with third-party plugins (the compiler and Surefire). So I converted the project from the plugin testing harness’s layout to the layout recommended by the maven-invoker-plugin. It took nothing more than moving a few directories around, and in return I could throw out the complicated mojo invocation and use a simple, well-defined POM instead. The invoker plugin creates a local repository of its own that contains your project’s dependencies and the artifact under test. It’s advisable to provide a settings.xml that points to your regular local repository, to avoid unnecessarily fetching artifacts from your remote repositories.
So far, so good. Keep in mind, however, that the invoker plugin has some disadvantages. It’s much slower than the harness. It doesn’t aggregate the test results, so on a failed build all you can see is that there were errors in a particular run. Also, since it runs the tests in a separate Java environment, you cannot easily debug your test code, which in my case is quite complex.
So you either create a separate test module, reference it from the test POM, and write unit tests that exercise your code with known data, or you can just be brave, write well-thought-out code, and trace your work with log messages, just like in the good old days. After all, repeated failure builds character. 🙂
I actually created both a small and a full test environment, so I could test the particularities of the test code separately. To debug the test code, you have to import the generated Maven project into your IDE and run the tests with the invoker’s local repository and the generated settings.xml.
What were the results? I achieved a speedup of about 90% for completely empty classes, about 70% for classes with data when the output buffer is limited, and around 30% for classes filled completely and recursively to a depth of four levels. Overall, I’d say it’s a 50% improvement over the original method, with proven backwards compatibility. Considering that this method is invoked at least 6 million times a day, it might have a noticeable positive impact on the performance of our systems.
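For anyone who wants to try this kind of measurement, here is a deliberately naive timing sketch in Java. It only illustrates the idea and is not a reproduction of the figures above: the Point class, the iteration count and the percentReduction helper are hypothetical, and “speedup” is interpreted here as the percentage reduction in run time. Plain System.nanoTime() loops are sensitive to JIT warm-up and dead-code elimination, so a dedicated tool such as JMH gives more trustworthy numbers.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.Supplier;

// Naive, hypothetical timing sketch; prefer JMH for numbers you intend to trust.
class ToStringBenchmark {
    // Results are accumulated here so the JIT cannot discard the measured calls.
    static volatile long blackhole;

    public static void main(String[] args) {
        Point p = new Point();
        p.x = 3;
        p.y = 4;
        int iterations = 1_000_000;
        // Warm-up runs so both paths are JIT-compiled before measuring.
        time(p::toStringViaReflection, iterations);
        time(p::toString, iterations);
        long oldNanos = time(p::toStringViaReflection, iterations);
        long newNanos = time(p::toString, iterations);
        System.out.printf("reflective: %d ms, generated: %d ms, reduction: %.1f%%%n",
                oldNanos / 1_000_000, newNanos / 1_000_000, percentReduction(oldNanos, newNanos));
    }

    // Calls the supplier repeatedly and returns the elapsed time in nanoseconds.
    static long time(Supplier<String> call, int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += call.get().length();
        }
        long elapsed = System.nanoTime() - start;
        blackhole += sink;
        return elapsed;
    }

    // Percentage of the old run time saved by the new implementation.
    static double percentReduction(long oldNanos, long newNanos) {
        return 100.0 * (oldNanos - newNanos) / oldNanos;
    }
}

abstract class BenchRoot {
    // Reflective toString, in the style of the common root class.
    public final String toStringViaReflection() {
        Field[] fields = getClass().getDeclaredFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        StringBuilder sb = new StringBuilder(getClass().getSimpleName()).append('[');
        String separator = "";
        for (Field f : fields) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            try {
                f.setAccessible(true);
                sb.append(separator).append(f.getName()).append('=').append(f.get(this));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            separator = ", ";
        }
        return sb.append(']').toString();
    }
}

// Hypothetical generated class with its own toString().
class Point extends BenchRoot {
    int x;
    int y;

    @Override
    public String toString() {
        return "Point[x=" + x + ", y=" + y + "]";
    }
}
```

Timing both variants on the same object, after a warm-up pass, keeps the comparison fair; the absolute numbers will vary between machines and JVM versions.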
In the process I found three bugs in the current code, which shows that the method’s output isn’t used too heavily in the live system. Since the logging it produces is mandatory for regulatory compliance, it might be a good idea to roll the new version out in the near future.