My friend and colleague Christian has been doing some performance optimizations for WebGate’s XPages Toolkit. And he’s found some interesting results revealing the exceptional performance of NoteCollection.
But being the obsessive that I am, I wanted to take the idea a bit further, so I thought of a few ways that the process might be faster. First, instead of walking the NoteCollection in the traditional way using .getFirstNoteID()/.getNextNoteID(), I decided to try it with .getNoteIDs(), which simply outputs the array of ints from the NoteCollection in the most direct way possible. Second, I thought it would be useful if the XPT’s DominoStorageService could create an object instance from just a Document’s metadata rather than the Document itself. That way actually accessing the source data could be deferred until properties and methods are actually called.
Of course, this would require some significant code on Christian’s part, so I decided to grab a shower while waiting to hear back from him…
I don’t know about you, but I find I get a lot of my best ideas in the shower. Maybe there’s something about the isolation and the mechanical process of getting clean that frees up the creative mind. Maybe there’s something about the warmth that surrounds you that takes your subconscious back to the womb when you had no preconceived notions about the world. Maybe there’s something even more primal about the water that triggers the random, fleeting amphibian brain at the center of our absurdly complex nervous system.
I honestly have no idea. I just know that, whether I want them or not, my best ideas seem to flow out of the showerhead and hit me in the face. All I can do is go with them wherever they take me.
This morning, they took me to the thought that I didn’t really have to wait for Christian to adapt the XPT to support my idea of a deferred data object. I already had one: org.openntf.domino.Document. The Document in the OpenNTF Domino API is a wrapper around the original lotus.domino Document, and since it already has extensive logic to resolve its underlying delegate in the event that it’s missing, there would be very little work in defining a Document that was nothing besides its parent Database and a NoteID.
So I sat down and overloaded two methods in the org.openntf.domino.Database interface to create .getDocumentByID(String noteid, boolean defer) and .getDocumentByUNID(String unid, boolean defer). The concept is simple: return a lotus.domino.Document that isn’t actually connected to a C API handle behind the scenes, and instead only links to the underlying lotus.domino API when needed. It took just over 20 minutes to go from sitting down to completed implementation.
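To give a rough idea of the shape of a deferred document, here is a minimal plain-Java sketch. It is not the OpenNTF implementation: LoadedNote, DeferredDocument, and the fakeLoad hook are hypothetical stand-ins so the example runs without Notes.jar. The only point is that the wrapper knows its note ID up front and resolves its delegate on first real access.

```java
import java.util.function.IntFunction;

/** Hypothetical stand-in for a fully loaded document; not a Domino class. */
final class LoadedNote {
    final int noteId;
    final String title;

    LoadedNote(int noteId, String title) {
        this.noteId = noteId;
        this.title = title;
    }
}

/** A wrapper that holds only a note ID until data is actually requested. */
final class DeferredDocument {
    private final int noteId;
    private final IntFunction<LoadedNote> loader; // assumed "open by note ID" hook
    private LoadedNote delegate;                  // stays null until first real access

    DeferredDocument(int noteId, IntFunction<LoadedNote> loader) {
        this.noteId = noteId;
        this.loader = loader;
    }

    /** Metadata is available without touching the backing store. */
    int getNoteId() {
        return noteId;
    }

    boolean isLoaded() {
        return delegate != null;
    }

    /** The first call resolves the delegate; later calls reuse it. */
    String getTitle() {
        if (delegate == null) {
            delegate = loader.apply(noteId);
        }
        return delegate.title;
    }
}

final class DeferredDocumentDemo {
    /** Counts simulated loads so the deferral is observable. */
    static int loads = 0;

    /** Pretend to open a note; a real loader would go to the database here. */
    static LoadedNote fakeLoad(int noteId) {
        loads++;
        return new LoadedNote(noteId, "Title #" + noteId);
    }

    public static void main(String[] args) {
        DeferredDocument doc = new DeferredDocument(2302, DeferredDocumentDemo::fakeLoad);
        System.out.println(doc.getNoteId() + " loaded? " + doc.isLoaded());
        System.out.println(doc.getTitle() + " loaded? " + doc.isLoaded());
    }
}
```

Constructing the wrapper costs almost nothing, which is why building millions of them is cheap compared with opening millions of real documents.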
Have I mentioned that controlling the core API is *REALLY* useful?
I didn’t get to write a unit test for it until about 10 hours later, but it turns out that this was one of those truly rare cases where the first implementation just works, and I was up and running with deferred documents.
So how to test the performance? I keep a handy NSF around for large-scale testing. It’s not THAT big at 2.78 million documents, but it’s nice and complicated with them, since it’s an NSF version of the raw movie data from imdb.com; basically, every movie or TV show ever made crammed into a Notes database. It’s a handy benchmarking tool.
I wanted to find out a few things: how did the NoteCollection/Document list strategy work when dealing with a few million records? What was the impact of using a non-optimized Selection formula on the NoteCollection? Would it make a difference if I deferred the load of the Document or not?
Since I just want to know relative impact, I didn’t do a multi-run average case benchmark. The numbers that I’ll talk about are generated just once so they might be subject to all sorts of nuances. But I think it’s the comparison where the issue really stands, so I don’t feel a need to be pedantic in my measurements.
Here’s the test case if you’re interested. It basically just grabs the movie database, gets a count on the documents, builds a NoteCollection according to some individual criteria, iterates that to build an array of Documents, then walks over a single 100-doc page of that array to build a set of output Strings. You can toggle the selection formula on and off with a comment, and you can toggle the deferred loading on and off with an argument.
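For readers without the movie database handy, the structure of that test can be imitated in plain Java. This is only a sketch under loud assumptions: the note IDs are a plain int array, loadTitle is a made-up stand-in for opening a note, and LazyDoc is a hypothetical class, not anything from lotus.domino or OpenNTF. It shows the two timed phases (build the array, render one page) and how the defer flag changes how many loads happen.

```java
import java.util.ArrayList;
import java.util.List;

final class PagingBenchmarkSketch {
    /** Number of simulated full loads, so the effect of deferral is visible. */
    static int loads = 0;

    /** Hypothetical stand-in for opening a note and reading a field. */
    static String loadTitle(int noteId) {
        loads++;
        return "Title #" + noteId;
    }

    /** Holds a note ID; resolves the title eagerly or on first request. */
    static final class LazyDoc {
        final int noteId;
        private String title;

        LazyDoc(int noteId, boolean defer) {
            this.noteId = noteId;
            if (!defer) {
                title = loadTitle(noteId); // eager path pays the cost up front
            }
        }

        String getTitle() {
            if (title == null) {
                title = loadTitle(noteId);
            }
            return title;
        }
    }

    /** Build every document, then render only the first page of them. */
    static List<String> run(int[] noteIds, boolean defer, int pageSize) {
        long start = System.nanoTime();
        LazyDoc[] docs = new LazyDoc[noteIds.length];
        for (int i = 0; i < noteIds.length; i++) {
            docs[i] = new LazyDoc(noteIds[i], defer);
        }
        long built = System.nanoTime();

        List<String> page = new ArrayList<>();
        int limit = Math.min(pageSize, docs.length);
        for (int i = 0; i < limit; i++) {
            page.add(docs[i].getTitle());
        }
        long paged = System.nanoTime();

        System.out.printf("defer=%b: build took %dus, page took %dus, %d loads%n",
                defer, (built - start) / 1000, (paged - built) / 1000, loads);
        return page;
    }

    public static void main(String[] args) {
        int[] ids = new int[100_000];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i + 1;
        }
        loads = 0;
        run(ids, false, 100);
        loads = 0;
        run(ids, true, 100);
    }
}
```

With deferral on, only the documents on the rendered page are ever loaded, so the page step gets a little slower while the build step gets dramatically cheaper.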
If I load the entire NSF Document set into memory with full lotus.domino.local.Documents, I see the following:
Done with document details for 100 out of 2787488 (2787488 total in db.) NoteCollection build took 11322ms, Document build took 15941659ms, Document page took 1ms
If I add the selection formula @Begins(Title; "B"), I see the following:
Done with document details for 100 out of 134333 (2787488 total in db.) NoteCollection build took 4216ms, Document build took 469061ms, Document page took 1ms
If I switch to deferred Documents and load the entire NSF document set into memory, I see the following:
Done with document details for 100 out of 2787488 (2787488 total in db.) NoteCollection build took 10157ms, Document build took 72623ms, Document page took 16ms
If I add the selection formula again, still using the deferred Documents, I see the following:
Done with document details for 100 out of 134333 (2787488 total in db.) NoteCollection build took 4184ms, Document build took 4009ms, Document page took 13ms
Building the Document array is about two orders of magnitude faster with deferred Documents: roughly 220 times faster for the full set and 117 times faster with the selection formula. In fact, it’s fast enough that you could potentially offer a real time @formula selection filter for a database with nearly 2.8 million documents in it, and it would be a viable case for a “working” spinner. I mean, 72.6 seconds is still a really long time to build the Document array, but it’s still far better than nearly 4 and a half hours!
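The comparison can be checked directly from the Document build timings reported in the four runs above. The class and method names below are just for illustration; the numbers are the ones from the output lines.

```java
import java.util.Locale;

final class SpeedupMath {
    /** How many times faster the deferred build is than the eager one. */
    static double ratio(long eagerMs, long deferredMs) {
        return (double) eagerMs / deferredMs;
    }

    /** Convert milliseconds to hours. */
    static double hours(long ms) {
        return ms / 3_600_000.0;
    }

    public static void main(String[] args) {
        // Document build times (ms) taken from the four runs.
        long eagerAll = 15_941_659L;
        long deferredAll = 72_623L;
        long eagerFiltered = 469_061L;
        long deferredFiltered = 4_009L;

        System.out.println(String.format(Locale.ROOT,
                "Full set: %.1fx faster", ratio(eagerAll, deferredAll)));
        System.out.println(String.format(Locale.ROOT,
                "Filtered: %.1fx faster", ratio(eagerFiltered, deferredFiltered)));
        System.out.println(String.format(Locale.ROOT,
                "Eager full build: %.2f hours", hours(eagerAll)));
    }
}
```

That works out to about 219.5x for the full set, 117.0x with the selection formula, and 4.43 hours for the eager full build.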
I’m delighted to find that this is a useful enhancement. I have some thoughts on how to make it even better in the future, but for now I’ll take my 10000% improvement in execution time and be happy about it. And I hope this both encourages you to try out the OpenNTF API and really think about the ways you draw performance gains in your own code.
P.S: And I can’t resist: if we add an internal HashMap to the OpenNTF Document, then we could wait until write-time to actually instantiate the delegate Document and write the appropriate values, much like the DominoDocument wrapper does in XPages. What’s more, since we could defer it, we could make the wrapper itself not have thread affinity. And if we did that, we could defer Document writes to separate threads, where the wrapper was just passed to a tasklet that writes to the NSF asynchronously.
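Purely as a thought experiment, the write-deferral idea might look something like the following in plain Java. Everything here is hypothetical: STORE stands in for the NSF, BufferedDoc is a made-up wrapper rather than the OpenNTF Document, and a standard ExecutorService stands in for whatever tasklet mechanism would really be used. The sketch only shows field values being buffered in a HashMap and written out later on a different thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class WriteDeferredSketch {
    /** Hypothetical stand-in for the NSF: note ID mapped to saved field values. */
    static final Map<Integer, Map<String, Object>> STORE = new ConcurrentHashMap<>();

    /** Buffers field writes in memory; nothing touches the store until flush. */
    static final class BufferedDoc {
        final int noteId;
        private final Map<String, Object> pending = new HashMap<>();

        BufferedDoc(int noteId) {
            this.noteId = noteId;
        }

        void replaceItemValue(String name, Object value) {
            pending.put(name, value);
        }

        /** Runs on the writer thread; the only point a real delegate would be needed. */
        void flush() {
            STORE.put(noteId, new HashMap<>(pending));
        }
    }

    /** Hand the wrapper to a background task and return a handle to wait on. */
    static Future<?> saveAsync(ExecutorService pool, BufferedDoc doc) {
        return pool.submit(doc::flush);
    }

    /** Convenience for the demo: save on another thread and block until done. */
    static Map<String, Object> saveAndWait(BufferedDoc doc) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            saveAsync(pool, doc).get();
            return STORE.get(doc.noteId);
        } catch (Exception e) {
            throw new IllegalStateException("deferred write failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        BufferedDoc doc = new BufferedDoc(2302);
        doc.replaceItemValue("Title", "Brazil");
        System.out.println(saveAndWait(doc));
    }
}
```

In real code the caller would not block on the Future; the demo waits only so the result can be printed.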
Food for thought.