Handling Large Collections


Joe Bain

About me

I have been working at OMG Life on Autographer, a wearable camera.

My background is in C#, Java, Tcl and Objective-C. Now freelance.

The camera takes between 500 and 2000 images a day. It is sold with an ecosystem of software - desktop, mobile and web.

The desktop software manages images from the camera for users. It allows them to import, tag and search the images.

Each image can contain GPS data and the app also lets users create videos and gifs, and share everything to social networks.


Around 20k lines of JavaScript on the frontend.

2 years in development.

Backend is C++ and uses SQLite and Mongoose libraries.

On Windows it uses embedded Firefox; on OS X it uses WebView (Safari's WebKit engine).

Backbone.js is used to structure the app. Currently 0.9.1 with modifications.

We use a global event system.
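In Backbone this is usually done by mixing Backbone.Events into a shared object. A minimal plain-JS sketch of the same pattern (the name EventBus and the event names are illustrative, not from the app):

```javascript
// Minimal global event bus, mimicking the Backbone.Events mixin.
var EventBus = {
	handlers: {},
	on: function(name, callback) {
		(this.handlers[name] = this.handlers[name] || []).push(callback);
	},
	off: function(name) {
		delete this.handlers[name];
	},
	trigger: function(name) {
		var args = Array.prototype.slice.call(arguments, 1);
		(this.handlers[name] || []).forEach(function(cb) {
			cb.apply(null, args);
		});
	}
};

// Any part of the app can publish or subscribe
// without holding a direct reference to the other side.
EventBus.on('photo:imported', function(id) {
	console.log('imported ' + id);
});
EventBus.trigger('photo:imported', 42); // prints "imported 42"
```

The bus decouples modules, but every on() here has the same leak potential discussed later: handlers must be removed when their owners go away.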


Backbone doesn't have controllers.

Backbone.Controller = function() {};
_.extend(Backbone.Controller.prototype, Backbone.Events);



Object pooling and DOM Element pooling

var viewPool = new ViewPool();
var myView = viewPool.get();
/* ... */

var ViewPool = function() {
	this.views = [];
};

ViewPool.prototype.get = function() {
	if (this.views.length) {
		return this.views.pop();
	} else {
		var view = new View();
		var that = this;
		view.onDestroy(function() {
			// Return the view to the pool instead of destroying it.
			that.views.push(view);
		});
		return view;
	}
};

Pooling is hard to get right.

We had a lot of bugs.

Event listeners

Make sure to remove event listeners.

Dangling event handlers will trap objects in their closures and clog up the heap.

object.listenTo() rather than event.on()
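The difference is who tracks the subscription: with on() only the emitter knows about the handler, with listenTo() the listener records it too, so one stopListening() call in teardown releases everything. A plain-JS sketch of the idea (this is not Backbone's actual implementation):

```javascript
// Sketch of why listenTo() beats on(): the *listener* records its own
// subscriptions, so stopListening() can unbind them all in one call.
function Emitter() { this.callbacks = []; }
Emitter.prototype.on = function(cb) { this.callbacks.push(cb); };
Emitter.prototype.off = function(cb) {
	this.callbacks = this.callbacks.filter(function(c) { return c !== cb; });
};
Emitter.prototype.trigger = function() {
	var args = arguments;
	this.callbacks.forEach(function(cb) { cb.apply(null, args); });
};

function Listener() { this.listening = []; }
Listener.prototype.listenTo = function(emitter, cb) {
	emitter.on(cb);
	this.listening.push({ emitter: emitter, cb: cb });
};
Listener.prototype.stopListening = function() {
	// Unbind every handler this listener registered, anywhere.
	this.listening.forEach(function(entry) { entry.emitter.off(entry.cb); });
	this.listening = [];
};
```

With event.on(cb) a destroyed view that forgot to call off() stays reachable through its handler's closure; with listenTo(), Backbone's view.remove() calls stopListening() for you.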

Is this necessary?

Browser memory management is getting better all the time.

We got more noticeable improvements by upgrading Firefox.

Although this is not an option for everyone.


DOM manipulation dominates


Avoid touching the DOM.

Avoid jQuery, use native methods or close to native.

Avoid empty() inside of render() in large views.

"In order to perform updates as efficiently as possible, we diff the return value from the previous call to render with the new one, and generate a minimal set of changes to be applied to the DOM." - Pete Hunt, React developer

The viewable content has to be updated whenever the user scrolls one row or more.

Originally we would clear and readd every element.

Changed to incremental additions and removals: only add elements which are not already on the page and remove those which are no longer in view.

We build up an array of rows with images and date markers when the data changes. This is like a proxy for the DOM.
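With that row array in hand, the incremental update is just a diff of the previously rendered rows against the rows now in view. A sketch (row IDs here are illustrative):

```javascript
// Given the previously rendered rows and the rows now in view,
// compute the minimal set of additions and removals.
function diffRows(previous, current) {
	var prevSet = {};
	previous.forEach(function(id) { prevSet[id] = true; });
	var currSet = {};
	current.forEach(function(id) { currSet[id] = true; });
	return {
		toAdd: current.filter(function(id) { return !prevSet[id]; }),
		toRemove: previous.filter(function(id) { return !currSet[id]; })
	};
}

// Scrolling down one row: row 0 leaves the viewport, row 5 enters.
var changes = diffRows([0, 1, 2, 3, 4], [1, 2, 3, 4, 5]);
// changes.toAdd is [5], changes.toRemove is [0]
```

Only the one entering element is created and the one leaving element removed, instead of tearing down and rebuilding all five.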

We cache about 8 pages so we don't have to make so many requests.

Bring database problems to the user

Avoid big joins in SQL, the user will be waiting for them.

Let the server return the bare minimum.

Defer loading data until as late as possible.

Every image was being loaded in with a list of tags and full metadata.

We only need to know whether the image is a favourite or has a tag.

We would load every image for a day in the month up-front.

Better user experience if we only load once the user interacts.
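Deferring can be as simple as caching a fetch behind first access. A sketch (LazyDay and the injected loadFn are illustrative names, not the app's API):

```javascript
// Fetch a day's images only on first access, then reuse the result.
function LazyDay(day, loadFn) {
	this.day = day;
	this.loadFn = loadFn;   // e.g. an AJAX call; injected here for clarity
	this.images = null;
}

LazyDay.prototype.open = function() {
	if (this.images === null) {
		// The first interaction pays the loading cost; later opens are free.
		this.images = this.loadFn(this.day);
	}
	return this.images;
};
```

Nothing is requested when the month view renders; the network and parsing cost is only paid for days the user actually opens.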



We added a new type for Backbone to share data between collections.

ProxyCollection example

var master = new PhotoCollection();
var proxyOne = new ProxyPhotoCollection(master);
var proxyTwo = new ProxyPhotoCollection(master);

PhotoCollection is a Backbone.Collection
ProxyPhotoCollection extends our new Backbone.ProxyCollection type

proxyOne.fetch({pages: [1,2]});

proxyOne will load pages 1 and 2, proxyTwo will stay empty. master and
proxyOne have the same contents once fetch() completes.

proxyTwo.fetch({pages: [2,3]});

proxyTwo will share the photos from page 2 with proxyOne.
master contains pages 1, 2 and 3.

When a photo is deleted, it is removed from all three collections.
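A plain-JS sketch of that sharing behaviour (the real ProxyPhotoCollection is a Backbone collection; Master and Proxy here are illustrative stand-ins):

```javascript
// Master holds every loaded photo; each proxy holds a subset, but the
// photo objects themselves are shared between collections, never copied.
function Master() {
	this.photos = [];
	this.proxies = [];
}
Master.prototype.add = function(photo) {
	var existing = this.photos.filter(function(p) { return p.id === photo.id; })[0];
	if (!existing) this.photos.push(photo);
	return existing || photo;   // reuse the shared instance if already loaded
};
Master.prototype.remove = function(id) {
	var keep = function(p) { return p.id !== id; };
	this.photos = this.photos.filter(keep);
	this.proxies.forEach(function(proxy) {
		proxy.photos = proxy.photos.filter(keep); // removal cascades to proxies
	});
};

function Proxy(master) {
	this.master = master;
	this.photos = [];
	master.proxies.push(this);
}
Proxy.prototype.fetch = function(photos) {
	var that = this;
	photos.forEach(function(photo) {
		that.photos.push(that.master.add(photo)); // go through the master
	});
};
```

Because every proxy routes additions through the master, a photo appearing on two pages exists once in memory, and deleting it in one place updates every view of it.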

Other problems

Chrome developer tools frequently grind and crash.

Test data sets measure in GBs.

Problems I didn't have

Page load time

Cross platform

Lessons learned

node-webkit looks like a really good solution - although it wasn't available when we started

We should have thought more about our rendering - i.e. using a templating system, native methods, or another library, rather than just using jQuery because it was there

Write more tests - this would have helped enormously when we came to refactor to improve performance

Write benchmarks too - it's very hard to eyeball performance with so many different data sets and users. Concrete numbers are necessary to make progress

Backbone is still a good choice - it is very extensible and suitable for non-standard applications like Autographer. It doesn't assume too much.


OMG Life

Joe Bain