QuickDrawGH is out!

Last week we published our first official public Grasshopper plugin, QuickDrawGH, available now on Food4Rhino.

QuickDrawGH is a set of components that lets you use Google's "Quick, Draw!" dataset inside Grasshopper. The components are Format, Load, Read, and Draw. It is meant to be a utility for creating digital art, pen plotter art, laser cutting and engraving, or simply exploring the millions of doodles that Google has crowdsourced.

In the spirit of the "Quick, Draw!" dataset, we have shared the code for each component on GitHub for anyone to study and build upon further.

Read on if you are interested in learning about the coding process of QuickDrawGH.

The first thing to know about the "Quick, Draw!" dataset is that it isn't a .zip file of a bunch of vector files or image files. Each of the 345 drawing categories is downloadable as a large .NDJSON file (NDJSON stands for Newline-Delimited JavaScript Object Notation). Each file has around 100,000 lines of text in it (though some have over 200,000), and each line describes one drawing.

This comes out to about 50 million individual drawings. You can see why the data isn't stored as .svg files or images; it would take up hundreds of gigabytes. Text (as .NDJSON files) is much tidier.

Taken straight from Google's GitHub page, this is what a typical drawing looks like:

{
  "key_id":"5891796615823360",
  "word":"nose",
  "countrycode":"AE",
  "timestamp":"2017-03-01 20:41:36.70725 UTC",
  "recognized":true,
  "drawing":[[[129,128,129,129,130,130,131,132,132,133,133,133,133,...]]]
}

In the real file, though, all of this is squished onto one line. Only a few parts of it are really useful for QuickDrawGH: the word, the recognized tag, and the drawing array. The word is obvious: you want to know what category you are drawing from. The recognized tag conveys whether or not Google correctly guessed what the drawing was, which will be important later. And the drawing array is the long string of numbers and brackets that follows the word "drawing".
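To make those three fields concrete, here is a minimal sketch of pulling them out of one line. It is not the plugin's code (the plugin splits strings by hand, as described further down); it assumes System.Text.Json is available, and the QuickDrawRecord class and Parse method are hypothetical names made up for this example.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, for illustration only (not part of QuickDrawGH).
public static class QuickDrawRecord
{
    // Returns the category word, the recognized flag, and the drawing
    // array as raw JSON text, from a single NDJSON line.
    public static (string Word, bool Recognized, string DrawingJson) Parse(string line)
    {
        using JsonDocument doc = JsonDocument.Parse(line);
        JsonElement root = doc.RootElement;

        string word = root.GetProperty("word").GetString();
        bool recognized = root.GetProperty("recognized").GetBoolean();

        // The drawing stays as text here; turning it into geometry is a separate step.
        string drawingJson = root.GetProperty("drawing").GetRawText();

        return (word, recognized, drawingJson);
    }
}
```

Calling QuickDrawRecord.Parse on one line of a category file gives back just the parts that matter, and everything else (key_id, countrycode, timestamp) is ignored.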

The way the drawing array is formatted is as follows:

[
  [  // First stroke
    [x0, x1, x2, x3, ...],
    [y0, y1, y2, y3, ...]
  ],
  [  // Second stroke
    [x0, x1, x2, x3, ...],
    [y0, y1, y2, y3, ...]
  ],
  ... // Additional strokes
]

Where x and y are the pixel coordinates of each vertex that makes up each stroke that makes up each drawing.

So looking at how the drawings were presented, as lists of coordinates, I had to figure out a way to actually draw them in Grasshopper/Rhino, as polylines. The first step to doing this was drawing a single stroke from a single drawing. This was easy enough as it mostly involved splitting strings at brackets and commas, making points from the x and y coordinates, then drawing a polyline through those points.

The next step was to draw all the strokes that make up a single drawing. This step wasn't much more than the last step, and it meant making a list of polylines, one for each stroke. Just a bit more string splitting and list making.
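The string-splitting idea from the last two steps can be sketched roughly like this. It is a guess at the general approach, not the code that ships in the plugin: DrawingParser and ParseStrokes are hypothetical names, and plain (X, Y) tuples stand in for Rhino points so the example runs without Rhino. Each inner list is one stroke, which is what you would feed to a polyline.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper, for illustration only (not part of QuickDrawGH).
public static class DrawingParser
{
    // Turns "[[[x0,x1,...],[y0,y1,...]],[[...],[...]]]" into one point list per stroke.
    public static List<List<(double X, double Y)>> ParseStrokes(string drawing)
    {
        // Drop any whitespace so the bracket patterns below match exactly.
        string compact = new string(drawing.Where(c => !char.IsWhiteSpace(c)).ToArray());

        // Strip the outer "[[[" and "]]]".
        string inner = compact.Substring(3, compact.Length - 6);

        var strokes = new List<List<(double X, double Y)>>();

        // "]],[[" separates strokes; "],[" separates the x list from the y list.
        foreach (string stroke in inner.Split(new[] { "]],[[" }, StringSplitOptions.None))
        {
            string[] axes = stroke.Split(new[] { "],[" }, StringSplitOptions.None);
            double[] xs = axes[0].Split(',').Select(s => double.Parse(s, CultureInfo.InvariantCulture)).ToArray();
            double[] ys = axes[1].Split(',').Select(s => double.Parse(s, CultureInfo.InvariantCulture)).ToArray();

            // Pair each x with its y to get the vertices of this stroke.
            strokes.Add(xs.Zip(ys, (x, y) => (X: x, Y: y)).ToList());
        }

        return strokes;
    }
}
```

Inside Grasshopper, each tuple would become a point and each inner list a polyline; that part is left out here because it depends on the Rhino libraries.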

Once I was able to draw that, I started thinking about the structure of the plugin. Would it be just one component? Well, that one component would have to load the files, read the drawings, and draw the drawings you ask for. Loading is fast, drawing is fast, but reading the drawings is very slow.

Better to split each task into its own component. One component to load the files, given a directory to search in and a list of categories to find files for. Another component to read a specified range of lines from the files that were loaded. And a third component to draw the lines that were read, as polylines. This way the important part, the actual drawing, can stay fast, while reading the lines from each file only has to happen infrequently.
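Reading "a specified range of lines" could look something like the sketch below. This is only an assumption about how such a step might be written, not the Read component itself; LineRangeReader and ReadRange are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, for illustration only (not part of QuickDrawGH).
public static class LineRangeReader
{
    // Returns up to `count` lines, starting at zero-based line index `start`.
    public static List<string> ReadRange(string path, int start, int count)
    {
        // File.ReadLines streams the file lazily, so reading stops once the
        // requested range has been collected; lines before `start` are still
        // scanned, but they are not kept in memory.
        return File.ReadLines(path).Skip(start).Take(count).ToList();
    }
}
```

Because the file is streamed, asking for a small range near the top of a file is cheap, while a range deep into a 100,000-line file still has to walk past everything before it.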

This worked well for one or two categories at a time. If I asked it for more (or all categories at once), it became too slow to use. Remember when I said it was 50 million drawings total? Well, it turns out you can't simply ask a computer to read that many lines of text in a reasonable timeframe, and that was what I was doing when I asked it to draw from every category at once. (The total dataset is about 22 GB in size, so most computers couldn't even load that much data into memory anyway.)

I had to find a way to speed this up, a lot. I tried a few things, including converting the .NDJSON files to .JSON in hopes I could somehow read from those faster (spoiler alert: it was way slower. There's a reason Google used the .NDJSON format for this, I guess).

The best idea I had was to partition the data. So instead of each category being one file with ~100,000 lines of text, I wrote some code to split each file into 10,000-line chunks, each being its own file, and gave each file a name like apple1.NDJSON, apple2.NDJSON, etc. But I didn't just partition all the data. I only wanted the recognized:true drawings, because the majority of the unrecognized drawings were scribbles or inappropriate doodles. This became the Format component, the fourth and final component of the plugin.

// Needs: using System.Collections.Generic; using System.Linq;
// allLines is assumed to be a string[] of the lines from one category's file.

// Separate out the true lines only.
// Note: this is a loose substring match on "true".
List<string> trueLines = new List<string>();
for (int i = 0; i < allLines.Length; i++) {
    if (allLines[i].Contains("true")) {
        trueLines.Add(allLines[i]);
    }
}

// For each 10,000 lines in the trueLines list,
// separate them into a new list and add it to a list-of-lists, "chunks".
int chunk = 10000;
var chunks = new List<List<string>>();
int chunkCount = trueLines.Count / chunk;
if (trueLines.Count % chunk > 0) {
    chunkCount++;  // one extra, partial chunk for the remainder
}
for (int i = 0; i < chunkCount; i++) {
    chunks.Add(trueLines.Skip(i * chunk).Take(chunk).ToList());
}
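The snippet above stops once the chunks are built. Writing them out under names like apple1.NDJSON, apple2.NDJSON could be sketched as follows. This is an illustration under assumptions rather than the Format component's actual code: ChunkWriter, WriteChunks, outputDir, and category are all hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, for illustration only (not part of QuickDrawGH).
public static class ChunkWriter
{
    // Writes each chunk to its own file: <category>1.NDJSON, <category>2.NDJSON, ...
    // Returns the paths that were written, in order.
    public static List<string> WriteChunks(List<List<string>> chunks, string outputDir, string category)
    {
        // No-op if the directory already exists.
        Directory.CreateDirectory(outputDir);

        var paths = new List<string>();
        for (int i = 0; i < chunks.Count; i++)
        {
            // File numbering starts at 1, matching apple1.NDJSON, apple2.NDJSON, etc.
            string path = Path.Combine(outputDir, category + (i + 1) + ".NDJSON");
            File.WriteAllLines(path, chunks[i]);
            paths.Add(path);
        }
        return paths;
    }
}
```

With the chunks list from above, something like ChunkWriter.WriteChunks(chunks, outputDir, "apple") would produce one file per 10,000 recognized drawings.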

I had reduced the maximum number of drawings the Read component would ever have to deal with from 50 million to 3,450,000 (345 drawing categories multiplied by 10,000 drawings each). That's a reduction of 93.1%. And that is only when I ask for drawings from all 345 categories. Combined with a Parallel.For loop inside the Read component (since it doesn't matter if the drawings come back out of order), the plugin had been sped up significantly. Now it only takes around 15 to 20 seconds to read 3,450,000 drawings.
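For anyone curious what a parallel read along those lines might look like, here is a minimal sketch. It is an assumption about the general shape, not the Read component's code: ParallelReader and ReadAll are hypothetical names, and a ConcurrentBag is used here simply because it is a thread-safe collection that makes no promises about order.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only (not part of QuickDrawGH).
public static class ParallelReader
{
    // Reads every line from every file, one file per loop iteration.
    // The order of the returned lines is not guaranteed.
    public static List<string> ReadAll(IReadOnlyList<string> paths)
    {
        // Many threads add lines at once, so a thread-safe collection is needed.
        var lines = new ConcurrentBag<string>();

        Parallel.For(0, paths.Count, i =>
        {
            foreach (string line in File.ReadLines(paths[i]))
            {
                lines.Add(line);
            }
        });

        return new List<string>(lines);
    }
}
```

Giving up on ordering is what makes this simple: no thread has to wait for another to finish before adding its lines.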

If you enjoyed reading this explanation of coding QuickDrawGH, give us a follow on Instagram @emergent.design and @jaymezd, and keep an eye out for more content, code, and design tools from us. And give QuickDrawGH a try if you have Rhino!

Here are a few results from plotting on an AxiDraw V3/A3 using QuickDrawGH.