Jason Davies → Word Cloud Generator
How the Word Cloud Generator Works
The layout algorithm for positioning words without overlap is available on GitHub under an open source license as d3-cloud. Note that this is only the layout algorithm; any code for converting text into words and rendering the final output requires additional development.
As word placement can be quite slow for more than a few hundred words, the layout algorithm can be run asynchronously, with a configurable time step size. This makes it possible to animate words as they are placed without stuttering. It is recommended to always use a time step even without animations as it prevents the browser’s event loop from blocking while placing the words.
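The idea of a time step can be sketched as follows. This is a minimal illustration and not the d3-cloud source: `placeInSlices`, `placeOne`, `timeStepMs`, `schedule` and `onEnd` are hypothetical names, and the default scheduler simply uses `setTimeout`. Each slice places words until the time budget is spent, then yields to the event loop before continuing.

```javascript
// Place words in short time slices so the event loop is never blocked for long.
// All names here are illustrative assumptions, not part of d3-cloud.
function placeInSlices(
  words,
  placeOne,                               // callback that places a single word
  timeStepMs,                             // time budget per slice, in milliseconds
  schedule = (fn) => setTimeout(fn, 0),   // how to yield between slices
  onEnd = () => {}                        // called once every word has been handled
) {
  let next = 0;

  function step() {
    const start = Date.now();
    // Always place at least one word per slice, then stop once the budget is used up.
    while (next < words.length) {
      placeOne(words[next++]);
      if (Date.now() - start >= timeStepMs) break;
    }
    if (next < words.length) {
      schedule(step); // yield to the browser, then resume where we left off
    } else {
      onEnd();
    }
  }

  step();
}
```

Calling `placeOne` from inside each slice is also a natural point to render the newly placed word, which is what makes animated placement possible.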
The layout algorithm itself is incredibly simple. For each word, starting
with the most “important”:
- Attempt to place the word at some starting point: usually near the
middle, or somewhere on a central horizontal line.
- If the word intersects with any previously-placed words, move it one step
along an increasing spiral. Repeat until no intersections are found.
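The two steps above can be sketched in a few lines. To keep the example short it makes several assumptions that differ from the real generator: each word is treated as a plain rectangle rather than a glyph shape, the spiral is a simple Archimedean one with arbitrarily chosen constants, and words that never find a free spot are dropped. The names `placeWords`, `overlaps`, `weight`, `w` and `h` are hypothetical.

```javascript
// Overlap test for two axis-aligned boxes described by centre (x, y) and size (w, h).
function overlaps(a, b) {
  return Math.abs(a.x - b.x) * 2 < a.w + b.w &&
         Math.abs(a.y - b.y) * 2 < a.h + b.h;
}

// Greedy placement: most important word first, each moved outwards along a spiral
// from the centre of the area until it no longer intersects any placed word.
function placeWords(words, width, height, maxSteps = 5000) {
  const placed = [];
  const byImportance = [...words].sort((a, b) => b.weight - a.weight);

  for (const word of byImportance) {
    const box = { text: word.text, w: word.w, h: word.h, x: 0, y: 0 };
    for (let t = 0; t < maxSteps; t++) {
      const angle = 0.1 * t;       // step size along the spiral (assumed constant)
      const radius = 1.5 * angle;  // radius grows linearly with the angle
      box.x = width / 2 + radius * Math.cos(angle);
      box.y = height / 2 + radius * Math.sin(angle);

      const inside =
        box.x - box.w / 2 >= 0 && box.x + box.w / 2 <= width &&
        box.y - box.h / 2 >= 0 && box.y + box.h / 2 <= height;

      if (inside && !placed.some((p) => overlaps(p, box))) {
        placed.push(box);
        break;
      }
    }
    // If no free position was found within maxSteps, the word is skipped.
  }
  return placed;
}
```

Note that `placed.some(...)` compares the candidate against every earlier word, which is exactly the cost that the techniques discussed below try to avoid.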
The hard part is making it perform efficiently! According to Jonathan
Feinberg, Wordle uses a combination of
hierarchical bounding boxes and quadtrees to achieve reasonable speeds.
Glyphs in JavaScript
There isn’t a way to retrieve precise glyph shapes via the DOM, except
perhaps for SVG fonts. Instead, we draw each word to a hidden canvas element,
and retrieve the pixel data.
Retrieving the pixel data separately for each word is expensive, so we draw
as many words as possible and then retrieve their pixels in a batch operation.
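A rough sketch of the batch idea: draw several words onto one canvas, read the pixels back once, then cut each word's region out of the shared buffer. The helper below works on a plain RGBA array laid out like `ImageData.data`, so it can be tried without a browser. The name `extractSprite`, the `region` shape and the "alpha greater than zero" threshold are assumptions made for illustration.

```javascript
// Cut one word's region out of a shared RGBA buffer and reduce it to 0/1 pixels.
// rgba is laid out like ImageData.data: 4 bytes per pixel, row-major.
function extractSprite(rgba, canvasWidth, region) {
  const { x, y, w, h } = region;
  const sprite = new Uint8Array(w * h);
  for (let row = 0; row < h; row++) {
    for (let col = 0; col < w; col++) {
      // The alpha channel is the fourth byte of each pixel.
      const alpha = rgba[((y + row) * canvasWidth + (x + col)) * 4 + 3];
      sprite[row * w + col] = alpha > 0 ? 1 : 0; // assumed threshold: any coverage counts
    }
  }
  return sprite;
}

// In a browser, the buffer would come from a single read covering many words, e.g.:
//   const ctx = canvas.getContext("2d");
//   ...draw each word with ctx.fillText(...) at a known position...
//   const rgba = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
//   const sprite = extractSprite(rgba, canvas.width, { x: 0, y: 0, w: 64, h: 32 });
```

The region for each word has to be recorded at drawing time, since the pixel buffer itself carries no information about where one word ends and the next begins.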
Sprites and Masks
My initial implementation performed collision detection using sprite masks.
Once a word is placed, it doesn't move, so we can copy it to the appropriate
position in a larger sprite representing the whole placement area.
The advantage of this is that collision detection only involves comparing a
candidate sprite with the relevant area of this larger sprite, rather than
comparing with each previous word separately.
Somewhat surprisingly, a simple low-level hack made a tremendous difference:
when constructing the sprite I compressed blocks of 32 1-bit pixels into 32-bit
integers, thus reducing the number of checks (and memory) by 32 times.
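The packing trick might look roughly like this. It is a sketch under stated assumptions rather than the generator's actual code: the function names are hypothetical, the leftmost pixel of each block is stored in the most significant bit, and the sprite is assumed to lie fully inside the board. Collision detection becomes a bitwise AND of whole 32-pixel blocks, and copying a placed word into the large sprite becomes a bitwise OR.

```javascript
// Pack a row-major array of 0/1 pixels into 32-bit words, 32 pixels per word.
function packMask(pixels, width, height) {
  const wordsPerRow = Math.ceil(width / 32);
  const bits = new Uint32Array(wordsPerRow * height);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (pixels[y * width + x]) {
        // Leftmost pixel of each block goes in the most significant bit.
        bits[y * wordsPerRow + (x >> 5)] |= 1 << (31 - (x & 31));
      }
    }
  }
  return { bits, wordsPerRow, height };
}

// Call fn(boardIndex, spriteBits) for each non-empty 32-pixel block the sprite
// covers when its top-left corner is at (x, y). Stops early if fn returns true.
function forEachOverlap(board, sprite, x, y, fn) {
  const shift = x & 31; // misalignment between sprite blocks and board blocks
  for (let row = 0; row < sprite.height; row++) {
    const boardRow = (y + row) * board.wordsPerRow + (x >> 5);
    let carry = 0; // bits pushed out of the previous block by the shift
    for (let i = 0; i <= sprite.wordsPerRow; i++) {
      const word = i < sprite.wordsPerRow ? sprite.bits[row * sprite.wordsPerRow + i] : 0;
      const aligned = (carry | (word >>> shift)) >>> 0;
      carry = shift ? word << (32 - shift) : 0;
      if (aligned && fn(boardRow + i, aligned)) return true;
    }
  }
  return false;
}

// One AND per block tests 32 pixels at once.
function collides(board, sprite, x, y) {
  return forEachOverlap(board, sprite, x, y, (i, b) => (board.bits[i] & b) !== 0);
}

// Once a word is placed, OR its blocks into the board.
function stamp(board, sprite, x, y) {
  forEachOverlap(board, sprite, x, y, (i, b) => { board.bits[i] |= b; return false; });
}
```

The shift and carry handling is what makes the trick fiddly: a sprite rarely starts on a 32-pixel boundary, so each of its blocks generally straddles two blocks of the board.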
In fact, this turned out to beat my hierarchical bounding box with quadtree
implementation on everything I tried it on (even very large areas and font
sizes). I think this is primarily because the sprite version only needs to
perform a single collision test per candidate area, whereas the bounding box
version has to compare with every other previously-placed word that overlaps
slightly with the candidate area.
Another possibility would be to merge a word’s tree with a single large tree
once it is placed. I think this operation would be fairly expensive though
compared with the analogous sprite mask operation, which is essentially ORing a
whole block.
Hierarchical Bounding Boxes
I didn’t want the hierarchical bounding box code to go to waste. Click to
see more glyphs, or drag to test for collisions!
Future Work
- I think a bit more performance can be squeezed out of the sprite
collision detections e.g. by using a hierarchy of coarse sprite masks, but
the 32-bit compression trick is a bit fiddly to implement.
- It should be possible to use HTML and CSS3 instead of SVG, but the
positions currently refer to the bottom-centre of each word so this makes it
slightly trickier in CSS3.
Beware!
Use word clouds with care; while they’re æsthetically pleasing, there are
almost always better ways to visualise and analyse data. For an in-depth
rant about how they can be bad, see:
- Word clouds considered harmful
by Jacob Harris, a New York Times senior software architect
(via FlowingData).
Further Reading
- Wordle. The original Java applet by Jonathan Feinberg.
- Beautiful Visualization, Chapter 3 by Jonathan Feinberg.
- emotion.fractal: a similar concept by Jared Tarbell. Sizes words randomly before greedily packing them.
- Participatory Visualization with Wordle by
Fernanda B. Viégas, Martin Wattenberg, and Jonathan Feinberg.
- Comparison of Tag Cloud Layouts: Task-Related Performance and Visual Exploration by Lohmann, S., Ziegler, J., Tetzlaff, L.
- Algorithm to implement something like Wordle on StackOverflow: a handy list of Wordle implementations.
- Tag cloud on Wikipedia.