Thursday, February 19, 2009

Experiments in Web Readability: A Book

Last year I began a project to try to push readability on the Web as far as I could using today's technology, which would also help me be clearer about what I felt was missing.

I started in August, because I've been working in the Internet Explorer team at Microsoft, and that was when the team shipped Internet Explorer 8, Beta 2. Microsoft had previously announced that Internet Explorer 8 would use Web-standards rendering by default, and Beta 2 was the point at which I felt standards support became robust enough to try doing some real work using only standards markup.

We have now shipped the final version of Internet Explorer 8, and its standards support received plaudits from some unexpected sources as the Web woke up to the reality that IE8 is not merely standards-compliant - it has the best support for CSS 2.1 of any of the current Web browsers.

Now, we're not as far forward on CSS3 as some of the others, but in my opinion the approach the team took (I can claim no personal credit for it) was exactly right. Rather than support a "mixed bag" of some CSS 2.1 and some CSS3, it's better to fill out one level first before moving on to the next. That seems to me like a disciplined, methodical and dependable approach.

I'd done some work earlier to demonstrate the use of embedded fonts in Web pages to improve readability. However, not being an HTML or CSS guru - but more of a type and design guy, and primarily a writer - I'd used print publishing software to create multiple-column layouts and automatically generate the Web pages from there.

The reaction I got from Web purists - well, it was more like jihad than criticism or useful suggestion...

But they did have a point. The code output from the publishing application was obscure, obtuse, not understandable by any human, and impossible to edit. The W3C's online HTML and CSS validation tools basically told me I was an idiot, and that my markup wouldn't pass their scrutiny - ever.

So I decided to teach myself some HTML and CSS and do things the hard way, coding by hand. I tried a few editors, but found the one I liked best was Notepad++, which is available free. If I'd been more of a programmer, I'd probably have used Microsoft Visual Studio or Expression Web, both of which will also output Web-standards markup. (They also have a lot more programming power, so you can do many other things than just hand-edit code - but that was overkill for someone like me.)

Now, I was learning as I went. Mea culpa. You can drive a bus through the holes in my coding. It wasn't intended as a coding sample. It wasn't designed for accessibility. I just needed it to work. I did, however, run the W3C's HTML and CSS validation tools on every page.

Next thing was to decide on a first project. A book seemed like the right way to go, since the layout would be pretty straightforward. I could have merely toyed with a few pages of a book. But I wanted to do a real project - an entire book, from "cover" to "cover". Apart from anything else, that would reveal any problems of scaling.

I chose The Mabinogion, a book of mediaeval Welsh tales and Arthurian stories, which was translated from the Welsh in 1849 - no copyright problems there!

I downloaded the text from Project Gutenberg. PG has done us all a great service in converting all this public-domain content - but it's impossible to read as it is, and would really benefit from better layout and attention to readability.

The first thing I wanted to do was create a design which would work with the Web browser in Full Screen mode. All those buttons, address bars, menu bars etc. are great when I'm trying to find content. But once I've found it, they're a distraction; I just want them to go away while I read. Internet Explorer lets you hide everything using the F11 shortcut. Firefox does the same. Sadly, Safari has no Full Screen capability, and neither does Chrome.

I created a title bar at the top, which also had Page Forward-Back and Chapter Forward-Back buttons. The only other features I included on the title bar (I admit, partly out of anti-jihadist mischievousness) were the W3C's "Validated HTML" and "Validated CSS" logos.

I did not want scrolling. I wanted completely paginated content. There's no question it's more readable that way. Of course, that creates a dilemma. Despite the fact that Web content is supposed to be adaptive, it's not really. Almost 20 years ago, when the first Web browser was built, the engineers involved took the "easy option" of creating the bottomless scrolling window for content.

It was an understandable, expedient engineering decision at the time. It's much, much harder to set content in pages. But it makes all the difference in the world when you do. Browsers can't do this yet. And that means you have to do it manually (which sucks, and turns what should be a simple task into a Labor of Hercules).

Isn't it terrible that the Web today is still paying for an expediency decision taken 16 years ago at NCSA when they developed the first Mosaic browser?

Today, you can do pagination if you're working with, say, database-type data, or search results - where you're dealing with lots of individual paragraphs.

But if you want to do the same thing with flowing text that's meant for reading, you have to decide on a page size, then create lots of pages manually. This approach is totally impractical in any production scenario; however, if I wanted to show the benefit of paginated content, that's what I'd have to do.
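A rough first pass at this kind of fixed-size pagination could be scripted, even if the break positions still need checking by eye. The sketch below is only an illustration, not the method used for the book: the function name, the character budget per page and the "pageNNN.html" naming scheme are all assumptions, and a character count is a crude stand-in for how much text really fits on a page.

```javascript
// Hypothetical sketch: group paragraphs into fixed-budget pages and work
// out each page's previous/next file names for the navigation buttons.
// charsPerPage and the "pageNNN.html" naming are assumed for illustration.
function paginate(paragraphs, charsPerPage) {
  const pages = [[]];
  let used = 0; // characters already placed on the current page
  for (const para of paragraphs) {
    // Start a new page when this paragraph would overflow the current one.
    if (used > 0 && used + para.length > charsPerPage) {
      pages.push([]);
      used = 0;
    }
    pages[pages.length - 1].push(para);
    used += para.length;
  }
  const name = (i) => "page" + String(i + 1).padStart(3, "0") + ".html";
  return pages.map((paras, i) => ({
    file: name(i),
    prev: i > 0 ? name(i - 1) : null,
    next: i < pages.length - 1 ? name(i + 1) : null,
    paragraphs: paras,
  }));
}
```

Because whole paragraphs are never split, a single very long paragraph still overflows its page, which is the same limitation that makes manual adjustment necessary.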

I've taken a lot of flak for this approach from people like Joe Clark, who's accused me of trying to drag the Web backwards by "apeing 19th- and 20th-Century print layouts" (and a lot more). But I've been studying text and layout for about 40 years, and I contend that print layouts didn't happen by accident. They developed as a result of more than five centuries of evolution, to optimize for the way human visual perception works.

The Web, and the computer screen, haven't changed human perception. And they haven't yet evolved to optimize for it. So, I'm trying to adapt what we've learned over 550 years or so to the Web. I can't yet get where I'd like to end up - but I know where I'm going.

Anyway, I decided upfront that the page size would be 1440x900 pixels, which happens to be the size of my laptop display - a MacBook Pro (running Windows Vista, which I vastly prefer to OS X).

With the aspect ratio of a typical laptop, single-column layout doesn't really work; two-column is much better for a book (even that's a little bit too wide).

Multi-column layout is out there in CSS3; but the best way to do it right now is using JavaScript. I found a script out on the Web which did the trick. There were issues, which I'll come to later.

That decision made, it was time to get to what's probably the most important decision when laying out a book - the body text. I chose Cambria, which is one of the ClearType-optimized faces we created and shipped in Office 2007, MacOffice2008 and in Vista.

I used the WEFT tool to create Embedded OpenType font objects of all the fonts I planned to use in my projects: Cambria, Calibri, and Candara, all from the same ClearType-optimized collection. It has to be said that we have neglected that tool, and it wasn't as simple or straightforward as I'd hoped. But once I got it working, I could copy and paste the same CSS "@font-face" declarations into the style sheets for the other projects and use the same .EOT font objects over and over.

I created subsets of the fonts, with only the Basic Latin character set, because I never write in Cyrillic or many of the other languages the complete font supports. That kept the size of the downloadable font objects down. I could have used per-page subsetting or per-site, but that might mean creating new objects every time I added more content - and I only wanted to do this once.

I chose 12-point as the body text size (not pixels, which are a relative dimension, but points, which are an absolute size). I realized hard-coding the text size would create problems in future, because it didn't support making the text bigger for people who wanted/needed "large print". But the point of doing this book was to show that a readable book could be done on the Web.

One other decision I made up front was to make a very clear division between content and formatting. HTML is the child of SGML (as is XML), in which content and formatting were supposed to be strictly separated.

So I decided to put ALL formatting in the CSS and only structural markup in the HTML. A few times during this series of projects I broke my own rules in minor ways (like using markup for italics in the HTML because I was in a hurry), but I managed to remain really strict almost all the time.

I found that on a laptop a two-column layout gave line-lengths which were slightly longer than ideal. I'd probably go in and increase margin and gutter sizes to take care of this in a production scenario. I also found that even with these long lines, word-spacing still got too wide sometimes, so I added another JavaScript library I found on the Web to hyphenate the problem lines.

I defined styles for headings, pull-quotes, etc. I know I allowed the CSS to get a bit out of hand and it could be cleaned up a lot if I went back to it now. But I was learning as I went...

Manual pagination was a real slog. I ended up with 112 separate Web pages...

The process of creating them became reasonably automatic after a while. But cleaning up the Project Gutenberg text in order to remove some line breaks, replacing their "inches and feet" marks with the right HTML entity coding for single- and double-quotes, apostrophes, etc. was tedious.

Since quotes are used in the HTML markup, you can't do a global Search and Replace. And because some of the quotes are opening ones, and others closing ones, you have to make sure you get the right ones in the right places. (Some closing quotes were missing in the PG markup, too). All this means any attempt at automation is limited; a lot of manual supervision is still needed.
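To make the problem concrete, here is a minimal sketch of how much of the quote clean-up could be automated, assuming the text is already wrapped in simple HTML. The function name and the opening/closing heuristic are illustrative assumptions, not the process actually used for the book. It skips anything inside a tag so attribute quotes survive, and treats a quote as an opening one when it follows nothing, whitespace, an opening bracket or a dash.

```javascript
// Hypothetical helper: convert straight quotes in HTML text to curly-quote
// entities, leaving anything inside <...> tags untouched.
function curlQuotes(html) {
  // Splitting on a capturing group keeps the tags in the result array.
  const parts = html.split(/(<[^>]*>)/);
  let prev = ""; // last text character seen, carried across tags
  return parts
    .map((part) => {
      if (part.startsWith("<")) return part; // markup: keep attribute quotes
      let out = "";
      for (const ch of part) {
        if (ch === '"' || ch === "'") {
          // Heuristic: a quote opens if it follows nothing, whitespace,
          // or an opening bracket/dash; otherwise it closes.
          const opens = prev === "" || /[\s(\[{-]/.test(prev);
          if (ch === '"') out += opens ? "&ldquo;" : "&rdquo;";
          else out += opens ? "&lsquo;" : "&rsquo;";
        } else {
          out += ch;
        }
        prev = ch;
      }
      return out;
    })
    .join("");
}

// Example: the class attribute keeps its straight quotes.
console.log(curlQuotes('<p class="x">"Lord," said he, "it\'s time."</p>'));
// prints <p class="x">&ldquo;Lord,&rdquo; said he, &ldquo;it&rsquo;s time.&rdquo;</p>
```

A heuristic like this gets nested quotes wrong (a single quote directly after an opening double quote is treated as closing) and cannot restore closing quotes that are missing from the source, so the manual pass is still needed.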

Getting columns and pages to break in the right places was less than trivial, too. The multi-column JavaScript works on the Document Object Model (DOM), so it isn't aware of anything smaller than a paragraph.

In practice, that means if you have long paragraphs you'll end up with weird column and page breaks. The Mabinogion was translated in 1849 - when long paragraphs were much more fashionable than they are now. So it meant going in and doing a lot of manual editing, breaking long paragraphs into shorter ones to make column-breaks work. However, eventually I got it all working pretty much the way I wanted.
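The paragraph-breaking step can be sketched in code, purely as an illustration of the idea rather than what was done by hand here. The function name and the character budget are assumptions, and the sentence detection is deliberately naive (it will also split after abbreviations such as "Mr."), so the result would still need an editor's eye.

```javascript
// Hypothetical helper: break one long paragraph into shorter ones at
// sentence boundaries, so a paragraph-level column script has more
// places to break. maxChars is an assumed budget, not a measured one.
function splitParagraph(text, maxChars) {
  // Split on whitespace that follows ., ! or ? (plus any closing quote
  // or bracket). Only whitespace is consumed, so no text is lost.
  const sentences = text.trim().split(/(?<=[.!?]["')\]]*)\s+/);
  const paragraphs = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? current + " " + sentence : sentence;
    if (current && candidate.length > maxChars) {
      // Adding this sentence would exceed the budget: close the paragraph.
      paragraphs.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) paragraphs.push(current);
  return paragraphs;
}
```

A single sentence longer than the budget is kept whole rather than cut mid-sentence, since breaking there would hurt readability more than a slightly awkward column break.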

I hit a problem later. I was updating my version of Internet Explorer with daily builds, from the Beta 2 with which I'd started. And some time along the way an Embedded OpenType bug crept into the Release Candidate builds. It was fixed long before the final release, but if you're running RC1 you'll find the EOT fonts don't always appear, and the text looks awful. Hitting Refresh is a temporary workaround. Go and download the final shipping version if you haven't already done so.

Internet Explorer is currently the only browser which supports EOT font embedding, of course.

I posted the finished book on my website:

Then I went back and read it. For the first time ever, I comfortably read a whole book on the Web!

I'll talk about some of the subsequent experiments in later posts.
