The browser main functionality is to present the web resource you choose, by requesting it from the server and displaying it on the browser window. The resource format is usually HTML but also PDF, image and more. The location of the resource is specified by the user using a URI (Uniform resource Identifier).
The way the browser interprets and displays HTML files is specified in the HTML and CSS specifications. These specifications are maintained by the W3C (World Wide Web Consortium) organization, which is the standards organization for the web.
Browsers’ user interface have a lot in common with each other. Among the common user interface elements are:
- Address bar for inserting the URI
- Back and forward buttons
- Bookmarking options
- A refresh and stop buttons for refreshing and stopping the loading of current documents
- Home button that gets you to your home page
Strangely enough, the browser’s user interface is not specified in any formal specification, it is just good practices shaped over years of experience and by browsers imitating each other. The HTML5 specification doesn’t define UI elements a browser must have, but lists some common elements. Among those are the address bar, status bar and tool bar. There are, of course, features unique to a specific browser like Firefox downloads manager.
The browser’s high level structure
- The user interface – this includes the address bar, back/forward button, bookmarking menu etc. Every part of the browser display except the main window where you see the requested page.
- The browser engine – the interface for querying and manipulating the rendering engine.
- The rendering engine – responsible for displaying the requested content. For example if the requested content is HTML, it is responsible for parsing the HTML and CSS and displaying the parsed content on the screen.
- Networking – used for network calls, like HTTP requests. It has platform independent interface and underneath implementations for each platform.
- UI backend – used for drawing basic widgets like combo boxes and windows. It exposes a generic interface that is not platform specific. Underneath it uses the operating system user interface methods.
- Data storage. This is a persistence layer. The browser needs to save all sorts of data on the hard disk, for examples, cookies. The new HTML specification (HTML5) defines ‘web database’ which is a complete (although light) database in the browser.
It is important to note that Chrome, unlike most browsers, holds multiple instances of the rendering engine – one for each tab,. Each tab is a separate process.
The rendering engine
The responsibility of the rendering engine is well… Rendering, that is display of the requested contents on the browser screen.
By default the rendering engine can display HTML and XML documents and images. It can display other types through a plug-in (a browser extension). An example is displaying PDF using a PDF viewer plug-in. We will talk about plug-ins and extensions in a special chapter. In this chapter we will focus on the main use case – displaying HTML and images that are formatted using CSS.
Our reference browsers – Firefox, Chrome and Safari are built upon two rendering engines. Firefox uses Gecko – a “home made” Mozilla rendering engine. Both Safari and Chrome use Webkit.
Webkit is an open source rendering engine which started as an engine for the Linux platform and was modified by Apple to support Mac and Windows. See http://webkit.org/ for more details.
The main flow
The rendering engine will start getting the contents of the requested document from the networking layer. This will usually be done in 8K chunks.
After that this is the basic flow of the rendering engine:
The rendering engine will start parsing the HTML document and turn the tags to DOM nodes in a tree called the “content tree”. It will parse the style data, both in external CSS files and in style elements. The styling information together with visual instructions in the HTML will be used to create another tree – the render tree.
The render tree contains rectangles with visual attributes like color and dimensions. The rectangles are in the right order to be displayed on the screen.
After the construction of the render tree it goes through a “layout” process. This means giving each node the exact coordinates where it should appear on the screen. The next stage is painting – the render tree will be traversed and each node will be painted using the UI backend layer.
It’s important to understand that this is a gradual process. For better user experience, the rendering engine will try to display contents on the screen as soon as possible. It will not wait until all HTML is parsed before starting to build and layout the render tree. Parts of the content will be parsed and displayed, while the process continues with the rest of the contents that keeps coming from the network.
Main flow examples
Webkit main flow
Mozilla’s Gecko rendering engine main flow
Parsing – general
Since parsing is a very significant process within the rendering engine, we will go into it a little more deeply. Let’s begin with a little introduction about parsing.
Parsing a document means translating it to some structure that makes sense – something the code can understand and use. The result of parsing is usually a tree of nodes that represent the structure of the document. It is called a parse tree or a syntax tree.
Example – parsing the expression “2 + 3 – 1” could return this tree:
mathematical expression tree node
At a high level, most modern browsers carry out the following steps to render an HTML page:
- Load the HTML
- Parse it
- Apply styles
- Build frames
- Layout the frames (flow)
- Paint the frames
1.Load: The browser tries to fetch the page from the specified location. Typically this would be thru a HTTP client. However, a HTML page may also be loaded from a filesystem. Irrespective, the loader fetches the HTML page from its location. The super important concept of Browser Cache comes into play over here – but more on this later. The way the HTML page gets loaded is different from the way the resources get loaded. In WebKit there are two different pipelines – one for loading the page and another for loading resources:(source: http://webkit.org/blog/1188/how-webkit-loads-a-web-page/)
2.Parse: As the stream comes thru from the loader, an HTML parser starts building the DOM (also called a “Content Tree”) – each node here is an HTML element. Now a lot of HTML on the net is broken, and each browser has had to implement its own quirks to parse HTML leading to subtle incompatibilities. HTML 5 however specifies the parsing algorithm. As this gets adopted, the x-browser incompatibilities because of parsing should go away. While parsing, the engine may come across resources (JS, CSS, images, fonts, etc.) When that happens the particular resource is queued for loading and parsing continues. Again, there is more to this, which we’ll tackle later.
3.Compute Styles: The browser provides a default stylesheet. Often the HTML page also has a set of styles specified. These styles need to be applied to the Content Tree. For this purpose, a “Rendering Tree” is built – this essentially consists of elements that are to be rendered. For example an element with display set to none would not appear in this tree (nor would its descendants). Nor would elements like HEAD and SCRIPT. Nodes in the Render Tree represent style information: CSS box model, z-order, opacity are all specified here
4.Construct Frames: Most render-able elements follow the CSS box model: They have height, width, border, spacing, padding, margin and position. For these objects, a rectangular box – called a Frame – is created. Not all objects have a frame – for example the SVG image above does not have a frame. It is put inside an iframe, which has a frame. A frame has all the information on how the object itself is going to be rendered. What is not known however, is how is the element going to be placed w.r.t other elements.
5.Compute Flow: Flow Computation or Layout Computation is about how elements are placed w.r.t each other and is mostly controlled by the CSS Visual Rendering Model. This is typically a recursive process from the root of the tree to leafs. Also, this is typically a lazy process – it is done on a need-basis. Basically when the layout engine determines that an element needs to be laid out (for example a newly added Node), it marks it as such by setting a dirty bit. The actual layout is done only when some method is called which requires the new information. Most browsers do flow calculation at a higher resolution than what any display would have. This is to support zooming – when the user zooms in or out, the objects can be drawn correctly on the screen without requiring any extra steps other than mapping the coordinates to real pixels.
6.Paint: Once the engine knows exactly where the objects need to be drawn, comes the process of actually rendering the objects on the screen. This process – called Painting – is described in agonizing detail in Appendix E of the CSS 2.1 Spec. This is basically a Tree walk from the root of the Rendering tree, where each node is asked to paint itself. The actual rendering is abstracted out thru a Graphics Engine which is responsible for actually turning on the pixels and things like hardware acceleration.