When it comes to software that is both powerful and versatile, few programs come close to a web browser. Whether it runs on an Intel machine with the x86 architecture or on a smartphone built on the ARM architecture, a modern web browser performs well on almost any hardware you use. Browsers have become so capable that they can take over much of the role of a desktop operating system, and Chrome OS, which is built around the Chrome browser, is a prime example of this.
Browsers are a work of art, but have you ever wondered what goes on behind the scenes, from the moment you enter a query to the moment the browser shows you the result? In this article, we will look at how a browser works and how it renders webpages in a matter of seconds.
It all begins with requests and the Networking layer
When you visit a website on the Internet, your browser connects to a remote computer (a webserver) and requests the resources it needs to paint the page. This might look trivial, but under the hood your browser works through a long chain of steps to find the server, fetch the page and draw it on your screen.
To render a webpage, the first thing your browser needs to do is find the remote server that hosts the website. To do this, it looks up the IP address of the domain name in the URL you entered in the address bar (for example, example.com). The IP address identifies the web server on the network, and once the browser has this address, it can make requests to the server to get data.
To find the IP address, the browser performs DNS (Domain Name System) resolution. First, it checks its own DNS cache, which may already hold the IP address if you have visited the site recently; the operating system keeps a similar cache as well. If the address is not cached, the request goes to a DNS resolver, such as one run by your ISP, Google or Cloudflare, which looks up the IP address for that domain using the DNS servers.
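To make the lookup order concrete, here is a minimal Python sketch of a DNS cache sitting in front of the operating system's resolver. It is a simplified model rather than how any particular browser does it: the CachingResolver class, its single fixed ttl and the injectable lookup function are all hypothetical, whereas real DNS records each carry their own time-to-live.

```python
import socket
import time
from typing import Callable, Dict, Optional, Tuple


class CachingResolver:
    """Toy browser-style DNS cache (hypothetical; one fixed TTL for all entries)."""

    def __init__(self, lookup: Optional[Callable[[str], str]] = None, ttl: float = 60.0):
        # lookup defaults to asking the operating system's resolver
        self._lookup = lookup or self._system_lookup
        self._ttl = ttl
        # hostname -> (ip_address, expiry time on the monotonic clock)
        self._cache: Dict[str, Tuple[str, float]] = {}

    @staticmethod
    def _system_lookup(hostname: str) -> str:
        # The OS checks its own cache and hosts file, then asks the configured
        # DNS resolver (for example one run by your ISP, Google or Cloudflare).
        results = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
        return results[0][4][0]  # IP address from the first (ip, port, ...) sockaddr

    def resolve(self, hostname: str) -> str:
        now = time.monotonic()
        cached = self._cache.get(hostname)
        if cached is not None and cached[1] > now:
            return cached[0]  # cache hit: no DNS query needed
        ip = self._lookup(hostname)  # cache miss or expired entry: ask the resolver
        self._cache[hostname] = (ip, now + self._ttl)
        return ip
```

Calling CachingResolver().resolve("example.com") twice within a minute triggers only one real lookup; the second answer comes straight from the cache, saving a network round trip.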
Once your browser has the IP address of the website you are looking for, the networking layer of your browser gets to work. It creates a connection between your device and the server so that data can flow between the two. To create this connection, the networking layer uses sockets: each end of the connection is a socket identified by the device's IP address and a port number. Traditionally, this is a TCP connection, which is set up through a short exchange of messages known as the TCP handshake.
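Python's standard socket module exposes this step directly. The sketch below, using a hypothetical helper named open_tcp_connection, opens a TCP connection to a given IP address and port; port 443 is assumed as the default because that is the standard port for HTTPS (plain HTTP uses port 80).

```python
import socket


def open_tcp_connection(ip: str, port: int = 443, timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection to (ip, port) and return the connected socket.

    socket.create_connection() returns only after the operating system has
    completed the TCP handshake (SYN, SYN-ACK, ACK) with the server.
    """
    sock = socket.create_connection((ip, port), timeout=timeout)
    # Each end of the connection is identified by an (IP address, port) pair:
    # sock.getsockname() is our end, sock.getpeername() is the server's end.
    return sock
```

Passing it an address obtained from DNS returns a socket that is ready to send and receive data.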
Now that the networking layer has connected the two devices and data packets can travel between them, it moves on to the next important task in any communication on the Internet: encryption.
To encrypt the data, the networking layer performs a TLS (Transport Layer Security) handshake with the server. During the handshake, the server proves its identity with a digital certificate, and the two sides agree on the keys used to encrypt the session. Once the handshake is complete, all the data travelling between the devices is encrypted and cannot be read by third parties who intercept it along the way.
The TLS handshake is only performed when data is transferred using the HTTPS protocol. With plain HTTP, only the TCP handshake is performed, so the data is sent unencrypted. Therefore, you should never submit sensitive data over an HTTP connection, as anyone who can observe the traffic between you and the server can read it.
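The difference between the two protocols can be sketched with Python's standard ssl module. This example assumes a server listening for HTTPS on port 443 and uses hypothetical helper names: the TCP connection is opened first, and then wrap_socket performs the TLS handshake on top of it, verifying the server's certificate and hostname against the system's trusted certificate authorities. A plain HTTP client would skip the wrap_socket step and send its data over the raw, unencrypted socket.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    # Default client settings: the server certificate must be valid and
    # must match the hostname we asked for.
    return ssl.create_default_context()


def open_https_connection(hostname: str, port: int = 443) -> ssl.SSLSocket:
    raw = socket.create_connection((hostname, port), timeout=5.0)  # TCP handshake
    context = make_client_context()
    try:
        # TLS handshake: agree on a protocol version and cipher, check the
        # server certificate, and derive the keys that encrypt the session.
        return context.wrap_socket(raw, server_hostname=hostname)
    except Exception:
        raw.close()  # don't leak the TCP socket if the handshake fails
        raise


# Example (requires network access):
# with open_https_connection("example.com") as s:
#     print(s.version(), s.cipher())
```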
After setting up a communication channel between the two devices, the networking layer sends a request to the server for the resources. In the case of a webpage, this is an HTTP or HTTPS request asking the webserver for an HTML file, which describes the page and points to the other resources it needs, such as stylesheets and images. Once the server receives the request, it sends the HTML document back to the browser as a stream of bytes over the communication channel established by the networking layer.
Finally, the browser has the resources it needs to render a webpage, but they are still just bytes and need to be converted into something that looks like a webpage. To do this, the browser uses its rendering engine.
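The request itself is plain text. This sketch builds a minimal HTTP/1.1 GET request and reads the reply as raw bytes until the server closes the connection; the function names are hypothetical, and real browsers send many more headers (such as User-Agent and Accept-Encoding) and usually keep connections open for reuse. With HTTPS, exactly the same request is written to the encrypted TLS socket instead of the plain one.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request for an HTML page."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",        # required in HTTP/1.1
        "Accept: text/html",    # we want an HTML document back
        "Connection: close",    # ask the server to close the connection when done
        "",                     # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def read_response(sock) -> bytes:
    """Read raw response bytes from a connected socket until the server closes it."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # empty bytes: the server closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)
```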
Getting meaning out of bits using the Rendering Engine
Now that the networking layer has made requests to the webserver and received all the data the browser needs, the rendering engine comes into the picture.
The main job of the rendering engine is to translate these bytes into a form the browser can use to create a webpage. To understand how the rendering engine works, it helps to know the two main building blocks of a website:
- HTML (HyperText Markup Language) is used to define the structure of a webpage.
- CSS (Cascading Style Sheets) is used to direct the browser on how each element on the website is supposed to look.
The rendering engine uses parsers to convert bits of data into meaningful information which can be used by the browser to render a webpage. The rendering engine has two different parsers, one for HTML and one for CSS. Let’s look at how the HTML parser works to get an idea of the parsing process.
The HTML parser takes the bytes as input and creates a logical representation of the HTML document in the device's memory. This representation is known as the DOM (Document Object Model), and it organises the HTML elements as a tree, with each element nested under its parent.
To create the DOM, the HTML parser performs the following steps:
- Characterisation: converts the raw bytes received from the networking layer into characters, using the document's character encoding (for example, UTF-8).
- Tokenization: splits the stream of characters into tokens, such as start tags, end tags and text, which tell the browser how the data is structured.
- Node creation: after identifying the tokens and the information contained in them, the browser creates nodes in memory to hold this data.
- DOM creation: the parser links these nodes hierarchically to build the DOM representation of the received data.
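Python's standard html.parser module handles tokenization, which makes it possible to sketch these four steps in a few dozen lines. This is a simplified model with hypothetical names (Node, DomBuilder, build_dom): it only knows a handful of void elements such as <br> and <img>, and it is far less forgiving of broken markup than the parsing algorithm in the HTML Standard that real browsers follow.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Elements that never have a closing tag, so they never contain children.
VOID_ELEMENTS = {"br", "img", "meta", "link", "input", "hr"}


class Node:
    def __init__(self, name: str, parent: Optional["Node"] = None, text: str = ""):
        self.name = name        # tag name, or "#text" for text nodes
        self.parent = parent
        self.text = text
        self.attrs: dict = {}
        self.children: List["Node"] = []


class DomBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root  # the element new nodes are attached to

    # Each handle_* method receives one token from the tokenizer.
    def handle_starttag(self, tag, attrs):
        node = Node(tag, parent=self.current)   # node creation
        node.attrs = dict(attrs)
        self.current.children.append(node)      # DOM creation: link into the tree
        if tag not in VOID_ELEMENTS:
            self.current = node                  # following tokens nest inside it

    def handle_endtag(self, tag):
        # Walk up to the matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.name != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(Node("#text", self.current, data.strip()))


def build_dom(raw: bytes, encoding: str = "utf-8") -> Node:
    text = raw.decode(encoding)  # characterisation: bytes -> characters
    builder = DomBuilder()
    builder.feed(text)           # tokenization, node creation, DOM creation
    builder.close()
    return builder.root
```

For example, build_dom(b"<p>Hello<br>world</p>") produces a tree whose p node has three children: a text node, a br node and another text node.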
The HTML document that the browser receives usually contains links to CSS files. The networking layer fetches these files and passes them to the CSS parser, which produces the CSSOM (CSS Object Model): a structure that describes how each element in the DOM is supposed to be styled.
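A deliberately tiny CSS parser shows what turning stylesheet text into structured rules looks like. The parse_css function below is hypothetical and only understands simple selector { property: value; } rules; it does not handle comments, at-rules such as @media, or complex selectors, all of which a real CSS parser must support.

```python
import re
from typing import Dict, List, Tuple

Rule = Tuple[List[str], Dict[str, str]]  # (selectors, declarations)

# One rule: everything up to "{" is the selector list, everything up to "}" the body.
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_css(source: str) -> List[Rule]:
    """Turn CSS text into a list of (selectors, declarations) rules."""
    rules: List[Rule] = []
    for selector_text, body in RULE_RE.findall(source):
        selectors = [s.strip() for s in selector_text.split(",") if s.strip()]
        declarations: Dict[str, str] = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)  # split on the first colon only
                declarations[prop.strip().lower()] = value.strip()
        rules.append((selectors, declarations))
    return rules
```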
Creating the rendering tree and layout for the webpage
Once the DOM has been created and the CSS parser has finished parsing the CSS files, the rendering engine uses a style engine to combine the DOM and the CSSOM. The result is the rendering tree, which contains information about both the structure and the style of the webpage to be rendered. The rendering tree only contains the nodes that need to be laid out on the screen; elements such as <head>, or anything styled with display: none, are left out.
After creating the rendering tree, the rendering engine starts the layout process. This process takes into account the size of the viewport (the visible area of the browser window) and works out where each element should be placed. It calculates the size of each element that is going to be rendered and its position relative to other elements.
Now that the rendering engine knows exactly what to draw and where, it can begin to render the page in the browser.
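Joining the two structures can be sketched as a recursive walk over the DOM that attaches styles to each node and drops the ones that will not be shown. The structures here are hypothetical simplifications: DOM nodes are plain dictionaries, only tag and .class selectors are matched, later rules simply override earlier ones (real browsers apply selector specificity, the full cascade and inheritance), and non-visual elements such as head are skipped along with anything whose display is none.

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[List[str], Dict[str, str]]  # (selectors, declarations), as a CSSOM stand-in
NON_VISUAL = {"head", "script", "style", "meta", "link", "title"}


def matches(selector: str, node: dict) -> bool:
    # Only "tag" and ".class" selectors are supported in this sketch.
    if selector.startswith("."):
        return selector[1:] in node.get("classes", [])
    return selector == node["tag"]


def computed_style(node: dict, rules: List[Rule]) -> Dict[str, str]:
    style: Dict[str, str] = {}
    for selectors, declarations in rules:  # later rules win (no specificity)
        if any(matches(s, node) for s in selectors):
            style.update(declarations)
    return style


def build_render_tree(node: dict, rules: List[Rule]) -> Optional[dict]:
    if node["tag"] in NON_VISUAL:
        return None
    style = computed_style(node, rules)
    if style.get("display") == "none":
        return None  # not rendered: skip the node and all of its children
    children = [build_render_tree(c, rules) for c in node.get("children", [])]
    return {
        "tag": node["tag"],
        "style": style,
        "children": [c for c in children if c is not None],
    }
```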
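A toy block layout illustrates the idea. In this sketch the Box class and layout function are hypothetical, and every element is assumed to be a block that stacks vertically and fills the width of its parent's content area, with a fixed content height when it has no children; real layout engines also handle inline text, flexbox, grid, floats and much more.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Box:
    name: str
    content_height: float = 0.0  # used only when the box has no children
    padding: float = 0.0
    children: List["Box"] = field(default_factory=list)
    # Computed by layout():
    x: float = 0.0
    y: float = 0.0
    width: float = 0.0
    height: float = 0.0


def layout(box: Box, x: float, y: float, available_width: float) -> float:
    """Position box at (x, y) with the given width; return its total height."""
    box.x, box.y, box.width = x, y, available_width
    inner_x, inner_y = x + box.padding, y + box.padding
    inner_width = available_width - 2 * box.padding
    cursor_y = inner_y
    for child in box.children:
        # Block children stack top to bottom inside the parent's content area.
        cursor_y += layout(child, inner_x, cursor_y, inner_width)
    inner_height = cursor_y - inner_y if box.children else box.content_height
    box.height = inner_height + 2 * box.padding
    return box.height
```

Laying out a body with 8 px of padding and two children (content heights 40 and 60) in an 800 px wide viewport places the first child at (8, 8) with a width of 784, the second at (8, 48), and gives the body a total height of 116.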
Painting the canvas and compositing the webpage on the screen
Once the rendering engine has completed the layout process, it needs to paint each pixel on the screen according to the layout created from the rendering tree. Converting the layout into actual pixels is known as rasterization. This work has traditionally been done on the CPU, but because it repeats the same simple operation across a huge number of pixels, modern browsers can offload much of it to the GPU, which handles this kind of parallel work far more efficiently.
Painting happens in layers: the rendering engine splits the page into multiple layers of elements and paints each one separately. This layered structure helps the browser update the page faster when the user interacts with it; for example, when you scroll or an element is animated, the browser can often move an existing layer instead of repainting the whole page.
Once all the layers have been painted, the rendering engine combines them in the correct order into the final image that is displayed on the screen. This process is known as compositing and is the last step performed by the rendering engine.
This entire sequence of turning bytes of data into pixels on the screen is known as the critical rendering path, and it is one of the main factors that determine how quickly any webpage you visit appears on your screen.
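At its simplest, rasterization means turning shapes into coloured pixels. The sketch below, with a hypothetical rasterize function, fills axis-aligned rectangles (such as the boxes produced by layout) into a framebuffer stored as a list of pixel rows. Notice that the inner loops apply the same assignment to each pixel independently, which is exactly the kind of work a GPU can perform in parallel.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]             # (red, green, blue), each 0-255
Rect = Tuple[int, int, int, int, Color]  # (x, y, width, height, fill colour)


def rasterize(rects: List[Rect], width: int, height: int,
              background: Color = (255, 255, 255)) -> List[List[Color]]:
    """Paint rectangles, in order, onto a width x height grid of pixels."""
    pixels = [[background] * width for _ in range(height)]
    for x, y, w, h, color in rects:
        # Clip the rectangle to the screen before touching any pixels.
        x0, y0 = max(x, 0), max(y, 0)
        x1, y1 = min(x + w, width), min(y + h, height)
        for row in range(y0, y1):
            for col in range(x0, x1):
                pixels[row][col] = color  # later rectangles paint over earlier ones
    return pixels
```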
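Compositing can be sketched as stacking the painted layers from back to front and copying each one over the result so far. In this simplified model (the Layer class and composite function are hypothetical), each layer is a grid of already-painted pixels where None means transparent, layers are drawn in order of increasing z_index, and each layer has an on-screen offset; real compositors also blend partially transparent pixels and usually run on the GPU. Because each layer keeps its own pixels, moving a layer only means re-running this step with a new offset, not repainting it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Color = Tuple[int, int, int]
Pixel = Optional[Color]  # None = transparent


@dataclass
class Layer:
    pixels: List[List[Pixel]]  # already-painted content of this layer
    z_index: int = 0
    offset_x: int = 0          # where the layer's top-left corner sits on screen
    offset_y: int = 0


def composite(layers: List[Layer], width: int, height: int,
              background: Color = (255, 255, 255)) -> List[List[Color]]:
    screen = [[background] * width for _ in range(height)]
    for layer in sorted(layers, key=lambda item: item.z_index):  # back to front
        for row, line in enumerate(layer.pixels):
            for col, pixel in enumerate(line):
                sx, sy = col + layer.offset_x, row + layer.offset_y
                if pixel is not None and 0 <= sx < width and 0 <= sy < height:
                    screen[sy][sx] = pixel  # opaque pixel covers what is below
    return screen
```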
Looking into the future