Sunday, August 19, 2012

What Happens When You Type a Website URL Such as www.amazon.com or www.google.com into Your Browser and Press ‘Enter’?

This article explains, at a high level, how web applications work and which technologies are involved.

When you type www.amazon.com or www.google.com into your browser and press ‘Enter’, the browser starts a series of operations and information exchanges, using standard communication and application protocols, between itself and the Amazon/Google web server that hosts the website, across the internet.

At a high level, your web browser (the client) connects to the Amazon/Google web server over the internet and requests the home page by sending an HTTP request. The web server receives the HTTP request, locates the requested resource, processes it to build the dynamic home page, constructs an HTTP response, and sends the response back to your browser. Your browser interprets the received content and displays it on your screen.

The communication between your browser and the Amazon/Google web server can be divided into four layers: the HTTP application layer, the TCP (Transmission Control Protocol) transport layer, the IP (Internet Protocol) network layer, and the hardware (Ethernet) link layer. Let’s consider the technical details of each step:

In order for your browser to contact the Amazon/Google web server, it first needs to translate the host name www.amazon.com or www.google.com into an IP address. It does this by looking the name up in your local DNS cache or, if the name is not cached, by querying the DNS server configured for your connection (typically your ISP’s) over UDP or, less commonly, TCP.
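The lookup order described here (local cache first, then the configured DNS server) can be sketched as follows. This is only an illustration: `resolve_host`, `system_resolver`, the `cache` dictionary and the injectable `resolver` parameter are hypothetical names, not how any particular browser is implemented. The default resolver relies on `socket.getaddrinfo`, which hands the query to the operating system's resolver.

```python
import socket
from typing import Callable, Dict


def system_resolver(hostname: str) -> str:
    """Ask the operating system's resolver, which queries the configured DNS server."""
    # getaddrinfo returns tuples of (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is (ip_address, port).
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return infos[0][4][0]


def resolve_host(
    hostname: str,
    cache: Dict[str, str],
    resolver: Callable[[str], str] = system_resolver,
) -> str:
    """Return the IP address for hostname, consulting the local cache first."""
    if hostname in cache:
        return cache[hostname]      # cache hit: no network traffic needed
    ip_address = resolver(hostname)  # cache miss: ask the DNS server
    cache[hostname] = ip_address     # remember the answer for next time
    return ip_address
```

Calling `resolve_host("www.amazon.com", {})` would perform a real lookup; a second call with the same cache dictionary would be answered locally. A real cache would also honor the time-to-live of each DNS record, which this sketch leaves out.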

After the Amazon/Google IP address is resolved, the browser connects to the web server at that IP address over TCP, a reliable transport protocol, using the default HTTP port 80. Because Amazon/Google runs a cluster of web servers for high scalability and high availability, a load balancer delivers the connection request to a specific web server in the cluster.
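How a load balancer picks a server is not something this article can state for Amazon or Google. As a rough illustration only, the sketch below assumes the simplest common strategy, round robin, in which each new connection goes to the next server in the list. The class name and the backend addresses are made up for the example.

```python
import itertools
from typing import Iterator, List


class RoundRobinBalancer:
    """Hand out backend servers in a fixed rotating order (assumed strategy)."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        # cycle() repeats the list endlessly: a, b, c, a, b, c, ...
        self._cycle: Iterator[str] = itertools.cycle(backends)

    def pick(self) -> str:
        """Return the backend that should receive the next connection."""
        return next(self._cycle)


# Hypothetical backend addresses, for illustration only.
balancer = RoundRobinBalancer(["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"])
first_four = [balancer.pick() for _ in range(4)]
```

Production load balancers usually also take server health and current load into account, which this sketch ignores.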

Once the TCP connection is successfully established between your browser and the Amazon/Google web server, your browser sends the following HTTP GET message to the server:
               GET / HTTP/1.1[CRLF]
               Host: www.amazon.com[CRLF]
               User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9)Firefox/3.0[CRLF]
               Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7[CRLF]
               …..
      Note that your browser may also send cookies to the Amazon/Google web server with the HTTP GET request if you have visited the site before and the cookies have not been cleared.
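To make the wire format concrete, here is a minimal sketch of how such a GET request could be assembled as bytes. The function name `build_get_request` and its parameters are illustrative assumptions; real browsers send many more headers. Each header line ends with CRLF, and an empty line marks the end of the header section.

```python
from typing import Dict, Optional

CRLF = "\r\n"


def build_get_request(
    host: str,
    path: str = "/",
    extra_headers: Optional[Dict[str, str]] = None,
    cookies: Optional[Dict[str, str]] = None,
) -> bytes:
    """Assemble an HTTP/1.1 GET request as it would be sent over TCP."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    if cookies:
        # All cookies for the site travel in a single Cookie header.
        cookie_value = "; ".join(f"{k}={v}" for k, v in cookies.items())
        lines.append(f"Cookie: {cookie_value}")
    # A blank line (CRLF CRLF) terminates the header section.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


request_bytes = build_get_request(
    "www.amazon.com",
    extra_headers={"Accept-Charset": "ISO-8859-1,UTF-8;q=0.7,*;q=0.7"},
)
```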

      The Amazon/Google web server receives the HTTP GET request and, since this is the very first request, creates a session for it. The Amazon/Google website is a fully distributed, decentralized, multi-tiered web application. In a Java servlet based implementation, the web tier converts the HTTP request into an HttpServletRequest, which is delivered to the web components. The web components provide dynamic extension capabilities for the web server, such as servlets, JSP pages, or web service endpoints, and can interact with the business components and the database components to generate dynamic content. The business components perform the business logic. The database components retrieve data from the data warehouse for the dynamic content. The requested resources can also include files, images, etc.
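The division of labor between the tiers can be illustrated with a toy example. Everything below is hypothetical (the product data, the free-shipping rule, and the function names) and is written in Python rather than as Java servlets; it only shows the web component delegating to a business component, which in turn reads from a database component.

```python
from typing import Dict

# Database tier: a stand-in for a real data store (made-up data, prices in cents).
PRICES_CENTS: Dict[str, int] = {"book": 1250, "lamp": 3000}


def fetch_price_cents(product: str) -> int:
    """Database component: look up the stored price."""
    return PRICES_CENTS[product]


def qualifies_for_free_shipping(product: str, threshold_cents: int = 2500) -> bool:
    """Business component: apply a (hypothetical) business rule."""
    return fetch_price_cents(product) >= threshold_cents


def handle_request(path: str) -> str:
    """Web component: turn a request path into a dynamic HTML page."""
    product = path.strip("/")
    if product not in PRICES_CENTS:
        return "<html><body>Not found</body></html>"
    cents = fetch_price_cents(product)
    shipping = "yes" if qualifies_for_free_shipping(product) else "no"
    price = f"${cents // 100}.{cents % 100:02d}"
    return f"<html><body>{product}: {price}, free shipping: {shipping}</body></html>"
```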

      The web components then create an HttpServletResponse, which is converted to the following HTTP response message and sent back to your browser:
            Status: HTTP/1.1 200 OK
            Date: Tue, 05 Jun 2012 03:53:32 GMT
            Server: Server
            pragma:  no-cache
            cache-control: no-cache
            Content-Type: text/html; charset=ISO-8859-1
            Set-cookie: session-id-time=2082787201l; path=/; domain=.amazon.com; expires=Tue, 01-Jan-2036 08:00:01 GMT
            Transfer-Encoding: chunked
            …..
             Content:
               “<html>
               <head>
            <script type="text/javascript">var ue_t0=ue_t0||+new Date();
               <script>var BtechCF={a:2,cf:function(){if(--BtechCF.a == 0){uet('cf');}}};
                 <script type="text/javascript">
                    new Image().src = "http://g-ecx.images-amazon.com/images/G/01/...";
                    new Image().src = "http://g-ecx.images-amazon.com/images/G/01/...";
               …..
               </html>”
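On the wire, a response like this starts with a status line such as `HTTP/1.1 200 OK`, followed by header lines, an empty line, and then the body; the `Status:` and `Content:` labels in the listing appear to come from the tool used to capture it. The following sketch, with the hypothetical helper `parse_response_head`, shows how a client could split the head of such a message into its parts. It assumes a complete, well-formed response held in a single string.

```python
from typing import Dict, List, Tuple


def parse_response_head(raw: str) -> Tuple[str, int, str, Dict[str, List[str]]]:
    """Split an HTTP response into version, status code, reason, and headers."""
    # The header section ends at the first empty line (CRLF CRLF).
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Status line, e.g. "HTTP/1.1 200 OK".
    version, code, reason = lines[0].split(" ", 2)
    headers: Dict[str, List[str]] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, and a name may occur more than
        # once (Set-Cookie often does), so each name maps to a list of values.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return version, int(code), reason, headers


sample = (
    "HTTP/1.1 200 OK\r\n"
    "cache-control: no-cache\r\n"
    "Content-Type: text/html; charset=ISO-8859-1\r\n"
    "\r\n"
    "<html></html>"
)
version, status_code, reason, headers = parse_response_head(sample)
```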

      The details depend on the Amazon/Google web server implementation. Generally speaking, a servlet is a Java class that dynamically processes requests and constructs responses. A JSP page is a text-based document that executes as a servlet but allows a more natural approach to creating both static and dynamic content. Many other web technologies, including ASP, JSF, HTML, DHTML, CSS, AJAX, JSON, PHP, CGI, XML, JavaScript, RSS, etc., can be used to implement the Amazon/Google web pages. Web components are supported by the services of a runtime platform called a web container, which provides services such as request dispatching, security, concurrency, and life-cycle management.

      The browser receives the above HTTP response from the Amazon/Google web server. Its rendering engine parses the HTML document and converts the tags into DOM nodes in a tree called the "content tree". It also parses the style data, both in external CSS files and in style elements. The styling information, together with the visual instructions in the HTML, is used to create another tree, the render tree. The engine then goes through the layout and painting processes to display the content on the screen. Sometimes the web page contains links to files that your browser cannot display or play by itself, such as sound or animation files. In that case, you need to install a plug-in in your browser.
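The first of these steps, turning tags into a tree of nodes, can be sketched with Python's standard `html.parser` module. This is a heavily simplified illustration, not how a real rendering engine works: the `Node` and `TreeBuilder` classes are made up for the example, and the code assumes well-formed, properly nested markup without void elements such as `<br>` or `<img>`.

```python
from html.parser import HTMLParser
from typing import List, Optional


class Node:
    """One element in the simplified content tree."""

    def __init__(self, tag: str, parent: Optional["Node"] = None) -> None:
        self.tag = tag
        self.parent = parent
        self.children: List["Node"] = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Build a tree of Node objects while the parser walks the markup."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#document")
        self._current = self.root

    def handle_starttag(self, tag, attrs):
        # An opening tag creates a child of the current node and descends into it.
        node = Node(tag, self._current)
        self._current.children.append(node)
        self._current = node

    def handle_endtag(self, tag):
        # A closing tag moves back up to the parent (assumes proper nesting).
        if self._current.parent is not None:
            self._current = self._current.parent

    def handle_data(self, data):
        self._current.text += data.strip()


def outline(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, one per node."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed("<html><head><title>Hi</title></head><body><p>Hello</p></body></html>")
tree_lines = outline(builder.root)
```

Printing `"\n".join(tree_lines)` shows `html` nested under the document root, with `head` and `body` as its children.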

Finally, there are a few additional important things to be aware of in the procedure described above.

      The Amazon/Google web server sends a cookie to your browser in a header of the HTTP response; see the Set-cookie header line in the HTTP response message above. Amazon stores the following information in cookies: a main user ID, an ID for each session, and the time the session started on your machine. Amazon also uses cookies to implement the shopping cart.
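To see what such a header carries, the sketch below parses the Set-cookie value from the response shown earlier using Python's standard `http.cookies` module. The helper name `parse_set_cookie` is an assumption made for this example; a browser does the equivalent internally and then stores the cookie until it expires.

```python
from http.cookies import SimpleCookie
from typing import Dict


def parse_set_cookie(header_value: str) -> Dict[str, Dict[str, str]]:
    """Parse the value of a Set-Cookie header into names and attributes."""
    jar = SimpleCookie()
    jar.load(header_value)
    parsed: Dict[str, Dict[str, str]] = {}
    for name, morsel in jar.items():
        parsed[name] = {
            "value": morsel.value,
            "path": morsel["path"],        # URL paths the cookie applies to
            "domain": morsel["domain"],    # hosts the cookie is sent back to
            "expires": morsel["expires"],  # when the browser should discard it
        }
    return parsed


cookie_info = parse_set_cookie(
    "session-id-time=2082787201l; path=/; domain=.amazon.com; "
    "expires=Tue, 01-Jan-2036 08:00:01 GMT"
)
```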

     The Amazon/Google home page includes AJAX JavaScript, which allows parts of the page to be updated asynchronously by exchanging small amounts of data with the web server.
              
     HTTP, the application protocol used for retrieving web pages, is a request/response protocol: your web browser opens a TCP connection and sends an HTTP request message to the HTTP server, and the server returns an HTTP response message, usually containing the requested resource. In HTTP/1.0 the server closed the connection after delivering each response; HTTP/1.1, used in the example above, keeps the connection open by default so it can be reused for further requests. Either way, HTTP is stateless: each request is handled independently, and the protocol itself keeps no memory of earlier requests from the same client. As a workaround, HTTP servers implement various session management methods that use identifiers stored in cookies to track requests originating from the same client.
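A minimal sketch of cookie-based session tracking on the server side might look like this. The `SessionStore` class, the `handle` function, and the visit counter are all hypothetical; the point is only that the server keeps the state and the client merely presents an identifier with each request.

```python
import secrets
from typing import Dict, Optional, Tuple


class SessionStore:
    """Server-side state, keyed by the session ID the client sends in a cookie."""

    def __init__(self) -> None:
        self._sessions: Dict[str, dict] = {}

    def get_or_create(self, session_id: Optional[str]) -> Tuple[str, dict]:
        """Return the session for a known ID, or start a new session."""
        if session_id is not None and session_id in self._sessions:
            return session_id, self._sessions[session_id]
        # Unknown or missing ID: issue a fresh, hard-to-guess identifier.
        new_id = secrets.token_hex(16)
        self._sessions[new_id] = {}
        return new_id, self._sessions[new_id]


def handle(store: SessionStore, cookie_session_id: Optional[str]) -> Tuple[str, int]:
    """Process one request and return (session ID to set in the cookie, visit count)."""
    session_id, state = store.get_or_create(cookie_session_id)
    state["visits"] = state.get("visits", 0) + 1
    return session_id, state["visits"]
```

On the first request the client has no cookie, so `handle(store, None)` creates a session and returns its ID; when the browser sends that ID back on later requests, the server finds the same state again even though each HTTP request is independent.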
