HttpPrimer

From Hashphp.org
Revision as of 04:18, 21 September 2011 by GoogleGuy (Talk | contribs) (The Request)


What Is HTTP

To start off simple, think of HTTP as an understanding between you and some of your friends. You may have an understanding that when you meet you shake hands a certain way or exchange a particular greeting. It is not written into law and no one can force you to do so, but you have mutually agreed that this will be the standard. Perhaps you have a similar understanding with family, coworkers, colleagues, or fellow students: one that dictates how you will behave around one another and which requests will elicit which responses. Over the last two decades this is pretty much how the web has evolved. It was through this protocol called HTTP that we developed a mutual understanding of how computers should communicate over the World Wide Web.

HTTP, or Hypertext Transfer Protocol (the protocol the web relies on most), is built on a request-response model that governs most client-server relationships. What this means is that you (the client) make a request (like typing a URL into your browser address bar) and the other end (the server) responds to that request (like supplying a web page with information).

The Request

To make the request/response information sent over HTTP a little more structured and presentable, here is a snapshot of what Chrome's Developer Tools show when I type http://php.net in my browser address bar and hit enter. This is the request!

Chrome-developer-tools-snap.jpg

As you can see above, one request can actually yield multiple requests. The page can reference additional resources for my browser to fetch, like JavaScript files from a separate URI, CSS style sheets, images, etc., and each of these additional URIs may produce an additional request. Each request in this list has a Name/Path, a Method, a Status, a Type, a Size, and a record of how much Time it took. The very first request in the list is http://php.net, which is exactly what I typed into my browser.

This request has to have a method, because there is more than one method in HTTP, just as there is more than one way for you to ask a friend or family member for something. You may be asking if you can give them something, or you may be asking to take something from them, or even both. They should probably know in advance, so in HTTP we stick with a well-known set of methods, the most common being GET and POST. There are a few other methods that are used in HTTP, but for now we'll stick with these two as they are the ones you'll come across most often.

GET

The GET request method tells the server that you'd like to get some information from it. Pretty simple, right? So let's take a look at how we ask the server for this information and what the request actually looks like.

HTTP-headers.gif

The request is made up of a host and a path on that host from which I want to retrieve the information. In this request the host is php.net and the path is just the root, denoted by a forward slash. The snapshot above illustrates how Chrome's Developer Tools outline the information, but in the actual request sent to the server it's organized a little differently. Since HTTP is a plain-text protocol it's pretty simple in structure and uses nothing more than ASCII characters in the headers. The first portion of any HTTP request or response is called the header, and each part of it must be supplied on an individual line terminated by a CRLF (a carriage return followed by a line feed). An empty line marks the end of the header. Each line represents a sort of directive that indicates to the server and client what is being communicated and in what context.

The very first line of the request header is made up of three parts. The first is the method (in this case GET), the second is the path (in this case / or the root path), and the third indicates which version of the HTTP protocol the client understands (in this case HTTP/1.1). The second line of the request header indicates the host (in this case php.net). The next few lines tell us a little about the connection the client is trying to make, the User-Agent information supplied by the client, and some acceptable attributes for the information it expects to receive back from the server. The request may also carry additional optional information, such as a query string (which is appended to the path on the first line) and cookies (which get a header line of their own). We'll get into those a little later, but for now here's an example of what this request header might look like if you were looking at it exactly as your browser sends it.

GET / HTTP/1.1
Host: php.net
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.163 Safari/535.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Language: en-US,en;q=0.8
Cookie: COUNTRY=USA%2C0.0.0.0; LAST_LANG=en
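To make the line-by-line structure of the request header concrete, here is a minimal sketch that assembles a request like the one above and then splits it back into its parts. It is written in Python purely for illustration, and the helper names build_request and parse_request are hypothetical, not part of any library. It only handles the simple case described so far: one request line, one header per line, CRLF line endings, and an empty line at the end.

```python
CRLF = "\r\n"


def build_request(method, path, headers):
    """Serialize a request header: request line, header lines, then an empty line."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends in CRLF; the extra CRLF is the empty line that ends the header.
    return CRLF.join(lines) + CRLF + CRLF


def parse_request(raw):
    """Split a raw request header into (method, path, version, headers)."""
    head = raw.split(CRLF + CRLF, 1)[0]
    request_line, *header_lines = head.split(CRLF)
    # The first line has three parts: method, path, and protocol version.
    method, path, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


raw = build_request("GET", "/", {"Host": "php.net", "Connection": "keep-alive"})
method, path, version, headers = parse_request(raw)
print(method, path, version)  # GET / HTTP/1.1
print(headers["host"])        # php.net
```

A real server has to cope with a lot more than this (request bodies, repeated headers, malformed input), so treat it as a picture of the format rather than a parser you would deploy.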

Statelessness

HTTP is a stateless protocol. Stateless means that the server does not treat any single request as though it were related or tied to any prior or forthcoming request. As mentioned earlier, HTTP is built on a request-response model like most client-server relationships. Knowing and understanding these fundamental principles of HTTP will take you a long way in writing PHP scripts that generate dynamic web pages. Since the protocol is closely tied to the web, and since PHP is most notable for its use in web content generation, these things go hand in hand.


How HTTP Can Be Stateless but Allow User Sign-In

So how can the web be stateless and still retain state? That's actually a good question, because when the World Wide Web was first implemented the intention was strictly to publish content. The web is already very good at this. If I write a dissertation on network security and would like to publish it online, it's very easy for me to put that content on a website where anyone with Internet access can view it. However, static content is boring. Anyone can write something and publish it online (that's not why blogs are popular today, by the way). When you need to collect information from others, communicate collectively or in pairs, or run Software as a Service (SaaS) over the web like Gmail, Hotmail, or Y!Mail, the web seems to be lacking in many areas.

If the web seems like a horrible way to do things like instant messaging or video/audio broadcasting and conferencing, that's because it was never designed as a broadcast mechanism, and thus requests weren't initially designed to be stateful (stateful meaning that the web server can tie one request from a client to any other previous or future request).

The client, however, does make retaining some kind of state possible with things like cookies. HTTP cookies are key/value pairs sent as part of the request and response headers in most GET/POST HTTP requests, and they can be sent by the client, the server, or both. In most cases the idea is that the server provides the client with some information it would like the client to retain in future requests, by sending one or more cookies in the HTTP response header. The client's browser (or user agent) then stores each cookie on the client's machine, either in memory or on disk, and continues to send the same cookie back in the header of each future request made to the domain the cookie is assigned to. The server can make use of this information in a number of ways to track the client across subsequent requests, such as determining whether a user is logged in or authenticated, identifying a user's session information stored on the server, or storing user preferences or even shopping cart contents.
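The round trip described above can be sketched in a few lines. This is only an illustration, written in Python with the standard library's http.cookies module. The cookie name session_id, the SESSIONS dictionary, and both helper functions are assumptions made up for the example. The server hands out a cookie once, and on later requests uses the cookie the browser sends back to find the data it kept for that client.

```python
from http.cookies import SimpleCookie

# Hypothetical in-memory session store: session id -> data kept on the server.
SESSIONS = {"abc123": {"user": "alice", "logged_in": True}}


def make_set_cookie(name, value, max_age):
    """Build the value the server would put in a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    cookie[name]["max-age"] = max_age  # lifetime in seconds
    return cookie[name].OutputString()


def session_from_request(cookie_header):
    """Parse the Cookie request header and look up the matching session, if any."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "session_id" not in cookie:
        return None
    return SESSIONS.get(cookie["session_id"].value)


# First response: the server asks the browser to remember the session id.
print("Set-Cookie:", make_set_cookie("session_id", "abc123", 3600))

# A later request: the browser sends the cookie back, alongside any others it holds.
print(session_from_request("LAST_LANG=en; session_id=abc123"))
```

Note that only the identifier travels in the cookie. The session data itself stays on the server, which is why a client that sends no cookie, or an unknown one, simply gets no session back.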


PHP and State

So let's get down to PHP's role in all of this. First, let's start by understanding exactly how cookies work and how PHP can use them to retain some information about a client during subsequent requests. Below is a snapshot I took using netcat (a network utility for reading from and writing to network connections over TCP or UDP) in the terminal window on the right and a browser, to demonstrate what cookies actually look like in HTTP. Remember, HTTP is a plain-text protocol. It really doesn't do much of anything fancy behind the scenes.

In this example I'm basically standing in for what the web server and PHP would do when I go to my browser address bar and type in http://localhost/foobar. As you can see, my browser sends a request, and since I'm listening on the port with netcat I can capture and respond to this request myself. The response header I sent back includes a cookie, set with the Set-Cookie HTTP response header. The cookie is given a name, a value, and an expiration date. You'll notice that I had actually set this cookie in a previous request as well, along with another cookie, and so the browser includes both of them in the current request header.

Cookie.png

What you'll notice from the response body is that it's just standard markup, the same thing you would see if you clicked on "View Source" in your browser (Ctrl+U in Chrome, Firefox, and Opera, Ctrl+Alt+U in Safari, and Alt+V, C in IE). But you normally don't see the request/response headers on the page in your browser, and by the time you do see the response body what you're actually looking at is the parsed HTML. The developer tools that come standard in newer versions of mainstream browsers like Chrome, Firefox (with Firebug), Opera, and IE8-9 let you see the request/response headers, and there you can find some useful information about the requests you make and the responses you get over the web (HTTP).
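For readers without the screenshot, the kind of response typed into netcat by hand can be approximated in code. The sketch below is in Python, and the function name build_response, the cookie name foo, and its value bar are placeholders rather than the values from the snapshot. It builds the raw bytes of a response whose header sets a cookie with an expiration date, followed by an empty line and the markup body.

```python
from email.utils import formatdate


def build_response(body, cookies, expires_at):
    """Return the raw bytes of a 200 response that sets the given cookies.

    `cookies` maps cookie names to values; `expires_at` is a Unix timestamp.
    """
    payload = body.encode("utf-8")
    lines = [
        "HTTP/1.1 200 OK",
        "Content-Type: text/html; charset=utf-8",
        f"Content-Length: {len(payload)}",
    ]
    # Cookie expiry dates use the HTTP date format, always expressed in GMT.
    expires = formatdate(expires_at, usegmt=True)
    for name, value in cookies.items():
        # One Set-Cookie header line per cookie.
        lines.append(f"Set-Cookie: {name}={value}; Expires={expires}")
    # CRLF after every header line, then an empty line before the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + payload


# Placeholder cookie and body; a timestamp of 0 is used only to get a fixed date.
response = build_response("<h1>Hello</h1>", {"foo": "bar"}, 0)
print(response.decode("utf-8"))
```

In PHP you would not normally assemble these bytes yourself: the web server and PHP write the status line and headers for you, and your script only decides which cookies to set and what the body contains.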