Difference between revisions of "HttpPrimer"

From Hashphp.org

Revision as of 05:10, 21 September 2011

What Is HTTP

HTTP stands for Hypertext Transfer Protocol, the protocol the web relies on most. A protocol is a set of rules that computers use to communicate with each other, even when they run different software, without having to understand how each piece of software works internally. It lets a group of computers interface with one another without all needing to run the same software (so as not to create any conflicts in communication).

To start off simple, think of HTTP (or basically any other protocol a computer uses) as an understanding between computers, much like the understanding you and some of your friends might have when you meet up. You may have an understanding that when you meet you shake hands a certain way or exchange a particular greeting. If a different group were introduced into the mix, they might not understand this shared protocol and probably wouldn't be able to communicate with you. But much like a protocol, a handshake doesn't necessarily require that we speak the same language, just that we have an understanding of what that handshake generally means.

It is not written into law and no one can force you to do so, but it has been mutually agreed upon among you that this will be a standard for greeting. Perhaps you have an understanding with family, coworkers, colleagues, or fellow students as well, one that dictates how you will behave around one another and which requests will elicit which responses. Over the last two decades this is pretty much how the web has evolved. It was through this protocol called HTTP that we developed a mutual understanding of how computers should communicate over the World Wide Web.

HTTP is built on a request-response model that governs most client-server relationships. What this means is that you (the client) make a request (like typing a URL into your browser address bar) and the other end (the server) responds to that request (like supplying a web page with information).

The Request

In order to make some of the request/response information sent over HTTP a little more structured and presentable I decided to provide a snapshot of what Chrome's Developer Tools presents when I type http://php.net in my browser address bar and hit enter. This is the request!

Chrome-developer-tools-snap.jpg

As you can see above, a single page load can actually trigger multiple requests. The page can reference additional resources for my browser to fetch, like JavaScript files from a separate URI, CSS style sheets, images, etc., and each of these additional URIs produces its own request. Each request in this list has a Name/Path, a Method, a Status, a Type, a Size, and a record of how much Time it took. The very first request in the list is http://php.net, which is exactly what I typed into my browser. This request has to have a method, because there is more than one method in HTTP, just as there is more than one way for you to ask a friend or family member for something. You may be asking if you can give them something, or you may be asking to take something from them, or even both. They should probably know in advance, so in HTTP every request states its method up front. The two you'll come across most often are GET and POST. There are a few other methods in HTTP, but for now we'll stick with these.

GET

The GET request METHOD tells the server that you'd like to get some information from it. Pretty simple, right? So let's take a look at how we ask the server for this information and what the request actually looks like.

File:HTTP-headers.gif

The request is made up of a HOST and a PATH on that host from which I want to retrieve information. In this request the HOST is php.net and the path is just root, denoted by a forward slash. The snapshot above illustrates how Chrome's developer tools parse the information, but the actual request sent to the server is organized a little differently. Since HTTP is a plain-text protocol, it's pretty simple in structure and uses nothing more than ASCII characters in the headers. The first portion of any HTTP request or response is called the header, and each part of the header must be supplied on its own line, terminated by a CRLF (Carriage-Return Line-Feed). Each line is a sort of directive that tells the server and client what is being communicated and in what context. To separate the header from the body, the header is terminated by two consecutive CRLFs (in other words, an empty line), and everything after that double CRLF is understood to be the body. This is where the client would begin parsing the HTML, for example.
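
As a rough illustration of the CRLF rules described above, here is a minimal Python sketch that splits a raw message into its header lines and its body. The message itself and the split_message helper are made up for the example and are not taken from the php.net capture.

```python
# Hypothetical raw response: each header line ends with CRLF, and an empty
# line (two consecutive CRLFs) separates the header from the body.
RAW_MESSAGE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html;charset=utf-8\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)


def split_message(raw):
    """Return (header_lines, body) for a raw HTTP message given as bytes."""
    # Everything before the first double CRLF is the header; the rest is body.
    head, _, body = raw.partition(b"\r\n\r\n")
    # Header lines are plain ASCII, one directive per line.
    header_lines = head.decode("ascii").split("\r\n")
    return header_lines, body


header_lines, body = split_message(RAW_MESSAGE)
print(header_lines)
print(body)
```

The body is left as bytes on purpose: only the header is guaranteed to be plain ASCII text, while the body may use whatever encoding the header announces.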

The very first line of the request header is made up of three parts: the method (in this case GET), the path (in this case /, the root path), and the version of the HTTP protocol the client understands (in this case HTTP/1.1). The second line of the request header indicates the HOST (in this case php.net). The next few lines tell us a little about the connection the client is trying to make, the User-Agent information supplied by the client, and some acceptable attributes for the content it expects to receive back from the server. The request may also carry additional optional information, such as a query string appended to the path or cookies sent in a Cookie header. We'll get into those a little later, but for now here's an example of exactly what this request header might look like as sent directly by your browser.


GET / HTTP/1.1
Host: php.net
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.163 Safari/535.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Language: en-US,en;q=0.8
Cookie: COUNTRY=USA%2C0.0.0.0; LAST_LANG=en
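
To make the three-part request line and the header lines more concrete, here is a small Python sketch that picks apart a shortened copy of the request above. The parse_request helper is hypothetical and only handles well-formed input. Real servers do considerably more validation.

```python
# A shortened copy of the request shown above, with explicit CRLF endings.
RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Host: php.net\r\n"
    "Connection: keep-alive\r\n"
    "Accept-Language: en-US,en;q=0.8\r\n"
    "\r\n"
)


def parse_request(raw):
    """Split a raw request header into method, path, version and headers."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    # First line: method, path and protocol version, separated by spaces.
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Every other line is "Name: value"; header names are case-insensitive.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


method, path, version, headers = parse_request(RAW_REQUEST)
print(method, path, version)
print(headers["host"])
```

Header names are lowercased here because HTTP treats them as case-insensitive, so "Host" and "host" refer to the same header.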

The Response

Of course, this request came back with a response from the server. As you can see, the response is structured much like our request; again it's in plain text and fairly self-explanatory.


HTTP/1.1 200 OK
Date: Wed, 21 Sep 2011 09:48:13 GMT
Server: Apache/1.3.41 (Unix) PHP/5.2.17
X-Powered-By: PHP/5.2.17
Content-language: en
Last-Modified: Wed, 21 Sep 2011 12:41:39 GMT
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html;charset=utf-8


The X-* parts of the response header are extra, non-standard information the server is willing to share with the client. You won't necessarily see these in every response header, but some parts of the response are mandatory, like the first line, which tells us what version of the HTTP protocol the server understands, followed by a status code that indicates how the server responded to this particular request and a short reason phrase (in our case 200 OK, meaning the request was received and there was no problem). The status codes are defined in RFC 2616 (Fielding et al.), and each status code should be handled as that specification describes.
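
As a sketch of how a client might read that mandatory first line, the Python snippet below splits the status line into its parts and uses the first digit of the code to name its general class. The parse_status_line helper and the STATUS_CLASSES table are illustrative names, and the snippet assumes a status line that includes a reason phrase.

```python
from http import HTTPStatus

# The first digit of a status code identifies its general class.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 200 OK' into its three parts."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


version, code, reason = parse_status_line("HTTP/1.1 200 OK")
status_class = STATUS_CLASSES[code // 100]
# HTTPStatus is Python's standard-library table of well-known status codes.
standard_phrase = HTTPStatus(code).phrase
print(version, code, reason, status_class, standard_phrase)
```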

POST

So let's take a look at the other method we talked about, POST, and see how it might produce a slightly different request/response.

HTTP-POST.gif

Here we're not actually looking at the parsed header information; instead we see something a little different. In this request we have a section called Form Data (referred to as the POST body in most cases), and it contains the information we want to give the server this time.
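
The following Python sketch shows one common way form data ends up in a POST body: the fields are URL-encoded and placed after the empty line that ends the header. The field names, the /submit path and the example.com host are all made up for illustration. They are not taken from the screenshot above.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields; a real form would supply its own names and values.
form_data = {"username": "alice", "comment": "hello world"}

# URL-encode the fields into the body (spaces become "+", pairs are joined by "&").
body = urlencode(form_data)

# The header announces the body's type and length; the body follows the
# empty line that terminates the header.
request = (
    "POST /submit HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body.encode('ascii'))}\r\n"
    "\r\n"
    f"{body}"
)
print(request)

# On the receiving side the body can be decoded back into fields.
decoded = parse_qs(body)
print(decoded)
```

Unlike the GET example, the information travels in the body rather than in the path, which is why the header has to say how long the body is.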

Statelessness

HTTP is a stateless protocol. Stateless means it does not treat any single request as though it were related or tied to any prior or forthcoming request. HTTP is built on a request-response model like most client-server relationships. Knowing and understanding these fundamental principles about HTTP will take you a long way toward writing PHP scripts that generate dynamic web pages. Since the protocol is closely tied to the web, and since PHP is most notable for its use in web content generation, these things go hand-in-hand.


How HTTP Can Be Stateless but Allow User Sign-In

So how can the web be stateless and still retain state? That's actually a good question, because when the World Wide Web was first implemented the intention was strictly to publish content. The web is already very good at this. If I write a dissertation on Network Security and would like to publish my thesis online, it's very easy for me to put that content on a website where anyone with Internet access can view it. However, static content is boring. Anyone can write something and publish it online (that's not why blogs are popular today, by the way). When you need to collect information from others, communicate collectively or in pairs, or run Software as a Service (SaaS) over the web like Gmail, Hotmail, or Y!Mail, the web seems to be lacking in many areas.

If the web seems like a horrible way to do things like Instant Messaging, Video/Audio broadcasting/conferencing, etc... that's because it was never designed as a broadcast mechanism and thus requests weren't initially designed to be stateful (that is to say that the webserver can tie one request from a client to any other previous or future request).

The client, however, does make retaining some kind of state possible with things like cookies. HTTP cookies are made up of key/value pairs and are sent as part of the request/response headers in most GET/POST HTTP requests; they can be sent by the client, the server, or both. In most cases the idea is that the server gives the client some information it would like the client to retain in future requests by sending one or more cookies in the HTTP response header. The client's browser (or User Agent) then stores each cookie on the client's machine, either in memory or on disk, and continues to send the same cookie back (as part of the request header) in each future request made to the domain assigned by that cookie. The server can make use of this information in a number of ways to track the client in subsequent requests, like determining whether a user is logged in or authenticated, identifying a user's session information stored on the server, or storing user preferences or even shopping cart contents.
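
To show how a cookie lets a server tie otherwise unrelated requests together, here is a simplified Python sketch. The SESSION_ID cookie name, the in-memory SESSIONS store and the handle_request function are all assumptions made for the example. This is not how PHP or any particular webserver implements sessions.

```python
import uuid
from urllib.parse import unquote

# Hypothetical in-memory session store, keyed by session id.
SESSIONS = {}


def parse_cookie_header(value):
    """Turn 'a=1; b=2' from a Cookie request header into a dict."""
    cookies = {}
    for pair in value.split(";"):
        name, sep, val = pair.strip().partition("=")
        if sep:
            # Cookie values are often percent-encoded, e.g. %2C for a comma.
            cookies[name] = unquote(val)
    return cookies


def handle_request(request_headers):
    """Handle one request; return (response_headers, visit_count)."""
    cookies = parse_cookie_header(request_headers.get("Cookie", ""))
    session_id = cookies.get("SESSION_ID")
    response_headers = {}
    if session_id not in SESSIONS:
        # Unknown client: create a session and ask the client to remember it.
        session_id = uuid.uuid4().hex
        SESSIONS[session_id] = {"visits": 0}
        response_headers["Set-Cookie"] = f"SESSION_ID={session_id}"
    SESSIONS[session_id]["visits"] += 1
    return response_headers, SESSIONS[session_id]["visits"]


# The first request carries no cookie, so the server issues one.
first_headers, first_visits = handle_request({})
# The client sends the same cookie back, so the server recognizes it.
second_headers, second_visits = handle_request(
    {"Cookie": first_headers["Set-Cookie"]}
)
print(first_visits, second_visits)
print(parse_cookie_header("COUNTRY=USA%2C0.0.0.0; LAST_LANG=en"))
```

Each call to handle_request is independent, just like each HTTP request. The only thing linking the two calls is the cookie value the client chose to send back.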


PHP and State

So let's get down to PHP's role in all of this. First, let's start by understanding exactly how cookies work and how PHP can use them to retain some information about a client during subsequent requests. Below is a snapshot I took using netcat (a network utility for reading/writing networking connections using TCP/UDP) in the terminal window on the right and a browser to demonstrate what cookies actually look like in HTTP. Remember HTTP is a plain text protocol. It really doesn't do much of anything fancy behind the scenes.

In this example I'm basically substituting for what the webserver and PHP would do when I go to my browser address bar and type in http://localhost/foobar. As you can see, my browser sends a request, and since I'm listening on the port with netcat I can capture and respond to this request myself. The response header I sent back includes a cookie using the Set-Cookie: HTTP response header. The cookie is given a name, a value, and an expiration date. You'll notice that I actually set this cookie in a previous request as well, along with another cookie, and so the current request includes them in the client's request header.

Cookie.png

What you notice from the response body is that it's just standard markup, the same thing you would see if you clicked on "View Source" in your browser (Ctrl+U in Chrome, Firefox and Opera, Ctrl+Alt+U in Safari, and Alt+V+C in IE). But you normally don't see the request/response headers on the page in your browser, and by the time you do see the response body, what you're looking at is the parsed HTML. Developer tools offered as standard in newer versions of mainstream browsers like Chrome, Firefox (with Firebug), Opera and IE8-9 allow you to see the request/response headers, where you can find some useful information about the requests you make and the responses you get over the web (HTTP).
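
A hand-typed netcat response of the kind described above could also be assembled programmatically. The Python sketch below builds the raw bytes of a response that sets one cookie. The cookie name foo, its value bar, the markup and the build_response helper are placeholders, since the exact text typed into netcat is only visible in the screenshot.

```python
import time
from email.utils import formatdate


def build_response(body, cookie_name, cookie_value, expires_epoch):
    """Build the raw bytes of a 200 response that sets a single cookie."""
    # HTTP dates use the form "Thu, 01 Jan 1970 00:00:00 GMT".
    expires = formatdate(expires_epoch, usegmt=True)
    payload = body.encode("utf-8")
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"
        f"Set-Cookie: {cookie_name}={cookie_value}; expires={expires}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    # Header in ASCII, then the empty line, then the body.
    return head.encode("ascii") + payload


# Placeholder cookie that expires one hour from now.
response = build_response(
    "<html><body>Hello</body></html>", "foo", "bar", time.time() + 3600
)
print(response.decode("utf-8"))
```

Sending these bytes over a listening socket is essentially what typing the same text into netcat does by hand.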