web technologies- html

Jackson, Web Technologies: A Computer Science Perspective, © 2007 Prentice-Hall, Inc. All rights reserved. 0-13-185603-0

Chapter 1
Web Essentials: Clients, Servers, and Communication

WEB TECHNOLOGIES

A COMPUTER SCIENCE PERSPECTIVE
JEFFREY C. JACKSON

The Internet

  • Technical origin: ARPANET (late 1960s)
    • One of the earliest attempts to network heterogeneous, geographically dispersed computers
    • Email first available on ARPANET in 1972 (and quickly became very popular!)
    • ARPANET access was limited to select DoD-funded organizations

The Internet

  • Open-access networks
    • Regional university networks (e.g., SURAnet)
    • CSNET for CS departments not on ARPANET
  • NSFNET (1985-1995)
    • Primary purpose: connect supercomputer centers
    • Secondary purpose: provide backbone to connect regional networks

The Internet

  • Internet: the network of networks connected via the public backbone and communicating using the TCP/IP protocols
  • Backbone initially supplied by NSFNET; privately funded (through ISP fees) beginning in 1995

Internet Protocols

  • Communication protocol: how computers talk
    • Cf. telephone “protocol”: how you answer and end a call, what language you speak, etc.
  • Internet protocols developed as part of ARPANET research
    • ARPANET began using TCP/IP in 1982
    • Designed for use both within local area networks (LANs) and between networks

Internet Protocol (IP)

  • IP is the fundamental protocol defining the Internet (as the name implies!)
  • IP address:
    • 32-bit number (in IPv4)
    • Associated with at most one device at a time (although a device may have more than one address)
    • Written as four dot-separated bytes, e.g. 192.0.34.166
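
The relationship between the dotted form and the underlying 32-bit number can be sketched as follows. This is only an illustration; the function names `ipv4_to_int` and `int_to_ipv4` are made up for this example and do not come from the slides.

```python
def ipv4_to_int(dotted: str) -> int:
    """Pack four dot-separated bytes into one 32-bit integer."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted IPv4 address: {dotted!r}")
    value = 0
    for byte in parts:
        # Shift the bytes seen so far left by 8 bits and append the next one.
        value = (value << 8) | byte
    return value


def int_to_ipv4(value: int) -> str:
    """Unpack a 32-bit integer back into dotted form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(ipv4_to_int("192.0.34.166"))
print(int_to_ipv4(ipv4_to_int("192.0.34.166")))
```

Python's standard `ipaddress` module performs the same conversion; the manual version is shown only to make the byte layout visible.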

IP

  • IP function: transfer data from source device to destination device
  • IP source software creates a packet representing the data
    • Header: source and destination IP addresses, length of data, etc.
    • Data itself
  • If destination is on another LAN, packet is sent to a gateway that connects to more than one network
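
The "same LAN or gateway?" decision can be sketched as below. The slides do not say how a device decides this; the sketch assumes the LAN is described by an address prefix in CIDR notation, and the function name `next_hop` and all addresses in the usage lines are hypothetical.

```python
import ipaddress


def next_hop(source_lan: str, destination: str, gateway: str) -> str:
    """Return the address the packet is handed to first.

    source_lan is an assumed CIDR description of the sender's LAN,
    e.g. "192.168.1.0/24".
    """
    lan = ipaddress.ip_network(source_lan)
    dest = ipaddress.ip_address(destination)
    # Deliver directly when the destination is on the same LAN,
    # otherwise hand the packet to the gateway.
    return destination if dest in lan else gateway


print(next_hop("192.168.1.0/24", "192.168.1.20", "192.168.1.1"))
print(next_hop("192.168.1.0/24", "192.0.34.166", "192.168.1.1"))
```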

IP

[Figure: a packet travels from a Source to a Destination across Network 1, Network 2, and Network 3, which are joined by two Gateways]

Transmission Control Protocol (TCP)

  • Limitations of IP:
    • No guarantee of packet delivery (packets can be dropped)
    • Communication is one-way (source to destination)
  • TCP adds concept of a connection on top of IP
    • Provides guarantee that packets are delivered
    • Provides two-way (full duplex) communication

TCP

  • Establish connection.
    • Source: Can I talk to you?
    • Destination: OK. Can I talk to you?
    • Source: OK.
  • Send packet with acknowledgment.
    • Source: Here’s a packet.
    • Destination: Got it.
  • Resend packet if no (or delayed) acknowledgment.
    • Source: Here’s a packet.
    • Source: Here’s a resent packet.
    • Destination: Got it.
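
The resend-until-acknowledged idea can be illustrated with a small simulation. This is a deliberately simplified sketch, not how TCP is actually implemented (real TCP uses sequence numbers, timers, and sliding windows); the function name `send_with_retransmit` and the `lost_attempts` parameter are invented for the example.

```python
def send_with_retransmit(packets, lost_attempts, max_attempts=5):
    """Simulate delivering packets one at a time over an unreliable link.

    lost_attempts is a set of (packet_index, attempt_number) pairs that the
    simulated network drops, so no acknowledgment comes back for them.
    """
    delivered = []
    log = []
    for index, packet in enumerate(packets):
        for attempt in range(1, max_attempts + 1):
            if (index, attempt) in lost_attempts:
                # No acknowledgment arrives, so the sender tries again.
                log.append((index, attempt, "lost"))
                continue
            delivered.append(packet)
            log.append((index, attempt, "acked"))
            break
        else:
            raise RuntimeError(f"packet {index} not acknowledged")
    return delivered, log


delivered, log = send_with_retransmit(["a", "b"], {(1, 1)})
print(delivered)
print(log)
```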

TCP

  • TCP also adds concept of a port
    • TCP header contains port number representing an application program on the destination computer
    • Some port numbers have standard meanings
      • Example: port 25 is normally used for email transmitted using the Simple Mail Transfer Protocol (SMTP)
    • Other port numbers are available first-come, first-served to any application

User Datagram Protocol (UDP)

  • Like TCP in that:
    • Builds on IP
    • Provides port concept
  • Unlike TCP in that:
    • No connection concept
    • No transmission guarantee
  • Advantage of UDP vs. TCP:
    • Lightweight, so faster for one-time messages
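
A one-time UDP message can be sketched with the standard `socket` module. The example stays on the loopback address so it needs no external network; the function name `udp_roundtrip` is made up, and the two-second timeout is an arbitrary choice so the receiver does not wait forever if the datagram is lost.

```python
import socket


def udp_roundtrip(message: bytes) -> bytes:
    """Send one datagram to a receiver on this machine and return what arrived."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick any free port
    receiver.settimeout(2.0)
    port = receiver.getsockname()[1]
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No connection is established first: the datagram is simply sent.
        sender.sendto(message, ("127.0.0.1", port))
        data, _address = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()


print(udp_roundtrip(b"ping"))
```

Note that nothing in this exchange acknowledges receipt; if the datagram were dropped, the sender would never find out.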

Domain Name System (DNS)

  • DNS is the “phone book” for the Internet
    • Map between host names and IP addresses
    • DNS often uses UDP for communication
  • Host names
    • Labels separated by dots, e.g., www.example.org
    • Final label is top-level domain
      • Generic: .com, .org, etc.
      • Country-code: .us, .sa, etc.

DNS

  • Top-level domains are divided into second-level domains, which can be further divided into subdomains, etc.
    • E.g., in www.example.com, example is a second-level domain
  • A host name plus domain name information is called the fully qualified domain name (FQDN) of the computer
    • Above, www is the host name, www.example.com is the FQDN
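
The label structure of a fully qualified domain name can be pulled apart with plain string handling. A minimal sketch; the function name `split_fqdn` and the dictionary keys are chosen for this example only.

```python
def split_fqdn(fqdn: str) -> dict:
    """Split a fully qualified domain name into its dot-separated labels."""
    labels = fqdn.rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError(f"expected at least two labels: {fqdn!r}")
    return {
        "host_name": labels[0],             # first label names the machine
        "second_level_domain": labels[-2],  # label just before the TLD
        "top_level_domain": labels[-1],     # final label
        "labels": labels,
    }


print(split_fqdn("www.example.com"))
```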

DNS

  • The nslookup program provides command-line access to DNS (on most systems)
  • Looking up a host name given an IP address is known as a reverse lookup
    • Recall that a single host may have multiple IP addresses.
    • Address returned is the canonical IP address specified in the DNS system.

Higher-level Protocols

  • Many protocols build on TCP
    • Telephone analogy: TCP specifies how we initiate and terminate the phone call, but some other protocol specifies how we carry on the actual conversation
  • Some examples:
    • SMTP (email)
    • FTP (file transfer)
    • HTTP (transfer of Web documents)

World Wide Web

  • Originally, one of several systems for organizing Internet-based information
    • Competitors: WAIS, Gopher, Archie
  • Distinctive feature of Web: support for hypertext (text containing links)
    • Communication via Hypertext Transfer Protocol (HTTP)
    • Document representation using Hypertext Markup Language (HTML)

World Wide Web

  • The Web is the collection of machines (Web servers) on the Internet that provide information, particularly HTML documents, via HTTP.
  • Machines that access information on the Web are known as Web clients. A Web browser is software used by an end user to access the Web.

Hypertext Transfer Protocol (HTTP)

  • HTTP is based on the request-response communication model:
    • Client sends a request
    • Server sends a response
  • HTTP is a stateless protocol:
    • The protocol does not require the server to remember anything about the client between requests.

HTTP

  • Normally implemented over a TCP connection (80 is the standard port number for HTTP)
  • Typical browser-server interaction:
    • User enters Web address in browser
    • Browser uses DNS to locate IP address
    • Browser opens TCP connection to server
    • Browser sends HTTP request over connection
    • Server sends HTTP response to browser over connection
    • Browser displays body of response in the client area of the browser window

HTTP

Connect:
    $ telnet www.example.org 80
    Trying 192.0.34.166...
    Connected to www.example.com (192.0.34.166).
    Escape character is '^]'.

Send request:
    GET / HTTP/1.1
    Host: www.example.org

Receive response:
    HTTP/1.1 200 OK
    Date: Thu, 09 Oct 2003 20:30:49 GMT

HTTP Request

  • Structure of the request:
    • start line
    • header field(s)
    • blank line
    • optional body

HTTP Request

  • Start line
    • Example: GET / HTTP/1.1
    • Three space-separated parts:
      • HTTP request method
      • Request-URI
      • HTTP version
        • We will cover HTTP/1.1, in which the version part of the start line must be exactly as shown
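
The three space-separated parts of the start line can be split mechanically. A minimal sketch, assuming only HTTP/1.1 is accepted as stated above; the names `StartLine` and `parse_start_line` are invented for the example.

```python
from typing import NamedTuple


class StartLine(NamedTuple):
    method: str
    request_uri: str
    version: str


def parse_start_line(line: str) -> StartLine:
    """Split an HTTP request start line into method, Request-URI, version."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected three space-separated parts: {line!r}")
    method, request_uri, version = parts
    if version != "HTTP/1.1":
        # This sketch covers only HTTP/1.1.
        raise ValueError(f"unsupported version: {version!r}")
    return StartLine(method, request_uri, version)


print(parse_start_line("GET / HTTP/1.1"))
```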

HTTP Request

  • Uniform Resource Identifier (URI)
    • Syntax: scheme : scheme-dependent-part
    • Ex: In http://www.example.com/ the scheme is http
  • Request-URI is the portion of the requested URI that follows the host name (which is supplied by the required Host header field)
    • Ex: / is the Request-URI portion of http://www.example.com/

URI

  • URIs are of two types:
    • Uniform Resource Name (URN)
      • Can be used to identify resources with unique names, such as books (which have unique ISBNs)
      • Scheme is urn
    • Uniform Resource Locator (URL)
      • Specifies location at which a resource can be found
      • In addition to http, some other URL schemes are https, ftp, mailto, and file

HTTP Request

  • Common request methods:
    • GET
      • Used if link is clicked or address typed in browser
      • No body in request with GET method
    • POST
      • Used when submit button is clicked on a form
      • Form information contained in body of request
    • HEAD
      • Requests that only header fields (no body) be returned in the response

HTTP Request

  • Header field structure:
    • field name : field value
  • Syntax
    • Field name is not case sensitive
    • Field value may continue on multiple lines by starting continuation lines with white space
    • Field values may contain MIME types, quality values, and wildcard characters (*’s)

Multipurpose Internet Mail Extensions (MIME)

  • Convention for specifying content type of a message
    • In HTTP, typically used to specify content type of the body of the response
  • MIME content type syntax:
    • top-level type / subtype
  • Examples: text/html, image/jpeg

HTTP Quality Values and Wildcards

  • Example header field with quality values:
    accept:
    text/xml,text/html;q=0.9,
    text/plain;q=0.8, image/jpeg,
    image/gif;q=0.2,*/*;q=0.1
  • A quality value applies only to the item it is attached to; per the HTTP specification, an item without one defaults to q=1
  • The higher the value, the higher the preference
  • Note use of wildcards to specify quality 0.1 for any MIME type not specified earlier
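
How a server might rank the types in such a header can be sketched as follows. The sketch assumes the HTTP specification's reading, in which a `q` parameter belongs only to the item it is attached to and a missing `q` means 1.0. The function name `parse_accept` is made up, and parameters other than `q` are ignored.

```python
def parse_accept(value: str):
    """Return (media_type, quality) pairs, most preferred first."""
    preferences = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        media_type, *params = [part.strip() for part in item.split(";")]
        quality = 1.0  # default when no q parameter is present
        for param in params:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                quality = float(raw)
        preferences.append((media_type, quality))
    # sorted() is stable, so items with equal quality keep header order.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


header = ("text/xml,text/html;q=0.9,text/plain;q=0.8, image/jpeg,"
          "image/gif;q=0.2,*/*;q=0.1")
for media_type, quality in parse_accept(header):
    print(media_type, quality)
```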

HTTP Request

  • Common header fields:
    • Host: host name from URL (required)
    • User-Agent: type of browser sending request
    • Accept: MIME types of acceptable documents
    • Connection: value close tells server to close connection after single request/response
    • Content-Type: MIME type of (POST) body, normally application/x-www-form-urlencoded
    • Content-Length: bytes in body
    • Referer: URL of document containing link that supplied URI for this HTTP request
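
Putting the pieces together, a complete request (start line, header fields, blank line, optional body) can be assembled as bytes. This is a sketch under a few assumptions: lines end with CRLF as HTTP/1.1 requires, a non-empty body defaults to the form-encoded content type mentioned above, and the function name `build_request` and the `/form` path are hypothetical.

```python
def build_request(method, request_uri, host, headers=None, body=b""):
    """Assemble an HTTP/1.1 request as bytes."""
    lines = [f"{method} {request_uri} HTTP/1.1", f"Host: {host}"]
    fields = dict(headers or {})
    if body:
        # Describe the body so the server knows its type and length.
        fields.setdefault("Content-Type", "application/x-www-form-urlencoded")
        fields["Content-Length"] = str(len(body))
    lines += [f"{name}: {value}" for name, value in fields.items()]
    # A blank line separates the header fields from the (optional) body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


print(build_request("GET", "/", "www.example.org").decode("ascii"))
print(build_request("POST", "/form", "www.example.org",
                    body=b"t=win&s=chess").decode("ascii"))
```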

HTTP Response

  • Structure of the response:
    • status line
    • header field(s)
    • blank line
    • optional body

HTTP Response

  • Status line
    • Example: HTTP/1.1 200 OK
    • Three space-separated parts:
      • HTTP version
      • status code
      • reason phrase (intended for human use)

HTTP Response

  • Status code
    • Three-digit number
    • First digit is class of the status code:
      • 1=Informational
      • 2=Success
      • 3=Redirection (alternate URL is supplied)
      • 4=Client Error
      • 5=Server Error
    • Other two digits provide additional information
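
The status line and the class encoded in its first digit can be decoded as follows. A minimal sketch; `STATUS_CLASSES` and `parse_status_line` are names invented for the example.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line: str):
    """Return (version, status code, reason phrase, status class)."""
    # Split at most twice: the reason phrase may itself contain spaces.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    number = int(code)
    # Integer division by 100 isolates the first digit of the code.
    return version, number, reason, STATUS_CLASSES.get(number // 100, "Unknown")


print(parse_status_line("HTTP/1.1 200 OK"))
print(parse_status_line("HTTP/1.1 404 Not Found"))
```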

HTTP Response

  • Common header fields:
    • Connection, Content-Type, Content-Length
    • Date: date and time at which response was generated (required)
    • Location: alternate URI if status is redirection
    • Last-Modified: date and time the requested resource was last modified on the server
    • Expires: date and time after which the client’s copy of the resource will be out-of-date
    • ETag: a unique identifier for this version of the requested resource (changes if resource changes)

Client Caching

  • A cache is a local copy of information obtained from some other source
  • Most web browsers use a cache to store requested resources so that subsequent requests for the same resource will not necessarily require an HTTP request/response
    • Ex: icon appearing multiple times in a Web page

Client Caching

  • First request for an image:
    1. Browser sends HTTP request for image to Web server
    2. Web server sends HTTP response containing image
    3. Browser stores image in the client cache
  • When the browser needs that image again, either:
    • This: send another HTTP request and receive another HTTP response containing the image
    • Or this: get the image from the cache

Client Caching

  • Cache advantages
    • (Much) faster than HTTP request/response
    • Less network traffic
    • Less load on server
  • Cache disadvantage
    • Cached copy of resource may be invalid (inconsistent with remote version)
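
The trade-off can be seen in a toy client-side cache. This is a sketch only: the class name `CachingClient` and the stand-in `fake_fetch` function are hypothetical, and no validation (for example via Expires or ETag) is attempted, which is exactly why a cached copy can become stale.

```python
class CachingClient:
    """Keep a local copy of each fetched resource, keyed by URL."""

    def __init__(self, fetch):
        self._fetch = fetch          # function that performs the real request
        self._cache = {}
        self.network_requests = 0    # counts requests that reached the server

    def get(self, url):
        if url in self._cache:
            # Cache hit: no HTTP request/response is needed.
            return self._cache[url]
        self.network_requests += 1
        body = self._fetch(url)
        self._cache[url] = body
        return body


def fake_fetch(url):
    # Stand-in for an HTTP request; returns made-up bytes.
    return b"image bytes for " + url.encode("ascii")


client = CachingClient(fake_fetch)
client.get("http://www.example.org/icon.gif")
client.get("http://www.example.org/icon.gif")
print(client.network_requests)
```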

Character Sets

  • Every document is represented by a string of integer values (code points)
  • The mapping from code points to characters is defined by a character set
  • Some header fields have character set values:
    • Accept-Charset: request header listing character sets that the client can recognize
      • Ex: accept-charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    • Content-Type: can include character set used to represent the body of the HTTP message
      • Ex: Content-Type: text/html; charset=UTF-8
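
Why the charset matters can be shown by encoding the same text two ways. A small sketch; the function name `encode_body` and the sample word are chosen for illustration.

```python
def encode_body(text: str, charset: str):
    """Return a Content-Type header line and the body bytes in that charset."""
    header = f"Content-Type: text/html; charset={charset}"
    return header, text.encode(charset)


text = "café"
print([ord(ch) for ch in text])          # code points of the characters
print(encode_body(text, "UTF-8"))
print(encode_body(text, "ISO-8859-1"))

# Decoding with the wrong character set yields the wrong characters.
print(text.encode("UTF-8").decode("ISO-8859-1"))
```

The last line shows what a client would display if it assumed ISO-8859-1 for a body that was actually sent as UTF-8.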

Web Clients

  • Many possible web clients:
    • Text-only “browser” (lynx)
    • Mobile phones
    • Robots (software-only clients, e.g., search engine “crawlers”)
    • etc.
  • We will focus on traditional web browsers

Web Browsers

  • Primary tasks:
    • Convert web addresses (URLs) to HTTP requests
    • Communicate with web servers via HTTP
    • Render (appropriately display) documents returned by a server

HTTP URLs

  • Browser uses authority to connect via TCP
  • Request-URI included in start line (/ used for path if none supplied)
  • Fragment identifier not sent to server (used to scroll browser client area)

http://www.example.org:56789/a/b/c.txt?t=win&s=chess#para5

  • host (FQDN): www.example.org
  • port: 56789
  • authority: www.example.org:56789
  • path: /a/b/c.txt
  • query: t=win&s=chess
  • fragment: para5
  • Request-URI: /a/b/c.txt?t=win&s=chess
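
The same breakdown can be obtained with `urllib.parse.urlsplit` from the Python standard library. In this sketch the function name `http_url_parts` is made up, and falling back to port 80 when the URL names no port follows the standard HTTP port mentioned earlier.

```python
from urllib.parse import urlsplit


def http_url_parts(url: str) -> dict:
    """Break an http URL into the pieces a browser needs."""
    parts = urlsplit(url)
    # "/" is used for the path if none is supplied.
    path = parts.path or "/"
    request_uri = path + ("?" + parts.query if parts.query else "")
    return {
        "host": parts.hostname,
        "port": parts.port or 80,
        "authority": parts.netloc,
        "path": path,
        "query": parts.query,
        "fragment": parts.fragment,   # never sent to the server
        "request_uri": request_uri,
    }


url = "http://www.example.org:56789/a/b/c.txt?t=win&s=chess#para5"
for name, value in http_url_parts(url).items():
    print(name, value)
```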

Web Servers

  • Basic functionality:
    • Receive HTTP request via TCP
    • Map Host header to specific virtual host (one of many host names sharing an IP address)
    • Map Request-URI to specific resource associated with the virtual host
      • File: Return file in HTTP response
      • Program: Run program and return output in HTTP response
    • Map type of resource to appropriate MIME type and use to set Content-Type header in HTTP response
    • Log information about the request and response
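
The two mapping steps (Host header to virtual host, Request-URI to resource and MIME type) can be sketched as follows. Everything concrete here is an assumption made for illustration: the `VIRTUAL_HOSTS` table, the `/srv/...` directories, the `index.html` default, and the function name `resolve`. A real server would also guard against paths that escape the document root and would handle program resources.

```python
import mimetypes

# Hypothetical table: host names sharing one IP address, each with its
# own document root on disk.
VIRTUAL_HOSTS = {
    "www.example.org": "/srv/example_org",
    "www.example.com": "/srv/example_com",
}


def resolve(host_header: str, request_uri: str):
    """Map a request to (file path, Content-Type), or None for unknown hosts."""
    hostname = host_header.split(":")[0].lower()   # drop any :port suffix
    doc_root = VIRTUAL_HOSTS.get(hostname)
    if doc_root is None:
        return None
    path = request_uri.split("?", 1)[0]            # ignore the query string
    if path.endswith("/"):
        path += "index.html"                       # assumed default document
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    return doc_root + path, content_type


print(resolve("www.example.org", "/"))
print(resolve("www.example.com:80", "/img/logo.jpg?size=small"))
```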

Web Servers

  • httpd: developed at NCSA (UIUC), primary Web server c. 1995
  • Apache: “A patchy” version of httpd, now the most popular server (esp. on Linux platforms)
  • IIS: Microsoft Internet Information Services (originally Internet Information Server)
  • Tomcat:
    • Java-based
    • Provides container (Catalina) for running Java servlets (HTML-generating programs) as back-end to Apache or IIS
    • Can run stand-alone using Coyote HTTP front-end

Tomcat Web Server

  • HTML-based server administration
    • Browse to http://localhost:8080 and click on the Server Administration link
    • localhost is a special host name that means “this machine”

Secure Servers

  • Since HTTP messages typically travel over a public network, private information (such as credit card numbers) should be encrypted to prevent eavesdropping
  • https URL scheme tells browser to use encryption
  • Common encryption standards:
    • Secure Sockets Layer (SSL)
    • Transport Layer Security (TLS), the successor to SSL

Secure Servers

  • Browser: I’d like to talk securely to you (over port 443)
  • Web server: Here’s my certificate and encryption data
  • Browser: Here’s an encrypted HTTP request
  • Web server: Here’s an encrypted HTTP response
  • Browser: Here’s an encrypted HTTP request
  • Web server: Here’s an encrypted HTTP response
  • On both the browser and the Web server, HTTP requests and HTTP responses pass through a TLS/SSL layer

Secure Servers
Man-in-the-Middle Attack

  • Browser to fake DNS server: What’s the IP address for www.example.org?
  • Fake DNS server to browser: 100.1.1.1
  • 100.1.1.1 is a fake www.example.org, not the real www.example.org
  • Browser to fake www.example.org: My credit card number is…

Secure Servers
Preventing Man-in-the-Middle

  • Browser to fake DNS server: What’s the IP address for www.example.org?
  • Fake DNS server to browser: 100.1.1.1
  • 100.1.1.1 is a fake www.example.org, not the real www.example.org
  • Browser to the server at 100.1.1.1: Send me a certificate of identity
