Chapter 15 The World Wide Web

Chapter 15 Overview

Fundamentals of hypertext documents

Web protocols and basic websites

Modern content management systems

Assuring the security of web-based activities

The Web and Hypertext

Hypertext – links – are the Web's foundation

Like email, the Web has two sets of standards:

Formatting standards – how to construct web pages that a browser can display

Protocol standards – how to retrieve a web page from a server

Web formatting standards (HTML, CSS) come from the W3C rather than the IETF; HTTP itself is published in IETF RFCs

Web developed by Tim Berners-Lee, founder of W3C

Formatting: HTML

Hypertext Markup Language

Modern HTML can display a page with images, varying type styles, and links to other pages

Type styles – handled via Cascading Style Sheets (CSS)

Hypertext links – handled via the “a” tag in HTML markup

Images – handled via the “img” tag
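
To make the “a” and “img” tags concrete, here is a minimal Python sketch that scans a small page and lists the links and image files it names. The sample page, its file names, and the TagCollector class are invented for illustration; they are not taken from the original slides.

```python
from html.parser import HTMLParser

# Hypothetical sample page, invented for illustration only.
SAMPLE_PAGE = """
<html>
  <head><style>h1 { font-style: italic; }</style></head>
  <body>
    <h1>Welcome</h1>
    <p>Read the <a href="about.html">about page</a>.</p>
    <img src="logo.jpg" alt="Site logo">
  </body>
</html>
"""

class TagCollector(HTMLParser):
    """Collects link targets from "a" tags and file names from "img" tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "img" and "src" in attributes:
            self.images.append(attributes["src"])

collector = TagCollector()
collector.feed(SAMPLE_PAGE)
print(collector.links)   # link targets found in the page
print(collector.images)  # image files a browser would fetch separately
```

The style element in the head stands in for CSS; the parser skips over it because only the two tags of interest are collected.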

Sample HTML

Resulting Web Page

Courtesy of Dr. Richard Smith.

Hypertext Link Format

Hypertext Transfer Protocol (HTTP)

The protocol used to retrieve web pages

Traditionally very simple

Client opens a connection

Client sends the page's file name (URL)

Server retrieves the file and transmits down the connection, prefixed by a text message indicating success or failure
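
The simple exchange above can be sketched from the server's point of view. This is a rough illustration under stated assumptions, not how Apache or IIS work internally: the in-memory SITE_FILES dictionary and the function names are hypothetical, and only the status line and one header are produced.

```python
# Hypothetical in-memory "site directory"; a real server reads files from disk.
SITE_FILES = {"/index.html": b"<html><body>Hello</body></html>"}

def handle_request(request_line: str) -> bytes:
    """Return the bytes a very simple server would send for one request."""
    parts = request_line.split()
    if len(parts) != 3 or parts[0] != "GET":
        return b"HTTP/1.0 400 Bad Request\r\n\r\n"
    body = SITE_FILES.get(parts[1])
    if body is None:
        # Failure: the text prefix reports that the file was not found.
        return b"HTTP/1.0 404 Not Found\r\n\r\n"
    # Success: the text prefix comes first, then the file contents.
    header = "HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body)
    return header.encode("ascii") + body

def status_code(response: bytes) -> int:
    """Extract the numeric status from the first line of a response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    return int(status_line.split()[1])

print(status_code(handle_request("GET /index.html HTTP/1.0")))
print(status_code(handle_request("GET /missing.html HTTP/1.0")))
```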

Modern web server software

Apache – open source

Internet Information Services (IIS) – Microsoft

Addressing Web Pages

We call them URLs

Stands for Uniform Resource Locator

Indicates the location of a resource

Technically they are identifiers

Or, Uniform Resource Identifiers (URIs)

Web page addresses usually indicate the identity of the resource, not its location

We call them URLs anyway

URL Format for Web Pages

Email Address URL (Really, URI)

The URL Authority Field
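
The original figures for these slides are not available here, so the following sketch uses Python's standard urllib.parse module to show how a URL breaks into its fields, including the authority field. Both URLs are invented examples.

```python
from urllib.parse import urlsplit

# Hypothetical URL; every component here is an invented example.
url = "http://alice@www.example.com:8080/docs/page.html?lang=en#top"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.netloc)    # authority field: alice@www.example.com:8080
print(parts.username)  # alice
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /docs/page.html
print(parts.query)     # lang=en
print(parts.fragment)  # top

# An email address written as a URI has a scheme and a path but no authority.
mail = urlsplit("mailto:alice@example.com")
print(mail.scheme, mail.path)  # mailto alice@example.com
```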

Retrieving a Static Web Page

The process follows these steps:

Enter the URL into the browser

The browser resolves the domain name

The browser opens a TCP connection

Port 80 at the server's IP address

The browser sends a GET request

Includes the URL

The server retrieves the named file and sends it back over the same TCP connection
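
The browser's side of these steps can be sketched without touching the network. The function name plan_fetch and the example URL are assumptions made for illustration. The sketch only prepares what the browser needs: the host name to resolve, the TCP port, and the text of the GET request.

```python
from urllib.parse import urlsplit

def plan_fetch(url: str):
    """Return (host, port, request_text) for a plain HTTP page fetch."""
    parts = urlsplit(url)
    host = parts.hostname        # the domain name the browser must resolve
    port = parts.port or 80      # TCP port: 80 unless the URL says otherwise
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # The GET request. HTTP/1.1 puts the path in the request line and the
    # host name in a separate Host header.
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return host, port, request

host, port, request = plan_fetch("http://www.example.com/index.html")
print(host, port)
print(request)
```

In a real client, socket.create_connection((host, port)) would perform the name resolution and open the TCP connection, after which the request text is sent and the reply is read from the same socket.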

Retrieving a Web Page

If we don't specify a file name, the server guesses the file name, or uses some other default: index.html, default.htm, home.htm...

Pages may consist of multiple files

Images reside in separate files

The browser may open separate connections to retrieve the separate files

Statelessness: The client retains all state when retrieving a static web page
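
The default-file behavior can be sketched as a lookup over a list of candidate names. The order of DEFAULT_NAMES and the resolve_path function are assumptions for illustration; real servers make the list and its order configurable.

```python
# Candidate default names, tried in this (assumed) order.
DEFAULT_NAMES = ["index.html", "default.htm", "home.htm"]

def resolve_path(url_path, site_files):
    """Map a URL path to a stored file name, or None if nothing matches."""
    if not url_path.endswith("/"):
        # A file name was given: use it only if the file exists.
        return url_path if url_path in site_files else None
    # No file name was given: try each default name in turn.
    for name in DEFAULT_NAMES:
        candidate = url_path + name
        if candidate in site_files:
            return candidate
    return None

# Hypothetical site contents.
files = {"/index.html", "/docs/home.htm", "/docs/a.html"}
print(resolve_path("/", files))
print(resolve_path("/docs/", files))
```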

Directories and Search Engines

Directories evolved as a way to find web content

Yahoo! was a pioneering directory

Directories are labor intensive

Must keep the number of entries in each category small

Requires editing and analysis

Search engines – AltaVista, now Google & Bing

Use crawlers to find linked content on Web

Search engines can find sensitive and unprotected data on websites
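
A crawler's core loop is a breadth-first walk over links. The sketch below runs against a tiny in-memory "web" instead of the real Internet; the page names and contents in TOY_WEB are invented. It also shows why a forgotten file becomes discoverable as soon as any page links to it, while an unlinked page stays hidden from the crawler.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical three-file site plus one page that nothing links to.
TOY_WEB = {
    "/index.html": '<a href="/products.html">Products</a>',
    "/products.html": '<a href="/index.html">Home</a> '
                      '<a href="/backup/passwords.txt">old</a>',
    "/backup/passwords.txt": "sensitive data left in the site hierarchy",
    "/unlinked.html": "no page links here",
}

class LinkParser(HTMLParser):
    """Collects the href value of every "a" tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def crawl(start):
    """Breadth-first crawl; returns pages in the order they were found."""
    found = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        parser = LinkParser()
        parser.feed(TOY_WEB.get(page, ""))
        for link in parser.links:
            if link in TOY_WEB and link not in found:
                found.append(link)
                queue.append(link)
    return found

print(crawl("/index.html"))
```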

Basic Web Security

Topics

Client policy issues

Static website security

Server authentication

Server masquerades

Client Policy Issues

Acceptable Use Policies for web access

Avoid distractions from business tasks

Minimize non-business web use

Prohibit inappropriate content

Resist malware infestations

Client management techniques

Traffic blocking

Traffic monitoring – Trust, but Verify

Training – part of overall security education

Traffic Blocking Techniques

Website whitelist – list all accepted websites

Applies Deny by Default

Requires a lot of management

Content control or blacklists

Often provided by third party vendors

Products may block sites unconditionally or issue warnings for suspicious sites

Web traffic scanning – like antivirus scanning

Reviews actual content being retrieved

Can detect malware infection attempts
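
The difference between a whitelist and a blacklist comes down to the default decision. A minimal sketch, assuming hypothetical host lists and function names, and ignoring refinements such as subdomains or content scanning:

```python
from urllib.parse import urlsplit

# Hypothetical policy lists.
WHITELIST = {"intranet.example.com", "www.example.org"}
BLACKLIST = {"malware.example.net"}

def host_of(url):
    """Return the lowercase host name of a URL, or "" if it has none."""
    return (urlsplit(url).hostname or "").lower()

def allowed_by_whitelist(url):
    """Deny by Default: permit only hosts that are explicitly listed."""
    return host_of(url) in WHITELIST

def allowed_by_blacklist(url):
    """Permit everything except hosts that are explicitly listed."""
    return host_of(url) not in BLACKLIST

new_site = "http://brand-new-site.example.com/"
print(allowed_by_whitelist(new_site))  # False: never approved
print(allowed_by_blacklist(new_site))  # True: never reported
```

The unknown site shows the trade-off: the whitelist blocks it until someone approves it, which is the management burden, while the blacklist lets it through until someone reports it.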

HTTP Tunneling

Most sites permit HTTP traffic through firewalls

Some vendors “tunnel” through firewalls

Allows connections between internal and external vendor hosts, despite blocking

May support improved customer service

May also allow unauthorized access to site

Firewalling an HTTP tunnel

Basic packet and session filtering can't detect HTTP tunneling

Firewall must examine HTTP traffic itself

Static Website Security

Risks to the static site server

Attackers may deface the site if they can find a way to modify the files

Sensitive information might be disclosed if it is placed in the site hierarchy accidentally

Bogus site – attacker redirects visitors to a site masquerading as the real site

Risks to clients

Maliciously formatted files: JPEG of death

Server Authentication: SSL

Authenticating a Certificate

Server Authentication Failures

SSL authentication doesn't always succeed

Failure may be an administrative error

Types of failures detected by browsers

Domain names don't match (may be OK)

Untrusted certificate authority (may or may not be OK)

Expired certificate (often still safe)

Revoked certificate (Unsafe)

Invalid digital signature (Unsafe)
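
The five failure types can be pictured as independent checks a browser applies to a certificate. The Certificate class below is a greatly simplified stand-in with hypothetical fields, not a real X.509 structure, and the signature check is reduced to a precomputed flag.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Certificate:
    """Simplified stand-in for an X.509 certificate (hypothetical fields)."""
    subject_name: str
    issuer: str
    not_after: datetime
    serial: int
    signature_valid: bool  # result of verifying the issuer's signature

def find_failures(cert, hostname, now, trusted_issuers, revoked_serials):
    """Return the list of failure types detected for this certificate."""
    failures = []
    if cert.subject_name.lower() != hostname.lower():
        failures.append("domain name mismatch")
    if cert.issuer not in trusted_issuers:
        failures.append("untrusted certificate authority")
    if now > cert.not_after:
        failures.append("expired certificate")
    if cert.serial in revoked_serials:
        failures.append("revoked certificate")
    if not cert.signature_valid:
        failures.append("invalid digital signature")
    return failures

cert = Certificate("www.example.com", "Example CA", datetime(2020, 1, 1), 7, True)
print(find_failures(cert, "www.example.com", datetime(2019, 6, 1), {"Example CA"}, set()))
print(find_failures(cert, "shop.example.com", datetime(2021, 6, 1), {"Example CA"}, set()))
```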

Assessing a Failure

Mismatched domain name: Whose certificate?

Would the actual owner of the certificate legitimately host this website?

Does the naming error make sense?

Untrusted certificate authority: Who signed it?

It's “untrusted” because the browser didn't have the authority's certificate already

US military doesn't distribute its CA certificate with commercial browsers

Can we reliably download a valid certificate?

Server Masquerades

Sophisticated attacks will undermine SSL

Techniques to trick browsers

Bogus certificate authority

Usually detected by the browser

Misleading domain name

Examples: “paypai.com” “ebay-login.com”

Stolen private key – sign bogus certificates

Tricked certificate authority

The authority itself issues the certificate

Dynamic Websites

Static websites serve pre-built pages from files

Dynamic websites construct pages on demand

Performing a POST operation

Alice retrieves a “form” page from the server

The server transmits the HTML page

Alice fills out fields in the form, clicks “Submit”

Formats the fields into a POST operation

Sends them to the server

Server processes the POST, sends response
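
The formatting step in a POST can be sketched with Python's standard URL-encoding helpers. The path, host, and field names are invented, and the functions build_post and read_form are hypothetical names for the browser's and the server's halves of the exchange.

```python
from urllib.parse import urlencode, parse_qs

def build_post(path, host, fields):
    """Client side: format form fields into an HTTP POST request."""
    body = urlencode(fields)
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body.encode('ascii'))}\r\n"
        "\r\n"
        f"{body}"
    )

def read_form(request):
    """Server side: recover the form fields from the request body."""
    body = request.split("\r\n\r\n", 1)[1]
    return {name: values[0] for name, values in parse_qs(body).items()}

request = build_post("/signup", "www.example.com",
                     {"name": "Alice", "city": "St. Paul"})
print(request)
print(read_form(request))
```

Unlike a GET, the field values travel in the request body rather than in the URL.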

Processing a Web Form

Scripts for Dynamic Websites

Modern sites use scripts

Instead of retrieving a file from the site directory, the server executes a script

The script interprets the URL's path name

These are server-side scripts

The scripts execute on the server

Sites also use client-side scripts

The scripts are embedded in the web page

The client executes the scripts

Server-Side Scripts

Scripting Languages

Perl – PL

Active Server Pages (Extended) – ASP, ASPX

Microsoft system that supports Visual Basic, JavaScript, ActiveX, and the .NET framework

PHP – PHP: Hypertext Preprocessor

JavaScript – JS – often used on the client side

JavaServer Pages – JSP

Python – PY

Ruby – RB

Client Scripting Security

Client-side risk

A script could modify files or software on the client's computer – a “drive-by download”

Waledac botnet does this

Cross-site scripting – a script from another source is injected into a trusted site's page

Client-side defenses

Same origin policy – all of script's accesses must use same host, port number, protocol

Sandboxing – block access to client resources except those allowed by the user
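
The same origin test compares three values taken from each URL. A minimal sketch, assuming only the http and https schemes and using hypothetical function names:

```python
from urllib.parse import urlsplit

# Default ports are filled in so that an explicit ":80" compares equal.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (protocol, host, port) triple that defines an origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(url_a, url_b):
    """True when both URLs share protocol, host, and port number."""
    return origin(url_a) == origin(url_b)

print(same_origin("http://www.example.com/a", "http://www.example.com:80/b"))
print(same_origin("http://www.example.com/a", "https://www.example.com/a"))
print(same_origin("http://www.example.com/a", "http://ads.example.net/a"))
```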

States and HTTP

HTTP servers don't save state themselves

We use cookies to establish state

Otherwise sites can't maintain shopping carts

Lack of state also makes it difficult to track individual visitors

Scripting language libraries handle cookies

Provide functions to track individual visitors

Provide functions to establish “sessions” and maintain data from one request to the next
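
A session mechanism can be sketched with Python's standard http.cookies module. The SESSIONS table, the cookie name session_id, and both functions are assumptions for illustration; a scripting library would normally hide these details and store sessions somewhere more durable than a dictionary.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # session ID -> data the server keeps for that visitor

def start_session():
    """Create a session and return the Set-Cookie header value for it."""
    session_id = secrets.token_hex(16)  # random, hard-to-guess identifier
    SESSIONS[session_id] = {"cart": []}
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["httponly"] = True
    return cookie["session_id"].OutputString()

def load_session(cookie_header):
    """Find the visitor's session from the Cookie header the browser sends."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if "session_id" not in cookie:
        return None
    return SESSIONS.get(cookie["session_id"].value)

set_cookie = start_session()              # sent to the browser once
cookie_header = set_cookie.split(";")[0]  # what the browser sends back
load_session(cookie_header)["cart"].append("book")
print(load_session(cookie_header))
```

The server still keeps no per-connection state; the cookie the browser returns with each request is what ties the requests together.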

Content Management Systems

Manage contents of a dynamic website

Web contents stored in a database

Pages are built by a set of scripts

Four parts:

Operating system and protocol stack

Web server software

Database management software

Web scripting language

Open source systems often use “LAMP”

aka Linux, Apache, MySQL, and PHP

Organization of a CMS

Database Management Systems

A typical modern DBMS is relational

Stores data in a set of tables

Each table has rows of individual records

Each column is a different attribute

In some tables, an attribute will select records in a different table – making a relationship

Most use Structured Query Language (SQL)

A standard notation for database operations

A Relational Database

A Database Query in SQL
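
The original figures are not available here, so the following sketch builds a small relational database with Python's built-in sqlite3 module. The table names, columns, and rows are invented. The author_id column in pages selects a record in authors, which is the kind of relationship described above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pages (
        page_id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES authors(author_id)
    );
    INSERT INTO authors VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO pages VALUES (10, 'Home', 1), (11, 'News', 2), (12, 'About', 1);
""")

# Query: which pages did Alice write? The JOIN follows the relationship.
rows = db.execute("""
    SELECT pages.title, authors.name
    FROM pages JOIN authors ON pages.author_id = authors.author_id
    WHERE authors.name = 'Alice'
    ORDER BY pages.page_id
""").fetchall()
print(rows)  # [('Home', 'Alice'), ('About', 'Alice')]
```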

Retrieving a CMS Page

User types in a URL

Browser constructs an HTTP GET or POST command and transmits it – either will work

Server receives the command and extracts the path name and any arguments from it

Server runs the main CMS script and passes it the arguments

The script locates database entries required to respond to the arguments

The script builds the page to send to browser
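
The server-side half of these steps can be sketched as one function that takes the request target, looks up the content in a database, and builds the page. Everything named here (the articles table, the page argument, cms_main) is hypothetical; a real CMS adds templates, caching, and access control.

```python
import sqlite3
from html import escape
from urllib.parse import urlsplit, parse_qs

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (slug TEXT PRIMARY KEY, title TEXT, body TEXT)")
db.execute("INSERT INTO articles VALUES ('welcome', 'Welcome', 'Hello & welcome.')")

def cms_main(request_target):
    """Hypothetical main CMS script: request target in, HTML page out."""
    # Extract the arguments from the request target.
    parts = urlsplit(request_target)
    arguments = parse_qs(parts.query)
    slug = arguments.get("page", ["welcome"])[0]
    # Locate the database entry the arguments ask for.
    row = db.execute(
        "SELECT title, body FROM articles WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        return "<html><body><h1>Not found</h1></body></html>"
    # Build the page, escaping stored text so it displays as text.
    title, body = row
    return (f"<html><body><h1>{escape(title)}</h1>"
            f"<p>{escape(body)}</p></body></html>")

print(cms_main("/index.php?page=welcome"))
```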

Command Injection Attacks

Attack on the chain of control at the DBMS

Trick the DBMS into executing an SQL command written by a visitor

The attacker enters malicious text into a text field in one of the site's forms

The malicious text is inserted into an SQL query, and its contents fool the DBMS

The contents either modify the meaning of the SQL query or add another query to the existing one

An SQL Injection Vulnerability (1 of 3)

An SQL Injection Vulnerability (2 of 3)

An SQL Injection Vulnerability (3 of 3)
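
The original figures are not available here, so the following sketch reproduces the general pattern with Python's sqlite3 module. The users table, the login functions, and the attack string are invented for illustration. The first function pastes visitor text into the SQL command; the second passes it as a parameter, one common defense.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 'correct-horse')")

def login_vulnerable(name, password):
    # UNSAFE: visitor text is pasted directly into the SQL command.
    query = ("SELECT COUNT(*) FROM users WHERE name = '%s' AND password = '%s'"
             % (name, password))
    return db.execute(query).fetchone()[0] > 0

def login_parameterized(name, password):
    # Safer: placeholders keep visitor text as data, never as SQL.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return db.execute(query, (name, password)).fetchone()[0] > 0

# The attacker types this into the password field of the login form.
attack = "' OR '1'='1"
print(login_vulnerable("alice", attack))     # True: the query's meaning changed
print(login_parameterized("alice", attack))  # False: treated as a wrong password
```

With the attack text inserted, the vulnerable query ends in OR '1'='1', which is true for every row, so the password check no longer matters.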

OWASP Top Ten Risks (2017)

Injection

Command or SQL

Broken authentication

Sensitive data exposure

XML external entities

Broken access control

Security misconfiguration

Cross-site scripting

Aka XSS

Insecure deserialization

Using components with known vulnerabilities

Insufficient logging and monitoring

Ensuring Web Security Properties

Serving confidential data

SSL protects data in transit, but not at rest

This is like the DRM problem

Collecting confidential data

PCI-DSS standards for payment card data

Most sites off-load credit card processing

Site integrity

Protect site from external modification

If users can modify contents, extra caution is needed

Levels of Website Availability

Routine – no special steps ensure availability

High availability – downtime only takes place when scheduled – no unexpected downtime

Continuous operation – system operates with no scheduled outages, only unexpected ones

Ongoing maintenance swaps out redundant equipment without taking the system offline

Continuous availability – system operates with no scheduled or unscheduled downtime

Combines high availability and continuous operation

Web Privacy

Software often keeps records of user activities

Browsers “cache” copies of pages

Servers record visitor IP addresses

Anonymous proxies – sites that perform NAT and redirect visitors to other sites

Masks the user's actual IP address

Onion routing and Tor – a layered proxy network from the Tor Project, with early EFF funding

Private browsing

Browser mechanisms to minimize or erase the browser history
