Chapter 15 The World Wide Web
Chapter 15 Overview
Fundamentals of hypertext documents
Web protocols and basic websites
Modern content management systems
Assuring the security of web-based activities
The Web and Hypertext
Hypertext links are the Web's foundation
Like email, the Web has two sets of standards:
Formatting standards – how to construct web pages that a browser can display
Protocol standards – how to retrieve a web page from a server
Formatting standards are maintained by W3C; HTTP itself is specified in IETF RFCs
Web developed by Tim Berners-Lee, founder of W3C
Formatting: HTML
Hypertext Markup Language
Modern HTML can display a page with images, varying type styles, and links to other pages
Type styles – handled via Cascading Style Sheets (CSS)
Hypertext links – handled via the “a” tag in HTML markup
Images – handled via the “img” tag
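As an illustration of the "a" and "img" tags, here is a minimal sketch using Python's standard html.parser module. The sample page, its text, and the file name logo.png are invented for this example and are not the sample from the original slides.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the text and file names are invented for illustration.
SAMPLE_HTML = """<html>
  <head><title>Sample page</title></head>
  <body>
    <p>Visit the <a href="https://www.w3.org/">W3C</a> site.</p>
    <img src="logo.png" alt="Site logo">
  </body>
</html>"""


class LinkAndImageCollector(HTMLParser):
    """Collect the targets of "a" tags and the sources of "img" tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])      # hypertext link
        elif tag == "img" and "src" in attributes:
            self.images.append(attributes["src"])      # image in a separate file


collector = LinkAndImageCollector()
collector.feed(SAMPLE_HTML)
print("links:", collector.links)
print("images:", collector.images)
```

A browser does the same kind of parsing, then follows each img source to fetch the image file.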
Sample HTML
Resulting Web Page
Courtesy of Dr. Richard Smith.
Hypertext Link Format
Hypertext Transfer Protocol (HTTP)
The protocol used to retrieve web pages
Traditionally very simple
Client opens a connection
Client sends the page's file name (URL)
Server retrieves the file and transmits down the connection, prefixed by a text message indicating success or failure
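The "text message indicating success or failure" is the status line at the start of the server's response. The sketch below splits a raw HTTP/1.x response into its status, headers, and body. The function name parse_response and the sample bytes are assumptions made for illustration, and real responses have more cases (chunked bodies, repeated headers) than this handles.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into (status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")       # blank line ends the header block
    lines = head.decode("iso-8859-1").split("\r\n")
    _version, status, reason = lines[0].split(" ", 2)  # e.g. "HTTP/1.1 200 OK"
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body


# Hypothetical response, written out by hand for the example.
SAMPLE = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 13\r\n"
          b"\r\n"
          b"<p>Hello</p>\n")

status, reason, headers, body = parse_response(SAMPLE)
print(status, reason, headers["content-type"], body)
```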
Modern web server software
Apache – open source
Internet Information Services (IIS) – Microsoft
Addressing Web Pages
We call them URLs
Stands for Uniform Resource Locator
Indicates the location of a resource
Technically they are identifiers
Or, Uniform Resource Identifiers (URIs)
Web page addresses usually indicate the identity of the resource, not its location
We call them URLs anyway
URL Format for Web Pages
Email Address URL (Really, URI)
The URL Authority Field
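To make the URL fields concrete, the following sketch uses urlsplit from Python's standard library to break a web URL and an email URI into their parts. Both example addresses, and the helper name describe, are made up for illustration.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Return the main fields of a URL, with the authority broken down."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,   # user, host, and port together
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


# Hypothetical addresses used only for this example.
web = describe("http://alice@www.example.com:8080/docs/index.html?lang=en#top")
mail = describe("mailto:alice@example.com")
print(web)
print(mail)
```

Note that the mailto URI has no authority field at all: the whole address lands in the path.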
Retrieving a Static Web Page
The process follows these steps:
Enter the URL into the browser
The browser resolves the domain name
The browser opens a TCP connection
Port 80 at the server's IP address
The browser sends a GET statement
Includes the URL
The server retrieves the named file and sends it back over the same TCP connection
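The steps above can be sketched with a plain TCP socket. This is a simplified illustration, not how a real browser is built: it ignores redirects, HTTPS, and persistent connections. The function names are assumptions. In HTTP/1.1 the GET line carries the path, and the host name travels in a separate Host header.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(url: str):
    """Turn a URL into (host, port, request bytes) for a plain HTTP GET."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80                    # port 80 unless the URL says otherwise
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host_header = host if port == 80 else f"{host}:{port}"
    request = (f"GET {path} HTTP/1.1\r\n"
               f"Host: {host_header}\r\n"
               "Connection: close\r\n"
               "\r\n").encode("ascii")
    return host, port, request


def fetch(url: str) -> bytes:
    """Resolve the name, open the TCP connection, send GET, read the reply."""
    host, port, request = build_get_request(url)
    # create_connection resolves the domain name and opens the TCP connection.
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)             # reply arrives on the same connection
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


# Example (needs network access, so it is left commented out):
# print(fetch("http://www.example.com/index.html")[:200])
```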
Retrieving a Static Web Page
Retrieving a Web Page
If we don't specify a file name, the server uses a default such as index.html, default.htm, or home.htm
Pages may consist of multiple files
Images reside in separate files
The browser may open separate connections to retrieve the separate files
Statelessness: The client retains all state when retrieving a static web page
Directories and Search Engines
Directories evolved as a way to find web content
Yahoo! was a pioneering directory
Directories are labor intensive
Must keep the number of entries in a particular category short
Requires editing and analysis
Search engines – AltaVista originally, now Google and Bing
Use crawlers to find linked content on Web
Search engines can find sensitive and unprotected data on websites
Basic Web Security
Topics
Client policy issues
Static website security
Server authentication
Server masquerades
Client Policy Issues
Acceptable Use Policies for web access
Avoid distractions from business tasks
Minimize non-business web use
Prohibit inappropriate content
Resist malware infestations
Client management techniques
Traffic blocking
Traffic monitoring – Trust, but Verify
Training – part of overall security education
Traffic Blocking Techniques
Website whitelist – list all accepted websites
Applies Deny by Default
Requires a lot of management
Content control or blacklists
Often provided by third party vendors
Products may block sites unconditionally or issue warnings for suspicious sites
Web traffic scanning – like antivirus scanning
Reviews actual content being retrieved
Can detect malware infection attempts
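The difference between a whitelist and a blacklist comes down to the default decision. The sketch below shows both checks side by side. The host names and function names are hypothetical, and commercial products use far larger, vendor-maintained lists.

```python
from urllib.parse import urlsplit

# Hypothetical lists, for illustration only.
ALLOWED_HOSTS = {"intranet.example.com", "www.w3.org"}   # whitelist
BLOCKED_HOSTS = {"malware.example.net"}                  # blacklist


def host_of(url: str) -> str:
    return (urlsplit(url).hostname or "").lower()


def whitelist_permits(url: str) -> bool:
    """Deny by Default: only listed hosts get through."""
    return host_of(url) in ALLOWED_HOSTS


def blacklist_permits(url: str) -> bool:
    """Permit by default: only listed hosts are blocked."""
    return host_of(url) not in BLOCKED_HOSTS


for url in ("http://www.w3.org/", "http://malware.example.net/x", "http://news.example.org/"):
    print(url, whitelist_permits(url), blacklist_permits(url))
```

An unlisted site such as the third one is blocked by the whitelist but allowed by the blacklist, which is why whitelists need more management and blacklists miss new threats.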
HTTP Tunneling
Most sites permit HTTP traffic through firewalls
Some vendors “tunnel” through firewalls
Allows connections between internal and external vendor hosts, despite blocking
May support improved customer service
May also allow unauthorized access to site
Firewalling an HTTP tunnel
Basic packet and session filtering can't detect HTTP tunneling
Firewall must examine HTTP traffic itself
Static Website Security
Risks to the static site server
Attackers may deface the site if they can find a way to modify the files
Sensitive information might be disclosed if it is placed in the site hierarchy accidentally
Bogus site – attacker redirects visitors to a site masquerading as the real site
Risks to clients
Maliciously formatted files: JPEG of death
Server Authentication: SSL
Authenticating a Certificate
Server Authentication Failures
SSL authentication doesn't always succeed
Failure may be an administrative error
Types of failures detected by browsers
Domain names don't match (may be OK)
Untrusted certificate authority (may or may not be OK)
Expired certificate (often still safe)
Revoked certificate (Unsafe)
Invalid digital signature (Unsafe)
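Two of these checks, the domain name match and the expiration date, can be sketched in a few lines. The example below works on a certificate dictionary in the shape returned by Python's ssl.SSLSocket.getpeercert(). The certificate contents are invented, the function names are assumptions, and the name matching is deliberately simplified. It does not check the certificate authority, revocation, or the digital signature, which a real browser or TLS library must do.

```python
import ssl
import time


def name_matches(pattern: str, hostname: str) -> bool:
    """Simplified match: exact name, or "*." covering one leftmost label."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        _, _, rest = hostname.partition(".")
        return rest != "" and rest == pattern[2:]
    return pattern == hostname


def assess_certificate(cert: dict, hostname: str, now=None) -> list:
    """Return a list of the failures found (empty if both checks pass)."""
    now = time.time() if now is None else now
    problems = []
    names = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    if not any(name_matches(name, hostname) for name in names):
        problems.append("domain names don't match")
    if ssl.cert_time_to_seconds(cert["notAfter"]) < now:
        problems.append("expired certificate")
    return problems


# Hypothetical certificate fields, invented for the example.
SAMPLE_CERT = {
    "notAfter": "Jun 15 12:00:00 2030 GMT",
    "subjectAltName": (("DNS", "www.example.com"), ("DNS", "*.shop.example.com")),
}
check_time = ssl.cert_time_to_seconds("Jan 15 00:00:00 2025 GMT")
print(assess_certificate(SAMPLE_CERT, "www.example.com", check_time))
print(assess_certificate(SAMPLE_CERT, "www.example.org", check_time))
```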
Assessing a Failure
Mismatched domain name: Whose certificate?
Would the actual owner of the certificate legitimately host this website?
Does the naming error make sense?
Untrusted certificate authority: Who signed it?
It's “untrusted” because the browser didn't have the authority's certificate already
US military doesn't distribute its CA certificate with commercial browsers
Can we reliably download a valid certificate?
Server Masquerades
Sophisticated attacks will undermine SSL
Techniques to trick browsers
Bogus certificate authority
Usually detected by the browser
Misleading domain name
Examples: “paypai.com” “ebay-login.com”
Stolen private key – sign bogus certificates
Tricked certificate authority
The authority itself issues the certificate
Dynamic Websites
Static websites serve pre-built pages from files
Dynamic websites construct pages on demand
Performing a POST operation
Alice retrieves a “form” page from the server
The server transmits the HTML page
Alice fills out fields in the form, clicks “Submit”
Formats the fields into a POST operation
Sends them to the server
Server processes the POST, sends response
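The step where the browser "formats the fields into a POST operation" can be sketched as follows. The form fields, host, and path are hypothetical, and build_post_request is an illustrative helper rather than a library function. The sketch assumes the common application/x-www-form-urlencoded encoding.

```python
from urllib.parse import parse_qs, urlencode


def build_post_request(host: str, path: str, fields: dict) -> bytes:
    """Encode form fields and wrap them in an HTTP POST request."""
    body = urlencode(fields).encode("ascii")    # name=value pairs joined by "&"
    head = (f"POST {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Content-Type: application/x-www-form-urlencoded\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n"
            "\r\n").encode("ascii")
    return head + body


# Alice's hypothetical form entries.
request = build_post_request("www.example.com", "/signup",
                             {"name": "Alice", "city": "St. Paul"})
print(request.decode("ascii"))

# On the server side, a script would decode the body back into fields.
form_body = request.partition(b"\r\n\r\n")[2].decode("ascii")
fields = parse_qs(form_body)
print(fields)
```

Unlike a GET, the field values travel in the request body rather than in the URL.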
Processing a Web Form
Scripts for Dynamic Websites
Modern sites use scripts
Instead of retrieving a file from the site directory, the server executes a script
The script interprets the URL's path name
These are server-side scripts
The scripts execute on the server
Sites also use client-side scripts
The scripts are embedded in the web page
The client executes the scripts
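A server-side script can be sketched as a small WSGI application, the standard Python interface between a web server and a script. Instead of looking up a file, the function below interprets the URL's path name and builds the page on demand. The paths /time and /hello/ are invented for this example.

```python
from datetime import datetime, timezone
from html import escape


def application(environ, start_response):
    """Build a page on demand from the URL's path name."""
    path = environ.get("PATH_INFO", "/")
    if path == "/time":
        status = "200 OK"
        now = datetime.now(timezone.utc).isoformat(timespec="seconds")
        page = f"<p>Server time: {now}</p>"
    elif path.startswith("/hello/"):
        status = "200 OK"
        name = path[len("/hello/"):]
        # escape() keeps visitor-supplied text from being treated as markup.
        page = f"<p>Hello, {escape(name)}!</p>"
    else:
        status = "404 Not Found"
        page = "<p>No such page.</p>"
    body = page.encode("utf-8")
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]


# To try it locally (left commented out because it blocks while serving):
# from wsgiref.simple_server import make_server
# make_server("", 8000, application).serve_forever()
```

No file named /hello/Alice exists anywhere: the page is produced by the script each time it is requested.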
Server-Side Scripts
Scripting Languages
Perl – PL
Active Server Pages (Extended) – ASP, ASPX
Microsoft system that supports Visual Basic, JavaScript, ActiveX, and the .NET framework
PHP – PHP: Hypertext Preprocessor
JavaScript – JS – often used on the client side
JavaServer Pages – JSP
Python – PY
Ruby – RB
Client Scripting Security
Client-side risk
A script could modify files or software on the client's computer – a “drive-by download”
Waledac botnet does this
Cross-site scripting – script resides elsewhere
Client-side defenses
Same origin policy – all of script's accesses must use same host, port number, protocol
Sandboxing – block access to client resources except those allowed by the user
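The same origin comparison can be sketched as a check on three values: protocol, host, and port number. The function names and URLs below are illustrative, and real browsers apply further rules that this sketch leaves out.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Reduce a URL to its (protocol, host, port) triple."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)   # fill in the default port
    return scheme, (parts.hostname or "").lower(), port


def same_origin(url_a: str, url_b: str) -> bool:
    return origin(url_a) == origin(url_b)


page = "http://shop.example.com/cart.html"
for target in ("http://shop.example.com:80/api/items",
               "https://shop.example.com/api/items",
               "http://ads.example.net/track",
               "http://shop.example.com:8080/api/items"):
    print(target, same_origin(page, target))
```

Only the first target shares the page's origin; the others differ in protocol, host, or port.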
States and HTTP
HTTP servers don't save state themselves
We use cookies to establish state
Otherwise sites can't maintain shopping carts
Without cookies, it is also difficult to track individual visitors
Scripting language libraries handle cookies
Provide functions to track individual visitors
Provide functions to establish “sessions” and maintain data from one request to the next
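As a rough sketch of how a library might use a cookie to establish a session, the example below issues a random session identifier, stores per-visitor data under it on the server, and looks the data up again when the browser returns the cookie. The cookie name session_id, the in-memory SESSIONS table, and the function names are assumptions made for illustration. Production session libraries add expiry, signing, and persistent storage.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}   # session id -> per-visitor data, kept in server memory


def start_session():
    """Create a session and return (id, value for a Set-Cookie header)."""
    session_id = secrets.token_hex(16)          # hard-to-guess identifier
    SESSIONS[session_id] = {"cart": []}
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["path"] = "/"
    cookie["session_id"]["httponly"] = True     # hide the cookie from page scripts
    return session_id, cookie["session_id"].OutputString()


def find_session(cookie_header: str):
    """Look up session data from the Cookie header the browser sent back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    return SESSIONS.get(morsel.value) if morsel else None


sid, set_cookie_value = start_session()
print("Set-Cookie:", set_cookie_value)

# A later request from the same browser carries the cookie back.
find_session(f"session_id={sid}")["cart"].append("book")
print(find_session(f"session_id={sid}"))
```

The HTTP exchange itself remains stateless: the shopping cart survives only because the browser presents the same identifier on each request.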
Content Management Systems
Manage contents of a dynamic website
Web contents stored in a database
Pages are built by a set of scripts
Four parts:
Operating system and protocol stack
Web server software
Database management software
Web scripting language
Open source systems often use “LAMP”
aka Linux, Apache, MySQL, and PHP
Organization of a CMS
Database Management Systems
A typical modern DBMS is relational
Stores data in a set of tables
Each table has rows of individual records
Each column is a different attribute
In some tables, an attribute will select records in a different table – making a relationship
Most use Structured Query Language (SQL)
A standard notation for database operations
A Relational Database
A Database Query in SQL
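For a concrete feel of tables, relationships, and an SQL query, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names, columns, and rows are invented for this example and are not the database shown in the original slides.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (
        author_id INTEGER PRIMARY KEY,
        name      TEXT
    );
    CREATE TABLE pages (
        page_id   INTEGER PRIMARY KEY,
        title     TEXT,
        author_id INTEGER REFERENCES authors(author_id)  -- selects a row in authors
    );
    INSERT INTO authors VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO pages VALUES (10, 'Home', 1), (11, 'About', 1), (12, 'News', 2);
""")

# The author_id attribute in pages relates each page to one author record.
rows = db.execute(
    """SELECT pages.title, authors.name
       FROM pages JOIN authors ON pages.author_id = authors.author_id
       WHERE authors.name = ?
       ORDER BY pages.title""",
    ("Alice",),
).fetchall()
print(rows)
```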
Retrieving a CMS Page
User types in a URL
Browser constructs an HTTP GET or POST command and transmits it – either will work
Server receives the command and extracts the path name and any arguments from it
Server runs the main CMS script and passes it the arguments
The script locates database entries required to respond to the arguments
The script builds the page to send to browser
Command Injection Attacks
Attack on the chain of control at the DBMS
Trick the DBMS into executing an SQL command written by a visitor
The attacker enters malicious text into a text field in one of the site's forms
The malicious text is inserted into an SQL query, and its contents fool the DBMS
The contents either modify the meaning of the SQL query or add another query to the existing one
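The attack described above can be reproduced safely against an in-memory SQLite database. In this sketch, the users table, the passwords, and both login functions are hypothetical. The first function pastes visitor text directly into the SQL query, and the second passes it as a parameter so the DBMS treats it only as data.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (name TEXT, password TEXT);
    INSERT INTO users VALUES ('alice', 'correct-horse'), ('bob', 'hunter2');
""")


def login_vulnerable(name: str, password: str):
    # Visitor text becomes part of the SQL command itself.
    query = f"SELECT name FROM users WHERE name = '{name}' AND password = '{password}'"
    return db.execute(query).fetchall()


def login_parameterized(name: str, password: str):
    # Placeholders keep visitor text separate from the SQL command.
    query = "SELECT name FROM users WHERE name = ? AND password = ?"
    return db.execute(query, (name, password)).fetchall()


# Malicious text typed into the password field of a form.
attack = "' OR '1'='1"

print(login_vulnerable("alice", attack))      # the OR clause matches every row
print(login_parameterized("alice", attack))   # no row has that literal password
```

In the vulnerable version the injected quote ends the password string early, and the added OR clause changes the meaning of the query.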
An SQL Injection Vulnerability (1 of 3)
An SQL Injection Vulnerability (2 of 3)
An SQL Injection Vulnerability (3 of 3)
OWASP Top Ten Risks
Injection
Command or SQL
Broken authentication
Sensitive data exposure
XML external entities
Broken access control
Security misconfiguration
Cross-site scripting
Aka XSS
Insecure deserialization
Using components with known vulnerabilities
Insufficient logging and monitoring
Ensuring Web Security Properties
Serving confidential data
SSL protects data in transit, but not at rest
This is like the DRM problem
Collecting confidential data
PCI-DSS standards for payment card data
Most sites off-load credit card processing
Site integrity
Protect site from external modification
If users can modify contents, extra caution is needed
Levels of Website Availability
Routine – no special steps ensure availability
High availability – downtime only takes place when scheduled – no unexpected downtime
Continuous operation – system operates with no scheduled outages, only unexpected ones.
Ongoing maintenance swaps out redundant equipment without taking the system offline
Continuous availability – system operates with no scheduled or unscheduled downtime
Combines high availability and continuous operation
Web Privacy
Software often keeps records of user activities
Browsers “cache” copies of pages
Servers record visitor IP addresses
Anonymous proxies – sites that perform NAT and redirect visitors to other sites
Masks the user's actual IP address
Onion routing and Tor – an anonymizing proxy network with early funding from the EFF
Private browsing
Browser mechanisms to minimize or erase the browser history