Web and E-Mail

Summer Term 2020
Dr. Jens Lechtenbörger (License Information)

DBIS Group
Prof. Dr. Gottfried Vossen
Chair for Computer Science
Dept. of Information Systems
WWU Münster, Germany

1 Introduction

1.1 Learning Objectives

  • Explain communication patterns for Web and e-mail exchanges
    • Perform simple HTTP requests via telnet or gnutls-cli
    • Interpret E-Mail headers
  • Explain the concept of “stateless servers”
  • Explain constraints and advantages of caching
  • Discuss alternatives to and weaknesses of e-mail security established by secure channels between MUA and MTA

1.2 Previously on CACS …

1.2.1 Communication and Collaboration

  • Communication frequently takes place via the Internet
    • Telephony
    • Instant messaging
    • E-Mail
    • Social networks
  • Collaboration frequently supported by tools using Internet technologies
    • All of the above means for communication
    • ERP, CRM, e-learning systems
    • File sharing: Sciebo, etherpad, etc.
    • Programming (which subsumes file sharing): Git, Subversion, etc.
  • All of the above are instances of DSs

1.2.2 Recall: Internet Architecture

  • “Hourglass design”

    Internet Architecture with narrow waist

  • IP is focal point
    • “Narrow waist”
    • Application independent!
      • Everything over IP
    • Network independent!
      • IP over everything
  • Today: HTTP and SMTP at application layer

1.3 Today’s Core Questions

  • What does your browser do when you enter a URI in the address bar?
  • How does e-mail transfer work?

2 Web

2.1 History of the Web (1/2)

2.2 History of the Web (2/2)

  • 1992, NCSA Web Server available
    • National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign
  • 1993, Mosaic browser created at NCSA
  • 1994, World Wide Web Consortium (W3C) founded by Tim Berners-Lee
    • Publication of technical reports and “recommendations”
  • Now
    • Web 2.0, Semantic Web, cloud computing, browser as access device

2.3 WWW/Web

  • Standards
    • W3C (HTML 4 Specification)
      • “The World Wide Web (Web) is a network of information resources.”
    • HTTP/1.1 Specification (RFC 7230)
      • “The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems.”
  • Distributed information system
    • Client-Server architecture
      • Web clients (browsers) and servers exchange HTTP messages based on Internet standards
    • Sample Web standards (application layer of Internet architecture)
      • URIs (Uniform Resource Identifiers, generalize URLs and URNs)
      • HTTP (now)
      • ((X)HTML)

3 HTTP

3.1 HTTP

3.2 Excursion: Manual Connections

3.2.1 Warnings

  • Next two slides demonstrate how to type HTTP commands (for an improved understanding of the protocol)
    • Subsequent examples with www.informationelle-selbstbestimmung-im-internet.de require GnuTLS
      • Server redirects from port 80 to port 443
    • If your manual typing is too slow, connections may time out (e.g., “Peer has closed the GnuTLS connection”)
    • Also, use of backspace or cursor keys may destroy connections
  • Suggestion: Type in a text editor, then copy and paste into the command line

3.2.2 telnet

  • Original telnet purpose: Login to remote host
    • Insecure plaintext passwords
    • Nowadays, remote login performed with Secure Shell, ssh
  • Establish TCP connection to destination port
    • telnet www.google.de 80 (port 80 for HTTP)
      • (If your telnet variant shows no visual feedback, press ctrl-+ or ctrl-], then type set localecho [enter] [enter])
      • GET / HTTP/1.1 [enter]
      • Host: www.google.de [enter] [enter]
      • (Context for above lines soon)
    • telnet wi.uni-muenster.de 25 (port 25 for SMTP)
    • Beware: Buggy telnet implementations may stop sending after first line (use Wireshark to verify)
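
The manual telnet session above can be approximated in a few lines of Python. This is a minimal sketch, not part of the original slides: the names `build_get_request` and `fetch` are chosen for illustration, the `Connection: close` header is an addition so that reading terminates, and `fetch` only makes sense for servers that still answer plain HTTP on port 80.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Return the bytes one would type into telnet for a simple GET."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line
        f"Host: {host}",         # mandatory header in HTTP/1.1
        "Connection: close",     # assumption: ask server to close when done
        "",                      # empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Open a TCP connection, send the request, return the raw response."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("www.google.de")` should return status line, headers, and body as one byte string; the exact response depends on the server.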

3.2.3 gnutls-cli

  • Establish TLS protected TCP connection with GnuTLS
    • Alternative to telnet on previous slide
    • gnutls-cli --crlf www.informationelle-selbstbestimmung-im-internet.de
      • (HTTPS on port 443 by default)
      • GET /chaosreader.html HTTP/1.1 [enter]
      • Host: www.informationelle-selbstbestimmung-im-internet.de [enter] [enter]
    • gnutls-cli --crlf --starttls -p 25 wi.uni-muenster.de (SMTP for e-mail)
      • Type ehlo localhost, then starttls; press ctrl-d to enter TLS mode
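
A programmatic counterpart to the gnutls-cli session for HTTPS might look as follows. This is a sketch under assumptions: it uses Python's standard `ssl` module (which typically builds on OpenSSL, not GnuTLS), and the function names `make_tls_context` and `fetch_https` are hypothetical.

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    # The default context verifies the server certificate and host name.
    return ssl.create_default_context()


def fetch_https(host: str, path: str = "/", port: int = 443) -> bytes:
    """Send a GET request over a TLS protected TCP connection."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"  # assumption: lets the read loop terminate
        "\r\n"
    ).encode("ascii")
    context = make_tls_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        # The TLS handshake happens here; HTTP itself is unchanged.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(request)
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

The HTTP message is byte for byte the same as with telnet; only the transport underneath is wrapped in TLS.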

3.3 Excursion: Browser Tools

  • Modern browsers offer developer tools
    • E.g., press ctrl-shift-I with Firefox
    • Tools to inspect HTML, CSS, JavaScript
    • Tools to inspect HTTP traffic (Network tab)
      • Live view on browser requests and server responses
        • With details on timing, caching, headers
    • Console with error messages
    • And much more

3.4 HTTP Messages

  • Requests and responses
    • Generic message format of RFC 822, 1982 (822→2822→5322)
      • Originally for e-mail, extensions for binary data
    • Messages consist of
      • Headers
        • In HTTP always a distinguished start-line (request or status)
        • Then zero or more headers
      • Empty line
      • Optional message body
    • Sample GET request (does not have a body)
      • GET /chaosreader.html HTTP/1.1\r\n
        Host: www.informationelle-selbstbestimmung-im-internet.de\r\n
        \r\n
  • Excerpt of sample HTTP response to previous GET request
    • HTTP/1.1 200 OK\r\n
      Date: Wed, 08 Apr 2020 13:30:10 GMT\r\n
      Server: Apache\r\n
      Last-Modified: Wed, 24 Jul 2019 12:25:46 GMT\r\n
      ETag: "2cd1-58e6c6898dce2"\r\n
      Content-Length: 11473\r\n
      more headers omitted
      Content-type: text/html; charset=utf-8\r\n
      \r\n
      HTML code as body
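
The message structure above (start-line, headers, empty line, optional body) can be illustrated with a small parser. This is a simplified sketch: the name `parse_http_message` is hypothetical, repeated header fields overwrite each other, and chunked transfer encoding is ignored.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP message into start-line, headers, and body."""
    # The first empty line separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: Apache\r\n"
    b"Content-type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<html></html>"
)
start_line, headers, body = parse_http_message(sample)
```

For the shortened sample, `start_line` holds the status line, `headers["content-type"]` the media type, and `body` the bytes after the empty line.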

3.5 HTTP Methods

  • Case-sensitive (capital letters)

3.6 Conditional GET

  • GET under conditions
    • Requires (case-insensitive) request header
      • (Can be used by browser to check if cached version still fresh)
      • If-Modified-Since
      • If-Match
      • If-None-Match
  • Example
    • Request
      • GET /chaosreader.html HTTP/1.1
        Host: www.informationelle-selbstbestimmung-im-internet.de
        If-None-Match: "2cd1-58e6c6898dce2"
    • Response
      • HTTP/1.1 304 Not Modified
        Date: Wed, 08 Apr 2020 14:07:31 GMT
        additional headers
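
How a server might evaluate `If-None-Match` can be sketched as follows. The function `respond` is a hypothetical helper, not taken from any real server; weak validators (`W/` prefix) and the other conditional headers are left out.

```python
def respond(request_headers: dict, current_etag: str, body: bytes):
    """Return (status code, response headers, body) for a GET request."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {k.lower(): v for k, v in request_headers.items()}
    candidates = [
        tag.strip() for tag in normalized.get("if-none-match", "").split(",")
    ]
    if current_etag in candidates or "*" in candidates:
        # Client's cached copy is still current: no body is sent.
        return 304, {"ETag": current_etag}, b""
    return 200, {"ETag": current_etag, "Content-Length": str(len(body))}, body
```

With the ETag from the example above, a matching request yields 304 with an empty body, whereas a request without the header yields 200 with the full body.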

3.7 Sample Status Codes

  • Three digits, first one for class of response
    • 1xx: Informational - Request received, continuing process
      • 100: Continue - Client may continue with request body
    • 2xx: Successful - Request successfully received, understood, and accepted
      • 200: OK
    • 3xx: Redirection - Further action necessary to complete request
      • 302: Found (temporarily under different URI)
      • 303: See Other (redirect to different URI in Location header)
      • 304: Not Modified (previous slide)
    • 4xx: Client Error - Request with bad syntax or cannot be fulfilled
      • 403: Forbidden
      • 404: Not Found
    • 5xx: Server Error - Server failed for apparently valid request
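
The rule that the first digit determines the class of a response can be expressed directly in code. A small sketch using Python's standard `http.HTTPStatus` for the reason phrases; the helper name `describe` is an assumption.

```python
from http import HTTPStatus

CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Combine status code, reason phrase, and response class."""
    status_class = CLASSES[code // 100]  # first digit selects the class
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # code not registered in the standard library
        phrase = "unknown"
    return f"{code} {phrase} ({status_class})"
```

A client that does not know a specific code can still react sensibly based on its class, which is what the fallback branch illustrates.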

3.8 Review Question

  • Did you execute GET requests and conditional GET requests on the command line? Any surprises?
    • Note that examples with www.informationelle-selbstbestimmung-im-internet.de require GnuTLS (server redirects HTTP requests on port 80 to HTTPS port 443).

4 Server State and Cookies

4.1 State Models

  • Stateless: Server does not maintain client state
    • Advantages
      • Simplified server design, reduced resource usage
      • State changes on server do not require client notifications
      • Recovery (restart after server crash) “simple”: No client state to restore
    • E.g.: HTTP, DNS
      • Server forgets client after request
      • No session
  • Stateful: Server maintains client state
    • E.g., file server with table of pairs (Client, File) for caching
      • Keep track which client has current version
      • Performance improvement via locality
    • Recovery requires restoring a consistent state

4.2 Stateful Web Applications

  • HTTP is stateless
    • Yet, Web applications often maintain client state
      • E.g., personalized session after login
        • Virtual shopping cart
        • Shopping history, preferences
        • Exercises in Learnweb
    • Solution for stateful applications
      • Manage state of related requests as session outside HTTP
      • Use HTTP messages to transfer session IDs (next slide)

4.3 Session IDs

  • Session ID = Identifier to connect subsequent/related requests and responses
    • Typical variant: Client-side storage of IDs in browser
      • ID sent by server S, stored by browser (cookie or local storage)
      • Browser includes IDs set by S for every subsequent visit of S
        • Think of automatic ID card (whose contents you do not understand)
        • My browsers remove cookies and clear local storage upon exit
    • Alternative: Server-side, session ID embedded in dynamically generated URIs
      • May hinder caching
        • URI does not identify resource any longer

4.3.1 Cookies (1/2)

  • RFC 6265: HTTP State Management Mechanism
    • Idea
      • Client stores data sent by server
      • Client sends this data with subsequent requests
        • Without understanding that data at all
    • Details
      • Cookie is named byte string
      • Server transfers cookie in Set-Cookie (or Set-Cookie2) header in response
        • Set-Cookie: Version 0/Netscape and RFC 6265
        • Set-Cookie2: Version 1, RFC 2965
        • (Besides, JavaScript may create cookie at client)
      • Client sends cookie in Cookie header in requests

4.3.2 Cookies (2/2)

  • Note: Sometimes you may read that cookies are text files
    • That is usually wrong, misleading, and irrelevant
    • Modern browsers store cookies as rows in a relational database
      • Storage in filesystem or database is an implementation detail
  • Cookies have name, value, optional attributes/flags, e.g.:
    • Expires, Max-Age
      • Determine lifetime of cookie
      • If both missing: “Session” cookie to be deleted when browser exits
    • Domain
      • DNS domain of servers to which the cookie should be sent
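
Cookie headers with the attributes above can be produced and parsed with Python's standard `http.cookies` module. A minimal sketch; the helper names, the cookie name `sid`, and the domain `example.org` are assumptions for illustration.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value, max_age=None, domain=None) -> str:
    """Build the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    if max_age is not None:
        cookie[name]["max-age"] = max_age  # lifetime in seconds
    if domain is not None:
        cookie[name]["domain"] = domain    # servers that receive the cookie
    return cookie[name].OutputString()


def parse_cookie_header(header_value: str) -> dict:
    """Parse the value of a Cookie request header into name/value pairs."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}
```

A cookie built without `max_age` (and without Expires) corresponds to the session cookie described above.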

5 Caching

5.1 HTTP Caching

  • Caching reduces latency and server load for identical requests

    Figure: “HTTP cache types” by Mozilla Contributors under CC BY-SA 2.5; from MDN web docs

  • HTTP caching assumptions
    • URI identifies resource, stability, client-independence
  • Semantic transparency
    • Caching is not visible to users
    • Response from cache is equivalent to hypothetical one from server

5.2 HTTP Caching Mechanisms

  • Expiration
  • Validation
    • After expiration date, cache must check whether resource still usable
    • Server may return new expiration date

5.3 HTTP Caching Rules

  • Complex rules, lots of details
  • Server may limit caching
    • no-store, no-cache, must-revalidate
  • Client may
    • enforce validation
      • no-cache
    • forbid caching
      • no-store
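
How a cache might combine expiration, validation, and the directives above is sketched below. This is a strong simplification with hypothetical function names: only `no-store`, `no-cache`, and `max-age` of a response's Cache-Control header are considered, and many rules of the specification are omitted.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control header value into a directive dictionary."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.lower()] = arg.strip('"') or None
    return directives


def cache_decision(cache_control: str, age_seconds: int) -> str:
    """Decide what a cache may do with a stored response of a given age."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives:
        return "do not store"
    if "no-cache" in directives:
        return "validate"  # stored copy needs validation before reuse
    max_age = directives.get("max-age")
    if max_age is not None and age_seconds < int(max_age):
        return "serve from cache"  # still fresh (expiration mechanism)
    return "validate"  # expired or no lifetime given (validation mechanism)
```

Validation would then typically be a conditional GET as in section 3.6.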

6 Proxies

6.1 Web Proxies

  • Web proxy server is intermediary between client and server
    • Acts as server to client
      • Proxy accepts request from client
        • Then acts as client to server to obtain response
      • Proxy delivers response to client
    • Acts as client to server
      • Proxy sends request of real client to server
        • Server just sees some client request
      • Proxy obtains response from server

6.2 Sample Proxy Applications

  • Cache
  • Firewall/Content filter
  • Anonymizer, e.g., Tor
  • Debugging tool
    • E.g., intercept and analyze app network data
  • Surrogate/Reverse proxy, Content Delivery Network (CDN)
    • Replicated contents, inbound messages intercepted and redirected, e.g.:
      • Load balancing
      • Geographical diversity (reduced latency, increased availability)
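
The two roles of a proxy from section 6.1, combined with the cache application listed above, can be sketched without any networking. The class `CachingProxy` and the callable `origin` are hypothetical stand-ins for a real proxy and a real origin server; expiration and validation are ignored.

```python
class CachingProxy:
    """Toy proxy: server towards clients, client towards the origin."""

    def __init__(self, origin):
        self.origin = origin   # callable: URI -> response body
        self.cache = {}        # URI -> stored response body
        self.origin_calls = 0  # how often the origin was contacted

    def get(self, uri):
        # Acting as server: accept the client's request.
        if uri not in self.cache:
            # Acting as client: forward the request to the origin server.
            self.origin_calls += 1
            self.cache[uri] = self.origin(uri)
        # Deliver the (possibly cached) response to the client.
        return self.cache[uri]
```

From the origin's point of view, only the proxy is visible as client; repeated requests for the same URI never reach it in this sketch.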

6.3 Review Questions

  • What is a stateless protocol? Given that HTTP is a stateless protocol, how can lots of applications that apparently require state be implemented on top of HTTP?
  • Where are HTTP caches typically located? What impact might HTTPS have on caching?

7 E-Mail

7.1 E-Mail Basics

  • Among oldest Internet applications
  • Message format
    • Based on RFC 822, 1982 (later taken up in HTTP)
    • Extended with Multipurpose Internet Mail Extensions (MIME)
      • Content-Type (type of data contained in message)
      • Content-Transfer-Encoding (how data in message body is encoded)
  • Plaintext messages
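
The message format with MIME headers can be inspected with Python's standard `email` package. A small sketch; the function name `make_plain_text_message` is an assumption, and the concrete transfer encoding is chosen by the library.

```python
from email.message import EmailMessage


def make_plain_text_message(subject: str, text: str) -> EmailMessage:
    """Create a message with headers, empty line, and plaintext body."""
    msg = EmailMessage()
    msg["Subject"] = subject
    # set_content adds the MIME headers (Content-Type,
    # Content-Transfer-Encoding, MIME-Version) for the given text.
    msg.set_content(text)
    return msg


message = make_plain_text_message("Don't panic", "Somebody Else's Problem!")
```

Printing `message.as_string()` shows the same layout as an HTTP message: header lines, an empty line, then the body.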

7.2 Message Transfer

  • Terminology
    • Mail User Agent (MUA): Your mail reader
      • E.g., browser, Thunderbird, Emacs
    • Mail Transfer Agent (MTA): Mail server/daemon
      • E.g., sendmail, exim, postfix
  • Simple Mail Transfer Protocol, 1982 (SMTP, RFC 821→2821→5321)

    Hop-to-hop security of e-mail

    • Outgoing messages, MUA-to-MTA, MTA-to-MTA
      • Plaintext (TCP/IP, port 25)

7.3 SMTP

telnet wi 25
Trying 128.176.159.139...
Connected to wi.uni-muenster.de.
Escape character is '^]'.
220 wi-vm700.wi1.uni-muenster.de Microsoft ESMTP MAIL Service ready at Tue, 27 Oct 2009 11:22:11 +0100
HELO mouse.nix
250 wi-vm700.wi1.uni-muenster.de Hello [128.176.159.107]
MAIL From: micky@mouse.nix
250 2.1.0 Sender OK
RCPT To: lechten@wi.uni-muenster.de
250 2.1.5 Recipient OK
DATA
354 Start mail input; end with <CRLF>.<CRLF>
Received: from mx1.disney.com ([192.195.66.20]) by smtp.mouse.nix Super Duper SMTP Server; Tue, 27 Oct 2009 11:19:17 +0100
To: 42@universe.com
From: micky@mouse.nuix
Subject: Don't panic

Somebody Else's Problem!  (This is the message body after the empty
line.  Note that headers preceding the empty line have also been
entered manually.  They are ignored by SMTP, but displayed to user.)

.

250 2.6.0 <b13a2a36-f56b-43ec-ad81-41ec44190e6a@wi-vm700.wi1.uni-muenster.de> Queued mail for delivery
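
The dialog above separates the envelope (MAIL From, RCPT To) from the headers inside DATA. The following sketch mirrors this with Python's standard `smtplib`; the function names are hypothetical, and whether a given server accepts such a message depends on its configuration.

```python
import smtplib
from email.message import EmailMessage


def build_demo_message() -> EmailMessage:
    """Headers as typed manually after DATA in the transcript above."""
    msg = EmailMessage()
    msg["To"] = "42@universe.com"
    msg["From"] = "micky@mouse.nuix"
    msg["Subject"] = "Don't panic"
    msg.set_content("Somebody Else's Problem!")
    return msg


def send(host, envelope_from, envelope_to, msg, port=25):
    """Transfer msg; the envelope addresses are independent of the headers."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        # from_addr and to_addrs become MAIL FROM and RCPT TO;
        # the To and From headers in msg are not consulted for delivery.
        return smtp.send_message(
            msg, from_addr=envelope_from, to_addrs=[envelope_to]
        )
```

As in the transcript, the message would be delivered to the envelope recipient even though its To header names a different address.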

7.4 SMTP MUA Header

Microsoft Mail Internet Headers Version 2.0
Received: from wi-vm700.wi1.uni-muenster.de ([128.176.158.92]) by wi-vmail2005.wi1.uni-muenster.de with Microsoft SMTPSVC(6.0.3790.3959); Tue, 27 Oct 2009 11:22:35 +0100
Received: from mouse.nix (128.176.159.107) by wi-vm700.wi1.uni-muenster.de (128.176.159.139) with Microsoft SMTP Server id 8.1.375.2; Tue, 27 Oct 2009 11:22:28 +0100
Received: from mx1.disney.com ([192.195.66.20]) by smtp.mouse.nix Super Duper SMTP Server; Tue, 27 Oct 2009 11:19:17 +0100
To: 42@universe.com
From: <micky@mouse.nuix>
Subject: Don't panic
MIME-Version: 1.0
Content-Type: text/plain
Message-ID: <b13a2a36-f56b-43ec-ad81-41ec44190e6a@wi-vm700.wi1.uni-muenster.de>
Return-Path: micky@mouse.nix
Date: Tue, 27 Oct 2009 11:22:28 +0100
X-OriginalArrivalTime: 27 Oct 2009 10:22:35.0473 (UTC) FILETIME=[66C35410:01CA56EF]
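
To trace the path of a message, the Received lines can be extracted with Python's standard `email` package. A sketch with a hypothetical helper `received_hops`; the sample headers are shortened versions of the ones above.

```python
from email import message_from_string


def received_hops(raw_headers: str) -> list:
    """Return the Received header values in chronological order."""
    msg = message_from_string(raw_headers)
    # Each MTA prepends its own Received line, so the oldest one is last.
    return list(reversed(msg.get_all("Received", [])))


SAMPLE = (
    "Received: from wi-vm700.wi1.uni-muenster.de by wi-vmail2005.wi1.uni-muenster.de\n"
    "Received: from mouse.nix by wi-vm700.wi1.uni-muenster.de\n"
    "Received: from mx1.disney.com by smtp.mouse.nix Super Duper SMTP Server\n"
    "To: 42@universe.com\n"
    "Subject: Don't panic\n"
    "\n"
)
hops = received_hops(SAMPLE)
```

Note that the chronologically first entry here is the line typed manually in the SMTP session of section 7.3, so parsing a Received line says nothing about whether it is genuine.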

7.5 Review Questions

  • Who found the previous e-mail in his or her inbox?
  • What parts of header data are trustworthy (to what degree)?

7.6 Concluding Questions

  • What did you find difficult or confusing about the contents of the presentation? Please be as specific as possible. For example, you could describe your current understanding (which might allow us to identify misunderstandings), ask questions in a Learnweb forum that allow us to help you, or suggest improvements (maybe on GitLab). Most questions turn out to be of general interest; please do not hesitate to ask and answer in the forum. If you created additional original content that might help others (e.g., a new exercise, an experiment, explanations concerning relationships with different courses, …), please share.

8 Conclusions

8.1 Summary

  • Web browsers and servers talk HTTP
    • Simple message format
    • Stateless request/response protocol
      • State via cookies
    • Different connection types
    • Caching for performance
  • E-Mail transferred via SMTP

8.2 Outlook

  • HTTP used for various applications
    • Web services
      • SOAP messages
    • Ad-hoc request/reply protocols
  • REST
    • Representational State Transfer
    • Software architecture for distributed hypermedia systems
      • Generalization of Web
      • Defining constraints
        • Client/Server
        • Stateless
        • Cacheable
        • Uniform interface, may use: URIs, MIME types, HTTP methods
        • Layered System
        • (Code on demand)

License Information

This document is used to teach basics of distributed systems. Source code and source files are available on GitLab under free licenses.

Except where otherwise noted, the work “Web and E-Mail”, © 2018-2020 Jens Lechtenbörger, is published under the Creative Commons license CC BY-SA 4.0.

No warranties are given. The license may not give you all of the permissions necessary for your intended use.

In particular, trademark rights are not licensed under this license. Thus, rights concerning third party logos (e.g., on the title slide) and other (trade-) marks (e.g., “Creative Commons” itself) remain with their respective holders.