Python program for analysis of email messages
Cyber Forensics
Email Forensics
Dr. Rami 1
Agenda
• Why Email?
• Email
– Content, Technology, Addresses, Protocols
• Email – Clients & Servers
• Header and MIME header
• Tracing Email
• Search
• Advanced Search
Why Email?
• Email is Often the Best Evidence, Why?
– Contents can demonstrate intent
– Header data can demonstrate the source/trace
– Timestamps can show intent to mislead
– It can be used as evidence in a lot of cases
Why Email?
• Understanding the Email technology can help in:
– Locating e-mail messages thought to be destroyed and
– Proving the source of a message.
Email Content
• An email can:
– Be plain text (old days)
• No support for graphics
– Be HTML structured (currently)
• Supports graphics and embedded content/formatting
– Have attachments
• Each carried as a separate file
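The three content types above can be illustrated with Python's standard `email` package. This is a minimal sketch, not a forensic tool: the addresses, subject, and attachment bytes are made-up placeholders, and the helper names `build_sample_message` and `describe_parts` are hypothetical.

```python
from email.message import EmailMessage


def build_sample_message():
    """Build one message with a plain-text body, an HTML body, and an attachment."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"   # placeholder address
    msg["To"] = "bob@example.com"       # placeholder address
    msg["Subject"] = "Quarterly report"
    # Plain-text version of the body
    msg.set_content("Plain-text version of the message.")
    # HTML version of the same body (turns the message into multipart/alternative)
    msg.add_alternative("<html><body><p>HTML <b>version</b></p></body></html>",
                        subtype="html")
    # Attachment carried as a separate part (turns the message into multipart/mixed)
    msg.add_attachment(b"\x00\x01\x02", maintype="application",
                       subtype="octet-stream", filename="report.bin")
    return msg


def describe_parts(msg):
    """List (content type, filename) for every non-container part."""
    return [(part.get_content_type(), part.get_filename())
            for part in msg.walk() if not part.is_multipart()]


if __name__ == "__main__":
    for content_type, filename in describe_parts(build_sample_message()):
        print(content_type, filename)
```

Walking the parts this way is also how an examiner can check whether a message that looks like plain text actually carries an HTML alternative or a hidden attachment.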
Email Technology
• Email technology has 2 main parts:
– Client side (Outlook Express, Outlook, Thunderbird, ...)
• Mail User Agent (MUA)
– Software interface that represents the end user
– The application that provides end-user support
– Server side (Microsoft Exchange Server, the Linux-based Sendmail, etc.)
• Mail Transfer Agent (MTA)
– Moves messages from point A to point B
• Mail Delivery Agent (MDA)
– Sorts received emails
– Gets each message to the correct recipient
Email Trip
1. Run/access the client app
1. Local application on the user's computer (Outlook, Thunderbird, ...), or
2. Web application (Gmail, Hotmail, Yahoo, ...)
2. Compose the message
3. Click the Send button
4. The message hops across several e-mail servers
5. The message arrives at the recipient's server
6. The recipient's client asks the server to download all new messages
7. The server sends all messages that have accumulated
1. The server can keep copies, or
2. The server deletes them after download (depending on the configuration and protocols used)
Email Addresses
• User ID
– Must be unique within a particular domain
– The same user ID on a different domain may or may not represent the same user
– You can choose your user ID
• Issue?
– User IDs can be spoofed with the right software
– Spoofing is altering/modifying information so that a message appears to have been sent from somewhere else
Email Addresses
• Address structure: User ID @ Domain
• Domain:
– Is the domain name that hosts the user account
• yahoo.com, gmail.com, saumag.edu, ...
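Splitting an address into its user ID and domain is a common first step when checking for spoofing. The sketch below uses `email.utils.parseaddr` from the standard library; the helper names and the example addresses are hypothetical. Comparing the From domain with the Return-Path domain is only a hint, since legitimate mailing services often use different domains for the two.

```python
from email.utils import parseaddr


def split_address(header_value):
    """Return (user_id, domain) from a value such as 'Alice <alice@example.com>'."""
    _display_name, address = parseaddr(header_value)
    user_id, _, domain = address.rpartition("@")
    return user_id, domain.lower()


def domains_differ(from_header, return_path_header):
    """True when the From domain and the Return-Path domain do not match."""
    return split_address(from_header)[1] != split_address(return_path_header)[1]


if __name__ == "__main__":
    print(split_address("Alice <alice@example.com>"))
    print(domains_differ("Alice <alice@example.com>", "<bounce@mailer.example.net>"))
```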
Email Protocols
• Two types of Protocols: (for Sending & Receiving)
– Transport protocols (Sending)
• Simple Mail Transfer Protocol (SMTP)
– Mailbox protocols (Receiving)
• Post Office Protocol, ver. 3 (POP3)
• Internet Message Access Protocol (IMAP)
Email Protocols
• SMTP
– Client
• Connects to the server application over TCP/IP port 25 or 587
• Sends a simple handshaking HELO packet to identify itself
– Then tells the server that the sender's address wants to send a message to the recipient's address
– Server
• Examines both addresses
• Tries to send the message
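The client side of the exchange above can be sketched as the ordered list of SMTP commands for one message. The function name, host name, and addresses are hypothetical placeholders; the sketch only builds the command strings and does not open a network connection.

```python
SMTP_PORTS = (25, 587)  # ports named on the slide


def smtp_client_commands(client_host, sender, recipient):
    """Return the commands a client sends, in order, to submit one message."""
    return [
        f"HELO {client_host}",      # handshake: the client identifies itself
        f"MAIL FROM:<{sender}>",    # envelope sender address
        f"RCPT TO:<{recipient}>",   # envelope recipient address
        "DATA",                     # the header and body follow, ended by a line with a single "."
        "QUIT",                     # close the session
    ]


if __name__ == "__main__":
    for line in smtp_client_commands("client.example.org",
                                     "alice@example.org", "bob@example.com"):
        print(line)
```

The envelope addresses in MAIL FROM and RCPT TO are separate from the From: and To: header fields inside the message, which is one reason header fields alone are easy to spoof.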
Email Servers
• SMTP Servers:
– handles all outgoing messages
• An SMTP server address, sometimes, looks like: smtp.xyz.com
– verifies the sender and target addresses
– If the domain is the same for sender and receiver:
• A function of the SMTP server called the delivery agent hands the message to the POP3/IMAP server in the same domain
Email Servers
• SMTP Servers cont.:
– If the domain is different:
• The SMTP server sends a request to a DNS server to resolve the domain name to the IP address of the target SMTP server
• When the IP is known, the message is sent to that address
• Send success:
– A positive acknowledgment (ACK) is sent back (in SMTP, a 2xx reply code such as 250)
• Send failed:
– A negative acknowledgment (NACK) is returned (in SMTP, a 4xx or 5xx reply code)
– A failure message is passed from the SMTP server to the POP3/IMAP server and on to the sender's client
Email Servers
• SMTP Servers cont.:
– Each intermediate server adds a Received: line to the top of the message header, so the most recent hop is listed first.
Email Protocols
• POP3:
– Client:
• Connects to the server over TCP/IP port 110
– Server:
• Downloads messages to the client
– Allows for:
• Standard text messages, attachments, and HTML-encoded messages
– Messages can be configured to:
• Remain on the server, or
• Be deleted
Email Protocols
• IMAP (similar to POP3, but with two main differences):
– Leaves all messages on the server after downloading (good)
– Can be configured so that multiple users can administer the same mailbox (not good)
– Client:
• Connects to the server over TCP/IP port 143
– Server:
• Downloads messages to the client
Email Servers
• POP and IMAP servers:
– The post office for the network
– Store incoming messages
– Wait for users to access and download
– SMTP server:
• Receives the message traveling over the Internet
• Puts it in the message queue
• Notifies the Delivery Agent (DA) of the SMTP server
• The DA transfers the message to the mail storage folder of POP3/IMAP
– The IMAP folder is on the server, not on the client
Email Servers
• Important for Forensics
• POP3:
– Default config.:
• Delete messages from server after download
• IMAP:
– Default config.:
• Do not delete messages from the server after download
Email Clients
• Usually pre-installed with every OS
• Perform some basic functions
– Send messages
– Receive messages
– Manage content (including attachments)
– Display list of messages in inbox by header
– Open a message
• And associated attachments
– Add attachments to outgoing messages
• And receive attachments with incoming messages
Note: the provider can apply some restrictions
Email Clients
• Email Clients:
– Are Operating System specific
– Determine how information is archived on the system
– May be a local client or web-based
• How to archive emails on Outlook 2013 and 2016?
– https://www.youtube.com/watch?v=CFmegBWHmXQ
Information Stores
• Act as a cabinet for the information stored by the client
– Sent/Received messages
– Address books
– Calendars
• Each client has a specific format for storing data
• For Microsoft Outlook, for example, you can check this link:
– https://support.office.com/en-us/article/Locating-the-Outlook-data-files-0996ece3-57c6-49bc-977b-0d1892e2aacc
Email Servers
• Carrying messages among clients, there could be:
– At least one e-mail server (example?)
– In most cases, two or more
• The servers act as relay agents for moving messages across the Internet
• SMTP servers
– Handle all outgoing messages
– Through the Internet
• IMAP/POP3 servers
– Handle all incoming messages
– From the Internet to the client
• Some server applications combine SMTP with POP/IMAP
– Such as Microsoft Exchange
Standard Header Information
• The structure of an e-mail follows the Internet Message Format, extended by a standard called the Multipurpose Internet Mail Extensions (MIME)
• A message has:
– Header:
• Contains control information used by servers to identify and direct the message
– Body:
• Content created by the author of the message
Standard Header Information
• Header:
– Metadata fields contained in every message
– Fields mainly used by email clients are:
• TO:
– Contains the names of the addressees, separated by , or ;
– Issues?
» A message can reach several people even though only a few are targeted
• FROM:
– Sender of the message
– Issues?
» Email spoofing is more likely to happen
» Some viruses can send messages in the owner's name
Standard Header Information
• Header cont.:
• SUBJECT:
– It is optional
– Can start with Re: or Fw:
• DATE:
– When the message was sent
– Generated by the e-mail client
– Depends on the time and date of the client machine
– Issues?
» Can be modified
» How to reveal the truth?
• Compare it with the time/date stamps found in the header fields generated by intermediate transport servers
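One way to check a suspicious Date field is to compare it with the earliest server-generated timestamp in the Received lines. The sketch below is a minimal illustration with Python's standard `email` package. The sample message, host names, and helper names are made up, and it assumes each Received value ends with `; <timestamp>`, which is the usual layout but not guaranteed for every server.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical message: the client claims it was sent a day before any server saw it.
RAW_MESSAGE = (
    "Received: from mail.example.org (mail.example.org [203.0.113.7])\n"
    "    by mx.example.com with ESMTP; Tue, 02 Jan 2024 10:15:30 +0000\n"
    "Received: from client.example.org (client.example.org [198.51.100.23])\n"
    "    by mail.example.org with ESMTP; Tue, 02 Jan 2024 10:15:05 +0000\n"
    "Date: Mon, 01 Jan 2024 09:00:00 +0000\n"
    "From: alice@example.org\n"
    "To: bob@example.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def received_times(msg):
    """Timestamps written by the relay servers (text after the last ';')."""
    return [parsedate_to_datetime(value.rsplit(";", 1)[1].strip())
            for value in msg.get_all("Received", [])]


def date_gap_seconds(msg):
    """Seconds between the client's Date field and the first server timestamp."""
    claimed = parsedate_to_datetime(msg["Date"])
    first_hop = min(received_times(msg))
    return (first_hop - claimed).total_seconds()


if __name__ == "__main__":
    gap = date_gap_seconds(message_from_string(RAW_MESSAGE))
    print(f"Date field is {gap:.0f} seconds earlier than the first server stamp")
```

A gap of a few seconds is normal; a gap of hours or days suggests the client clock was wrong or the Date field was altered.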
Standard Header Information
• There are other metadata fields populated by e-mail clients as well as servers along the path of the message.
– This header information can be extracted easily from the e-mail client
MIME Header Information
• Information stored in the header that includes:
– Time/Date stamps for various actions along the way
– Server information
• for relay servers along the way
– A message ID
• It is unique to this message across the Internet
– Versions of software used along the way
– IDs of recipients
– A Return path
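The fields listed above can be pulled out of a raw message with Python's standard `email` package. This is a minimal sketch: the `FIELDS` selection, the function name `header_summary`, and the sample message are illustrative assumptions, and a field that is absent simply comes back as `None`.

```python
from email import message_from_string

# Header fields of interest; adjust the selection to the case at hand.
FIELDS = ("Return-Path", "Message-ID", "MIME-Version", "From", "To", "Subject", "Date")

# Hypothetical raw message used only for demonstration.
SAMPLE = (
    "Return-Path: <alice@example.org>\n"
    "Message-ID: <1234@example.org>\n"
    "MIME-Version: 1.0\n"
    "From: Alice <alice@example.org>\n"
    "To: bob@example.com\n"
    "Date: Tue, 02 Jan 2024 10:15:00 +0000\n"
    "\n"
    "body\n"
)


def header_summary(raw):
    """Map each field name in FIELDS to its value, or None if it is missing."""
    msg = message_from_string(raw)
    return {name: msg.get(name) for name in FIELDS}


if __name__ == "__main__":
    for name, value in header_summary(SAMPLE).items():
        print(f"{name}: {value}")
```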
[Figure: SAU header example in Outlook, marking the start servers, intermediate servers, and end servers]
MIME Header Information
• Gmail:
[Figure: Gmail header example]
Tracing the Origin of a Message
• Each server that relays the message adds its IP address
• Each relay server maintains logs for a certain period of time that indicate the IP address of the sender as well as the intended recipient
• While the time stamp can be manipulated at the origin, the ones added along the way are likely real
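The trace can be rebuilt by reading the Received lines from the bottom up, since each server adds its line above the earlier ones. The sketch below extracts the bracketed IPv4 address and the timestamp from each hop. The sample header, host names, and function name are hypothetical, and the regular expression assumes the common `[a.b.c.d]` notation, which not every server uses.

```python
import re
from email import message_from_string
from email.utils import parsedate_to_datetime

IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

# Hypothetical header with two relay hops (documentation-range IP addresses).
SAMPLE = (
    "Received: from mail.example.org (mail.example.org [203.0.113.7])\n"
    "    by mx.example.com with ESMTP; Tue, 02 Jan 2024 10:15:30 +0000\n"
    "Received: from client.example.org (client.example.org [198.51.100.23])\n"
    "    by mail.example.org with ESMTP; Tue, 02 Jan 2024 10:15:05 +0000\n"
    "From: alice@example.org\n"
    "To: bob@example.com\n"
    "\n"
    "body\n"
)


def trace_hops(raw):
    """Return (ip, timestamp) per hop, ordered from origin to destination."""
    msg = message_from_string(raw)
    hops = []
    # The topmost Received line is the last hop, so reverse to get travel order.
    for value in reversed(msg.get_all("Received", [])):
        match = IPV4_IN_BRACKETS.search(value)
        stamp = value.rsplit(";", 1)[-1].strip()  # assumes "...; <timestamp>"
        hops.append((match.group(1) if match else None,
                     parsedate_to_datetime(stamp)))
    return hops


if __name__ == "__main__":
    for ip, when in trace_hops(SAMPLE):
        print(ip, when.isoformat())
```

Timestamps that do not increase from one hop to the next are worth a closer look, because a forged Received line is usually inserted at the bottom of the list.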
Tracing the Origin of a Message
• Online tracking example (ip2location.com)
– Copy the header and go to the website
– Paste it in the box
Track an email account with security concerns
• On Parrot Linux
Tracing the Origin of a Message
• Track an IP address (Linux command line):
– Install curl (if not installed already)
• sudo apt install curl
– Run the command:
• First: sign up and get an access key from ipstack.com
– The command looks like (quote the URL so the shell does not interpret the ? character):
• curl "http://api.ipstack.com/175.216.169.45?access_key=YOUR_ACCESS_KEY"
Tracing the Origin of a Message
• The data is sent in JSON format
Tracing the Origin of a Message
• Try the location information: (Google Maps)
– "latitude": 37.5112
– "longitude": 126.9741
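The same lookup can be scripted. The sketch below builds the request URL shown in the curl command and turns the latitude and longitude from the JSON reply into a Google Maps search link. It makes no network call. `SAMPLE_RESPONSE` is a trimmed, hypothetical reply containing only the two fields shown above, and the function names are illustrative.

```python
import json
from urllib.parse import urlencode

# Trimmed, hypothetical reply with only the two fields used here.
SAMPLE_RESPONSE = '{"latitude": 37.5112, "longitude": 126.9741}'


def ipstack_url(ip, access_key):
    """Build the lookup URL used in the curl command."""
    return f"http://api.ipstack.com/{ip}?{urlencode({'access_key': access_key})}"


def maps_link(response_text):
    """Turn the latitude/longitude of a JSON reply into a Google Maps search URL."""
    data = json.loads(response_text)
    return ("https://www.google.com/maps/search/?api=1&query="
            f"{data['latitude']},{data['longitude']}")


if __name__ == "__main__":
    print(ipstack_url("175.216.169.45", "YOUR_ACCESS_KEY"))
    print(maps_link(SAMPLE_RESPONSE))
```

Keep in mind that IP geolocation points to the network provider's registered location, which can be far from the actual sender.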
Some Email Search Tools
• Commercial:
– Clearwell
– Paraben
• Free:
– GREP:
• Famous command in Linux
• Download and extract the Enron1 dataset
• Change into the enron1 folder
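A grep-style search over the extracted folder can also be written in Python, which helps when the hits need further processing. The sketch below behaves roughly like `grep -rin TERM .` restricted to `.txt` files. It assumes the extracted messages are stored as `.txt` files under the folder; the function name `grep_emails` is hypothetical.

```python
from pathlib import Path


def grep_emails(root, term):
    """Case-insensitive search; returns (file name, line number, line) for each hit."""
    needle = term.lower()
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        # latin-1 never fails to decode, which suits old mail archives
        text = path.read_text(encoding="latin-1")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path.name, number, line.strip()))
    return hits


if __name__ == "__main__":
    # Run from inside the enron1 folder
    for name, number, line in grep_emails(".", "meeting"):
        print(f"{name}:{number}: {line}")
```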
More on grep
Some Email Search Tools
• Terms are usually connected with operators to improve/narrow results:
– AND:
• The search must include both words (or both phrases enclosed in quotes).
– Honda AND ford
– OR:
• The search must include either of the words or quoted phrases, but not necessarily both.
– Honda OR ford
– + or “”:
• Search for the phrase exactly as typed (do not put a space between + and the first term of the search string).
– Honda+ford
– - or NOT:
• Do not include any item that contains the following string, even if it matches the rest of the search
– Honda -Ford
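The operators above can be sketched as a small filter function. This is an illustration of the logic only, not the syntax of any particular search tool: the function name and parameter names are hypothetical, and matching is a case-insensitive substring test rather than whole-word matching.

```python
def matches(text, all_of=(), any_of=(), none_of=(), phrase=None):
    """Return True when the text satisfies every condition that was supplied."""
    lowered = text.lower()
    # AND: every term must be present
    if any(term.lower() not in lowered for term in all_of):
        return False
    # OR: at least one term must be present
    if any_of and not any(term.lower() in lowered for term in any_of):
        return False
    # NOT: no excluded term may be present
    if any(term.lower() in lowered for term in none_of):
        return False
    # Exact phrase: must appear as typed (ignoring case)
    if phrase is not None and phrase.lower() not in lowered:
        return False
    return True


if __name__ == "__main__":
    email_body = "I test drove a Honda, then looked at a Ford."
    print(matches(email_body, all_of=["honda", "ford"]))           # Honda AND ford
    print(matches(email_body, any_of=["honda", "toyota"]))         # Honda OR toyota
    print(matches(email_body, all_of=["honda"], none_of=["ford"])) # Honda -Ford
```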
Search Results
• Searching emails can result in:
– False positive [it is a hit, but a wrong hit]
• Retrieved but not relevant
– False negative [it is a miss, but a wrong miss]
• Not retrieved but relevant
– True positive [it is a hit, and a right hit]
• Retrieved and relevant
– True negative [it is a miss, and a right miss]
• Not retrieved and not relevant
Search Results
• Precision:
– The fraction of retrieved emails that are relevant to the search query: tp / (tp + fp)
• Example:
– Entire data: 1000 emails
– Search hit: 100 emails
• 90 are relevant (true positive, tp)
• 10 are irrelevant (false positive, fp)
• Precision: 90 / (90 + 10) = 0.9
Search Results
• Recall:
– The fraction of emails relevant to the query that are successfully retrieved: tp / (tp + fn)
• Example:
– Entire data: 1000 emails
– Search hit: 100 emails
• 90 are relevant (true positive, tp)
• 10 are irrelevant (false positive, fp)
• But the dataset has 500 relevant emails that were not retrieved (false negative, fn)
• Recall: 90 / (90 + 500) ≈ 0.15
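The two worked examples can be reproduced with a few lines of Python. The function names are illustrative; the counts are the ones from the slides (tp = 90, fp = 10, fn = 500).

```python
def precision(tp, fp):
    """Fraction of retrieved emails that are relevant."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of relevant emails that were retrieved."""
    return tp / (tp + fn)


if __name__ == "__main__":
    tp, fp, fn = 90, 10, 500
    print(f"precision = {precision(tp, fp):.2f}")
    print(f"recall    = {recall(tp, fn):.2f}")
```

High precision with low recall, as in this example, means the hits are mostly good but most of the relevant emails were missed, so the search terms are probably too narrow.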
Search Results
• Search Accuracy:
Advanced Search Methods
• Advanced Analysis of User Email Behavior:
– Stationary User Profiles
• A method of determining if a user makes use of multiple accounts
– Similar Users
• A way of determining if what appears to be a single user is actually multiple users
– Attachment Statistics
• A user's typical behavior regarding attachments is analyzed
– Recipient Frequency
• What types of messages a specific user usually receives
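As one small illustration of this kind of behavioral analysis, attachment statistics can be approximated by the share of each sender's messages that carry an attachment. This is a simplified sketch under that assumption, not the method of any particular forensic tool; the helper names and sample addresses are hypothetical.

```python
from collections import defaultdict
from email.message import EmailMessage


def make_message(sender, attachment=None):
    """Build a small test message, optionally with one binary attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "bob@example.com"
    msg.set_content("hello")
    if attachment is not None:
        msg.add_attachment(attachment, maintype="application",
                           subtype="octet-stream", filename="file.bin")
    return msg


def attachment_rate(messages):
    """Per sender: fraction of messages that contain at least one attachment."""
    totals = defaultdict(int)
    with_files = defaultdict(int)
    for msg in messages:
        sender = str(msg["From"])
        totals[sender] += 1
        if any(True for _ in msg.iter_attachments()):
            with_files[sender] += 1
    return {sender: with_files[sender] / totals[sender] for sender in totals}


if __name__ == "__main__":
    mailbox = [
        make_message("alice@example.com", b"data"),
        make_message("alice@example.com"),
        make_message("carol@example.org"),
    ]
    print(attachment_rate(mailbox))
```

A sender whose rate changes sharply over time, for example from almost never to almost always attaching files, is a candidate for closer examination.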
• Email Tracker Pro:
Smart-ip.net