[Remops] Description of an 'Anonymous Internet Message Format'

Mon Jul 3 11:10:53 BST 2006

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello list members,

prompted by my findings on specific header patterns, or
fingerprints, exposed by message envelopes in general, and by
the at times very emotional debate on that topic at a.p.a-s., I
have written a paper with recommendations on how to make anon
messages more uniform. It was supported by Zax and Thomas J.
Boschloo, and Richard Christman, the author of Quicksilver,
reassured me about the validity of that strategy. I'm truly
pleased by their support and thank them a lot for their efforts.

I would now like you, the developers of anon software, dedicated
remops and experienced users of such services, to join the
discussion on the specifications expressed in that document, in
order to make anon message transfers more secure. A submission
to the IETF to achieve the status of an Internet-Draft would be
the next step.

Many thanks in advance for your questions, comments, and/or
suggestions.

With kind regards

Christian Danner

OmniMix .. protect your privacy
http://www.danner-net.de/om.htm

- ------------------------------------------------------------
Draft of an Anonymous Internet Message Format

Version: 2006/07/02

Author: Christian Danner <christian at danner-net.de>

Abstract

The current standards leave a lot of room for composing Internet
messages in different ways. Such a variety is a potential threat
in the course of anonymous message transmissions. Based on
established Internet standards, this document describes more
restrictive rules to avoid individual 'fingerprints' that
adversaries can use to build profiles of the sender and the
sender's equipment.

1. Introduction

Anonymizing services exist to protect the sender of a message
from being identified.

With anonymous remailer services, for example, a message passes
several onion routers in multiple encrypted envelopes, so that
the address of the original sender cannot be found out. At every
station of the remailer chain the router removes one encryption
layer ('secure cover') using its private key, and sends the
resulting message to the next router, the address of which is
set as the destination address on the now recovered envelope.
The message to be anonymized stays hidden within the innermost
encrypted envelope until it gets to the exit remailer. From
there, after the removal of the last envelope, the message has
to be delivered to the final addressee. According to current
knowledge, this sort of routing has to be assumed secure as long
as at least one of the routers in the chain isn't compromised.

At the exit side of the anonymization process (besides the
irrelevant address of the sender, which here is the previous
remailer) only the message itself is left for an adversary to
retrieve information that may point to the originator. That's
why it is important that it contains the least possible amount
of hidden individual information.

But of what parts does such a message consist, and where can
individual information be hidden? According to RFC 2822, which
defines the 'Internet Message Format', each message consists of
two parts, the header section and the optional body section.

The body of a message represents the information to be
transferred and simply is lines of US-ASCII characters. Within
that the originator has nearly full control of the data he
sends, thus is responsible himself for the information he
reveals.

That's totally different with the header section. There, where
the parameters needed to accomplish a reliable delivery are
stored, the originator has only limited influence insofar as
mail or news client applications are used, which do most of the
value assignments automatically by themselves. The relevant RFCs
leave much room for interpretation, and even if they are
followed, which mostly isn't the case, as shown in up-to-date
investigations, there's a great variety of ways in which valid
message 'envelopes' can be composed.

But characteristics, that reflect the equipment used to create a
message, being constant or variable within specific limits, are
susceptible to attacks.

That's why this paper was developed. Its aim is to provide
guidelines for the composition of those message parts that
don't reflect information the user intends to transfer, thus
everything that can be called the 'message envelope'. A
widespread standardization of that envelope prevents users from
being told apart on the basis of the tools they use to create
and distribute their messages, so that only the risk coming
directly from the text written by the author her/himself
remains.

2. RFC compliance

This document is compliant with existing RFCs. If, for technical
reasons, updates of the relevant RFCs are developed that
contradict the definitions in this document, the affected parts
of this paper become invalid.

3. Message characteristics

3.1. Body section

3.1.1. User provided information

This document neither aims nor is able to be a guide for users
of anonymization services on how to reveal the least possible
amount of specifics about themselves. Besides the information
itself a lot of semantic and syntactic patterns, from preferred
words, the complexity of your sentences and recurrent spelling
mistakes, up to the placement of spaces and punctuation marks,
can give relevant data to an adversary. So to get some sort of
standardization here, at least the usage of a spelling checker
is recommended.

3.1.2. Trailing empty lines

Varying the number of empty lines at the end of a message body
isn't a reliable method to transfer hidden information from the
originator to the addressee of a message, as MTAs usually don't
care about them a lot. But such lines are at least capable of
delivering equipment specific information to adversaries, and
thus have to be removed wherever they are detected, so that
anonymous messages are always terminated by a CRLF-DOT-CRLF
sequence directly following the last US-ASCII character. Lines
containing only non-visible characters like SPACEs and TABs have
to be removed from the end of the message body as well.
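
A minimal sketch of this rule in Python might look as follows.
The function name is hypothetical, and the sketch only handles
the body text itself; it assumes the CRLF-DOT-CRLF terminator is
added later by the transport.

```python
def strip_trailing_blank_lines(body: str) -> str:
    """Drop trailing lines that are empty or hold only SPACEs/TABs."""
    # Work on LF internally, emit CRLF line endings again at the end.
    lines = body.replace("\r\n", "\n").split("\n")
    while lines and lines[-1].strip(" \t") == "":
        lines.pop()
    return "\r\n".join(lines)
```

Empty lines inside the body are left alone; only the tail of the
message is trimmed.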

3.2. Header section

3.2.1. Common recommendations

TAB characters have to be converted to SPACEs. Leading and
trailing SPACEs have to be removed.

Header folding has to be avoided, as long as the 998 character
per line limit [RFC 2822 section 2.1.1] isn't reached. If
necessary, the folding location has to respect the integrity of
the single subitems (e.g. addresses). The folded line has to be
indented by one single SPACE character.
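
As a rough illustration, the whitespace rules could be applied
to a single raw header field like this. The function name is an
assumption of this sketch, and re-folding of lines that exceed
the 998 character limit is deliberately left out.

```python
import re

def normalize_header_line(raw: str) -> str:
    """Unfold one header field and normalize its whitespace."""
    # A line break followed by whitespace marks a folded continuation;
    # this sketch joins it with one single SPACE.
    unfolded = re.sub(r"\r?\n[ \t]+", " ", raw)
    name, _, value = unfolded.partition(":")
    # TABs become SPACEs, leading and trailing SPACEs are removed.
    value = value.replace("\t", " ").strip(" ")
    return f"{name.strip()}: {value}"
```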

3.2.2. Header field ordering

As stated in RFC 2822, there's no restriction on the ordering of
message header items, as long as the message is not a reply to
another message, in which case the ordering of that message has
to be followed. So each user or client application is allowed
to specify an individual header sequence. Moreover, most client
applications disregard the RFC and stick to their own ordering
even with follow-up messages. This circumstance provides a
unique fingerprint if the sequence stays unchanged up to the
delivery.

Two conceivable methods to avoid that kind of information
leakage would be randomization and standardization of the header
field ordering of every message.

As randomization has several disadvantages (e.g. it's difficult
to survey such headers and to define applicable random
parameters), this document favours a standardized header
ordering. RFC 2822 explicitly allows such a step even for reply
messages, as long as there are reasons for doing so, and
removing user specific data in an anonymization process
undoubtedly is a sound reason.

RFC 2822 itself provides a header item list in section 3.6,
which isn't necessarily intended to be used as an ordering
model, but its sequence reflects the importance of the single
header items, and is more or less followed by the client
applications currently in use:

  Date
  From
  Sender
  Reply-To
  To
  Cc
  Bcc
  Message-ID
  In-Reply-To
  References
  Subject
  Comments
  Keywords

To that list the following MIME header fields were appended:

  MIME-Version
  Content-Type
  Content-Transfer-Encoding

We recommend ordering the header items of anonymous messages in
exactly that sequence. Headers not present in the list above have
to be appended, namely first normal (not starting with 'X-'),
then extended ('X-') optional fields, each in alphabetic order.
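
The ordering described above can be sketched in Python as a sort
key. The names CANONICAL_ORDER, sort_key and order_headers are
hypothetical, headers are assumed to be (name, value) pairs, and
the alphabetic comparison is a plain case-insensitive string
comparison.

```python
CANONICAL_ORDER = [
    "Date", "From", "Sender", "Reply-To", "To", "Cc", "Bcc",
    "Message-ID", "In-Reply-To", "References", "Subject",
    "Comments", "Keywords",
    "MIME-Version", "Content-Type", "Content-Transfer-Encoding",
]
RANK = {name.lower(): i for i, name in enumerate(CANONICAL_ORDER)}

def sort_key(field):
    name = field[0].lower()
    if name in RANK:
        return (0, RANK[name], "")   # listed fields, in list order
    if name.startswith("x-"):
        return (2, 0, name)          # extended fields come last
    return (1, 0, name)              # other fields, alphabetically

def order_headers(fields):
    """Return the (name, value) pairs in the recommended sequence."""
    # sorted() is stable, so repeated fields keep their relative order.
    return sorted(fields, key=sort_key)
```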

It has to be emphasized that not all headers listed are
relevant to current implementations of anon remailing.
Furthermore, not all of them have to transmit their original
value as intended by the RFCs (e.g. the 'Date' header may at
some point, under certain circumstances, become a way to define
when to deliver the message to the destination address).

3.2.3. Header field names

The names of header fields included in the list in paragraph
3.2.2 have to be written exactly as shown. For header fields
that are not included in the list, the first letter of the
name, as well as the first letter directly following any
non-letter character, has to be capitalized. All other letters
have to be written in lower case.

Examples:

  'Subject', 'Message-ID', 'X-No-Archive',
  'X-Mail2News-Contact'.
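
A small sketch of the naming rule, assuming the list of known
field names from the ordering section; the function name is
hypothetical.

```python
KNOWN_NAMES = [
    "Date", "From", "Sender", "Reply-To", "To", "Cc", "Bcc",
    "Message-ID", "In-Reply-To", "References", "Subject",
    "Comments", "Keywords",
    "MIME-Version", "Content-Type", "Content-Transfer-Encoding",
]
KNOWN = {name.lower(): name for name in KNOWN_NAMES}

def canonical_field_name(name: str) -> str:
    """Write a header field name in the recommended style."""
    known = KNOWN.get(name.lower())
    if known:
        return known                 # listed names are used as shown
    out = []
    upper_next = True                # the first letter is capitalized
    for ch in name:
        if ch.isalpha():
            out.append(ch.upper() if upper_next else ch.lower())
            upper_next = False
        else:
            out.append(ch)
            upper_next = True        # letter after a non-letter
    return "".join(out)
```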

3.2.4. Header field arguments

3.2.4.1 Common recommendations

All tokens defined by an RFC, like the established MIME specific
values, have to be written in lower case:

  Content-Transfer-Encoding: quoted-printable

The first character (of the first word) of any free text
argument SHOULD be capitalized. Beyond this there is no
restriction on cases, except the following words being the
complete argument, which have to be written as presented:

  'Yes', 'No', 'True', 'False', 'On', 'Off'.
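
For free text arguments this could be sketched as below. The
function name is hypothetical, and the sketch treats the whole
argument as free text, so it must not be applied to addresses or
RFC defined tokens.

```python
FIXED_WORDS = {w.lower(): w for w in ("Yes", "No", "True", "False", "On", "Off")}

def normalize_free_text(value: str) -> str:
    """Capitalize the first character; fix the case of flag words."""
    text = value.strip(" ")
    fixed = FIXED_WORDS.get(text.lower())
    if fixed:
        return fixed                 # the word is the complete argument
    return text[:1].upper() + text[1:]
```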

Date and time parameters have to be converted to their UTC
representation, like

  Mon, 28 Jun 2004 09:39:02 +0000 (UTC)
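
A conversion along these lines could use the Python standard
library. The function name and the two-digit day of month are
assumptions of this sketch; day and month names are taken from
fixed tables so that the result doesn't depend on the locale.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def date_to_utc(value: str) -> str:
    """Rewrite an RFC 2822 date in its UTC representation."""
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:
        # A '-0000' zone yields a naive datetime; treat it as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    return (f"{DAYS[dt.weekday()]}, {dt.day:02d} {MONTHS[dt.month - 1]} "
            f"{dt.year} {dt:%H:%M:%S} +0000 (UTC)")
```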

3.2.4.2 Header field parameters

Concerning header fields with one or several parameters, those
and their values have to consist of only lower case characters,
as long as the case is not of importance to process the message
correctly (which e.g. applies to MIME boundary delimiters).
Values always have to be surrounded by double quotes. SPACEs
between the parameter name, the delimiter ('=') and the value
aren't allowed. Separating those entities, one single SPACE has
to follow the semicolon.

Example:

  Content-Type: text/plain; charset="iso-8859-1"
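
The parameter rules might be implemented roughly as follows. The
function name and the CASE_SENSITIVE set are assumptions, and
the naive split on ';' would break on quoted values that contain
a semicolon themselves.

```python
# Parameters whose value case matters for processing (assumed set).
CASE_SENSITIVE = {"boundary"}

def normalize_parameters(value: str) -> str:
    """Normalize a header argument such as a Content-Type value."""
    main, *params = [part.strip() for part in value.split(";")]
    out = [main.lower()]
    for param in params:
        if not param:
            continue
        name, _, val = param.partition("=")
        name = name.strip().lower()
        val = val.strip().strip('"')
        if name not in CASE_SENSITIVE:
            val = val.lower()
        # No SPACEs around '=', value always in double quotes.
        out.append(f'{name}="{val}"')
    # One single SPACE follows each semicolon.
    return "; ".join(out)
```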

3.2.4.3 Address lists

Double quotes surrounding the name within an address argument as
well as unnecessary LESS THAN and GREATER THAN characters around
addresses should be removed, if allowed by the RFCs. The domain
name part of the address has to be written in lower case, the
potentially case-sensitive mailbox local-part [RFC 2821 2.4] has
to stay untouched (overriding the first-character-in-upper-case
rule where it would otherwise apply), and the words have to be
separated by no more than a single SPACE. The single items of an
address list should be separated by a COMMA followed by one
SPACE character.

With initial messages, if there's no other meaningful method
(like to express the relevance of the single addressee), a case
insensitive alphabetical ordering of the address list is
recommended, listing first addresses with a fully-qualified
domain name label-by-label from the right (TLD) to the left (in
accordance with ISO/IEC 14651), then those with an ip-address
number-by-number from the left to the right. Different mailboxes
of one and the same domain / ip-address have to be sorted
item-by-item from the right to the left. Follow-up messages
should conserve the ordering, but not the writing style of the
referred message.

Examples:

  To: "John Doe II"  <John.Doe at Machine.Example>

has to be converted to

  To: John Doe II <John.Doe at machine.example>

Also according to the rules would be

  To: John.Doe at machine.example, paul.pan at 98.87.219.55, Pete Pan <Pete.Pan at 98.87.219.55>, harvey at 209.214.12.258

Leading and trailing spaces have to be removed. One single SPACE
character has to separate the header argument from the header
field name.
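
The ordering of address lists can be illustrated with a sort
key. This is only a sketch under several assumptions: it handles
bare addresses without display names, it uses plain string
comparison instead of ISO/IEC 14651 collation, and mailboxes of
the same domain are simply compared by their lower-cased
local-part. All function names are hypothetical.

```python
import re

IPV4 = re.compile(r"^\d+(\.\d+){3}$")

def lowercase_domain(addr: str) -> str:
    """Lower-case the domain part, keep the local-part untouched."""
    local, sep, domain = addr.rpartition("@")
    return f"{local}{sep}{domain.lower()}" if sep else addr

def address_sort_key(addr: str):
    local, _, domain = addr.rpartition("@")
    domain = domain.lower()
    if IPV4.match(domain):
        # ip-addresses: number-by-number from the left to the right
        return (1, [int(n) for n in domain.split(".")], local.lower())
    # domain names: label-by-label from the right (TLD) to the left
    return (0, domain.split(".")[::-1], local.lower())

def normalize_address_list(addrs):
    return [lowercase_domain(a) for a in sorted(addrs, key=address_sort_key)]
```

With an input like

  ['harvey@209.214.12.25', 'Pete.Pan@98.87.219.55',
   'John.Doe@Machine.Example']

the named domain comes first, followed by the ip-addresses in
numerical order.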

3.2.4.4 Newsgroups lists

In contrast to address lists, within a newsgroups list the items
have to be written in lower case and separated only by a COMMA.
No SPACEs are allowed.

Newsgroups lists have to be ordered analogously to address
lists. Within initial messages, if there's no other meaningful
method (like expressing the relevance of the single newsgroup),
a case insensitive alphabetical sorting of the newsgroups list
(in accordance with ISO/IEC 14651) from the left to the right
(alt., comp., rec. ...) has to be carried out. Follow-up
messages should conserve the ordering, but not the writing style
of the referred message.
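
A sketch of the newsgroups rules, with a hypothetical function
name and an is_followup flag introduced for illustration. Plain
string comparison of the hierarchy components stands in for
ISO/IEC 14651 collation here.

```python
def normalize_newsgroups(value: str, is_followup: bool = False) -> str:
    """Lower-case, optionally sort, and join without SPACEs."""
    groups = [g.strip().lower() for g in value.split(",") if g.strip()]
    if not is_followup:
        # Initial messages: sort component-wise from the left.
        groups.sort(key=lambda g: g.split("."))
    return ",".join(groups)
```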

3.2.4.5 Subject

Reply indicators ('re:', 'aw:') have to be replaced by 'Re:',
followed by one single SPACE character, not even preserving the
style of a reply the message is responding to.
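
As a sketch, the leading reply indicator could be rewritten with
a regular expression. The function name is hypothetical, and
only the two indicators named above are recognized; stacked
indicators after the first one are left as they are.

```python
import re

REPLY_PREFIX = re.compile(r"^\s*(?:re|aw)\s*:\s*", re.IGNORECASE)

def normalize_subject(subject: str) -> str:
    """Replace a leading reply indicator by 'Re:' plus one SPACE."""
    return REPLY_PREFIX.sub("Re: ", subject, count=1)
```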

4. MIME boundary character sequence

MIME boundary delimiter definitions are often specific to the
client application that generated them. The general format
stated in RFC 2046 leaves a lot of room for individuality, so
more restrictive rules have to be established.

As the replacement of lower level delimiters intervenes too
deeply into the message structure, only the replacement of the
top level MIME boundary sequence, defined in the 'boundary' part
of the 'Content-Type' header, is recommended here. The
transmission of data with a more complex structure should
preferably take place within an additionally encrypted container.

The MIME boundary delimiter replacement must comply with the
following rules:

- - The MIME boundary parameter value has to consist of 32
characters including numbers and upper / lower case US-ASCII
letters, which are selected randomly.
- - At a random position somewhere within this sequence two
successive characters have to be replaced by '=_', as it is
recommended by RFC 2046.
- - The resulting argument then has to be surrounded by double
quotes.

So a valid 'Content-Type' header entity would look like

  Content-Type: multipart/mixed; boundary="QnaAre=_PebBx3ObfPuy0b9PueVfgZna4K"

with its representatives within the body section appearing as

  --QnaAre=_PebBx3ObfPuy0b9PueVfgZna4K

and

  --QnaAre=_PebBx3ObfPuy0b9PueVfgZna4K--
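
A boundary following the three rules above could be generated
like this. The function names are hypothetical, and the sketch
reads the rules literally: two of the 32 random characters are
replaced by '=_', so the total length stays at 32.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits
LENGTH = 32

def make_boundary() -> str:
    """Return a random boundary value containing one '=_' pair."""
    chars = [secrets.choice(ALPHABET) for _ in range(LENGTH)]
    pos = secrets.randbelow(LENGTH - 1)   # start of the pair, 0..30
    chars[pos:pos + 2] = "=_"             # replaces two characters
    return "".join(chars)

def content_type_header(boundary: str) -> str:
    # The value is always surrounded by double quotes.
    return f'Content-Type: multipart/mixed; boundary="{boundary}"'
```

The secrets module is used instead of random so that the
sequence doesn't depend on a predictable generator state.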

5. Epilogue

Transforming messages automatically in accordance with the
specifications defined above may not be practicable under all
circumstances. It is even conceivable, that some tasks, like
choosing the appropriate address format, will always have to be
done by the instructed user. But as the fulfilling of every
single rule narrows down the number of individual
characteristics, which by itself is a step forward towards a
more secure transmission of anonymous messages, from the user's
point of view even an incomplete implementation would be better
than doing nothing.

This draft doesn't relate to the delivery side, where the
anonymization process already has to be finished. Modifications,
that take place there, are irrelevant, and may at best be able
to disguise the facts from those, who aren't controlling the
decrypted message up to then.

This document is an Internet-Draft.  By submitting this
Internet-Draft, the author represents that any applicable
patent or other IPR claims of which he is aware have been or will
be disclosed, and any of which he becomes aware will be
disclosed. The distribution of this memo is unlimited, as long
as this happens in its entirety and without any modification.

- ------------------------------------------------------------

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (MingW32) - GPGshell v3.30

iD8DBQFEqOyUzsHAT7/ZLSkRAiBRAJ9Z/tWEk06HCtLezASwoKcbsguQIwCg7Jch
JyGCrIlAy+Zd6K7ZGGmPU6g=
=eg7m
-----END PGP SIGNATURE-----