Rewriting


The process of converting one form of an address into another is called rewriting. The rewriting rules are the core of address parsing. Sendmail scans through a set of rewriting rules looking for a match on the left hand side (LHS) of each rule. When a rule matches, the address is replaced by the right hand side (RHS) of the rule. There are several sets of rewriting rules. Rule sets are used to modify e-mail addresses, to detect errors in e-mail addresses, and to select delivery agents.


The syntax for rulesets and rules is:


Sn


Sets the current ruleset to n. If you declare the same ruleset more than once, the rules in the new declaration are appended to the end of the old definition.


Sn=x


Ruleset n can also be called by its numeric value, x.


All rules listed after the rule set identifier belong to that rule set until another rule set is declared.


Rules look like:


Rlhs rhs comments


The fields must be separated by at least one tab character; there may be embedded spaces in the fields. The lhs is a pattern that is applied to the input. If it matches, the input is rewritten using the rhs.


Comments are ignored. Comments are any characters following a # symbol.


The left hand side can be thought of as an "if" statement. The right hand side can be thought of as the "then" statement.


Macro expansions of the form $x are performed when the configuration file is read. A literal $ can be included by using $$. Expansions of the form $&x are performed at run time using a somewhat less general algorithm. This is intended only for referencing internally defined macros, such as $h, that are changed at runtime.


Sendmail views the text that makes up rules and addresses as a sequence of individual tokens. Rules are tokenized, that is, divided into individual tokens, while the configuration file is being read.


The text some.domain is tokenized into three tokens:


some
.
domain


Sendmail decides where to break the text based on its tokenizing characters. Among your configuration options is one called OperatorChars. Sendmail combines the characters listed in OperatorChars with its own internal list of ()<>,;"\r\n to build the full list of tokenizing characters. When Sendmail reaches one of these characters, it ends the current token, and an operator character such as "." becomes a token of its own, as in the example above.


On the lhs, tokenizing is extremely important. If you forget about it, the pattern you write may not match the tokens you expected, and the rhs will never be applied.


Quotation marks can be used to override the tokenizing characters: everything between the quotes is kept in a single token. For example,


"alan@emailaddress".com


becomes


"alan@emailaddress"
.
com


Notice that the quotation marks are retained. A space always acts as a tokenizing character. The space itself is not a token. After an address has passed through all of the rules, the tokens are then pasted back together again to form a single string.
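The tokenizing rules above can be sketched as a small Python function. This is a simplified model, not Sendmail's own code: the OPERATOR_CHARS value is an assumption standing in for whatever your OperatorChars option contains, backslash escapes are ignored, and \r and \n are simply treated as whitespace.

```python
# Simplified model of Sendmail tokenizing, for illustration only.
OPERATOR_CHARS = ".:%@!^/[]+"   # assumed OperatorChars value
BUILTIN_OPERATORS = "()<>,;"    # always operators, per the text above


def tokenize(text: str) -> list[str]:
    """Split text into tokens: operators stand alone, quotes group, spaces separate."""
    operators = set(OPERATOR_CHARS + BUILTIN_OPERATORS)
    tokens: list[str] = []
    current = ""
    in_quotes = False
    for ch in text:
        if in_quotes:
            current += ch               # everything inside quotes is kept
            if ch == '"':
                in_quotes = False
        elif ch == '"':
            current += ch               # the quote marks are retained
            in_quotes = True
        elif ch.isspace():
            if current:                 # a space ends a token but is not one
                tokens.append(current)
                current = ""
        elif ch in operators:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)           # the operator is a token of its own
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


print(tokenize("some.domain"))              # ['some', '.', 'domain']
print(tokenize('"alan@emailaddress".com'))  # ['"alan@emailaddress"', '.', 'com']
```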


If two adjacent tokens are both plain text strings (neither is a tokenizing character), Sendmail joins them by placing the value of the BlankSub option between the first string and the second.
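Pasting tokens back together can be modeled the same way. In this sketch the BlankSub value "." and the OPERATOR_CHARS string are assumptions, and paste is a hypothetical helper name; a token counts as plain text whenever it is not a single operator character.

```python
# Sketch of how tokens are pasted back into a single string.
OPERATOR_CHARS = ".:%@!^/[]+"   # assumed OperatorChars value
BUILTIN_OPERATORS = "()<>,;"
BLANK_SUB = "."                  # assumed BlankSub value


def is_plain_text(token: str) -> bool:
    """A token is plain text unless it is a single operator character."""
    return not (len(token) == 1 and token in OPERATOR_CHARS + BUILTIN_OPERATORS)


def paste(tokens: list[str], blank_sub: str = BLANK_SUB) -> str:
    """Join tokens, inserting blank_sub between two adjacent plain-text tokens."""
    result = ""
    previous_plain = False
    for token in tokens:
        plain = is_plain_text(token)
        if previous_plain and plain:
            result += blank_sub
        result += token
        previous_plain = plain
    return result


print(paste(["some", ".", "domain"]))  # some.domain
print(paste(["John", "Smith"]))        # John.Smith
```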


When an address is passed to a ruleset, the first thing that happens is that a workspace is created.


The address is tokenized as described above, and the tokens become the workspace. The workspace is then compared against the lhs of each rule in the ruleset, in order. If the lhs of a rule matches, the workspace is rewritten by that rule's rhs, and the result becomes the new workspace. The same rule is then tried again against the new workspace, and it keeps being reapplied until its lhs no longer matches; only then does Sendmail move on to the next rule. When the last rule has been tried, the workspace is returned as the result of the ruleset. Because a rule is retried after every match, you can run into loops.


V10
Stest
Rfred	fred


This tells us that the configuration file is a version 10 configuration file. There is one ruleset, named test, and it contains one rule: if the workspace contains the value fred, rewrite the workspace to fred.


If the workspace really is fred and the rule rewrites it to fred, the rule matches the new workspace again, and again after that, so a loop develops.
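The retry behavior that causes this loop can be reproduced with a toy evaluator. It is a sketch under simplifying assumptions: each lhs is a literal token list that must equal the whole workspace, there are no metasymbols, and MAX_LOOPS is a hypothetical safety limit added so the example stops instead of hanging.

```python
# Toy model of a ruleset: a matching rule is retried until it stops matching.
MAX_LOOPS = 100  # hypothetical safety limit so this sketch cannot hang


class RulesetLoop(Exception):
    """Raised when a rule keeps matching its own output."""


def run_ruleset(rules, workspace):
    """Apply each (lhs, rhs) rule in order; lhs must equal the whole workspace."""
    for number, (lhs, rhs) in enumerate(rules, start=1):
        passes = 0
        while workspace == lhs:
            workspace = list(rhs)          # the rhs becomes the new workspace
            passes += 1
            if passes > MAX_LOOPS:
                raise RulesetLoop(f"rule {number} keeps matching its own output")
    return workspace


print(run_ruleset([(["fred"], ["barney"])], ["fred"]))  # ['barney']
try:
    run_ruleset([(["fred"], ["fred"])], ["fred"])     # the Rfred fred example
except RulesetLoop as err:
    print(err)                                        # rule 1 keeps matching its own output
```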


There are some prefixes you can use on the rhs, $: and $@, to help prevent this; they are described later in this section.


The left hand side of a rewriting rule contains a pattern. Normal words are simply matched directly. A pattern can also contain metasymbols, which are introduced by a dollar sign. The metasymbols are:


$* = Match zero or more tokens


$+ = Match one or more tokens


$- = Match exactly one token


$@ = Match exactly zero tokens


$& = Delay macro expansion until runtime


$=x = Match any phrase in class x


$~x = Match any word not in class x


If any of these match, the tokens they matched are assigned, in order, to the positional parameters $1, $2, and so on, for use by the right hand side. For example, if the LHS:


$-:$+


which means "match exactly one token, then a colon, then one or more tokens," is applied to the input:


UCBARPA:eric


the rule will match, and the values passed to the RHS will be:


$1 UCBARPA
$2 eric


When the left hand side of a rewriting rule matches, the input is deleted and replaced by the right hand side. When a pattern matching operator can match multiple tokens, Sendmail uses a minimal matching algorithm: the operator first tries to match the fewest tokens it can. If the rest of the pattern then fails to match, Sendmail adds the next token to that operator's match and tries again, continuing until the whole pattern matches or it runs out of tokens.
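The following sketch shows minimal matching for literal words and the $*, $+ and $- operators. It illustrates the technique rather than reproducing Sendmail's matcher: the class operators ($=, $~) are left out, and the tokens matched by each metasymbol are returned in order, corresponding to $1, $2, and so on.

```python
# Sketch of minimal matching for literal words and the $*, $+ and $- operators.
def match(pattern, tokens):
    """Return the token lists bound by each metasymbol ($1, $2, ...) or None."""
    if not pattern:
        return [] if not tokens else None
    head, rest = pattern[0], pattern[1:]
    if head in ("$*", "$+"):
        fewest = 0 if head == "$*" else 1
        # Try the fewest tokens first, then add one token at a time.
        for count in range(fewest, len(tokens) + 1):
            bound = match(rest, tokens[count:])
            if bound is not None:
                return [tokens[:count]] + bound
        return None
    if head == "$-":
        if not tokens:
            return None
        bound = match(rest, tokens[1:])
        return None if bound is None else [tokens[:1]] + bound
    # An ordinary word must equal the next token exactly.
    if tokens and tokens[0] == head:
        return match(rest, tokens[1:])
    return None


print(match(["$-", ":", "$+"], ["UCBARPA", ":", "eric"]))
# [['UCBARPA'], ['eric']]
print(match(["$*", "@", "$+"], ["a", "@", "b", "@", "c"]))
# [['a'], ['b', '@', 'c']]   ($* stops at the first @)
```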


Tokens on the RHS are copied directly into the new workspace unless they begin with a dollar sign. Metasymbols for the rhs include:


$n = Copy by digit


$[name$] = Canonicalize name


$(map key $@arguments $:default $) = Generalized keyed map lookup


$>n = Call ruleset n


$#mailer = Resolve to mailer for a delivery agent triple


$@host = Rewrite and return, or specify the host in a delivery agent triple


$:user = Rewrite once, or specify the user in a delivery agent triple


The copy by digit syntax substitutes the corresponding value from a $+, $-, $*, $=, or $~ match on the LHS. It may be used anywhere.


Using the example above, if the rhs is


$2$1


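then the workspace becomes the two tokens eric and UCBARPA, in that order. Because both are plain text tokens, the value of the BlankSub option is placed between them when the address is pasted back into a string.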
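Copy-by-digit substitution can be sketched as follows. The bindings list plays the role of the values matched on the LHS ($1 is the first entry); the function name substitute is hypothetical, and only the single-digit references $1 through $9 are handled.

```python
# Sketch of copy-by-digit substitution on the RHS.
def substitute(rhs, bindings):
    """Replace $1..$9 with the matching LHS bindings; copy other tokens as-is."""
    workspace = []
    for token in rhs:
        if len(token) == 2 and token[0] == "$" and token[1] in "123456789":
            workspace.extend(bindings[int(token[1]) - 1])
        else:
            workspace.append(token)
    return workspace


bindings = [["UCBARPA"], ["eric"]]  # what $-:$+ bound for UCBARPA:eric
print(substitute(["$2", "$1"], bindings))       # ['eric', 'UCBARPA']
print(substitute(["$2", "@", "$1"], bindings))  # ['eric', '@', 'UCBARPA']
```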


A host name enclosed between $[ and $] is looked up in the host database(s) and replaced by the canonical name. For example, "$[ftp$]" might become "ftp.somecomputer.com", and "$[[128.32.130.2]$]" could become "mail.somecomputer.com". Sendmail recognizes its own numeric IP address without calling the name server and replaces it with its canonical name. The $( ... $) syntax is a more general form of lookup; it uses a named map instead of an implicit map. If the lookup finds no match, the indicated default is inserted; if no default is specified, the value is left unchanged. The arguments are passed to the map for possible use.
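The fallback order of a $( ... $) lookup fits in a few lines. Here a Python dict stands in for a real map, its contents are made up for illustration, and the $@ arguments are left out.

```python
# Sketch of the fallback order of $( map key $: default $).
def map_lookup(table, key, default=None):
    """Return the map value, else the default, else the key unchanged."""
    if key in table:
        return table[key]
    if default is not None:
        return default
    return key


hosts = {"ftp": "ftp.somecomputer.com"}   # made-up map contents
print(map_lookup(hosts, "ftp"))             # ftp.somecomputer.com
print(map_lookup(hosts, "www", "unknown"))  # unknown
print(map_lookup(hosts, "www"))             # www
```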


The $>n syntax causes the remainder of the line, from the ruleset name to the end of the replacement string, to be substituted as usual and then passed as the initial input to ruleset n. The final value of ruleset n then becomes the substitution for this rule. Recursive calls are allowed. For example,


$>0 $>3 $1


expands $1, passes that to ruleset 3, and then passes the result of ruleset 3 to ruleset 0. The $# syntax should only be used in ruleset zero, a subroutine of ruleset zero, or rulesets that return decisions (e.g., check_rcpt). It causes evaluation of the ruleset to terminate immediately and tells Sendmail that the address has been completely resolved. The complete syntax for ruleset 0 is:


$#mailer $@host $:user


This specifies the {mailer, host, user} triple necessary to direct the mailer. If the mailer is local the host part may be omitted. The mailer must be a single word, but the host and user may be multi-part. If the mailer is the built-in IPC mailer, the host may be a colon-separated list of hosts that are searched in order for the first working address (exactly like MX records). The user is later rewritten by the mailer-specific envelope rewriting rule set and assigned to the $u macro. As a special case, if the mailer specified has the F=@ flag specified and the first character of the $: value is "@", the "@" is stripped off, and a flag is set in the address descriptor that causes Sendmail to not do ruleset 5 processing.
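Splitting a resolved workspace into its {mailer, host, user} parts might look like the sketch below. It assumes the workspace is already tokenized, joins multi-token host and user parts without any BlankSub handling, and does not model the colon-separated host list or the F=@ special case.

```python
# Sketch: split a resolved workspace into its {mailer, host, user} parts.
def parse_triple(tokens):
    """Return (mailer, host, user); host is None when it was omitted."""
    if not tokens or tokens[0] != "$#":
        raise ValueError("the workspace was not resolved with $#")
    parts = {"$#": [], "$@": [], "$:": []}
    current = "$#"
    for token in tokens[1:]:
        if token in parts:
            current = token          # switch to the host or user part
        else:
            parts[current].append(token)
    if len(parts["$#"]) != 1:
        raise ValueError("the mailer must be a single word")
    host = "".join(parts["$@"]) or None
    return parts["$#"][0], host, "".join(parts["$:"])


print(parse_triple(["$#", "esmtp", "$@", "mail", ".", "example", ".", "com",
                    "$:", "eric", "@", "example", ".", "com"]))
# ('esmtp', 'mail.example.com', 'eric@example.com')
print(parse_triple(["$#", "local", "$:", "eric"]))
# ('local', None, 'eric')
```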


Normally, a rule that matches is retried, that is, the rule loops until it fails. A RHS may also be preceded by a $@ or a $: to change this behavior. A $@ prefix causes the ruleset to return with the remainder of the RHS as the value. A $: prefix causes the rule to terminate immediately, but the ruleset to continue; this can be used to avoid continued application of a rule. The prefix is stripped before continuing.


The $@ and $: prefixes may precede a $> spec; for example:


R$+ 	$: $>7 $1


matches anything, passes that to ruleset seven, and then continues; the $: is necessary to avoid an infinite loop.
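The effect of the $: and $@ prefixes on rule evaluation can be modeled with a toy ruleset runner. As a simplification, each lhs is a literal token list that must equal the whole workspace; a rule without a prefix whose rhs equals its lhs would loop forever here, just as described earlier.

```python
# Toy model of how the $: and $@ prefixes change rule evaluation.
PREFIXES = ("$:", "$@")


def run_ruleset(rules, workspace):
    """Apply (lhs, rhs) rules in order; lhs must equal the whole workspace."""
    for lhs, rhs in rules:
        while workspace == lhs:
            prefix = rhs[0] if rhs and rhs[0] in PREFIXES else None
            workspace = list(rhs[1:]) if prefix else list(rhs)  # strip the prefix
            if prefix == "$@":
                return workspace   # return from the whole ruleset now
            if prefix == "$:":
                break              # stop retrying this rule; go on to the next
            # No prefix: retry the same rule on the new workspace.
    return workspace


once = [(["fred"], ["$:", "fred"]), (["fred"], ["barney"])]
done = [(["fred"], ["$@", "fred"]), (["fred"], ["barney"])]
print(run_ruleset(once, ["fred"]))  # ['barney']  (rule 1 ran once, then rule 2)
print(run_ruleset(done, ["fred"]))  # ['fred']    (returned before rule 2)
```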


Substitution occurs in the order described, that is, parameters from the LHS are substituted, hostnames are canonicalized, "subroutines" are called, and finally $#, $@, and $: are processed.
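The way $> calls nest, as in $>0 $>3 $1, can be sketched with Python functions standing in for rulesets. The expand_rhs name and both rulesets in the usage example are hypothetical, and "Eric" stands for a value already substituted for $1; the point is only that everything after $>n is expanded first and handed to ruleset n, while tokens before the call (such as a $: prefix) are kept for later processing.

```python
# Sketch of nested $> ruleset calls.
def expand_rhs(tokens, rulesets):
    """Expand everything after $>n first, then pass the result to ruleset n."""
    for i, token in enumerate(tokens):
        if token.startswith("$>"):
            argument = expand_rhs(tokens[i + 1:], rulesets)
            return tokens[:i] + rulesets[token[2:]](argument)
    return list(tokens)


rulesets = {
    "3": lambda tokens: [t.lower() for t in tokens],  # hypothetical ruleset 3
    "0": lambda tokens: ["<"] + tokens + [">"],        # hypothetical ruleset 0
}
print(expand_rhs(["$>0", "$>3", "Eric"], rulesets))  # ['<', 'eric', '>']
print(expand_rhs(["$:", "$>3", "Eric"], rulesets))   # ['$:', 'eric']
```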


The rhs does not have to have anything to do with the lhs; that is up to you. The rhs could be completely independent of the lhs, and you could simply rewrite all of your e-mail addresses to someaddress@somehost.com if you wanted.


The following diagram shows you how an e-mail address flows through the various rule sets.


                                      +-->Resolved Address 
                                     /
                           +---+    /      +---+   +---+
                           |   |   /       |   |   |   |
                  +------->+ 0 +--+   +--->+ 1 +-->+ S +----+
                 /         |   |     /     |   |   |   |     \      +---+
       +---+    /   +---+  +---+    /      +---+   +---+      \     |   |
       |   |   /    |   |          /                           \___>+ 4 +---->msg
addr-->+ 3 +--/---->+ D +---------/                            /    |   |
       |   |        |   |         \                           /     +---+
       +---+        +---+          \       +---+   +---+     /
                                    \      |   |   |   |    /
                                     \---->+ 2 +-->+ R +---+
                                           |   |   |   |
                                           +---+   +---+


Rewriting set semantics


D - sender domain addition
S - mailer-specific sender rewriting
R - mailer-specific recipient rewriting


Semantics of rewriting rule sets


There are six rewriting sets that have specific semantics. Five of these are related as depicted by the figure shown above.


Ruleset three should turn the address into "canonical form." This form should have the basic syntax:


local-part@host-domain


Sendmail applies ruleset three before doing anything else with any address. If no "@" sign is specified, the host-domain may be appended from the sender's address.


Ruleset zero is applied after ruleset three to addresses that are going to actually specify recipients. It must resolve to a {mailer, host, address} triple. The mailer must be defined in the mailer definitions of the configuration file. The host is defined by the $h macro for use in the argument expansion of the specified mailer.


Rulesets one and two are applied to all sender and recipient addresses respectively. They are applied before any specification in the mailer definition.


If the mailer definition has an entry such as:


S=EnvFromSMTP/HdrFromSMTP


then the ruleset before the slash (EnvFromSMTP) applies to the envelope, while the one after the slash (HdrFromSMTP) applies to the header. If only one ruleset is listed, it applies to both the envelope and the header.
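A sketch of how such an S= entry splits into its two parts; rulesets_for is a hypothetical name, and the rule that a single entry covers both the envelope and the header is taken from the paragraph above.

```python
# Sketch: split an S= entry into envelope and header rulesets.
def rulesets_for(entry):
    """Return (envelope_ruleset, header_ruleset) for an entry like A/B."""
    envelope, _, header = entry.partition("/")
    return envelope, header or envelope   # one entry covers both


print(rulesets_for("EnvFromSMTP/HdrFromSMTP"))  # ('EnvFromSMTP', 'HdrFromSMTP')
print(rulesets_for("EnvFromSMTP"))              # ('EnvFromSMTP', 'EnvFromSMTP')
```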


Ruleset four is applied to all addresses in the message. It is typically used to translate internal to external form.


In addition, ruleset 5 is applied to all local addresses (specifically, those that resolve to a mailer with the 'F=5' flag set) that do not have aliases. This allows a last minute hook for local names.


A few extra rulesets are defined as "hooks" that can be defined to get special features.


They are all named rulesets. The "check_*" forms all give accept/reject status; falling off the end or returning normally is an accept, and resolving to $#error is a reject or quarantine. Quarantining is chosen by specifying quarantine in the second part of the mailer triplet:


$#error $@ quarantine $: Reason for quarantine


Many of these can also resolve to the special mailer name $#discard; this accepts the message as though it were successful but then discards it without delivery.


Note that this mailer cannot be chosen as a mailer in ruleset 0. Note also that all "check_*" rulesets have to handle temporary failures themselves, especially those from map lookups; they should return a temporary error code, or at least make a proper decision, in those cases.


Some of the following rule sets are omitted from your configuration file by default. For those, no hook is needed. You merely declare the rule set in your mc file and give it appropriate rules:


LOCAL_RULESETS
Scheck_vrfy
...your rules here


Consult the Bat book (O'Reilly's sendmail) for examples of how to configure these into your mc file. The check_eoh example below gives an idea of how this looks in an mc file.


authinfo


The authinfo ruleset is called when sendmail tries to authenticate to another MTA. It should return $# followed by a list of tokens that are used for SMTP AUTH. If the return value starts with anything else it is silently ignored. Each token is a tagged string of the form: "TDstring" (including the quotes), where


T - Tag which describes the item
D - Delimiter: ':' means simple text follows, '=' means the string is base64 encoded
string - Value of the item


Valid values for the tag are:


I - authentication id
M - list of mechanisms delimited by spaces
P - password
R - realm
U - user (authorization) id
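The item format can be illustrated by building the strings in Python. This sketch only shows the shape of the ruleset's return value, roughly as the returned tokens would read; the helper names and the user and password values are made up, and a real configuration produces these tokens with rewriting rules.

```python
import base64

TAGS = "IMPRU"  # the valid tags listed above


def authinfo_item(tag: str, value: str, encode: bool = False) -> str:
    """Build one quoted "TDstring" item: ':' for plain text, '=' for base64."""
    if len(tag) != 1 or tag not in TAGS:
        raise ValueError(f"unknown tag {tag!r}")
    if encode:
        value = base64.b64encode(value.encode()).decode("ascii")
        return f'"{tag}={value}"'
    return f'"{tag}:{value}"'


def authinfo_result(items: list[str]) -> str:
    """The result must start with $#; anything else is silently ignored."""
    return "$# " + " ".join(items)


items = [authinfo_item("U", "user1"), authinfo_item("P", "secret", encode=True)]
print(authinfo_result(items))  # $# "U:user1" "P=c2VjcmV0"
```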


If this ruleset is defined, the option DefaultAuthInfo is ignored (even if the ruleset does not return a "useful" result).


check_compat


The check_compat ruleset is passed sender-address $| recipient-address where $| is a metacharacter separating the addresses. It can accept or reject mail transfer between these two addresses much like the checkcompat() function.


check_data


The check_data ruleset is called after the SMTP DATA command; its parameter is the number of recipients. It can accept or reject the command.


check_eoh


The check_eoh ruleset is passed number-of-headers $| size-of-headers where $| is a metacharacter separating the numbers. These numbers can be used for size comparisons with the arith map. The ruleset is triggered after all of the headers have been read. It can be used to correlate information gathered from those headers using the macro storage map. One possible use is to check for a missing header. For example:


LOCAL_CONFIG
Kstorage macro
HMessage-Id: $>ScreenMessageId

LOCAL_RULESETS
SScreenMessageId
R $*		$: $(storage {GotMessageId} $@ YES $) $1

Scheck_eoh
R $*		$: < $&{GotMessageId} >
R $*		$: $(storage {GotMessageId} $) $1
R < YES >	$@ OK
R < >		$#error $@ 5.7.0 $: 553 Missing Header


Keep in mind that the Message-Id: header is not a required header, and a missing Message-Id: is not a guaranteed spam indicator. This ruleset is an example and should probably not be used in production.


check_eom


The check_eom ruleset is called after the end of a message; its parameter is the message size, so it can be used to check the message size.


check_etrn


The check_etrn ruleset is passed the parameter of the SMTP ETRN command. It can accept or reject the command.


check_expn


The check_expn ruleset is passed the user name parameter of the SMTP EXPN command. It can accept or reject the address.


check_rcpt


The check_rcpt ruleset is passed the recipient address parameter of the SMTP RCPT command. It can accept or reject the address.


check_vrfy


The check_vrfy ruleset is passed the user name parameter of the SMTP VRFY command. It can accept or reject the command.


Local_check_mail and check_mail


The check_mail ruleset is passed the envelope-sender parameter of the SMTP MAIL command. It can accept or reject the address.


Local_check_relay and check_relay


The check_relay ruleset is called after a connection is accepted by the daemon. It is not called when sendmail is started using the -bs option. It is passed client.host.name $| client.host.address where $| is a metacharacter separating the two parts. This ruleset can reject connections based on hostname, domain or IP address. Note that it only checks the connecting SMTP client IP address and hostname. It does not check for third party message relaying. The check_rcpt ruleset discussed below usually does third party message relay checking.


Local_check_rcpt and check_rcpt


The check_rcpt ruleset is called immediately after the RCPT To: command.


This ruleset is passed the envelope-recipient address followed by a colon.


It performs the following checks:


Reject empty envelope-recipient addresses


Ensure that the envelope-recipient address is either local or one that is allowed to be relayed


If the access database is used, look up the envelope-recipient's host in that database and reject, accept, or defer the message based on the value found


queuegroup


The queuegroup ruleset is used to map a recipient address to a queue group name.


The input for the ruleset is a recipient address as specified by the SMTP RCPT command. The ruleset should return $# followed by the name of a queue group. If the return value starts with anything else it is silently ignored.


srv_features


The srv_features ruleset is called with the connecting client's host name when a client connects to sendmail. This ruleset should return $# followed by a list of options (single characters delimited by white space). If the return value starts with anything else it is silently ignored. Generally, upper case characters turn off a feature while lower case characters turn it on. Option 'S' causes the server not to offer STARTTLS, which is useful for interacting with MTAs/MUAs that have broken STARTTLS implementations, by simply not offering it. 'V' turns off the request for a client certificate during the TLS handshake. Options 'A' and 'P' suppress SMTP AUTH and PIPELINING, respectively. 'c' is equivalent to AuthOptions=p, i.e., it doesn't permit mechanisms susceptible to simple passive attack (e.g., PLAIN, LOGIN) unless a security layer is active. Option 'l' requires SMTP AUTH for a connection. Options 'B', 'D', 'E', and 'X' suppress SMTP VERB, DSN, ETRN, and EXPN, respectively.


A - Do not offer AUTH
a - Offer AUTH (default)
B - Do not offer VERB
b - Offer VERB (default)
D - Do not offer DSN
d - Offer DSN (default)
E - Do not offer ETRN
e - Offer ETRN (default)
L - Do not require AUTH (default)
l - Require AUTH
P - Do not offer PIPELINING
p - Offer PIPELINING (default)
S - Do not offer STARTTLS
s - Offer STARTTLS (default)
V - Do not request a client certificate
v - Request a client certificate (default)
X - Do not offer EXPN
x - Offer EXPN (default)
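A sketch of how a returned option string maps onto the table above. The parse_srv_features name and the feature labels are hypothetical; letters not in the table, such as 'c', are simply skipped here, and a result that does not start with $# returns None to mirror the "silently ignored" behavior.

```python
from typing import Optional

# Lower-case letter -> feature it controls (labels are descriptive only).
FEATURES = {
    "a": "AUTH", "b": "VERB", "d": "DSN", "e": "ETRN",
    "l": "require AUTH", "p": "PIPELINING", "s": "STARTTLS",
    "v": "request client certificate", "x": "EXPN",
}


def parse_srv_features(result: str) -> Optional[dict]:
    """Map a srv_features result to {feature: enabled}; None if ignored."""
    tokens = result.split()
    if not tokens or tokens[0] != "$#":
        return None                              # anything else is ignored
    settings = {}
    for option in tokens[1:]:
        feature = FEATURES.get(option.lower())
        if feature is not None:                  # skip letters not in the table
            settings[feature] = option.islower() # lower case turns it on
    return settings


print(parse_srv_features("$# S V"))
# {'STARTTLS': False, 'request client certificate': False}
print(parse_srv_features("$# l"))  # {'require AUTH': True}
print(parse_srv_features("S V"))   # None
```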


Note: the entries marked as "(default)" may require that some configuration has been made, e.g., SMTP AUTH is only available if properly configured. Moreover, many options can be changed on a global basis via other settings as explained in this document, e.g., via DaemonPortOptions.


The ruleset may return '$#temp' to indicate that there is a temporary problem determining the correct features, e.g., if a map is unavailable. In that case, the SMTP server issues a temporary failure and does not accept email.


tls_client


The tls_client ruleset is called when sendmail acts as a server, after a STARTTLS command has been issued, and from check_mail. The parameter is the value of ${verify} and STARTTLS or MAIL, respectively. If the ruleset does resolve to the "error" mailer, the appropriate error code is returned to the client.


tls_rcpt


The tls_rcpt ruleset is called each time before a RCPT TO command is sent. The parameter is the current recipient. If the ruleset does resolve to the "error" mailer, the RCPT TO command is suppressed (treated as non-deliverable with a permanent or temporary error). This ruleset allows you to require encryption or verification of the recipient's MTA even if the mail is somehow redirected to another host. For example, sending mail to luke@endmail.org may get redirected to a host named death.star, and hence the tls_server ruleset won't apply. By introducing per recipient restrictions, such attacks (e.g., via DNS spoofing) can be made impossible. See cf/README for how this ruleset can be used.


tls_server


The tls_server ruleset is called when sendmail acts as a client, after a STARTTLS command has (or should have) been issued. The parameter is the value of ${verify}. If the ruleset does resolve to the "error" mailer, the connection is aborted (treated as non-deliverable with a permanent or temporary error).


trust_auth


The trust_auth ruleset is passed the AUTH= parameter of the SMTP MAIL command. It is used to determine whether this value should be trusted. In order to make this decision, the ruleset may make use of the various ${auth_*} macros. If the ruleset does resolve to the "error" mailer the AUTH= parameter is not trusted and hence not passed on to the next relay.


try_tls


The try_tls ruleset is called when sendmail connects to another MTA. If the ruleset does resolve to the "error" mailer, sendmail does not try STARTTLS even if it is offered. This is useful to interact with MTAs that have broken STARTTLS implementations by simply not using it.


RULESETS (* means built in to sendmail)

 0 *  Parsing
 1 *  Sender rewriting
 2 *  Recipient rewriting
 3 *  Canonicalization
 4 *  Post cleanup
 5 *  Local address rewrite (after aliasing)
1x    mailer rules (sender qualification)
2x    mailer rules (recipient qualification)
3x    mailer rules (sender header qualification)
4x    mailer rules (recipient header qualification)
5x    mailer subroutines (general)
6x    mailer subroutines (general)
7x    mailer subroutines (general)
8x    reserved
90    Mailertable host stripping
96    Bottom half of Ruleset 3 (ruleset 6 in old sendmail)
97    Hook for recursive ruleset 0 call (ruleset 7 in old sendmail)
98    Local part of ruleset 0 (ruleset 8 in old sendmail)

MAILERS

0    local, prog     local and program mailers
1    [e]smtp, relay  SMTP channel
2    uucp-*          UNIX-to-UNIX Copy Program
3    netnews         Network News delivery
4    fax             Sam Leffler's HylaFAX software
5    mail11          DECnet mailer

M4 DIVERSIONS

1    Local host detection and resolution
2    Local Ruleset 3 additions
3    Local Ruleset 0 additions
4    UUCP Ruleset 0 additions
5    locally interpreted names (overrides $R)
6    local configuration (at top of file)
7    mailer definitions
8    DNS based blacklists
9    special local rulesets (1 and 2)





This Web Site Copyright © 1997 - 2010
by Alan Pae - All Rights Reserved