And How to Stop Them
Effectively stopping spam over the long term requires much more than blocking individual IP addresses and creating rules based on keywords that spammers typically use. The increasing sophistication of the tools spammers use, coupled with the increasing number of spammers in the wild, has created a hyper-evolution in the variety and volume of spam. The old ways of blocking the bad guys just don’t work anymore.
Examining spam and spam-blocking technology can illuminate how this evolution is taking place and what can be done to combat spam and reclaim e-mail as the efficient, effective communication tool it was intended to be.
There are several widely used methods for filtering spam, each of which can be defeated by spammers to some degree. Understanding the strengths and weaknesses of each approach and the methods spammers use to defeat them is the basis of an effective, comprehensive anti-spam strategy.
Signature-based Filters
Signature-based filters examine the contents of known spam, usually derived from honey pots, or dummy e-mail addresses set up specifically to collect spam. Once a honey pot receives a spam message, the content is examined and given a unique identifier. The unique identifier is obtained by assigning a value to each character in the e-mail. Once all characters have been assigned a value, the values are totaled, creating the spam’s signature. The signature is added to a signature database and sent as a regular update to the e-mail service’s subscribers. Every e-mail coming in to the network is then compared against the signature database, and all matching messages are discarded as spam.
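The signature step described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's actual algorithm: it assumes the "value" of a character is its Unicode code point, and the names `signature`, `signature_db` and `is_known_spam`, as well as the sample messages, are hypothetical.

```python
def signature(message: str) -> int:
    """Total the value of every character (assumed here to be its code point)."""
    return sum(ord(ch) for ch in message)


# Hypothetical messages captured by a honey pot.
honeypot_spam = [
    "Buy cheap watches now!!!",
    "You have won a prize, click here",
]

# The signature database that would be shipped to subscribers as an update.
signature_db = {signature(m) for m in honeypot_spam}


def is_known_spam(message: str) -> bool:
    """Discard a message only if its signature is already in the database."""
    return signature(message) in signature_db
```

An exact copy of a captured message matches, but appending even one character changes the total, which is why the approach is so brittle. A plain character sum also ignores character order; a real product would more plausibly use a stronger hash, but the matching logic would look the same.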
The benefit of signature-based filters is that they rarely produce false positives, that is, legitimate e-mail incorrectly identified as spam.
The drawback of signature-based filters is that they are very easy to defeat. Because they are backward-looking, they only deal with spam that has already been sent. By the time the honey pot receives a spam message, the system assigns a signature, and the update is sent and installed on the subscribers’ network, the spammer has already sent millions of e-mails. A slight modification of the e-mail message will render the existing signature useless.
Furthermore, spammers can easily evade signature-based filters by using special e-mail software that adds random strings of content to the subject line and body of the e-mail. Because the variable content alters the signature of each e-mail sent by the spammer, signature-based spam filters are unable to match the e-mail to known pieces of spam.
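The evasion is easy to reproduce. The sketch below assumes the same character-sum signature described earlier and a hypothetical helper, `add_random_suffix`, standing in for the spammer's mailing software; the seed is fixed only so the run is repeatable.

```python
import random
import string


def signature(message: str) -> int:
    """Character-sum signature (assumed: each character's value is its code point)."""
    return sum(ord(ch) for ch in message)


def add_random_suffix(message: str, rng: random.Random, length: int = 8) -> str:
    """Append a random lowercase string, as a spammer's tool might."""
    suffix = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return f"{message} {suffix}"


rng = random.Random(0)
base = "Buy cheap watches now!!!"
known_signatures = {signature(base)}  # what the filter has on file

# Five copies of the same pitch, each with different random filler.
variants = [add_random_suffix(base, rng) for _ in range(5)]
matches = [signature(v) in known_signatures for v in variants]
```

Every variant carries extra characters, so its total is strictly larger than the stored signature and `matches` contains only `False`. The filter has seen the message, yet recognizes none of the copies.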
Developers of signature-based spam filters have learned to identify the tell-tale signs of automated random character generation. But as is often the case, spammers remain a step ahead and have developed more sophisticated methods for inserting random content. As a result, most spam continues to fool signature-based filters.
Rule-based (Heuristic) Filtering
Rule-based filters scan e-mail content for predetermined words or phrases that may indicate a message is spam. For example, if an e-mail administrator includes the word “sex” on a company’s rule-based list, any e-mail containing this word will be filtered.
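A keyword rule of this kind amounts to a simple word lookup. The following is a minimal sketch, assuming whole-word, case-insensitive matching; `BLOCKED_WORDS` and `contains_blocked_word` are hypothetical names, not part of any particular product.

```python
import re

# Hypothetical rule list maintained by the e-mail administrator.
BLOCKED_WORDS = {"sex"}


def contains_blocked_word(message: str, blocked=BLOCKED_WORDS) -> bool:
    """Return True if any whole word in the message is on the rule list."""
    words = re.findall(r"[a-z]+", message.lower())
    return any(word in blocked for word in words)
```

The rule fires on any message containing the word, regardless of who sent it or why, which is the root of the weaknesses that follow.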
The major drawback of this approach is the difficulty in identifying keywords that are consistently indicative of spam. While spammers may frequently use the words “sex” and “Viagra” in spam e-mails, these words are also used in legitimate business correspondence, particularly in the healthcare industry. Additionally, spammers have learned to obfuscate suspect words by using spellings such as “S*E*X” or “VI a a GRR A”.
It is impossible to develop dictionaries that identify every possible misspelling of “spammy” keywords. Additionally, because filtering for certain keywords produces large numbers of false positives, many organizations have found they cannot afford to rely solely on rule-based filters to identify spam.
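Both failure modes can be demonstrated with two hypothetical matchers: `naive_match` looks for whole words, while `collapsed_match` strips everything except letters first in an attempt to see through obfuscation. Neither is taken from a real filter; they are only a sketch of the trade-off.

```python
import re

BLOCKED_WORDS = {"sex", "viagra"}  # hypothetical rule list


def naive_match(message: str) -> bool:
    """Whole-word lookup against the rule list."""
    words = re.findall(r"[a-z]+", message.lower())
    return any(word in BLOCKED_WORDS for word in words)


def collapsed_match(message: str) -> bool:
    """Drop all non-letters, then search for blocked words as substrings."""
    letters = re.sub(r"[^a-z]", "", message.lower())
    return any(word in letters for word in BLOCKED_WORDS)


samples = [
    "S*E*X inside",         # obfuscated spam
    "VI a a GRR A",         # obfuscated with extra letters
    "Patient sex: female",  # legitimate healthcare message
    "Sussex exports",       # legitimate, contains the letters "sex"
]
results = {s: (naive_match(s), collapsed_match(s)) for s in samples}
```

The whole-word matcher misses both obfuscated messages and flags the healthcare message. The more aggressive matcher catches “S*E*X” but still misses “VI a a GRR A”, and it now flags “Sussex exports” as well. Tightening the rule trades missed spam for more false positives.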
Blacklists
The goal of blacklisting is to force Internet Service Providers (ISPs) to crack down on customers who send spam. A blacklisted ISP is blocked from sending e-mail to organizations that use the blacklist. When an ISP is blacklisted, it is provided with a list of actions it must take in order to be removed from the blacklist. This controversial method blocks not just the spammers, but all of the ISP’s customers. Blacklisting is generally considered an unfriendly approach to stopping spam because the users most affected by the blacklist are e-mail users who do not send spam. Many argue blacklisting actually damages the utility of e-mail more than it helps stop spam since the potential for blocking legitimate e-mail is so high.
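The collateral damage follows directly from how such a lookup works. The sketch below assumes the blacklist stores an ISP's entire address range; the range shown is a reserved documentation block, and `BLACKLISTED_NETWORKS` and `is_blacklisted` are hypothetical names.

```python
import ipaddress

# Hypothetical entry: the whole address range of one blacklisted ISP.
BLACKLISTED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_blacklisted(sender_ip: str) -> bool:
    """Reject mail from any address inside a blacklisted network."""
    address = ipaddress.ip_address(sender_ip)
    return any(address in network for network in BLACKLISTED_NETWORKS)


spammer = "203.0.113.7"              # the customer who sent the spam
innocent_customer = "203.0.113.200"  # another customer of the same ISP
other_isp = "198.51.100.10"          # sender at an unlisted ISP
```

Here `is_blacklisted` returns `True` for both the spammer and the innocent customer, because the check looks only at the network, not at the individual sender's behavior.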
In addition to the ethical considerations, there are other problems with blacklists. Many blacklists are not updated frequently enough to maintain effectiveness. Some blacklist administrators are irresponsible in that they immediately block suspect servers without thoroughly investigating complaints or giving the ISP time to respond. Another downside is that blacklists are not accurate enough to catch all spam. Only about half of the servers used by spammers, regardless of how diligent the blacklist administrator may be, are ever cataloged in a given blacklist.
Blacklists are used because they can be partially effective against spammers who repeatedly use the same ISP or e-mail account to send spam. However, because spammers often change ISPs, re-route e-mail and hijack legitimate servers, the spammer is a moving target. Blacklist administrators are forced to constantly revise lists, and the lag time between when a spammer begins using a given server and when the blacklist administrator is able to identify the new spam source and add it to the blacklist allows spammers to send hundreds of millions of e-mails. Spammers consider this constant state of flux a part of doing business and are constantly looking for new servers to send spam messages.
Blacklists, therefore, have some utility in stopping known spammers. Because of their limitations, however, this data should only be used in conjunction with other sources to determine if a given message is spam.
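One way to read "in conjunction with other sources" is as a combined score in which a blacklist hit counts as evidence but is not decisive on its own. The point values and threshold below are invented for illustration, and `spam_score` and `is_spam` are hypothetical names.

```python
# Hypothetical weights: none of these numbers come from a real product.
BLACKLIST_POINTS = 4
SIGNATURE_POINTS = 6
POINTS_PER_KEYWORD = 1
MAX_KEYWORD_POINTS = 3
SPAM_THRESHOLD = 6


def spam_score(on_blacklist: bool, signature_match: bool, keyword_hits: int) -> int:
    """Add up the evidence from each independent check."""
    score = 0
    if on_blacklist:
        score += BLACKLIST_POINTS
    if signature_match:
        score += SIGNATURE_POINTS
    score += min(keyword_hits * POINTS_PER_KEYWORD, MAX_KEYWORD_POINTS)
    return score


def is_spam(on_blacklist: bool, signature_match: bool, keyword_hits: int) -> bool:
    """Flag the message only when the combined evidence reaches the threshold."""
    return spam_score(on_blacklist, signature_match, keyword_hits) >= SPAM_THRESHOLD
```

With these assumed weights, a message from a blacklisted server is delivered unless something else about it also looks suspicious, for example `is_spam(True, False, 2)`.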
Whitelists
Whitelists are databases of trusted e-mail sources. The list may contain specific e-mail addresses, IP addresses or trusted domains. E-mails received from a whitelisted source are allowed to pass through the system to the user’s e-mail box. The list is built when users and e-mail administrators manually add trusted sources to the whitelist. Once built, the catch rate for spam can be close to 100%; however, whitelists produce an inordinate number of false positives.
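A whitelist check is the mirror image of a blacklist: anything not explicitly trusted is rejected. The sketch below assumes a list of exact addresses and a list of domains; all entries and the name `is_whitelisted` are hypothetical.

```python
# Hypothetical entries added by users and administrators.
TRUSTED_ADDRESSES = {"alice@example.com"}
TRUSTED_DOMAINS = {"partner.example"}


def is_whitelisted(sender: str) -> bool:
    """Deliver only if the sender's address or domain is explicitly trusted."""
    sender = sender.strip().lower()
    domain = sender.rpartition("@")[2]
    return sender in TRUSTED_ADDRESSES or domain in TRUSTED_DOMAINS
```

Spam from unknown senders never gets through, but neither does a legitimate first message from, say, `newsletter@shop.example`, which is how the false positives arise.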
It is virtually impossible to produce an exhaustive list of all possible legitimate e-mail senders because legitimate e-mail can come from any number of sources. To get around this difficulty, some organizations have instituted a challenge-response methodology. When an unknown sender sends an e-mail to a user’s account, the system automatically sends a challenge back to the sender. Some challenge-response systems require the sender to read and decipher an image containing letters and numbers. The image is designed to be unreadable by a machine, but easily recognizable by a human. Spammers would not spend the time required to go through a large number of challenge-response e-mails, so they drop the address and move on to those users who don’t use such a system.
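The challenge-response flow can be sketched as a small state machine: mail from unknown senders is held, a challenge is issued, and a correct answer releases the held mail and adds the sender to the whitelist. The class and method names below are hypothetical, and the random code stands in for the text that a real system would render as a distorted image.

```python
import secrets


class ChallengeResponseGate:
    """Hold mail from unknown senders until they answer a challenge."""

    def __init__(self, whitelist):
        self.whitelist = set(whitelist)
        self.pending = {}  # sender -> (expected answer, list of held messages)

    def receive(self, sender: str, message: str) -> str:
        if sender in self.whitelist:
            return "delivered"
        if sender not in self.pending:
            # In a real system this code would be sent back inside an image.
            self.pending[sender] = (secrets.token_hex(3), [])
        self.pending[sender][1].append(message)
        return "challenged"

    def answer(self, sender: str, response: str) -> list:
        """Release held mail and trust the sender if the response is correct."""
        expected, held = self.pending.get(sender, (None, []))
        if expected is None or response != expected:
            return []
        del self.pending[sender]
        self.whitelist.add(sender)
        return held
```

A sender who never answers, as an automated spam run would not, simply leaves messages sitting in `pending`; a human who answers once is trusted from then on.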
Whitelists are only partially successful and are impractical for many users. For example, problems can arise when users register for online newsletters, order products online or register for online services. If the user does not remember to add the new e-mail source to their whitelist, or if the domain or source is entered incorrectly, the communication will fail. Additionally, whitelists impose barriers to legitimate e-mail communication and are viewed by some as just plain rude.
Whitelists are not widely used by e-mail users and administrators as a primary tool to fight spam because of the high number of false positives and the difficulties in creating a comprehensive list of e-mail sources. Because whitelists are not widely used, spammers typically do not develop countermeasures. As with other spam-fighting techniques, whitelists are most effective when used in conjunction with other anti-spam tools.