Bayesian Filters
Named after Thomas Bayes, an English mathematician, Bayesian logic is used in decision making and inferential statistics. Bayesian filters maintain a database of known spam and ham, or legitimate e-mail. Once the database is large enough, the system ranks words according to the probability that they will appear in a spam message.
Words more likely to appear in spam are given a high score (between 51 and 100), and words more likely to appear in legitimate e-mail are given a low score (between 1 and 50). For example, the words “free” and “sex” generally have values between 95 and 98, whereas the words “emphasis” or “disadvantage” may have a score between 1 and 4.
Commonly used words such as “the” and “that”, and words new to the Bayesian filter, are given a neutral score between 40 and 50 and are not used in the system’s algorithm.
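The scoring scheme described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name, the count dictionaries and the neutral default of 45 are hypothetical, and the ratio used here is one common way to estimate how "spammy" a word is, not necessarily the formula any particular product uses.

```python
NEUTRAL_SCORE = 45  # assumed value inside the 40-50 neutral band for unseen words


def word_score(word, spam_counts, ham_counts, n_spam, n_ham):
    """Return a 1-100 score: high means the word is seen mostly in spam."""
    spam_rate = spam_counts.get(word, 0) / n_spam  # share of spam messages containing the word
    ham_rate = ham_counts.get(word, 0) / n_ham     # share of ham messages containing the word
    if spam_rate + ham_rate == 0:
        return NEUTRAL_SCORE  # word is new to the filter
    probability = spam_rate / (spam_rate + ham_rate)
    return max(1, min(100, round(probability * 100)))


# Hypothetical database built from 100 spam and 100 ham messages.
spam_counts = {"free": 95, "the": 50}
ham_counts = {"free": 5, "the": 50, "emphasis": 40}

for word in ("free", "emphasis", "the", "xyzzy"):
    print(word, word_score(word, spam_counts, ham_counts, 100, 100))
```

With these made-up counts, "free" lands at the high end of the range, "emphasis" at the low end, and both the common word and the unseen word fall in the neutral band.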
When the system receives an e-mail, it breaks the message down into tokens, or words with values assigned to them. The system uses the tokens with scores at the high and low ends of the range to develop a score for the e-mail as a whole. If the e-mail has more spam tokens than ham tokens, it will have a high spam score. The e-mail administrator sets the threshold score that the system uses to decide whether an e-mail is allowed to pass through to users.
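As a rough sketch of the process just described, the Python below tokenizes a message, skips tokens in the neutral band, and combines the rest into one score that is compared against a threshold. The combining rule is the naive Bayes product popularized by open-source Bayesian filters. It is an assumption here, as are the function names, the sample scores and the threshold of 90.

```python
import re

NEUTRAL_LOW, NEUTRAL_HIGH = 40, 50  # neutral band from the text; these tokens are ignored
DEFAULT_SCORE = 45                  # assumed score for words the filter has never seen


def tokenize(message):
    """Split a message into lowercase word tokens."""
    return re.findall(r"[a-z']+", message.lower())


def message_score(message, scores):
    """Combine per-token scores (1-100) into one 0-100 message score."""
    spam_product = 1.0
    ham_product = 1.0
    for token in set(tokenize(message)):
        score = scores.get(token, DEFAULT_SCORE)
        if NEUTRAL_LOW <= score <= NEUTRAL_HIGH:
            continue  # neutral tokens carry no evidence either way
        p = min(max(score, 1), 99) / 100  # clamp so neither product can reach zero
        spam_product *= p
        ham_product *= 1 - p
    return round(100 * spam_product / (spam_product + ham_product))


def is_spam(message, scores, threshold=90):
    """Apply the administrator's threshold (90 is a hypothetical setting)."""
    return message_score(message, scores) >= threshold


# Hypothetical token scores in the ranges quoted in the article.
scores = {"free": 97, "sex": 96, "emphasis": 2, "disadvantage": 3, "the": 45}
print(message_score("Free sex, the best", scores))
print(message_score("The emphasis is a disadvantage", scores))
```

A message with no scored tokens comes out at 50, which reflects that the filter has no evidence in either direction.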
Bayesian filters are effective at filtering spam and minimizing false positives. Because they adapt and learn based on user feedback, Bayesian filters produce better results the longer they are used within an organization.
Bayesian filters are not, however, foolproof. Spammers have learned which words Bayesian filters consider spammy and have developed ways to insert non-spammy words into e-mails to lower a message’s overall spam score. By adding paragraphs of text from novels or news stories, spammers can dilute the effect of high-ranking words. Text insertion has also caused normally legitimate words found in novels or news stories to acquire inflated spam scores, which may render Bayesian filters less effective over time.
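The dilution effect can be seen with a few made-up numbers. The sketch below assumes a simple naive Bayes product over every scored token. The token scores are hypothetical, and real filters may limit how many tokens they consider, which changes how well padding works.

```python
from math import prod


def combined_score(token_scores):
    """Combine 1-100 token scores into a single 0-100 message score."""
    ps = [min(max(s, 1), 99) / 100 for s in token_scores]  # clamp to avoid zero products
    spam = prod(ps)
    ham = prod(1 - p for p in ps)
    return round(100 * spam / (spam + ham))


spam_only = [97, 96]                 # e.g. two strongly spammy words
padded = spam_only + [5, 8, 10, 4]   # same words plus four low-scoring "novel" words

print(combined_score(spam_only))  # score before padding
print(combined_score(padded))     # score after padding is far lower
```

In this toy case, four innocuous words are enough to pull the message from the top of the scale to near the bottom.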
Another approach spammers use to fool Bayesian filters is to create less spammy e-mails. For example, a spammer may send an e-mail containing only the phrase, “Here’s the link…”. This approach can neutralize the spam score and entice users to click on a link to a Web site containing the spammer’s message. To block this type of spam, the filter would have to be designed to follow the link and scan the content of the Web site users are asked to visit. Bayesian filters do not currently employ this type of filtering because it would be prohibitively expensive in terms of server resources and could be exploited to launch denial-of-service attacks against commercial servers.
As with all single-method spam filtering methodologies, Bayesian filters are effective against certain techniques spammers use to fool spam filters, but they are not a magic bullet for solving the spam problem. Bayesian filters are most effective when combined with other methods of spam detection.
The Solution
When used alone, each anti-spam technique has been systematically overcome by spammers. Grandiose plans to rid the world of spam, such as charging a penny for each e-mail received or forcing servers to solve mathematical problems before delivering e-mail, have been proposed with few results. These schemes are not realistic: to be effective, they would require a large percentage of the population to adopt the same spam eradication method.
Working alone, each individual spam-blocking technique works with varying degrees of effectiveness and is susceptible to a certain number of false positives. Fortunately, a solution is already at hand. IronMail®, the secure e-mail gateway appliance from CipherTrust®, provides a highly accurate solution by correlating the results of single-detection techniques with its industry-leading correlation engine, Spam Profiler™.
The core of IronMail’s anti-spam capabilities, Spam Profiler analyzes, inspects and scores e-mail on over one thousand different message characteristics. Each method is weighted based on historical accuracy rates and analysis by CipherTrust’s experienced research team.
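The general idea of weighting several detection methods can be sketched as a weighted average. This is not Spam Profiler’s actual algorithm, which the article does not describe. The detector names, scores and weights below are hypothetical and serve only to show how a more trusted method can count for more in the final score.

```python
def correlate(results, weights):
    """Weighted average of per-detector spam scores (each 0-100)."""
    total_weight = sum(weights[name] for name in results)
    if total_weight == 0:
        raise ValueError("at least one detector must have a non-zero weight")
    return sum(results[name] * weights[name] for name in results) / total_weight


# Hypothetical detector outputs for one message.
results = {"bayesian": 90, "blacklist": 100, "header_analysis": 20}
# Hypothetical relative weights, e.g. derived from each method's past accuracy.
weights = {"bayesian": 5, "blacklist": 3, "header_analysis": 2}

print(correlate(results, weights))
```

Here one dissenting low score does not clear the message, because the two more heavily weighted methods both rate it as likely spam.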
Optimizing Spam Profiler requires precise calibration and the testing of thousands of combinations of values associated with various message characteristics. To automate this process, CipherTrust developed Genetic Optimization™, an advanced analysis technique that replicates cutting-edge DNA matching models. Genetic Optimization identifies the best possible combination of values for all characteristics examined by Spam Profiler and automatically tunes the IronMail appliance, reducing administrator intervention and assuring optimum protection against spam and spam-borne threats.
Take The Next Step
Learn more about how IronMail can secure enterprise e-mail systems by visiting www.ciphertrust.com or requesting CipherTrust’s free whitepaper, “Controlling Spam: The IronMail Way”. This resource will provide the information you need to make an informed decision about eliminating spam and securing your e-mail systems.
CipherTrust is the leader in anti-spam and email security. Learn more by downloading our free whitepaper, “Controlling Spam: The IronMail Way” or by visiting www.ciphertrust.com.