WHAT-IS.NET
Copyright ©2009 What-is.Net  All rights reserved.
Last Updated: Sep 2009
What are Spam Filters?
Regardless of how they calculate probabilities, these new statistical filters all share some important benefits:

1. They're very effective. Even the simplest statistical filter will catch 99% of current spam. The most effective filter I know of, Bill Yerazunis' CRM114, catches 99.8%. (Mine is lagging behind at about 99.7%.)

2. They generate few false positives. False positives, legitimate emails that are mistakenly treated as spam, are the bane of spam filtering. Statistical filters yield fewer false positives because they consider evidence of innocence as well as evidence of guilt. A token that occurs disproportionately often in your nonspam mail, like the name of a friend, will count as much toward decreasing the spam probability as a token like "cash" would toward increasing it.

3. They learn. You don't have to look through piles of spam and figure out rules to identify them. Whatever's in there, the filters tend to find it. Like us, statistical filters notice that the token "cash" is a sign of spam. However, they also notice that "modalities" (used in a surprisingly high proportion of Nigerian spams) and "FF0000" (HTML for bright red) are even better signs of spam. And as spammers change their messages or their infrastructure, the filters adapt.

4. They let each user define what's spam. Although statistical filters could be used at the network level, ideally the probabilities should be calculated individually for each user. To the extent users' definitions of spam differ, their inboxes will reflect this.

5. They're hard to trick. There are only two ways to get past a statistical filter: use fewer bad words, or use more innocent words. Spammers can't do the latter, because the most innocent words (words related to your friends and family, your work, your interests) vary for each user. So they have to use fewer bad words. They can't use weird spellings (e.g. "Freee" instead of "Free") because filters quickly learn those. Their only option is to use vaguer and vaguer euphemisms, or simply to have some generic-sounding text and a link.
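
To make points 2 and 5 concrete, here is a minimal sketch of how per-token evidence might be combined into one score. The combining rule (a naive Bayes product that treats tokens as independent), the function name combine, the clamping bounds, and every probability value below are assumptions chosen for illustration; the text above does not say which formula any particular filter uses.

```python
import math

def combine(probabilities, low=0.01, high=0.99):
    """Combine per-token spam probabilities, assuming tokens are independent.

    Values are clamped to [low, high] (an assumed safeguard) so that a single
    token can never force the result to exactly 0 or 1.
    """
    clamped = [min(max(p, low), high) for p in probabilities]
    spam_score = math.prod(clamped)
    ham_score = math.prod(1.0 - p for p in clamped)
    return spam_score / (spam_score + ham_score)

# Hypothetical per-token probabilities, invented for this example.
guilty_only = combine([0.9])        # a "cash"-like token on its own
balanced = combine([0.9, 0.1])      # the same token plus a friend's name
misspelled = combine([0.9, 0.95])   # "cash" plus a learned misspelling such as "Freee"
```

Under these assumed numbers, the friend's name pulls the score back to the midpoint, so the innocent token offsets the guilty one exactly. A misspelling that the filter has already learned pushes the score higher instead of hiding the message.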
Spam filters are software programs that sort incoming mail in order to identify and pull out junk mail, also known as spam. Spam filters can be installed on Internet mail servers, on private network servers, or on personal computers. Spam is not only bothersome but can be used for spreading malicious code like viruses and Trojans, and for perpetuating phishing scams. For these reasons and more, a spam filter is a great way to help protect your computer or network and cut out junk mail.

The first generation of spam filters used rules to recognize specific spam features. Now a new generation of statistical spam filters seems to offer significantly better performance. Statistical filters look at the entire contents of each incoming email and decide whether it's spam based on its overall similarity to previous spams. This new kind of filter routinely catches over 99% of current spam with near zero false positives.

The simplest statistical filter can be described in a paragraph. Users discard all their spam in a separate trash can. At intervals, a program looks through all the user's email and, for each token, calculates the ratio of spam occurrences to total occurrences.
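
The paragraph above can be sketched in a few lines. This is only an illustration under stated assumptions: the tokenizer (lowercased runs of letters, digits, "$" and apostrophes), the function names tokenize and spam_ratios, and the sample messages are all invented here and are not taken from any specific filter.

```python
import re
from collections import Counter

def tokenize(message):
    """Split a message into lowercase word-like tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9$']+", message.lower())

def spam_ratios(spam_messages, nonspam_messages):
    """For each token, return spam occurrences divided by total occurrences."""
    spam_counts = Counter(t for m in spam_messages for t in tokenize(m))
    nonspam_counts = Counter(t for m in nonspam_messages for t in tokenize(m))
    ratios = {}
    for token in spam_counts.keys() | nonspam_counts.keys():
        total = spam_counts[token] + nonspam_counts[token]
        ratios[token] = spam_counts[token] / total
    return ratios

# Invented sample mail: the user's spam trash can and the rest of their email.
spam = ["cash cash now", "free cash offer"]
nonspam = ["lunch with alice now", "alice sent the report"]
ratios = spam_ratios(spam, nonspam)
```

With this sample mail, "cash" appears only in spam and gets a ratio of 1.0, "alice" appears only in nonspam mail and gets 0.0, and "now" appears once in each and gets 0.5. A practical filter would likely also smooth the ratios for rare tokens, which this sketch leaves out.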