AGAVA SpamProtexx
Features
HOW WILL SPAMPROTEXX SAVE MY TIME?

SpamProtexx does not delete your email. How does it save your time then?

We have run a few tests, and we are confident that the main problem with spam is not its overall volume, but the fact that it arrives minute by minute and distracts you from whatever you are working on.

Read more on the usage strategy that we propose...

Advanced Algorithm & Database Features

If you have used Bayesian spam filters before, you might be interested in this section. Most implementations of the Bayesian concept have a number of weaknesses, which have been deliberately addressed in SpamProtexx.

1. Training Error Sensitivity. If you submit a message for training under the wrong category (spam as non-spam, or non-spam as spam), most implementations write changes to the filtering database that dramatically degrade classification quality: messages that you have already trained on may no longer be classified correctly.

SpamProtexx uses a proprietary mechanism that monitors the health of the database and removes such effects. Moreover, if you simply correct your error by submitting the same message for training under the proper class, the result of the previous training is cancelled.
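
For the technically curious, the cancellation idea can be sketched in a few lines of Python. This is only an illustration, not SpamProtexx's actual code: the `TrainingLog` class, the whitespace tokenizer and the SHA-256 message key are all assumptions made for the example, and the health monitoring described above is not modelled at all.

```python
import hashlib
from collections import Counter


class TrainingLog:
    """Hypothetical word-count database that remembers how each message was trained."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.trained_as = {}  # message digest -> class it was last trained under

    def train(self, text, label):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        tokens = text.lower().split()
        previous = self.trained_as.get(key)
        if previous == label:
            return  # already trained under this class, nothing to do
        if previous is not None:
            # Same message, other class: undo the earlier (mistaken) training.
            self.counts[previous].subtract(tokens)
            self.counts[previous] += Counter()  # drops zero and negative counts
        self.counts[label].update(tokens)
        self.trained_as[key] = label


log = TrainingLog()
log.train("cheap pills now", "ham")   # submitted to the wrong class by mistake
log.train("cheap pills now", "spam")  # the correction cancels the first training
```

After the second call, the non-spam counts no longer contain any trace of the message, which is the behaviour described above.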

2. Over-Training. Spam often arrives in batches of the same kind, so you will sometimes submit several samples of virtually the same spam. Most filters train on all of them, which pushes the coefficients for those words very high and leads to so-called over-training of the database.

SpamProtexx solves this problem by trying to classify each message before training on it. If it can already classify the message correctly, it does not accept it for training. This ensures that the database does not over-train on any one type of message.
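
The "classify first, train only on mistakes" rule can be illustrated with a toy filter. The sketch below is not SpamProtexx's algorithm: `TinyBayes` is a hypothetical, Laplace-smoothed naive Bayes classifier with a whitespace tokenizer, and the choice to always accept the first message of each class is an assumption made so that the empty database can get started.

```python
import math
from collections import Counter


class TinyBayes:
    """Hypothetical naive Bayes filter that trains only on misclassified mail."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.counts = {label: Counter() for label in self.LABELS}
        self.messages = {label: 0 for label in self.LABELS}

    def classify(self, text):
        tokens = text.lower().split()
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) + 1
        total_messages = sum(self.messages.values())
        best, best_score = None, -math.inf
        for label in self.LABELS:
            # Laplace-smoothed log prior plus log likelihood of every token.
            score = math.log((self.messages[label] + 1) / (total_messages + 2))
            total = sum(self.counts[label].values())
            for token in tokens:
                score += math.log((self.counts[label][token] + 1) / (total + vocab))
            if score > best_score:
                best, best_score = label, score
        return best

    def train(self, text, label):
        """Return True if the message was used for training, False if skipped."""
        if self.messages[label] and self.classify(text) == label:
            return False  # already classified correctly: skip to avoid over-training
        self.counts[label].update(text.lower().split())
        self.messages[label] += 1
        return True


f = TinyBayes()
f.train("buy cheap pills", "spam")         # accepted: first spam sample
f.train("meeting notes attached", "ham")   # accepted: first non-spam sample
f.train("buy cheap pills", "spam")         # skipped: already recognised as spam
```

Submitting the same spam ten more times would leave the word counts unchanged, so the coefficients for that type of spam cannot run away.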

3. HTML Tags. Traditional Bayesian filters tend to generate a lot of false positives on HTML mail. This happens because much spam arrives as HTML, and when such messages are submitted for training, standard HTML tags become spam words.

SpamProtexx uses an HTML parser to eliminate the influence of common HTML tags on email classification. Instead, SpamProtexx pays attention to the properties of these tags: fonts, paragraphs, body, images, etc. For instance, SpamProtexx is able to detect that spam often arrives in fonts of a certain color, size and typeface.
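
As a rough illustration of this approach (not the parser SpamProtexx actually ships), the sketch below uses Python's standard `html.parser` module. Bare tag names are ignored, while tag properties and visible text become features. The `FeatureExtractor` class, the `html_features` helper and the `tag.attribute=value` feature format are assumptions made for the example.

```python
from html.parser import HTMLParser


class FeatureExtractor(HTMLParser):
    """Hypothetical extractor: tag names are dropped, tag properties are kept."""

    def __init__(self):
        super().__init__()
        self.features = []

    def handle_starttag(self, tag, attrs):
        # A bare <p> or <font> adds nothing; only its properties become features.
        for name, value in attrs:
            if value is not None:
                self.features.append(f"{tag}.{name}={value.strip().lower()}")

    def handle_data(self, data):
        # Visible text is still tokenized as ordinary words.
        self.features.extend(data.lower().split())


def html_features(html):
    parser = FeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.features


print(html_features('<p><font color="RED" size="7">Buy now</font></p>'))
# ['font.color=red', 'font.size=7', 'buy', 'now']
```

With features like these, a filter can learn that large red fonts are typical of spam without treating every message that contains a `<p>` tag as suspicious.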

4. Message Headers. Messages are often short. They might contain only a few words in the Body, or even nothing but a Subject line. Some filters fail to classify such messages correctly, because they need a lot of content in the Body to make their decision.

SpamProtexx uses full message headers (1-2 kilobytes of information or more!) for its classification purposes. There’s a lot of data in the headers that helps SpamProtexx make the right decision. Even very short messages can be classified correctly by SpamProtexx.
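
To show how much material the headers provide, here is a small sketch using Python's standard `email` package. It is an assumption about how header data could be turned into classifier input, not a description of SpamProtexx internals; the `header_tokens` function and the `header:word` token format are hypothetical.

```python
from email import message_from_string


def header_tokens(raw_message):
    """Turn every header of a raw message into 'header-name:word' tokens."""
    msg = message_from_string(raw_message)
    tokens = []
    for name, value in msg.items():
        for word in str(value).lower().split():
            # Prefixing with the header name keeps "free" in Subject
            # distinct from "free" in, say, a Received line.
            tokens.append(f"{name.lower()}:{word}")
    return tokens


raw = "From: Promo <promo@example.com>\nSubject: Hi\n\nok\n"
print(header_tokens(raw))
# ['from:promo', 'from:<promo@example.com>', 'subject:hi']
```

Even this two-header message with a one-word body yields three tokens from the headers alone; a real message with routing and client headers yields many more.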

5. Common Words. Natural languages contain a lot of words (such as articles and prepositions) that are not specific to either spam or non-spam messages. Because a spam filter is usually trained on more spam samples than non-spam samples, these words pick up a spam weighting they do not deserve, and classification quality decreases.

SpamProtexx keeps a stop-list of such words and does not use them for classification.
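
A stop-list is simple to picture in code. The sketch below is illustrative only: the `STOP_WORDS` set is a tiny English sample invented for the example (the real SpamProtexx list is not published here), and `content_tokens` is a hypothetical helper.

```python
# Assumed sample list; a real stop-list would be much longer and language-aware.
STOP_WORDS = frozenset({"a", "an", "the", "of", "in", "on", "to", "and", "or", "for"})


def content_tokens(text, stop_words=STOP_WORDS):
    """Return the words of a message that are allowed to influence classification."""
    return [word for word in text.lower().split() if word not in stop_words]


print(content_tokens("The price of a watch"))
# ['price', 'watch']
```

Only "price" and "watch" reach the classifier, so the articles and prepositions never get a chance to skew the result.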

The world’s simplest spam filter.
Copyright © 2004 SoftProtexx.
All rights reserved.