Mail::SpamAssassin::Plugin::TextCat




NAME

Mail::SpamAssassin::Plugin::TextCat − TextCat language guesser

SYNOPSIS

  loadplugin     Mail::SpamAssassin::Plugin::TextCat

DESCRIPTION

This plugin will try to guess the language used in the message body text.

Use the "ok_languages" directive to set which languages are considered okay for incoming mail. If the guessed language is not okay, the "UNWANTED_LANGUAGE_BODY" rule is triggered.

The plugin always adds the results to an "X-Language" name-value pair in the message metadata structure. This may be useful as Bayes tokens and can also be used in rules for scoring. The results can also be added to marked-up messages using "add_header" with the _LANGUAGES_ tag. See Mail::SpamAssassin::Conf for details.

Note: the language cannot always be recognized with sufficient confidence. In that case, no action is taken.
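
A minimal configuration sketch tying these pieces together follows. The score value and the header name "Languages" are illustrative assumptions, not defaults taken from this page. SpamAssassin prefixes header names given to "add_header" with "X-Spam-".

  loadplugin     Mail::SpamAssassin::Plugin::TextCat

  # Languages considered okay; anything else can trigger UNWANTED_LANGUAGE_BODY
  ok_languages   en de

  # Illustrative score only; choose a value that suits your site
  score          UNWANTED_LANGUAGE_BODY 2.0

  # Adds an "X-Spam-Languages" header carrying the guessed languages to all mail
  add_header all Languages _LANGUAGES_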

USER OPTIONS

ok_languages xx [ yy zz ... ] (default: all)

This option is used to specify which languages are considered okay for incoming mail. SpamAssassin will try to detect the language used in the message body text.

Note that the language cannot always be recognized with sufficient confidence. In that case, no action is taken.

The rule "UNWANTED_LANGUAGE_BODY" is triggered if none of the languages detected are in the "ok" list. Note that this is the only effect of the "ok" list. It does not act as a whitelist against any other form of spam scanning.

In your configuration, you must use the two- or three-letter language specifier in lowercase, not the English name for the language. You may also specify "all" if a desired language is not listed, or if you want to allow any language. The default setting is "all".

Examples:

  ok_languages all         # allow all languages
  ok_languages en          # only allow English
  ok_languages en ja zh    # allow English, Japanese, and Chinese

Note: if there are multiple ok_languages lines, only the last one is used.
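
As an illustration of that rule, in the following sketch only the second line takes effect, so English is not in the final "ok" list. To allow all three languages, put them on a single line.

  ok_languages en
  ok_languages ja zh       # this line wins; "en" is no longer allowed

  ok_languages en ja zh    # one line listing every allowed language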

Select the languages to allow from the list below:
af − Afrikaans
am − Amharic
ar − Arabic
be − Byelorussian
bg − Bulgarian
bs − Bosnian
ca − Catalan
cs − Czech
cy − Welsh
da − Danish
de − German
el − Greek
en − English
eo − Esperanto
es − Spanish
et − Estonian
eu − Basque
fa − Persian
fi − Finnish
fr − French
fy − Frisian
ga − Irish Gaelic
gd − Scottish Gaelic
he − Hebrew
hi − Hindi
hr − Croatian
hu − Hungarian
hy − Armenian
id − Indonesian
is − Icelandic
it − Italian
ja − Japanese
ka − Georgian
ko − Korean
la − Latin
lt − Lithuanian
lv − Latvian
mr − Marathi
ms − Malay
ne − Nepali
nl − Dutch
no − Norwegian
pl − Polish
pt − Portuguese
qu − Quechua
rm − Rhaeto-Romance
ro − Romanian
ru − Russian
sa − Sanskrit
sco − Scots
sk − Slovak
sl − Slovenian
sq − Albanian
sr − Serbian
sv − Swedish
sw − Swahili
ta − Tamil
th − Thai
tl − Tagalog
tr − Turkish
uk − Ukrainian
vi − Vietnamese
yi − Yiddish
zh − Chinese (both Traditional and Simplified)
zh.big5 − Chinese (Traditional only)
zh.gb2312 − Chinese (Simplified only)

inactive_languages xx [ yy zz ... ] (default: see below)

This option is used to specify which languages will not be considered when trying to guess the language. For performance reasons, supported languages that have fewer than about 5 million speakers are disabled by default. Note that listing a language in "ok_languages" automatically enables it for that user.

The default setting is:
  bs cy eo et eu fy ga gd is la lt lv rm sa sco sl yi

That list is Bosnian, Welsh, Esperanto, Estonian, Basque, Frisian, Irish Gaelic, Scottish Gaelic, Icelandic, Latin, Lithuanian, Latvian, Rhaeto-Romance, Sanskrit, Scots, Slovenian, and Yiddish.
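
As a sketch of how the two options interact, and assuming that setting "inactive_languages" replaces the default list rather than adding to it, the following keeps most of the defaults disabled while letting the guesser consider Estonian, Lithuanian, and Latvian:

  # Default list minus "et", "lt" and "lv"
  inactive_languages bs cy eo eu fy ga gd is la rm sa sco sl yi

  # Listing "is" here enables Icelandic for this user even though
  # it still appears in inactive_languages
  ok_languages en is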

textcat_max_languages N (default: 3)

The maximum number of languages that may match before the classification is considered unknown.

textcat_optimal_ngrams N (default: 0)

If the number of ngrams is lower than this number, they will be removed. This can speed up the program for longer inputs. For shorter inputs, set this to 0.

textcat_max_ngrams N (default: 400)

The maximum number of ngrams that should be compared with each of the language models (note that each of those models is used completely).

textcat_acceptable_score N (default: 1.02)

Include any language that scores at least "textcat_acceptable_score" in the returned list of languages.
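
For reference, the four tuning options above can be written out explicitly. The values below simply restate the documented defaults, so they are a starting point for experiments rather than recommended settings.

  textcat_max_languages     3
  textcat_optimal_ngrams    0
  textcat_max_ngrams        400
  textcat_acceptable_score  1.02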





