Setting Filter Factory parameters

The Filter Factory parameters are loaded through the FilterFactory.default.xml configuration file.

The default filters in the configuration file are:

Each filter specifies several settings that determine which class is loaded for the filter, which paragraph extractor is used, and which MIME types the filter applies to. Table 4-12 shows the filter setting parameters.

Table 4-12: Filter settings

Parameter           Default  Description

className           None     The Java class that defines the filter.

extractorClassName  None     The Java class used for extracting paragraphs from the
                             filtered text.

mimeTypes           None     The list of MIME types that are associated with the filter.

timeout             45,000   Indicates the time in milliseconds the filter waits while
                             filtering a document. If the filter exceeds the given time,
                             the filter aborts. This parameter is used mainly by the
                             Stellent filter.

keepTempFiles       false    If set to true, the filter keeps any temporary files
                             produced during the filtering process. This is used mainly
                             by the Stellent filter.
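
The following Java sketch shows one way the settings in Table 4-12 could be held in memory and how a timeout such as the 45,000 ms default might be enforced. It is an illustration only. The FilterSettings class, its runWithTimeout method, the treatment of settings without a default as required, and the comma-separated form of mimeTypes are all assumptions made for this example. They are not taken from the product or from FilterFactory.default.xml.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical holder for the settings in Table 4-12; not part of the product API.
final class FilterSettings {
    static final long DEFAULT_TIMEOUT_MS = 45_000L;        // documented default
    static final boolean DEFAULT_KEEP_TEMP_FILES = false;  // documented default

    final String className;
    final String extractorClassName;
    final List<String> mimeTypes;
    final long timeoutMs;
    final boolean keepTempFiles;

    FilterSettings(Map<String, String> raw) {
        // Assumption: settings whose default is None must be supplied explicitly.
        className = require(raw, "className");
        extractorClassName = require(raw, "extractorClassName");
        // Assumption: MIME types are given as a comma-separated list.
        mimeTypes = List.of(require(raw, "mimeTypes").trim().split("\\s*,\\s*"));
        timeoutMs = raw.containsKey("timeout")
                ? Long.parseLong(raw.get("timeout").trim())
                : DEFAULT_TIMEOUT_MS;
        keepTempFiles = raw.containsKey("keepTempFiles")
                ? Boolean.parseBoolean(raw.get("keepTempFiles").trim())
                : DEFAULT_KEEP_TEMP_FILES;
    }

    private static String require(Map<String, String> raw, String key) {
        String value = raw.get(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required filter setting: " + key);
        }
        return value;
    }

    // Runs one filtering task on a worker thread and aborts it after timeoutMs.
    <T> T runWithTimeout(Callable<T> task) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it can stop
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> raw = new HashMap<>();
        raw.put("className", "com.example.HtmlFilter");           // hypothetical class
        raw.put("extractorClassName", "com.example.HtmlExtractor"); // hypothetical class
        raw.put("mimeTypes", "text/html, application/xhtml+xml");
        FilterSettings settings = new FilterSettings(raw);
        System.out.println("timeout (ms): " + settings.timeoutMs);
        System.out.println("result: " + settings.runWithTimeout(() -> "filtered text"));
    }
}
```

In this sketch, interrupting the worker thread only stops a task that responds to interruption. A real filter that ignores interrupts would need its own cancellation mechanism.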

In addition to the filter-specific settings, several general filter settings help the extractors determine paragraph boundaries. The filter keeps each paragraph between the minimum and maximum lengths and aims for the ideal paragraph length.

Table 4-13 shows the paragraph length settings.

Table 4-13: Paragraph settings

Parameter             Default  Description

default.minParaLen    250      The minimum number of characters in a paragraph

default.idealParaLen  500      The ideal number of characters in a paragraph

default.maxParaLen    1,000    The maximum number of characters in a paragraph
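
The interaction between the three paragraph length settings can be sketched in Java as follows. This is not the extractor that ships with the product. ParagraphSplitter and its methods are hypothetical names, and the strategy shown (cut at the whitespace nearest the ideal length, always leaving at least the minimum length for the next paragraph) is only one plausible way to meet the constraints. It also assumes that the maximum is at least twice the minimum, which holds for the defaults.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the Table 4-13 settings; not the product's extractor.
final class ParagraphSplitter {
    static final int DEFAULT_MIN_PARA_LEN = 250;
    static final int DEFAULT_IDEAL_PARA_LEN = 500;
    static final int DEFAULT_MAX_PARA_LEN = 1000;

    static List<String> split(String text, int minLen, int idealLen, int maxLen) {
        if (minLen < 1 || minLen > idealLen || idealLen > maxLen || maxLen < 2 * minLen) {
            throw new IllegalArgumentException("Need 1 <= min <= ideal <= max and max >= 2 * min");
        }
        List<String> paragraphs = new ArrayList<>();
        int pos = 0;
        // Keep cutting while the rest is too long for one paragraph, or long enough
        // to yield an ideal-length paragraph and still leave minLen characters behind.
        while (text.length() - pos > maxLen || text.length() - pos >= idealLen + minLen) {
            int remaining = text.length() - pos;
            int lo = minLen;
            int hi = Math.min(maxLen, remaining - minLen); // leave minLen for the tail
            int target = Math.max(lo, Math.min(idealLen, hi));
            int cut = nearestBreak(text, pos, lo, hi, target);
            paragraphs.add(text.substring(pos, pos + cut));
            pos += cut;
        }
        if (pos < text.length()) {
            // The remainder fits within maxLen. It is shorter than minLen only
            // when the whole input was shorter than minLen.
            paragraphs.add(text.substring(pos));
        }
        return paragraphs;
    }

    // Returns the length in [lo, hi] closest to target that ends right after
    // whitespace, or target itself when the window contains no whitespace.
    private static int nearestBreak(String text, int pos, int lo, int hi, int target) {
        for (int offset = 0; target - offset >= lo || target + offset <= hi; offset++) {
            int below = target - offset;
            if (below >= lo && Character.isWhitespace(text.charAt(pos + below - 1))) {
                return below;
            }
            int above = target + offset;
            if (above <= hi && Character.isWhitespace(text.charAt(pos + above - 1))) {
                return above;
            }
        }
        return target;
    }

    public static void main(String[] args) {
        StringBuilder sample = new StringBuilder();
        for (int i = 0; i < 400; i++) {
            sample.append("sample text ");
        }
        List<String> paragraphs = split(sample.toString(), DEFAULT_MIN_PARA_LEN,
                DEFAULT_IDEAL_PARA_LEN, DEFAULT_MAX_PARA_LEN);
        for (String paragraph : paragraphs) {
            System.out.println(paragraph.length());
        }
    }
}
```

The sketch keeps whitespace attached to the end of each paragraph, so concatenating the returned paragraphs reproduces the input and the lengths are counted on exactly the characters that were supplied.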