Configuring MPFs

The MPF classes utilize a strategy to best compress all the paragraphs from documents, favoring documents of average length (where the average length is implied from the MPF configuration). Each paragraph is written to disk in one of two ways:

The first technique is used initially because the compression scheme works better with more data, so the paragraphs take up less space on disk. The second technique is used once the paragraph group allocation is exhausted.

The paragraphs are not all written together because it is often necessary to read individual paragraphs from disk. Compressing all the paragraphs together would force the application to read and decompress every paragraph to access the one it needs. The grouping provides a balance between data compression and disk I/O.

The number of paragraphs in any one paragraph group is not fixed; groups accept new paragraphs until the data buffer’s soft limit is reached. “Soft” indicates that a limit can be exceeded, but the group is then closed. The ideal scenario is when all the paragraphs from a document fit exactly within the allocated number of paragraph groups. Unused paragraph groups result in redundancy.
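The soft-limit behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the MPF implementation: the function name `group_paragraphs` and its return shape are hypothetical, paragraph size is taken as the byte length of each paragraph, and the `omniq.index.mpf.maxTotalGroupEntries` limit is ignored for brevity.

```python
def group_paragraphs(paragraphs, soft_limit=8192, max_groups=5):
    """Split one document's paragraphs into groups plus leftovers.

    Hypothetical helper for illustration only. Returns (groups, ungrouped):
    groups is a list of paragraph lists; ungrouped holds the paragraphs
    that arrive after the group allocation is exhausted.
    """
    groups = []
    ungrouped = []
    current = []
    current_size = 0
    for para in paragraphs:
        if len(groups) >= max_groups:
            # Allocation exhausted: this paragraph cannot join a group.
            ungrouped.append(para)
            continue
        # A group always accepts the paragraph, even if that pushes it
        # past the soft limit; the group is closed afterwards.
        current.append(para)
        current_size += len(para)
        if current_size >= soft_limit:
            groups.append(current)
            current = []
            current_size = 0
    if current:
        # Final, partially filled group.
        groups.append(current)
    return groups, ungrouped


paragraphs = [b"x" * 3000] * 4
groups, ungrouped = group_paragraphs(paragraphs, soft_limit=8192, max_groups=1)
print([len(g) for g in groups], len(ungrouped))  # prints [3] 1
```

In the usage example, the third paragraph takes the only group to 9000 bytes, which exceeds the soft limit, so the group closes and the fourth paragraph is left ungrouped.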

You can configure the paragraph grouping using the MPF parameters shown in Table 4-30. The MPF parameters are defined for all document stores in a container and are set in the main container file, Container.<uid>.xml.

Table 4-30: MPF parameters

| Parameter | Default | Description |
| --- | --- | --- |
| omniq.index.mpf.docsPerFile | 20 | The number of documents stored in each MPF. |
| omniq.index.mpf.filesPerFolder | 250 | The number of MPFs stored in each folder. |
| omniq.index.mpf.foldersPerFolder | 50 | The number of MPF folders stored per folder. |
| omniq.index.mpf.maxParagraphGroups | 5 | The maximum number of paragraph groups to allocate per document. |
| omniq.index.mpf.maxTotalGroupEntries | 50 | The maximum number of paragraphs from any one document that can be in a paragraph group. |
| omniq.index.mpf.bufferSoftLimit | 8192 | The ideal number of bytes an uncompressed paragraph group can consume before it is closed, compressed, and written to disk. This limit is usually slightly exceeded by design. |
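
To get a feel for what the default values imply, the short Python calculation below multiplies them out. It is a rough sketch under stated assumptions: that each parent folder holds exactly one level of MPF folders, and that the grouped capacity per document is approximately the number of paragraph groups times the soft limit. Neither assumption is confirmed by the parameter descriptions.

```python
# Default values copied from Table 4-30.
DOCS_PER_FILE = 20
FILES_PER_FOLDER = 250
FOLDERS_PER_FOLDER = 50
MAX_PARAGRAPH_GROUPS = 5
BUFFER_SOFT_LIMIT = 8192  # bytes, uncompressed

# Documents held by one folder of MPFs.
docs_per_mpf_folder = DOCS_PER_FILE * FILES_PER_FOLDER

# Assumption: a parent folder holds FOLDERS_PER_FOLDER such MPF folders.
docs_per_parent_folder = docs_per_mpf_folder * FOLDERS_PER_FOLDER

# Assumption: grouped capacity per document is roughly groups x soft limit
# (slightly more in practice, because the soft limit can be exceeded).
grouped_bytes_per_doc = MAX_PARAGRAPH_GROUPS * BUFFER_SOFT_LIMIT

print(docs_per_mpf_folder)     # prints 5000
print(docs_per_parent_folder)  # prints 250000
print(grouped_bytes_per_doc)   # prints 40960
```

Under these assumptions, a document with about 40 KB of uncompressed paragraph text would roughly fill its default group allocation.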