Creating a Passive Document Store

Add one or more collections of documents from the Internet or intranets with the help of a Web robot.

The Web robot manages the download of Web content and sends it to the passive document store, which indexes it and makes it available for searching.

  1. In the Enterprise Explorer, expand Sybase Search Servers > (Sybase Search Connection Profile) > Sybase Search Server > Document Stores.
  2. Right-click Internet Import Manager and select New Web Robot.
  3. Enter appropriate values for the fields that appear in the New Passive Stores wizard:
    Table 1. Create Passive Document Store

    Field: Description
    Name: Enter the name of the Web robot.
    Robot Manager: Select the robot manager to host the Web robot.
    Document Group Nonmembers: Select the document group with which you want to associate the new passive document store.
    Document Group Members: Lists the document groups in which the document store is a member.
    Start URL: Enter the URL you want the Web robot to visit first.
    Extract Pattern: Populate the Link Extractor Pattern and Index Pattern fields based on the start URL you have entered.
    Link Extractor Pattern: Download and extract any pages for which the URL matches one of these patterns. These pages are placed in the URL (work) queue.
    Index Pattern: Index the pages downloaded from URLs that match one of these patterns.
    Link Extractor Pattern Exceptions: List the exceptions to the general rules specified in the Link Extractor Pattern field.
    Index Pattern Exceptions: List the exceptions to the general rules specified in the Index Pattern field.
    Regular Expression: Select to process the input as a regular expression.

  4. (Optional) Click Next to enter the HTTP authentication and agent details.

    Field: Description

    Agent details
    User Agent: Enter the user agent name that corresponds to the HTTP user agent request header. This value is sent with all HTTP requests.

    Timeout values (each from 1 to 60 seconds)
    Courtesy: Enter the time, in seconds, you want the Web robot to wait between successful HTTP requests.
    Error: Enter the time, in seconds, you want the Web robot to wait between unsuccessful HTTP requests. This is typically slightly longer than the courtesy timeout value, to allow the network and target Web server time to recover before the next attempt.
    Connect: Enter the time, in seconds, you want the Web robot to wait to connect to the target Web server.
    Read: Enter the time, in seconds, you want the Web robot to wait on a connection to receive a response.

    Maximum values
    Download Pages: Enter the maximum number of pages you want the Web robot to download before it auto-terminates and saves what it has searched so far.
    Duration: Enter the maximum length of time you want the Web robot to spend downloading before it auto-terminates and saves what it has searched so far. Because this amount of time may extend into days, you must specify it in ISO 8601 format.
    Consecutive Fails: Enter the maximum number of consecutive failures you want the Web robot to handle before it auto-terminates and saves what it has searched so far.
    Page Tries: Enter the maximum number of times you want the Web robot to attempt to download any Web page. Setting Page Tries to a higher value enables Web robots to overcome temporary network or Web server failures.

    HTTP authentication details
    URL (prefix): Enter the prefix to the URLs that require authentication, for example, http://example.net/protected/.
    Realm: Enter the name of the realm, if applicable.
    Username: Enter the user name required for authentication.
    Password: Enter the password required for authentication.

    Form authentication details
    Authenticating URL: Enter the URL that performs the authentication. This is the URL to which the HTML form is submitted.
    Method: Select the request method.
    Username Key Value: Enter the name of the form input field that represents the user name, for example, username, uname, or usr.
    Username Value: Enter the user name value, for example, jsmith.
    Password Key Value: Enter the name of the form input field that represents the password, for example, password, passwd, or pwd.
    Password Value: Enter the password value.

    Default Page Names: Enter the default Web page names that you want the Web robot to match with the target Web server’s welcome file list, for example, index.html, index.jsp. This enables the Web robot to determine that the following URLs are equivalent and that only one version needs to be indexed:
    • http://example.net/
    • http://example.net/index.html
  5. Click Finish to save the changes and create the passive store.
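
The following Python sketch illustrates how three of the settings in this procedure might be interpreted. It is not Sybase Search code. It assumes that Regular Expression is selected and that a pattern must match the whole URL, and that the Duration field takes an ISO 8601 duration limited to days, hours, minutes, and seconds. All names and sample values in it are hypothetical, and the product's actual matching rules may differ.

```python
import re
from datetime import timedelta
from urllib.parse import urlsplit, urlunsplit

# Hypothetical values for the wizard fields; chosen only for illustration.
LINK_EXTRACTOR_PATTERNS = [r"http://example\.net/.*"]
LINK_EXTRACTOR_EXCEPTIONS = [r"http://example\.net/private/.*"]
INDEX_PATTERNS = [r"http://example\.net/docs/.*"]
INDEX_EXCEPTIONS = [r"http://example\.net/docs/drafts/.*"]
DEFAULT_PAGE_NAMES = ["index.html", "index.jsp"]


def matches_any(url, patterns):
    """Return True if the whole URL matches at least one regular expression."""
    return any(re.fullmatch(pattern, url) for pattern in patterns)


def should_extract_links(url):
    """A URL is followed if it matches a pattern and no exception (assumed rule)."""
    return (matches_any(url, LINK_EXTRACTOR_PATTERNS)
            and not matches_any(url, LINK_EXTRACTOR_EXCEPTIONS))


def should_index(url):
    """A page is indexed if its URL matches a pattern and no exception (assumed rule)."""
    return (matches_any(url, INDEX_PATTERNS)
            and not matches_any(url, INDEX_EXCEPTIONS))


def canonical_url(url):
    """Drop a trailing default page name so that equivalent URLs compare equal."""
    parts = urlsplit(url)
    directory, _, last_segment = parts.path.rpartition("/")
    if last_segment in DEFAULT_PAGE_NAMES:
        parts = parts._replace(path=directory + "/")
    return urlunsplit(parts)


# Assumed subset of ISO 8601 durations: PnDTnHnMnS (no weeks, months, or years).
_DURATION = re.compile(r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?")


def parse_duration(text):
    """Convert a duration such as 'P2DT12H' into a timedelta."""
    match = _DURATION.fullmatch(text)
    if not match or not any(match.groups()):
        raise ValueError(f"not a supported ISO 8601 duration: {text!r}")
    days, hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    for url in ("http://example.net/docs/guide.html",
                "http://example.net/docs/drafts/x.html",
                "http://example.net/private/a.html"):
        print(url, should_extract_links(url), should_index(url))
    print(canonical_url("http://example.net/index.html"))
    print(parse_duration("P2DT12H"))
```

With these sample values, a page under /docs/ is both followed and indexed, a page under /docs/drafts/ is followed but not indexed, and a page under /private/ is neither. Keeping exceptions as separate lists, as the wizard does, lets the general patterns stay broad while specific areas are carved out.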
Related concepts
Document Stores
Related tasks
Creating a File System Document Store
Creating a Database Document Store
Viewing Document Store Metrics
Editing a Passive Document Store
Deleting a Passive Document Store
