KinoSearch1::Analysis::Token(3) User Contributed Perl Documentation
NAME
KinoSearch1::Analysis::Token - unit of text
SYNOPSIS
# private class - no public API
PRIVATE CLASS
You can't actually instantiate a Token object at the Perl level --
however, you can affect individual Tokens within a TokenBatch by way of
TokenBatch's (experimental) API.
DESCRIPTION
Token is the fundamental unit used by KinoSearch1's Analyzer
subclasses. Each Token has 4 attributes: text, start_offset,
end_offset, and pos_inc (for position increment).
The text of a token is a string.
A Token's start_offset and end_offset locate it within a larger text,
even if the Token's text attribute gets modified -- by stemming, for
instance. The Token for "beating" in the text "beating a dead horse"
begins life with a start_offset of 0 and an end_offset of 7; after
stemming, the text is "beat", but the end_offset is still 7.
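The offset behavior can be illustrated with a minimal sketch in plain Perl. Because Token is a private class, the sketch does not use KinoSearch1 at all: it models each token as an ordinary hash, and the helpers tokenize() and toy_stem() are hypothetical names invented for this illustration. The toy stemmer only strips a trailing "ing" and is not the stemmer KinoSearch1 uses.

```perl
use strict;
use warnings;

# Hypothetical model of a Token as a plain hash. The real Token class is
# private and cannot be instantiated from Perl.
sub tokenize {
    my ($source) = @_;
    my @tokens;
    while ( $source =~ /(\w+)/g ) {
        push @tokens, {
            text         => $1,
            start_offset => $-[1],    # where the match begins in $source
            end_offset   => $+[1],    # one past the last matched character
            pos_inc      => 1,        # default position increment
        };
    }
    return \@tokens;
}

# Toy stand-in for a stemmer (an assumption for illustration only):
# strips a trailing "ing" and changes nothing else.
sub toy_stem {
    my ($tokens) = @_;
    for my $token (@$tokens) {
        $token->{text} =~ s/ing\z//;    # only the text attribute changes
    }
    return $tokens;
}

my $source = "beating a dead horse";
my $tokens = toy_stem( tokenize($source) );
my $first  = $tokens->[0];

# Prints the stemmed text with its untouched offsets.
printf "%s start=%d end=%d\n",
    $first->{text}, $first->{start_offset}, $first->{end_offset};

# The offsets still recover the original word from the source text.
my $original = substr( $source, $first->{start_offset},
    $first->{end_offset} - $first->{start_offset} );
print "$original\n";
```

Keeping the offsets separate from the text is what lets the stemmed token still point back at the exact span of the original document it came from.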
The position increment, which defaults to 1, is an advanced tool for
manipulating phrase matching. Ordinarily, Tokens are assigned
consecutive position numbers: 0, 1, and 2 for "three blind mice".
However, if you set the position increment for "blind" to, say, 1000,
then the three tokens will end up assigned to positions 0, 1, and 1001
-- and will no longer produce a phrase match for the query '"three
blind mice"'.
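The arithmetic above can be reproduced with a short plain-Perl sketch. It does not call KinoSearch1. The helpers assign_positions() and is_consecutive() are hypothetical, and the rule they encode (a token receives the running position, and its pos_inc then advances the position for the tokens that follow) is an assumption chosen because it yields the positions quoted in the text.

```perl
use strict;
use warnings;

# Hypothetical helper: give each token the running position, then advance
# the running position by that token's pos_inc. This ordering is assumed
# here because it reproduces the 0, 1, 1001 example.
sub assign_positions {
    my (@tokens) = @_;
    my $position = 0;
    my @positions;
    for my $token (@tokens) {
        push @positions, $position;
        $position += $token->{pos_inc};
    }
    return @positions;
}

# Hypothetical check: treat a phrase as matching only when its terms sit
# at consecutive positions.
sub is_consecutive {
    my (@positions) = @_;
    for my $i ( 1 .. $#positions ) {
        return 0 if $positions[$i] != $positions[ $i - 1 ] + 1;
    }
    return 1;
}

# Tokens modelled as plain hashes with the default increment of 1.
my @default = map { +{ text => $_, pos_inc => 1 } } qw( three blind mice );

# Same tokens, but with the increment for "blind" raised to 1000.
my @spaced = map { +{ %$_ } } @default;
$spaced[1]{pos_inc} = 1000;

my @default_positions = assign_positions(@default);
my @spaced_positions  = assign_positions(@spaced);

print "@default_positions\n";    # 0 1 2
print "@spaced_positions\n";     # 0 1 1001

print is_consecutive(@default_positions) ? "phrase match\n" : "no phrase match\n";
print is_consecutive(@spaced_positions)  ? "phrase match\n" : "no phrase match\n";
```

In this model the gap opens between "blind" and "mice", so a phrase query that requires adjacent positions can no longer match all three terms in sequence.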
COPYRIGHT
Copyright 2006-2010 Marvin Humphrey
LICENSE, DISCLAIMER, BUGS, etc.
See KinoSearch1 version 1.01.
perl v5.14.1 2011-06-20 KinoSearch1::Analysis::Token(3)