unicharambigs man page on DragonFly

UNICHARAMBIGS(5)					      UNICHARAMBIGS(5)

NAME
       unicharambigs - Tesseract unicharset ambiguities

DESCRIPTION
       The unicharambigs file (a component of traineddata, see
       combine_tessdata(1)) is used by Tesseract to represent possible
       ambiguities between characters or groups of characters.

       The file contains a number of lines, laid out as follows:

	   [num] <TAB> [char(s)] <TAB> [num] <TAB> [char(s)] <TAB> [num]

       Field one     the number of characters
		     contained in field two

       Field two     the character sequence to
		     be replaced

       Field three   the number of characters
		     contained in field four

       Field four    the character sequence
		     used to replace field two

       Field five    contains either 1 or 0. 1
		     denotes a mandatory
		     replacement, 0 denotes an
		     optional replacement.

       Characters appearing in fields two and four should appear in
       unicharset. The numbers in fields one and three refer to the number of
       unichars (not bytes).
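
       The five-field layout can be illustrated with a minimal Python sketch.
       It is not Tesseract's own parser: the names Ambig and parse_ambig_line
       are hypothetical, and the sketch assumes, based on the EXAMPLE section,
       that the unichars inside fields two and four are separated by single
       spaces.

```python
from typing import NamedTuple, Tuple


class Ambig(NamedTuple):
    """One unicharambigs entry (hypothetical container, not a Tesseract type)."""
    source: Tuple[str, ...]   # field two: unichars to be replaced
    target: Tuple[str, ...]   # field four: unichars used as the replacement
    mandatory: bool           # field five: True for 1, False for 0


def parse_ambig_line(line: str) -> Ambig:
    """Parse one TAB-separated line into an Ambig, checking the declared counts."""
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 TAB-separated fields, got {len(fields)}")
    n_source, source_text, n_target, target_text, flag = fields

    # Assumption: unichars within a field are separated by spaces.
    source = tuple(source_text.split())
    target = tuple(target_text.split())

    # Fields one and three count unichars, not bytes.
    if len(source) != int(n_source):
        raise ValueError("field one does not match the unichars in field two")
    if len(target) != int(n_target):
        raise ValueError("field three does not match the unichars in field four")
    if flag not in ("0", "1"):
        raise ValueError("field five must be 0 or 1")

    return Ambig(source, target, flag == "1")
```

       For instance, parse_ambig_line("1\tm\t2\tr n\t0") yields an optional
       replacement of the single unichar m by the two unichars r and n.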

EXAMPLE
	   2	   ' '	   1	   "	 1
	   1	   m	   2	   r n	 0
	   3	   i i i   1	   m	 0

       In this example, all instances of the 2 character sequence '' will
       always be replaced by the 1 character sequence "; the 1 character
       sequence m may be replaced by the 2 character sequence rn; and the 3
       character sequence iii may be replaced by the 1 character sequence m.
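
       The difference between mandatory and optional entries can be sketched
       as follows. This is only an illustration of the semantics described
       above, not how Tesseract applies ambiguities internally; the function
       apply_mandatory and the list EXAMPLE_RULES are hypothetical names, and
       the rules are written out by hand from the three example lines.

```python
# Each rule is (source unichars, target unichars, mandatory flag),
# transcribed by hand from the three example lines.
EXAMPLE_RULES = [
    (("'", "'"), ('"',), True),        # 2  ' '    1  "    1
    (("m",), ("r", "n"), False),       # 1  m      2  r n  0
    (("i", "i", "i"), ("m",), False),  # 3  i i i  1  m    0
]


def apply_mandatory(unichars, rules):
    """Rewrite a unichar sequence using only the mandatory rules.

    Optional rules are skipped here: they describe alternatives that may
    be considered, not substitutions that always happen.
    """
    out = []
    i = 0
    while i < len(unichars):
        for source, target, mandatory in rules:
            if mandatory and tuple(unichars[i:i + len(source)]) == source:
                out.extend(target)
                i += len(source)
                break
        else:
            # No mandatory rule matched at this position; keep the unichar.
            out.append(unichars[i])
            i += 1
    return out
```

       Applied to the unichars ', ', m, i, i, only the first rule fires: the
       two apostrophes collapse into a double quote, while m is left alone
       because its entry is optional.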

HISTORY
       The unicharambigs file first appeared in Tesseract 3.00. Prior to that,
       a similar format called DangAmbigs (dangerous ambiguities) was used.
       That format was almost identical, except that only mandatory
       replacements could be specified and field 5 was absent.

BUGS
       This is a documentation "bug": it’s not currently clear what should be
       done in the case of ligatures (such as fi) which may also appear as
       regular letters in the unicharset.

SEE ALSO
       tesseract(1), unicharset(5)

AUTHOR
       The Tesseract OCR engine was written by Ray Smith and his research
       groups at Hewlett Packard (1985-1995) and Google (2006-present).

				  02/09/2012		      UNICHARAMBIGS(5)
