Last modified: March 08, 2011
Contents
The toolkit defines a number of free function which are not image methods. These are defined in different modules and can be imported in a python script with
from gamera.toolkits.greekocr.compare import levenshtein, errorrate
from gamera.toolkits.greekocr.unicode_teubner import unicode_to_teubner
Returns the given unicode string to a LaTeX document body using the Teubner style for representing Greek characters and accents. Signature:
unicode_to_teubner (unicode_text)
The returned LaTeX code does not contain the LaTeX header. To create a complete LaTeX document, you can use the following code:
# LaTeX header
print "\documentclass[10pt]{article}"
print "\usepackage[polutonikogreek]{babel}"
print "\usepackage[or]{teubner}"
print "\\begin{document}"
print "\selectlanguage{greek}"
# document body
print unicode_to_teubner(unicode_string)
# LaTex footer
print "\end{document}"
Computes the Levenshtein distance (aka edit distance) between the two Unicode strings s1 and s2. Signature:
levenshtein(s1, s2)
This implementation differs from the plugin edit_distance in the Gamera core in two points:
- the Gamera core function is implemented in C++ and currently only works with ASCII strings
- this implementation is written in pure Python and therefore somewhat slower, but it works with Unicode strings.
For details about the algorithm see http://en.wikibooks.org/wiki/Algorithm_implementation/Strings/Levenshtein_distance
For the two given Unicode strings, the edit distance divided by the length of the first string is returned. Signature:
errorrate(groundtruth, ocr)