The DFHG Witnesses Catalog - Text Reuse Detection is a tool developed by the DFHG Project for automatically detecting text reuses of the Greek fragmentary historians collected in the Fragmenta Historicorum Graecorum (FHG) by Karl Müller. It complements the other tools of the DFHG Project.
The tool lets users automatically detect text reuses in a source text, either by entering the URL of an XML file or by selecting one of the PerseusDL/OGL editions (to obtain the list of PerseusDL/OGL editions, leave the XML file URL field empty and press Submit). The resulting XML files of the source texts can be downloaded and include a DFHG attribute marking up the reuse. A warning message informs users about Perseus and OGL XML files that are no longer available in their repositories.
PerseusDL is the Perseus Digital Library collection of Greek and Latin texts (GitHub). OGL is the Open Greek and Latin collection, which also includes the First One-Thousand Years of Greek texts (GitHub).
The "text reuse detection" functionality is based on the Smith–Waterman algorithm, which performs local sequence alignment to detect similar regions between two strings. Smith–Waterman has been used for aligning DNA sequences and for detecting plagiarism and collusion by comparing sequences of text.
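To illustrate what local sequence alignment means in practice, here is a minimal Python sketch of Smith–Waterman applied to two lists of word tokens. It is not the DFHG implementation: the function name `smith_waterman`, the scoring values (`match=2`, `mismatch=-1`, `gap=-1`), the linear gap penalty, and the choice to align whole words rather than characters are all assumptions made for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    a and b are sequences (lists of tokens or plain strings).
    Scoring values are illustrative assumptions, not the DFHG settings.
    A gap in either aligned sequence is represented by None.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # Clamping at 0 is what makes the alignment local.
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1

    return best, out_a[::-1], out_b[::-1]


# Placeholder tokens standing in for a fragment and a source passage.
fragment = "x y the quick brown fox z".split()
source = "q the quick brown fox w v".split()
score, left, right = smith_waterman(fragment, source)
print(score, left, right)
```

With these assumed scores, the shared run of four words is recovered as the best local alignment, while the unrelated words on either side are ignored. That property is what makes the algorithm suitable for finding a reused passage embedded in a longer text.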
The Text Reuse Detection enables users to select the following entries of the DFHG Witnesses Catalog: