PhiloComp.net

The Signature Stylometric System

A User-Friendly System for Textual Analysis

Welcome to the home page of Signature, a program designed to facilitate "stylometric" analysis and comparison of texts, with a particular emphasis on author identification. The collage below on the right illustrates the sorts of task for which Signature can be used: comparing the styles of Jane Austen and other novelists; examining the "authorial signature" of the plays written by (or controversially attributed to) Shakespeare; establishing the provenance of ancient manuscripts such as the shared books of Aristotle's Ethics; identifying the author of the unattributed Federalist Papers; and investigating the relationships between Biblical scriptures (e.g. Did "Luke" write Acts? Did Paul write Hebrews?).

Do You Have an Interesting Stylometric Project?

Since the release of version 1.0 of Signature (which can be downloaded below), the system has been radically enhanced, with a wide range of new facilities, including:

  • More powerful file-handling and filtering tools
  • Ability to specify relevant alphabets and punctuation etc. for different languages/genres
  • Wordlist facilities extended to accommodate phrases of specified length(s)
  • Similar facilities for bigrams/trigrams etc.
  • Choice of keyness measures for key word/phrase identification
  • Fully automatic creation of frequent word/phrase lists
  • Automated monitoring of previously specified words
  • Powerful concordancer, which also supports punctuation and proximity searches etc.
  • Principal Component Analysis, applicable to all data types
  • Burrows' Delta analysis, applicable to all data types
  • Multiple chi-square analysis, applicable to all data types
  • Main parameters of all facilities easily configurable
  • Comprehensive help and theoretical documentation
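
For readers unfamiliar with Burrows' Delta, the following Python sketch may help to convey the general idea. It is not Signature's own code, and every name in it (burrows_delta, relative_frequencies, top_n) is hypothetical, chosen purely for illustration. It assumes a simple tokeniser (lower-case letters and apostrophes only), takes the most frequent words across the candidate texts, standardises each word's relative frequency as a z-score, and reports the mean absolute difference between the z-scores of each candidate and those of the disputed text. A lower score suggests a closer stylistic match.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    # Assumption: words are runs of lower-case letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(text, vocabulary):
    # Relative frequency of each vocabulary word within one text.
    words = tokenize(text)
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in vocabulary]


def burrows_delta(candidates, disputed, top_n=30):
    """Return {candidate name: Delta distance from the disputed text}.

    candidates: dict mapping a name to a sample text (hypothetical input shape).
    """
    # Vocabulary: the top_n most frequent words across all candidate texts.
    pooled = Counter()
    for text in candidates.values():
        pooled.update(tokenize(text))
    vocabulary = [w for w, _ in pooled.most_common(top_n)]

    profiles = {name: relative_frequencies(text, vocabulary)
                for name, text in candidates.items()}

    # Mean and standard deviation of each word's frequency across candidates.
    columns = list(zip(*profiles.values()))
    means = [mean(col) for col in columns]
    # Assumption: a zero deviation is replaced by 1.0 to avoid dividing by zero.
    stdevs = [pstdev(col) or 1.0 for col in columns]

    def z_scores(profile):
        return [(f - m) / s for f, m, s in zip(profile, means, stdevs)]

    target = z_scores(relative_frequencies(disputed, vocabulary))

    # Delta: mean absolute difference between the two sets of z-scores.
    return {name: mean(abs(a - b) for a, b in zip(z_scores(p), target))
            for name, p in profiles.items()}
```

In practice the candidate samples would be substantial texts (for instance the Federalist Papers grouped by known author), and the candidate with the smallest score would be the one whose use of common words most resembles that of the disputed text.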

Investigation is also under way to test the feasibility of incorporating grammatical analysis into the concordancer, so as to enable grammar-informed searching etc. If this proves feasible, the concordancer will also be further integrated with the graphing and data analysis facilities.

Later versions of Signature were used in the high-profile investigations linked from this page (on the Obama autobiography and the J.K. Rowling pseudonym issue) – some of the facilities can be seen in the reports provided, and in the recordings of news programs. But the new Signature will not be made publicly available until the possibilities noted above have been investigated, and until the system is fully polished, tested, and documented. However, if you have an interesting project for which you think the more powerful version of Signature would be useful, you are welcome to get in touch with Peter Millican and suggest it.

Download Signature 1.0

This program is freeware for educational use, but please respect the copyright, and ensure that if you pass it on you do so without charge, make clear its authorship, and leave all documentation intact. The program is provided in two forms, first as a standard ZIP archive, and then as a self-extracting ZIP file. In both cases it is packaged together with the Federalist Papers, collated by known author, to serve as sample texts for getting started:

This is a very basic version, in terms of both features and documentation. But it provides a useful first step in stylometry, and the tests that it includes can be used to recreate many classic analyses.

Using the System

Having downloaded the ZIP archive, extract it into an appropriate directory (e.g. "C:\Signature") and start the system by running the file "Signature.exe".

Signature screenshot

Documentation

A PowerPoint presentation is provided in the package, to give a straightforward introduction to the ideas of stylometric analysis and the Signature system in a manner suitable for private study, or a taught course on literary computing. Use PowerPoint to print out handouts (six slides per page) for a useful quick-reference guide:

PowerPoint presentation: Introduction to Textual Analysis using Signature.

Full documentation will in due course be provided in a comprehensive Help file, which is currently in preparation.

Prepared Textual Resources

Although Signature can operate on standard text and HTML files, it is often desirable to prepare these appropriately before use (e.g. by enclosing metadata in "<...>" tag brackets, so as to exclude it from the analysis). This particularly applies to files from Project Gutenberg, which are otherwise extremely useful for the purpose, but which have extensive front/back matter that needs to be marked out if it is not to distort the stylometric results. The following files contain small archives of pre-prepared files, most of them deriving from the Gutenberg archives: