Automatic Text Indexer

[The Indexer opens in a new window]

A simple, lightweight program: upload a PDF and supply a list of desired terms, and the Indexer returns a handsomely formatted, alphabetized index that pairs each term with a complete list of the page numbers where it appears in the text.

Operation:

  • Upload a PDF

  • In the terms field, list each term on its own line in the following format: "Term = Term1, Term2, Term3", where "Term" is the entry as it will appear in your index, and "Term1", "Term2", and so on are alternate forms of that word that you would like the Indexer to match as well

  • For example, if you want the entry "Rome" to appear in your index and to cover the words Rome, Roman, and Romans, your line would read: "Rome = Rome, Roman, Romans"

  • Press the Generate Index button

  • Copy your index, or save it as a .txt file
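
To make the terms format concrete, here is a minimal sketch of how a line such as "Rome = Rome, Roman, Romans" could be read. It is not the Indexer's actual code: the function name parse_term_lines is hypothetical, and the handling of a line with no "=" (treating the entry as its own only form) is an assumption.

```python
def parse_term_lines(text):
    """Parse lines of the form 'Entry = Form1, Form2' into a dict
    mapping each index entry to the list of word forms to match."""
    entries = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        entry, sep, forms = line.partition("=")
        entry = entry.strip()
        variants = [f.strip() for f in forms.split(",") if f.strip()] if sep else []
        # Assumption: with no "=" (or nothing after it), match the entry itself.
        entries[entry] = variants or [entry]
    return entries


terms = parse_term_lines("Rome = Rome, Roman, Romans\nlight")
print(terms)  # {'Rome': ['Rome', 'Roman', 'Romans'], 'light': ['light']}
```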

Features:

  • Avoids false positives (e.g. indexing the term "light" will not also return "enlightened" or "slightly").

  • Skips terms that appear only in the chapter header on a given page. This is accomplished by ignoring the top 80 pixels of each page.

  • Alphabetizes your term list automatically; there is no need to do that ahead of time.

  • Terms appearing on multiple consecutive pages will not have each page listed individually: 93, 94, and 95 will appear as "93-95".

  • Page numbers are derived from the PDF file itself, not from numbered text on the page. Please be sure, therefore, that the first page of your PDF is page number 1, and so on.
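
The features above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Indexer's own implementation: all function names are hypothetical, each page is represented as plain text (or as hypothetical (top, text) word pairs for the header step), matching is assumed to be case-insensitive, and the "Entry: pages" output layout is a guess at the format.

```python
import re

HEADER_HEIGHT = 80  # "top 80 pixels"; units assumed to match the word coordinates


def body_text(words, header_height=HEADER_HEIGHT):
    """Join the words that sit below the header band.
    `words` is a hypothetical list of (top, text) pairs for one page."""
    return " ".join(text for top, text in words if top >= header_height)


def pages_matching(pages, forms):
    """Return the 1-based positions of pages where any form occurs as a whole word.
    The page number is the page's position in the file, not printed text."""
    alternatives = "|".join(re.escape(form) for form in forms)
    # \b word boundaries keep "light" from matching "enlightened" or "slightly".
    # Assumption: matching ignores case.
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return [number for number, text in enumerate(pages, start=1) if pattern.search(text)]


def collapse_ranges(page_numbers):
    """Turn [93, 94, 95, 101] into '93-95, 101'."""
    pages = sorted(set(page_numbers))
    parts = []
    i = 0
    while i < len(pages):
        start = end = pages[i]
        # Extend the run while the next page is consecutive.
        while i + 1 < len(pages) and pages[i + 1] == end + 1:
            i += 1
            end = pages[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ", ".join(parts)


def build_index(pages, entries):
    """Build an alphabetized index; `entries` maps each entry to its word forms."""
    lines = []
    for entry in sorted(entries, key=str.casefold):
        found = pages_matching(pages, entries[entry])
        if found:
            lines.append(f"{entry}: {collapse_ranges(found)}")
    return "\n".join(lines)


pages = [
    # Page 1: "Rome" sits in the header band, so only the body text is kept.
    body_text([(40, "Rome"), (120, "The"), (120, "light"), (120, "fades")]),
    "The Romans were enlightened",  # page 2
    "Roman law",                    # page 3
    "slightly dim light",           # page 4
]
index = build_index(pages, {"Rome": ["Rome", "Roman", "Romans"], "light": ["light"]})
print(index)
```

Here "Rome" is found on pages 2 and 3, which collapse to "2-3", while "light" is found on pages 1 and 4 only, because "enlightened" and "slightly" are not whole-word matches.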

"The Temple is holy because it is not for sale." -Ezra Pound, Cantos. 1925.
