About Perseus under PhiloLogic

Perseus Project Texts Loaded under PhiloLogic
Final season for PhiloLogic 3, Summer 2018

News and updates

Welcome to Perseus under PhiloLogic. It's been a while since we released an official update, but here it is, thanks to indefatigable wrangling (of C, Perl, Apache and many other arcana) this summer by Walt Shandruk. The intervening years have seen significant work, especially on the side of the Greek texts, and some enhancements to the functionality of PhiloLogic for ... philologists. I'd like to thank all those who have written to us over the years to report typos and other errors, or simply to report a server outage. We are glad you keep coming back for more.

If the Perseus mothership at Tufts represents a well-outfitted library carrel, with texts, commentaries, dictionaries and other resources all within your reach on the same page, then the organization of this site may come as a surprise. Rather than reading-with-apparatus, we aim to offer exploration of the texts through a panoply of corpus queries. Because many texts have been parsed by hand, and the rest of them with the computer, you can search in ways that you don't see in too many other places: For instance, search for present imperatives in Plato, or the particle μέν only in lines spoken by Ismene. Looking for forms of βροτός at the end of the sentence? We are here to serve them up. For experienced and post-beginner readers, we also recommend tools that will by now be familiar, I hope: the Keyword in Context view, and the Collocation tool.

We are grateful for all problem reports and user suggestions; keep them (and your donations:-)) coming. To keep abreast of developments, consider following us on Twitter: @LogeionGkLat. Work that awaits: we really want to incorporate more texts, which are steadily becoming available from the Perseus Digital Library. More importantly, we need to adapt the new generation of PhiloLogic, PhiloLogic4, to the needs of classicists (think: navigation that is not by page number in an edition; lemma searches; ...), so that we no longer depend on fifteen-year-old technology. Stay tuned.

Background: where do those texts come from?

The texts we make available on this site are practically all used by permission from the Perseus Project at Tufts University, the foremost Digital Library for the classical world, if not for the Humanities in general. In its collection of Greek and Roman materials, readers will find many of the canonical texts read today. The Greek collection approaches 8 million words and the Latin collection currently has 5.5 million. In addition, many English language dictionaries, other reference works, translations, and commentaries are included, so that anyone with an internet connection has access to the equivalent of a respectable College Classics library. The Greek and Latin texts are richly encoded for content rather than form (e.g., not page breaks, initials, and indents, but speaker information, metrical information, and milestones). The Perseus site is further enriched by intricate linking mechanisms among texts (resulting in more than 30 million links). For licensing information, details on editors and translators, etc., click on the XML Header links that show up in the bibliographical details of the texts.

What did you do to the texts? or: Where is the mirror?

Here you will find a selection of the texts at the Tufts site, but the mechanism for browsing and searching them is a different one. It is PhiloLogic, a system developed especially for large textual databases by the ARTFL project at the University of Chicago. While the original Perseus site is an excellent tool for linear reading, putting all kinds of resources on the same page while a user reads a passage, we were interested in leveraging the rich encoding for searching the texts, and for other tasks that are less about reading and more about research: corpus linguistics, above all. We are grateful that the Perseus Project makes its texts available to third parties, and continue to live in hope that other not-for-profit institutions devoted to text curation will enhance their search and analysis offerings, or follow the example of Perseus and decide to make their data available for advanced analysis with systems other than their own. Please get in touch, or download your own copy of PhiloLogic, which is open-source.

Why doesn't your site give me Cicero to read when I type in Cicero in the search box?

It is important to understand that a PhiloLogic search form is not like a Google search box. The main search box is for words that occur in the text, so that by typing 'Gallia est' you will find the opening sentence of the Gallic Wars, but entering 'Julius Caesar' will in the first instance lead you to texts of Catullus and Cicero. Starting from our homepage, if you wish to read a work by a certain author, click on the initial letter of the author's name to see a list of authors and works; type in the abbreviation of the work (e.g., Caes. Gal.) in the citation search box, or click on the link to the full search form, where you can use the Author and Title fields.

Why aren't you more like Google?

PhiloLogic is designed to leverage the rich structural encoding that Perseus texts offer, and therefore to know the difference between types of content: words in the texts, versus the so-called metadata: authors, titles, and much more. It is also designed to allow for precise answers to specific questions, rather than ballpark estimates of the 'are you feeling lucky' type. If you search for the word 'amicitia' in texts, or for the name 'Pseudolus', we don't want you to find instances from titles, or speaker indications -- unless you specify that that's the kind of information: titles that include amicitia, words spoken by Pseudolus, that you want. We believe that both approaches have their advantages but that more precise searching is something that classicists tend to want. In sum, before entering anything in a search field, ask yourself what kind of search this is: a word search or a search for metadata. If your search is for metadata, find the fitting field elsewhere on the search form. Tip: By clicking on the buttons next to the search fields, you will always get a listing of your options.

Why are my results different when I search that other Greek corpus?

Several important distinctions: Most importantly, that corpus is probably much much larger than the selection offered here, and the texts are often of more recent vintage. On the other hand, the texts may not have been disambiguated, so that guesses about frequency may always be at the high end and include lemmata that do not in fact occur in the texts or do not occur with the frequency asserted. We would like to see the functionality of searching by part-of-speech, or by specified attribute (such as speaker), and better leveraging of parsing in everybody's corpus, but we are in no position to know what goes on behind closed doors. More questions? Happy to chat, of course.

How do I use this site? Where did all the search forms go?

One type of reaction we heard a lot about the original Perseus under PhiloLogic site was that the search forms were rather intimidating to the novice. While keeping the search forms in place for all the power users who have by now become accustomed to them, we decided to offer a radically simplified front page for all our resources, with a pared-down set of search options. On this new home page, you can now navigate to texts directly by entering a citation, search for a word or phrase, or browse works alphabetically by author. We wanted to make sure that finding texts is as intuitive to a classicist reader as possible, and so you can usually look up a text based on its Oxford Classical Dictionary citation. In addition, the homepage gives direct access to word parses, dictionary entries, and grammar sections.
Texts and their translations live in the same databases. In the new release, we have decided to no longer display these in a single browser window. Many users found this confusing. You can now go from translation to original, or read them side by side, by clicking on links ('English', 'Greek', 'Latin'). If there are multiple translations, you will see 'English' and 'English2'. For a demonstration of a typical visit, check the steps in the earlier part of this presentation.
Commentaries and Monographs live in two separate databases. On the home page, you can now enter author or title so that it is easy to find out whether a commentary is available for a particular ancient text. Monographs include various grammars. We have made a quick lookup box for grammar sections, in accordance with how these works usually get cited in commentaries and in classrooms.
Dictionaries are now accessible via the parse window in the Greek and Latin databases. In addition, entries in Liddell & Scott and Lewis & Short can be looked up from the homepage. Full text remains searchable from the search forms for the individual dictionaries.

What browser should I use?
I can't find the parsing window any longer!
Why is my perfectly normal word with an acute accent not found?

We know about users with good experiences on Linux, Ubuntu, Windows XP, Mac OS as operating systems; we know that Opera, Firefox, and Safari have been successfully used as browsers. Unfortunately Internet Explorer is not compatible with our click-to-parse mechanism. In all other browsers we have tested, a click on a Greek or Latin word should result in a new window with parse information and links to dictionaries. Subsequent clicks will result in this same parse window being 'refreshed'; if you don't see anything, it may be that this window is hidden behind your other browser window(s). If Greek fails to show up as Greek, make sure that your browser can deal with UTF-8 encoding, and download some Unicode font that has Greek in it. There are plenty of free Greek fonts. Cutting and pasting into word processors should be easy. In most cases, you should be able to type in words you search for without diacritics (this also means: no breathings and no iota subscripts), or in transliteration (see 'Info & Help' for guidance); just be sure to also select the corresponding radio button ('no diacritics', 'transliteration') when you do this.
Unicode detail that is probably too much information: we try to be consistent in using pre-combined Unicode and avoiding the now-deprecated characters that use 'oxia' rather than the canonical 'tonos' combinations. If you use a Greek input method that produces the 'oxia' variant, consider entering your search without diacritics when there are acute accents in play, or installing an input method that adheres to canonical practice. The Mac OS X system has built-in polytonic Greek input that also complies with these standards.
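For those who do want the detail, here is a minimal Python sketch of the 'oxia' versus 'tonos' issue. It assumes you can pre-process a search term yourself before pasting it into the search box; the function names are illustrative and not part of PhiloLogic, and only Python's standard unicodedata module is used.

```python
import unicodedata

def to_canonical(word):
    """Return the NFC form, which replaces 'oxia' code points with their 'tonos' equivalents."""
    return unicodedata.normalize("NFC", word)

def strip_diacritics(word):
    """Drop accents, breathings and iota subscripts (all are combining marks after NFD)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# The same word, logos, typed two ways: the strings look identical but differ in code points.
oxia_word = "\u03bb\u1f79\u03b3\u03bf\u03c2"   # omicron with oxia (U+1F79)
tonos_word = "\u03bb\u03cc\u03b3\u03bf\u03c2"  # omicron with tonos (U+03CC)

print(oxia_word == tonos_word)                 # the raw strings are not equal
print(to_canonical(oxia_word) == tonos_word)   # equal once normalized
print(strip_diacritics(oxia_word))             # the 'no diacritics' form
```

Normalizing first, or stripping diacritics and choosing the 'no diacritics' radio button, should sidestep the mismatch whichever input method produced the word.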

What is this business about morphology?
And what do the colors mean?

In the Spring of 2008 we received an ATI grant to develop morphological analysis for the Greek corpus, and to make it searchable. You can learn more about this project by reading abstracts of our presentations on this topic or taking a look at this big poster on how it was all put together. In a more recent presentation, we present a walk-through of a set of searches. For more details on part-of-speech codes, consult the 'Info & Help' sections on the search forms. It is important to point out that most of the texts were parsed automatically rather than by hand, so that there will be many erroneous parses. We hope you will help us correct those!
In a typical parse window, you'll see one parse highlighted in light blue. It indicates that our automatic part-of-speech tagger has selected this parse as the most likely one in the context. You will see a number (say, 0.45678) associated with the parse. This expresses the probability the system (a stupid computer that does not know Greek as well as you do!) associates with that particular parse. Parts of the texts have been hand-tagged. If you encounter a hand-tagged form, it will be green in color. Even there, data entry problems may come up, so please be critical and report any errors you find. If the correct parse is not listed, submit a problem report form via the link in the parse window.

How do I search for morphological attributes or lemmas?

If you wish to search for occurrences of a lemma or part-of-speech code, you use the same search field as for normal words (or 'strings'), but you prefix them with 'lemma:' or 'pos:'. For example, 'lemma:nostos' or 'lemma:sum'.

New: by using 'form:' you can ignore the more complex instructions for part-of-speech codes that follow. Simply write out what you think will sufficiently describe the form you are looking for, in any order, but use hyphens between terms. For instance, 'form:optative-act-singular' for an active optative in the singular, where 'form:sg-opt-act' would do the same thing.
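To make the "any order" point concrete, here is a small Python sketch of why the two formulations above amount to the same request. It is not PhiloLogic's own code: the alias table is hypothetical, lists only the terms quoted above, and the choice of the short forms as canonical is an assumption.

```python
# Hypothetical alias table: only the terms quoted in the text are listed,
# and treating the short forms as canonical is an assumption.
ALIASES = {
    "optative": "opt", "opt": "opt",
    "act": "act",
    "singular": "sg", "sg": "sg",
}

def form_terms(query):
    """Return the unordered set of canonical terms in a 'form:' query."""
    prefix, _, body = query.partition(":")
    if prefix != "form" or not body:
        raise ValueError("expected a query of the shape 'form:term-term-...'")
    return frozenset(ALIASES.get(term, term) for term in body.split("-"))

# Order and spelling differ, but the resulting sets of terms are the same.
print(form_terms("form:optative-act-singular") == form_terms("form:sg-opt-act"))
```

Because the terms are compared as a set, the order in which you write them makes no difference.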

The part-of-speech codes are less simple to summarize. The Info & Help section has a quick introduction. It is important to know that while a full analysis consists of ten slots, many of these will be empty (-), and even more will not be of interest to you at a given time. All of these you can leave unspecified with *, but your formulation must be specific enough that an 'a' is read as accusative and not as aorist. For this it is helpful to know the ordering of the different slots. They are:
1) major part of speech: Verb, Noun, Adjective, Pronoun, particle (g), aDverb, nuMeral, pReposition, Conjunction, Interjection;
2) minor part of speech: a: Article or determinative (Latin is, idem, ipse), Personal, Demonstrative, x: indefinite, Interrogative, Relative, poSsessive, k: reflexive, reCiprocal, propEr;
3) person: 1, 2, 3;
4) number: singular, plural, dual;
5) tense: Present, Imperfect, Aorist, peRfect, pLuperfect, Future, fuTure perfect;
6) mood: Indicative, Subjunctive, Optative, iMperative, iNfinitive, Participle, Gerundive, gerunD, sUpine;
7) voice: Active, Middle, Passive, middlE-passive;
8) gender: Masculine, Feminine, Neuter, Common;
9) case: Nominative, Genitive, Dative, Accusative, aBlative, Vocative;
10) degree: Comparative, Superlative.
Regular expressions will work to a certain extent. For instance, one could merely specify 'pos:*a-' to capture accusatives. (All slots from 1 through 8 are here left unspecified. We know this because the search field always requires a complete word, and we have ended our word with '-' and not with a wild card.) This initial formulation, however, would miss accusatives that are also comparatives or superlatives. In order to include them, try 'pos:*a[-cs]' instead. [xyz] means 'pick any one of the items xyz between the brackets'. Conversely, if one is looking for personal pronouns, it may make sense to use 'pos:pp*' with no further specification about slots 3-10.
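The patterns above can be tried out offline with a short Python sketch. Everything in it is an assumption made for illustration: the ten-character codes are built by hand from the slot order listed above, and the mapping of '*' (or '.*') to "any run of characters" mimics the behavior described here rather than reproducing the search engine itself.

```python
import re

def pos_pattern_to_regex(pattern):
    """Turn a pattern such as '*a[-cs]' into a compiled regular expression.

    Both '*' and '.*' are read as 'any run of characters'; bracket sets
    are passed through unchanged. This mapping is an assumption.
    """
    return re.compile(pattern.replace(".*", "*").replace("*", ".*"))

def matches(pattern, code):
    """True if the whole ten-slot code matches the pattern (a complete word)."""
    return pos_pattern_to_regex(pattern).fullmatch(code) is not None

# Hypothetical codes, written out by hand from the slot order above.
NOUN_ACC = "n--s---ma-"        # noun, singular, masculine, accusative
ADJ_ACC_COMP = "a--s---mac"    # adjective, singular, masculine, accusative, comparative
VERB_PRES_ACT = "v-3spia---"   # verb, 3rd singular, present indicative active
PERS_PRONOUN = "pp1s---mn-"    # personal pronoun, 1st singular, masculine, nominative

for pattern in ("*a-", "*a[-cs]", "pp*"):
    hits = [c for c in (NOUN_ACC, ADJ_ACC_COMP, VERB_PRES_ACT, PERS_PRONOUN)
            if matches(pattern, c)]
    print(pattern, hits)
```

Note that the active verb, which has an 'a' in the voice slot, is not caught by either accusative pattern, because the pattern must account for the entire code up to its final character.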
Part-of-speech and lemma searches can be combined for a single word by means of a semicolon, or applied to different words by separating them with a space. The search 'lemma:dokew;pos:v-3s.* pos:.*d-' looks for forms of δοκέω in the 3rd singular (semicolon) and, separately, for something in the dative (space).
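One way to picture how such a search breaks down is the following Python sketch. It is an illustration of the semicolon and space conventions only, not the parser that PhiloLogic uses; the function name and the 'string' key for unprefixed words are assumptions.

```python
def parse_query(query):
    """Split a search into one dict of constraints per word.

    Spaces separate words; semicolons join constraints on the same word.
    A term without a recognised prefix is treated as a plain string.
    """
    words = []
    for chunk in query.split():
        constraints = {}
        for term in chunk.split(";"):
            prefix, sep, value = term.partition(":")
            if sep and prefix in ("lemma", "pos", "form"):
                constraints[prefix] = value
            else:
                constraints["string"] = term
        words.append(constraints)
    return words

# First word: two constraints on the same form; second word: one constraint.
for word in parse_query("lemma:dokew;pos:v-3s.* pos:.*d-"):
    print(word)
```

The first dictionary holds both conditions on the verb, the second the single condition on the dative; nothing in the query says how the two words relate to each other syntactically.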
This is probably as good a moment as any to point out that our parser and our search engine do not know Greek or even Latin syntax! You will have to decide for yourself, in searches of this sort, whether the datives you find are in fact datives that are governed by the verb.
Is all of this rather overwhelming? We do realize that the formulas look rather forbidding! If we can find the time and the funding, we will work on more natural language querying (could I please have some perfect active optatives?) to take the place of 'pos:v*roa*'.

So what?

We think that this corpus holds great promise both for research and for teaching. Philologists need to do corpus study beyond the single word; more particularly, classical linguists should work on making more evidence-based and quantitative claims than are found in much of the current literature. Teachers who wish to select what vocabulary or constructions to emphasize should have a notion of frequency of use, and rather than making up examples, they could run a quick search for actual examples of constructions. To give a simple example, three definite articles in sequence is not unusual. Now you can find actual examples in Lysias, a suitable author for introductory and intermediate classes, to demonstrate this. On a practical note for teachers, if you send your class a link of this sort, the phenomenon you wished to highlight is highlighted on the page. If you wish to draw your students' attention to a particular part of a page - search for it, and send them the copied URL of the search result. They will see the same highlighting.

Great! How can I help?

As you can probably imagine, there are many many wheels within wheels to make this site do what it does, and sometimes things get lost in the shuffle. If you see something awry, please let us know. Here's how you can help us improve this site: If you encounter a problem, please use the "Report a Problem" link that you will find on the Results pages.
In addition, we hope you will select the correct parses when you use the parse window. You will see your selection turn yellow; it will also be stored in the database. In fact, it will be quarantined, with all other user votes, until approval, from which point all users will see the corrected parse and new runs of our part-of-speech tagger will be more accurate thanks to the increased amount of so-called training data. Your corrections, therefore, will have both a local impact in their context, and a global impact on the accuracy of the database as a whole.
The parse window has a separate problem report form (in case none of the parses is satisfactory, or the short definition falls, well, short).

What if I want to do more?

This project would not have been possible without open-source software and data shared under creative-commons licences. If you are a faculty member, staff, student, or administrator at an institution of higher learning, get informed about Open Access, Open Content and the Creative Commons. Support the principles they represent, and work for change where you can in your own institution and professional organizations. Regardless of affiliation, classical enthusiasts can support organizations that work with these principles. You can support open-access and creative-commons oriented projects that you like. For classicists, some sites to visit as good clearing houses for this kind of information are Chuck Jones's Ancient World Online, Neel Smith's Vitruvian Design blog, and stoa.org.

Credits

Much of the programming on this release has been done by a single Classics BA pursuing a Master's in Computer Science (a good amount of additional unfunded work by determined classicists helps, as well as open-source software and assistance by its developers). We wish to register our gratitude to the Provost's office of the University of Chicago for its ATI grant for 2008-09. And of course, κῦδος to Richard Whaling for pulling it off!

Is that all?

A final line-up, then, of people to thank for their help in the past year. All the programming for the 2009 release was done by Richard Whaling. We, Richard and Helma, wish to thank our disambiguators: Kristin Dean, Charlotte Krontiris, and Ursula Poole; Walt Shandruk, for munging through a pile of Latin data on short notice; the Perseus Project, for sharing data and expertise; Martin Mueller, for consultation and making available his Homeric data; and Hugh Cayless, for making our life easier with his Transcoder. We thank the entire staff at ARTFL for welcoming classicists in their midst and generously sharing expertise, caffeine, and mirth.

Chicago, July 2009
Helma Dik
