Resources in our repository are available in different languages and cover different types and categories of material.

Because these resources are stored and described according to standards, they can all be listed from a single access point, as demonstrated here.

data

repository

data

1_parts_of_words

english
Subsyllabic similarity, De Cara & Goswami (2002)
Statistical Analysis of Similarity Relations among Spoken Words: Evidence for the Special Status of Rimes in English
web iconmetadata icon
Letter sequences statistics, Hirata & Bryden (1971)
Letter sequences varying in order of approximation to English
metadata icon
The sounds of English
In this site thousands of English words have been painstakingly grouped according to their sounds and their spellings making the patterns obvious. This is the most logical and systematic method to learn English. It doesn't rely on rules to teach reading and spelling; instead, repeated exposure to a sound/letter pattern allows your brain to recognize the pattern intuitively and internalize it.
web iconmetadata icon

2_words

dutch
AoA norms for Dutch, De Moor et al. (2000)
Age-of-acquisition ratings on 2816 Dutch four- and five-letter nouns. These norms were collected by asking 559 undergraduates to indicate for a set of words at which age they thought they had learned them.
web iconmetadata icon
Validated AoA norms for Dutch, Ghyselinck et al. (2000a)
Alphabetical listing of 254 validated words together with their rated AoA, logarithmic frequency and the evaluation of the three judges.
web iconmetadata icon
Validated AoA norms for Dutch, Ghyselinck et al. (2000b)
Alphabetical listing of the 410 validated words together with their rated AoA and the percentage of children that correctly indicated the meaning of the word.
web iconmetadata icon
AoA norms for Dutch, Ghyselinck et al. (2003)
These norms were collected by asking 142 participants to indicate for a set of 389 or 388 words at which age they thought they had learned them.
web iconmetadata icon
Norms on emotional valence and concreteness, Van der Goten et al. (1999)
Norms are provided for 1- and 3-syllable words
web iconmetadata icon
Celex database for Dutch, Baayen et al. (1993)
Extensive database with both lemma and wordform statistics
web iconmetadata icon
Woorden in het basisonderwijs, Schrooten & Vermeer (1994)
15,000 words presented to pupils
web iconmetadata icon
Translations Norms for Dutch-English Translation Pairs, Tokowicz et al. (2002)
Dutch-English Number of Translations, Form Similarity, and Semantic Similarity Norms
web iconmetadata icon
english
AoA and imagery measures, Bird et al. (2001)
Age-of-acquisition, imageability, and frequency measures for 2,694 words
web iconmetadata icon
AoA and imagery measures, Gilhooly & Logie (1980)
Age-of-acquisition, imagery, concreteness, familiarity, and ambiguity measures for 1,944 words
metadata icon
Celex Database for English, Baayen et al. (1993)
Extensive database with both lemma and wordform statistics
web iconmetadata icon
English Lexicon Project (ELP, Balota et al. 2002)

(Adapted from website.) The English Lexicon Project (ELP) is an ongoing project. Its goal is to collect normative data for speeded naming and lexical decision for over 40,000 words across 1200 subjects at 6 different universities and to integrate these data into a database along with descriptive characteristics of the words used in the study.

As of now, the English Lexicon Project (supported by the National Science Foundation) affords access to a large set of lexical characteristics, along with behavioral data from visual lexical decision and naming studies of 40,481 words and 40,481 nonwords.

The naming and lexical decision data are currently being collected from six testing universities. To date, we have collected 2,752,698 reaction time measurements from 816 subjects in the lexical decision experiment. We have also collected 1,125,880 experimental measurements from 444 subjects in the naming experiment.

Researchers interested in psycholinguistics, human memory, computational modeling, and other fields will find these data useful. For example, researchers will be better equipped to select stimuli, test theories, and reduce potential confounds in their studies.

web iconmetadata icon
Brett Kessler publications, programs, and datasets
NA.
web iconmetadata icon
Lexical FreeNet :: Connected Thesaurus
[As on authors' website] This program allows you to search for relationships between words, concepts, and people. It is a combination thesaurus, rhyming dictionary, pun generator, and concept navigator. Use it to find words that fit the needs of whatever writing endeavor you've undertaken, or just to browse concept space. To use the system, enter one or two words into the boxes at the top of the page, select a function to perform, optionally select some word relations to allow, and click Submit Query! Here is a description of the seven functions that are available.
web iconmetadata icon
MRC database (Coltheart et al.)
MRC Psycholinguistic Database containing over 150,000 words with up to 26 linguistic and psycholinguistic attributes for each (e.g. pronunciation, part of speech, word...). These attributes include age-of-acquisition and imagery statistics from Gilhooly & Logie (1980) and the word frequency list from Kucera & Francis (1967).
web iconmetadata icon
Wordmine.org
[As on authors' website] Wordmine is a culmination of several years of computational analyses of various word-level constructs. It is meant to act as a psycho-linguistic resource similar to the MRC database and is available to all researchers free of charge. This resource, the MRC database and RT values available from Balota and his colleagues can be used in combination to pre-test assumptions regarding variables of interest. We encourage students to use these sites and data so that they can run "experiments" to test assumptions about the way the word recognition system operates. In exchange for the data available here we ask that you acknowledge this site in your presentations or publications.
web iconmetadata icon
WordNet
[As on authors' website] WordNet® is an online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets.
web iconmetadata icon
Word Frequencies in Written and Spoken English, based on the British National Corpus, Leech et al. (2001)
[From authors' website] Book with frequency statistics derived from the British National Corpus, a 100,000,000-word electronic databank sampled from the whole range of present-day English, spoken and written. It makes use of the grammatical information that has been added to each word in the corpus. Includes frequencies for present-day speech (including everyday conversation) as well as for writing: (a) Rank-ordered and alphabetical frequency lists for the whole corpus and for various subdivisions: e.g. informative vs. imaginative writing, conversational vs. other varieties of speech. (b) Entries take account of grammatical parts of speech (e.g. round as a preposition is listed separately from round as an adjective). (c) Includes discussions of a number of thematic frequency lists such as colour terms, female vs. male terms, etc.
web iconmetadata icon
Kucera & Francis (1967)
Kucera and Francis frequency values
metadata icon
Children's Printed Word Database, Stuart et al. (1993-1996)
ESRC-funded project to develop a database of printed word frequencies as read by children aged between 5 & 9.
web iconmetadata icon
Children's - Stuart et al. (2003)
Database of children's early reading vocabulary, for use by researchers and teachers, with an up-to-date word frequency list of early print exposure in the UK
web iconmetadata icon
Zeno et al. (1995) - Educator's word frequency guide
Quantitative summary of the printed vocabulary encountered by students in American schools, with separate word frequency counts for the materials at each grade (grade 1 through college)
web iconmetadata icon
Crawford et al. (2004)
Corpus of gender-related and neutral words
web iconmetadata icon
Cortese & Fugett (2004)
Imageability ratings for 3,000 monosyllabic words
web iconmetadata icon
Norms, Clark & Paivio (2004a)
Expanded Norms for Original 925 Paivio, Yuille, and Madigan (1968) Items.
web iconmetadata icon
Norms, Clark & Paivio (2004b)
Expanded Norms for 2,311 Items
web iconmetadata icon
Maki et al. (2004)
Semantic distance norms computed from an electronic dictionary (WordNet).
web iconmetadata icon
Gahl et al. (2004)
Verb subcategorization frequencies (American English)
web iconmetadata icon
french
Objective AoA norms, Chalard et al. (2005)
Alphabetical listing of the 230 words and their scores on objective AoA.
web iconmetadata icon
BDLex (de Calmès & Pérennou, 1998)

BDLEX consists of a lexical database developed within the French GDR-PRC CHM at IRIT (IMH-PT team), Paul Sabatier University, Toulouse. The data cover lexical, phonological, and morphological information.

The database BDLEX consists of about 440,000 inflected forms (generated from about 50,000 canonical words) with the following attributes: spelling, pronunciation, morphosyntactic features (part of speech, agreements,...), the canonical word spelling and a frequency indicator.

Moreover, the lexical resources include the version BDLex-syll, which specifies the syllabic division in the pronunciation field.

web iconmetadata icon
BruLex, Content et al. (1990)
[As on server] The Brulex directory includes the different versions of the BRULEX database. BRULEX was developed about ten years ago, for the purpose of facilitating selection and control of experimental materials in psycholinguistic experiments on lexical processing. It contains a large amount of information that was typed manually by volunteer collaborators. Potential users should be aware that the current version, which hasn't been changed since '89, contains a number of transcription errors and inaccuracies.
web iconmetadata icon
Dicouèbe, Dictionnaire en ligne de combinatoire du Français

(from webpage): The DiCo (short for dictionnaire de combinatoire, "combinatorial dictionary") is a lexical database of French, developed over several years at the OLST by Igor Mel'čuk and Alain Polguère. The primary aim of this database is to describe each lexical unit appearing in the DiCo word list along two axes: the semantic derivations (strong semantic relations) that link it to other lexical units of the language, and the collocations (semi-idiomatic expressions) that it controls. This description is accompanied by a model of the syntactic structures governed by the lexical unit and a model of its meaning, in the form of semantic labelling.

web iconmetadata icon
Lexique, New (2001, 2004)
Lexique provides a wide range of information (frequency, neighbours, phonology, lemma, etc.) on 137,000 French words, based on large corpora (35 million words). The database is regularly updated, and the 3rd version was released in 2005. This third version brings many new features, such as:
- written and estimated spoken frequency
- frequency of any character string
- recent words
- etc.
Open Lexique is a project that allows people to query several French databases simultaneously. Other resources have been developed, such as a free text corpus, a first name database, an anagram database, etc.
web iconmetadata icon
LexOP, Peereman & Content (1998)
[As on the authors' server] The Lexop directory includes the different versions of the LEXOP database. LEXOP includes about 2,500 monosyllabic French words, and provides a large number of statistics related to orthography-to-phonology and phonology-to-orthography mappings. [see the Lexop/00readme file for more details about the contents and distribution]
web iconmetadata icon
MANULEX (Lété, Sprenger-Charolles, Colé)
A Grade-Level Lexical Database from French Elementary-School Readers

MANULEX provides grade-level word-frequency lists of non-lemmatized and lemmatized words (48,886 and 23,812 entries, respectively) computed from the 1.9 million words taken from 54 French elementary-school readers. Word frequencies are provided for four levels: 1st grade (G1), 2nd grade (G2), 3rd to 5th grades (G3-5), and all grades (G1-5). The frequencies were computed following the methods described by Carroll et al. (1971) and Zeno et al. (1995) with four statistics at each level (F: overall word frequency, D: index of dispersion across the selected readers, U: estimated frequency per million words, and SFI: Standard Frequency Index). The database also provides the number of letters in the word and syntactic category information. MANULEX is intended to be a useful tool for studying language development through the selection of stimuli based on precise frequency norms. Researchers in artificial intelligence can also use it as a source of information on natural language processing to simulate written language acquisition in children. Finally, it may serve an educational purpose by providing basic vocabulary lists.

web iconmetadata icon
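As a rough illustration of the U and SFI statistics mentioned in the MANULEX description, the sketch below assumes the usual Carroll et al. (1971) definition, SFI = 10 × (log10(U) + 4), where U is a frequency per million words. The function names are hypothetical, and the per-million helper is a plain rescaling of a raw count, not the dispersion-adjusted estimate that MANULEX itself reports.

```python
import math


def per_million(raw_count: int, corpus_size: int) -> float:
    """Rescale a raw count to occurrences per million words (no dispersion adjustment)."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count / corpus_size * 1_000_000


def standard_frequency_index(u: float) -> float:
    """SFI under the assumed definition 10 * (log10(U) + 4), with U per million words."""
    if u <= 0:
        raise ValueError("U must be positive")
    return 10 * (math.log10(u) + 4)


if __name__ == "__main__":
    # Hypothetical word seen 19 times in a 1.9-million-word corpus.
    u = per_million(19, 1_900_000)
    print(round(u, 2), round(standard_frequency_index(u), 2))
```

Under this assumed definition, a word occurring once per million words has an SFI of 40, and each tenfold increase in frequency adds 10 points.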
NovLex 1, Lambert & Chesnet (2001)

(from webpage): The NOVLEX lexical database is a tool for estimating the breadth and lexical frequency of the written vocabulary addressed to French-speaking primary-school pupils.

It was built by analysing school books and extracurricular books intended for pupils in CE2 (ages 8-9). NOVLEX is built from a corpus of approximately 417,000 words, excluding proper nouns, first names, city names, and onomatopoeia, and converted to lowercase ("Un", "UN" and "un" are a single entry).

From this corpus we extracted 9,300 distinct lexical roots (the Base Lexicale, or lexical base), determined using the Larousse dictionary.

web iconmetadata icon
NovLex 2, Lambert & Chesnet (2001)

(from webpage): The NOVLEX lexical database is a tool for estimating the breadth and lexical frequency of the written vocabulary addressed to French-speaking primary-school pupils.

It was built by analysing school books and extracurricular books intended for pupils in CE2 (ages 8-9). NOVLEX is built from a corpus of approximately 417,000 words, excluding proper nouns, first names, city names, and onomatopoeia, and converted to lowercase ("Un", "UN" and "un" are a single entry).

From this corpus we extracted 20,600 orthographically distinct entries (the Base d'occurrences, or occurrence base). In the occurrence base, all orthographic forms are treated as separate entries (e.g. "cheveu" and "cheveux" are two distinct entries).

web iconmetadata icon
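The way the NovLex occurrence base treats entries (case is folded, but every orthographic form stays separate) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and the toy token list are hypothetical, and the actual NOVLEX processing pipeline is not documented here.

```python
from collections import Counter
from typing import Iterable


def occurrence_base(tokens: Iterable[str]) -> Counter:
    """Count orthographic forms after lowercasing; no lemmatization is applied."""
    return Counter(token.lower() for token in tokens)


if __name__ == "__main__":
    # Toy corpus: three casings of "un", plus a singular and a plural form.
    tokens = ["Un", "UN", "un", "cheveu", "cheveux", "Cheveux"]
    counts = occurrence_base(tokens)
    for form, n in sorted(counts.items()):
        print(form, n)
```

In this toy corpus, "Un", "UN" and "un" collapse into one entry, while "cheveu" and "cheveux" remain two distinct entries. Grouping them under a single root, as in the lexical base, would need a separate lemmatization step.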
Omnilex
Computerized database on the lexicon of contemporary French
web iconmetadata icon
Vocolex, Dufour et al. (2002)
Paper Abstract: Several studies on auditory word recognition indicate that word processing is influenced by the phonological similarity with other words. We describe a lexical database, VoColex, which provides several statistical indexes of phonological similarity between French words. Phonological similarity is computed according to two distinct principles. According to the first principle, phonologically similar words share initial phonemes with the target word. According to the second principle, phonological neighbours correspond to any words which can be derived from the target by a single phoneme change (substitution, addition, or deletion) whatever the position of the modified phoneme. The statistical data provided by VoCoLex should allow the control and the empirical manipulation of various measures of phonological similarity, as well as quantitative descriptions of the auditory lexicon.
web iconmetadata icon
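The two similarity principles described in the VoCoLex abstract can be illustrated with a small sketch. It assumes words are represented as tuples of phoneme symbols. The function names, the toy lexicon, and its simplified transcriptions are hypothetical and are not taken from the database itself.

```python
from typing import Dict, List, Tuple

Phonemes = Tuple[str, ...]


def shared_initial_phonemes(target: Phonemes, other: Phonemes) -> int:
    """First principle: number of phonemes shared from the beginning of both words."""
    count = 0
    for a, b in zip(target, other):
        if a != b:
            break
        count += 1
    return count


def is_single_change_neighbour(target: Phonemes, other: Phonemes) -> bool:
    """Second principle: exactly one substitution, addition, or deletion, at any position."""
    if target == other or abs(len(target) - len(other)) > 1:
        return False
    if len(target) == len(other):
        # Substitution: the words differ in exactly one position.
        return sum(a != b for a, b in zip(target, other)) == 1
    # Addition or deletion: removing one phoneme from the longer word gives the shorter.
    shorter, longer = sorted((target, other), key=len)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighbours(word: str, lexicon: Dict[str, Phonemes]) -> List[str]:
    """Return the lexicon entries that are single-change neighbours of the given word."""
    target = lexicon[word]
    return sorted(w for w, p in lexicon.items() if is_single_change_neighbour(target, p))


if __name__ == "__main__":
    # Toy lexicon with simplified, illustrative transcriptions.
    lexicon = {
        "sac": ("s", "a", "k"),
        "lac": ("l", "a", "k"),
        "sa": ("s", "a"),
        "sacre": ("s", "a", "k", "r"),
        "sol": ("s", "o", "l"),
    }
    print(neighbours("sac", lexicon))
    print(shared_initial_phonemes(lexicon["sac"], lexicon["sacre"]))
```

In the toy lexicon, "lac", "sa" and "sacre" are each one phoneme change away from "sac", while "sol" differs in two positions and is therefore not a neighbour.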
Dicos à ABU
[From website]. The virtue of the word lists you will find on these pages is not to give bibliophiles like you the means to develop professional tools. They are in fact far from complete or error-free.

There are currently four lists: a list of common words (+300,000 words), a list of first names (12,437 first names), a list of French city names (39,076 names), a list of country names (170 countries), and a list of difficulties of the language (1,500 words).

And also a dictionary: "Les Excentricités du Langage" by Lorédan Larchey (hypertext version). A gem that we recommend!
web iconmetadata icon
XMLittré
[From website]. This site offers an online searchable version of Émile Littré's dictionary of the French language.

This work was published from 1863 onward, then in its second edition in 1872-1877.
web iconmetadata icon
Cordier & Le Ny (2004)
Values of Experiential Frequency, Degree of Knowledge and Rated Familiarity for French Words
web iconmetadata icon
ARTFL- Word Frequency Search Form
Word Frequency information. TLF. Trésor de la Langue française. Imbs (1971).
web iconmetadata icon
Morphalou
The Morphalou lexicon is an open lexicon of the inflected forms of French. Morphalou's initial data come from the TLFnome, the word list of the Trésor de la Langue Française, which supplied 539,413 inflected forms belonging to 68,075 lemmas. The transfer from the TLFnome to Morphalou involved a structural reorganization of the data and a normalization of the grammatical labels, without loss of linguistic information. The resulting lexicon has broad coverage (~540,000 inflected forms), is linguistically validated (under the responsibility of an editorial committee), and formally conforms to the standardization proposals for NLP lexical resources at ISO (TC37/SC4). It is freely accessible for research and teaching purposes. The lexicon is maintained and updated by the ATILF.
web iconmetadata icon
Dictionnaire des synonymes
[From website]. This dictionary of synonyms contains approximately 49,000 entries and 396,000 synonymy relations. The starting base consists of seven classic dictionaries (Bailly, Benac, Du Chazaud, Guizot, Lafaye, Larousse and Robert) from which the synonymy relations were extracted. This initial work, carried out by the Institut National de la Langue Française (INaLF), produced a series of files, whose data were merged and homogenized at the CRISCO laboratory (ELSAP at the time). Finally, we completed this procedure with substantial correction work (adding or removing synonymy links) on the final file. The project began under the responsibility of Sabine PLOUX, who defined the operating principles of the dictionary. Since 1998, Jean-Luc MANGUIN has been in charge; he put it online and built the text-mode query interface. Current developments result from a project that brought together CRISCO (Caen) and the company Memodata (Caen). This project was selected by the Comité Régional pour l'Imagerie et les Technologies de l'Information et de la Communication.
web iconmetadata icon
spanish
Corpus Diacrónico del Español (CORDE)
Diachronic Corpus for the Spanish Language. [With interactive search facilities]
web iconmetadata icon
Corpus de Referencia del Español Actual (CREA)
[...] describes the capabilities of the query software for the Banco de Datos del Español of the Real Academia Española. It is a text for readers without specific knowledge of the subject, providing the basic notions for interactive querying of the largest lexical resource (more than 200 million words) available for the Spanish language. [With interactive search facilities.]
web iconmetadata icon
Diccionario de la Universidad de Oviedo
Dictionary of antonyms, dictionary of synonyms, verb conjugator, related terms.
web iconmetadata icon
Izura et al. (2004)
Category norms for 500 Spanish words in 5 semantic categories
web iconmetadata icon
Banco de datos del Español
List of authors and works. [With interactive search facilities.]
metadata icon

3_nonwords

english
ARC nonword database, Rastle et al. (2002)
Database of 358,534 nonwords
web iconmetadata icon
french
Cordier & Le Ny (2004)
Values of Experiential Frequency, Degree of Knowledge and Rated Familiarity for French Pseudowords
web iconmetadata icon

4_running_text

across_the_board
WordTheque
[From website]. The Wordtheque is a powerful interface with a massive database (currently 707,737,941 words) containing multilingual novels, technical literature and translated texts. Hits are highlighted in context windows that can be expanded up or down. To go to the source web pages (novels, etc.)
web iconmetadata icon
english
Childes, MacWhinney
[From authors' website]. The CHILDES system provides tools for studying conversational interactions. These tools include a database of transcripts, programs for computer analysis of transcripts, methods for linguistic coding, and systems for linking transcripts to digitized audio and video.
web iconmetadata icon
french
GrosMots.com
[From website]. We aim to: gather corpora of copyright-free French texts; structure the texts by adding markup to delimit the different parts of each work; maintain a page of links to online articles about these works; moderate question forums about the works, their authors and their contexts; moderate a classified-ads service for exchanging and selling various editions of the works; introduce Progsession for group work.

We hope to complete in 2007 a first phase covering 3,000 works, more than 2,000 of which are already downloadable.
web iconmetadata icon

5_visual_material

across_the_board
Amsterdam Library of Object Images (ALOI)
1,000 objects photographed from different angles and under different lighting conditions. Royalty free?
web iconmetadata icon
Fribbles Stimulus Sets
Fribble stimuli used in several experiments. Within each Fribble species, the exact shape, color, and texture of the main body and the approximate location and interrelationships between appendage parts are held constant for all exemplars. Colors and textures of appendage parts are also similar (although not identical) across exemplars. The main aspect that changes from exemplar to exemplar in a species is the exact shape of the appendage parts.
web iconmetadata icon
Action Picture Stimuli in IPNP
Black and white drawings of 275 transitive and intransitive actions from different sources.
web iconmetadata icon
Object Picture Stimuli in IPNP
Black and white drawings of 520 common objects (including 174 pictures from the Snodgrass & Vanderwart set and other sources).
web iconmetadata icon
Diagnostic Color Objects
Color images of many diagnostic color objects, e.g., a banana. Objects are shown in typical and atypical colors. There are also control sets of neutral color objects. The original set was used as stimuli in Naor-Raz, Tarr, & Kersten (2003).
web iconmetadata icon
Grayscale pictures of 31 chairs -- Bruno Rossion
Grayscale pictures of 31 chairs garnered from various sources by Bruno Rossion at the Universite Catholique de Louvain. Bruno has scaled all of the images to the same size, orientation, and brightness. Bruno asks that if you are going to use the chairs, please contact him at rossion@neco.ucl.ac.be and let him know what you are up to. The images are STANDARD COLORS grayscale PICTS.
web iconmetadata icon
Colorized Snodgrass and Vanderwart pictures -- Rossion & Pourtois (2004)
The authors have created a new set of stimuli based on the widely used line drawings of Snodgrass and Vanderwart (1980). These 260 stimuli contain diagnostic texture and color information. Normative data (naming agreement and latencies, complexity, familiarity, imaginability) for these new stimuli have been collected (Rossion & Pourtois, 2004). Their data shows that surface information, color in particular, greatly facilitates object recognition.

If you download the set and wish to use it in an experimental/clinical study, a donation of $30-$50 to help defray their costs would be most welcome. You can send correspondence about contributing to Bruno Rossion.
web iconmetadata icon
Royalty Free Clipart Images
A long list of links for royalty free or free to use clipart images.
web iconmetadata icon
Royalty Free Photos
A long list of links for royalty free or free to use photos.
web iconmetadata icon
Change Blindness Scenes
This set of scenes was used as stimuli in the studies reported in Aginsky & Tarr (2000). The set contains many variants of individual scenes. Variants were generated by either moving or changing the color of some element of the scene. The images are color PICT files.
web iconmetadata icon
Object Data Bank
Michael Tarr (with the artistic help of Scott Yu) has developed a wonderful data bank of three-dimensional objects from numerous views.
web iconmetadata icon
Viperlib, visual perception library
Viperlib is a web-based resource library of images and presentation material illuminating the study of visual perception. (more than 2000 images)

All images are given freely by the vision research community and are available for educational, non-profit use only.
web iconmetadata icon
dutch
Lexical norms for pictures, Martein (2005)
This work presents the results of a normative data collection study of 216 pictures which can be used in a wide range of cognitive experiments. Black-and-white line drawings of 216 objects, belonging to 20 large semantic categories, were rated by a sample of 300 first-year psychology students at the University of Ghent. These ratings provided data on several variables of central importance to cognitive processing and memory functioning: name agreement, concept agreement, familiarity, visual complexity and image agreement. The following semantic categories were included in the set: 1. Article of clothing, 2. Birds, 3. Electronical appliances, 4. Fish, shells, ..., 5. Flowers, plants, ..., 6. Food, 7. Fruit, 8. Furniture and decoration, 9. Insects, 10. Kitchen-utensils, 11. Mammals, 12. Miscellaneous, 13. Musical instruments, 14. Parts of a building, 15. Parts of the human body, 16. Reptiles and amphibians, 17. Tools, 18. Vegetables, 19. Vehicles, 20. Weapons
web iconmetadata icon
Lexical norms for pictures, Severens et al. (2005)
Timed norms for 590 pictures in Belgian Dutch, with name agreement and response latencies.
web iconmetadata icon
english
Norms for timed picture naming
The UCSD Center for Research in Language is engaged in a large international study to provide norms for timed picture naming in seven different languages (American English, German, Mexican Spanish, Italian, Bulgarian, Hungarian, and the variant of Mandarin Chinese spoken in Taiwan). They currently have data for over 500 pictures.
web iconmetadata icon
french
Bonin et al. (2003)
French norms for name agreement, image agreement, conceptual familiarity, visual complexity, image variability, age of acquisition, and naming latencies
web iconmetadata icon
Schwitter et al. (2004)
French normative data and naming times for action pictures
web iconmetadata icon
spanish
Cuetos & Alija (2003)
Normative data and naming times for action pictures in Spanish
web iconmetadata icon
Cuetos et al. (1999)
Naming times for the Snodgrass and Vanderwart pictures in Spanish
web iconmetadata icon
Dasi et al. (2004)
Normative data on the familiarity and difficulty of 196 Spanish word fragments
web iconmetadata icon
Fernandez et al. (2004)
Free-association norms for the Spanish names of the Snodgrass and Vanderwart pictures
web iconmetadata icon

7_performance_measures

english
Lexical Decision, Balota et al. (1999)
Lexical Decision Corpora
web iconmetadata icon
Spieler & Balota (1998)
Naming Latencies for Younger and Older Adults.
web iconmetadata icon
Response times in CVC naming study, Treiman et al. (1995)
[As on the authors' website] These are the mean RTs and error rates from the Treiman et al. 1995 naming study with Wayne State University students. They are being made available to other researchers who wish to do additional analyses of these data.
web iconmetadata icon
DRC simulation results for words, Coltheart et al. (2001)
DRC simulation results reported in Coltheart et al. (2001), 7910 words
metadata icon
PMSP simulation results, Plaut et al. (1996)
Simulation results for the PMSP96 model (Plaut et al., 1996)
metadata icon
french
Belec (Mousty et al., 1994)
Battery for the assessment of written language and its disorders.
metadata icon

docs

1_parts_of_words

english
ASL alphabet
A PDF document showing the corresponding ASL sign for each letter of the alphabet
web iconmetadata icon
A Collection of Letter Oddities and Trivia
Covers topics such as: Vowels; Uncommon double letters, triple letters, quadruple letters; Consecutive consonants; Most frequent appearance of each letter
web iconmetadata icon
ipa
Unicode character sets
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
web iconmetadata icon

2_words

english
Heteronyms
List of heteronyms in the language
web iconmetadata icon
Alan Cooper's Homonyms
List of homonyms (in fact, homophones) in English
web iconmetadata icon
Homophones
List of 439 homophones by Ian Miller
web iconmetadata icon
A Collection of Word Oddities and Trivia
Covers topics such as: Misspelled words; Typewriter words; Beautiful words; Long words; Long words (place names); Long words (chemical names); Plurals; Scrabble words; Short and Long words from Mathematics; Short and Long words from the Bible; Names of people which became words; Last words (alphabetically arranged); French words; interjections; Italian words
web iconmetadata icon
french
Les bases de données lexicales : pourquoi et comment les utiliser en logopédie ? Par Marie-Anne Schelstraete & Christelle Maillart
Our information-processing system, the cognitive system, is in general very sensitive to the frequency of the stimuli it processes, which can be interpreted as a form of adaptation particularly well suited to a variable environment. In all areas of cognitive psychology, research shows that frequent stimuli, which have already been encountered often, are processed faster and more accurately than rarer stimuli. [...]
web iconmetadata icon
techniques
Modern Dictionary Making
On-line course, by Dafydd Gibbon, Faculty of Linguistics and Literary Studies (University of Bielefeld), Summer Semester 2003 (Version: July 24, 2003)
web iconmetadata icon
What is a lexical database
Entry in the Glossary of linguistic terms (Loos, Anderson, Day, and Jordan)
web iconmetadata icon
Lexical Database definition (at freedictionary)
Multiple definitions related to lexical databases
web iconmetadata icon

4_running_text

english
historical_changes
Words in English by Suzanne Kemmer (Rice University)
[from the website] This website is a resource for those who want to learn more about this fascinating language [i.e., English] – its history as a language, the origins of its words, and its current modern characteristics.
web iconmetadata icon
The Great Vowel Shift
[from the website] This site is designed for my students--undergraduates with limited linguistic knowledge who are being introduced to the Great Vowel Shift. There are topics I do not discuss in this site because they are too basic, too complicated, or too controversial for this audience.
web iconmetadata icon
The History of English Phonemes
[from the website] This Website is designed to help students of the English language trace the development of the phonemes of English from the Old English period into Present-Day English. The information contained in the site is available in any good textbook on the history of the language, but printed texts normally present the information in a linear fashion corresponding to the chronological development of English. The value of the Website is the hypertextual treatment of the information, which is meant to keep students from having to spend a great deal of time leafing through textbooks.
web iconmetadata icon
Wordorigins.org
[from the website] This site is devoted to the origins of words and phrases, or as a linguist would put it, to etymology. Etymology is the study of word origins. (It is not the study of insects; that is entomology.) Where words come from is a fascinating subject, full of folklore and historical lessons. Often, popular tales of a word's origin arise. Sometimes these are true; more often they are not. While it often seems disappointing when a neat little tale turns out to be untrue, almost invariably the true origin is just as interesting.
web iconmetadata icon
_general
A Collection of Sentence Oddities and Trivia
Covers topics such as pangrams, palindromes, and plurals.
web iconmetadata icon
Vivian Cook website
Various informative pages on writing systems and second language acquisition. Includes a linguistics glossary and an extensive bibliography on second language acquisition.
web iconmetadata icon
french
historical_changes
Textes en français historique
N/A
web iconmetadata icon
Chantez-vous français?
[From the website] "This is not a history of singing. Nor is it a history of French. It is a history of sung French. From the outset, singing has been a form of discourse in its own right, obeying its own rules. The history of these rules, which define the field of declamation, is traced here. Touching on several disciplines, this study is addressed first and foremost to singers who perform early music and who struggle to find answers to their questions in specialized treatises. Perhaps it will also interest a few linguists who are not indifferent to singing and music and, who knows, other curious minds."
web iconmetadata icon
Old French On The Web
A Website Devoted to the Language and Literature of Old French
web iconmetadata icon

6_associations

english
English is Tough Stuff
The famous poem "The Chaos", first written by a Dutchman, Dr Gerard Nolst Trenité, in 1920, and often republished and expanded since.
web iconmetadata icon

links

0_across_the_board

French Lexical Resources
Lists the databases and tools known to the Manulex team
web iconmetadata icon
Psychology of Language Page of Links (Kreuz)
[From the website]. This page has grown out of my own collection of links to people and resources on the World Wide Web. As such, it undoubtedly reflects my interests and biases to an unhealthy degree. Over time, and with your help, I hope to compile a more comprehensive collection of sources. I particularly need help in adding to the list of psychology of language researchers and labs (sections 2 and 3). Please send me your links, and I will enshrine them on the list.
web iconmetadata icon
LangueFrancaise.net
Gathers a wide variety of French-language resources.
web iconmetadata icon
OLAC - Open Language Archives Community
OLAC collects information about the language resources in multiple archives, making them searchable from a single location.
web iconmetadata icon
Technologies du Langage
Jean Véronis' Blog (Title in French but many posts are in English)
web iconmetadata icon
UK Data archive
[From website] The UK Data Archive (UKDA) is an internationally-renowned centre of expertise in data acquisition, preservation, dissemination and promotion; and is curator of the largest collection of digital data in the social sciences and humanities in the UK. The UKDA provides resource discovery and support for secondary use of quantitative and qualitative data in research, teaching and learning as a lead partner of the Economic and Social Data Service (ESDS). The UKDA houses AHDS History, provides preservation services for other data organisations and facilitates international data exchange.
web iconmetadata icon

1_parts_of_words

english
UCLA Phonetics Lab Data
Index of Languages, Index of Sounds, Map Index, and material relevant to Peter Ladefoged's books A Course in Phonetics and Vowels and Consonants.
web iconmetadata icon

2_words

across_the_board
Localized Dictionaries for Mozilla Thunderbird
XPI files are essentially ZIP archives with some special requirements for Mozilla. Unzip one and you get two files, xxxxxx.dic and xxxxxx.aff, which are in the "standard" MySpell format. lingucomponent.openoffice.org and the links from it explain how to interpret them.
web iconmetadata icon
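The unzip step described for the Thunderbird dictionaries can be sketched in Python. This is a minimal illustration, not part of the listed resource: the function names are hypothetical, and the `latin-1` default encoding is an assumption (the `SET` line of the .aff file names the encoding actually used).

```python
import zipfile


def list_myspell_members(xpi_path):
    """Return the names of the .dic and .aff members inside an XPI (ZIP) archive."""
    with zipfile.ZipFile(xpi_path) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith((".dic", ".aff"))
        )


def read_dic_words(xpi_path, dic_member, encoding="latin-1"):
    """Read the word entries of a MySpell .dic member.

    The first line of a .dic file holds an approximate entry count; each
    later line is a word, optionally followed by "/" and affix flags.
    The default encoding is an assumption, not a property of the format.
    """
    with zipfile.ZipFile(xpi_path) as archive:
        text = archive.read(dic_member).decode(encoding)
    lines = text.splitlines()[1:]  # skip the count line
    return [line.split("/", 1)[0].strip() for line in lines if line.strip()]
```

Calling `list_myspell_members` on a downloaded XPI shows where the two files sit inside the archive; `read_dic_words` then gives the bare word list with the affix flags stripped.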
Dictionaries list at linguistlist
Links to over 200 dictionaries of specific languages and a collection of multilingual dictionaries, as well as acronym dictionaries, thesauri, and dictionaries of specialized terms. It also includes dictionary projects (e.g. The Euro Wordnet Project).
web iconmetadata icon
chinese
SUBTLEX-CH
Chinese word frequencies based on film and television subtitles.
web iconmetadata icon
dutch
SUBTLEX-NL
Dutch word frequencies based on film and television subtitles.
web iconmetadata icon
english
Answers.com
According to the authors, "The best definitions and explanations for over one million topics." Matches from dictionaries, the Wikipedia encyclopedia, and WordNet are all grouped on one screen.
web iconmetadata icon
Urban Dictionary
Dictionary for street slang.
web iconmetadata icon
Your dictionary.com
Various dictionary resources.
web iconmetadata icon
Four Letter Words
This small project is an attempt to give a spatial overview of the entirety of this part of English language heritage, as well as to explore and visualize relations between all four-letter words.
web iconmetadata icon
wordcount.org
WordCount™ is an interactive presentation of the 86,800 most frequently used English words.
web iconmetadata icon
SUBTLEX-US
English (US) word frequencies based on film and television subtitles.
web iconmetadata icon
french
Lexical resources at the LEAD (Université de Bourgogne)
N/A
web iconmetadata icon
Liste de bases de données lexicales à Orthorélie
N/A
web iconmetadata icon
Glossaires de termes
Specialized lexicons, glossaries, and dictionaries (a very long list)
web iconmetadata icon
spanish
Diccionario de la lengua Española
Spanish Dictionary
web iconmetadata icon
Diccionarios.com
N/A
web iconmetadata icon
Nuevo Tesoro Lexicografico de la Lengua Española
N/A
web iconmetadata icon
La Real Academia Española
Royal Academy of the Spanish Language
web iconmetadata icon

3_nonwords

across_the_board
Wuggy
A multilingual pseudoword generator.
web iconmetadata icon

4_running_text

across_the_board
corpus-linguistics.de
[From the website] On this webpage you will find an annotated reference system to find everything related to Corpus Linguistics that is available on the Internet: Corpora, Concordances, Corpus Linguistics research efforts and events, software for tagging, annotation etc.
web iconmetadata icon
Devoted to Corpora (Bookmarks for Corpus-based Linguists)
[From the website] These annotated links (c. 1,000 of them) are meant mainly for linguists and language teachers who work with corpora, not computational linguists/NLP (natural language processing) people, so although the language-engineering-type links here are fairly extensive, they are not exhaustive (for such info, you'll have to look elsewhere). Stuff here also represent my personal interests and biases (which will be obvious in some of my descriptive notes) and consequently there may be gaps, errors and omissions which you are welcome to tell me about. The English language bias on these pages will, I hope, be forgiven.
web iconmetadata icon
ELDA (Language Resources Distribution)
[From the website] Our catalogue of language resources currently gathers around 700 spoken and written language resources. It can be accessed from the ELRA web site and from the ELDA web site. The identification and the collection of existing language resources is part of our regular activity. The new resources we have collected, once the catalogue has been updated, are announced on some mailing lists, as well as in the ELRA members' news and in the quarterly ELRA newsletter.
web iconmetadata icon
EURALEX (European Association for Lexicography)
[From the website] EURALEX is the European Association for Lexicography: an international association which was founded in 1983, with the aims of furthering all aspects of the broad field of lexicography, and of promoting the exchange of ideas and information. It is committed to the development of lexicography in all European languages (as well as other non-European languages). EURALEX's interests include dictionaries of all kinds (monolingual, bilingual, and multilingual, general and specialist, in book and in machine-readable form); metalexicography, the theory of lexicography, and the history of lexicography; the praxis of dictionary-making; dictionary use; terminology and terminography; corpus lexicography; computational lexicography and dictionaries for natural language processing; and lexicology in general.
web iconmetadata icon
Linguistic Data Resources on the Internet
A topically organized list of language data resources on the Internet.
web iconmetadata icon
Archives for Language and Machine Learning
N/A
web iconmetadata icon
SIGLEX
[From the website] SIGLEX, a Special Interest Group on the Lexicon of the Association for Computational Linguistics, provides an umbrella for research interests on lexical issues ranging from lexicography and the use of online dictionaries to computational lexical semantics. SIGLEX is also the umbrella organization for SENSEVAL, evaluation exercises for Word Sense Disambiguation.
web iconmetadata icon
english
History of the English Language
List of Links about the English Language and its historical changes.
web iconmetadata icon
french
Une Histoire de la langue française @ Globe-Gate
Collection of nearly 100 links related to French, its dialects and historical changes
web iconmetadata icon

tools

0_across_the_board

Rent a coder
If you need a tool that doesn't seem to exist yet, why not rent a coder to develop it?
web iconmetadata icon

1_parts_of_words

Ngrams
Bigram Frequencies
Compute bigram frequencies for a list of words, using a bigram frequency table
metadata icon
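As an illustration of what computing bigram frequencies for a word list involves, here is a minimal Python sketch. It is not the listed tool: the function names are hypothetical, the table values in the usage note are made up, and reporting both a summed and a mean frequency per word is an assumption about which summaries are useful.

```python
def word_bigrams(word):
    """Return the adjacent letter pairs of a word, in order."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def bigram_stats(words, table):
    """Map each word to (summed, mean) bigram frequency.

    `table` maps a two-letter string to its frequency; bigrams missing
    from the table count as 0. One-letter words have no bigrams and
    get (0, 0.0).
    """
    results = {}
    for word in words:
        freqs = [table.get(bigram, 0) for bigram in word_bigrams(word.lower())]
        total = sum(freqs)
        mean = total / len(freqs) if freqs else 0.0
        results[word] = (total, mean)
    return results
```

With a toy table such as `{"th": 100, "he": 50}`, the word "the" contains the bigrams "th" and "he", so its summed frequency is 150 and its mean is 75.0.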
Speech tools
Speech Tools enables you to record, store, and analyze language sounds and music, as well as to help you in language learning. This suite of programs consists of (a) IPA Help 2.03; (b) Phonology Assistant 2.1; (c) Speech Analyzer 2.6
web iconmetadata icon

2_words

Latent Semantic Analysis Applications
Latent Semantic Analysis (LSA) captures the essential relationships between text documents and word meaning, or semantics, the knowledge base which must be accessed to evaluate the quality of content. Several educational applications that employ LSA have been developed: (1) selecting the most appropriate text for learners with variable levels of background knowledge, (2) automatically scoring the content of an essay, and (3) helping students effectively summarize material.
web iconmetadata icon
Semantic Space and Probability Models
Scott McDonald has provided a web interface to several semantic space models derived from the British National Corpus (and therefore British English: highly recommended).
web iconmetadata icon
Word Analysis Tools
Includes: (a) an Interlinear Text editor, which helps you create text segments composed of wordform-in-context objects; (b) a Wordform Inventory editor; (c) an Analysis editor, to categorize and gloss a word and its component morphemes for display in the interlinear text and lexical database; (d) a Morphology Explorer, to help you discover morphemes in your data on the basis of similarity in form or meaning and to review other data correlations in the wordform list, your texts, and the lexical database.
web iconmetadata icon

4_running_text

Consortium for Lexical Research, FTP archive
The Consortium for Lexical Research maintains an archive of research materials accessible via anonymous FTP.
web iconmetadata icon
DFKI ACL Natural Language Software registry
The registry maintains an exhaustive list of tools, with a structured listing and descriptions of available NLP products. It does not itself distribute the software.
web iconmetadata icon
Link Grammar
[From the authors' website] The Link Grammar Parser is a syntactic parser of English, based on link grammar, an original theory of English syntax. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labeled links connecting pairs of words. The parser also produces a "constituent" representation of a sentence (showing noun phrases, verb phrases, etc.).
web iconmetadata icon
Logiciels par Jean Véronis
Various NLP software, including: a syntactic parser for teaching purposes, a concordancer for written and spoken corpora, and an Excel add-in macro with some useful statistical functions (box plots, chi-square for contingency tables, etc.)
metadata icon

5_visual_material

Psychophysics Toolbox
The Psychophysics Toolbox is a free set of Matlab functions for vision research (Brainard, 1997; Pelli, 1997)
web iconmetadata icon
Vision Egg
Visual stimulus creation and control with open source software (Python Library)
web iconmetadata icon

7_performance_measures

a_expt_design
JONES: Journal of Neurobehavioral Experiments and Stimuli
[from JONES' website] JONES is a new kind of journal that allows scientists to share their experiments with the scientific community in a way that has not been possible before. JONES differs from other methods publications in that what gets published is not just a description of the experiment, but files that fully specify the experimental paradigm and stimuli.
web iconmetadata icon
b_running
Software for running experiments
List of software, with comments and resources, in the lexicall wiki
web iconmetadata icon
Software for participants management
List of software, with comments and resources, in the lexicall wiki
web iconmetadata icon
Software for Eye Movement Experiments
List of software, with comments and resources, in the lexicall wiki
web iconmetadata icon
c_post-processing_filtering
Trim Outliers
Of use to cognitive psychologists who need to filter data from experiments. Lets you filter out data outside a minimum and maximum value or outside an n SD (standard deviation) window. Click on the web author icon to access the compiled version and the full documentation.
web iconmetadata icon
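The two filters described for Trim Outliers can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the listed tool: the function names are hypothetical, the SD window is centred on the mean, and the sample standard deviation is used (the tool may use the population SD instead).

```python
from statistics import mean, stdev


def trim_absolute(values, minimum, maximum):
    """Keep only values inside the closed interval [minimum, maximum]."""
    return [v for v in values if minimum <= v <= maximum]


def trim_sd(values, n_sd):
    """Keep only values within n_sd sample standard deviations of the mean.

    With fewer than two values no SD can be computed, so the input is
    returned unchanged.
    """
    if len(values) < 2:
        return list(values)
    centre = mean(values)
    spread = stdev(values)
    low, high = centre - n_sd * spread, centre + n_sd * spread
    return [v for v in values if low <= v <= high]
```

For reaction times, a common pattern is to apply the absolute cut-offs first (for example, dropping anticipations and very long responses) and then the SD window per participant or per condition, since a single extreme value inflates both the mean and the SD.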
d_summaries
Pivot Table
Reproduces the pivot table function found in Excel. A pivot table lets you explore your data according to predetermined dimensions: for example, reaction times by word length, error rates by word frequency, etc. Four different summary statistics are provided for the distribution in each dimension: mean, median, standard deviation, and count.
web iconmetadata icon
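A pivot of this kind, with one grouping dimension and the four summary statistics, can be sketched with the Python standard library. The function name and the dictionary-based row format are assumptions made for the illustration; the listed tool's own input format is not described here.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def pivot(rows, by, value):
    """Summarize `value` for each distinct level of the `by` column.

    `rows` is a list of dictionaries. For each level the result holds
    the mean, median, sample standard deviation, and count; the SD of a
    single observation is reported as 0.0.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[value])
    return {
        level: {
            "mean": mean(values),
            "median": median(values),
            "sd": stdev(values) if len(values) > 1 else 0.0,
            "count": len(values),
        }
        for level, values in groups.items()
    }
```

For example, `pivot(trials, "length", "rt")` on a list of trial records summarizes reaction times by word length, where `trials`, `"length"`, and `"rt"` are placeholder names.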
e_plotting
b-src: Bioinformatics Packages Source Code Search Engine
b-Src is a bioinformatics packages source code search engine by gonzui. b-Src is maintained by Mitsuteru Nakao and the b-Src Team on behalf of the CBRC Sequence Analysis Team. Please let us know if you have recommended bioinformatics packages to be indexed.
web iconmetadata icon
GraphViz (Graph Visualization Software)
Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. Automatic graph drawing has many important applications in software engineering, database and web design, networking, and in visual interfaces for many other domains. Graphviz is open source graph visualization software. It has several main graph layout programs. See the gallery for some sample layouts. It also has web and interactive graphical interfaces, and auxiliary tools, libraries, and language bindings. Compiled versions are available for Windows, Mac OS X, and Linux.
web iconmetadata icon
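To make the idea of describing a graph for Graphviz concrete, here is a minimal Python sketch that writes a directed graph in the DOT language read by the Graphviz layout programs. The function name and the example node labels are hypothetical, and the sketch does not escape double quotes inside labels.

```python
def to_dot(edges, name="G"):
    """Return DOT source for a directed graph given (source, target) pairs.

    Node labels are wrapped in double quotes; labels that themselves
    contain double quotes are not handled by this sketch.
    """
    lines = [f"digraph {name} {{"]
    for source, target in edges:
        lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)
```

Saving the returned text to a file such as graph.dot and running `dot -Tpng graph.dot -o graph.png` renders it with the `dot` layout program.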
Orange Widgets (Data Mining)
Orange is component-based data mining software. It includes a range of preprocessing, modelling and data exploration techniques. It is based on C++ components, which are accessed either directly (not very common), through Python scripts (easier and better), or through GUI objects called Orange Widgets. Compiled versions exist for Windows and Macintosh.
web iconmetadata icon
f_statistical_analyses
stats R
R is open source (that is, free) software for statistical analysis and data visualisation. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
web iconmetadata icon
Software for Statistical Analysis
List of software, with comments and resources, in the lexicall wiki
web iconmetadata icon