Resources in our repository cover different languages, different types of material, and different categories of material.

Because these resources are stored and described using standards, they can all be listed from a single access point, as demonstrated here.

data

repository

data

2_words

dutch
AoA norms for Dutch, De Moor et al. (2000)
Age-of-acquisition ratings on 2816 Dutch four- and five-letter nouns. These norms were collected by asking 559 undergraduates to indicate for a set of words at which age they thought they had learned them.
Validated AoA norms for Dutch, Ghyselinck et al. (2000a)
Alphabetical listing of 254 validated words together with their rated AoA, logarithmic frequency and the evaluation of the three judges.
Validated AoA norms for Dutch, Ghyselinck et al. (2000b)
Alphabetical listing of the 410 validated words together with their rated AoA and the percentage of children that correctly indicated the meaning of the word.
AoA norms for Dutch, Ghyselinck et al. (2003)
These norms were collected by asking 142 participants to indicate for a set of 389 or 388 words at which age they thought they had learned them.
Norms on emotional valence and concreteness, Van der Goten et al. (1999)
Norms are provided for 1- and 3-syllable words
Celex database for Dutch. Baayen et al. (1993)
Extensive database with both lemma and wordform statistics
Woorden in het basisonderwijs, Schrooten & Vermeer (1994)
15,000 words presented to pupils
Translation Norms for Dutch-English Translation Pairs, Tokowicz et al. (2002)
Dutch-English Number of Translations, Form Similarity, and Semantic Similarity Norms
english
AoA and imagery measures, Bird et al. (2001)
Age-of-acquisition, imageability, and frequency measures for 2,694 words
AoA and imagery measures, Gilhooly & Logie (1980)
Age-of-acquisition, imagery, concreteness, familiarity, and ambiguity measures for 1,944 words
Celex Database for English, Baayen et al. (1993)
Extensive database with both lemma and wordform statistics
English Lexicon Project (ELP, Balota et al. 2002)

(Adapted from website.) The English Lexicon Project (ELP) is an ongoing project. Its goal is to collect normative data for speeded naming and lexical decision for over 40,000 words across 1200 subjects at 6 different universities and to integrate these data into a database along with descriptive characteristics of the words used in the study.

At present, the English Lexicon Project (supported by the National Science Foundation) affords access to a large set of lexical characteristics, along with behavioral data from visual lexical decision and naming studies of 40,481 words and 40,481 nonwords.

The naming and lexical decision data are currently being collected at six testing universities. To date, we have collected 2,752,698 reaction time measurements from 816 subjects in the lexical decision experiment. We have also collected 1,125,880 experimental measurements from 444 subjects in the naming experiment.

Researchers interested in psycholinguistics, human memory, computational modeling, and other fields will find these data useful. For example, researchers will be better equipped to select stimuli, test theories, and reduce potential confounds in their studies.

Brett Kessler publications, programs, and datasets
NA.
Lexical FreeNet :: Connected Thesaurus
[As on authors' website] This program allows you to search for relationships between words, concepts, and people. It is a combination thesaurus, rhyming dictionary, pun generator, and concept navigator. Use it to find words that fit the needs of whatever writing endeavor you've undertaken, or just to browse concept space. To use the system, enter one or two words into the boxes at the top of the page, select a function to perform, optionally select some word relations to allow, and click Submit Query! Here is a description of the seven functions that are available.
MRC database (Coltheart et al.)
MRC Psycholinguistic Database containing over 150,000 words with up to 26 linguistic and psycholinguistic attributes for each (e.g. pronunciation, part of speech, word...). These attributes include age-of-acquisition and imagery statistics from Gilhooly & Logie (1980) and the word frequency list from Kucera & Francis (1967).
Wordmine.org
[As on authors' website] Wordmine is a culmination of several years of computational analyses of various word-level constructs. It is meant to act as a psycho-linguistic resource similar to the MRC database and is available to all researchers free of charge. This resource, the MRC database and RT values available from Balota and his colleagues can be used in combination to pre-test assumptions regarding variables of interest. We encourage students to use these sites and data so that they can run "experiments" to test assumptions about the way the word recognition system operates. In exchange for the data available here we ask that you acknowledge this site in your presentations or publications.
WordNet
[As on authors' website] WordNet® is an online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets.
Word Frequencies in Written and Spoken English, based on the British National Corpus, Leech et al. (2001)
[From authors' website] Book with frequency statistics derived from the British National Corpus - a 100,000,000 word electronic databank sampled from the whole range of present-day English, spoken and written - and makes use of the grammatical information that has been added to each word in the corpus. Includes frequencies for present-day speech (including everyday conversation) as well as for writing: (a) Rank-ordered and alphabetical frequency lists for the whole corpus and for various subdivisions: e.g. informative vs. imaginative writing, conversational vs. other varieties of speech. (b) Entries take account of grammatical parts of speech (e.g. round as a preposition is listed separately from round as an adjective). (c) Includes discussions of a number of thematic frequency lists such as colour terms, female vs. male terms, etc
Kucera & Francis (1967)
Kucera and Francis frequency values
Children's Printed Word Database, Stuart et al. (1993-1996)
ESRC-funded project to develop a database of printed word frequencies as read by children aged between 5 & 9.
Children's - Stuart et al. (2003)
Database of children's early reading vocabulary, for use by researchers and teachers, with an up-to-date word frequency list of early print exposure in the UK
Zeno et al. (1995) - Educator's word frequency guide
Quantitative summary of the printed vocabulary encountered by students in American schools, with separate word frequency counts conducted on materials for each grade (grade 1 through college)
Crawford et al. (2004)
Corpus of gender-related and neutral words
Cortese & Fugett (2004)
Imageability ratings for 3,000 monosyllabic words
Norms, Clark & Paivio (2004a)
Expanded Norms for Original 925 Paivio, Yuille, and Madigan (1968) Items.
Norms, Clark & Paivio (2004b)
Expanded Norms for 2,311 Items
Maki et al. (2004)
Semantic distance norms computed from an electronic dictionary (WordNet).
Gahl et al. (2004)
Verb subcategorization frequencies (American English)
french
Objective AoA norms, Chalard et al. (2005)
Alphabetical listing of the 230 words and their scores on objective AoA.
BDLex (de Calmès & Pérennou, 1998)

BDLEX consists of a lexical database developed within the French GDR-PRC CHM at IRIT (IMH-PT team), Paul Sabatier University, Toulouse. The data cover lexical, phonological, and morphological information.

The database BDLEX consists of about 440,000 inflected forms (generated from about 50,000 canonical words) with the following attributes: spelling, pronunciation, morphosyntactic features (part of speech, agreements,...), the canonical word spelling and a frequency indicator.

Moreover the lexical resources include the version BDLex-syll which specifies the syllabic division in the field pronunciation.

BruLex, Content et al. (1990)
[As on server] The Brulex directory includes the different versions of the BRULEX database. BRULEX was developed about ten years ago, for the purpose of facilitating selection and control of experimental materials in psycholinguistic experiments on lexical processing. It contains a large amount of information that was typed manually by volunteer collaborators. Potential users should be aware that the current version, which hasn't been changed since '89, contains a number of transcription errors and inaccuracies.
Dicouèbe, Dictionnaire en ligne de combinatoire du Français

(from webpage): DiCo (an acronym for dictionnaire de combinatoire, "combinatorial dictionary") is a lexical database of French, developed over several years at the OLST by Igor Mel'čuk and Alain Polguère. The primary aim of this database is to describe each lexical unit (lexie) in the DiCo word list along two axes: the semantic derivations (strong semantic relations) that link it to other lexical units of the language, and the collocations (semi-idiomatic expressions) that it controls. This description is accompanied by a model of the syntactic structures governed by the lexical unit and a model of its meaning, in the form of semantic labelling.

Lexique, New (2001, 2004)
Lexique provides a wide range of information (frequency, neighbours, phonology, lemma, etc.) on 137,000 French words, based on large corpora (35 million words). The database is regularly updated, and the third version was released in 2005. This third version brings many new features, such as:
- written and estimated spoken frequency
- frequency of any character string
- recent words
- etc.
Open Lexique is a project that allows people to query several French databases simultaneously. Other resources have been developed, such as a free text corpus, a first name database, an anagram database, etc.
LexOP. Peereman & Content (1998)
[As on the authors' server] The Lexop directory includes the different versions of the LEXOP database. LEXOP includes about 2,500 monosyllabic French words, and provides a large number of statistics related to orthography-to-phonology and phonology-to-orthography mappings. [see the Lexop/00readme file for more details about the contents and distribution]
MANULEX (Lété, Sprenger-Charolles, Colé)
A Grade-Level Lexical Database from French Elementary-School Readers

MANULEX provides grade-level word-frequency lists of non-lemmatized and lemmatized words (48,886 and 23,812 entries, respectively) computed from the 1.9 million words taken from 54 French elementary-school readers. Word frequencies are provided for four levels: 1st grade (G1), 2nd grade (G2), 3rd to 5th grades (G3-5), and all grades (G1-5). The frequencies were computed following the methods described by Carroll et al. (1971) and Zeno et al. (1995) with four statistics at each level (F: overall word frequency, D: index of dispersion across the selected readers, U: estimated frequency per million words, and SFI: Standard Frequency Index). The database also provides the number of letters in the word and syntactic category information. MANULEX is intended to be a useful tool for studying language development through the selection of stimuli based on precise frequency norms. Researchers in artificial intelligence can also use it as a source of information on natural language processing to simulate written language acquisition in children. Finally, it may serve an educational purpose by providing basic vocabulary lists.
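As an illustration of how the U and SFI statistics relate, the sketch below uses the formula commonly given for the Standard Frequency Index in Carroll et al. (1971) and Zeno et al. (1995), SFI = 10 × (log10(U) + 4). The entry above does not state the formula, so treat it as an assumption and check it against the MANULEX documentation; the function name is hypothetical.

```python
import math


def standard_frequency_index(u_per_million):
    """Convert U (estimated frequency per million words) to an SFI value.

    Assumes the commonly cited definition SFI = 10 * (log10(U) + 4),
    which is not stated in the catalogue entry itself.
    """
    if u_per_million <= 0:
        raise ValueError("U must be positive to take its logarithm")
    return 10 * (math.log10(u_per_million) + 4)
```

Under this definition, a word estimated to occur once per million words has an SFI of 40, and each tenfold change in U moves the SFI by 10 points.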

NovLex 1, Lambert & Chesnet (2001)

(from webpage): The NOVLEX lexical database is a tool for estimating the range and lexical frequency of the written vocabulary addressed to French-speaking primary-school pupils.

It was built by analysing school and non-school books intended for pupils in CE2 (8-9 years old). NOVLEX is built from a corpus of roughly 417,000 words, excluding proper nouns, first names, city names, and onomatopoeia, and converted to lowercase ("Un", "UN" and "un" are a single entry).

From this corpus we extracted 9,300 distinct lexical roots (Base Lexicale), determined with the help of the Larousse dictionary.

NovLex 2, Lambert & Chesnet (2001)

(from webpage): The NOVLEX lexical database is a tool for estimating the range and lexical frequency of the written vocabulary addressed to French-speaking primary-school pupils.

It was built by analysing school and non-school books intended for pupils in CE2 (8-9 years old). NOVLEX is built from a corpus of roughly 417,000 words, excluding proper nouns, first names, city names, and onomatopoeia, and converted to lowercase ("Un", "UN" and "un" are a single entry).

From this corpus we extracted 20,600 orthographically distinct entries (Base d'occurrences). In the Base d'Occurrences, all orthographic forms are treated as separate entries (e.g. "cheveu" and "cheveux" are two distinct entries).

Omnilex
Computerized database on the lexicon of contemporary French
Vocolex, Dufour et al. (2002)
Paper Abstract: Several studies on auditory word recognition indicate that word processing is influenced by the phonological similarity with other words. We describe a lexical database, VoColex, which provides several statistical indexes of phonological similarity between French words. Phonological similarity is computed according to two distinct principles. According to the first principle, phonologically similar words share initial phonemes with the target word. According to the second principle, phonological neighbours correspond to any words which can be derived from the target by a single phoneme change (substitution, addition, or deletion) whatever the position of the modified phoneme. The statistical data provided by VoCoLex should allow the control and the empirical manipulation of various measures of phonological similarity, as well as quantitative descriptions of the auditory lexicon.
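The two similarity principles described in the abstract can be sketched as follows. This is a minimal illustration, not the VoCoLex implementation: the function names are hypothetical, and the phoneme transcriptions in the toy lexicon are simplified placeholders rather than data from the database.

```python
def shared_initial_phonemes(a, b):
    """Count the phonemes two transcriptions share from the start (first principle)."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def is_neighbour(a, b):
    """True if b differs from a by one substitution, addition, or deletion (second principle)."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Lengths differ by one: removing a single phoneme from the longer
    # sequence, at any position, must yield the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighbours(target, lexicon):
    """Return the words in lexicon whose transcription is a neighbour of target's."""
    target_phonemes = lexicon[target]
    return [word for word, phonemes in lexicon.items()
            if is_neighbour(target_phonemes, phonemes)]


# Toy lexicon with simplified, hypothetical transcriptions (tuples of phonemes).
toy_lexicon = {
    "bal": ("b", "a", "l"),
    "bol": ("b", "o", "l"),
    "bar": ("b", "a", "r"),
    "bas": ("b", "a"),
    "table": ("t", "a", "b", "l"),
}
```

With this toy lexicon, `neighbours("bal", toy_lexicon)` returns "bol" and "bar" (substitutions) and "bas" (deletion), but not "table". Representing transcriptions as tuples keeps multi-character phoneme symbols intact.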
Dicos à ABU
[From website]. The virtue of the word lists you will find on these pages is not to offer bibliophiles like you the means of developing professional tools. They are indeed far from complete and error-free.

There are for now four lists: a list of common words (300,000+ words), a list of first names (12,437 first names), a list of names of French towns (39,076 names), a list of country names (170 countries), a list of difficulties of the language (1,500 words).

And also a dictionary: "Les Excentricités du Langage" by Lorédan Larchey (hypertext version). A gem that we recommend!
XMLittré
[From website]. This site offers an online searchable version of Émile Littré's dictionary of the French language.

This work was published from 1863 onward, then in its second edition in 1872-1877.
Cordier & Le Ny (2004)
Values of Experiential Frequency, Degree of Knowledge and Rated Familiarity for French Words
ARTFL- Word Frequency Search Form
Word Frequency information. TLF. Trésor de la Langue française. Imbs (1971).
Morphalou
The Morphalou lexicon is an open lexicon of French inflected forms. Morphalou's initial data come from TLFnome, the word list of the Trésor de la Langue Française, which supplied 539,413 inflected forms belonging to 68,075 lemmas. The transfer from TLFnome to Morphalou was done by structurally reorganizing the data and normalizing the grammatical labels, without loss of linguistic information. The resulting lexicon has broad coverage (~540,000 inflected forms), is linguistically valid (under the responsibility of an editorial board), and formally conforms to the standardization proposals for NLP lexical resources at ISO (TC37/SC4). It is freely accessible for research and teaching purposes. The lexicon is maintained and updated by ATILF.
Dictionnaire des synonymes
[From website]. This dictionary of synonyms contains approximately 49,000 entries and 396,000 synonymy relations. The starting base consists of seven classic dictionaries (Bailly, Benac, Du Chazaud, Guizot, Lafaye, Larousse and Robert) from which the synonymy relations were extracted; this initial work, carried out by the Institut National de la Langue Française (INaLF), produced a series of files, whose data were merged and homogenized within the CRISCO laboratory (ELSAP at the time). Finally, we completed this procedure with substantial correction work (adding or removing synonymy links) on the final file. This project started under the responsibility of Sabine PLOUX, who defined the operating principles of this dictionary. Since 1998, Jean-Luc MANGUIN has been in charge of it; he put it online and built the text-mode query interface. Current developments result from a project bringing together CRISCO (Caen) and the company Memodata (Caen). This project was selected by the Comité Régional pour l'Imagerie et les Technologies de l'Information et de la Communication.
spanish
Corpus Diacrónico del Español (CORDE)
Diachronic Corpus for the Spanish Language. [With interactive search facilities]
Corpus de Referencia del Español Actual (CREA)
[...] describes the capabilities of the computer program for querying the Banco de Datos del Español of the Real Academia Española. It is a text for people without specific knowledge of the subject, providing the basic notions for interactive consultation of the largest lexical resource (more than 200 million words) available for the Spanish language. [With interactive search facilities.]
Diccionario de la Universidad de Oviedo
Dictionary of antonyms, dictionary of synonyms, verb conjugator, related terms.
Izura et al. (2004)
Category norms for 500 Spanish words in 5 semantic categories
Banco de datos del Español
List of authors and works. [With interactive search facilities.]

docs

2_words

english
Heteronyms
List of heteronyms in the language
Alan Cooper's Homonyms
List of homonyms (in fact, homophones) in English
Homophones
List of 439 homophones by Ian Miller
A Collection of Word Oddities and Trivia
Covers topics such as: Misspelled words; Typewriter words; Beautiful words; Long words; Long words (place names); Long words (chemical names); Plurals; Scrabble words; Short and Long words from Mathematics; Short and Long words from the Bible; Names of people which became words; Last words (alphabetically arranged); French words; Interjections; Italian words
french
Les bases de données lexicales : pourquoi et comment les utiliser en logopédie ? By Marie-Anne Schelstraete & Christelle Maillart
Our information-processing system, the cognitive system, is in general very sensitive to the frequency of the stimuli it processes, which can be interpreted as a form of adaptation particularly well suited to a variable environment. In all areas of cognitive psychology, research shows that frequent stimuli, which have already been encountered often, are processed faster and more accurately than rarer stimuli. [...]
techniques
Modern Dictionary Making
On-line course, by Dafydd Gibbon, Faculty of Linguistics and Literary Studies (University of Bielefeld), Summer Semester 2003 (Version: July 24, 2003)
What is a lexical database
Entry of the Glossary of linguistic terms (Loos, Anderson, Day, and Jordan)
Lexical Database definition (at freedictionary)
Multiple definitions related to lexical databases

links

2_words

across_the_board
Localized Dictionaries for Mozilla Thunderbird
The XPI files are basically just a ZIP format with some special requirements for Mozilla. Unzip a file and you get two files xxxxxx.dic and xxxxxx.aff which are the "standard" myspell format. lingucomponent.openoffice.org and the links from it will tell you how to interpret them.
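Since the entry describes XPI files as ZIP archives, the dictionary files can be located with a standard ZIP reader. The sketch below is a minimal example using Python's standard `zipfile` module; the function name is hypothetical, and the archive layout (where the .dic and .aff files sit inside the XPI) is not specified in the entry, so the code simply searches all members.

```python
import zipfile


def list_dictionary_files(xpi_path):
    """Return the names of the .dic and .aff members inside an XPI (ZIP) archive."""
    with zipfile.ZipFile(xpi_path) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith((".dic", ".aff"))]
```

Calling `list_dictionary_files` on a downloaded XPI file should return the paths of the two myspell files, which can then be extracted with `ZipFile.extract`.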
Dictionaries list at linguistlist
Links to over 200 dictionaries of specific languages and a collection of multilingual dictionaries, as well as acronym dictionaries, thesauri, and dictionaries of specialized terms. It also includes dictionary projects (e.g. The Euro Wordnet Project).
chinese
SUBTLEX-CH
Chinese Word frequencies based on film and television subtitles.
dutch
SUBTLEX-NL
Dutch Word frequencies based on film and television subtitles.
english
Answers.com
According to the authors, "The best definitions and explanations for over one million topics." The matches in dictionaries, the Wikipedia encyclopedia, or WordNet are all grouped on one screen.
Urban Dictionary
Dictionary for street slang.
Your dictionary.com
Various dictionary resources.
Four Letter Words
This small project is an attempt to give a spatial overview of the entirety of this part of English language heritage, as well as to explore and visualize relations between all four letter words.
wordcount.org
WordCount™ is an interactive presentation of the 86,800 most frequently used English words.
SUBTLEX-US
English (US) Word frequencies based on film and television subtitles.
french
Lexical resources at the LEAD (Universite de Bourgogne)
NA
Liste de bases de données lexicales à Orthorélie
N/A
Glossaires de termes
Specialized lexicons, glossaries, and dictionaries (very long list)
spanish
Diccionario de la lengua Española
Spanish Dictionary
Diccionarios.com
N/A
Nuevo Tesoro Lexicografico de la Lengua Española
N/A
La Real Academia Española
Royal Academy of Spanish Language
