data
repository
data
1_parts_of_words
english
Subsyllabic similarity, De Cara & Goswami (2002)
Statistical Analysis of Similarity Relations among Spoken Words: Evidence for the Special Status of Rimes in English
Letter sequences statistics, Hirata & Bryden (1971)
Letter sequences varying in order of approximation to English
The sounds of English
In this site thousands of English words have been painstakingly grouped according to their sounds and their spellings making the patterns obvious. This is the most logical and systematic method to learn English. It doesn't rely on rules to teach reading and spelling; instead, repeated exposure to a sound/letter pattern allows your brain to recognize the pattern intuitively and internalize it.
2_words
dutch
AoA norms for Dutch, De Moor et al. (2000)
Age-of-acquisition ratings on 2816 Dutch four- and five-letter nouns. These norms were collected by asking 559 undergraduates to indicate for a set of words at which age they thought they had learned them.
Validated AoA norms for Dutch, Ghyselinck et al. (2000a)
Alphabetical listing of 254 validated words together with their rated AoA, logarithmic frequency and the evaluation of the three judges.
Validated AoA norms for Dutch, Ghyselinck et al. (2000b)
Alphabetical listing of the 410 validated words together with their rated AoA and the percentage of children that correctly indicated the meaning of the word.
AoA norms for Dutch, Ghyselinck et al. (2003)
These norms were collected by asking 142 participants to indicate for a set of 389 or 388 words at which age they thought they had learned them.
Norms on emotional valence and concreteness, Van der Goten et al. (1999)
Norms are provided for 1- and 3-syllable words
Celex database for Dutch. Baayen et al. (1993)
Extensive database with both lemma and wordform statistics
Woorden in het basisonderwijs, Schrooten & Vermeer (1994)
15,000 words presented to pupils
Translations Norms for Dutch-English Translation Pairs, Tokowicz et al. (2002)
Dutch-English Number of Translations, Form Similarity, and Semantic Similarity Norms
english
AoA and imagery measures, Bird et al. (2001)
Age-of-acquisition, imageability, and frequency measures for 2,694 words
AoA and imagery measures, Gilhooly & Logie (1980)
Age-of-acquisition, imagery, concreteness, familiarity, and ambiguity measures for 1,944 words
Celex Database for English, Baayen et al. (1993)
Extensive database with both lemma and wordform statistics
English Lexicon Project (ELP, Balota et al. 2002)
(Adapted from website.) The English Lexicon Project (ELP) is an ongoing project. Its goal is to collect normative data for speeded naming and lexical decision for over 40,000 words across 1200 subjects at 6 different universities and to integrate these data into a database along with descriptive characteristics of the words used in the study.
As of now, the English Lexicon Project (supported by the National Science Foundation) affords access to a large set of lexical characteristics, along with behavioral data from visual lexical decision and naming studies of 40,481 words and 40,481 nonwords.
The naming and lexical decision data are currently being collected from six testing Universities. To date, we have collected 2,752,698 reaction time measurements from 816 subjects in the lexical decision experiment. We have also collected 1,125,880 experimental measurements from 444 subjects in the naming experiment.
Researchers interested in psycholinguistics, human memory, computational modeling, and other fields will find these data useful. For example, researchers will be better equipped to select stimuli, test theories, and reduce potential confounds in their studies.


Brett Kessler publications, programs, and datasets
NA.
Lexical FreeNet :: Connected Thesaurus
[As on authors' website]This program allows you to search for relationships between words, concepts, and people. It is a combination thesaurus, rhyming dictionary, pun generator, and concept navigator. Use it to find words that fit the needs of whatever writing endeavor you've undertaken, or just to browse concept space. To use the system, enter one or two words into the boxes at the top of the page, select a function to perform, optionally select some word relations to allow, and click Submit Query! Here is a description of the seven functions that are available.
MRC database (Coltheart et al.)
MRC Psycholinguistic Database containing over 150,000 words with up to 26 linguistic and psycholinguistic attributes for each (e.g. pronunciation, part of speech, word...). These attributes include age-of-acquisition and imagery statistics from Gilhooly & Logie (1980) and the word frequency list from Kucera & Francis (1967).
Wordmine.org
[As on authors' website] Wordmine is a culmination of several years of computational analyses of various word-level constructs. It is meant to act as a psycho-linguistic resource similar to the MRC database and is available to all researchers free of charge.
This resource, the MRC database and RT values available from Balota and his colleagues can be used in combination to pre-test assumptions regarding variables of interest. We encourage students to use these sites and data so that they can run "experiments" to test assumptions about the way the word recognition system operates. In exchange for the data available here we ask that you acknowledge this site in your presentations or publications.
WordNet
[As on authors' website] WordNet® is an online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets.
Word Frequencies in Written and Spoken English, based on the British National Corpus, Leech et al. (2001)
[From authors' website] Book with frequency statistics derived from the British National Corpus - a 100,000,000 word electronic databank sampled from the whole range of present-day English, spoken and written - and makes use of the grammatical information that has been added to each word in the corpus. Includes frequencies for present-day speech (including everyday conversation) as well as for writing: (a) Rank-ordered and alphabetical frequency lists for the whole corpus and for various subdivisions: e.g. informative vs. imaginative writing, conversational vs. other varieties of speech. (b) Entries take account of grammatical parts of speech (e.g. round as a preposition is listed separately from round as an adjective). (c) Includes discussions of a number of thematic frequency lists such as colour terms, female vs. male terms, etc
Kucera & Francis (1967)
Kucera and Francis frequency values
Children's Printed Word Database, Stuart et al. (1993-1996)
ESRC-funded project to develop a database of printed word frequencies as read by children aged between 5 & 9.
Children's - Stuart et al. (2003)
Database of children's early reading vocabulary, for use by researchers and teachers, with an up-to-date word frequency list of early print exposure in the UK
Zeno et al. (1995) - Educator's word frequency guide
Quantitative summary of the printed vocabulary encountered by students in American schools, with separate word frequency counts for the materials of each grade (grade 1 through college)
Crawford et al. (2004)
Corpus of gender-related and neutral words
Cortese & Fugett (2004)
Imageability ratings for 3,000 monosyllabic words
Norms, Clark & Paivio (2004a)
Expanded Norms for Original 925 Paivio, Yuille, and Madigan (1968) Items.
Norms, Clark & Paivio (2004b)
Expanded Norms for 2,311 Items
Maki et al. (2004)
Semantic distance norms computed from an electronic dictionary (WordNet).
Gahl et al. (2004)
Verb subcategorization frequencies (American English)
french
Objective AoA norms, Chalard et al. (2005)
Alphabetical listing of the 230 words and their scores on objective AoA.
BDLex (de Calmès & Pérennou, 1998)
BDLEX consists of a lexical database developed within the French GDR-PRC CHM at IRIT (IMH-PT team), Paul Sabatier University, Toulouse. The data cover lexical, phonological, and morphological information.
The database BDLEX consists of about 440,000 inflected forms (generated from about 50,000 canonical words) with the following attributes: spelling, pronunciation, morphosyntactic features (part of speech, agreements,...), the canonical word spelling and a frequency indicator.
Moreover the lexical resources include the version BDLex-syll which specifies the syllabic division in the field pronunciation.
BruLex, Content et al. (1990)
[As on server] The Brulex directory includes the different versions of the BRULEX database. BRULEX was developed about ten years ago, for the purpose of facilitating selection and control of experimental materials in psycholinguistic experiments on lexical processing. It contains a large amount of information that was typed manually by benevolent collaborators. Potential users should be aware that the current version, which hasn't been changed since '89, contains a number of transcription errors and inaccuracies.
Dicouèbe, Dictionnaire en ligne de combinatoire du Français
(from webpage): The DiCo (short for dictionnaire de combinatoire, "combinatorial dictionary") is a lexical database of French, developed over several years at the OLST by Igor Mel'čuk and Alain Polguère. The primary aim of this database is to describe each lexical unit (lexie) in the DiCo word list along two axes: the semantic derivations (strong semantic relations) that link it to other lexical units of the language, and the collocations (semi-idiomatic expressions) that it controls. This description is accompanied by a model of the syntactic structures governed by the lexical unit and a model of its meaning, in the form of semantic labelling.
Lexique, New (2001, 2004)
Lexique provides a wide range of information (frequency, neighbours, phonology, lemma, etc.) on 137,000 French words, based on large corpora (35 million words). The database is regularly updated, and the third version was released in 2005. This third version adds many new features, such as:
- written and estimated spoken frequency
- frequency of any character string
- recent words
- etc.
Open Lexique is a project that allows people to query several French databases simultaneously.
Other resources have been developed, such as a free text corpus, a first-name database, an anagram database, etc.
LexOP. Peereman & Content (1998)
[As on the authors' server] The Lexop directory includes the different versions of the LEXOP database. LEXOP includes about 2,500 monosyllabic French words, and provides a large number of statistics related to orthography to phonology and phonology to orthography mappings. [see the Lexop/00readme file for more details about the contents and distribution]
MANULEX (Lété, Sprenger-Charolles, Colé)
A Grade-Level Lexical Database from French Elementary-School Readers
MANULEX provides grade-level word-frequency lists of non-lemmatized and lemmatized words (48,886 and 23,812 entries, respectively) computed from the 1.9 million words taken from 54 French elementary-school readers. Word frequencies are provided for four levels: 1st grade (G1), 2nd grade (G2), 3rd to 5th grades (G3-5), and all grades (G1-5). The frequencies were computed following the methods described by Carroll et al. (1971) and Zeno et al. (1995) with four statistics at each level (F: overall word frequency, D: index of dispersion across the selected readers, U: estimated frequency per million words, and SFI: Standard Frequency Index). The database also provides the number of letters in the word and syntactic category information. MANULEX is intended to be a useful tool for studying language development through the selection of stimuli based on precise frequency norms. Researchers in artificial intelligence can also use it as a source of information on natural language processing to simulate written language acquisition in children. Finally, it may serve an educational purpose by providing basic vocabulary lists.
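As a rough illustration of how the U and SFI statistics relate, the sketch below assumes the conventional definition associated with Carroll et al. (1971), SFI = 10 × (log10(U) + 4), where U is the estimated frequency per million words. The function name is hypothetical, and the exact procedure used for MANULEX should be checked against its own documentation.

```python
import math


def standard_frequency_index(u_per_million: float) -> float:
    """Convert an estimated frequency per million words (U) to an SFI value.

    Assumes the conventional formula SFI = 10 * (log10(U) + 4); this is an
    assumption for illustration, not taken from the MANULEX files.
    """
    if u_per_million <= 0:
        raise ValueError("U must be a positive frequency per million words")
    return 10.0 * (math.log10(u_per_million) + 4.0)


# A word estimated to occur once per million words.
print(standard_frequency_index(1.0))
# A word estimated to occur once per thousand words.
print(standard_frequency_index(1000.0))
```

Under this assumed formula, a word occurring once per million words receives an SFI of 40, and each tenfold increase in U adds 10 points.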


NovLex 1, Lambert & Chesnet (2001)
(from webpage): The NOVLEX lexical database is a tool for estimating the range and lexical frequency of the written vocabulary addressed to French-speaking primary-school pupils.
It was built by analysing school and non-school books intended for pupils in CE2 (ages 8-9). NOVLEX is based on a corpus of about 417,000 words, which excludes proper nouns, first names, city names, and onomatopoeia, and is converted to lowercase ("Un", "UN" and "un" are the same entry).
From this corpus we extracted 9,300 distinct lexical roots (Base Lexicale), determined with the help of the Larousse dictionary.
NovLex 2, Lambert & Chesnet (2001)
(from webpage): The NOVLEX lexical database is a tool for estimating the range and lexical frequency of the written vocabulary addressed to French-speaking primary-school pupils.
It was built by analysing school and non-school books intended for pupils in CE2 (ages 8-9). NOVLEX is based on a corpus of about 417,000 words, which excludes proper nouns, first names, city names, and onomatopoeia, and is converted to lowercase ("Un", "UN" and "un" are the same entry).
From this corpus we extracted 20,600 orthographically distinct entries (Base d'occurrences). In the Base d'occurrences, every orthographic form is treated as a separate entry (e.g. "cheveu" and "cheveux" are two distinct entries).
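The difference between case-folded entries and separate orthographic forms can be sketched as follows. The token list is invented for illustration and is not NOVLEX data; the counting scheme is only an assumption about how such an occurrence base could be built.

```python
from collections import Counter

# Invented sample tokens, not taken from the NOVLEX corpus.
tokens = ["Un", "UN", "un", "cheveu", "cheveux", "Cheveux"]

# Lowercasing merges "Un", "UN" and "un" into a single entry, while
# "cheveu" and "cheveux" remain two distinct orthographic forms.
occurrences = Counter(token.lower() for token in tokens)

for form, count in sorted(occurrences.items()):
    print(form, count)
```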
Omnilex
Computerized database on the lexicon of contemporary French
Vocolex, Dufour et al. (2002)
Paper Abstract: Several studies on auditory word recognition indicate that word
processing is influenced by the phonological similarity with other words. We describe
a lexical database, VoColex, which provides several statistical indexes of phonological
similarity between French words. Phonological similarity is computed according to two
distinct principles. According to the first principle, phonologically similar words
share initial phonemes with the target word. According to the second principle,
phonological neighbours correspond to any words which can be derived from the target
by a single phoneme change (substitution, addition, or deletion) whatever the position
of the modified phoneme. The statistical data provided by VoCoLex should allow the
control and the empirical manipulation of various measures of phonological similarity,
as well as quantitative descriptions of the auditory lexicon.
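The two similarity principles described in the abstract can be sketched in a few lines. This is a minimal illustration, not the VoCoLex implementation: words are assumed to be tuples of phoneme symbols, the function names are hypothetical, and the transcriptions in the usage lines are invented.

```python
def shared_initial_phonemes(target, candidate):
    """Count how many phonemes two words share from the start (first principle)."""
    count = 0
    for a, b in zip(target, candidate):
        if a != b:
            break
        count += 1
    return count


def is_phonological_neighbour(target, candidate):
    """Return True if candidate differs from target by exactly one phoneme
    substitution, addition, or deletion, at any position (second principle)."""
    a, b = list(target), list(candidate)
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Substitution: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    # Addition or deletion: dropping one phoneme from the longer word
    # must yield the shorter word.
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


# Invented phoneme tuples for illustration only.
print(shared_initial_phonemes(("b", "a", "l"), ("b", "a", "k")))
print(is_phonological_neighbour(("b", "a", "l"), ("s", "a", "l")))
print(is_phonological_neighbour(("b", "a", "l"), ("s", "o", "l")))
```

Counting the neighbours of a target word then amounts to applying `is_phonological_neighbour` to every other entry in the lexicon.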
Dicos à ABU
[From website].
The virtue of the word lists you will find on these pages is not to offer the bibliophiles that you are the means of developing professional tools. They are in fact far from complete or error-free.
There are currently four lists: a list of common words (300,000+ words), a list of first names (12,437 names), a list of French town names (39,076 names), a list of country names (170 countries), and a list of language difficulties (1,500 words).
There is also a dictionary: "Les Excentricités du Langage" by Lorédan Larchey (hypertext version). A gem that we recommend!
XMLittré
[From website]. This site offers a searchable online version of Émile Littré's dictionary of the French language.
The work was published from 1863 onwards, then in its second edition in 1872-1877.
Cordier & Le Ny (2004)
Values of Experiential Frequency, Degree of Knowledge and Rated Familiarity for French Words
ARTFL- Word Frequency Search Form
Word Frequency information. TLF. Trésor de la Langue française. Imbs (1971).
Morphalou
The Morphalou lexicon is an open lexicon of the inflected forms of French. Morphalou's initial data come from TLFnome, the word list of the Trésor de la Langue Française, which supplied 539,413 inflected forms belonging to 68,075 lemmas. The transfer from TLFnome to Morphalou involved a structural reorganisation of the data and a normalisation of the grammatical labels, with no loss of linguistic information. The resulting lexicon has broad coverage (~540,000 inflected forms), is linguistically valid (under the responsibility of an editorial committee), and formally conforms to the standardisation proposals for NLP lexical resources at ISO (TC37/SC4). It is freely accessible for research and teaching purposes. The lexicon is maintained and updated by ATILF.
Dictionnaire des synonymes
[From website].
This dictionary of synonyms contains approximately 49,000 entries and 396,000 synonymy relations. The starting point was seven classic dictionaries (Bailly, Benac, Du Chazaud, Guizot, Lafaye, Larousse and Robert), from which the synonymy relations were extracted; this first stage, carried out by the Institut National de la Langue Française (INaLF), produced a series of files, whose data were merged and homogenised within the CRISCO laboratory (ELSAP at the time). Finally, we completed this procedure with substantial correction work (adding or removing synonymy links) on the final file.
This project began under the responsibility of Sabine PLOUX, who defined the operating principles of the dictionary. Since 1998, Jean-Luc MANGUIN has been in charge; he put it online and built the text-mode query interface. Current developments stem from a project bringing together CRISCO (Caen) and the company Memodata (Caen). This project was selected by the Comité Régional pour l'Imagerie et les Technologies de l'Information et de la Communication.


spanish
Corpus Diacrónico del Español (CORDE)
Diachronic Corpus for the Spanish Language. [With interactive search facilities]
Corpus de Referencia del Español Actual (CREA)
[...] describes the capabilities of the query software for the Banco de Datos del Español of the Real Academia Española. It is a text for people without specialist knowledge of the subject, providing the basic notions needed for interactive querying of the largest lexical resource available for the Spanish language, with more than 200 million words. [With interactive search facilities.]
Diccionario de la Universidad de Oviedo
Dictionary of antonyms, dictionary of synonyms, verb conjugator, related terms.
Izura et al. (2004)
Category norms for 500 Spanish words in 5 semantic categories
Banco de datos del Español
List of authors and works. [With interactive search facilities.]
3_nonwords
english
ARC nonword database, Rastle et al. (2002)
Database of 358,534 nonwords
french
Cordier & Le Ny (2004)
Values of Experiential Frequency, Degree of Knowledge and Rated Familiarity for French Pseudowords
4_running_text
across_the_board
WordTheque
[From website]. The Wordtheque is a powerful interface with a massive database (currently 707,737,941 words) containing multilingual novels, technical literature and translated texts. Hits are highlighted in context windows that can be expanded up or down. To go to the source web pages (novels, etc.)
english
Childes, MacWhinney
[From authors' website]. The CHILDES system provides tools for studying conversational interactions. These tools include a database of transcripts, programs for computer analysis of transcripts, methods for linguistic coding, and systems for linking transcripts to digitized audio and video.
french
GrosMots.com
[From website].
We propose to: assemble corpora of copyright-free French texts;
structure the texts by adding tags that mark the different parts of each work;
maintain a page of links to articles on the internet about these works;
moderate question forums about the works, their authors and their contexts;
moderate a classified-ads service for exchanging and selling various editions of the works;
introduce Progsession for group work.
We hope to complete in 2007 a first plan covering 3,000 works, of which more than 2,000 can already be downloaded.
5_visual_material
across_the_board
Amsterdam Library of Object Images (ALOI)
1000 objects under different angles and different lighting conditions. Royalty Free?
Fribbles Stimulus Sets
Fribble stimuli used in several experiments. Within each Fribble species, the exact shape, color, and texture of the main body and the approximate location and interrelationships between appendage parts are held constant for all exemplars. Colors and textures of appendage parts are also similar (although not identical) across exemplars. The main aspect that changes from exemplar to exemplar in a species is the exact shape of the appendage parts.
Action Picture Stimuli in IPNP
Black and white drawings of 275 transitive and intransitive actions from different sources.
Object Picture Stimuli in IPNP
Black and white drawings of 520 common objects (including 174 pictures from the Snodgrass & Vanderwart set and other sources.)
Diagnostic Color Objects
Color images of many diagnostic color objects, e.g., a banana.
Objects are shown in typical and atypical colors. There are also control sets of neutral color objects.
The original set was used as stimuli in Naor-Raz, Tarr, & Kersten (2003).
Grayscale pictures of 31 chairs -- Bruno Rossion
Grayscale pictures of 31 chairs garnered from various sources by Bruno Rossion at the Universite Catholique de Louvain. Bruno has scaled all of the images to the same size, orientation, and brightness. Bruno asks that if you are going to use the chairs, please contact him at rossion@neco.ucl.ac.be and let him know what you are up to. The images are STANDARD COLORS grayscale PICTS.
Colorized Snodgrass and Vanderwart pictures -- Rossion & Pourtois (2004)
The authors have created a new set of stimuli based on the widely used line drawings of Snodgrass and Vanderwart (1980).
These 260 stimuli contain diagnostic texture and color information.
Normative data (naming agreement and latencies, complexity, familiarity, imaginability) for these new stimuli have been collected (Rossion & Pourtois, 2004).
Their data shows that surface information, color in particular, greatly facilitates object recognition.
If you download the set and wish to use it in an experimental/clinical study, a donation of $30-$50 to help defray their costs would be most welcome. You can send correspondence about contributing to Bruno Rossion.
Royalty Free Clipart Images
A long list of links for royalty free or free to use clipart images.
Royalty Free Photos
A long list of links for royalty free or free of use photos.
Change Blindness Scenes
This set of scenes was used as stimuli in the studies reported in Aginsky & Tarr (2000). The set contains many variants of individual scenes. Variants were generated by either moving or changing the color of some element of the scene. The images are color PICT files.
Object Data Bank
Michael Tarr (with the artistic help of Scott Yu) has developed a wonderful data bank of three-dimensional
objects from numerous views.
Viperlib, visual perception library
Viperlib is a web-based resource library of images and presentation material illuminating the study of visual perception. (more than 2000 images)
All images are given freely by the vision research community and are available for educational, non-profit use only.
dutch
Lexical norms for pictures, Martein (2005)
This work presents the results of a normative data collection study of 216 pictures which can be used in a wide range of cognitive experiments. Black-and-white line drawings of 216 objects, belonging to 20 large semantic categories, were rated by a sample of 300 first-year psychology students at the University of Ghent. These ratings provided data on several variables of central importance to cognitive processing and memory functioning: name agreement, concept agreement, familiarity, visual complexity and image agreement. The following semantic categories were included in the set: 1. Article of clothing, 2. Birds, 3. Electronical appliances, 4. Fish, shells, ..., 5. Flowers, plants, ..., 6. Food, 7. Fruit, 8. Furniture and decoration, 9. Insects, 10. Kitchen-utensils, 11. Mammals, 12. Miscellaneous, 13. Musical instruments, 14. Parts of a building, 15. Parts of the human body, 16. Reptiles and amphibians, 17. Tools, 18. Vegetables, 19. Vehicles, 20. Weapons
Lexical norms for pictures, Severens et al. (2005)
Timed norms for 590 pictures in Belgian Dutch, with name agreement and response latencies.
english
Norms for timed picture naming
The UCSD Center for Research in Language is engaged in a large international study to provide norms for timed picture naming in seven different languages (American English, German, Mexican Spanish, Italian, Bulgarian, Hungarian, and the variant of Mandarin Chinese spoken in Taiwan). They currently have data for over 500 pictures.
french
Bonin et al. (2003)
French norms for name agreement, image agreement, conceptual familiarity, visual complexity, image variability, age of acquisition, and naming latencies
Schwitter et al. (2004)
French normative data and naming times for action pictures
spanish
Cuetos & Alija (2003)
Normative data and naming times for action pictures in Spanish
Cuetos et al. (1999)
Naming times for the Snodgrass and Vanderwart pictures in Spanish
Dasi et al. (2004)
Normative data on the familiarity and difficulty of 196 Spanish word fragments
Fernandez et al. (2004)
Free-association norms for the Spanish names of the Snodgrass and Vanderwart pictures
7_performance_measures
english
Lexical Decision, Balota et al. (1999)
Lexical Decision Corpora
Spieler & Balota (1998)
Naming Latencies for Younger and Older Adults.
Response times in CVC naming study, Treiman et al. (1995)
[As on the authors' website] These are the mean RTs and error rates from the Treiman et al. 1995 naming study with Wayne State University students. They are being made available to other researchers who wish to do additional analyses of these data.
DRC simulation results for words, Coltheart et al. (2001)
DRC simulation results reported in Coltheart et al. (2001), 7910 words
PMSP simulation results, Plaut et al. (1996)
Simulation results for the PMSP96 model (Plaut et al., 1996)
french
Belec (Mousty et al., 1994)
Battery for the assessment of written language and its disorders.
docs
1_parts_of_words
english
ASL alphabet
A PDF document showing the corresponding ASL sign for each letter of the alphabet
A Collection of Letter Oddities and Trivia
Covers topics such as:
Vowels; Uncommon double letters, triple letters, quadruple letters; Consecutive consonants;
Most frequent appearance of each letter
ipa
Unicode character sets
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
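The idea of one number per character can be seen directly with Python's built-in `ord` and the standard `unicodedata` module. The helper name `code_point` is hypothetical, and the three IPA symbols are chosen only as examples.

```python
import unicodedata


def code_point(char: str) -> str:
    """Format a single character's Unicode number in the usual U+XXXX notation."""
    return f"U+{ord(char):04X}"


# A few IPA symbols chosen as examples.
for symbol in "əŋʃ":
    print(symbol, code_point(symbol), unicodedata.name(symbol))
```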
2_words
english
Heteronyms
List of heteronyms in the language
Alan Cooper's Homonyms
List of homonyms (in fact, homophones) in English
Homophones
List of 439 homophones by Ian Miller
A Collection of Word Oddities and Trivia
Covers topics such as:
Misspelled words; Typewriter words;
Beautiful words; Long words; Long words (place names); Long words (chemical names);
Plurals; Scrabble words; Short and Long words from Mathematics;
Short and Long words from the Bible; Names of people which became words;
Last words (alphabetically arranged);
French words; interjections; Italian words
french
Les bases de données lexicales : pourquoi et comment les utiliser en logopédie ? (Lexical databases: why and how to use them in speech therapy?) By Marie-Anne Schelstraete & Christelle Maillart
Our information-processing system, the cognitive system, is in general very sensitive to the frequency of the stimuli it processes, which can be seen as a particularly apt form of adaptation to a variable environment. In every area of cognitive psychology, research shows that frequent stimuli, which have already been encountered often, are processed faster and more accurately than rarer stimuli. [...]
techniques
Modern Dictionary Making
On-line course, by Dafydd Gibbon, Faculty of Linguistics and Literary Studies (University of Bielefeld), Summer Semester 2003 (Version: July 24, 2003)
What is a lexical database
Entry of the Glossary of linguistic terms (Loos, Anderson, Day, and Jordan)
Lexical Database definition (at freedictionary)
Multiple definitions related to lexical databases
4_running_text
english
historical_changes
Words in English by Suzanne Kemmer (Rice University)
[from the website] This website is a resource for those who want to learn more about this fascinating language [i.e., English] – its history as a language, the origins of its words, and its current modern characteristics.
The Great Vowel Shift
[from the website] This site is designed for my students--undergraduates with limited linguistic knowledge who are being introduced to the Great Vowel Shift. There are topics I do not discuss in this site because they are too basic, too complicated, or too controversial for this audience.
The History of English Phonemes
[from the website] This Website is designed to help students of the English language trace the development of the phonemes of English from the Old English period into Present-Day English. The information contained in the site is available in any good textbook on the history of the language, but printed texts normally present the information in a linear fashion corresponding to the chronological development of English. The value of the Website is the hypertextual treatment of the information, which is meant to keep students from having to spend a great deal of time leafing through textbooks.
Wordorigins.org
[from the website] This site is devoted to the origins of words and phrases, or as a linguist would put it, to etymology. Etymology is the study of word origins. (It is not the study of insects; that is entomology.) Where words come from is a fascinating subject, full of folklore and historical lessons. Often, popular tales of a word's origin arise. Sometimes these are true; more often they are not. While it often seems disappointing when a neat little tale turns out to be untrue, almost invariably the true origin is just as interesting.
_general
A Collection of Sentence Oddities and Trivia
Covers topics such as:
Pangrams, Palindromes, Plurals
Vivian Cook website
Various informative pages on Writing Systems or Second Language Acquisition. Includes a linguistics glossary and an extensive bibliography on Second Language Acquisition.
french
historical_changes
Textes en français historique
NA
Chantez-vous français?
[Copied from the website] "This is not a history of singing. Nor is it a history of French. It is a history of sung French. From the outset, singing has been a form of discourse in its own right, obeying its own rules. The history of these rules, which define the field of declamation, is traced here. Touching on several disciplines, this study is aimed first and foremost at singers of early music, who struggle to find answers to their questions in specialised treatises. Perhaps it will also interest some linguists who are not indifferent to singing and music and, who knows, other curious minds."
Old French On The Web
A Website Devoted to the Language and Literature of Old French
6_associations
english
English is Tough Stuff
The famous poem "The Chaos", first written by a Dutchman, Dr Gerard Nolst Trenité, in 1920, and often republished and expanded since
links
0_across_the_board
French Lexical Resources
Lists the databases and tools known to the Manulex team
Psychology of Language Page of Links (Kreuz)
[From the website]. This page has grown out of my own collection of links to people and resources on the World Wide Web. As such, it undoubtedly reflects my interests and biases to an unhealthy degree. Over time, and with your help, I hope to compile a more comprehensive collection of sources. I particularly need help in adding to the list of psychology of language researchers and labs (sections 2 and 3). Please send me your links, and I will enshrine them on the list.
LangueFrancaise.net
Gathers a wide variety of resources in French.
OLAC - Open Language Archives Community
OLAC collects information about the language resources in multiple archives, making them searchable from a single location.
Technologies du Langage
Jean Véronis' Blog (Title in French but many posts are in English)
UK Data archive
[From website] The UK Data Archive (UKDA) is an internationally-renowned centre of expertise in data acquisition, preservation, dissemination and promotion; and is curator of the largest collection of digital data in the social sciences and humanities in the UK. The UKDA provides resource discovery and support for secondary use of quantitative and qualitative data in research, teaching and learning as a lead partner of the Economic and Social Data Service (ESDS). The UKDA houses AHDS History, provides preservation services for other data organisations and facilitates international data exchange.
1_parts_of_words
english
UCLA Phonetics Lab Data
Index of Languages, Index of Sounds, Map Index, and material relevant to Peter Ladefoged's books A Course in Phonetics and Vowels and Consonants.
2_words
across_the_board
Localized Dictionaries for Mozilla Thunderbird
XPI files are essentially ZIP archives with some special requirements for Mozilla.
Unzipping a file yields two files, xxxxxx.dic and xxxxxx.aff, which are in the "standard" MySpell format.
lingucomponent.openoffice.org and the links from it will tell you how to interpret them.
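Since an XPI file is described above as a ZIP archive holding a .dic and an .aff file, the unpacking step can be sketched with Python's standard zipfile module. This is a minimal illustration, not part of the linked resource: the function name extract_myspell_files and the file names in the usage note are hypothetical, and the sketch assumes the archive is a plain ZIP with no further Mozilla-specific packaging to handle.

```python
import zipfile
from pathlib import Path


def extract_myspell_files(xpi_path, out_dir):
    """Copy the .dic and .aff members of an XPI (ZIP) archive into out_dir.

    Hypothetical helper: returns the sorted list of extracted file paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(xpi_path) as archive:
        for name in archive.namelist():
            # Keep only the MySpell word list (.dic) and affix rules (.aff).
            if name.lower().endswith((".dic", ".aff")):
                # Drop any internal folder structure; keep the bare file name.
                target = out_dir / Path(name).name
                target.write_bytes(archive.read(name))
                extracted.append(target)
    return sorted(extracted)
```

For example, extract_myspell_files("dictionary.xpi", "unpacked") would place the two MySpell files in an "unpacked" folder, where "dictionary.xpi" stands for whichever dictionary package was downloaded.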
Dictionaries list at linguistlist
Links to over 200 dictionaries of specific languages and a collection of multilingual dictionaries, as well as acronym dictionaries, thesauri, and dictionaries of specialized terms. It also includes dictionary projects (e.g. The Euro Wordnet Project).
chinese
SUBTLEX-CH
Chinese Word frequencies based on film and television subtitles.
dutch
SUBTLEX-NL
Dutch Word frequencies based on film and television subtitles.
english
Answers.com
According to the authors: "The best definitions and explanations for over one million topics." Matches from dictionaries, Wikipedia, and WordNet are all grouped on one screen.
Urban Dictionary
Dictionary for street slang.
Your dictionary.com
Various dictionary resources.
Four Letter Words
This small project is an attempt to give a spatial overview of the entirety of this part of English language heritage, as well as to explore and visualize relations between all four-letter words.
wordcount.org
WordCount™ is an interactive presentation of the 86,800 most frequently used English words.
SUBTLEX-US
English (US) Word frequencies based on film and television subtitles.
french
Lexical resources at the LEAD (Universite de Bourgogne)
N/A
Liste de bases de données lexicales à Orthorélie
N/A
Glossaires de termes
Lexicons, glossaries and specialized dictionaries (very long list)
spanish
Diccionario de la lengua Española
Spanish Dictionary
Nuevo Tesoro Lexicografico de la Lengua Española
N/A
La Real Academia Española
Royal Academy of the Spanish Language
3_nonwords
across_the_board
Wuggy
A multilingual pseudoword generator.
4_running_text
across_the_board
corpus-linguistics.de
[From the website] On this webpage you will find an annotated reference system to find everything related to Corpus Linguistics that is available on the Internet: Corpora, Concordances, Corpus Linguistics research efforts and events, software for tagging, annotation etc.
Devoted to Corpora (Bookmarks for Corpus-based Linguists)
[From the website] These annotated links (c. 1,000 of them) are meant mainly for linguists and language teachers who work with corpora, not computational linguists/NLP (natural language processing) people, so although the language-engineering-type links here are fairly extensive, they are not exhaustive (for such info, you'll have to look elsewhere). Stuff here also represent my personal interests and biases (which will be obvious in some of my descriptive notes) and consequently there may be gaps, errors and omissions which you are welcome to tell me about. The English language bias on these pages will, I hope, be forgiven.
ELDA (Language Resources Distribution)
[From the website] Our catalogue of language resources currently gathers around 700 spoken and written language resources. It can be accessed from the ELRA web site and from the ELDA web site. The identification and the collection of existing language resources is part of our regular activity. The new resources we have collected, once the catalogue has been updated, are announced on some mailing lists, as well as in the ELRA members' news and in the quarterly ELRA newsletter.
EURALEX (European Association for Lexicography)
[From the website] EURALEX is the European Association for Lexicography: an international association which was founded in 1983, with the aims of furthering all aspects of the broad field of lexicography, and of promoting the exchange of ideas and information. It is committed to the development of lexicography in all European languages (as well as other non-European languages). EURALEX's interests include dictionaries of all kinds (monolingual, bilingual, and multilingual, general and specialist, in book and in machine-readable form); metalexicography, the theory of lexicography, and the history of lexicography; the praxis of dictionary-making; dictionary use; terminology and terminography; corpus lexicography; computational lexicography and dictionaries for natural language processing; and lexicology in general.
Linguistic Data Resources on the Internet
A topically organized list of language data resources on the Internet.
Archives for Language and Machine Learning
N/A
SIGLEX
[From the website] SIGLEX, a Special Interest Group on the Lexicon of the Association for Computational Linguistics, provides an umbrella for research interests on lexical issues ranging from lexicography and the use of online dictionaries to computational lexical semantics. SIGLEX is also the umbrella organization for SENSEVAL, evaluation exercises for Word Sense Disambiguation.
english
History of the English Language
List of Links about the English Language and its historical changes.
french
Une Histoire de la langue française @ Globe-Gate
Collection of nearly 100 links related to French, its dialects and historical changes
tools
0_across_the_board
Rent a coder
If you need a tool that doesn't seem to exist yet, why not rent a coder to develop it?
1_parts_of_words
Ngrams
Bigram Frequencies
Compute bigram frequencies for a list of words, using a bigram frequency table
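The general idea of scoring words against a bigram frequency table can be sketched as follows. This is only an illustration under stated assumptions, not the listed tool's actual method: the function names are hypothetical, the table values are invented toy numbers, and the choice to report both the summed and the mean bigram frequency is an assumption.

```python
def word_bigrams(word):
    """Return the overlapping letter pairs of a word, e.g. 'cat' -> ['ca', 'at']."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def bigram_stats(words, table):
    """Look up each word's bigrams in a frequency table.

    Bigrams missing from the table count as frequency 0 (an assumption).
    Returns {word: {"sum": ..., "mean": ...}}.
    """
    results = {}
    for word in words:
        freqs = [table.get(bg, 0) for bg in word_bigrams(word.lower())]
        total = sum(freqs)
        # One-letter words have no bigrams; report a mean of 0.0 for them.
        mean = total / len(freqs) if freqs else 0.0
        results[word] = {"sum": total, "mean": mean}
    return results


# Toy table with made-up counts, for illustration only.
toy_table = {"th": 100, "he": 80, "ca": 30, "at": 50}
stats = bigram_stats(["the", "cat"], toy_table)
```

With the toy table, "the" is scored from the bigrams "th" and "he", and "cat" from "ca" and "at".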
Speech tools
Speech Tools lets you record, store, and analyze language sounds and music, and supports language learning.
The suite consists of (a) IPA Help 2.03; (b) Phonology Assistant 2.1; and (c) Speech Analyzer 2.6.
2_words
Latent Semantic Analysis Applications
Latent Semantic Analysis (LSA) captures the essential relationships between text documents and word meaning, or semantics, the knowledge base which must be accessed to evaluate the quality of content. Several educational applications that employ LSA have been developed: (1) selecting the most appropriate text for learners with variable levels of background knowledge, (2) automatically scoring the content of an essay, and (3) helping students effectively summarize material.
Semantic Space and Probability Models
Scott McDonald has provided a web interface to several semantic space models derived from the British National Corpus (and therefore British English: highly recommended).
Word Analysis Tools
Includes: (a) an Interlinear Text editor, which helps you create text segments composed of wordform-in-context objects; (b) a Wordform Inventory editor; (c) an Analysis editor, to categorize and gloss a word and its component morphemes for display in the interlinear text and lexical database; and (d) a Morphology Explorer, to help you discover morphemes in your data on the basis of similarity in form or meaning and review other data correlations in the wordform list, your texts, and the lexical database.
4_running_text
Consortium for Lexical Research, FTP archive
The Consortium for Lexical Research, an archive of research materials accessible via anonymous FTP.
DFKI ACL Natural Language Software registry
The registry maintains an exhaustive list of tools, with a structured listing and descriptions of available NLP products. It does not itself distribute the software.
Link Grammar
[From the authors' website] The Link Grammar Parser is a syntactic parser of English, based on link grammar, an original theory of English syntax. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labeled links connecting pairs of words. The parser also produces a "constituent" representation of a sentence (showing noun phrases, verb phrases, etc.).
Logiciels par Jean Veronis
Various NLP software, including: a syntactic parser for teaching purposes, a concordancer for written and spoken corpora, and an Excel add-in macro with some useful statistical functions (box plots, chi-square for contingency tables, etc.)
5_visual_material
Psychophysics Toolbox
The Psychophysics Toolbox is a free set of Matlab functions for vision research (Brainard, 1997; Pelli, 1997)
Vision Egg
Visual stimulus creation and control with open source software (Python Library)
7_performance_measures
a_expt_design
JONES: Journal of Neurobehavioral Experiments and Stimuli
[from JONES' website] JONES is a new kind of journal that allows scientists to share their experiments with the scientific community in a way that has not been possible before. JONES differs from other methods publications in that what gets published is not just a description of the experiment, but files that fully specify the experimental paradigm and stimuli.
b_running
Software for running experiments
List of software, with comments and resources, in the lexicall wiki
Software for participants management
List of software, with comments and resources, in the lexicall wiki
Software for Eye Movement Experiments
List of software, with comments and resources, in the lexicall wiki
c_post-processing_filtering
Trim Outliers
Of use to cognitive psychologists who need to filter data from experiments.
Lets you filter out data outside a minimum and maximum value or outside an n-SD (standard deviation) window.
Click on the web author icon to access compiled version and full documentation.
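The two filters described above (a fixed minimum/maximum range and a window of n standard deviations around the mean) can be sketched in a few lines. This is an illustrative sketch, not the tool's own code: the function names are hypothetical, and using the sample standard deviation computed over all supplied values is an assumption.

```python
from statistics import mean, stdev


def trim_by_range(values, minimum, maximum):
    """Keep only values inside the closed interval [minimum, maximum]."""
    return [v for v in values if minimum <= v <= maximum]


def trim_by_sd(values, n_sd):
    """Keep only values within n_sd sample standard deviations of the mean."""
    if len(values) < 2:
        # A standard deviation needs at least two data points.
        return list(values)
    centre = mean(values)
    spread = stdev(values)
    low, high = centre - n_sd * spread, centre + n_sd * spread
    return [v for v in values if low <= v <= high]


# Invented reaction times (ms) with one very slow and one very fast response.
rts = [450, 480, 500, 520, 550, 3000, 90]
in_range = trim_by_range(rts, 200, 2000)
within_2sd = trim_by_sd(rts, 2)
```

Note that a single extreme value inflates the standard deviation, so an SD window can leave milder outliers in place; applying the range filter first and the SD filter second is one common way around this.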
d_summaries
Pivot Table
Reproduces the pivot table function found in Excel.
A pivot table lets you explore your data according to predetermined dimensions: for example, reaction times by word length,
error rates by word frequency, etc. Four different summary statistics are provided for the
distribution in each dimension: mean, median, standard deviation, and count.
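A pivot summary of this kind (for instance, reaction times by word length) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the tool's implementation: the function name pivot, the column names "length" and "rt", and the sample rows are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def pivot(rows, by, value):
    """Group rows by the `by` column and summarise the `value` column.

    Returns {group: {"mean", "median", "sd", "count"}}; the SD is the sample
    standard deviation, reported as 0.0 for single-item groups (an assumption).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[value])
    table = {}
    for key in sorted(groups):
        vals = groups[key]
        table[key] = {
            "mean": mean(vals),
            "median": median(vals),
            "sd": stdev(vals) if len(vals) > 1 else 0.0,
            "count": len(vals),
        }
    return table


# Invented trial data: word length in letters and reaction time in ms.
trials = [
    {"length": 3, "rt": 500},
    {"length": 3, "rt": 520},
    {"length": 5, "rt": 600},
    {"length": 5, "rt": 640},
    {"length": 5, "rt": 620},
]
summary = pivot(trials, by="length", value="rt")
```

Here summary has one entry per word length, each holding the four statistics named above.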
e_plotting
b-src: Bioinformatics Packages Source Code Search Engine
b-Src is a bioinformatics packages source code search engine by gonzui. b-Src is maintained by Mitsuteru Nakao and the b-Src Team on behalf of the CBRC Sequence Analysis Team. Please let us know if you have recommended bioinformatics packages to be indexed.
GraphViz (Graph Visualization Software)
Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. Automatic graph drawing has many important applications in software engineering, database and web design, networking, and in visual interfaces for many other domains.
Graphviz is open source graph visualization software. It has several main graph layout programs. See the gallery for some sample layouts. It also has web and interactive graphical interfaces, and auxiliary tools, libraries, and language bindings.
Compiled versions are available for Windows, Mac OS X, and Linux.
Orange Widgets (Data Mining)
Orange is component-based data mining software. It includes a range of preprocessing, modelling and data exploration techniques. It is based on C++ components, which are accessed either directly (not very common), through Python scripts (easier and better), or through GUI objects called Orange Widgets. Compiled versions exist for Windows and Macintosh.
f_statistical_analyses
stats R
R is open-source (that is, free) software for statistical analysis and data visualisation.
It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
Software for Statistical Analysis
List of software, with comments and resources, in the lexicall wiki