In this repository, we aim to provide access to material of interest to psycholinguists. The material listed in the repository is organized by the levels of analysis described below. Separate pages detail the types of material available within each category and the standards used for the storage and description of these resources.

1. Parts of words

Statistics attached to parts of a word. Examples are Bigram or Trigram Frequency (e.g., the number of times that the bigram "ar" is found in English words) and Syllable Frequency (e.g., the number of times that the syllable "tar" is found in English words).
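As an illustration of the bigram statistic described above, here is a minimal Python sketch. It assumes one particular reading of "found in English words", namely the number of distinct words that contain the bigram at least once; other resources may instead count every occurrence or weight by word frequency. The function name and the five-word lexicon are hypothetical and serve only as an example.

```python
from collections import Counter


def bigram_type_frequency(words):
    """Count, for each bigram, how many words contain it at least once.

    Assumption: each word contributes at most 1 to a bigram's count,
    however many times the bigram occurs inside that word.
    """
    counts = Counter()
    for word in words:
        w = word.lower()
        # Set of distinct adjacent letter pairs in this word.
        bigrams = {w[i:i + 2] for i in range(len(w) - 1)}
        counts.update(bigrams)
    return counts


# Hypothetical toy lexicon, not taken from any real database.
toy_lexicon = ["car", "star", "art", "bar", "tea"]
freq = bigram_type_frequency(toy_lexicon)
print(freq["ar"])
```

With a real word list in place of the toy lexicon, the same function would give a count for every bigram in the list.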

2. Words


Statistics attached to a word in full. Examples are Word Frequency, Age of Acquisition (i.e., age at which a word has been acquired), Lexical Neighborhood (i.e., number of words that share all letters but one with the string).

These statistics are typically found in lexical databases or in tables that list a limited number of words along with a limited set of attached variables, such as age of acquisition or frequency.
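The Lexical Neighborhood measure mentioned above can be sketched in a few lines of Python. This sketch assumes that a neighbor has the same length as the target string and differs from it by exactly one letter in the same position; the function name and the toy lexicon are hypothetical.

```python
def neighborhood_size(target, lexicon):
    """Number of words in the lexicon that share all letters but one with target.

    Assumption: neighbors have the same length as the target and differ
    by a single letter substitution at one position.
    """
    target = target.lower()
    # Deduplicate so a word listed twice is only counted once.
    candidates = {w.lower() for w in lexicon}
    count = 0
    for w in candidates:
        if len(w) != len(target) or w == target:
            continue
        mismatches = sum(a != b for a, b in zip(w, target))
        if mismatches == 1:
            count += 1
    return count


# Hypothetical toy lexicon, not taken from any real database.
toy_lexicon = ["cat", "cot", "cut", "bat", "car", "dog", "cart"]
print(neighborhood_size("cat", toy_lexicon))
```

Because the target does not need to be in the lexicon, the same function can be applied to a nonword string.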

3. Nonwords


A fairly recent trend is to create databases of nonwords, that is, letter strings that could be English words but do not match any real word of the language. Such strings are particularly useful for evaluating readers' knowledge of print-to-sound associations. They also help to avoid material that inevitably differs in lexical properties when studying stages of processing at which the lexical status of the string is irrelevant (typically early perceptual processes).

4. Running text


Statistics about words in sentences (syntax, prosody, etc.).

5. Visual material

Psycholinguistic studies sometimes also involve the presentation of visual material for naming. For instance, picture naming tasks have been used to test the source of age-of-acquisition effects [ref Bonin et al.].

6. Associations (print-to-sound, sound-to-print)

Statistics that reflect the connection between a word in one modality (for instance, the written word) and its equivalent in another (for instance, the spoken word). They typically reflect the regularity or consistency with which parts of words are translated from print to sound or the other way around. For instance, consistency estimates provide a coefficient that reflects the probability with which a given segment is found with a given pronunciation in English words. The Body-Rime consistency estimate, for instance, reflects the consistency of pronunciation for each possible body of English (a body is made up of the nucleus and coda of a syllable, that is, the vowel and final consonants of a one-syllable word).
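To make the idea of a consistency coefficient concrete, here is a minimal Python sketch. It assumes the simplest possible estimate: for each body, the proportion of words with that body that share a given rime pronunciation, with every word counted once and no weighting by word frequency. The function name, the toy entries, and the ad hoc pronunciation labels are all hypothetical; a real estimate would be computed from a pronunciation dictionary.

```python
from collections import Counter, defaultdict


def body_rime_consistency(entries):
    """Return {(body, rime): proportion of words with that body pronounced as rime}.

    entries: iterable of (word, body, rime) tuples.
    Assumption: unweighted counts, one per word.
    """
    by_body = defaultdict(Counter)
    for _word, body, rime in entries:
        by_body[body][rime] += 1

    consistency = {}
    for body, rimes in by_body.items():
        total = sum(rimes.values())
        for rime, n in rimes.items():
            consistency[(body, rime)] = n / total
    return consistency


# Hypothetical toy data; the rime labels are ad hoc, not a standard transcription.
toy_entries = [
    ("mint", "int", "Int"),
    ("hint", "int", "Int"),
    ("lint", "int", "Int"),
    ("pint", "int", "aInt"),
    ("cat", "at", "at"),
    ("hat", "at", "at"),
]
consistency = body_rime_consistency(toy_entries)
print(consistency[("int", "Int")])
```

In this toy set, the body "int" has two competing pronunciations and therefore a coefficient below 1 for each, whereas the body "at" is fully consistent.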

7. Datasets


Datasets provide information about the performance of human participants or computer models on a set of words. They are helpful for testing predictions before running a well-constructed experiment. As a result, they reduce the risk of running an experiment that leads to null results and hence increase productivity.

