Separate pages detail the types of material available and categories used for the classification of resources.
Standards used for the description of resources
Each item in our repository will be accompanied by a description file, which provides useful information about the resource. This file will be stored as a plain-text XML document.
The description file is a simple text file divided into four parts.
- The first part is mandatory and provides information about the resource.
- The second part is optional (but required if you wish a download link or a query interface to be provided by lexicall.org) and defines parameters used for the management of the resource.
- The third part is optional (but required if you wish the resource to be automatically interfaced on this website) and describes its organisation.
- The fourth part (not yet implemented) provides an index for cross-platform code conversion.
<keywords>
  <material>
    [text, one of "data", "tools", "docs", "links"] Type of material.
  </material>
  <category>
    [text, one of "Part of words", "Words", "Nonwords", "Visual Stimuli", "Associations", "Data Sets"] Category under which the resource should appear in our repository.
  </category>
  <language>
    [text, one of "english", "french", "spanish", "dutch", "japanese"] Language for language-specific resources.
  </language>
</keywords>
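As an illustration, the keyword values could be checked against the lists above before a description file is submitted. The following is only a minimal sketch in Python: the function name `check_keywords` is hypothetical, and treating an empty `<language>` as acceptable (for resources that are not language-specific) is an assumption, not a rule stated on this page.

```python
import xml.etree.ElementTree as ET

# Allowed values, copied from the <keywords> template.
ALLOWED = {
    "material": {"data", "tools", "docs", "links"},
    "category": {"Part of words", "Words", "Nonwords",
                 "Visual Stimuli", "Associations", "Data Sets"},
    "language": {"english", "french", "spanish", "dutch", "japanese"},
}

# Assumption: <language> may be left empty for resources
# that are not language-specific.
OPTIONAL = {"language"}


def check_keywords(xml_text):
    """Return a list of error messages for a <keywords> fragment."""
    root = ET.fromstring(xml_text)
    errors = []
    for tag, allowed in ALLOWED.items():
        value = (root.findtext(tag) or "").strip()
        if not value and tag in OPTIONAL:
            continue
        if value not in allowed:
            errors.append(f"<{tag}>: {value!r} is not one of {sorted(allowed)}")
    return errors


example = """<keywords>
<material>data</material>
<category>Words</category>
<language>french</language>
</keywords>"""

print(check_keywords(example))
```

An empty list means that all three keywords use one of the listed values.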
<resource_description>
  <name>
    [text, up to a maximum of 100 characters] Name of the database.
  </name>
  <version>
    [text, up to a maximum of 10 characters] Version of the resource (1.0 by default).
  </version>
  <file_format>
    [text, one of "text file", "awk script", "perl script", "other"] File format of the resource (it must be one of the options listed if the resource is to be interfaced automatically on the lexicall website).
  </file_format>
  <description>
    [text, up to a maximum of 255 characters] Short description of the database.
  </description>
  <reference_to_cite>
    [Bibliographical reference in APA format: Smith, J. (xxx). Title. Journal, Vol(Iss), pages] Reference to a peer-reviewed paper or report that presents this resource and that should be cited in any work that makes use of this resource.
  </reference_to_cite>
  <url_information>
    [url|Psychonomic.org|Lexique.org|none] Link to a webpage that provides information about the resource. Ideally, this would be a link to a manual in HTML or PDF format.
  </url_information>
  <url_download>
    [url|Lexicall.org|Psychonomic.org|Lexique.org|none] Link to a webpage where the resource file can be downloaded. If the value is "lexicall.org", a link will be created to the copy held on the lexicall.org website. In this case, the authors should make sure that the data file is also uploaded. If the authors prefer users to be directed to another webpage (for instance, psychonomic.org or lexique.org), they should simply provide the URL of the page on which a link to the data file can be found. If the authors do not wish the resource to be publicly available for download, they should simply enter "none".
  </url_download>
  <url_query_interface>
    [url|Lexicall.org|Psychonomic.org|Lexique.org|none] Link to a webpage with an interface that lets users query the resource. If the value is "lexicall.org", a link will be provided to the automatic interface generator of the lexicall website. In this case, the contributor should make sure to provide information about the variables in the resource file and make sure that the data file is also uploaded. If an interface already exists, simply provide the URL of the page on which this interface can be found. If the authors do not wish the resource to be publicly interfaced, they should simply enter "none".
  </url_query_interface>
  <notes_public>
    [text, up to 255 characters] Note visible to the users.
  </notes_public>
  <notes_private>
    [text, up to 255 characters] Any information that you prefer to keep attached to the database but do not wish to be seen publicly.
  </notes_private>
</resource_description>
<contact_details>
  <contact_person>
    [text, up to a maximum of 30 characters] Name of the person to contact about this resource (note that users of this website are invited to consult the on-line documentation and forums before contacting the authors).
  </contact_person>
  <contact_email>
    [text, up to a maximum of 30 characters, in the format xxx@xxx.xx, where x can be either alphanumeric or "."] Email of the contact person. This email will be listed in the information box only if the author specifies that it is for public access (next field).
  </contact_email>
  <contact_email_ispublic>
    [digit, either 1 or 0] Indicates whether the email information should be made public (1) or not (0).
  </contact_email_ispublic>
  <contact_lab>
    [text, up to 50 characters, alphanumeric characters only] Name of the lab or department to which the principal author belongs.
  </contact_lab>
  <contact_url>
    [text, up to 50 characters, in the format xxx.xxx, where x can be either alphanumeric, ".", or "/"] Link to the contributor's personal webpage or to the webpage of their lab or department.
  </contact_url>
</contact_details>
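The format constraints on the contact fields lend themselves to a simple automated check. The sketch below is one possible reading of those constraints, not the validation actually performed by the website: the function name `check_contact`, the two regular expressions, and the choice to treat every field as required are all assumptions. Note that, read literally, the email format excludes characters such as "-" or "_".

```python
import re

# One reading of "xxx@xxx.xx, where x is alphanumeric or '.'".
EMAIL_RE = re.compile(r"[A-Za-z0-9.]+@[A-Za-z0-9.]+\.[A-Za-z0-9]+")
# One reading of "xxx.xxx, where x is alphanumeric, '.' or '/'".
URL_RE = re.compile(r"[A-Za-z0-9./]+\.[A-Za-z0-9./]+")


def check_contact(fields):
    """Return a list of error messages for a dict of contact fields."""
    errors = []
    person = fields.get("contact_person", "")
    if not 0 < len(person) <= 30:
        errors.append("contact_person must contain 1 to 30 characters")
    email = fields.get("contact_email", "")
    if len(email) > 30 or not EMAIL_RE.fullmatch(email):
        errors.append("contact_email must look like xxx@xxx.xx (30 characters at most)")
    if fields.get("contact_email_ispublic") not in ("0", "1"):
        errors.append("contact_email_ispublic must be 0 or 1")
    url = fields.get("contact_url", "")
    if len(url) > 50 or not URL_RE.fullmatch(url):
        errors.append("contact_url must look like xxx.xxx (50 characters at most)")
    return errors


contact = {
    "contact_person": "J. Smith",
    "contact_email": "j.smith@example.org",
    "contact_email_ispublic": "0",
    "contact_url": "www.example.org/lab",
}
print(check_contact(contact))
```

The names and addresses in `contact` are placeholders chosen for the example.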
<lexicall_management>
  <file_copy_at_lexicall>
    [text, either "yes" or "no"] Indicates whether a copy is held in the lexicall repository. The value must be "yes" if the contributor indicated that links for download or query interface should be created on the lexicall.org website.
  </file_copy_at_lexicall>
  <file_size>
    [number, up to a maximum of 999999999] Indicates the approximate size of the file (rounded up). This will be used, among other things, to automatically convert big data files into a MySQL database, for faster querying.
  </file_size>
  <variables_defined>
    [text, either "yes" or "no"] Indicates whether a description of the variables is also provided. We strongly encourage contributors to provide one, as it guarantees a better use of their resource. The value must be "yes" if the contributor indicated that links to a query interface should be created on the lexicall.org website.
  </variables_defined>
  <nb_variables>
    [number, up to a maximum of 99] Indicates the number of variables that will be described. Typically, there should be one variable per column in a data file or one variable per parameter in a script file.
  </nb_variables>
</lexicall_management>
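The management fields depend on the values given for `<url_download>` and `<url_query_interface>`, so the two parts can be cross-checked. The sketch below shows one way to do this, assuming the field values have already been read into a dictionary keyed by tag name. The function name `check_management` is hypothetical, and comparing "lexicall.org" without regard to case is an assumption (the template writes "Lexicall.org", the text "lexicall.org").

```python
def _is_number_up_to(value, maximum):
    """True if value is a string of digits not exceeding maximum."""
    return value.isdigit() and int(value) <= maximum


def check_management(description):
    """Return a list of error messages for the management fields."""
    errors = []
    wants_download = description.get("url_download", "").lower() == "lexicall.org"
    wants_interface = description.get("url_query_interface", "").lower() == "lexicall.org"

    # A copy must be held at lexicall if it serves the download or the interface.
    if (wants_download or wants_interface) and description.get("file_copy_at_lexicall") != "yes":
        errors.append("file_copy_at_lexicall must be 'yes'")
    # The interface generator needs the variable descriptions.
    if wants_interface and description.get("variables_defined") != "yes":
        errors.append("variables_defined must be 'yes'")

    # Numeric limits taken from the template; checked only if the field is present.
    if "file_size" in description and not _is_number_up_to(description["file_size"], 999999999):
        errors.append("file_size must be a number up to 999999999")
    if "nb_variables" in description and not _is_number_up_to(description["nb_variables"], 99):
        errors.append("nb_variables must be a number up to 99")
    return errors


description = {
    "url_download": "Lexicall.org",
    "url_query_interface": "none",
    "file_copy_at_lexicall": "no",
    "variables_defined": "no",
    "file_size": "120000",
    "nb_variables": "12",
}
print(check_management(description))
```

In this example the download link points to lexicall.org but no copy is declared, so one error is reported.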
Column number | Variable name | Variable description | Query type | Query specifications
<variables_definition>
[00-99] | [text] | [text] | [text] | [text]
[00-99] | [text] | [text] | [text] | [text]
[00-99] | [text] | [text] | [text] | [text]
[00-99] | [text] | [text] | [text] | [text]
[00-99] | [text] | [text] | [text] | [text]
</variables_definition>
This part is organized as five columns of data separated by tabs. It can easily be created in Excel and then copied and pasted into a text file.
- Column 1: A number (00-99) that indicates the column of the data file in which the variable is found.
- Column 2: A word or two naming the variable.
- Column 3: A short text (maximum 255 characters) describing the variable.
- Column 4: One of a set of predefined options indicating the type of query to be provided for that column of data.
Current options are:
  - [Regular Expression] or [RE]: Typically used for text strings; lets the user find the data in that column that match a specific search pattern, the pattern being defined with regular expressions (see the in-site documentation for details on these).
  - [Min-Max] or [MM]: Typically used for continuous values; lets the user find any matching data inside a range of values. To speed up computation, the fifth column should contain the absolute minimum and maximum values in the database. If they are not provided, these values will be computed automatically (but this will increase processing time).
  - [Single Choice] or [SC]: Typically used for categorical data; lets the user define the value to match as one of a list of options. The fifth column should then contain the keys and their meanings, in the format key1: value-key2: value-key3: value. The values will be displayed in a drop-down menu and the keys used to find matching data. This can be used, for instance, to define syntactic class (V: verb-N: Noun-A: Adj.).
  - [Multiple Choice] or [MC]: Similar to Single Choice, except that the user can select multiple values. In this case, the data retrieved will be those that match any of the values selected by the user. An example of use is the selection of words that are either CVC or CVCC in structure.
  - [Tick Box] or [TB]: Mainly used in scripts; lets the user determine whether an option should be on or off.
  - [Word List] or [WL]: Mainly used in scripts; lets the user enter a list of words or items to process.
  - [None] or [NO]: No query to provide. Typically used for variables such as standard deviations.
- Column 5: Specifications for the query. With Single and Multiple Choice query options, a description of the (key: value) pairs. With the Min-Max query option, the absolute minimum and maximum values in the database.
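To make the five-column layout concrete, here is a minimal sketch of how one line of this part could be read. The function names, the handling of the square brackets around query types, and the example row are assumptions made for illustration. The format of the Min-Max specification is not detailed above, so it is kept as raw text.

```python
# Long and short names of the query types listed above, mapped to the short code.
QUERY_TYPES = {
    "regular expression": "RE", "re": "RE",
    "min-max": "MM", "mm": "MM",
    "single choice": "SC", "sc": "SC",
    "multiple choice": "MC", "mc": "MC",
    "tick box": "TB", "tb": "TB",
    "word list": "WL", "wl": "WL",
    "none": "NO", "no": "NO",
}


def parse_choices(spec):
    """Turn 'V: verb-N: Noun-A: Adj.' into {'V': 'verb', 'N': 'Noun', 'A': 'Adj.'}."""
    choices = {}
    for pair in spec.split("-"):
        key, _, label = pair.partition(":")
        choices[key.strip()] = label.strip()
    return choices


def parse_variable_row(line):
    """Read one tab-separated line of the variables definition."""
    cells = line.rstrip("\n").split("\t")
    cells += [""] * (5 - len(cells))  # column 5 may be left empty
    column, name, description, query_type, spec = cells[:5]
    # Accept the type with or without brackets, in any letter case (assumption).
    code = QUERY_TYPES[query_type.strip().strip("[]").lower()]
    spec = spec.strip()
    if code in ("SC", "MC") and spec:
        spec = parse_choices(spec)
    return {
        "column": int(column),
        "name": name.strip(),
        "description": description.strip(),
        "query": code,
        "spec": spec,
    }


row = parse_variable_row(
    "03\tSyntactic class\tGrammatical category of the word\tSC\tV: verb-N: Noun-A: Adj."
)
print(row["column"], row["query"], row["spec"])
```

Because "-" separates the pairs, a value that itself contains a hyphen would be split incorrectly by this sketch. An unknown query type raises a `KeyError`.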
<code_conversion>
  <french_diacritics>
    <apply_to_variables>x y z</apply_to_variables>
    <code_table>
      Mac | PC | HTML
      é | é | é
    </code_table>
  </french_diacritics>
  <phonetic_codes>
    <apply_to_variables>x y z</apply_to_variables>
    <code_table>
      this_file | DISC
      x | x
      y | y
      z | z
    </code_table>
  </phonetic_codes>
</code_conversion>
Files in a text format can be easily processed and exchanged between platforms or used on any platform. However, two problems need to be addressed: (1) characters do not always get displayed the same way on different platforms; (2) in current databases different coding options are often adopted.
With databases stored as text files, it is nevertheless fairly easy to write a program that converts codes from non-standard formats to standard ones, using a code conversion table.
The way this information will be coded still needs to be defined.
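Since the coding is not defined yet, the following Python sketch only illustrates the general idea of a table-driven conversion. The function names, the sample table, and the reading of `<apply_to_variables>` as a list of column indices are assumptions. The HTML entities used are the standard ones for these characters.

```python
import re


def build_converter(table):
    """Return a function that replaces each source code by its target code."""
    if not table:
        return lambda text: text
    # Longest codes first, so that a multi-character code wins over its prefix.
    sources = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(source) for source in sources))
    return lambda text: pattern.sub(lambda match: table[match.group(0)], text)


def convert_row(cells, columns, convert):
    """Apply the conversion only to the listed columns (0-based indices)."""
    return [convert(cell) if index in columns else cell
            for index, cell in enumerate(cells)]


# Hypothetical table: characters as stored in the file, mapped to HTML entities.
to_html = build_converter({"é": "&eacute;", "è": "&egrave;", "œ": "&oelig;"})

print(to_html("élève"))
print(convert_row(["élève", "N", "été"], {0}, to_html))
```

Here only the first column is converted; the other cells are returned unchanged.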