Known “issues” with the corpus

This page is in progress. It will provide corpus users with a list of the issues to look out for when using the corpus data. We intend this caveat utilitor to make corpus users as informed as possible about the kinds of unavoidable problems that arise in creating a corpus of this type.


about, above: TO BE COMPLETED


  • reduced would and had (as ’d): word-final [d] is difficult to hear, especially when followed by alveolar-initial words (like just). Sometimes its presence is in question, as in ambiguous examples such as We(’d) come here (see e.g. Emory Cook Part 1). The decision whether to put the form in the transcription has consequences for the Part of Speech tag given to the verb. If we put the form in the transcription and interpret it as would, the following verb is tagged as a bare infinitive; if we do not put the form in, the verb is tagged as a simple past tense form. Furthermore, if context does not help discern whether the ’d is a reduced form of would or of had, the main verb form becomes ambiguous between a bare infinitive and a past participle. These ambiguities may present problems for researchers interested in the morphological form of the verb in the different tenses.
  • reduced / null have (see also below under schwa): TO BE COMPLETED
  • missing do in 2nd person singular interrogatives: Part of Speech tagging becomes problematic in interrogative examples like You like coal-mining?, where it is unclear whether the sentence is formally a declarative (in which case the verb like is to be tagged as a present tense form), or whether the sentence is formally an interrogative, with a null (or elided) do in Comp (in which case the verb like is to be tagged as a bare infinitive). The presence of NPIs can help decide the issue, e.g., You ever go to Tennessee?, where the licit NPI ever is arguably licensed by a verbal element in C (cf. the ungrammatical declarative, *You ever go to Tennessee). In such a case, we can be more sure that there is an elided do in C, and the verb go here would therefore be a bare infinitive. Unfortunately, NPIs are rare in these questions, so the correct analysis is often unclear.
  • reduced was: TO BE COMPLETED
  • reduced would: TO BE COMPLETED
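
The tagging consequences described above for the reduced ’d can be summarized as a small decision function. This is only an illustrative sketch: the function name, its parameters, and the descriptive labels are hypothetical and are not the corpus's actual Part of Speech tags. The pairing of had with a past participle is implied by the discussion above rather than stated outright.

```python
def verb_forms_after_d(d_transcribed, reading=None):
    """Return the possible forms of the verb that follows an optional 'd.

    d_transcribed: whether 'd was put in the transcription.
    reading: "would", "had", or None when context does not decide.
    The returned labels are descriptive, not actual corpus tags.
    """
    if not d_transcribed:
        # No 'd in the transcription: the verb is read as a simple past form.
        return {"simple past"}
    if reading == "would":
        return {"bare infinitive"}
    if reading == "had":
        return {"past participle"}
    # 'd is present but context does not decide between would and had.
    return {"bare infinitive", "past participle"}


# We('d) come here: three possible outcomes for "come"
print(verb_forms_after_d(False))
print(verb_forms_after_d(True, "would"))
print(sorted(verb_forms_after_d(True)))
```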

closte: For some Appalachian speakers, [klost] (with a word-final [t]) is an alternate pronunciation of the word close (often alternating with the pronunciation [klos]). It is difficult to tell which of the two pronunciations is at play when the following word begins with [t], e.g., We got close / closte to the tree. If you are interested in the closte pronunciation, we advise you to search for both the close and closte spellings and to listen to the recordings, so that you can decide for yourself which transcription is the most reasonable in cases where the context could have made our transcription questionable.
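
A search of this kind might be sketched as follows. The sketch assumes the transcripts are available as plain lines of text, which is an assumption made for illustration and not a description of the corpus's actual file format. The function name and sample sentences are likewise hypothetical. Spelling is used as a rough proxy for a following [t]: words spelled with initial th are skipped, so a few genuine [t]-initial words may be missed.

```python
import re

# "close" or "closte" followed by a word spelled with initial t (but not th).
# Spelling only approximates pronunciation; every hit still needs listening.
PATTERN = re.compile(r"\b(close|closte)\s+(t(?!h)\w*)", re.IGNORECASE)


def find_ambiguous_close(lines):
    """Return (line_number, transcribed_form, next_word) for each candidate."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in PATTERN.finditer(line):
            hits.append((number, match.group(1).lower(), match.group(2).lower()))
    return hits


# Hypothetical transcript lines, for illustration only.
sample = [
    "We got close to the tree.",
    "He lived closte by.",
    "They stood closte to town.",
    "Close the door.",
]
print(find_ambiguous_close(sample))
# [(1, 'close', 'to'), (3, 'closte', 'to')]
```

Each hit marks a place where the recording should be checked, whichever spelling appears in the transcription.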

-ing: researchers interested in forms ending in -ing vs. -in’ (e.g., singing vs. singin’) must listen to the recordings, as all cases have been transcribed as -ing, regardless of the pronunciation.


said vs. says: in a narrative describing a past event, it is often difficult to tell whether a speaker is using said or says (in sentences like “So then he said/says…”). The context often does not allow for disambiguation, given the common use of the form says in narratives about past events, even in the first person (e.g., “So I says to him…”). Researchers interested in this question are advised not to rely solely on our transcriptions, but rather to listen to the examples transcribed as said or as says in these contexts.

schwa: generally speaking, the hesitation “uh” is common in speech. This is problematic for transcriptions of speech with a-prefixing (e.g., he was a-running), as in many cases it is difficult to discern whether the speaker is simply hesitating (which would be transcribed as uh) or using an a-prefix (which would be transcribed with a=). This is further complicated by the fact that the indefinite article a is generally pronounced as a schwa. Users searching for a-prefixing examples should therefore listen to all examples carefully, to determine which cases are arguable.
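
One way to collect the examples to listen to is sketched below. It rests on assumptions that are not confirmed here: that the a= marker is written directly before the verb (as in a=running), and that the transcripts can be read as plain lines of text. The function name and sample sentences are hypothetical. The second pattern also flags ordinary nouns ending in -ing (e.g., a king), so its hits are candidates only.

```python
import re

# Assumed convention: the a-prefix is written as "a=" attached to the verb.
A_PREFIX = re.compile(r"\ba=(\w+)", re.IGNORECASE)
# A hesitation (uh) or an article (a) directly before a word ending in -ing.
NEAR_MISS = re.compile(r"\b(uh|a)\s+(\w+ing)\b", re.IGNORECASE)


def a_prefix_candidates(lines):
    """Return (line_number, marker, following_word) for each case to check."""
    results = []
    for number, line in enumerate(lines, start=1):
        for match in A_PREFIX.finditer(line):
            results.append((number, "a=", match.group(1).lower()))
        for match in NEAR_MISS.finditer(line):
            results.append((number, match.group(1).lower(), match.group(2).lower()))
    return results


# Hypothetical transcript lines, for illustration only.
sample = [
    "he was a=running down the hill",
    "he was uh running down the hill",
    "it was a running joke",
    "she had a dog",
]
print(a_prefix_candidates(sample))
# [(1, 'a=', 'running'), (2, 'uh', 'running'), (3, 'a', 'running')]
```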

singulars for plurals: (two year, two mile, etc.) TO BE COMPLETED


whether: as noted in M&H, the interrogative complementizer whether is often pronounced whur. This pronunciation is, moreover, homophonous with the wh-pronoun where, which raises the theoretical question of whether the interrogative complementizer is in fact where when it is pronounced this way. Despite these issues, we transcribe all possible pronunciations of this word as whether.