Transcribing Narrative Samples

The narratives were transcribed using tools for transcription and analysis from the Child Language Data Exchange System (CHILDES), which consist of two parts: CHAT and CLAN. CHAT is a set of transcription conventions that are used within the CLAN program, which works like a basic word processor. CLAN also includes commands that can be used to analyze files transcribed in CHAT format. For more information, see Analyzing Narrative Samples.

CLAN and the CLAN manual (PDF) are available for free download from CHILDES. The CLAN manual describes how to use the program to transcribe and analyze language samples. While CLAN was used to transcribe and analyze language samples for this project, there is also a paid program, SALT, available for this purpose.

Installing and Running CLAN

This section summarizes the basics of running CLAN, which are covered on pp. 9–19 of the CLAN manual.

Once you have installed CLAN according to the instructions on the CLAN website, start the program by double-clicking the CLAN icon. A window titled Commands will open, where you enter the commands described in Analyzing Narrative Samples to perform various analyses. If the window does not open automatically, press Ctrl+D (Windows) or ⌘+D (Mac). Below is what the Commands window looks like on a Mac.

The first thing you need to do when running CLAN is set the working and lib (library) directories. The working directory is the location of the files you plan to work with.

  • Click in the Commands window.
  • Browse to the directory where you would like to save your transcripts and click. The path of the directory you selected will be listed to the right of the button.
  • Follow the same process to define the correct library directory. Navigate to the CLAN program file and select the 'lib' directory, which was created during the installation.

    Note: To run morphological analyses on English transcripts (e.g. MLU in morphemes), CLAN needs an automatic tagger for English. This is available for free download from CHILDES. Select 'English (eng)', and save the downloaded folder, called 'eng', to the 'lib' folder within the CLAN program directory. Unzip the folder to the same location, and in the Commands window, click and set the 'eng' folder as the active folder.
    Transcription

    After opening CLAN, create a new CHAT file by clicking on File > New. To conduct analyses using CLAN, the ENNI narratives need to conform to a specific format. Download and follow the structure provided in this CHAT transcript template (.CHA). You are now ready to begin transcribing a language sample.

    The CHAT manual (PDF) describes the conventions and principles of CHAT file transcription. The section that is most relevant is called MinCHAT (pp. 20-23). This section describes some of the most important conventions and symbols for creating transcripts to conduct analyses relevant to this project.

    CHAT File Format

    There are several minimum conventions for the form of a CHAT file. These conventions must be followed for the CLAN commands to run successfully on CHAT files. An example transcript is shown below.

    Lines marked in red are the mandatory opening and closing headers. The speaker tiers are *CHI (child) and *EXP (investigator), and %com is a comment tier that adds relevant non-spoken information to the transcript. The transcription conventions are explained further below.

    Opening and Closing Headers

    • Every line must end with a line break.
    • Each tier identifier (e.g., @Comment, *EXP, %com) must be followed by one tab stop before entering the content of the line. If there is even one intervening space character, CLAN commands will not be able to run.
    • The first line in the file must be an @Begin line.
    • The header may contain the following lines:
      • @Languages line, which lists the languages of the transcript (usually 'eng' for English).
      • @Participants line, which lists a three-letter code for each participant followed by their name and role.
      • @ID line, which can be used to provide further details for each speaker. This line is required, although the contents can be left blank.
      • @Date and @Birth of CHI lines, which are entered in the following format: dd-MMM-yyyy.
      • @Comment lines, which can contain any additional information you like. For example:
        @Comment:	Date of Testing: 20-JAN-2011
        @Comment:	Location of Testing: Edmonton Lab
    • The last line in the file must be @End.
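The header rules above can also be expressed in code. The following is a minimal Python sketch, not part of CLAN. The helper names (chat_date, header_line, build_transcript) are hypothetical, and the sketch assumes that each header name is followed by a colon and a single tab, as in the @Situation: and *CHI: examples elsewhere on this page.

```python
from datetime import date

# English month abbreviations are spelled out so the result does not
# depend on the computer's locale settings.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def chat_date(d: date) -> str:
    """Format a date as dd-MMM-yyyy, for example 20-JAN-2011."""
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year}"


def header_line(name: str, content: str) -> str:
    """Join a header name and its content with a colon and exactly one tab."""
    return f"@{name}:\t{content}"


def build_transcript(headers, body_lines):
    """Wrap header and body lines between @Begin and @End.

    headers is a list of (name, content) pairs; body_lines is a list of
    already formatted main and dependent tier lines.
    """
    lines = ["@Begin"]
    lines += [header_line(name, content) for name, content in headers]
    lines += list(body_lines)
    lines.append("@End")
    # Every line, including the last one, ends with a line break.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    tested = chat_date(date(2011, 1, 20))
    print(build_transcript([("Comment", f"Date of Testing: {tested}")], []))
```

Generating the header from a script like this is optional. Its main benefit is that the tab after each identifier and the date format are produced the same way every time.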

    The Body of the Transcript

    • Lines beginning with * indicate what was actually said. These are called 'main lines'. Each main line includes only one communicative unit (i.e. a stand-alone utterance). When a speaker produces several utterances in a row, transcribe each separately on a new main line.
    • The keyboard shortcut for adding a new line with a speaker code is Ctrl+1, +2 etc. (Windows) and ⌘+1, +2 etc. (Mac). For example, Ctrl+1 will add a blank *CHI line if 'child' is the first participant listed, and Ctrl+2 will add a new *EXP line if 'investigator' is the second participant listed.
    • Lines beginning with the % symbol contain commentary about the transcript. They are called 'dependent tier' lines. Dependent tiers can be things like actions (%act) or comments (%com) which add relevant non-spoken information to the transcript.
    • Ensure that only a single tab stop occurs between line identifiers (like *CHI: and %com:) and content of lines. If there is even one intervening space character, CLAN commands will not be able to run.
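Because a stray space after a tier identifier is hard to see on screen, a small script can help find such lines before running CLAN. The sketch below is a hypothetical helper written in Python, not a replacement for CLAN's own check command. It assumes that tier identifiers end in a colon and reports any tier line where the colon is not followed by exactly one tab.

```python
import re

# A tier line starts with *, % or @ and has a colon after its identifier.
# Headers that take no content (@Begin, @End) have no colon and are skipped.
TIER = re.compile(r"^([*%@][^:\t]*):(.*)$")
# After the colon: exactly one tab, then either nothing or non-whitespace content.
GOOD_SEPARATOR = re.compile(r"^\t(\S.*)?$")


def find_separator_problems(lines):
    """Return the 1-based numbers of tier lines whose separator is not one tab."""
    problems = []
    for number, line in enumerate(lines, start=1):
        match = TIER.match(line.rstrip("\n"))
        if match and not GOOD_SEPARATOR.match(match.group(2)):
            problems.append(number)
    return problems


if __name__ == "__main__":
    sample = [
        "@Begin",
        "*CHI:\tshe saw an airplane.",   # correct: one tab
        "*CHI: she saw an airplane.",    # space instead of tab
        "%com:\t child points",          # tab followed by a space
        "@End",
    ]
    print(find_separator_problems(sample))
```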

    Determining Utterance Boundaries

    How utterances are transcribed directly affects measures of length of utterance. Therefore, it is important to transcribe utterances in a systematic and consistent manner.

    Utterances in ENNI narratives are roughly equivalent to sentences. They are referred to as clausal units or C-units in the ENNI manual. To determine where utterance boundaries lie, you must first identify which clauses are independent and which clauses are dependent.

    • Independent clauses (also called main clauses) can stand alone as a complete sentence. For example, "and the elephant loved the ball".
    • Dependent clauses (also called subordinate clauses) contain a verb but cannot stand on their own as sentences. For example, "the ball that ran away".

    Certain conjunctions ("coordinating conjunctions") such as and, but, and then often begin independent clauses. Clauses that begin with because, although, etc. ("subordinating conjunctions") are considered dependent clauses and are therefore transcribed on the same line as the independent clause they are subordinate to.

    For example:

    *CHI: she saw the airplane. (independent clause)
    *CHI: and she wanted to play with it, although it wasn’t hers. (independent clause, dependent clause)

    The two utterances above were said as one intonational unit, without sentence-final intonation until "hers". However, this stretch of speech is parsed as two utterances on syntactic grounds. For the purposes of these analyses, intonation alone never determines utterance boundaries. For more information, see Determining Utterance Boundaries on the ENNI website.
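One way to review utterance boundaries after transcribing is to list main lines that begin with a subordinating conjunction, since these may be dependent clauses that belong on the previous line. The Python sketch below is only a rough heuristic under that assumption. The function name is hypothetical, the word list contains only the two conjunctions named above and would need extending, and every flagged line still requires a human decision.

```python
# Only the conjunctions named in the text; extend this set as needed.
SUBORDINATING = {"because", "although"}


def flag_possible_dependent_lines(main_lines):
    """Return main lines whose first word is a subordinating conjunction."""
    flagged = []
    for line in main_lines:
        # The speaker code and the utterance are separated by a tab.
        _speaker, _tab, utterance = line.partition("\t")
        words = utterance.split()
        if words and words[0].lower().strip(",") in SUBORDINATING:
            flagged.append(line)
    return flagged


if __name__ == "__main__":
    sample = [
        "*CHI:\tand she wanted to play with it.",
        "*CHI:\talthough it wasn't hers.",
    ]
    for line in flag_possible_dependent_lines(sample):
        print("review:", line)
```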

    Transcribing Words and Utterances using CHAT Conventions

    The following rules dictate how words and utterances must be transcribed on main lines:

    • Only capitalize proper nouns and the word “I”. Do not capitalize beginnings of utterances.
    • Utterances must end with a terminating punctuation mark: either a period, an exclamation mark, or a question mark.
    • Commas can be used to mark phrasal junctions within utterances, but are not required.
    • Common acronyms are spelled out as if they were words (e.g. tv, dvd, vcr).
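The first two rules can be checked mechanically. The Python sketch below uses a hypothetical function, check_main_line, that looks at the utterance text of a single main line (the part after the tab). It ignores a trailing bracketed code such as the [+ bch] marker described in the next section, and it cannot tell proper nouns from other words, so a capitalized first word is only reported for review.

```python
import re

TERMINATORS = (".", "!", "?")
# A trailing code in square brackets that starts with "+ ", e.g. [+ bch].
POSTCODE = re.compile(r"\s*\[\+ [^\]]*\]\s*$")


def check_main_line(utterance):
    """Return a list of possible problems with one utterance."""
    issues = []
    text = POSTCODE.sub("", utterance).rstrip()
    if not text.endswith(TERMINATORS):
        issues.append("missing terminator")
    words = text.split()
    first = words[0] if words else ""
    is_pronoun_i = first == "I" or first.startswith(("I'", "I’"))
    if first[:1].isupper() and not is_pronoun_i:
        issues.append("capitalized first word: confirm it is a proper noun")
    return issues


if __name__ == "__main__":
    for sample in ["she saw an airplane.", "She saw an airplane"]:
        print(repr(sample), check_main_line(sample))
```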

    Excluding certain utterances

    Utterances that are incomplete or not part of the narrative should be transcribed but excluded from ENNI analyses. This is done by adding [+ bch] (back channel) at the end of the line, after the punctuation.

    For example:

    *CHI: she saw an airplane.
    *CHI: and then. [+ bch]  (incomplete utterance)
    *CHI: I have an airplane too! [+ bch] (not part of the story)

    Story-enders, such as "the end" or "finished", are also excluded. These are not counted because they are very short and may unfairly lower the child's mean length of utterance. Also, since not all children use story-enders, the effect is not uniform.
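To illustrate the effect of the [+ bch] marker, the Python sketch below keeps only the child's main lines that are not marked for exclusion. It is an illustration with a hypothetical function name, not CLAN's own mechanism, and it assumes the marker is the last thing on the line, as described above.

```python
BCH = "[+ bch]"


def included_child_utterances(main_lines, speaker="*CHI:"):
    """Keep the speaker's main lines that do not end in [+ bch]."""
    return [
        line for line in main_lines
        if line.startswith(speaker) and not line.rstrip().endswith(BCH)
    ]


if __name__ == "__main__":
    sample = [
        "*CHI:\tshe saw an airplane.",
        "*CHI:\tand then. [+ bch]",
        "*CHI:\tI have an airplane too! [+ bch]",
    ]
    print(included_child_utterances(sample))
```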

    Common symbols used in transcripts

    xxx denotes unintelligible speech

    *CHI: and they ran down the xxx.

    & denotes non-word sounds, stutters, and sound effects

    *CHI: then they go &grr.

    ( ) parentheses indicate omitted parts of words

    *CHI: I (a)m running (be)cause I (a)m scared.
    (the child said “I’m running cause I’m scared”)

    [/] indicates repetition

    *CHI: <and the> [/] and the elephant ran [/] ran across the deck.
    (angle brackets are used when the repeated material consists of more than one word)

    [//] indicates retracing, i.e. repeating the word(s) with some changes

    *CHI: <and he says> [//] and she says thank you.
    (angle brackets are used when the retraced material consists of more than one word)

    Note: Fillers like “uh” and “um” should be transcribed with an “&” to ensure that these are not included in the analyses. For example: &uh, &um, &oh, &mmm, &hmm, &huh, &like. However, “&” should not be used with these words if they have meaning. For example, “Huh?” meaning “What?” or “like” used as a verb are not fillers, but words in their own right.
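To show how these symbols separate the words a child intended from the surrounding disfluencies, the Python sketch below reduces a coded utterance to a plain word list. It is a hypothetical helper, not a reimplementation of CLAN: which material CLAN's commands count depends on the command and its options. The sketch assumes that repeated and retraced material, &-marked forms, and xxx are dropped, and that letters in parentheses are restored to give the full word.

```python
import re

# <several words> followed by [/] or [//]: scoped repetition or retracing.
SCOPED = re.compile(r"<[^>]*>\s*\[//?\]")
# A single word followed by [/] or [//].
SINGLE = re.compile(r"\S+\s*\[//?\]")
# A trailing code such as [+ bch].
POSTCODE = re.compile(r"\[\+ [^\]]*\]")


def fluent_words(utterance):
    """Return the words of an utterance without coded disfluencies."""
    text = SCOPED.sub(" ", utterance)   # must run before SINGLE
    text = SINGLE.sub(" ", text)
    text = POSTCODE.sub(" ", text)
    words = []
    for token in text.split():
        token = token.strip(".,!?")
        if not token or token.startswith("&") or token == "xxx":
            continue
        # (be)cause -> because: restore the omitted letters.
        words.append(token.replace("(", "").replace(")", ""))
    return words


if __name__ == "__main__":
    print(fluent_words("<and the> [/] and the elephant ran [/] ran across the deck."))
    print(fluent_words("I (a)m running (be)cause I (a)m scared."))
```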

    Completing the Transcription

    • Save narrative transcripts as one separate file per child. Indicate the beginning of a new story with an @Situation line.
      • For example:
        *CHI:  and then the elephant gets her ball back.
        *CHI:  and then the elephant say thanks.
        *EXP:  the end!
        @Situation:  Narrative B3
        *CHI:  the elephant and the horse go to the pool again.
        *CHI:  the horse is playing so happy.
    • Be sure to add the line “@End” at the end of the transcript and save the file. All file names must end in the file extension .CHA.
    • When you are done transcribing, check the transcripts for coding errors. This is done by running the check command in CLAN.
      • Open the Commands window, and type check into the command field.
      • Click File In and highlight the file you are working on in the lower left-hand pane.
      • Click Add->, then Done.
      • In the Commands window, click Run. 
        The image below shows the Commands window after a file has been added. Note the character "@" after the command. This indicates that a command is ready to be run.
      • For researchers who are just beginning to use CHAT and CLAN, we recommend transcribing one narrative first and running the check command to ensure that the file format and transcription conventions have been followed correctly. After seeing that check has run successfully, continue transcribing the remaining narratives.
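Because each child's file holds several stories separated by @Situation lines, it can be useful to group the main lines by story when reviewing a transcript. The Python sketch below is a hypothetical helper, not a CLAN command. It assumes that every story starts with its own @Situation line and ignores main lines that come before the first one. The story label "Narrative A3" in the usage example is a placeholder.

```python
def split_stories(lines):
    """Group main lines (those starting with *) by the preceding @Situation."""
    stories = {}
    current = None  # no story has started yet
    for line in lines:
        if line.startswith("@Situation:"):
            current = line.split(":", 1)[1].strip()
            stories[current] = []
        elif line.startswith("*") and current is not None:
            stories[current].append(line)
    return stories


if __name__ == "__main__":
    sample = [
        "@Situation:\tNarrative A3",   # placeholder label
        "*CHI:\tand then the elephant say thanks.",
        "*EXP:\tthe end!",
        "@Situation:\tNarrative B3",
        "*CHI:\tthe elephant and the horse go to the pool again.",
    ]
    for name, main_lines in split_stories(sample).items():
        print(name, len(main_lines))
```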