Getting started with `scikit-talk`

scikit-talk can be used to explore and analyse conversation files.

It contains three main levels of objects:

Corpora, described by the Corpus class
Conversations, described by the Conversation class
Utterances, described by the Utterance class

The best entry point to scikit-talk is a parser, which loads the data from a transcription file into a scikit-talk object.

scikit-talk currently has the following parsers:

Conversation.from_cha(), which parses .cha files.
Conversation.from_eaf(), which parses ELAN (.eaf) files.

Future plans include the creation of parsers for:

.TextGrid files
.xml files

Parsers return an object of the Conversation class.

To get started with scikit-talk, import the module:

[2]:

import sktalk

To see it in action, we will need to start with a transcription file.

For example, you can download a file from the Griffith Corpus of Spoken Australian English. This publicly available corpus contains transcription files in .cha format.

Another publicly available corpus is the IFADV corpus, which contains annotations as .eaf files.

We will go over both options below.

Parsing a `.cha` file

From the Griffith corpus, we have downloaded this file.

We will parse the file with the Conversation.from_cha() method, resulting in a Conversation object:

[3]:

griffith01 = sktalk.Conversation.from_cha('01.cha')

griffith01

[3]:

<sktalk.corpus.conversation.Conversation at 0x10fce4be0>

A parsed .cha file is a Conversation object. It has metadata and a collection of utterances:

[4]:

griffith01.utterances[:10]

[4]:

[Utterance(utterance='0', participant='S', time=[0, 1500], begin='00:00:00.000', end='00:00:01.500', metadata=None, utterance_clean='0', utterance_list=['0'], n_words=1, n_characters=1, time_to_next=None, dyadic=None, FTO=None),
 Utterance(utterance="mm I'm glad I saw you⇗", participant='S', time=[1500, 2775], begin='00:00:01.500', end='00:00:02.775', metadata=None, utterance_clean='mm Im glad I saw you', utterance_list=['mm', 'Im', 'glad', 'I', 'saw', 'you'], n_words=6, n_characters=15, time_to_next=None, dyadic=None, FTO=None),
 Utterance(utterance="I thought I'd lost you", participant='S', time=[2775, 3773], begin='00:00:02.775', end='00:00:03.773', metadata=None, utterance_clean='I thought Id lost you', utterance_list=['I', 'thought', 'Id', 'lost', 'you'], n_words=5, n_characters=17, time_to_next=None, dyadic=None, FTO=None),
 Utterance(utterance='le⌉,', participant='H', time=[4052, 5515], begin='00:00:04.052', end='00:00:05.515', metadata=None, utterance_clean='le', utterance_list=['le'], n_words=1, n_characters=2, time_to_next=None, dyadic=None, FTO=None),
 Utterance(utterance='⌊xxx⌋', participant='S', time=[4052, 5817], begin='00:00:04.052', end='00:00:05.817', metadata=None, utterance_clean='xxx', utterance_list=['xxx'], n_words=1, n_characters=3, time_to_next=None, dyadic=None, FTO=None),
 Utterance(utterance=": (.) if ʔI couldn't boʔrrow, (1.3)", participant='S', time=[6140, 9487], begin='00:00:06.140', end='00:00:09.487', metadata=None, utterance_clean='  if ʔI couldnt boʔrrow 13', utterance_list=['if', 'ʔI', 'couldnt', 'boʔrrow', '13'], n_words=5, n_characters=20, time_to_next=None, dyadic=None, FTO=None),
 Utterance(utterance='r', participant='S', time=[9487, 12888], begin='00:00:09.487', end='00:00:12.888', metadata=None, utterance_clean='r', utterance_list=['r'], n_words=1, n_characters=1, time_to_next=None, dyadic=None, FTO=None),
 Utterance(utterance='nicating acro-', participant='H', time=[12888, 14050], begin='00:00:12.888', end='00:00:14.050', metadata=None, utterance_clean='nicating acro', utterance_list=['nicating', 'acro'], n_words=2, n_characters=12, time_to_next=None, dyadic=None, FTO=None),
 Utterance(utterance='for family gender and sexuality', participant='H', time=[14050, 17014], begin='00:00:14.050', end='00:00:17.014', metadata=None, utterance_clean='for family gender and sexuality', utterance_list=['for', 'family', 'gender', 'and', 'sexuality'], n_words=5, n_characters=27, time_to_next=None, dyadic=None, FTO=None),
 Utterance(utterance="that's the second on is itʔ", participant='S', time=[17014, 18611], begin='00:00:17.014', end='00:00:18.611', metadata=None, utterance_clean='thats the second on is itʔ', utterance_list=['thats', 'the', 'second', 'on', 'is', 'itʔ'], n_words=6, n_characters=21, time_to_next=None, dyadic=None, FTO=None)]
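In the output above, each utterance carries its timing twice: as millisecond offsets in time, and as formatted strings in begin and end. As a minimal sketch of how the two relate (the helper ms_to_timestamp is hypothetical and not part of scikit-talk), a millisecond offset can be formatted as follows:

```python
def ms_to_timestamp(ms: int) -> str:
    """Format a millisecond offset as HH:MM:SS.mmm (hypothetical helper)."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


# Offsets taken from the utterances listed above.
print(ms_to_timestamp(1500))   # 00:00:01.500
print(ms_to_timestamp(18611))  # 00:00:18.611
```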

[5]:

griffith01.metadata

[5]:

{'source': '01.cha',
 'UTF8': '',
 'PID': '11312/t-00017232-1',
 'Languages': ['eng'],
 'Participants': {'S': {'name': 'Sarah',
   'language': 'eng',
   'corpus': 'GCSAusE',
   'age': '',
   'sex': '',
   'group': '',
   'ses': '',
   'role': 'Adult',
   'education': '',
   'custom': ''},
  'H': {'name': 'Hannah',
   'language': 'eng',
   'corpus': 'GCSAusE',
   'age': '',
   'sex': '',
   'group': '',
   'ses': '',
   'role': 'Adult',
   'education': '',
   'custom': ''}},
 'Options': 'CA',
 'Media': '01, audio'}
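Because the metadata is a plain nested dictionary, it can be queried with ordinary Python. The sketch below uses a shortened copy of the dictionary shown above, rather than a parsed file, to map each participant code to a name:

```python
# Shortened copy of the metadata printed above; only two fields are kept.
metadata = {
    "Participants": {
        "S": {"name": "Sarah", "role": "Adult"},
        "H": {"name": "Hannah", "role": "Adult"},
    },
}

# Map each participant code to the participant's name.
names = {code: info["name"] for code, info in metadata["Participants"].items()}
print(names)  # {'S': 'Sarah', 'H': 'Hannah'}
```

With a parsed conversation, the same expression should work on griffith01.metadata, assuming the file declares its participants.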

We can explore the conversation using the summary method:

[6]:

griffith01.summary(n=30)

(0 - 1500) S: '0'
(1500 - 2775) S: 'mm I'm glad I saw you⇗'
(2775 - 3773) S: 'I thought I'd lost you'
(4052 - 5515) H: 'le⌉,'
(4052 - 5817) S: '⌊xxx⌋'
(6140 - 9487) S: ': (.) if ʔI couldn't boʔrrow, (1.3)'
(9487 - 12888) S: 'r'
(12888 - 14050) H: 'nicating acro-'
(14050 - 17014) H: 'for family gender and sexuality'
(17014 - 18611) S: 'that's the second on is itʔ'
(18611 - 21090) H: '+≋ I think it's s⌈ame family gender⌉ has a second book'
(19011 - 20132) S: '⌊whatever xxx⌋'
(21090 - 23087) H: 'not communicating across cultures'
(24457 - 25746) H: '⌈family gen⌈der has two'
(24457 - 25931) S: '⌊can-   ⌊can I borrow it⇗'
(25931 - 26971) H: 'ʔh ⌈sure'
(26576 - 27215) S: '⌊thank you'
(27554 - 28309) H: 'I've got all my-'
(28700 - 30774) H: 'in fact all my reading books are all together,'
(31400 - 31876) H: 'so that'
(32276 - 33530) H: 'se them⇗'
(33800 - 34706) H: 'I do ∆sort of∆ think-'
(34706 - 38006) H: 'cause I don't think that one I'll be using (0.2) particularly'
(38100 - 39261) H: 'in⇗'
(40100 - 40518) S: 'ʔwhich ʔone'
(40918 - 41940) H: 'the family gender'
(42258 - 43175) H: 'I don't think it'd be-'
(43714 - 45664) H: 'though:: (.) you know something in:-'
(45800 - 47800) H: 'in the social context of Asian business'
(47800 - 49460) H: '∆°cause°∆ I missed half of that lecture⇗'

The same method can restrict the summary to, for example, a specific participant:

[7]:

griffith01.summary(participant = 'S', n = 5)

(0 - 1500) S: '0'
(1500 - 2775) S: 'mm I'm glad I saw you⇗'
(2775 - 3773) S: 'I thought I'd lost you'
(4052 - 5817) S: '⌊xxx⌋'
(6140 - 9487) S: ': (.) if ʔI couldn't boʔrrow, (1.3)'
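The output above is consistent with the participant filter being applied first and the limit n second, since five lines for S are returned even though H speaks among the first five utterances. The sketch below reproduces that layout in plain Python. The function summary_lines and the tuple representation are hypothetical and only illustrate the idea; they are not scikit-talk's implementation.

```python
# (participant, [begin_ms, end_ms], utterance), copied from the output above.
utterances = [
    ("S", [0, 1500], "0"),
    ("S", [1500, 2775], "mm I'm glad I saw you⇗"),
    ("S", [2775, 3773], "I thought I'd lost you"),
    ("H", [4052, 5515], "le⌉,"),
    ("S", [4052, 5817], "⌊xxx⌋"),
]


def summary_lines(utterances, participant=None, n=10):
    """Filter by participant first, then keep at most n formatted lines."""
    selected = [u for u in utterances if participant is None or u[0] == participant]
    return [f"({begin} - {end}) {who}: '{text}'"
            for who, (begin, end), text in selected[:n]]


for line in summary_lines(utterances, participant="S", n=4):
    print(line)
```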

Parsing an `.eaf` file

From the IFADV corpus, we have downloaded this file.

We will use the Conversation.from_eaf() method to parse the file, resulting in a Conversation object.

[8]:

ifadv03 = sktalk.Conversation.from_eaf("DVA3E.EAF")

ifadv03

[8]:

<sktalk.corpus.conversation.Conversation at 0x11780f550>

ELAN formats are a bit more complex than .cha files, as they may contain additional annotations (e.g. for gestures). These annotations are stored in the ELAN format as different tiers, which end up in the Conversation object as utterances from different participants.

We can look at the participants in the conversation:

[9]:

ifadv03.participants

[9]:

{'kijkrichting spreker1 [v] (TIE1)',
 'kijkrichting spreker2 [v] (TIE3)',
 'spreker1 [v] (TIE0)',
 'spreker2 [v] (TIE2)'}

In this case, we are only interested in 'spreker1 [v] (TIE0)' and 'spreker2 [v] (TIE2)'. We want to remove the other “participants” from the conversation.

[10]:

ifadv03.remove(participant = 'kijkrichting spreker1 [v] (TIE1)')
ifadv03.remove(participant = 'kijkrichting spreker2 [v] (TIE3)')

ifadv03.participants

[10]:

{'spreker1 [v] (TIE0)', 'spreker2 [v] (TIE2)'}

Another way to ensure that only the right tiers are included is to specify the tiers we want to parse when we call the from_eaf() method:

[11]:

ifadv03 = sktalk.Conversation.from_eaf("DVA3E.EAF", tiers = ['spreker1 [v] (TIE0)', 'spreker2 [v] (TIE2)'])

ifadv03.participants

[11]:

{'spreker1 [v] (TIE0)', 'spreker2 [v] (TIE2)'}
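When a file has many tiers, the list of tier names can be built programmatically instead of typed by hand. The sketch below assumes that the speech tiers in this particular file are exactly those whose names start with "spreker", while the "kijkrichting" (gaze direction) tiers are not wanted. That naming convention is specific to this example and will differ between corpora.

```python
# Participant names as reported for the unfiltered conversation above.
participants = {
    "kijkrichting spreker1 [v] (TIE1)",
    "kijkrichting spreker2 [v] (TIE3)",
    "spreker1 [v] (TIE0)",
    "spreker2 [v] (TIE2)",
}

# Assumption: speech tiers are the ones whose name starts with "spreker".
speech_tiers = sorted(p for p in participants if p.startswith("spreker"))
print(speech_tiers)  # ['spreker1 [v] (TIE0)', 'spreker2 [v] (TIE2)']
```

The resulting list has the same form as the value passed to the tiers argument above.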

Analyzing turn-taking dynamics

When creating a Conversation object, a number of calculations and transformations are performed on the Utterance objects within. For example, the number of words in each utterance is calculated, and stored under Utterance.n_words. You can see this for a specific utterance as follows:

[12]:

print(griffith01.utterances[13].utterance)
print(griffith01.utterances[13].utterance_clean)
print(griffith01.utterances[13].n_words)

⌈family gen⌈der has two
family gender has two
4
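The values shown in the utterance listings are consistent with n_words being the number of whitespace-separated tokens in utterance_clean, and n_characters being the total length of those tokens, spaces excluded. The sketch below illustrates that reading. The function derived_counts is hypothetical; the library's own cleaning and counting code may differ.

```python
def derived_counts(utterance_clean: str):
    """Return (word list, number of words, number of non-space characters)."""
    words = utterance_clean.split()
    return words, len(words), sum(len(word) for word in words)


# Cleaned utterances copied from the Griffith conversation.
print(derived_counts("family gender has two")[1])        # 4
print(derived_counts("  if ʔI couldnt boʔrrow 13")[1:])  # (5, 20)
```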

More sophisticated calculations can be performed, but do not happen automatically. An example of this is the calculation of the Floor Transfer Offset (FTO) per utterance. FTO is defined as the difference between the time that a turn starts, and the end of the most relevant prior turn by the other participant. If there is overlap between these turns, the FTO is negative. If there is a pause between these utterances, the FTO is positive.
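The sign convention can be illustrated with plain arithmetic. The function below is a hypothetical helper, not a scikit-talk function. The first call uses timings from the Griffith conversation, where S ends a turn at 3773 ms and H starts at 4052 ms; the second call uses made-up timings to show an overlap.

```python
def floor_transfer_offset(turn_start: int, prior_end: int) -> int:
    """FTO in ms: positive for a gap, negative for an overlap."""
    return turn_start - prior_end


print(floor_transfer_offset(4052, 3773))  # 279: a pause between the turns
print(floor_transfer_offset(1000, 1300))  # -300: hypothetical overlapping turns
```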

We can calculate the FTOs of the utterances in a conversation:

[13]:

griffith01.calculate_FTO()

for utterance in griffith01.utterances[:10]:
    print(f'{utterance.time} {utterance.participant} - FTO: {utterance.FTO}')

[0, 1500] S - FTO: None
[1500, 2775] S - FTO: None
[2775, 3773] S - FTO: None
[4052, 5515] H - FTO: 279
[4052, 5817] S - FTO: None
[6140, 9487] S - FTO: None
[9487, 12888] S - FTO: None
[12888, 14050] H - FTO: 0
[14050, 17014] H - FTO: None
[17014, 18611] S - FTO: 0

To determine which prior turn is the relevant turn for FTO calculation, the following criteria are used to find a relevant utterance prior to an utterance U:

the relevant utterance must be by another participant
the relevant utterance must be the most recent utterance by that participant
the relevant utterance must have started more than a specified number of ms before the start of U. This time defaults to 200 ms, but can be changed with the planning_buffer argument.
the relevant utterance must be partly or entirely within the context window. The context window is defined as 10s (or 10000ms) prior to the utterance U. The size of this window can be changed with the window argument.
within the context window, there must be a maximum of 2 speakers, which can be changed to 3 with the n_participants argument.
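The criteria above can be sketched in plain Python. This is one possible reading, not scikit-talk's implementation: it assumes that the most recent prior turn that started more than planning_buffer ms earlier is the candidate, and that no FTO is assigned when that candidate is by the same speaker. The function sketch_fto and the tuple representation are hypothetical. With the ten Griffith utterances, this reading reproduces the FTO values printed above.

```python
from typing import List, Optional, Tuple

Turn = Tuple[str, int, int]  # (participant, begin_ms, end_ms)


def sketch_fto(turns: List[Turn], i: int, planning_buffer: int = 200,
               window: int = 10000, n_participants: int = 2) -> Optional[int]:
    speaker, start, _ = turns[i]
    # Prior turns that lie partly or entirely inside the context window.
    in_window = [t for t in turns[:i] if t[2] >= start - window]
    # More speakers in the window than allowed: no FTO.
    if len({t[0] for t in in_window} | {speaker}) > n_participants:
        return None
    # Keep turns that began more than planning_buffer ms before this one.
    eligible = [t for t in in_window if start - t[1] > planning_buffer]
    if not eligible:
        return None
    who, _, prior_end = eligible[-1]  # turns are listed chronologically
    # Assumption: no FTO if the most recent eligible turn is by the same speaker.
    if who == speaker:
        return None
    return start - prior_end


turns = [("S", 0, 1500), ("S", 1500, 2775), ("S", 2775, 3773),
         ("H", 4052, 5515), ("S", 4052, 5817), ("S", 6140, 9487),
         ("S", 9487, 12888), ("H", 12888, 14050), ("H", 14050, 17014),
         ("S", 17014, 18611)]
print([sketch_fto(turns, i) for i in range(len(turns))])
# [None, None, None, 279, None, None, None, 0, None, 0]
```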

When calculating the FTO, the settings for the arguments planning_buffer, window, and n_participants can be changed. Their values are stored in the metadata of the conversation object when the FTOs are calculated.

They can be retrieved as follows:

[14]:

griffith01.metadata["Calculations"]["FTO"]

[14]:

{'window': 10000, 'planning_buffer': 200, 'n_participants': 2}

The `Corpus` object

A Corpus is a way to collect conversations.

A Corpus can be initialized from a single conversation, or a list of conversations. It can also be initialized as an empty object, with metadata.

[15]:

GCSAusE = sktalk.Corpus(name = "Griffith Corpus of Spoken Australian English",
                        url = "https://ca.talkbank.org/data-orig/GCSAusE/")

GCSAusE.metadata

[15]:

{'name': 'Griffith Corpus of Spoken Australian English',
 'url': 'https://ca.talkbank.org/data-orig/GCSAusE/'}

We can add conversations to a Corpus:

[16]:

GCSAusE.append(griffith01)

GCSAusE.conversations

[16]:

[<sktalk.corpus.conversation.Conversation at 0x10fce4be0>]

Storing and retrieving `Conversation` and `Corpus` objects

Both Conversation and Corpus objects can be written to file in .csv and .json formats.

json

.json files are comprehensive, and contain the entire object in one file:

[17]:

# Corpus
GCSAusE.write_json(path = "CGSAusE.json")

# Conversation
griffith01.write_json(path = "GCSAusE_01.json")

Object saved to CGSAusE.json
Object saved to GCSAusE_01.json

The objects can be recreated from the .json files:

[18]:

# Corpus
GCSAusE_2 = sktalk.Corpus.from_json("CGSAusE.json")

# Conversation
griffith01_2 = sktalk.Conversation.from_json(path = "GCSAusE_01.json")

csv

When writing to .csv, two files are created. One contains the utterances, and the other contains the metadata. The former is named using the path provided, and the metadata file is named with the suffix _metadata.csv added.

[19]:

# Corpus
GCSAusE.write_csv(path = "CGSAusE.csv")

# Conversation
griffith01.write_csv(path = "GCSAusE_01.csv")

Utterances saved to CGSAusE.csv
Metadata saved to CGSAusE_metadata.csv
Utterances saved to GCSAusE_01.csv
Metadata saved to GCSAusE_01_metadata.csv
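The naming of the metadata file follows from the path provided. As a small sketch of that rule, inferred from the file names printed above (the helper metadata_path is hypothetical and not part of scikit-talk), the second file name can be derived as follows:

```python
from pathlib import Path


def metadata_path(path: str) -> str:
    """Derive the metadata file name from the utterance file name."""
    p = Path(path)
    # Replace the final ".csv" with "_metadata.csv", keeping any directory.
    return str(p.with_name(f"{p.stem}_metadata.csv"))


print(metadata_path("GCSAusE_01.csv"))  # GCSAusE_01_metadata.csv
```

This can be useful when both files need to be located later, for example to read them with a CSV library.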
