Getting started with scikit-talk
scikit-talk can be used to explore and analyse conversation files.
It contains three main levels of objects:

- Corpora, described with the Corpus class
- Conversations, described with the Conversation class
- Utterances, described with the Utterance class
To explore the power of scikit-talk, the best entry point is a parser. With the parsers, we can load data into a scikit-talk object.
scikit-talk currently has the following parsers:

- Conversation.from_cha(), which parses .cha files.
- Conversation.from_eaf(), which parses ELAN (.eaf) files.
Future plans include the creation of parsers for:

- .TextGrid files
- .xml files
Parsers return an object of the Conversation class.
To get started with scikit-talk, import the module:
[58]:
import sktalk
To see it in action, we will need to start with a transcription file.
For example, you can download a file from the Griffith Corpus of Spoken Australian English. This publicly available corpus contains transcription files in .cha format.
Another publicly available corpus is the IFADV corpus, which contains annotations as .eaf files.
We will go over both options below.
Parsing a .cha file
From the Griffith corpus, we have downloaded this file.
We will parse the file with the Conversation.from_cha() method, resulting in a Conversation object:
[59]:
griffith01 = sktalk.Conversation.from_cha('01.cha')
griffith01
[59]:
<sktalk.corpus.conversation.Conversation at 0x110783e20>
A parsed .cha file is a Conversation object, which has metadata and a collection of utterances:
[60]:
griffith01.utterances[:10]
[60]:
[Utterance(utterance='', participant='S', time=[0, 1500], begin=0, begin_timestamp='00:00:00.000', end=1500, end_timestamp='00:00:01.500', utterance_raw='0', utterance_list=[], n_words=0, n_characters=0, FTO=None, metadata=None),
Utterance(utterance="mm I'm glad I saw you", participant='S', time=[1500, 2775], begin=1500, begin_timestamp='00:00:01.500', end=2775, end_timestamp='00:00:02.775', utterance_raw="mm I'm glad I saw you⇗", utterance_list=['mm', "I'm", 'glad', 'I', 'saw', 'you'], n_words=6, n_characters=16, FTO=None, metadata=None),
Utterance(utterance="I thought I'd lost you", participant='S', time=[2775, 3773], begin=2775, begin_timestamp='00:00:02.775', end=3773, end_timestamp='00:00:03.773', utterance_raw="I thought I'd lost you", utterance_list=['I', 'thought', "I'd", 'lost', 'you'], n_words=5, n_characters=18, FTO=None, metadata=None),
Utterance(utterance="no I've been here for a while", participant='H', time=[4052, 5515], begin=4052, begin_timestamp='00:00:04.052', end=5515, end_timestamp='00:00:05.515', utterance_raw="⌈no I've been here for a whi:le⌉,", utterance_list=['no', "I've", 'been', 'here', 'for', 'a', 'while'], n_words=7, n_characters=23, FTO=None, metadata=None),
Utterance(utterance='xxx', participant='S', time=[4052, 5817], begin=4052, begin_timestamp='00:00:04.052', end=5817, end_timestamp='00:00:05.817', utterance_raw='⌊xxx⌋', utterance_list=['xxx'], n_words=1, n_characters=3, FTO=None, metadata=None),
Utterance(utterance="hm if ʔI couldn't boʔrrow", participant='S', time=[6140, 9487], begin=6140, begin_timestamp='00:00:06.140', end=9487, end_timestamp='00:00:09.487', utterance_raw="⌊hm:: (.) if ʔI couldn't boʔrrow, (1.3)", utterance_list=['hm', 'if', 'ʔI', "couldn't", 'boʔrrow'], n_words=5, n_characters=21, FTO=None, metadata=None),
Utterance(utterance='the second book of readings for', participant='S', time=[9487, 12888], begin=9487, begin_timestamp='00:00:09.487', end=12888, end_timestamp='00:00:12.888', utterance_raw='the second (0.2) book of readings fo:r', utterance_list=['the', 'second', 'book', 'of', 'readings', 'for'], n_words=6, n_characters=26, FTO=None, metadata=None),
Utterance(utterance='communicating acro', participant='H', time=[12888, 14050], begin=12888, begin_timestamp='00:00:12.888', end=14050, end_timestamp='00:00:14.050', utterance_raw='commu:nicating acro-', utterance_list=['communicating', 'acro'], n_words=2, n_characters=17, FTO=None, metadata=None),
Utterance(utterance='no for family gender and sexuality', participant='H', time=[14050, 17014], begin=14050, begin_timestamp='00:00:14.050', end=17014, end_timestamp='00:00:17.014', utterance_raw='no: for family gender and sexuality', utterance_list=['no', 'for', 'family', 'gender', 'and', 'sexuality'], n_words=6, n_characters=29, FTO=None, metadata=None),
Utterance(utterance="ah that's the second on is itʔ", participant='S', time=[17014, 18611], begin=17014, begin_timestamp='00:00:17.014', end=18611, end_timestamp='00:00:18.611', utterance_raw="+≋ ah: that's the second on is itʔ", utterance_list=['ah', "that's", 'the', 'second', 'on', 'is', 'itʔ'], n_words=7, n_characters=24, FTO=None, metadata=None)]
[61]:
griffith01.metadata
[61]:
{'source': '01.cha',
'UTF8': '',
'PID': '11312/t-00017232-1',
'Languages': ['eng'],
'Participants': {'S': {'name': 'Sarah',
'language': 'eng',
'corpus': 'GCSAusE',
'age': '',
'sex': '',
'group': '',
'ses': '',
'role': 'Adult',
'education': '',
'custom': ''},
'H': {'name': 'Hannah',
'language': 'eng',
'corpus': 'GCSAusE',
'age': '',
'sex': '',
'group': '',
'ses': '',
'role': 'Adult',
'education': '',
'custom': ''}},
'Options': 'CA',
'Media': '01, audio'}
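Since the metadata is a plain Python dictionary, nested fields can be read with ordinary indexing. A minimal sketch, using a stripped-down copy of the dictionary printed above:

```python
# Stripped-down copy of the metadata dictionary printed above
metadata = {
    'Languages': ['eng'],
    'Participants': {
        'S': {'name': 'Sarah', 'role': 'Adult'},
        'H': {'name': 'Hannah', 'role': 'Adult'},
    },
}

# Ordinary dict indexing reaches any nested field
names = [p['name'] for p in metadata['Participants'].values()]
print(names)   # ['Sarah', 'Hannah']
```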
We can explore the conversation using the summary method:
[62]:
griffith01.summary()
(0 - 1500) S: ''
(1500 - 2775) S: 'mm I'm glad I saw you'
(2775 - 3773) S: 'I thought I'd lost you'
(4052 - 5515) H: 'no I've been here for a while'
(4052 - 5817) S: 'xxx'
(6140 - 9487) S: 'hm if ʔI couldn't boʔrrow'
(9487 - 12888) S: 'the second book of readings for'
(12888 - 14050) H: 'communicating acro'
(14050 - 17014) H: 'no for family gender and sexuality'
(17014 - 18611) S: 'ah that's the second on is itʔ'
This method also allows us to look in detail at, for example, a specific participant:
[63]:
griffith01.summary(participant = 'S', n = 5)
(0 - 1500) S: ''
(1500 - 2775) S: 'mm I'm glad I saw you'
(2775 - 3773) S: 'I thought I'd lost you'
(4052 - 5817) S: 'xxx'
(6140 - 9487) S: 'hm if ʔI couldn't boʔrrow'
Parsing an .eaf file
From the IFADV corpus, we have downloaded this file.
We will use the Conversation.from_eaf() method to parse the file, resulting in a Conversation object.
[78]:
ifadv03 = sktalk.Conversation.from_eaf("DVA3E.EAF")
ifadv03
[78]:
<sktalk.corpus.conversation.Conversation at 0x108abca30>
ELAN files are a bit more complex than .cha files, as they may contain additional annotations (e.g. for gestures). These annotations are stored in the ELAN format as different tiers, which end up in the Conversation object as utterances from different participants.
We can look at the participants in the conversation:
[65]:
ifadv03.participants
[65]:
{'kijkrichting spreker1 [v] (TIE1)',
'kijkrichting spreker2 [v] (TIE3)',
'spreker1 [v] (TIE0)',
'spreker2 [v] (TIE2)'}
Use the participant names to explore their utterances, and decide whether they belong in the conversation:
[66]:
ifadv03.summary(participant = 'spreker1 [v] (TIE0)', n = 5)
(855 - 1692) spreker1 [v] (TIE0): 'oké'
(1692 - 2145) spreker1 [v] (TIE0): 'ja'
(11067 - 11683) spreker1 [v] (TIE0): 'ja ja'
(18386 - 18944) spreker1 [v] (TIE0): 'ja ja'
(21240 - 23285) spreker1 [v] (TIE0): 'moet je de proefpersonen moet je die uh'
In this case, we are only interested in 'spreker1 [v] (TIE0)' and 'spreker2 [v] (TIE2)'. We want to remove the other "participants" from the conversation.
[67]:
ifadv03.remove(participant = 'kijkrichting spreker1 [v] (TIE1)')
ifadv03.remove(participant = 'kijkrichting spreker2 [v] (TIE3)')
ifadv03.participants
[67]:
{'spreker1 [v] (TIE0)', 'spreker2 [v] (TIE2)'}
Another way to ensure only the right tiers are included is to specify the tiers we want to parse when we call the from_eaf() method:
[68]:
ifadv03 = sktalk.Conversation.from_eaf("DVA3E.EAF", tiers = ['spreker1 [v] (TIE0)', 'spreker2 [v] (TIE2)'])
ifadv03.participants
[68]:
{'spreker1 [v] (TIE0)', 'spreker2 [v] (TIE2)'}
Analyzing turn-taking dynamics
When creating a Conversation object, a number of calculations and transformations are performed on the Utterance objects within. For example, the number of words in each utterance is calculated and stored under Utterance.n_words. You can see this for a specific utterance as follows:
[69]:
print(griffith01.utterances[13].utterance)
print(griffith01.utterances[13].utterance_raw)
print(griffith01.utterances[13].n_words)
family gender has two
⌈family gen⌈der has two
4
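The derived counts can be reproduced with plain string operations. This is an illustrative sketch (sktalk's own tokenization of annotated utterances may differ); the counts match the first full utterance in the parsed output above:

```python
# Reproduce the derived fields for "mm I'm glad I saw you"
# (n_words=6, n_characters=16 in the parsed output above)
utterance = "mm I'm glad I saw you"
utterance_list = utterance.split()
n_words = len(utterance_list)
n_characters = sum(len(word) for word in utterance_list)  # spaces excluded
print(n_words, n_characters)  # 6 16
```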
More sophisticated calculations can be performed, but do not happen automatically. An example is the calculation of the Floor Transfer Offset (FTO) per utterance. The FTO is defined as the difference between the time a turn starts and the end of the most relevant prior turn by the other participant. If the turns overlap, the FTO is negative; if there is a pause between them, it is positive.
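The arithmetic itself is simple. A minimal sketch (not the sktalk implementation), using the turn timings from the summary above:

```python
def fto(turn_start_ms, prior_turn_end_ms):
    """Floor Transfer Offset: start of a turn minus the end of the
    most relevant prior turn by the other participant (in ms)."""
    return turn_start_ms - prior_turn_end_ms

# H starts at 4052 ms after S's turn ended at 3773 ms: a 279 ms gap
print(fto(4052, 3773))   # 279
# A turn that starts before the prior turn ended gives a negative FTO (overlap)
print(fto(4052, 5515))   # -1463
```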
We can calculate the FTOs of the utterances in a conversation:
[70]:
griffith01.calculate_FTO()
for utterance in griffith01.utterances[:10]:
print(f'{utterance.time} {utterance.participant} - FTO: {utterance.FTO}')
[0, 1500] S - FTO: None
[1500, 2775] S - FTO: None
[2775, 3773] S - FTO: None
[4052, 5515] H - FTO: 279
[4052, 5817] S - FTO: None
[6140, 9487] S - FTO: None
[9487, 12888] S - FTO: None
[12888, 14050] H - FTO: 0
[14050, 17014] H - FTO: None
[17014, 18611] S - FTO: 0
To determine which prior turn is the relevant turn for FTO calculation, the following criteria are used to find a relevant utterance prior to an utterance U:

- the relevant utterance must be by another participant
- the relevant utterance must be the most recent utterance by that participant
- the relevant utterance must have started more than a specified number of ms before the start of U. This time defaults to 200 ms, but can be changed with the planning_buffer argument.
- the relevant utterance must be partly or entirely within the context window. The context window is defined as 10 s (10000 ms) prior to the utterance U. The size of this window can be changed with the window argument.
- within the context window, there must be a maximum of 2 speakers, which can be changed to 3 with the n_participants argument.
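The selection criteria above can be sketched in plain Python. This is a simplified two-party illustration, not sktalk's implementation: it omits the n_participants check, and the exact search behaviour may differ.

```python
def find_relevant_prior(utterances, i, planning_buffer=200, window=10000):
    """Index of the relevant prior utterance for utterances[i], or None.
    Each utterance is a (participant, begin_ms, end_ms) tuple."""
    participant, begin, _end = utterances[i]
    for j in range(i - 1, -1, -1):
        other, other_begin, other_end = utterances[j]
        if other == participant:
            continue                # must be by another participant
        # walking backwards, the first hit is that participant's most recent turn
        if other_begin > begin - planning_buffer:
            return None             # started too close to the start of U
        if other_end < begin - window:
            return None             # falls entirely outside the context window
        return j
    return None

turns = [('S', 2775, 3773), ('H', 4052, 5515)]
print(find_relevant_prior(turns, 1))   # 0: S's prior turn qualifies
```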
When calculating the FTO, the settings for the arguments planning_buffer, window, and n_participants can be changed. Their values are stored in the metadata of the Conversation object when the FTOs are calculated.
They can be retrieved as follows:
[71]:
griffith01.metadata["Calculations"]["FTO"]
[71]:
{'window': 10000, 'planning_buffer': 200, 'n_participants': 2}
The Corpus object
A Corpus is a way to collect conversations.
A Corpus can be initialized from a single conversation or a list of conversations. It can also be initialized as an empty object with metadata.
[72]:
GCSAusE = sktalk.Corpus(name = "Griffith Corpus of Spoken Australian English",
url = "https://ca.talkbank.org/data-orig/GCSAusE/")
GCSAusE.metadata
[72]:
{'name': 'Griffith Corpus of Spoken Australian English',
'url': 'https://ca.talkbank.org/data-orig/GCSAusE/'}
We can add conversations to a Corpus:
[73]:
GCSAusE.append(griffith01)
GCSAusE.conversations
[73]:
[<sktalk.corpus.conversation.Conversation at 0x110783e20>]
Storing and retrieving Conversation and Corpus objects
Both Conversation and Corpus objects can be written to file in .csv and .json formats.
json
.json files are comprehensive, and contain the entire object in one file:
[74]:
# Corpus
GCSAusE.write_json(path = "GCSAusE.json")
# Conversation
griffith01.write_json(path = "GCSAusE_01.json")
Object saved to GCSAusE.json
Object saved to GCSAusE_01.json
The objects can be recreated from the .json files:
[75]:
# Corpus
GCSAusE_2 = sktalk.Corpus.from_json("GCSAusE.json")
# Conversation
griffith01_2 = sktalk.Conversation.from_json(path = "GCSAusE_01.json")
csv
When writing to .csv, two files are created: one contains the utterances, the other the metadata. The utterance file is named using the path provided; the metadata file takes the same name with the suffix _metadata before the extension.
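That naming convention can be sketched with pathlib. This mirrors the filenames shown in the example below, not sktalk's internal code:

```python
from pathlib import Path

def metadata_path(csv_path):
    """Derive the companion metadata filename from the utterance file path."""
    p = Path(csv_path)
    return p.with_name(p.stem + "_metadata.csv")

print(metadata_path("GCSAusE_01.csv"))   # GCSAusE_01_metadata.csv
```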
[76]:
# Corpus
GCSAusE.write_csv(path = "GCSAusE.csv")
# Conversation
griffith01.write_csv(path = "GCSAusE_01.csv")
Utterances saved to GCSAusE.csv
Metadata saved to GCSAusE_metadata.csv
Utterances saved to GCSAusE_01.csv
Metadata saved to GCSAusE_01_metadata.csv