sktalk

Documentation about scikit-talk

Subpackages

Package Contents

Classes

Conversation

A conversation: a list of Utterance objects with optional metadata.

Corpus

A collection of Conversation objects with optional metadata.

Utterance

A single utterance, with its text, participant, and optional timing information.

Attributes

__author__

__email__

__version__

class sktalk.Conversation(utterances: list[sktalk.corpus.utterance.Utterance], metadata: dict | None = None, suppress_warnings: bool = False)[source]

Bases: sktalk.corpus.write.writer.Writer

A conversation, represented as a list of Utterance objects together with optional metadata.

property utterances

Get the list of utterances in the conversation.

Returns:

A list of Utterance objects representing the utterances in the conversation.

Return type:

list[Utterance]

property metadata

Get the metadata associated with the conversation.

Returns:

Additional metadata associated with the conversation.

Return type:

dict

property participants

Get the participants in the conversation.

Returns:

A set of unique participant names.

Return type:

set[str]

property metadata_df

Return the conversation metadata as a pandas dataframe.

property utterance_df

Return the conversation utterances as a pandas dataframe.

__len__()[source]

Get the number of utterances in the conversation.

Returns:

The number of utterances in the conversation.

Return type:

int

classmethod from_cha(path)[source]

Parse a conversation file in CHAT (.cha) format.

Parameters:

path (str) – Path to the .cha file

Returns:

A Conversation object representing the conversation in the file.

Return type:

Conversation

classmethod from_eaf(path: str, tiers: list[str] | None = None)[source]

Parse a conversation file in ELAN (.eaf) format.

Parameters:
  • path (str) – Path to the ELAN file

  • tiers (Optional[list[str]], optional) – List of tiers to parse. Defaults to None, in which case all tiers are parsed. If an empty list is passed, all tiers are parsed, but a warning is issued.

Raises:

KeyError – If any of the named tiers are not found in the file.

Returns:

A Conversation object representing the conversation in the file.

Return type:

Conversation
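The tier-selection rules for from_eaf (None parses all tiers, an empty list parses all tiers with a warning, unknown tier names raise KeyError) can be sketched in plain Python. This is a minimal sketch, not sktalk's implementation: the helper select_tiers and its available argument (standing in for the tier names found in the ELAN file) are hypothetical, as is the warning text.

```python
from __future__ import annotations

import warnings


def select_tiers(available: list[str], tiers: list[str] | None = None) -> list[str]:
    """Return the tier names to parse (hypothetical helper, not part of sktalk)."""
    if tiers is None:
        # Default: parse every tier found in the file.
        return list(available)
    if len(tiers) == 0:
        # An empty list also parses every tier, but the caller is warned.
        warnings.warn("No tiers specified; parsing all tiers.")
        return list(available)
    missing = [tier for tier in tiers if tier not in available]
    if missing:
        # Naming a tier that does not exist in the file is an error.
        raise KeyError(f"Tiers not found in file: {missing}")
    return list(tiers)
```

For example, select_tiers(["A", "B"], ["B"]) keeps only tier "B", while select_tiers(["A", "B"], ["C"]) raises KeyError.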

classmethod from_json(path)[source]

Parse a conversation file in JSON format.

Parameters:

path (str) – Path to the JSON file

Returns:

A Conversation object representing the conversation in the file.

Return type:

Conversation

classmethod _fromdict(fields)[source]

abstract get_utterance(index) sktalk.corpus.utterance.Utterance[source]

summary(n=10, **fields)[source]

Print the first n lines of a conversation.

Parameters:
  • n (int, optional) – Number of lines to print. Defaults to 10.

  • fields (dict) – key-value pairs with which specific utterances can be selected

select(**fields)[source]

Select utterances based on content in specific fields

Parameters:

fields (dict) – key-value pairs with which specific utterances can be selected

Returns:

Conversation object without metadata, containing a reduced set of utterances

Return type:

Conversation

remove(**fields)[source]

Remove utterances based on content in specific fields

Parameters:

fields (dict) – key-value pairs with which specific utterances can be selected

asdict()[source]

Return the Conversation as a dictionary

Returns:

dictionary containing Conversation metadata and Utterances

Return type:

dict

_subconversation_by_index(index: int, before: int = 0, after: int | None = None) Conversation[source]

Select utterances to provide context as a sub-conversation

Parameters:
  • index (int) – The index of the utterance for which to provide context

  • before (int, optional) – The number of utterances prior to the indicated utterance. Defaults to 0.

  • after (int, optional) – The number of utterances following the indicated utterance. Defaults to None, in which case it takes the same value as before.

Raises:

IndexError – Index provided must be within range of utterances

Returns:

Conversation object without metadata, containing a reduced set of utterances

Return type:

Conversation
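The index-based context window can be pictured as a slice around index. This sketch works on a plain list; the helper name context_by_index is hypothetical, and clipping the window at the start and end of the conversation is an assumption rather than documented behaviour.

```python
from __future__ import annotations


def context_by_index(items: list, index: int, before: int = 0, after: int | None = None) -> list:
    """Return items[index] with up to `before` items before it and `after` items after it."""
    if not 0 <= index < len(items):
        raise IndexError("Index provided must be within range of utterances")
    if after is None:
        # Documented default: `after` takes the same value as `before`.
        after = before
    # Clip the window to the bounds of the list (assumption).
    start = max(index - before, 0)
    stop = min(index + after + 1, len(items))
    return items[start:stop]
```

With items = list("abcdefg"), context_by_index(items, 3, before=1) returns the three items "c", "d", "e".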

_subconversation_by_time(index: int, before: int = 0, after: int = 0, exclude_utterance_overlap: bool = False) Conversation[source]

Select utterances to provide context as a sub-conversation

Parameters:
  • index (int) – The index of the utterance for which to provide context

  • before (int, optional) – The time in ms preceding the utterance’s begin. Defaults to 0.

  • after (int, optional) – The time in ms following the utterance’s end. Defaults to 0.

  • exclude_utterance_overlap (bool, optional) – If True, the duration of the utterance itself is not used to identify overlapping utterances, and only the window before or after the utterance is used. Defaults to False. If True, only one of before or after can be more than 0, as the window for overlap will be limited to the window preceding or following the utterance.

Returns:

Conversation object without metadata, containing a reduced set of utterances

Return type:

Conversation
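Time-based context can be sketched as an interval-overlap test. Assumptions, none of which are confirmed by the documentation above: each utterance is a (begin, end) pair in ms, an utterance is in context if its interval overlaps the window at all, intervals that only touch do not count, and passing both before and after with exclude_utterance_overlap=True raises ValueError. The helper names overlaps and context_by_time are hypothetical, and the sketch returns indices rather than a Conversation.

```python
from __future__ import annotations


def overlaps(a_begin: int, a_end: int, b_begin: int, b_end: int) -> bool:
    # Strict overlap: intervals that only touch at an endpoint do not count (assumption).
    return a_begin < b_end and b_begin < a_end


def context_by_time(times: list[tuple[int, int]], index: int, before: int = 0,
                    after: int = 0, exclude_utterance_overlap: bool = False) -> list[int]:
    """Return indices of utterances overlapping the time window around `index`."""
    begin, end = times[index]
    if exclude_utterance_overlap:
        if before > 0 and after > 0:
            # Raising here is an assumption; the docs only say one of them may be > 0.
            raise ValueError("Only one of before or after can be more than 0")
        # Use only the window preceding or following the utterance.
        window = (begin - before, begin) if before > 0 else (end, end + after)
    else:
        # Window covers the utterance itself plus the margins on either side.
        window = (begin - before, end + after)
    return [i for i, (b, e) in enumerate(times) if overlaps(b, e, *window)]
```

With times = [(0, 1000), (900, 2000), (2500, 3000), (5000, 6000)], the utterance at index 1 overlaps index 0, extending after by 600 ms also picks up index 2, and a 200 ms window before it with exclude_utterance_overlap=True finds only index 0.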

count_participants(except_none: bool = False) int[source]

Count the number of participants in a conversation

Note that utterances without a participant are counted as a separate participant (None). To exclude them, set except_none to True.

Parameters:

except_none (bool, optional) – If True, utterances without a participant are not counted. Defaults to False.

Returns:

number of participants

Return type:

int
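The counting rule, including the treatment of None, amounts to counting distinct participant labels. This sketch takes a plain list of labels (a hypothetical input; sktalk reads them from the utterances) and assumes, as a set of participants implies, that all utterances without a participant together count as a single None participant.

```python
from __future__ import annotations


def count_participants(participants: list[str | None], except_none: bool = False) -> int:
    """Count distinct participants; None stands for 'no participant'."""
    unique = set(participants)
    if except_none:
        # Drop the "no participant" placeholder before counting.
        unique.discard(None)
    return len(unique)
```

For ["A", "B", None, "A"], this counts three participants by default and two with except_none=True.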

_update(field: str, values: list, **kwargs)[source]

Update all utterances in the conversation with calculated values

This function also stores relevant arguments in the Conversation metadata.

Parameters:
  • field (str) – field of the Utterance to update

  • values (list) – list of values to update each utterance with

  • kwargs (dict) – information about the calculation to store in the Conversation metadata

calculate_FTO(window: int = 10000, planning_buffer: int = 200, n_participants: int = 2)[source]

Calculate Floor Transfer Offset (FTO) per utterance

FTO is defined as the difference between the time that a turn starts and the end of the most relevant prior turn by the other participant, which is not necessarily the prior utterance.

An utterance does not receive an FTO if there are preceding utterances within the window that do not have timing information, or if it lacks timing information itself.

Parameters:
  • window (int, optional) – the time in ms prior to utterance in which a relevant preceding utterance can be found. Defaults to 10000.

  • planning_buffer (int, optional) – minimum speaking time in ms to allow for a response. Defaults to 200.

  • n_participants (int, optional) – maximum number of participants overlapping with the utterance and preceding window. Defaults to 2.
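As a worked example of the definition, FTO is the begin time of the utterance minus the end time of the relevant prior utterance, so a positive value is a gap and a negative value is an overlap. The sketch assumes the relevant prior utterance has already been identified (as relevant_prior_utterance does) and only models the rule that an utterance without timing gets no FTO; the window rule for untimed preceding utterances is not modelled. The function name floor_transfer_offset is hypothetical.

```python
from typing import Optional


def floor_transfer_offset(begin: Optional[int], prior_end: Optional[int]) -> Optional[int]:
    """FTO in ms: start of the utterance minus end of the relevant prior utterance."""
    if begin is None or prior_end is None:
        # No FTO without timing information.
        return None
    return begin - prior_end
```

If the prior turn ends at 1500 ms, a response starting at 1700 ms has an FTO of 200 (a 200 ms gap), and one starting at 1300 ms has an FTO of -200 (a 200 ms overlap).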

relevant_prior_utterance(index, window=10000, planning_buffer=200, n_participants=2)[source]

Determine the most relevant prior utterance for a given utterance

To be a relevant prior turn, an utterance must meet the following conditions relative to utterance U:

  • it must be by a different speaker than U;

  • it must be the most recent utterance by that speaker;

  • it must have started more than planning_buffer ms before the start of U;

  • it must lie partly or entirely within the context window (window ms prior to the start of U);

  • within the context window, there must be at most n_participants speakers.

Parameters:
  • index (int) – index of the utterance to assess

  • window (int, optional) – the time in ms prior to utterance in which a relevant preceding utterance can be found. Defaults to 10000.

  • planning_buffer (int, optional) – minimum speaking time in ms to allow for a response. Defaults to 200.

  • n_participants (int, optional) – maximum number of participants overlapping with the utterance and preceding window. Defaults to 2.

Returns:

the most relevant prior utterance, or None, if no relevant prior utterance can be identified

Return type:

Utterance
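The conditions above can be sketched as a filter over the other speakers' most recent utterances. This is an interpretation, not sktalk's code: utterances are modelled by a hypothetical Turn dataclass, "started more than planning_buffer ms before" is read as prior.begin + planning_buffer < U.begin, speakers are counted over the window plus U itself, and when several speakers qualify the candidate that ends latest is chosen. That tie-breaking rule is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Turn:
    # Hypothetical stand-in for sktalk.Utterance: speaker plus begin/end in ms.
    participant: str
    begin: int
    end: int


def relevant_prior(turns: list[Turn], index: int, window: int = 10000,
                   planning_buffer: int = 200, n_participants: int = 2) -> Turn | None:
    u = turns[index]
    window_start = u.begin - window

    # Speakers overlapping the context window or the utterance itself.
    active = {t.participant for t in turns if t.begin < u.end and t.end > window_start}
    if len(active) > n_participants:
        return None

    # Most recent utterance (by start time) of each other speaker that began before U.
    latest: dict[str, Turn] = {}
    for t in turns:
        if t.participant == u.participant or t.begin >= u.begin:
            continue
        if t.participant not in latest or t.begin > latest[t.participant].begin:
            latest[t.participant] = t

    candidates = [
        t for t in latest.values()
        if t.begin + planning_buffer < u.begin  # started more than planning_buffer ms before U
        and t.end > window_start                # reaches into the context window
    ]
    if not candidates:
        return None
    # Assumption: with several candidates, prefer the one that ends latest.
    return max(candidates, key=lambda t: t.end)
```

For turns A (0 to 1000 ms), B (1200 to 2500 ms), A (2700 to 4000 ms), the relevant prior utterance of the last turn is B's. If B had started at 2600 ms instead, it would fall inside the 200 ms planning buffer and no relevant prior utterance would be found.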

class sktalk.Corpus(conversations: list[sktalk.corpus.conversation.Conversation] = None, **metadata)[source]

Bases: sktalk.corpus.write.writer.Writer

A collection of Conversation objects, together with optional corpus-level metadata.

property metadata

Get the metadata associated with the Corpus.

Returns:

Additional metadata associated with the Corpus.

Return type:

dict

property conversations

Get the conversations contained in the Corpus

Returns:

List of the conversations contained in this Corpus.

Return type:

list

property metadata_df

Return the corpus metadata as a pandas dataframe.

property utterance_df

Return the corpus utterances as a pandas dataframe.

__add__(other: Corpus) Corpus[source]

append(conversation: sktalk.corpus.conversation.Conversation)[source]

Append a conversation to the Corpus

Parameters:

conversation (Conversation) – Conversation object that should be added to the Corpus

asdict()[source]

Return the Corpus as a dictionary

Returns:

dictionary containing Corpus metadata and Conversations

Return type:

dict

classmethod from_json(path)[source]

Parse a corpus file in JSON format.

Parameters:

path (str) – Path to the JSON file

Returns:

A Corpus object representing the corpus in the file.

Return type:

Corpus

classmethod _fromdict(fields)[source]

classmethod from_xml(path)[source]

class sktalk.Utterance[source]

utterance: str
participant: str | None
time: list | None
begin: int | None
begin_timestamp: str | None
end: int | None
end_timestamp: str | None
utterance_raw: str | None
utterance_list: list[str] | None
n_words: int | None
n_characters: int | None
FTO: int | None
metadata: dict[str, Any] | None

__post_init__()[source]
get_audio()[source]
asdict()[source]
classmethod _fromdict(fields)[source]
until(other)[source]
overlap(other)[source]
window_overlap(time)[source]
overlap_duration(other)[source]
window_overlap_duration(time)[source]
overlap_percentage(other)[source]
window_overlap_percentage(time)[source]
same_speaker(other)[source]
precede_with_buffer(other, planning_buffer=200)[source]
_validate_time()[source]
static _to_timestamp(time_ms)[source]
static _clean_utterance(utterance)[source]

sktalk.__author__ = 'Barbara Vreede'[source]

sktalk.__email__ = 'b.vreede@esciencecenter.nl'[source]

sktalk.__version__ = '0.1.1'[source]