`sktalk`

Documentation about scikit-talk

Subpackages

sktalk.corpus

Package Contents

Classes

`Conversation`	Helper class that provides a standard way to create an ABC using inheritance.
`Corpus`	Helper class that provides a standard way to create an ABC using inheritance.
`Utterance`

Attributes

`__author__`
`__email__`
`__version__`

class sktalk.Conversation(utterances: list[sktalk.corpus.utterance.Utterance], metadata: dict | None = None, suppress_warnings: bool = False)

Bases: sktalk.corpus.write.writer.Writer

Helper class that provides a standard way to create an ABC using inheritance.

property utterances

Get the list of utterances in the conversation.

Returns: A list of Utterance objects representing the utterances in the conversation.
Return type: list[Utterance]

property metadata

Get the metadata associated with the conversation.

Returns: Additional metadata associated with the conversation.
Return type: dict

property participants

Get the participants in the conversation.

Returns: A set of unique participant names.
Return type: set[str]

property metadata_df: Return the conversation metadata as a pandas dataframe.

property utterance_df: Return the conversation utterances as a pandas dataframe.

__len__()

Get the number of utterances in the conversation.

Returns: The number of utterances in the conversation.
Return type: int

classmethod from_cha(path)

Parse a conversation file in CHAT (.cha) format.

Parameters: path (str) – Path to the .cha file.
Returns: A Conversation object representing the conversation in the file.
Return type: Conversation

classmethod from_eaf(path: str, tiers: list[str] | None = None)

Parse a conversation file in ELAN (.eaf) format.

Parameters:

path (str) – Path to the ELAN file.
tiers (Optional[list[str]], optional) – List of tiers to parse. Defaults to None, in which case all tiers are parsed. If an empty list is passed, all tiers are parsed, but a warning is issued.

Raises:

KeyError – If any of the named tiers are not found in the file.

Returns:

A Conversation object representing the conversation in the file.

Return type:

Conversation
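The documented behaviour of tiers (None, an empty list, or named tiers) can be summarised as a small decision helper. This is a hypothetical sketch, not sktalk's implementation: select_tiers is an illustrative name, and it assumes the tier names present in the file are already known.

```python
import warnings
from typing import Optional


def select_tiers(available: list[str], tiers: Optional[list[str]] = None) -> list[str]:
    """Hypothetical helper: resolve which tiers to parse from an ELAN file."""
    if tiers is None:
        # None: parse every tier found in the file
        return list(available)
    if len(tiers) == 0:
        # Empty list: still parse every tier, but warn the caller
        warnings.warn("Empty tier list given; parsing all tiers")
        return list(available)
    missing = [t for t in tiers if t not in available]
    if missing:
        # Requested tiers that are not in the file raise a KeyError
        raise KeyError(f"Tiers not found in file: {missing}")
    return list(tiers)


print(select_tiers(["A", "B"], ["B"]))  # ['B']
```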

classmethod from_json(path)

Parse a conversation file in JSON format.

Parameters: path (str) – Path to the JSON file.
Returns: A Conversation object representing the conversation in the file.
Return type: Conversation

classmethod _fromdict(fields)

abstract get_utterance(index) → sktalk.corpus.utterance.Utterance

summary(n=10, **fields)

Print the first n lines of a conversation.

Parameters:

n (int, optional) – Number of lines to print. Defaults to 10.
fields (dict) – key-value pairs with which specific utterances can be selected

select(**fields)

Select utterances based on content in specific fields.

Parameters: fields (dict) – key-value pairs with which specific utterances can be selected
Returns: Conversation object without metadata, containing a reduced set of utterances
Return type: Conversation

remove(**fields)

Remove utterances based on content in specific fields.

Parameters: fields (dict) – key-value pairs with which specific utterances can be selected
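The relationship between select and remove can be sketched as a filter and its complement. This is a hypothetical illustration, not sktalk's code: Utt is a stand-in for Utterance with only two fields, and it assumes a field matches when its value equals the requested value exactly (the library may use a different matching rule).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utt:
    # Hypothetical stand-in for sktalk's Utterance, with only the fields we need
    participant: Optional[str]
    text: str


def matches(utt: Utt, **fields) -> bool:
    # Assumption: an utterance matches when every given field equals the value
    return all(getattr(utt, key) == value for key, value in fields.items())


def select(utts: list[Utt], **fields) -> list[Utt]:
    # Keep only utterances that match all key-value pairs
    return [u for u in utts if matches(u, **fields)]


def remove(utts: list[Utt], **fields) -> list[Utt]:
    # Complement of select: drop utterances that match all key-value pairs
    return [u for u in utts if not matches(u, **fields)]


utts = [Utt("A", "hi"), Utt("B", "hello"), Utt("A", "how are you?")]
print([u.text for u in select(utts, participant="A")])  # ['hi', 'how are you?']
print([u.text for u in remove(utts, participant="A")])  # ['hello']
```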

asdict()

Return the Conversation as a dictionary.

Returns: dictionary containing Conversation metadata and Utterances
Return type: dict

_subconversation_by_index(index: int, before: int = 0, after: int | None = None) → Conversation

Select utterances to provide context as a sub-conversation.

Parameters:

index (int) – The index of the utterance for which to provide context
before (int, optional) – The number of utterances before the indicated utterance. Defaults to 0.
after (int, optional) – The number of utterances after the indicated utterance. Defaults to None, in which case it takes the same value as before.

Raises:

IndexError – Index provided must be within range of utterances

Returns:

Conversation object without metadata, containing a reduced set of utterances

Return type:

Conversation
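The index window described above can be sketched as a range computation. This is a hypothetical helper (subconversation_indices is not part of sktalk); it assumes the window is clipped at the start and end of the conversation rather than raising, and only reproduces the documented IndexError for an out-of-range index.

```python
from typing import Optional


def subconversation_indices(n_utterances: int, index: int, before: int = 0,
                            after: Optional[int] = None) -> range:
    """Hypothetical helper: indices of the context window around `index`."""
    if not 0 <= index < n_utterances:
        raise IndexError("Index provided must be within range of utterances")
    if after is None:
        # Documented default: `after` takes the same value as `before`
        after = before
    # Assumption: clip the window to the bounds of the conversation
    start = max(index - before, 0)
    stop = min(index + after + 1, n_utterances)
    return range(start, stop)


print(list(subconversation_indices(10, 5, before=2)))  # [3, 4, 5, 6, 7]
```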

_subconversation_by_time(index: int, before: int = 0, after: int = 0, exclude_utterance_overlap: bool = False) → Conversation

Select utterances to provide context as a sub-conversation.

Parameters:

index (int) – The index of the utterance for which to provide context
before (int, optional) – The time in ms preceding the utterance’s begin. Defaults to 0.
after (int, optional) – The time in ms following the utterance’s end. Defaults to 0.
exclude_utterance_overlap (bool, optional) – If True, the duration of the utterance itself is not used to identify overlapping utterances; only the window before or after the utterance is used. In that case, only one of before or after can be more than 0, as the overlap window is limited to the window either preceding or following the utterance. Defaults to False.

Returns:

Conversation object without metadata, containing a reduced set of utterances

Return type:

Conversation
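The time-based window can be sketched as an interval-overlap filter. This is a hypothetical sketch, not sktalk's implementation: Utt is a stand-in with begin and end times in ms, the target utterance is assumed to have timing, untimed utterances are skipped, overlap is tested strictly (touching boundaries do not count), and raising ValueError when both margins are set with exclude_utterance_overlap is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utt:
    # Hypothetical stand-in for sktalk's Utterance: begin/end times in ms
    text: str
    begin: Optional[int]
    end: Optional[int]


def subconversation_by_time(utts: list[Utt], index: int, before: int = 0,
                            after: int = 0,
                            exclude_utterance_overlap: bool = False) -> list[Utt]:
    target = utts[index]  # assumed to have begin and end times
    if exclude_utterance_overlap:
        if before > 0 and after > 0:
            raise ValueError("Only one of before or after can be more than 0")
        # Use only the window preceding or following the utterance
        if before > 0:
            win_start, win_end = target.begin - before, target.begin
        else:
            win_start, win_end = target.end, target.end + after
    else:
        # Window spans the utterance itself plus the margins around it
        win_start, win_end = target.begin - before, target.end + after
    # Keep timed utterances that overlap the window
    return [u for u in utts
            if u.begin is not None and u.end is not None
            and u.begin < win_end and u.end > win_start]


utts = [Utt("a", 0, 1000), Utt("b", 1200, 2000),
        Utt("c", 2500, 3000), Utt("d", 5000, 6000)]
print([u.text for u in subconversation_by_time(utts, 2, before=1000)])  # ['b', 'c']
```

With exclude_utterance_overlap=True and before=1000, the window shrinks to 1500 to 2500 ms, so only "b" remains in this sketch.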

count_participants(except_none: bool = False) → int

Count the number of participants in a conversation.

Note: if any utterance has no participant, None is counted as a separate participant. To exclude it, set except_none to True.

Parameters: except_none (bool, optional) – If True, utterances without a participant are not counted. Defaults to False.
Returns: number of participants
Return type: int
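The effect of except_none can be shown with a minimal sketch that counts unique participant labels, where None stands for utterances without a participant. count_participants here is a hypothetical stand-alone function operating on a plain list of labels, not the method itself.

```python
from typing import Optional


def count_participants(participants: list[Optional[str]], except_none: bool = False) -> int:
    # Unique participant labels; None marks utterances without a participant
    unique = set(participants)
    if except_none:
        unique.discard(None)
    return len(unique)


speakers = ["A", "B", None, "A"]
print(count_participants(speakers))                    # 3 (A, B and None)
print(count_participants(speakers, except_none=True))  # 2
```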

_update(field: str, values: list, **kwargs)

Update all utterances in the conversation with calculated values.

This method also stores the relevant arguments in the Conversation metadata.

Parameters:

field (str) – field of the Utterance to update
values (list) – list of values to update each utterance with
kwargs (dict) – information about the calculation to store in the Conversation metadata
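What _update describes can be sketched as writing one value per utterance into the named field and recording the calculation arguments. This is a hypothetical illustration: Utt and Conv are stand-ins, and both the length check and storing the arguments under metadata[field] are assumptions, not sktalk's actual metadata layout.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Utt:
    # Hypothetical stand-in for sktalk's Utterance
    text: str
    FTO: Optional[int] = None


@dataclass
class Conv:
    # Hypothetical stand-in for Conversation: utterances plus a metadata dict
    utterances: list
    metadata: dict = field(default_factory=dict)

    def update(self, field_name: str, values: list, **kwargs: Any) -> None:
        if len(values) != len(self.utterances):
            raise ValueError("Need exactly one value per utterance")
        # Write each calculated value onto its utterance, in order
        for utt, value in zip(self.utterances, values):
            setattr(utt, field_name, value)
        # Assumption: record the calculation arguments under the field name
        self.metadata[field_name] = kwargs


conv = Conv([Utt("hi"), Utt("hello")])
conv.update("FTO", [None, 150], window=10000, planning_buffer=200)
print([u.FTO for u in conv.utterances])  # [None, 150]
print(conv.metadata)  # {'FTO': {'window': 10000, 'planning_buffer': 200}}
```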

calculate_FTO(window: int = 10000, planning_buffer: int = 200, n_participants: int = 2)

Calculate the Floor Transfer Offset (FTO) per utterance.

FTO is defined as the difference between the time that a turn starts and the end of the most relevant prior turn by another participant, which is not necessarily the immediately preceding utterance.

An utterance does not receive an FTO if it lacks timing information itself, or if any preceding utterance within the window lacks timing information.

Parameters:

window (int, optional) – the time in ms prior to the utterance in which a relevant preceding utterance can be found. Defaults to 10000.
planning_buffer (int, optional) – minimum time in ms between the start of the prior utterance and the start of the utterance, to allow for planning a response. Defaults to 200.
n_participants (int, optional) – maximum number of participants overlapping with the utterance and preceding window. Defaults to 2.
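The sign convention that follows from this definition can be illustrated with a tiny helper: a gap between turns gives a positive FTO and overlap gives a negative one. fto is a hypothetical helper that assumes the relevant prior turn has already been identified, and treats missing timing as "no FTO".

```python
from typing import Optional


def fto(begin: Optional[int], prior_end: Optional[int]) -> Optional[int]:
    """Hypothetical helper: start of the turn minus end of the relevant prior turn, in ms."""
    if begin is None or prior_end is None:
        # Missing timing information: no FTO
        return None
    return begin - prior_end


print(fto(begin=1250, prior_end=1000))  # 250 (gap)
print(fto(begin=900, prior_end=1000))   # -100 (overlap)
```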

relevant_prior_utterance(index, window=10000, planning_buffer=200, n_participants=2)

Determine the most relevant prior utterance for a given utterance.

To be a relevant prior turn, the following conditions must be met with respect to utterance U:

- the utterance must be by a different speaker than U
- it must be the most recent utterance by that speaker
- it must have started before U, more than planning_buffer ms before the start of U
- it must lie partly or entirely within the context window (the window ms prior to the start of U)
- within the context window, there must be at most n_participants speakers
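These conditions can be sketched as a filter over timed utterances. This is a hypothetical sketch, not sktalk's implementation: Utt and relevant_prior are illustrative names; speakers are assumed to be counted over utterances overlapping the span from window ms before U up to the end of U; and when several candidates remain, the one that started last is chosen (with two participants, this is the other speaker's most recent qualifying utterance).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utt:
    # Hypothetical stand-in for sktalk's Utterance; times in ms
    participant: str
    begin: Optional[int]
    end: Optional[int]


def relevant_prior(utts: list[Utt], index: int, window: int = 10000,
                   planning_buffer: int = 200, n_participants: int = 2) -> Optional[int]:
    u = utts[index]
    if u.begin is None or u.end is None:
        return None
    window_start = u.begin - window
    timed = [(i, p) for i, p in enumerate(utts)
             if p.begin is not None and p.end is not None]
    # Assumption: count speakers active in the window or during U itself
    speakers = {p.participant for _, p in timed
                if p.begin < u.end and p.end > window_start}
    if len(speakers) > n_participants:
        return None
    candidates = [(i, p) for i, p in timed
                  if p.participant != u.participant         # different speaker
                  and p.begin < u.begin - planning_buffer   # started more than planning_buffer ms before U
                  and p.end > window_start]                 # at least partly inside the window
    if not candidates:
        return None
    # Most recent candidate: the one that started last
    return max(candidates, key=lambda c: c[1].begin)[0]


utts = [Utt("A", 0, 1000), Utt("B", 1200, 2500),
        Utt("A", 2400, 3000), Utt("B", 3100, 4000)]
print(relevant_prior(utts, 2))  # 1
print(relevant_prior(utts, 3))  # 2
```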

Parameters:

index (int) – index of the utterance to assess
window (int, optional) – the time in ms prior to utterance in which a relevant preceding utterance can be found. Defaults to 10000.
planning_buffer (int, optional) – minimum time in ms between the start of the prior utterance and the start of the utterance, to allow for planning a response. Defaults to 200.
n_participants (int, optional) – maximum number of participants overlapping with the utterance and preceding window. Defaults to 2.

Returns: