sktalk
Documentation about scikit-talk
Subpackages
Package Contents
Classes
- Conversation: Helper class that provides a standard way to create an ABC using inheritance.
- Corpus: Helper class that provides a standard way to create an ABC using inheritance.
Attributes
- class sktalk.Conversation(utterances: list[sktalk.corpus.utterance.Utterance], metadata: dict | None = None, suppress_warnings: bool = False)[source]
Bases:
sktalk.corpus.write.writer.Writer
Helper class that provides a standard way to create an ABC using inheritance.
- property utterances
Get the list of utterances in the conversation.
- property metadata
Get the metadata associated with the conversation.
- Returns:
Additional metadata associated with the conversation.
- Return type:
dict
- property participants
Get the participants in the conversation.
- property metadata_df
Return the conversation metadata as a pandas dataframe.
- property utterance_df
Return the conversation utterances as a pandas dataframe.
- __len__()[source]
Get the number of utterances in the conversation.
- Returns:
The number of utterances in the conversation.
- Return type:
int
- classmethod from_cha(path)[source]
Parse conversation file in Cha format
- Parameters:
path (str) – Path to the Cha file
- Returns:
A Conversation object representing the conversation in the file.
- Return type:
Conversation
- classmethod from_eaf(path: str, tiers: list[str] | None = None)[source]
Parse conversation file in ELAN format
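- Parameters:
path (str) – Path to the ELAN file
tiers (list[str], optional) – Names of the tiers in the file to parse. Defaults to None.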
- Raises:
KeyError – If tiers are named that are not found in the file.
- Returns:
A Conversation object representing the conversation in the file.
- Return type:
Conversation
- classmethod from_json(path)[source]
Parse conversation file in JSON format
- Returns:
A Conversation object representing the conversation in the file.
- Return type:
Conversation
- abstract get_utterance(index) → sktalk.corpus.utterance.Utterance [source]
- select(**fields)[source]
Select utterances based on content in specific fields
- Parameters:
fields (dict) – key-value pairs with which specific utterances can be selected
- Returns:
Conversation object without metadata, containing a reduced set of utterances
- Return type:
Conversation
- remove(**fields)[source]
Remove utterances based on content in specific fields
- Parameters:
fields (dict) – key-value pairs that identify the utterances to remove
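The keyword-based filtering behind select and remove can be pictured with plain dictionaries. This is a minimal sketch, not scikit-talk's implementation. It assumes that an utterance matches only when every given field equals the given value. The helper names (`matches`, `select_utterances`, `remove_utterances`) and the sample fields are hypothetical.

```python
def matches(utt: dict, /, **fields) -> bool:
    """Return True if the utterance equals the given value for every field."""
    return all(utt.get(key) == value for key, value in fields.items())


def select_utterances(utterances: list, /, **fields) -> list:
    """Keep only the utterances that match all key-value pairs."""
    return [u for u in utterances if matches(u, **fields)]


def remove_utterances(utterances: list, /, **fields) -> list:
    """Drop the utterances that match all key-value pairs."""
    return [u for u in utterances if not matches(u, **fields)]


# Hypothetical sample data: each utterance is a plain dict.
utterances = [
    {"participant": "A", "utterance": "hello"},
    {"participant": "B", "utterance": "hi"},
    {"participant": "A", "utterance": "how are you"},
]
selected = select_utterances(utterances, participant="A")
remaining = remove_utterances(utterances, participant="A")
```

Under these assumptions, `selected` holds the two utterances by participant A and `remaining` holds the single utterance by participant B.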
- asdict()[source]
Return the Conversation as a dictionary
- Returns:
dictionary containing Conversation metadata and Utterances
- Return type:
dict
- _subconversation_by_index(index: int, before: int = 0, after: int | None = None) → Conversation [source]
Select utterances to provide context as a sub-conversation
- Parameters:
index (int) – The index of the utterance for which to provide context
before (int, optional) – The number of utterances prior to indicated utterance. Defaults to 0.
after (int, optional) – The number of utterances after the indicated utterance. Defaults to None, in which case the value of before is used.
- Raises:
IndexError – If the index provided is not within the range of utterances.
- Returns:
Conversation object without metadata, containing a reduced set of utterances
- Return type:
Conversation
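The index-based context window can be illustrated on an ordinary list. This is a minimal sketch, not the library's code. It assumes that the window is clipped at the edges of the list and that negative indices are rejected, neither of which the documentation above confirms. `window_by_index` is a hypothetical name.

```python
from typing import Optional


def window_by_index(items: list, index: int, before: int = 0,
                    after: Optional[int] = None) -> list:
    """Return the item at `index` plus `before` items ahead of it and `after` items behind it."""
    if not 0 <= index < len(items):
        raise IndexError("index must be within range of utterances")
    if after is None:
        after = before  # documented default: reuse the value of `before`
    start = max(0, index - before)             # assumption: clip at the start
    stop = min(len(items), index + after + 1)  # assumption: clip at the end
    return items[start:stop]


turns = ["u0", "u1", "u2", "u3", "u4"]
context = window_by_index(turns, index=2, before=1)  # ['u1', 'u2', 'u3']
```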
- _subconversation_by_time(index: int, before: int = 0, after: int = 0, exclude_utterance_overlap: bool = False) → Conversation [source]
Select utterances to provide context as a sub-conversation
- Parameters:
index (int) – The index of the utterance for which to provide context
before (int, optional) – The time in ms preceding the utterance’s begin. Defaults to 0.
after (int, optional) – The time in ms following the utterance’s end. Defaults to 0.
exclude_utterance_overlap (bool, optional) – If True, the duration of the utterance itself is not used to identify overlapping utterances; only the window preceding or following the utterance is used, so only one of before or after can be greater than 0. Defaults to False.
- Returns:
Conversation object without metadata, containing a reduced set of utterances
- Return type:
Conversation
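The time-based window can be sketched with utterances stored as dictionaries that carry `begin` and `end` times in ms. This is an illustration under stated assumptions, not the library's implementation. The dictionary keys, the strict overlap test, and the `ValueError` raised when both before and after are positive are all hypothetical choices.

```python
def window_by_time(utterances, index, before=0, after=0,
                   exclude_utterance_overlap=False):
    """Return the utterances that overlap a time window around utterances[index]."""
    target = utterances[index]
    if exclude_utterance_overlap:
        # Only the window before OR after the utterance is used.
        if before > 0 and after > 0:
            raise ValueError("only one of before or after can be more than 0")
        if after > 0:
            start, stop = target["end"], target["end"] + after
        else:
            start, stop = target["begin"] - before, target["begin"]
    else:
        # The utterance's own duration is part of the window.
        start, stop = target["begin"] - before, target["end"] + after
    # Assumption: an utterance overlaps if it shares more than a single instant.
    return [u for u in utterances if u["begin"] < stop and u["end"] > start]


utts = [
    {"begin": 0, "end": 1000},
    {"begin": 1200, "end": 2000},
    {"begin": 2500, "end": 3000},
    {"begin": 5000, "end": 6000},
]
with_self = window_by_time(utts, 1, before=500)
only_before = window_by_time(utts, 1, before=500, exclude_utterance_overlap=True)
```

In this sketch, `with_self` contains the first two utterances, while `only_before` contains only the first, because the window then ends where the target utterance begins.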
- count_participants(except_none: bool = False) → int [source]
Count the number of participants in a conversation
Note: an utterance without a participant is counted as a separate participant (None). To exclude these, set except_none to True.
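The effect of except_none can be shown with a set of participant labels. This is a minimal sketch of the counting rule described above. `count_unique_speakers` is a hypothetical stand-in that takes a plain list of labels rather than a Conversation.

```python
def count_unique_speakers(participants: list, except_none: bool = False) -> int:
    """Count distinct participant labels; None counts as its own participant by default."""
    unique = set(participants)
    if except_none:
        unique.discard(None)  # ignore utterances without a participant
    return len(unique)


speakers = ["A", "B", None, "A"]
with_none = count_unique_speakers(speakers)                       # 3
without_none = count_unique_speakers(speakers, except_none=True)  # 2
```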
- _update(field: str, values: list, **kwargs)[source]
Update all utterances in the conversation with calculated values
This function also stores relevant arguments in the Conversation metadata.
- calculate_FTO(window: int = 10000, planning_buffer: int = 200, n_participants: int = 2)[source]
Calculate Floor Transfer Offset (FTO) per utterance
FTO is defined as the difference between the time that a turn starts and the end of the most relevant prior turn by the other participant, which is not necessarily the prior utterance.
An utterance does not receive an FTO if there are preceding utterances within the window that do not have timing information, or if it lacks timing information itself.
- Parameters:
window (int, optional) – the time in ms prior to utterance in which a relevant preceding utterance can be found. Defaults to 10000.
planning_buffer (int, optional) – minimum speaking time in ms to allow for a response. Defaults to 200.
n_participants (int, optional) – maximum number of participants overlapping with the utterance and preceding window. Defaults to 2.
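The FTO definition above reduces to a subtraction once the relevant prior turn is known. The sketch below shows only that arithmetic. `floor_transfer_offset` is a hypothetical helper, and treating missing times as None is an assumption based on the note that utterances without timing information receive no FTO.

```python
from typing import Optional


def floor_transfer_offset(turn_begin: Optional[int],
                          prior_end: Optional[int]) -> Optional[int]:
    """Return turn_begin - prior_end in ms, or None if either time is missing."""
    if turn_begin is None or prior_end is None:
        return None
    return turn_begin - prior_end


gap = floor_transfer_offset(2300, 2000)      # 300: the turn starts 300 ms after the prior turn ends
overlap = floor_transfer_offset(1800, 2000)  # -200: the turn starts 200 ms before the prior turn ends
missing = floor_transfer_offset(None, 2000)  # None: no timing information
```

A positive value is a gap between turns and a negative value is an overlap.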
- relevant_prior_utterance(index, window=10000, planning_buffer=200, n_participants=2)[source]
Determine the most relevant prior utterance for a given utterance
To be a relevant prior turn, an utterance must meet the following conditions relative to utterance U:
- the utterance must be by a different speaker than U
- the utterance must be the most recent utterance by that speaker
- the utterance must have started more than planning_buffer ms before the start of utterance U
- the utterance must be partly or entirely within the context window (window ms prior to the start of utterance U)
- within the context window, there must be a maximum of n_participants speakers
- Parameters:
index (int) – index of the utterance to assess
window (int, optional) – the time in ms prior to utterance in which a relevant preceding utterance can be found. Defaults to 10000.
planning_buffer (int, optional) – minimum speaking time in ms to allow for a response. Defaults to 200.
n_participants (int, optional) – maximum number of participants overlapping with the utterance and preceding window. Defaults to 2.
- Returns:
the most relevant prior utterance, or None, if no relevant prior utterance can be identified
- Return type:
Utterance or None
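The conditions for a relevant prior utterance can be sketched over dictionaries with `participant`, `begin` and `end` keys. This is an illustration under stated assumptions, not scikit-talk's algorithm. The keys and the name `relevant_prior` are hypothetical, "most recent" is read as the latest-starting utterance among those that satisfy the timing conditions, and missing timing information is not handled.

```python
def relevant_prior(utterances, index, window=10000, planning_buffer=200,
                   n_participants=2):
    """Return the most relevant prior utterance for utterances[index], or None."""
    target = utterances[index]
    window_start = target["begin"] - window
    # Everything overlapping the target utterance or the window preceding it.
    in_context = [u for u in utterances
                  if u["end"] > window_start and u["begin"] < target["end"]]
    # Too many speakers in the context: no relevant prior utterance.
    if len({u["participant"] for u in in_context}) > n_participants:
        return None
    # Other-speaker utterances that started more than planning_buffer ms earlier.
    candidates = [u for u in in_context
                  if u["participant"] != target["participant"]
                  and u["begin"] < target["begin"] - planning_buffer]
    if not candidates:
        return None
    # Assumption: "most recent" means the candidate with the latest start.
    return max(candidates, key=lambda u: u["begin"])


talk = [
    {"participant": "A", "begin": 0, "end": 1000},
    {"participant": "B", "begin": 1100, "end": 2000},
    {"participant": "A", "begin": 2300, "end": 3000},
]
prior = relevant_prior(talk, 2)       # the utterance by B
fto = talk[2]["begin"] - prior["end"]  # 300
```

With the default planning_buffer of 200 ms, an utterance by B that started at 2200 ms would be rejected in this sketch, since it began only 100 ms before the target.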
- class sktalk.Corpus(conversations: list[sktalk.corpus.conversation.Conversation] = None, **metadata)[source]
Bases:
sktalk.corpus.write.writer.Writer
Helper class that provides a standard way to create an ABC using inheritance.
- property metadata
Get the metadata associated with the Corpus.
- Returns:
Additional metadata associated with the Corpus.
- Return type:
dict
- property conversations
Get the conversations contained in the Corpus
- Returns:
List of conversations contained in this Corpus
- Return type:
list[Conversation]
- property metadata_df
Return the corpus metadata as a pandas dataframe.
- property utterance_df
Return the corpus utterances as a pandas dataframe.
- append(conversation: sktalk.corpus.conversation.Conversation)[source]
Append a conversation to the Corpus
- Parameters:
conversation (Conversation) – Conversation object that should be added to the Corpus
- asdict()[source]
Return the Corpus as a dictionary
- Returns:
dictionary containing Corpus metadata and Conversations
- Return type:
dict