Glossary of Terms

A

Analogue dictation: Dictation using traditional tape recorders and players, which use analogue tapes.

Acoustic Adaptation: Process which continuously improves an author’s user profile. This is done by analysing dictations and automatically updating the user’s profile files to understand the author’s voice better.

Acoustic references: statistical data which describes the voice characteristics of an individual user. This can include the user’s accent, pronunciation, input device, etc. Speech Recognition uses this data to interpret an author's speech input.

Audio Wizard: tool for adjusting the settings of your audio system to ensure the best possible sound quality for speech recognition. These include setting recording volume, playback volume and determining the STNR (speech to noise ratio).

Author: User who records the sound file for a dictation. Also known as user, dictator or profile.

Auto-text: Pre-defined text which can be inserted into a document via command. Also known as a Macro.

B

Binaural: A binaural headset is one that has headphones on both ears, i.e. stereo headphones.

C

Command grammar: Collection of words or phrases which are not recognized as text but upon recognition execute a specific action. For example, text formatting commands and document navigation commands will be grouped into their own respective command grammars.

Correction: Editing a document created by speech recognition in order to replace incorrectly recognised words or phrases. Correction may be performed by the author or a designated correctionist, and can be aided by functions such as synchronous playback and correction alternatives.

Correction alternative: Alternative recognition results which are offered by the Lexicon and can be used to replace incorrectly recognized words or phrases.

Compression: How a particular file format is reduced in size (for faster transfer between recorder and computer, and from computer to computer). In dictation, for instance, only voice frequencies are useful, so low and high frequencies are truncated. The most common compression formats for digital dictation are DSS, DSS Pro and LPEC.

Compression Ratio: The ratio of the size of a compressed digital file to the original uncompressed digital file.
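
The definition above can be illustrated with a short sketch. The function names and file sizes are hypothetical and chosen only for illustration. The second function shows the inverse view (how many times smaller the compressed file is), which is the sense of "compression factor" used in the MP3 entry.

```python
def compression_ratio(compressed_bytes: int, original_bytes: int) -> float:
    """Ratio of compressed size to original size, as defined in this glossary."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return compressed_bytes / original_bytes


def compression_factor(compressed_bytes: int, original_bytes: int) -> float:
    """Inverse view: how many times smaller the compressed file is."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return original_bytes / compressed_bytes


# Hypothetical sizes: a 12,000,000-byte recording compressed to 1,000,000 bytes.
ratio = compression_ratio(1_000_000, 12_000_000)
factor = compression_factor(1_000_000, 12_000_000)
print(f"ratio = {ratio:.3f}, factor = {factor:.0f}")  # ratio = 0.083, factor = 12
```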

Conf(erence) Recording: Mode for the microphone sensitivity. “Conf” is the high-sensitivity mode that records sounds from all directions and is suited, for example, to conferences.

D

Dictaphone: Original brand, company name and now mistakenly the generic term used for devices that record audio dictations.

Dict(ate) Setting: Mode for the microphone sensitivity. “Dict” is the low-sensitivity mode suited for dictation.

Dictation: The act of speaking into a recording device, or the sound file resulting from this process.

Dictation software: A type of computer application that allows users to speak freely as the application records what they say and then manages the dictation workflow and transcription process.

Dictionary: A large set of data used by a speech engine while doing speech recognition that defines the phonemes in a language or dialect.

Digital Dictation: The process of recording dictations on digital media.

Digital input device: Input device which converts speech into digital audio data signals, e.g. microphone, desktop mic, Digital Voice Recorder.

DSS: Stands for Digital Speech Standard and produces very small files that are easily transported.
The leaders in digital dictation (Olympus and Philips) established a sound compression scheme that preserved the vocal range, but virtually ignored other frequencies.

DSS “Pro”: An extension of DSS that increases the fidelity of the recording and allows for “in-recorder” encryption and additional demographic information.

DVR (Digital Voice Recorders): Generic term for dictaphones that now record in a digital format. These dictations are stored digitally and can be downloaded to a PC for further processing.

E

Editing - Append Dictation: Provides the ability to add dictation to the end of a file.

Editing - Insertion: Creates a space within a voice file to add / record additional dictation.

Editing - Overwrite: Provides the ability to record over a previously recorded file from any point within the file.

Editing - Partial Erase: Provides the ability to remove any part of a recording.

Encryption: Any procedure used to convert plaintext into ciphertext using a key, in order to prevent anyone but the intended recipient from reading that data.

Erase Protection: Protect dictation by locking the voice file to prevent accidental erasure.

F

File Format: Describes the contents of files. Common audio file formats for speech recordings include .dss, .wav and .wma.

Flash Memory: This indicates the machine has a built-in memory which cannot be expanded. Check the machine’s specifications for maximum record time.

Foot Control: Foot pedal device which provides navigation and playback functions within a sound file during the correction or transcription process.

Frequency: The number of (sound) waves per second, measured in Hertz (Hz). Low sounds have a low frequency and high sounds have a high frequency. The hearing ability of a young person with normal hearing ranges from about 20 Hz to about 20 kHz.

FTP: File Transfer Protocol. Protocol for transferring files between computers over a network such as the internet.

G

Grammar: A file that contains a list of words and phrases to be recognized by a speech application. Grammars may also contain bits of programming logic to aid the application. All of the active grammar words make up the vocabulary.

H

Hertz: A standard measure of frequency named after the German physicist Heinrich Hertz.
Frequency has the unit "per second". The symbol for Hertz is Hz.
1 kilohertz = 1 kHz = 1,000 Hz
1 megahertz = 1 MHz = 1,000,000 Hz
1 gigahertz = 1 GHz = 1,000,000,000 Hz
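
The conversions above can be sketched in a few lines. The helper names are hypothetical. The second function shows the relationship implied by the unit "per second": the duration of one cycle is the reciprocal of the frequency.

```python
# Multipliers for the prefixes listed above.
HZ_PER_UNIT = {"Hz": 1, "kHz": 1_000, "MHz": 1_000_000, "GHz": 1_000_000_000}


def to_hertz(value: float, unit: str) -> float:
    """Convert a value given in Hz, kHz, MHz or GHz to Hz."""
    return value * HZ_PER_UNIT[unit]


def period_seconds(frequency_hz: float) -> float:
    """Duration of one cycle: the reciprocal of the frequency."""
    return 1.0 / frequency_hz


print(to_hertz(20, "kHz"))                 # 20000
print(period_seconds(to_hertz(1, "kHz")))  # 0.001
```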

Hold Switch: Locks the machine into the currently selected function. In record mode, the hold switch protects the machine from being accidentally turned off. In stop mode it acts much like a transit lock, i.e. the machine won't accidentally turn on.

I

Initial Training: Process during which an author-specific profile is created from training texts read aloud. This can take from as little as 10 seconds to over an hour, depending on the user’s standard and pronunciation of English, and is important to establish a high initial recognition rate.

input device: Peripheral device (e.g. a keyboard) which enables users to give input to a computer. In a speech recognition environment, 'input device' usually refers specifically to a device for recording speech, e.g. a microphone.

Insert mode: Recording mode which inserts new speech at the current position in the sound file but does not overwrite existing parts of the sound file.

L

language model: Statistical model which represents word usage and sequences of words. The language model is specific to an author.

LC-Display: Liquid Crystal Display (LCD). LCDs are commonly used in calculators, watches, digital cameras, laptops, and Digital Voice Recorders.

LP: Long Playback recording mode.

M

Memory Card: A removable memory medium (e.g. SmartMedia card, xD-Picture Card).

Monaural: A monaural headset is one that has the headphone on one ear only.

MP3: A digital audio compression algorithm that achieves a compression factor of about twelve while largely preserving perceived sound quality. It does this by optimising the compression according to the range of sound that people can actually hear. MP3 is one of a series of audio encoding standards developed under the sponsorship of the Moving Picture Experts Group (MPEG) and formalised by the International Organization for Standardization (ISO).

N

natural language: An approach to speech application design that encourages users to speak naturally to the system.

New Button: Button which starts a new file; a fundamental difference between tape and digital. Each new file (letter, document, fax, etc.) will display the name of the author, recording date and time, duration of the recording and allocated file number. This is useful for secretaries, who can instantly identify each author and/or work type.

Ni-Cd battery: Nickel-Cadmium battery. The archetypal power tool battery.
Ni-Cd batteries are prone to the memory effect, which can reduce their voltage with age.

Ni-MH battery: Nickel-Metal Hydride battery. Rechargeable batteries that have an energy density 100% higher than Ni-Cd batteries and can supply high energy levels when required. They are environmentally friendly (free of cadmium and mercury). Among other devices, Ni-MH batteries are used to power Digital Voice Recorders.

Noise Cancel Function: Recorded audio may be difficult to understand because of noise. This function will reduce the noise in the file for better sound quality.

O

Overwrite mode: Recording mode which inserts new speech at the current position in the sound file and overwrites existing speech for the duration of the recording.
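
The difference between Insert mode and Overwrite mode can be sketched by treating a sound file as a simple list of samples. This is only an illustration with hypothetical function names, not how any particular recorder stores audio.

```python
def record_insert(samples: list, position: int, new: list) -> list:
    """Insert mode: new audio is spliced in; no existing audio is lost."""
    return samples[:position] + new + samples[position:]


def record_overwrite(samples: list, position: int, new: list) -> list:
    """Overwrite mode: new audio replaces existing audio for its duration."""
    return samples[:position] + new + samples[position + len(new):]


original = ["a", "b", "c", "d", "e"]
print(record_insert(original, 2, ["X", "Y"]))     # ['a', 'b', 'X', 'Y', 'c', 'd', 'e']
print(record_overwrite(original, 2, ["X", "Y"]))  # ['a', 'b', 'X', 'Y', 'e']
```

In this sketch, an overwrite that runs past the end of the existing file simply extends it.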

P

PCM - Pulse Code Modulation: A method by which an audio signal is represented as digital data.
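
As a minimal sketch of the idea, the following quantizes analogue values to 16-bit signed integers, one common PCM sample size. The function name and the eight-sample sine wave input are assumptions made for illustration.

```python
import math
import struct


def pcm16_encode(analogue_values):
    """Quantize values in [-1.0, 1.0] to 16-bit signed integers, packed little-endian."""
    ints = [max(-32768, min(32767, round(v * 32767))) for v in analogue_values]
    return struct.pack(f"<{len(ints)}h", *ints)


# Hypothetical input: 8 measurements taken over one cycle of a sine wave.
samples = [math.sin(2 * math.pi * n / 8) for n in range(8)]
data = pcm16_encode(samples)
print(len(data))  # 16 (8 samples x 2 bytes each)
```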

Phoneme: Smallest unit of speech used to differentiate word meanings from one another, yet without having its own meaning. For instance, the English word “food” has three phonemes (the "f" sound, the "oo" sound, and the "d" sound) but four letters. A speech engine uses its dictionary to break up vocabulary words and utterances into phonemes, and compares them to one another to perform speech recognition.

Plug and play: Developed by Intel, this standard allows the installation of hardware into a computer without the subsequent need to alter the configuration.

playback volume: Volume at which dictated text is played back through the speakers or headphones.

R

recognition rate: Percentage of words correctly recognized when dictating. Sometimes referred to as recognition accuracy.
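
A rough way to estimate this percentage is to align the dictated words with the recognized words and count the matches. The sketch below uses Python's standard difflib for the alignment; this is a simplification chosen for illustration, and the function name and sample sentences are hypothetical. Speech recognition products may measure accuracy differently.

```python
from difflib import SequenceMatcher


def recognition_rate(dictated: str, recognized: str) -> float:
    """Percentage of dictated words that appear, in order, in the recognized text."""
    ref = dictated.lower().split()
    hyp = recognized.lower().split()
    if not ref:
        raise ValueError("dictated text is empty")
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(ref)


print(recognition_rate("please send the report today",
                       "please send a report today"))  # 80.0
```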

recognition result: Text generated by the recognition of dictated speech, including ancillary data such as sound file position and correction alternatives.

recording volume: Loudness of the speech input when recording a dictation; also referred to as microphone input level.

Recording Modes: Digital Voice Recorders support different recording modes:
HQ (High Quality) mode,
SP (Standard Playback) mode,
LP (Long Playback) mode,
SHQ (Stereo High Quality) mode.

The available recording time depends on the chosen recording mode and available memory size.
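
The relationship between recording mode, memory size and recording time can be sketched as follows. The bit rates are invented for illustration and are not the specifications of any recorder; check the manufacturer's documentation for real values.

```python
# Hypothetical bit rates in kilobits per second; real values depend on the recorder.
ASSUMED_KBPS = {"SHQ": 128, "HQ": 64, "SP": 16, "LP": 8}


def recording_minutes(memory_megabytes: float, mode: str) -> float:
    """Approximate recording time for a given memory size and recording mode."""
    bits_available = memory_megabytes * 1_000_000 * 8
    bits_per_second = ASSUMED_KBPS[mode] * 1_000
    return bits_available / bits_per_second / 60


for mode in ASSUMED_KBPS:
    print(mode, round(recording_minutes(64, mode)), "minutes")
```

Under these assumptions, halving the bit rate doubles the available recording time for the same memory size.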

Rechargeable battery: Type of battery that once empty can be recharged using a charger.

S

sound card: Adapter card in the computer hardware which enables the input and output of audio signals, for example via a microphone and speakers.

sound file: File which contains audio data in a special format, e.g. .wav or .dss.

synchronous playback: Function which highlights every word while playing back sound files. This makes it easier to find errors and to correct them. See also: asynchronous playback.

Slide Control: Designed with the busy author in mind, slide control provides instant access to all the main functions: Record, Stop, Rewind. Much easier to use and mostly preferred by professionals.

SD Card: Secure Digital Card.

SmartMedia: SmartMedia storage cards are small (45 mm x 37 mm x 0.76 mm) and light (approximately 2 g) storage media. The controller is located in the drive instead of being incorporated in the card to allow simple construction. SmartMedia cards are very affordable and ideal for the storage of digital data like photos, digital speech files and music.

Sensitivity: A microphone's output voltage at any given sound pressure level. A more sensitive microphone will sound louder at the same gain setting.

Sound format: Describes the format of the file holding a dictation. Examples include MP3, WAV, WMA…

Sampling rate: The number of times each second that a sound is measured and converted into the binary code (ones and zeros) that represents the recorded audio.
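
A short sketch of what the sampling rate implies. The function names are hypothetical, and the 8,000 Hz rate and 16-bit sample size are example values only.

```python
def sample_times(sampling_rate_hz: int, duration_seconds: float) -> list:
    """Instants (in seconds) at which the sound is measured."""
    count = int(sampling_rate_hz * duration_seconds)
    return [n / sampling_rate_hz for n in range(count)]


def uncompressed_bytes(sampling_rate_hz: int, bits_per_sample: int,
                       channels: int, seconds: int) -> int:
    """Size of the raw audio data before any compression is applied."""
    return sampling_rate_hz * (bits_per_sample // 8) * channels * seconds


print(len(sample_times(8_000, 1)))           # 8000
print(uncompressed_bytes(8_000, 16, 1, 60))  # 960000
```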

Speech recognition: The process of converting a digital sound file to text.

T

Telephony dictation: The process of recording a dictation over a phone line.

Typist: The person transcribing a dictation (digital dictation) or proofreading and finalising a document (speech recognition); also referred to as support staff.

transcription: Act of manually typing out a dictation recorded by an author.

U

utterance: Spoken input from the user of a speech application. An utterance may be a single word, an entire phrase, a sentence, or even several sentences.

USB: Universal Serial Bus. An interface standard for communication between a computer and external peripherals. It is standard on current operating systems and supports plug and play. USB offers faster data transfers than serial or parallel ports and offers low-speed (1.5 Mbps), full-speed (12 Mbps, or USB 1.1), and high-speed (480 Mbps, or USB 2.0) modes.

USB Audio Class: USB Audio Class compatibility means that the digital voice recorder also functions via the computer as a USB microphone or loudspeaker, so that direct dictation to the PC is also possible. (USB Audio Class is not supported by operating systems under Windows NT 4.)

V

Vocabulary: The total list of words the speech engine will be comparing an utterance against. The vocabulary is made up of all the words in all active grammars.
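
The relationship between grammars and the vocabulary can be sketched as a set union over the active grammars. The grammar names, words and data layout below are invented for illustration and do not reflect any real speech engine's format.

```python
# Hypothetical grammars, each with an "active" flag and a set of words.
grammars = {
    "formatting": {"active": True, "words": {"bold", "italic", "underline"}},
    "navigation": {"active": True, "words": {"next", "previous", "bold"}},
    "dates": {"active": False, "words": {"monday", "tuesday"}},
}


def active_vocabulary(grammars: dict) -> set:
    """Union of the words in every active grammar."""
    vocab = set()
    for grammar in grammars.values():
        if grammar["active"]:
            vocab |= grammar["words"]
    return vocab


print(sorted(active_vocabulary(grammars)))
# ['bold', 'italic', 'next', 'previous', 'underline']
```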

Voice recognition: A form of biometrics that identifies users by recognizing their unique voices. Though it is often used interchangeably with speech recognition, the two are different. Voice recognition is concerned with recognizing voices, while speech recognition is concerned with recognizing the content of speech.

VOR level or VA level: Threshold level of the recording volume which enables the recording program to distinguish between speech (which should be recorded), and silence or background noise (which should not be recorded). Recording stops automatically when the speech input volume is below the VA level, and restarts automatically when the input volume is above this level. It is calculated by the Audio Wizard. Also referred to as silence detection level, voice sensitivity threshold or VAR level.
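
The stop/restart behaviour described above can be sketched as a simple threshold gate over a sequence of input volume readings. The function name, the volume values and the threshold are hypothetical. Keeping the previous state when the volume exactly equals the threshold is an assumption of this sketch, and real recorders may also add a short delay before stopping.

```python
def voice_activated_frames(levels, va_level):
    """Return the indices of the frames that would be recorded."""
    recording = False
    kept = []
    for index, level in enumerate(levels):
        if level > va_level:
            recording = True   # input above the threshold: (re)start recording
        elif level < va_level:
            recording = False  # input below the threshold: stop recording
        if recording:
            kept.append(index)
    return kept


levels = [2, 3, 40, 55, 48, 4, 3, 60, 52, 1]
print(voice_activated_frames(levels, va_level=10))  # [2, 3, 4, 7, 8]
```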

Variable Control Voice Actuator (VCVA): When the microphone senses that sounds have reached a preset volume, the built-in VCVA starts recording automatically and stops when the volume drops. Particularly helpful for extended recording, the VCVA not only conserves memory by turning off recording during silent periods, but also makes playback more efficient and convenient.

vocabulary: Words contained in the ConText with information on how they are pronounced. New words can be added to the vocabulary via ConText Adaptation or by using the ConText Tuner.

W

WAV file: Abbreviation for ‘wave’. Suffix used for audio files saved in Microsoft’s Wave file format, an audio file format developed by Microsoft and used extensively in Microsoft Windows. Conversion tools are available to allow most other operating systems to play .wav files.

workflow: Automation of a business process during which documents, information or tasks are passed from one workflow participant to another for action, according to a set of procedural rules. For example, it may be defined how and by whom a document is corrected, reviewed and approved.

Work Types: Set up the work types in the DSS Player software, then download them to the digital voice recorder. A work type can then be selected by the author to confirm the type of document dictated, e.g. reports, letters, projects.

WMA: Windows Media Audio (WMA) is a compression standard that achieves CD quality at only half the size of comparable MP3 files.