Phonetic transcription of Czech
The V2T systems work with a list of words (a lexicon) in which each word is assigned its pronunciation. For example, the word pět is pronounced [pjet]. For some words it is convenient to include several pronunciations, such as LCD [elcédé, elsídí] or DVD [dévédé, dívídí].
Words are written in the lexicon using letters, digits, or symbols. Their pronunciations are transcribed using phonemes, and each phoneme is always represented by a single character. For example, the word D1, composed of the letter D and the digit 1, is pronounced [déjedna], which is transcribed using the corresponding seven phonemes. Where possible, the phoneme symbols are chosen to match the corresponding letters of the language in question.
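To make the one-character-per-phoneme convention concrete, here is a minimal Python sketch of a lexicon held as a dictionary from words to lists of pronunciations, with a check that every pronunciation uses only symbols from the phoneme set listed later in this manual. The names `LEXICON`, `PHONEMES`, `invalid_phonemes` and `check_lexicon` are hypothetical and are not part of any V2T interface.

```python
# Phoneme inventory from the phoneme table in this manual, one character per phoneme.
PHONEMES = set("aábcCčČdďeéfghXiíjklmMnNňoóprřŘsštťuúvzžE")

# Hypothetical in-memory lexicon: each word maps to one or more pronunciations.
LEXICON: dict[str, list[str]] = {
    "pět": ["pjet"],
    "LCD": ["elcédé", "elsídí"],
    "DVD": ["dévédé", "dívídí"],
    "D1": ["déjedna"],
}


def invalid_phonemes(pronunciation: str) -> set[str]:
    """Return the characters of a pronunciation that are not phoneme symbols."""
    return {ch for ch in pronunciation if ch not in PHONEMES}


def check_lexicon(lexicon: dict[str, list[str]]) -> list[tuple[str, str]]:
    """List the (word, pronunciation) pairs that use unknown phoneme symbols."""
    return [
        (word, pron)
        for word, prons in lexicon.items()
        for pron in prons
        if invalid_phonemes(pron)
    ]
```

Because every phoneme is a single character, the length of a pronunciation string equals its phoneme count: `len("déjedna")` is 7, matching the D1 example.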
The purpose of phonetic transcription is to convert a word entry into its pronunciation expressed in phonemes. Because a lexicon entry may contain digits and other special symbols, phonetic transcription is defined only over a limited set of graphemes, namely the letters of the given alphabet.
Converting a text entry into graphemes is easy: write the word, using the characters of the given alphabet, the way it is pronounced. For common words, the graphemic transcription is usually the same as the normal spelling. Foreign words, abbreviations, and words containing digits or symbols, on the other hand, require attention.
Phonetic transcription of a word or phrase takes place in two steps:

1. The word is written with graphemes (alphabet characters) the way it is pronounced.
2. The graphemes are converted to phonemes, either manually using the phonetic transcription rules (described briefly at the end of this manual) or with an automatic tool (a G2P converter).
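The two steps can be sketched as two composed functions. This is a minimal illustration under stated assumptions: the override table `GRAPHEMIC_OVERRIDES` and all function names are hypothetical, the overrides reuse entries from the examples table below, and `digraph_only_g2p` is a deliberately incomplete placeholder for a real G2P converter (it handles only ch, dž and dz and applies no voicing assimilation).

```python
from typing import Callable

# Hypothetical table of manual graphemic transcriptions (step 1), taken from
# the examples table; a real lexicon would contain many more entries.
GRAPHEMIC_OVERRIDES = {
    "John": "džon",
    "Voight": "vojt",
    "UNESCO": "unesko",
    "ČSSD": "čé es es dé",
}


def graphemic_transcription(entry: str) -> str:
    """Step 1: spell the entry with alphabet letters the way it is pronounced."""
    # Common words are read as written, so lower-casing is the default.
    return GRAPHEMIC_OVERRIDES.get(entry, entry.lower())


def transcribe(entry: str, g2p: Callable[[str], str]) -> str:
    """Step 2: convert the graphemic form to phonemes with a G2P function.

    Spaces are removed first, because the phonetic transcriptions in the
    examples table are written as a single string.
    """
    return g2p(graphemic_transcription(entry).replace(" ", ""))


def digraph_only_g2p(graphemes: str) -> str:
    """Placeholder G2P that maps only the digraphs ch, dž and dz."""
    return graphemes.replace("ch", "X").replace("dž", "Č").replace("dz", "C")
```

With the placeholder, John becomes Čon as in the table, while ČSSD comes out as čéesesdé rather than čéesezdé, because the s -> z assimilation is the job of a full G2P converter.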
Examples of graphemic and phonetic transcription of native and foreign words, acronyms, abbreviations, numbers, etc.
| Word type | Example | Graphemic transcription | Phonetic transcription |
|---|---|---|---|
| Native word | Petr | petr | petr |
| | kolo | kolo | kolo |
| | chtít | chtít | Xťít |
| Foreign word | John | džon | Čon |
| | Voight | vojt | vojt |
| | Citroën | sitroen | sitroen |
| Acronym | UNESCO | unesko | unesko |
| | Click4Sky | klik for skáj | klikforskáj |
| Abbreviation | ČSSD | čé es es dé | čéesezdé |
| | ŘSD | ř s d | řEsEdE |
| | ÖMV | é em fau | éemfau |
| Multi-word expression | The Beatles | d bítls | dEbítls |
| | Aix en Provence | eks án prováns | eksánprováns |
| | chargé d'affaires | šaržé d afér | šaržédafér |
| Numeral expressions | 23. | dvacátý třetí | dvacátítŘeťí |
| | 7x | sedumkrát | sedumkrát |
| | B52 | bé padesát dva | bépadesáddva |
The list of phonemes and corresponding alphabet characters along with explanatory notes is given below, followed by rules of phonetic transcription from graphemes to phonemes (G2P).
Phoneme set (phonetic alphabet)
| Phoneme | Grapheme(s) | Example (text) | Example (phonetic transcription) | Note |
|---|---|---|---|---|
| a | a | akt | a k t | |
| á | á | pár | p á r | |
| b | b | bota | b o t a | |
| c | c | co | c o | |
| C | dz | Dzurinda | C u r i n d a | |
| | c | leckdo | l e C g d o | voiced variant of phoneme c |
| č | č | čas | č a s | |
| Č | dž | džungle | Č u N g l e | |
| | | George | Č o r č | |
| d | d | dům | d ú m | |
| ď | ď | dívka | ď í f k a | |
| e | e | erb | e r p | |
| é | é | éter | é t e r | |
| f | f | film | f i l m | |
| g | g | gól | g ó l | |
| h | h | hůl | h ú l | |
| X | ch | chalupa | X a l u p a | |
| i | i | bil | b i l | graphemes i and y are pronounced identically in Czech |
| | y | byl | b i l | |
| í | í | bít | b í t | graphemes í and ý are pronounced identically in Czech |
| | ý | být | b í t | |
| j | j | jen | j e n | |
| k | k | kos | k o s | |
| l | l | los | l o s | |
| m | m | muž | m u š | |
| M | m | tramvaj | t r a M v a j | labiodental m, a variant of phoneme m before f and v |
| n | n | nám | n á m | |
| N | n | banka | b a N k a | velar n, a variant of phoneme n before k and g |
| ň | ň | někdo | ň e g d o | |
| | | ňadro | ň a d r o | |
| o | o | olej | o l e j | |
| ó | ó | óda | ó d a | |
| p | p | pak | p a k | |
| r | r | rok | r o k | |
| ř | ř | řeka | ř e k a | voiced variant |
| Ř | ř | keř | k e Ř | voiceless variant, which occurs word-finally and next to voiceless paired consonants |
| | | příklad | p Ř í k l a t | |
| s | s | sýr | s í r | |
| š | š | šel | š e l | |
| t | t | tak | t a k | |
| ť | ť | dítě | ď í ť e | |
| u | u | ujel | u j e l | |
| ú | ú | ústí | ú s ť í | |
| | ů | kůl | k ú l | |
| v | v | vata | v a t a | |
| z | z | zrak | z r a k | |
| ž | ž | žal | ž a l | |
| E | | ŘSD | ř E s E d E | neutral vowel, also called "schwa", which occurs particularly in abbreviations and words of English origin |
| | | the | d E | |
In addition to the phonemes described above, the system also works with other acoustic models that represent various noises and silence. Some of them can be used to create word pronunciations; see the [list](noises.html).
The set of allowed graphemes that can be converted into phonemes:
Czech alphabet letters (both upper-case and lower-case):
a, á, b, c, č, d, ď, e, é, ě, f, g, h, ch, i, í, j, k, l, m, n, ň, o, ó, p, q, r, ř, s, š, t, ť, u, ú, ů, v, w, x, y, ý, z, ž
Other characters (letters of foreign alphabets, or digits) must be converted to the graphemes above before phonetic transcription. For example, the German ö in Köln is transcribed as the grapheme with the closest pronunciation, é.
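A minimal sketch of this pre-processing, assuming Unicode input: the check below accepts only the allowed graphemes (plus spaces, which separate words in multi-word graphemic transcriptions such as čé es es dé), and a small hypothetical substitution table `FOREIGN` maps foreign letters to their closest Czech grapheme. Only the two substitutions implied by this manual's examples (Köln, Citroën) are included; the names are illustrative, and a real table would need many more entries.

```python
import unicodedata

# Allowed graphemes from the list above, lower-case ("ch" is the pair c + h).
ALLOWED = set("aábcčdďeéěfghiíjklmnňoópqrřsštťuúůvwxyýzž")

# Hypothetical substitution table: foreign letter -> closest Czech grapheme.
# Only the cases implied by this manual (Köln, Citroën) are included.
FOREIGN = {"ö": "é", "ë": "e"}


def to_graphemes(text: str) -> str:
    """Lower-case the text and replace known foreign letters."""
    # NFC turns a decomposed "o" + combining diaeresis into a single "ö".
    s = unicodedata.normalize("NFC", text.lower())
    return "".join(FOREIGN.get(ch, ch) for ch in s)


def is_valid(graphemes: str) -> bool:
    """True if the string contains only allowed graphemes and spaces."""
    return all(ch in ALLOWED or ch == " " for ch in graphemes)
```

Character substitution alone is not enough: Citroën becomes citroen, whereas the examples table gives the graphemic transcription sitroen, and digits such as the 1 in D1 still have to be spelled out by hand.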
Phonetic transcription rules (grapheme-to-phoneme, G2P, rules)
General rules
The basic rule is that each grapheme is transcribed to the corresponding phoneme, for example doma -> d o m a. In addition, certain graphemes are subject to a few special rules, and the voicing assimilation rules described below must also be followed.
Special rules
| Grapheme(s) | Phoneme(s) | Example | Note |
|---|---|---|---|
| d | ď | dítě -> ď í ť e | palatalization caused by the following vowel i, í or ě (in native words) |
| dz | C | Dzurinda -> C u r i n d a | |
| dž | Č | džungle -> Č u N g l e | |
| ě | e | děti -> ď e ť i | after d, t, n |
| | je | pěkný -> p je k n í | after b, p, v, f |
| | ňe | měl -> m ňe l | after m |
| m | M | tramvaj -> t r a M v a j | labiodental M caused by the following consonant f or v |
| n | N | banka -> b a N k a | velar N caused by the following consonant k or g |
| | ň | nikdo -> ň i g d o | palatalization caused by the following vowel i, í or ě (in native words) |
| q | kv | Quido -> kv i d o | |
| t | ť | dítě -> ď í ť e | palatalization caused by the following vowel i, í or ě (in native words) |
| ů | ú | půl -> p ú l | |
| w | v | wolfram -> v o l f r a m | |
| x | ks | text -> t e ks t | |
| | gz | exil -> e gz i l | |
| y | i | byl -> b i l | |
| ý | í | kýl -> k í l | |
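The special rules can be sketched as a left-to-right scan over a lower-cased graphemic word. This is a simplified illustration, not a complete converter: voicing assimilation is left out, x is always mapped to ks (the gz variant in exil is not handled), treating qu as a unit is an assumption based on the Quido example, and palatalization is applied everywhere, so loanwords such as dirigent would need an exception list. The function and table names are hypothetical.

```python
# Digraphs are matched before single letters. "qu" -> "kv" is an assumption
# based on the Quido example; "dz" is always treated as a digraph here.
DIGRAPHS = {"ch": "X", "dž": "Č", "dz": "C", "qu": "kv"}
# Context-free substitutions; x is always "ks" in this sketch.
SIMPLE = {"q": "kv", "w": "v", "x": "ks", "y": "i", "ý": "í", "ů": "ú"}
PALATAL = {"d": "ď", "t": "ť", "n": "ň"}
PALATALIZING = {"i", "í", "ě"}
LABIODENTAL = {"f", "v"}
VELAR = {"k", "g"}


def apply_special_rules(word: str) -> str:
    """Apply the special G2P rules (without voicing assimilation)."""
    w = word.lower()
    out: list[str] = []
    i = 0
    while i < len(w):
        pair = w[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
            continue
        ch = w[i]
        prev = w[i - 1] if i > 0 else ""
        nxt = w[i + 1] if i + 1 < len(w) else ""
        if ch in PALATAL and nxt in PALATALIZING:
            out.append(PALATAL[ch])      # dítě, děti, nikdo
        elif ch == "ě":
            if prev in PALATAL:
                out.append("e")          # děti -> ďeťi
            elif prev == "m":
                out.append("ňe")         # měl -> mňel
            else:
                out.append("je")         # pěkný -> pjekní (after b, p, v, f)
        elif ch == "m" and nxt in LABIODENTAL:
            out.append("M")              # tramvaj -> traMvaj
        elif ch == "n" and nxt in VELAR:
            out.append("N")              # banka -> baNka
        else:
            out.append(SIMPLE.get(ch, ch))
        i += 1
    return "".join(out)
```

For example, `apply_special_rules("nikdo")` gives ňikdo; the k -> g change in ň i g d o is left to voicing assimilation. The context sets (`PALATALIZING` and others) are Python sets rather than strings so that the empty string used at word boundaries never counts as a match.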
Voicing assimilation
Voicing assimilation is very common in Czech: under certain circumstances, voiced consonants are pronounced as voiceless and vice versa. This change is not reflected in spelling. Voicing assimilation occurs in the following situations:

- word-finally, voiced consonants are pronounced as voiceless, e.g., led -> l e t
- under the influence of adjacent consonants: as a rule, the voicing of the last consonant in a cluster determines the voicing of the entire cluster. For example, the voiced consonant z in roztok is pronounced as voiceless s (r o s t o k) because of the following voiceless consonant t.
Voiced and voiceless consonants form pairs whose members replace each other in the second type of voicing assimilation:
Voicing pairs | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Voiceless consonant | p | t | c | č | k | f | s | ť | Ř | š | X |
Voiced consonant | b | d | C | Č | g | v | z | ď | ř | ž | h |
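The voicing pairs map naturally onto two lookup tables. The following minimal sketch (hypothetical names, phonemes as single characters) uses them to implement the first situation listed above, word-final devoicing. Because the last consonant decides the voicing of the whole cluster, the sketch devoices the entire word-final cluster of paired consonants, not just the last phoneme.

```python
# Voicing pairs from the table above: VOICELESS[i] pairs with VOICED[i].
VOICELESS = "ptcčkfsťŘšX"
VOICED = "bdCČgvzďřžh"
TO_VOICELESS = dict(zip(VOICED, VOICELESS))
TO_VOICED = dict(zip(VOICELESS, VOICED))
PAIRED = set(VOICELESS) | set(VOICED)


def devoice_final(phonemes: str) -> str:
    """Devoice the word-final cluster of paired consonants, e.g. led -> let."""
    out = list(phonemes)
    i = len(out) - 1
    # Walk left from the word end while the phonemes are paired consonants.
    while i >= 0 and out[i] in PAIRED:
        out[i] = TO_VOICELESS.get(out[i], out[i])
        i -= 1
    return "".join(out)
```

For instance, hvozd becomes h v o s t: the final d is devoiced, and so is the z in the same cluster.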
Selected examples of voicing assimilation caused by the following consonant:
Assimilation | Example | Reason for assimilation |
---|---|---|
Voiced to voiceless | ||
b -> p | h l o u b k a -> h l o u p k a | b before voiceless k |
d -> t | h l a d k ý -> h l a t k í | d before voiceless k |
ď -> ť | l o ď k a -> l o ť k a | ď before voiceless k |
g -> k | g a n g s t e r -> g a N k s t e r | g before voiceless s |
h -> X | l e h k ý -> l e X k í | h before voiceless k |
ž -> š | Ž i ž k a -> ž i š k a | ž before voiceless k |
v -> f | d í v k a -> ď í f k a | v before voiceless k |
z -> s | h e z k ý -> h e s k í | z before voiceless k |
Voiceless to voiced | ||
č -> Č | l é č b a -> l é Č b a | č before voiced b |
f -> v | š é f d i r i g e n t -> š é v d i r i g e n t | f before voiced d |
s -> z | I n n s b r u c k -> i n z b r u k | s before voiced b |
š -> ž | H u š b a u e r -> h u ž b a u e r | š before voiced b |
Groups of consonants | ||
ck -> Cg | l e c k d o -> l e C g d o | c and k before voiced d |
vz -> fs | v z t á h l -> f s t á h l | v and z before voiceless t |
An exception to this rule is the phoneme v, which undergoes voicing assimilation (as in the example "dívka" above) but does not cause assimilation itself: in svatba -> s v a d b a, the s stays voiceless before v (the t becomes d only because of the following b), and in kotva -> k o t v a, the t stays voiceless before v.