I love language, but have yet to find a book that gives the 30,000-foot perspective. Still, I am finding some good stuff online…
A Reddit post worth a gander:
Say a Chinese man is reading a text out loud and comes across a character he doesn’t know. Does he have a clue what the pronunciation is? Does he know what tone to use? Can he take a guess based on similarity with another character that has, say, a few more or fewer strokes, or the same radical? Can he infer the meaning of that character from context?
u/contenyo answers:
There are really only five main ways Chinese characters are composed. The first is 100% pictographic. However, because of script changes, most of these pictures are hard for the untrained eye to see. For example, 日 is “sun” (it used to be just a circle with a dot in the middle) and 龜 is “turtle” (turn it counterclockwise! 😛 ). These characters, although widely used, are a minority, and most literate people already know them.
The next is very similar to the first type. These are pictures, made up of parts of other characters, that represent abstract concepts. For example, 上 (originally 丄) is “up,” and 下 (originally 丅) is “down.” These are also a minority, and widely known.
The third type puts two characters together to get a new meaning. An example is 休, “to rest.” It is composed of 亻 “person” and 木 “tree,” and is meant to represent someone resting against a tree. Again, there are few of these characters overall, but they are widely used and known.
The fourth type uses phonetic loaning. In old times, after the first three types of characters were created, people realized it was unproductive to keep making them from scratch, so they would use a character that sounded the same, or almost the same, to represent a word there was no character for. Most of these loan characters have died out in usage, but some remain. An example is 而, which originally meant “mustache” (it’s pictographic) but was used to represent “but” because the two words sounded the same.
The fifth type is just a refined version of phonetic loaning. When phonetic loaning got popular, it got confusing, because a lot of characters ended up with several different meanings. To solve this, people put meaning components next to the phonetic loan to tell them apart. An example would be 箸 for “chopsticks.” The ⺮ means “bamboo” (from 竹, which is pictographic), while 者 is the phonetic. (In Old Chinese, the pronunciation of 箸 would have been something like “da,” while 者 would have been “da’.” The ’ is basically your throat stopping the word so it doesn’t blur into the next one. This stop eventually became a tone.) About 80% of the roughly 80,000 Chinese characters are of this phonetic type. There are so many that nobody could know them all, and almost all characters people haven’t encountered before are phonetic.
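For readers who think in code, the five types can be jotted down as a small data model. This is only a sketch: the type names (`PICTOGRAPH`, `ABSTRACT`, and so on) and the `Character` record are labels made up here for illustration, not standard terminology, and the entries are just the examples quoted above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Composition(Enum):
    # Labels invented for this sketch, one per type described above.
    PICTOGRAPH = "picture of the thing itself"
    ABSTRACT = "picture of an abstract concept"
    COMPOUND = "two characters combined for a new meaning"
    LOAN = "borrowed for its sound"
    PHONO_SEMANTIC = "meaning component plus phonetic component"


@dataclass(frozen=True)
class Character:
    glyph: str
    gloss: str
    kind: Composition
    meaning_part: Optional[str] = None   # only set for the fifth type
    phonetic_part: Optional[str] = None  # only set for the fifth type


# Only the examples from the text; nothing here is a full dictionary.
EXAMPLES = [
    Character("日", "sun", Composition.PICTOGRAPH),
    Character("龜", "turtle", Composition.PICTOGRAPH),
    Character("上", "up", Composition.ABSTRACT),
    Character("下", "down", Composition.ABSTRACT),
    Character("休", "rest", Composition.COMPOUND),
    Character("而", "but", Composition.LOAN),
    Character("箸", "chopsticks", Composition.PHONO_SEMANTIC,
              meaning_part="⺮", phonetic_part="者"),
]


def has_sound_hint(ch: Character) -> bool:
    """True when the character carries a separate phonetic component."""
    return ch.kind is Composition.PHONO_SEMANTIC and ch.phonetic_part is not None


hinted = [c.glyph for c in EXAMPLES if has_sound_hint(c)]
print(hinted)
```

Of these examples, only 箸 gives a reader a separate component to guess the sound from, which is the property that matters for the original question.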
So, getting back to your question, most people can distinguish the phonetic and meaning portions of a character immediately, provided the phonetic is a common one and the characters that share it still sound similar to each other. The only problem is that the phonetic system was invented back around 1,400 BC and hasn’t changed much since then. Meanwhile, the sounds of the language have changed drastically. You get a bunch of things that didn’t use to be pronounced the same becoming homophones, and a lot of things that used to sound similar drifting apart. English spelling, to an extent, is like this, too. “Knight” and “night” used to be pronounced differently, but now are the same. “Again” used to rhyme with words like “pain” (and still does in some older songs/poetry) but doesn’t anymore. Chinese writing just has a lot more of this, because it’s a couple thousand years older, which can get confusing. For example, if I told an average Chinese person that 他, 也, and 施 (ta, ye, and shi) were all based on the phonetic 也, they’d tell me I’m crazy, but the old pronunciations are actually very similar (“lhāi”, “lāi”, and “lhai”, respectively. The bar indicates length of the vowel.)
So, let’s put this all together. I give an average Chinese person an obscure word they probably don’t know, like 闍 “barbican.” (How many of you know what a barbican is? 😛 ) They can immediately say the meaning component is 門, which means “gate” (It’s pictographic. Looks like a gate, right?), so they start thinking of words that have to do with gates. (A barbican is the inner gate of the inner city walls of a castle in the Middle Ages, in case you were wondering. If you’ve played Age of Empires, you probably already knew that.) Then they look at the phonetic component 者, recall how other characters that contain 者 as a phonetic are pronounced, and look for a word with a similar pronunciation that has to do with gates. So our guy is thinking, well, we’ve got 者 as zhě, and 都 as dū/dōu, and 著 as zhù. If he knows the word for barbican already (but didn’t know how it was written), he’d say, well, it’s got to be dū, because that’s how you say barbican and it makes sense here. If he doesn’t have barbican in his vocabulary, he might guess one of the other pronunciations and be wrong.
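The guessing game above can be written out as a toy lookup. This is a rough sketch under loud assumptions: `PHONETIC_FAMILY`, `MEANING_HINT`, and `guess_reading` are names invented here, the tables hold only two of the characters mentioned above, and a real reader’s mental process is obviously far messier than a dictionary lookup.

```python
# Readings of two characters that share the phonetic 者, as given in the text.
# (Hypothetical table; a real reader knows many more.)
PHONETIC_FAMILY = {
    "者": {"者": ["zhě"], "都": ["dū", "dōu"]},
}

# What the meaning component suggests, as given in the text.
MEANING_HINT = {"門": "gate"}


def candidate_readings(phonetic):
    """Collect every reading the reader knows for characters with this phonetic."""
    seen = []
    for readings in PHONETIC_FAMILY.get(phonetic, {}).values():
        for reading in readings:
            if reading not in seen:
                seen.append(reading)
    return seen


def guess_reading(meaning_part, phonetic, spoken_vocab):
    """Return (guess, confirmed).

    spoken_vocab maps a reading to the topics of words the reader
    already knows by ear, e.g. {"dū": {"gate"}}.
    """
    topic = MEANING_HINT.get(meaning_part)
    candidates = candidate_readings(phonetic)
    for reading in candidates:
        # A candidate sound that matches a known word on the right topic wins.
        if topic in spoken_vocab.get(reading, set()):
            return reading, True
    # No match in the spoken vocabulary: fall back to the first candidate,
    # which may well be wrong.
    return (candidates[0] if candidates else None), False


knows_the_word = {"dū": {"gate"}}
print(guess_reading("門", "者", knows_the_word))
print(guess_reading("門", "者", {}))
```

With the word already in his spoken vocabulary, the reader lands on dū; without it, the sketch falls back to zhě, which is exactly the kind of wrong guess described above.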
With enough practice and enough characters, people can usually get pretty close to the real pronunciation, but they might make minor mistakes. Studying Old Chinese poetry (especially how it was supposed to rhyme) helps immensely, but your average guy doesn’t have the time or desire to do that. (It can be kind of dry. :) ) Tone is particularly hard to estimate, even for those versed in Old Chinese, because a lot of modern tones were originally consonant suffixes that were not hardcoded into the phonetic components.
tl;dr Most characters have a meaning component and a phonetic component, so if a Chinese person can pick them out and compare the character against their vocabulary, they can approximate the reading and meaning.