As we saw in part 1 of this series, each program reads stuff from input and writes stuff to output. Whenever it reads strings of characters in the input, these strings are encoded in a certain encoding such as UTF-8. The program must decode these strings into an internal representation. When writing to the output, the program encodes strings from its internal representation to an encoding such as UTF-8.
What this internal representation is is usually not our concern. For example, in Python versions earlier than 3.3, the internal representation is either UCS-2 or UCS-4, depending on how Python was compiled. In Python 3.3 or later, the internal representation is more complicated and described in PEP 393. But these details are rarely of interest; what matters is Continue reading