3.6. String Literals

3.6.1. Escape Characters

  • \n - New line (ENTER)

  • \t - Horizontal Tab (TAB)

  • \' - Single quote ' (escape in single quoted strings)

  • \" - Double quote " (escape in double quoted strings)

  • \\ - Backslash \ (a literal backslash, not the start of an escape sequence)

  • More information in Builtin Printing

  • https://en.wikipedia.org/wiki/List_of_Unicode_characters

>>> print('Hello\World')
Hello\World
>>> print('Hello\nWorld')
Hello
World
>>> print('Hello\tWorld')  
Hello   World
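
The quote and backslash escapes from the list above can be checked with a few comparisons. This is a minimal sketch; the variable names and sample texts are chosen for illustration only.

```python
# Escaped quotes let a string contain its own delimiter.
single = 'It\'s a trap'
double = "She said \"hi\""

# A doubled backslash stands for one literal backslash character.
backslash = 'C:\\Temp'

# An escape sequence counts as a single character.
newline_length = len('a\nb')

print(single)          # It's a trap
print(double)          # She said "hi"
print(backslash)       # C:\Temp
print(newline_length)  # 3
```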

3.6.2. Unicode

>>> print('\U0001F680')
🚀
>>> a = '\U0001F9D1'  # 🧑
>>> b = '\U0000200D'  # zero-width joiner (invisible)
>>> c = '\U0001F680'  # 🚀
>>>
>>> astronaut = a + b + c
>>> print(astronaut)
🧑‍🚀
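
The same code points can also be written with the shorter \u escape (four hexadecimal digits) or by name with \N{...}. The sketch below assumes the official Unicode character name ROCKET; the variable names are illustrative.

```python
rocket = '\N{ROCKET}'   # by Unicode character name
joiner = '\u200d'       # four-digit form of the zero-width joiner
person = '\U0001F9D1'   # eight-digit form

astronaut = person + joiner + rocket

print(rocket == '\U0001F680')  # True
print(joiner == '\U0000200D')  # True
print(len(astronaut))          # 3, one glyph built from three code points
print(hex(ord(rocket)))        # 0x1f680
```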

3.6.3. Format String

  • String interpolation (variable substitution)

  • Since Python 3.6

  • Often used instead of str concatenation

>>> name = 'Mark'
>>>
>>> print('Hello {name}')
Hello {name}
>>> name = 'Mark'
>>> print(f'Hello {name}')
Hello Mark
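
Besides plain variable names, the braces of an f-string accept expressions, format specifications and conversions. The following is a small sketch; the values of age and height are made up for the example.

```python
name = 'Mark'
age = 42        # illustrative value
height = 1.8    # illustrative value

greeting = f'Hello {name}'
summary = f'{name} will be {age + 1} next year'  # expression inside braces
formatted = f'Height: {height:.2f} m'            # format specification
quoted = f'Name: {name!r}'                       # repr() conversion

print(greeting)   # Hello Mark
print(summary)    # Mark will be 43 next year
print(formatted)  # Height: 1.80 m
print(quoted)     # Name: 'Mark'
```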

3.6.4. Unicode Literal

  • In Python 3 str is Unicode

  • In Python 2 str is Bytes

  • In Python 3 u'...' is only for compatibility with Python 2

>>> u'zażółć gęślą jaźń'
'zażółć gęślą jaźń'
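
A quick check, as a sketch, that in Python 3 the u prefix changes neither the value nor the type of the literal:

```python
modern = 'Moon'
legacy = u'Moon'

print(modern == legacy)     # True
print(type(legacy) is str)  # True
```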

3.6.5. Bytes Literal

  • Used while reading from low level devices and drivers

  • Used in sockets and HTTP connections

  • bytes is a sequence of octets (integers between 0 and 255)

  • bytes.decode() converts bytes to a Unicode str

  • str.encode() converts a str to bytes

>>> data = 'Moon'   # Unicode Literal
>>> data = u'Moon'  # Unicode Literal
>>> data = b'Moon'  # Bytes Literal
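
That bytes is a sequence of integers between 0 and 255 can be seen by indexing or iterating over it. A minimal sketch:

```python
data = b'Moon'

print(list(data))                  # [77, 111, 111, 110]
print(data[0])                     # 77, indexing returns an int
print(data[0:1])                   # b'M', slicing returns bytes
print(bytes([77, 111, 111, 110]))  # b'Moon'
```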

Encode a Unicode string to bytes (UTF-8 by default):

>>> data = 'cześć'
>>>
>>> data.encode()
b'cze\xc5\x9b\xc4\x87'

Decode bytes to a Unicode string (UTF-8 by default):

>>> data = b'cze\xc5\x9b\xc4\x87'
>>>
>>> data.decode()
'cześć'

UTF-8 is the default encoding. You can also specify a different encoding when encoding or decoding data:

>>> data = 'cześć'
>>>
>>> data.encode('utf-8')
b'cze\xc5\x9b\xc4\x87'
>>>
>>> data.encode('iso-8859-2')
b'cze\xb6\xe6'
>>>
>>> data.encode('windows-1250')
b'cze\x9c\xe6'
>>>
>>> data.encode('cp1250')
b'cze\x9c\xe6'
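
Bytes must be decoded with the same encoding that produced them. The sketch below shows a successful round trip, a failed decode with the wrong codec, and the errors='replace' option of str.encode(); the variable names are illustrative.

```python
text = 'cześć'

# Round trip with a matching encoding restores the original text.
encoded = text.encode('iso-8859-2')
round_trip = encoded.decode('iso-8859-2')

# Decoding with a different encoding may fail.
try:
    encoded.decode('utf-8')
except UnicodeDecodeError:
    decode_failed = True
else:
    decode_failed = False

# Characters missing from the target encoding can be replaced with '?'.
ascii_replaced = text.encode('ascii', errors='replace')

print(round_trip)      # cześć
print(decode_failed)   # True
print(ascii_replaced)  # b'cze??'
```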

3.6.6. Raw String

  • In a raw string (r'...') backslash escape sequences are not interpreted

>>> print('Print "\n" to get new line')
Print "
" to get new line
>>> print('Print "\\n" to get new line')
Print "\n" to get new line
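
The same output can be produced with the r prefix instead of a doubled backslash. A minimal sketch comparing both spellings:

```python
escaped = 'Print "\\n" to get new line'
raw = r'Print "\n" to get new line'

print(raw)             # Print "\n" to get new line
print(escaped == raw)  # True
print(len('\n'))       # 1, a single newline character
print(len(r'\n'))      # 2, a backslash followed by the letter n
```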

3.6.7. Use Case - 0x01

Raw-string in Regular Expressions:

>>> '\\b[a-z]+\\b'
'\\b[a-z]+\\b'
>>> r'\b[a-z]+\b'
'\\b[a-z]+\\b'
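
Why this matters can be seen by running the pattern with and without the r prefix. In a regular string '\b' is the backspace character, so the regular expression engine never sees a word boundary. A sketch using the standard re module; the sample text is made up:

```python
import re

text = 'hello world'

# Raw string: the regex engine receives \b and treats it as a word boundary.
with_raw = re.findall(r'\b[a-z]+\b', text)

# Regular string: Python turns \b into a backspace character first.
without_raw = re.findall('\b[a-z]+\b', text)

print(with_raw)     # ['hello', 'world']
print(without_raw)  # []
```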

3.6.8. Use Case - 0x02

A raw string prevents \t from being read as a tab character:

>>> print('C:\watney\temporary.txt')  
C:\watney       emporary.txt
>>>
>>> print(r'C:\watney\temporary.txt')
C:\watney\temporary.txt

A raw string prevents \n from being read as a newline character:

>>> print('C:\nasa\myfile.txt')
C:
asa\myfile.txt
>>>
>>> print(r'C:\nasa\myfile.txt')
C:\nasa\myfile.txt

A raw string prevents both \n and \t from being interpreted:

>>> print('C:\nasa\temporary.txt')  
C:
asa     emporary.txt
>>>
>>> print(r'C:\nasa\temporary.txt')
C:\nasa\temporary.txt
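
The damage done by the missing r prefix can also be detected programmatically. A short sketch that checks both variants of the path above for control characters:

```python
plain = 'C:\nasa\temporary.txt'
raw = r'C:\nasa\temporary.txt'

print('\n' in plain, '\t' in plain)  # True True
print('\n' in raw, '\t' in raw)      # False False

# Each interpreted escape collapses two characters into one.
print(len(plain), len(raw))          # 19 21
```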

3.6.9. Use Case - 0x03

POSIX-compliant paths use forward slashes, so escapes cause no problems:

>>> path = '/home/mwatney/myfile.txt'  # Linux
>>> path = '/Users/mwatney/myfile.txt'  # macOS

On Windows, paths contain backslashes, which Python also uses as the escape character. To avoid problems you can use forward slashes instead of backslashes:

>>> path = 'c:/Users/mwatney/myfile.txt'

This is not typical for this operating system, so hardly anyone does it. Typically users write paths with backslashes, and that is fine as long as you use escaped backslashes or raw strings:

>>> path = 'c:\\Users\\mwatney\\myfile.txt'
>>> path = r'c:\Users\mwatney\myfile.txt'

As soon as you forget about using either of them, the problem occurs:

>>> path = 'c:\Users\mwatney\myfile.txt'
Traceback (most recent call last):
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

The problem is with \Users. After the escape sequence \U Python expects exactly eight hexadecimal digits that form a Unicode code point, e.g. '\U0001F680', which is the rocket emoji 🚀. In this example Python finds the letter s, which is not a hexadecimal digit, and therefore raises a SyntaxError reporting that the bytes cannot be decoded. The only valid hexadecimal digits are 0123456789abcdefABCDEF, and s is not one of them.
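
Because the error is raised while Python parses the source, it cannot be caught in the same file that contains the faulty literal. One way to observe it safely, shown here only as a sketch, is to pass the source text to the built-in compile() function. The names broken, fixed and namespace are illustrative.

```python
# Source code kept as text; raw strings preserve the backslashes.
broken = r"path = 'c:\Users\mwatney\myfile.txt'"
fixed = r"path = r'c:\Users\mwatney\myfile.txt'"

# Compiling the broken source raises SyntaxError because of \U.
try:
    compile(broken, '<example>', 'exec')
except SyntaxError as error:
    message = error.msg
else:
    message = None

# The raw-string version compiles and runs without problems.
namespace = {}
exec(compile(fixed, '<example>', 'exec'), namespace)

print(message)
print(namespace['path'])  # c:\Users\mwatney\myfile.txt
```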

3.6.10. Assignments

Code 3.22. Solution
"""
* Assignment: String Literals Emoticon
* Required: yes
* Complexity: easy
* Lines of code: 2 lines
* Time: 3 min

English:
    1. Print `Hello 😀`
    2. Run doctests - all must succeed

Polish:
    1. Wypisz `Hello 😀`
    2. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * 😀 unicode codepoint is `\U0001F600`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is str, \
    'Variable `result` has invalid type, should be str'

    >>> '😀' in result
    True
    >>> result
    'Hello 😀'
"""

# str: Hello 😀
result = ...