4.5. Str Methods

4.5.1. Rationale

  • str is immutable

  • str methods return a new str; the original is left unchanged
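
A minimal sketch of what immutability means in practice (the variable names are illustrative only): calling a method leaves the original string untouched, so the result has to be captured in a variable to be kept.

```python
name = 'Angus MacGyver'

# upper() returns a new str; `name` itself is not modified
shouted = name.upper()

print(name)     # Angus MacGyver
print(shouted)  # ANGUS MACGYVER
```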

4.5.2. Strip Whitespace

>>> name = '\tAngus MacGyver    \n'
>>>
>>>
>>> name.strip()
'Angus MacGyver'
>>>
>>> name.rstrip()
'\tAngus MacGyver'
>>>
>>> name.lstrip()
'Angus MacGyver    \n'

4.5.3. Change Case

  • Unify data format before analysis

>>> name = 'Angus MacGyver III'
>>>
>>>
>>> name.upper()
'ANGUS MACGYVER III'
>>>
>>> name.lower()
'angus macgyver iii'
>>>
>>> name.title()
'Angus Macgyver Iii'
>>>
>>> name.capitalize()
'Angus macgyver iii'
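
One way to read "unify data format before analysis" is a case-insensitive comparison. The sketch below uses made-up sample values and assumes that lowercasing both sides is enough for the data at hand:

```python
# Two spellings of the same name, as they might appear in raw data (made-up values)
a = 'Angus MACGYVER'
b = 'angus MacGyver'

# Direct comparison is case-sensitive
same_raw = (a == b)

# Bringing both sides to the same case first makes them comparable
same_normalized = (a.lower() == b.lower())

print(same_raw)         # False
print(same_normalized)  # True
```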

4.5.4. Replace

>>> name = 'Angus MacGyver Iii'
>>>
>>>
>>> name.replace('Iii', 'III')
'Angus MacGyver III'

4.5.5. Starts With

>>> email = 'mark.watney@nasa.gov'
>>>
>>>
>>> email.startswith('mark.watney')
True
>>>
>>> email.startswith('melissa.lewis')
False
>>> email = 'mark.watney@nasa.gov'
>>> vip = ('mark.watney', 'melissa.lewis')
>>>
>>>
>>> email.startswith(vip)
True

4.5.6. Ends With

>>> email = 'mark.watney@nasa.gov'
>>>
>>>
>>> email.endswith('nasa.gov')
True
>>>
>>> email.endswith('esa.int')
False
>>> email = 'mark.watney@nasa.gov'
>>> whitelist = ('nasa.gov', 'esa.int')
>>>
>>> email.endswith(whitelist)
True

4.5.7. Split by Line

>>> text = 'Hello\nPython\nWorld'
>>>
>>> text.splitlines()
['Hello', 'Python', 'World']
>>> text = """We choose to go to the Moon!
... We choose to go to the Moon in this decade and do the other things,
... not because they are easy, but because they are hard;
... because that goal will serve to organize and measure the best of our
... energies and skills, because that challenge is one that we are willing
... to accept, one we are unwilling to postpone, and one we intend to win,
... and the others, too."""
>>>
>>>
>>> text.splitlines()  # doctest: +NORMALIZE_WHITESPACE
['We choose to go to the Moon!',
 'We choose to go to the Moon in this decade and do the other things,',
 'not because they are easy, but because they are hard;',
 'because that goal will serve to organize and measure the best of our',
 'energies and skills, because that challenge is one that we are willing',
 'to accept, one we are unwilling to postpone, and one we intend to win,',
 'and the others, too.']

4.5.8. Split by Character

  • With no argument, split() splits on any run of whitespace

>>> setosa = '5.1,3.5,1.4,0.2,setosa'
>>>
>>> setosa.split(',')
['5.1', '3.5', '1.4', '0.2', 'setosa']
>>> text = 'We choose to go to the Moon'
>>>
>>>
>>> text.split(' ')
['We', 'choose', 'to', 'go', 'to', 'the', 'Moon']
>>>
>>> text.split()
['We', 'choose', 'to', 'go', 'to', 'the', 'Moon']
>>> text = '10.13.37.1      nasa.gov esa.int roscosmos.ru'
>>>
>>>
>>> text.split(' ')
['10.13.37.1', '', '', '', '', '', 'nasa.gov', 'esa.int', 'roscosmos.ru']
>>>
>>> text.split()
['10.13.37.1', 'nasa.gov', 'esa.int', 'roscosmos.ru']

4.5.9. Join by Character

>>> text = ['We', 'choose', 'to', 'go', 'to', 'the', 'Moon']
>>>
>>> ' '.join(text)
'We choose to go to the Moon'
>>> setosa = ['5.1', '3.5', '1.4', '0.2', 'setosa']
>>>
>>> ','.join(setosa)
'5.1,3.5,1.4,0.2,setosa'
>>> crew = ['Mark Watney', 'Jan Twardowski', 'Melissa Lewis']
>>>
>>> '\n'.join(crew)
'Mark Watney\nJan Twardowski\nMelissa Lewis'
>>> TEXT = ['We choose to go to the Moon!',
...         'We choose to go to the Moon in this decade and do the other things,',
...         'not because they are easy, but because they are hard;',
...         'because that goal will serve to organize and measure the best of our energies and skills,',
...         'because that challenge is one that we are willing to accept, one we are unwilling to postpone,',
...         'and one we intend to win, and the others, too.']
>>>
>>> print('\n'.join(TEXT))
We choose to go to the Moon!
We choose to go to the Moon in this decade and do the other things,
not because they are easy, but because they are hard;
because that goal will serve to organize and measure the best of our energies and skills,
because that challenge is one that we are willing to accept, one we are unwilling to postpone,
and one we intend to win, and the others, too.
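
Splitting without an argument and joining with a single space can be combined to collapse irregular whitespace. This is only a sketch with made-up input, assuming that any run of spaces, tabs or newlines should become one space:

```python
# Made-up input with mixed spaces and a tab between the fields
text = '10.13.37.1      nasa.gov esa.int\troscosmos.ru'

# split() with no argument drops every run of whitespace,
# join() then puts exactly one space between the parts
normalized = ' '.join(text.split())

print(normalized)  # 10.13.37.1 nasa.gov esa.int roscosmos.ru
```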

4.5.10. Is Whitespace

>>> text = ''
>>> text.isspace()
False
>>> text = ' '
>>> text.isspace()
True
>>> text = '\t'
>>> text.isspace()
True
>>> text = '\n'
>>> text.isspace()
True

Figure 4.1. ISS - International Space Station. Credits: NASA/Crew of STS-132 (img: s132e012208).
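
Because ''.isspace() returns False, a check for "blank" input has to treat the empty string separately. A possible sketch follows; is_blank is a hypothetical helper, not a str method:

```python
def is_blank(text):
    """Hypothetical helper: True for an empty or whitespace-only str."""
    # `not text` covers the empty string, which isspace() reports as False
    return not text or text.isspace()


empty = is_blank('')
whitespace_only = is_blank(' \t\n')
with_content = is_blank(' a ')

print(empty, whitespace_only, with_content)  # True True False
```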

4.5.11. Is Alphabetic

>>> text = 'hello'
>>> text.isalpha()
True
>>> text = 'hello1'
>>> text.isalpha()
False

4.5.12. Is Numeric

>>> '1'.isdecimal()
True
>>>
>>> '+1'.isdecimal()
False
>>>
>>> '-1'.isdecimal()
False
>>>
>>> '1.'.isdecimal()
False
>>>
>>> '1,'.isdecimal()
False
>>>
>>> '1.0'.isdecimal()
False
>>>
>>> '1,0'.isdecimal()
False
>>>
>>> '1_0'.isdecimal()
False
>>>
>>> '10'.isdecimal()
True
>>> '1'.isdigit()
True
>>>
>>> '+1'.isdigit()
False
>>>
>>> '-1'.isdigit()
False
>>>
>>> '1.'.isdigit()
False
>>>
>>> '1,'.isdigit()
False
>>>
>>> '1.0'.isdigit()
False
>>>
>>> '1,0'.isdigit()
False
>>>
>>> '1_0'.isdigit()
False
>>>
>>> '10'.isdigit()
True
>>> '1'.isnumeric()
True
>>>
>>> '+1'.isnumeric()
False
>>>
>>> '-1'.isnumeric()
False
>>>
>>> '1.'.isnumeric()
False
>>>
>>> '1.0'.isnumeric()
False
>>>
>>> '1,0'.isnumeric()
False
>>>
>>> '1_0'.isnumeric()
False
>>>
>>> '10'.isnumeric()
True
>>> '1'.isalnum()
True
>>>
>>> '+1'.isalnum()
False
>>>
>>> '-1'.isalnum()
False
>>>
>>> '1.'.isalnum()
False
>>>
>>> '1,'.isalnum()
False
>>>
>>> '1.0'.isalnum()
False
>>>
>>> '1,0'.isalnum()
False
>>>
>>> '1_0'.isalnum()
False
>>>
>>> '10'.isalnum()
True
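
For the ASCII inputs above, isdecimal(), isdigit() and isnumeric() give identical answers. They differ for other Unicode characters. The sketch below uses two sample characters chosen for illustration: superscript two (U+00B2) and the fraction one half (U+00BD).

```python
# Assumed sample characters: ASCII digit, superscript two, fraction one half
samples = ['1', '\u00b2', '\u00bd']

# Map each character to (isdecimal, isdigit, isnumeric)
results = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}

for char, flags in results.items():
    print(ascii(char), flags)

# '1'      -> (True, True, True)
# '\u00b2' -> (False, True, True)
# '\u00bd' -> (False, False, True)
```

isdecimal() is the strictest of the three and isnumeric() the most permissive.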

4.5.13. Find Sub-String Position

>>> text = 'We choose to go to the Moon'
>>>
>>>
>>> text.find('M')
23
>>>
>>> text.find('Moo')
23
>>>
>>> text.find('x')
-1
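
Since find() returns 0 for a match at the very start and -1 for no match, its result should not be used directly as a truth value. A short sketch of the pitfall, with illustrative variable names:

```python
text = 'We choose to go to the Moon'

position = text.find('We')        # 0: found at the very start
truthy_check = bool(position)     # False, although the substring is present
explicit_check = position != -1   # True: compare against -1 instead
membership = 'We' in text         # True: simplest when the index is not needed

print(position, truthy_check, explicit_check, membership)  # 0 False True True
```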

4.5.14. Count Occurrences

>>> text = 'Moon'
>>>
>>>
>>> text.count('o')
2
>>>
>>> text.count('Moo')
1
>>>
>>> text.count('x')
0

4.5.15. Remove Prefix or Suffix

Available since Python 3.9 (PEP 616 -- String methods to remove prefixes and suffixes)

>>> filename = '1969-apollo11.txt'
>>>
>>>
>>> filename.removeprefix('1969-')
'apollo11.txt'
>>>
>>> filename.removesuffix('.txt')
'1969-apollo11'
>>>
>>> filename.removeprefix('1969-').removesuffix('.txt')
'apollo11'
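
These methods are easy to confuse with strip(), lstrip() and rstrip(), which treat their argument as a set of characters rather than as a whole substring. The sketch below uses a made-up filename and requires Python 3.9 or newer:

```python
filename = 'report.txt'  # made-up example value

# rstrip() removes any trailing '.', 't' or 'x', including the final 't' of 'report'
stripped = filename.rstrip('.txt')

# removesuffix() removes the exact suffix '.txt' once, or nothing at all
removed = filename.removesuffix('.txt')

print(stripped)  # repor
print(removed)   # report
```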

4.5.16. Method Chaining

>>> text = 'Python'
>>>
>>> text = text.upper()
>>> text = text.replace('P', 'C')
>>> text = text.title()
>>>
>>> print(text)
Cython
>>> text = 'Python'
>>>
>>> text = text.upper().replace('P', 'C').title()
>>>
>>> print(text)
Cython
>>> text = 'Python'
>>>
>>> text.upper().replace('P', 'C').title()
'Cython'

How it works:

  1. text -> 'Python'

  2. 'Python'.upper() -> 'PYTHON'

  3. 'PYTHON'.replace('P', 'C') -> 'CYTHON'

  4. 'CYTHON'.title() -> 'Cython'

>>> text = 'Python'
>>>
>>> text = text.upper().startswith('P').replace('P', 'C')
Traceback (most recent call last):
AttributeError: 'bool' object has no attribute 'replace'
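
The chain above fails because startswith() returns a bool, and the next call is then made on that bool instead of a str. One possible fix, sketched here under the assumption that the check is meant to apply to the final text, is to place the non-str method at the end of the chain:

```python
text = 'Python'

# upper() and replace() both return str, so they can be chained freely;
# startswith() returns bool and therefore comes last
result = text.upper().replace('P', 'C').startswith('C')

print(result)  # True
```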

When a chain is split across lines with the \ line-continuation character, nothing may follow the \, not even a space:

>>> text = 'Python'
>>>
>>> text = text.upper() \
...            .replace('P', 'C') \
...            .title()
>>>
>>> print(text)
Cython
>>> text = 'Python'
>>>
>>> text = (text.upper()
...             .replace('P', 'C')
...             .title())
>>>
>>> print(text)
Cython

4.5.17. Assignments

Code 4.5. Solution
"""
* Assignment: Str Methods Normalize
* Required: yes
* Complexity: easy
* Lines of code: 4 lines
* Time: 8 min

English:
    1. Use `str` methods to clean `DATA`
    2. Run doctests - all must succeed

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is str, \
    'Variable `result` has invalid type, should be str'

    >>> result
    'Jana Twardowskiego III'
"""

DATA = 'UL. jana \tTWArdoWskIEGO 3'

# str: Jana Twardowskiego III
result = ...