
Strings in Python

By Angela C

March 1, 2021 in python strings

Reading time: 4 minutes.

Strings represent text of any kind. In Python, strings must be wrapped in quotes, either single or double. Triple quotes can be used to create multiline strings.

Strings are immutable.
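
A minimal sketch of what immutability means in practice: assigning to an index raises a TypeError, while string methods return a new string and leave the original unchanged. The variable names are only for illustration.

```python
greeting = "hello"

# Item assignment is not supported on str objects.
try:
    greeting[0] = "H"
    assignment_worked = True
except TypeError:
    assignment_worked = False

# Methods such as replace() build and return a new string.
capitalised = greeting.replace("h", "H", 1)

print(assignment_worked)  # False
print(greeting)           # hello (unchanged)
print(capitalised)        # Hello
```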

Strings are sequences and are indexed as such. There are many string methods, including .upper() to convert to uppercase, .lower() to convert to lowercase, and .count() to count the number of occurrences of a substring.

text1 = "This is a string"
text2 = " and this is another string"
text1 + text2
'This is a string and this is another string'
text3 = """This is a multiline string
Therefore you can write multiline text.
The string must be wrapped in triple quotes.
"""
text3
'This is a multiline string\nTherefore you can write multiline text.\nThe string must be wrapped in triple quotes.\n'

String methods

"hello World".count('o')

2

"Hello World".upper()
'HELLO WORLD'
"hello World".lower()
'hello world'
"hello World".title()
'Hello World'
"Hello world".replace("world", "universe")
'Hello universe'

Use .strip() to remove leading and trailing whitespace. Whitespace inside the string is left unchanged.

mystring = "            hello there and       a very big                space"
mystring.strip()
'hello there and       a very big                space'
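
Note that .strip() leaves whitespace inside the string untouched. If the goal is also to collapse the internal runs of spaces, one common idiom (a sketch, not the only option) is to split on whitespace and join the pieces with a single space. .lstrip() and .rstrip() strip one side only. The sample string here is made up.

```python
mystring = "   hello there and     a very big      space   "

# split() with no arguments splits on any run of whitespace
# and ignores leading and trailing whitespace.
collapsed = " ".join(mystring.split())

left_only = mystring.lstrip()    # removes leading whitespace only
right_only = mystring.rstrip()   # removes trailing whitespace only

print(collapsed)  # hello there and a very big space
```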

"Using the find method to find the index of the first occurrence for the matching string".find("method")
15
"Using the rfind method to find the index of the first occurrence for the matching string from the end".rfind("string")
82
"to be or not to be".find("be")
3
"to be or not to be".rfind("be")
16
"to be or not to be".startswith('t')
True
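
When the substring is not present, .find() and .rfind() return -1 rather than raising an error, whereas .index() raises a ValueError. A short sketch of handling both cases, using a search word chosen for illustration:

```python
phrase = "to be or not to be"

missing = phrase.find("question")   # -1, no exception

try:
    phrase.index("question")
    found = True
except ValueError:
    found = False

# endswith() is the counterpart of startswith().
ends_with_be = phrase.endswith("be")

print(missing, found, ends_with_be)  # -1 False True
```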

String formatting

"hello".rjust(10, ' ')
'     hello'
"hello".ljust(10, ' ')
'hello     '
"123".zfill(10)
'0000000123'
"Hello {} child and your {} family".format("dear", "wonderful")
'Hello dear child and your wonderful family'
'{0} {1} {3} {2}'.format("welcome", "to", "home","my").title()
'Welcome To My Home'
"Welcome to our {adj} shop".format(adj="new")
'Welcome to our new shop'

F-strings

Anything inside the curly brackets is evaluated as a Python expression, so arithmetic and function or method calls can be placed there.

adj1, adj2 = "new", "exciting"
f"Welcome to my {adj1} and {adj2} shop"
'Welcome to my new and exciting shop'
hours = 24
days = 7
f"We are open for {hours * days} hours every week"
'We are open for 168 hours every week'
shop = "The new sweet shop"
adj = "fantastic"
f"My {adj} new shop is called {shop.title()}."
'My fantastic new shop is called The New Sweet Shop.'

Formatting mini-language

String objects have a .format() method that substitutes formatted arguments into a string, producing a new string. The same notation also works inside f-strings.

Add a colon (:) after the expression in curly brackets, followed by the mini-language notation.

For example:

  • {0:.2f} formats the first argument as a floating point number with two decimal places.
  • {1:s} formats the second argument as a string.
  • {2:d} formats the third argument as a decimal integer.

pct = .20
first_customers = 500
value = 100
f"There will be a discount of {pct:.1%} for the first {first_customers} customers every Monday with everything less than €{value:.2f}"

'There will be a discount of 20.0% for the first 500 customers every Monday with everything less than €100.00'
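
The positional notation from the list above can also be tried with str.format. This is only a sketch; the name and the numbers are made up for illustration.

```python
# {0:.2f} -> first argument as a float with two decimal places
# {1:s}   -> second argument as a string
# {2:d}   -> third argument as an integer
line = "{1:s} bought {2:d} items for {0:.2f} euro".format(19.5, "Anna", 3)
print(line)  # Anna bought 3 items for 19.50 euro

# A width and an alignment can be combined with the type code:
# right-align in 8 characters, left-align in 6 characters.
padded = "{:>8.1f}|{:<6d}|".format(3.14159, 42)
print(padded)
```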

Strings are sequences

"Python treats strings as sequences."[0]
'P'
"Python treats strings as sequences."[0:10]
'Python tre'
"Python treats strings as sequences."[:10]
'Python tre'
"Python treats strings as sequences."[10:]
'ats strings as sequences.'
"string" in "You can use the 'in' keyword to check if one string contains another string"
True
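
Because strings are sequences, the usual sequence tools also apply: len(), negative indexes, a step in the slice, and iteration. A brief sketch using the same sentence:

```python
sentence = "Python treats strings as sequences."

length = len(sentence)              # number of characters
last_char = sentence[-1]            # negative indexes count from the end
every_other = sentence[:6:2]        # start:stop:step
reversed_word = sentence[:6][::-1]  # a step of -1 reverses the slice

# Iterating over a string yields one character at a time.
vowels = [ch for ch in sentence[:6] if ch in "aeiou"]

print(length, last_char, every_other, reversed_word, vowels)
```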


Strings can be concatenated using `+`

```python
text1 + text2
'This is a string and this is another string'
```

Some useful string methods

Strings, like tuples, are immutable, so they cannot be modified in place. String methods therefore return a new object rather than changing the original string.

str.partition(sep)

Split the string at the first occurrence of sep, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings.
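
The behaviour described above can be checked on a plain Python string. The example also unpacks the tuple directly into three names, which are chosen here for illustration.

```python
# Separator found: (before, separator, after)
before, sep, after = "1958M01".partition("M")
print(before, sep, after)  # 1958 M 01

# Only the first occurrence is used as the split point.
first_split = "a-b-c".partition("-")    # ('a', '-', 'b-c')

# rpartition() splits at the last occurrence instead.
last_split = "a-b-c".rpartition("-")    # ('a-b', '-', 'c')

# Separator not found: the whole string, then two empty strings.
not_found = "1958".partition("M")       # ('1958', '', '')
```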

This is useful when searching through a directory for files whose file names contain a particular pattern.
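
As a rough sketch of the file-name use case, assume a folder whose files are named like 'sales_2021.csv'. The file names and the pattern are hypothetical; in practice the list might come from os.listdir() or pathlib.

```python
# Hypothetical file names, standing in for a real directory listing.
filenames = ["sales_2020.csv", "sales_2021.csv", "notes.txt", "stock_2021.csv"]

matches = {}
for name in filenames:
    prefix, sep, rest = name.partition("_2021")
    # sep is an empty string when the pattern is not in the name.
    if sep:
        matches[name] = prefix

print(matches)  # {'sales_2021.csv': 'sales', 'stock_2021.csv': 'stock'}
```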

It is also useful when splitting a dataframe column into new columns based on the part before the separator, the separator itself and the part after the separator.

For example, the Month column below consists of a year and a month separated by an ‘M’.

Month VALUE
0 1958M01
1 1958M01
2 1958M01

Use str.partition to split the Month column into a year column and a month column.

df[['year', 'month']] = df['Month'].str.partition('M')[[0,2]]

This creates two new columns:

     Month  VALUE  year month
0  1958M01  160.2  1958    01
1  1958M01   95.6  1958    01

The same result can be achieved with str.split. Set expand=True to return the parts as separate columns.

df[['year', 'month']] = df['Month'].str.split('M', n=1, expand=True)

Use pop to remove the original column if it is not needed once the new columns are created.

df[['year', 'month']] = df.pop('Month').str.split('M', n=1, expand=True)
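
A self-contained sketch of the steps above, assuming pandas is installed. The DataFrame is built by hand for illustration, using the two VALUE figures from the output shown earlier, and the new columns are assigned one at a time to keep each step explicit.

```python
import pandas as pd

df = pd.DataFrame({"Month": ["1958M01", "1958M01"], "VALUE": [160.2, 95.6]})

# partition returns three columns labelled 0, 1 and 2;
# keep the parts before (0) and after (2) the separator.
parts = df["Month"].str.partition("M")
df["year"] = parts[0]
df["month"] = parts[2]

# The same result with str.split, popping the original column.
df2 = pd.DataFrame({"Month": ["1958M01", "1958M01"], "VALUE": [160.2, 95.6]})
split_parts = df2.pop("Month").str.split("M", n=1, expand=True)
df2["year"] = split_parts[0]
df2["month"] = split_parts[1]

print(df)
print(df2)
```

Both new columns hold strings, so a conversion such as astype(int) would be needed before doing arithmetic on them.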


Pandas string partitioning using str.partition

The pandas str.partition method splits each string in a Series into three parts at the first occurrence of the given separator. With the default expand=True, it returns a DataFrame with three columns: the part before the separator (column 0), the separator itself (column 1), and the part after it (column 2). If the separator is not found, the first column holds the whole string and the other two hold empty strings.

This is especially useful when splitting a URL into parts.

For example, I have a dataframe containing URLs to datasets from the Central Statistics Office (CSO) PxStat database.

Each URL follows the same format:

https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/DHA09/CSV/1.0/en

df['url'].str.partition('/CSV/') splits each URL into three parts: the part before ‘/CSV/’ in position 0, ‘/CSV/’ itself in position 1, and the part after ‘/CSV/’ in position 2.

“https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/DHA09”, “/CSV/” and “1.0/en”

Each of the three parts can be retrieved using indexing (from 0 to 2).

To further split the first part of the URL, call str.partition again.

df['url'].str.partition('/CSV/')[0]\
    .str.partition('https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/')
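
The same chain can be tried on a single URL with the built-in str.partition, which is what the pandas method applies to each element. In this sketch the piece left after the second split is stored as dataset_code, a name chosen here for illustration.

```python
url = "https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/DHA09/CSV/1.0/en"
prefix = "https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/"

# First split: everything before '/CSV/', the separator, everything after.
head, sep, tail = url.partition("/CSV/")

# Second split: partition the first part on the fixed prefix.
# The part before the prefix is empty; the part after it is what remains.
_, _, dataset_code = head.partition(prefix)

print(tail)          # 1.0/en
print(dataset_code)  # DHA09
```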