Shuan Chen

PhD Student in KAIST CBE

5 Useful Python Packages and Functions

Here are 5 useful packages and functions I wish I had learned earlier:

1. pprint
2. One-line code
3. enumerate() and zip()
4. defaultdict
5. dir()

pprint

pprint is short for “pretty-print”, and it makes printed output much easier to read.
For example, suppose I have a list of coordinates. If I print it with the regular print:

coordinates = [{"name": "Location 1", "gps": (29.008966, 111.573724)},
               {"name": "Location 2", "gps": (40.1632626, 44.2935926)},
               {"name": "Location 3", "gps": (29.476705, 121.869339)}]
print(coordinates)

it gives
[{'name': 'Location 1', 'gps': (29.008966, 111.573724)}, {'name': 'Location 2', 'gps': (40.1632626, 44.2935926)}, {'name': 'Location 3', 'gps': (29.476705, 121.869339)}]

but if you print it with pprint:
from pprint import pprint
pprint(coordinates)

it puts each entry on its own line (by default pprint also sorts the dictionary keys):
[{'gps': (29.008966, 111.573724), 'name': 'Location 1'},
 {'gps': (40.1632626, 44.2935926), 'name': 'Location 2'},
 {'gps': (29.476705, 121.869339), 'name': 'Location 3'}]
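
pprint also accepts a few options that are handy in practice. The snippet below is only a small sketch: it assumes Python 3.8 or newer (for the sort_dicts argument) and uses a shortened copy of the coordinates list. pformat is the sibling of pprint that returns the formatted string instead of printing it.

```python
from pprint import pformat, pprint

# Shortened copy of the list from the example above.
coordinates = [
    {"name": "Location 1", "gps": (29.008966, 111.573724)},
    {"name": "Location 2", "gps": (40.1632626, 44.2935926)},
]

# sort_dicts=False (Python 3.8+) keeps the keys in insertion order,
# so 'name' is printed before 'gps'.
pprint(coordinates, sort_dicts=False)

# pformat returns the pretty-printed text as a string, which is useful
# for logging or writing to a file. width sets the target line length.
text = pformat(coordinates, width=60)
print(text)
```

Since the formatted text is still a valid Python literal, it can be pasted back into a script if needed.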

One-line code

When I first learned Python, I tended to use a lot of for loops to append values to a list one at a time. For example:

from numpy.random import randint

def quadratic(x):
    return 3*x*x + 5*x + 12

# 1000 random integers from 0 to 9 (the upper bound 10 is excluded)
numbers = randint(0, 10, 1000).tolist()

# The for loop starts from here
# keeps 8 and 9, since quadratic(8) = 244 and quadratic(9) = 300
large_numbers = []
for number in numbers:
    if quadratic(number) > 200:
        large_numbers.append(number)

However, I found that this loop with its if condition can be written in one line using a list comprehension:
large_numbers = [number for number in numbers if quadratic(number) > 200]

Everything has been much simpler since I learned this trick!
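
The same one-line idea also works for dictionaries and for sums. The sketch below uses a deterministic list instead of random numbers so the results are reproducible; the function name poly and the threshold of 200 are chosen just for this illustration.

```python
def poly(x):
    # Illustrative polynomial: 3x^2 + 5x + 12
    return 3 * x * x + 5 * x + 12

numbers = list(range(10))  # 0..9, a deterministic stand-in for random input

# Explicit loop with a condition
large_loop = []
for number in numbers:
    if poly(number) > 200:
        large_loop.append(number)

# Equivalent list comprehension
large_comp = [number for number in numbers if poly(number) > 200]

# Dict comprehension: map each large number to its value
values = {number: poly(number) for number in numbers if poly(number) > 200}

# Generator expression: feeds sum() without building an intermediate list
total = sum(poly(number) for number in numbers)

print(large_loop == large_comp, values, total)
```

A comprehension is nice when the body is a single expression; once the logic needs several statements, the plain loop is usually easier to read.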

enumerate() and zip()

These two built-in functions are really useful when iterating, for example in a for loop.
Let’s say we want to make a dictionary of the 12 zodiac animals in Chinese and English:

animals_chi = ['鼠', '牛', '虎', '兔', '龍', '蛇', '馬', '羊', '猴', '雞', '狗', '豬']
animals_eng = ['rat', 'ox', 'tiger', 'hare', 'dragon', 'snake', 'horse', 'sheep', 'monkey', 'cock', 'dog', 'boar']

and give them numbers
animal_dict = {}
for i in range(12):
    animal_dict[i] = (animals_chi[i], animals_eng[i])
pprint(animal_dict)

outputs
{0: ('鼠', 'rat'),
 1: ('牛', 'ox'),
 2: ('虎', 'tiger'),
 3: ('兔', 'hare'),
 4: ('龍', 'dragon'),
 5: ('蛇', 'snake'),
 6: ('馬', 'horse'),
 7: ('羊', 'sheep'),
 8: ('猴', 'monkey'),
 9: ('雞', 'cock'),
 10: ('狗', 'dog'),
 11: ('豬', 'boar')}

This is not bad code, but it can be shorter with the help of enumerate() and zip().
enumerate() yields the index of each iteration together with the item, and zip() lets the code iterate over two lists at the same time.
The code above is equivalent to
animal_dict = {i: (chi, eng) for i, (chi, eng) in enumerate(zip(animals_chi, animals_eng))}
pprint(animal_dict)
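
Two details of these functions are easy to miss: enumerate() can start counting from any number, and zip() silently stops at the shortest input. The sketch below shows both on shortened lists; the Chinese list is deliberately one item short for the illustration, and zip_longest from itertools is shown as one way to keep the unmatched items.

```python
from itertools import zip_longest

animals_chi = ['鼠', '牛']            # deliberately one item short
animals_eng = ['rat', 'ox', 'tiger']

# enumerate() accepts a start value, here counting from 1 instead of 0.
numbered = dict(enumerate(animals_eng, start=1))

# zip() stops as soon as the shortest list runs out, so 'tiger' is dropped.
pairs = list(zip(animals_chi, animals_eng))

# zip_longest() pads the shorter list with fillvalue instead of dropping items.
padded = list(zip_longest(animals_chi, animals_eng, fillvalue=None))

print(numbered)
print(pairs)
print(padded)
```

If the two lists are supposed to have the same length, it is worth checking len() on both before zipping, because zip() will not complain about a mismatch.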

defaultdict

We often need a dictionary to record the results of a function, and it can be really annoying to initialize every value in the dictionary first:

from numpy.random import random

results = {}

# initialize every key with an empty list
for i in range(10):
    results[i] = list()

# append each index to the bucket given by the first decimal digit of its seed
for i, seed in enumerate(random(10)):
    results[int(seed*10)].append(i)
print(results)

and it gives
{0: [2, 5], 1: [], 2: [], 3: [4], 4: [3, 6], 5: [], 6: [9], 7: [7], 8: [1], 9: [0, 8]}

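With defaultdict, you don’t need to initialize any keys: the first time a missing key is accessed, it is created automatically with a default value produced by the given factory (here list, so an empty list):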
from collections import defaultdict
from numpy.random import random

results = defaultdict(list)
for i, seed in enumerate(random(10)):
    results[int(seed*10)].append(i)
print(results)

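and it gives (the buckets differ from the run above because the seeds are random, and only the keys that were actually used appear):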
defaultdict(<class 'list'>, {7: [0, 2], 3: [1], 5: [3, 8], 8: [4, 7], 1: [5], 2: [6], 9: [9]})
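
The factory does not have to be list. A minimal sketch with a made-up word list: defaultdict(int) gives a simple counter, and the last part shows one caveat, namely that merely reading a missing key with square brackets inserts it, while .get() and the in operator do not.

```python
from collections import defaultdict

words = ["rat", "ox", "tiger", "rabbit", "ox"]  # made-up sample data

# int() returns 0, so every new key starts counting from zero.
counts = defaultdict(int)
for word in words:
    counts[word] += 1

# Group the words by their first letter.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# Caveat: looking up a missing key with [] creates it.
before = len(counts)
_ = counts["dragon"]          # inserts "dragon" with value 0
after = len(counts)

# .get() and `in` only look, they never insert.
missing = counts.get("snake")
has_snake = "snake" in counts

# The same grouping with a plain dict, using setdefault().
plain = {}
for word in words:
    plain.setdefault(word[0], []).append(word)

print(dict(counts), dict(by_letter), after - before, missing, has_snake)
```

Wrapping the result in dict() before printing or returning it also removes the defaultdict(<class ...>, ...) wrapper from the output.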

dir()

As we use more packages, we are sometimes unsure which attributes an object from a package has. I found that I can easily browse all of them with dir(). For example, I made a phenol molecule with rdkit:

from rdkit import Chem

phenol = 'Oc1ccccc1'
mol = Chem.MolFromSmiles(phenol)
print(type(mol))

outputs
<class 'rdkit.Chem.rdchem.Mol'>

However, I don’t know what I can do with this molecule, so I list the attributes of the object with dir(mol), which gives

['AddConformer', 'ClearComputedProps', 'ClearProp', 'Debug', 'GetAromaticAtoms', 'GetAtomWithIdx', 'GetAtoms', 'GetAtomsMatchingQuery', 'GetBondBetweenAtoms', 'GetBondWithIdx', 'GetBonds', 'GetBoolProp', 'GetConformer', 'GetConformers', 'GetDoubleProp', 'GetIntProp', 'GetNumAtoms', 'GetNumBonds', 'GetNumConformers', 'GetNumHeavyAtoms', 'GetProp', 'GetPropNames', 'GetPropsAsDict', 'GetRingInfo', 'GetStereoGroups', 'GetSubstructMatch', 'GetSubstructMatches', 'GetUnsignedProp', 'HasProp', 'HasSubstructMatch', 'NeedsUpdatePropertyCache', 'RemoveAllConformers', 'RemoveConformer', 'SetBoolProp', 'SetDoubleProp', 'SetIntProp', 'SetProp', 'SetUnsignedProp', 'ToBinary', 'UpdatePropertyCache', '__DebugMol', '__GetSubstructMatch', '__GetSubstructMatches', '__class__', '__copy__', '__deepcopy__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getinitargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__instance_size__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__safe_for_unpickling__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_repr_png_', '_repr_svg_']

So I know there are methods such as mol.AddConformer(), mol.GetAtoms(), and mol.GetBonds() to use. For example:

print([atom.GetSymbol() for atom in mol.GetAtoms()])

outputs
['O', 'C', 'C', 'C', 'C', 'C', 'C']
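
The raw dir() output is long because it includes all the double-underscore names. A small sketch of how it can be filtered, using a deque from the standard library as a stand-in object so that it runs without rdkit; the same filtering should work on any object, including mol.

```python
from collections import deque

queue = deque([1, 2, 3])  # stand-in object; any object works here

# Keep only the public names (drop everything starting with an underscore).
public = [name for name in dir(queue) if not name.startswith("_")]

# Separate callable methods from plain data attributes.
methods = [name for name in public if callable(getattr(queue, name))]
attributes = [name for name in public if name not in methods]

# Print the first docstring line of each method as a quick summary.
for name in methods:
    doc = (getattr(queue, name).__doc__ or "").strip()
    summary = doc.splitlines()[0] if doc else "(no docstring)"
    print(f"{name}: {summary}")

print("data attributes:", attributes)
```

For the full documentation of a single method, the built-in help() works on the same objects, for example help(queue.rotate).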