1. pprint
2. One-line code
3. enumerate() and zip()
4. defaultdict
5. dir()
pprint
pprint is short for “pretty-print”; it makes printed output much easier to read.
For example, say I have a set of coordinates. If I print them with the regular print:
coordinates = [{"name": "Location 1", "gps": (29.008966, 111.573724)},
               {"name": "Location 2", "gps": (40.1632626, 44.2935926)},
               {"name": "Location 3", "gps": (29.476705, 121.869339)}]
print(coordinates)
it gives
[{'name': 'Location 1', 'gps': (29.008966, 111.573724)}, {'name': 'Location 2', 'gps': (40.1632626, 44.2935926)}, {'name': 'Location 3', 'gps': (29.476705, 121.869339)}]
but if you print with pprint:
from pprint import pprint
pprint(coordinates)
it gives
[{'gps': (29.008966, 111.573724), 'name': 'Location 1'},
 {'gps': (40.1632626, 44.2935926), 'name': 'Location 2'},
 {'gps': (29.476705, 121.869339), 'name': 'Location 3'}]
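The dictionary keys in the pprint output above come out in sorted order ('gps' before 'name'), which is pprint's default. As a minimal sketch of the options that control this, assuming Python 3.8 or newer for sort_dicts, the example below uses pformat(), which returns the formatted text instead of printing it. The variable names are only illustrative.

```python
from pprint import pformat

point = {"name": "Location 1", "gps": (29.008966, 111.573724)}

# pformat() returns the formatted string; keys are sorted by default
default_text = pformat(point)

# sort_dicts=False keeps insertion order (available from Python 3.8)
ordered_text = pformat(point, sort_dicts=False)

# width sets the target line width (default 80); a small value forces wrapping
narrow_text = pformat(point, width=30)

print(default_text)
print(ordered_text)
print(narrow_text)
```

pprint() accepts the same width and sort_dicts arguments, so the same settings apply when printing directly.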
One-line code
When I first learned Python, I tended to use a lot of for loops to append values to a list one at a time. For example:
from numpy.random import randint

def linear(x):
    return 3*x*x + 5*x + 12

numbers = [i for i in randint(0, 10, 1000)]
# The for loop starts from here
large_numbers = []
for number in numbers:
    if linear(number) > 300:
        large_numbers.append(number)
However, I found that this loop and its nested condition can be written in one line with a list comprehension:
large_numbers = [number for number in numbers if linear(number) > 300]
Everything has been much simpler since I learned this tip!
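To make the equivalence easy to check, here is a small sketch that runs both versions on a fixed input instead of random draws. The function name quadratic and the range(20) input are assumptions chosen for illustration; the polynomial is the same as in linear() above.

```python
def quadratic(x):
    """Same polynomial as linear() in the text, under a hypothetical name."""
    return 3*x*x + 5*x + 12

numbers = list(range(20))  # fixed input so the result is reproducible

# Loop version: append each number that passes the test
loop_result = []
for number in numbers:
    if quadratic(number) > 300:
        loop_result.append(number)

# Comprehension version: expression, then the loop, then the optional filter
comp_result = [number for number in numbers if quadratic(number) > 300]

# The leading expression can also transform each kept item
values = [quadratic(number) for number in numbers if quadratic(number) > 300]

print(loop_result == comp_result)  # True
print(comp_result[0], values[0])   # 10 362
```

The comprehension reads in the same order as the loop: the for clause first, then the if clause, with the appended expression moved to the front.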
enumerate() and zip()
These two built-in functions are really useful for iteration (for/while loops).
Let’s say we want to make a dictionary of 12 animals in Chinese and English:
animals_chi = ['鼠', '牛', '虎', '兔', '龍', '蛇', '馬', '羊', '猴', '雞', '狗', '豬']
animals_eng = ['rat', 'ox', 'tiger', 'hare', 'dragon', 'snake', 'horse', 'sheep', 'monkey', 'cock', 'dog', 'boar']
and give them numbers:
animal_dict = {}
for i in range(12):
    animal_dict[i] = (animals_chi[i], animals_eng[i])
pprint(animal_dict)
outputs
{0: ('鼠', 'rat'),
 1: ('牛', 'ox'),
 2: ('虎', 'tiger'),
 3: ('兔', 'hare'),
 4: ('龍', 'dragon'),
 5: ('蛇', 'snake'),
 6: ('馬', 'horse'),
 7: ('羊', 'sheep'),
 8: ('猴', 'monkey'),
 9: ('雞', 'cock'),
 10: ('狗', 'dog'),
 11: ('豬', 'boar')}
This isn’t bad code, but it can be shorter with the help of enumerate() and zip(). enumerate() yields the index of each iteration together with the item, and zip() lets the code iterate over two lists at the same time.
The loop above is equivalent to:
animal_dict = {i: (chi, eng) for i, (chi, eng) in enumerate(zip(animals_chi, animals_eng))}
pprint(animal_dict)
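Two details of these functions are worth knowing when using this pattern: enumerate() takes an optional start argument, and zip() stops at the shorter of its inputs. The sketch below uses shortened lists, with the English list deliberately one item longer, to show both. The variable names are illustrative only.

```python
animals_chi = ['鼠', '牛', '虎']
animals_eng = ['rat', 'ox', 'tiger', 'hare']  # deliberately one item longer

# start=1 numbers the pairs from 1 instead of 0
numbered = {i: pair for i, pair in enumerate(zip(animals_chi, animals_eng), start=1)}
print(numbered)

# zip() stopped after three pairs, so 'hare' was silently dropped
print(len(numbered))  # 3

# For the plain 0-based case, dict() can consume enumerate() directly
zero_based = dict(enumerate(zip(animals_chi, animals_eng)))
print(zero_based[0])  # ('鼠', 'rat')
```

Because zip() drops unmatched items without raising an error, it helps to check that the lists have equal length when every item matters.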
defaultdict
We often need to create a dictionary to record the results of a function. It can be really annoying to initialize every value in the dictionary first:
from numpy.random import random  # assumed import: random(10) matches numpy.random.random

results = {}
# initialize every key with an empty list
for i in range(10):
    results[i] = list()
# append each index under the first decimal digit of its random value
for i, seed in enumerate(random(10)):
    results[int(seed*10)].append(i)
print(results)
and it gives
{0: [2, 5], 1: [], 2: [], 3: [4], 4: [3, 6], 5: [], 6: [9], 7: [7], 8: [1], 9: [0, 8]}
With defaultdict, any missing key is created on first access with a value built by the given class, so no keys need to be initialized beforehand:
from collections import defaultdict

results = defaultdict(list)
for i, seed in enumerate(random(10)):
    results[int(seed*10)].append(i)
print(results)
and it gives
defaultdict(<class 'list'>, {7: [0, 2], 3: [1], 5: [3, 8], 8: [4, 7], 1: [5], 2: [6], 9: [9]})
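Note that the defaultdict output contains only the keys that were actually used, while the manually initialized dictionary also holds the empty lists. The sketch below shows the same mechanism with int as the default factory, a common way to count items, and shows that merely reading a missing key inserts it. The word list is made up for illustration.

```python
from collections import defaultdict

words = ["rat", "ox", "rat", "dog", "ox", "rat"]  # illustrative data

# int() returns 0, so every new key starts counting from zero
counts = defaultdict(int)
for word in words:
    counts[word] += 1

print(dict(counts))  # {'rat': 3, 'ox': 2, 'dog': 1}

# Reading a missing key with [] creates it with the default value
before = "tiger" in counts
value = counts["tiger"]
after = "tiger" in counts
print(before, value, after)  # False 0 True

# .get() looks a key up without inserting it
print(counts.get("hare"), "hare" in counts)  # None False
```

Using .get() or the in operator avoids growing the dictionary when a key should only be checked.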
dir()
As we use more packages, we are sometimes unsure which attributes an object from a package has. I found that I can easily browse all of them with dir(). For example, I made a phenol molecule with rdkit:
from rdkit import Chem
phenol = 'Oc1ccccc1'
mol = Chem.MolFromSmiles(phenol)
print(type(mol))
outputs
<class 'rdkit.Chem.rdchem.Mol'>
However, I don’t know what I can do with this molecule, so I list the attributes of the object with dir(mol), which gives
['AddConformer', 'ClearComputedProps', 'ClearProp', 'Debug', 'GetAromaticAtoms', 'GetAtomWithIdx', 'GetAtoms', 'GetAtomsMatchingQuery', 'GetBondBetweenAtoms', 'GetBondWithIdx', 'GetBonds', 'GetBoolProp', 'GetConformer', 'GetConformers', 'GetDoubleProp', 'GetIntProp', 'GetNumAtoms', 'GetNumBonds', 'GetNumConformers', 'GetNumHeavyAtoms', 'GetProp', 'GetPropNames', 'GetPropsAsDict', 'GetRingInfo', 'GetStereoGroups', 'GetSubstructMatch', 'GetSubstructMatches', 'GetUnsignedProp', 'HasProp', 'HasSubstructMatch', 'NeedsUpdatePropertyCache', 'RemoveAllConformers', 'RemoveConformer', 'SetBoolProp', 'SetDoubleProp', 'SetIntProp', 'SetProp', 'SetUnsignedProp', 'ToBinary', 'UpdatePropertyCache', '__DebugMol', '__GetSubstructMatch', '__GetSubstructMatches', '__class__', '__copy__', '__deepcopy__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getinitargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__instance_size__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__safe_for_unpickling__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_repr_png_', '_repr_svg_']
So I know there are methods like mol.AddConformer(), mol.GetAtoms(), or mol.GetBonds() to use. For example:
print([atom.GetSymbol() for atom in mol.GetAtoms()])
outputs
['O', 'C', 'C', 'C', 'C', 'C', 'C']
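The dir() listing is long largely because of the names that start with underscores. As a sketch of one way to trim it, the helper below (public_names is a hypothetical name, not part of any library) keeps only the names without a leading underscore. It is shown on a built-in list rather than an rdkit molecule so that it runs without extra packages.

```python
def public_names(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]

sample = [3, 1, 2]
names = public_names(sample)
print(names)

# getattr() fetches an attribute by name, so a discovered method can be called
method = getattr(sample, "sort")
method()
print(sample)  # [1, 2, 3]

# The docstring (or help(sample.sort)) explains what a discovered method does
print(sample.sort.__doc__)
```

The same helper should work on any object, including mol from the example above, since it relies only on dir().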