How to perform groupby using Python Itertools

itertools is a powerful module in the Python standard library. It provides a set of fast, memory-efficient functions for working with iterators. You can learn more about them by referring to this article.
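Before turning to the actual problem, here is a minimal sketch of how `itertools.groupby` behaves. The `words` list and the `first_letter` helper are made up for this illustration. The point to notice is that `groupby` only groups consecutive items with equal keys, so unsorted input can yield the same key more than once.

```python
from itertools import groupby

# Hypothetical sample data, used only for this illustration
words = ["apple", "avocado", "banana", "apricot", "blueberry"]


def first_letter(word):
    """Grouping key: the first character of the word."""
    return word[0]


# Unsorted input: "apricot" is not adjacent to the other "a" words,
# so it ends up in a second, separate "a" group
unsorted_groups = [(key, list(grp)) for key, grp in groupby(words, key=first_letter)]

# Sorting by the same key first gives exactly one group per letter
sorted_words = sorted(words, key=first_letter)
sorted_groups = [(key, list(grp)) for key, grp in groupby(sorted_words, key=first_letter)]

print(unsorted_groups)
print(sorted_groups)
```

Because Python's sort is stable, items within each group keep their original relative order after sorting.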

# Sample Data 
data = [
        {'id': 1, 'name': 'abc', 'child': {'id': 2, 'name': 'child1'}},
        {'id': 1, 'name': 'abc', 'child': {'id': 3, 'name': 'child2'}},
        {'id': 2, 'name': 'def', 'child': {'id': 4, 'name': 'child3'}},
        {'id': 2, 'name': 'def', 'child': {'id': 5, 'name': 'child4'}}
      ]

Problem statement: For each unique (id, name) pair, the child entries of all matching records should be collected into a single list of dictionaries.

# Expected output
 [
   {'id': 1, 'name': 'abc', 'child': [
                                      {'id': 2, 'name': 'child1'}, 
                                      {'id': 3, 'name': 'child2'}
                                     ]
   },
   {'id': 2, 'name': 'def', 'child': [
                                      {'id': 4, 'name': 'child3'},
                                      {'id': 5, 'name': 'child4'}
                                      ]
   }
 ]

I used the following code to get the expected output:

from itertools import groupby
from operator import itemgetter
import pprint

pp = pprint.PrettyPrinter(indent=4)

# Define the group-by key: records are grouped on the (id, name) pair
grouper = itemgetter("id", "name")
result = []

# groupby only groups consecutive items with equal keys,
# so sort the input data by the same key first
for key, grp in groupby(sorted(data, key=grouper), grouper):
    # key is an (id, name) tuple; turn it back into a dictionary
    temp_dict = dict(zip(["id", "name"], key))
    # Collect the 'child' entry of every item in this group
    temp_dict['child'] = [item['child'] for item in grp]
    result.append(temp_dict)

# Print the result
pp.pprint(result)
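If the same grouping is needed in more than one place, the steps above could be wrapped in a small helper. The following is only a sketch under assumptions: the function name `group_children`, its `keys` and `child_field` parameters, and the shuffled `rows` list are all hypothetical and not part of the original code.

```python
from itertools import groupby
from operator import itemgetter


def group_children(rows, keys=("id", "name"), child_field="child"):
    """Merge rows that share the same `keys` values, collecting `child_field` into a list.

    Hypothetical helper: the name and parameters are illustrative assumptions.
    """
    grouper = itemgetter(*keys)
    grouped = []
    # groupby needs equal keys to be adjacent, so sort by the same key first
    for key, grp in groupby(sorted(rows, key=grouper), grouper):
        # With a single key, itemgetter returns a bare value rather than a tuple
        if len(keys) == 1:
            key = (key,)
        merged = dict(zip(keys, key))
        merged[child_field] = [row[child_field] for row in grp]
        grouped.append(merged)
    return grouped


# Same records as the sample data, deliberately shuffled
rows = [
    {'id': 2, 'name': 'def', 'child': {'id': 4, 'name': 'child3'}},
    {'id': 1, 'name': 'abc', 'child': {'id': 2, 'name': 'child1'}},
    {'id': 2, 'name': 'def', 'child': {'id': 5, 'name': 'child4'}},
    {'id': 1, 'name': 'abc', 'child': {'id': 3, 'name': 'child2'}},
]

grouped_rows = group_children(rows)
print(grouped_rows)
```

Since the helper sorts internally, the order of the input records does not affect which records end up grouped together.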
