How to perform groupby using Python Itertools

Itertools is a powerful module and is part of python standard library. It provides a set of fast and memory efficient functions. You can learn more about them by referring to this article.

# Sample Data 
data = [
        {'id': 1, 'name': 'abc', 'child': {'id': 2, 'name': 'child1'}},
        {'id': 1, 'name': 'abc', 'child': {'id': 3, 'name': 'child2'}},
        {'id': 2, 'name': 'def', 'child': {'id': 4, 'name': 'child3'}},
        {'id': 2, 'name': 'def', 'child': {'id': 5, 'name': 'child4'}}

Problem statement:  For a particular id, name, all the entries should be turned into list of dictionaries.

# Expected output
   {'id': 1, 'name': 'abc', 'child': [
                                      {'id': 2, 'name': 'child1'}, 
                                      {'id': 3, 'name': 'child2'}
   {'id': 2, 'name': 'def', 'child': [
                                      {'id': 4, 'name': 'child3'},
                                      {'id': 5, 'name': 'child4'}

I have used following code to get the expected output

import pprint
pp = pprint.PrettyPrinter(indent=4)

from itertools import groupby
from operator import itemgetter
# Define group by key
grouper = itemgetter("id", "name")
result = []

#itertools requires sorted input, so we will first sort the input data

for key, grp in groupby(sorted(data, key = grouper), grouper):
    temp_dict = dict(zip(["id", "name"], key))
    temp_dict['child'] = []
    # Use list comprehension to collect all the items in grp
    temp_dict['child'] = list(item['child'] for item in grp)

# print the result

Amit Bansal

Experienced professional with 16 years of expertise in database technologies. In-depth knowledge of designing and implementation of Disaster Recovery / HA solutions, Database Migrations , performance tuning and creating technical solutions. Skills: Oracle,MySQL, PostgreSQL, Aurora, AWS, Redshift, Hadoop (Cloudera) , Elasticsearch, Python

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.