Introduction

  • MongoDB is a powerful NoSQL database that can scale horizontally.
  • Pandas is widely used in data analytics, machine learning, and other data-related work.
  • Combining the two lets you store documents in MongoDB and analyze them as DataFrames in pandas.

Connection

  • To start MongoDB, run the command: mongod
  • Code for connecting to the database from Python:
import pymongo
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
# select the database named "test"
# if it doesn't exist yet, MongoDB creates it lazily on the first write
mydb = myclient["test"]

# select the collection where our data will be stored
# like the database, it is created on the first insert if it doesn't exist
customer_collection = mydb["customers"]
  • Insert some documents into the customers collection of the test database:
data = [{"name": "John", "address": "Highway 37"},
        {"name": "Peter", "address": "Lowstreet 27"},
        {"name": "Manu", "address": "Ashwath 9th"}]
# note: insert_many adds an '_id' key to each dict in data
customer_collection.insert_many(data)
Inserted Data

Conversion:

From MongoDB to pandas DataFrame:

  • We already have a connection to the customers collection in the variable 'customer_collection' (see above).
exclude_col = {'_id': False}  # exclude the '_id' field from the results
data = list(customer_collection.find({}, projection=exclude_col))  # list of dicts, one per document

# converting the list of dicts to a pandas dataframe
import pandas as pd
df = pd.DataFrame(data, columns=['name', 'address'])
Data Frame Screenshot
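
The read path above can be wrapped in a small reusable helper. The sketch below is one possible way to do it and is not part of the original code: the function name collection_to_dataframe and its query and drop_id parameters are hypothetical, and it assumes that collection behaves like a pymongo collection, that is, it exposes find(filter, projection).

```python
import pandas as pd


def collection_to_dataframe(collection, query=None, drop_id=True):
    """Return the documents matching `query` as a pandas DataFrame.

    Hypothetical helper: `collection` is assumed to expose a
    pymongo-style find(filter, projection) method.
    """
    # {"_id": False} asks MongoDB to leave the '_id' field out of each result
    projection = {"_id": False} if drop_id else None
    docs = list(collection.find(query or {}, projection=projection))
    # each dict becomes one row; the dict keys become column names
    return pd.DataFrame(docs)
```

With the connection from the Connection section, df = collection_to_dataframe(customer_collection) should give the same name/address frame, and passing query={"name": "John"} would restrict it to the matching documents.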

From pandas DataFrame to MongoDB:

  • We are going to insert the above dataframe into the customers collection.
import json

# converting the dataframe into a JSON string, one object per row
df_json = df.to_json(orient="records")

# parsing the JSON string into a list of dicts, one per row
df_json_list = json.loads(df_json)

# inserting into customers collection of test db
customer_collection.insert_many(df_json_list)
New data screenshot
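
One reason to go through JSON on the way back: pandas stores values as NumPy types and represents missing values as NaN, and pymongo cannot always encode those directly. A JSON round trip yields plain Python values and turns NaN into None. Here is a minimal sketch on an in-memory frame, with no database needed. The function name dataframe_to_documents and the age column are made up for illustration.

```python
import json

import pandas as pd


def dataframe_to_documents(df):
    """Convert a DataFrame into a list of plain dicts, one per row."""
    # orient="records" serializes the frame as a JSON array of row objects
    return json.loads(df.to_json(orient="records"))


# illustrative data only; the 'age' column is not in the original example
frame = pd.DataFrame({"name": ["John", "Peter"], "age": [31.0, None]})
docs = dataframe_to_documents(frame)
print(docs)
```

The resulting list can be passed to customer_collection.insert_many(docs). An empty dataframe produces an empty list, which insert_many rejects, so a guard such as "if docs:" is worth adding in that case.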