Import Large Data from MongoDB Using MongoDB C++ Interface
This example shows how to import a large set of flight data from a MongoDB® collection into the MATLAB® workspace using the MongoDB C++ interface. To avoid out-of-memory issues when retrieving many documents, use a loop to import large data in batches.
Create MongoDB C++ Interface Connection
Create a MongoDB connection to the database mongotest using the MongoDB C++ interface. Here, the database server dbtb01 hosts this database using port number 27017.
server = "dbtb01";
port = 27017;
dbname = "mongotest";
conn = mongoc(server,port,dbname)
conn =
  connection with properties:
           Database: "mongotest"
           UserName: ""
             Server: "dbtb01"
               Port: 27017
    CollectionNames: [14×1 string]
conn is the connection object that contains the MongoDB connection. The object properties contain information about the connection and the database.
The database name is mongotest.
The user name is blank.
The database server is dbtb01.
The port number is 27017.
This database contains 14 document collections.
Verify the MongoDB connection.
isopen(conn)
ans = logical
1
The database connection is open because the isopen function returns 1. A return value of 0 means that the connection is closed.
Determine Number of Documents to Import
Find the total number of documents, specified as totaldocs, in the airlinesmall collection for the years 1997 through 2010. Use a MongoDB query to filter the flight data for the specified years.
collection = "airlinesmall";
mongoquery = "{""Year"":{""$gte"":1997,""$lte"":2010}}";
totaldocs = count(conn,collection,Query=mongoquery);
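The doubled quotation marks in the query string above are how MATLAB escapes a double quote inside a string. As a minimal sketch of an alternative, the same JSON text can be built with sprintf and a single-quoted format, which avoids the doubling. The variable names startyear and endyear are illustrative and not part of the original example, and the result is a character vector rather than a string scalar.

```matlab
% Illustrative variables (not in the original example)
startyear = 1997;
endyear = 2010;

% Build the same JSON query text without doubled quotation marks
mongoquery = sprintf('{"Year":{"$gte":%d,"$lte":%d}}',startyear,endyear);

disp(mongoquery)
```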
Retrieve Large Data in Batches
Set the batch size to 15,000 documents. Define the MATLAB workspace variable for storing the retrieved data.
batchsize = 15000;
flightdata = [];
You can change the batch size depending on the performance and memory capacity of your system.
Use a while loop to retrieve flight data from the collection. The variable flightdata accumulates each batch of retrieved data.
% Track number of documents read
index = 0;
while index < totaldocs
    % Retrieve documents in a batch
    localdata = find(conn,collection,Query=mongoquery, ...
        Skip=index,Limit=batchsize);
    % Store retrieved documents locally
    flightdata = [flightdata; localdata];
    % Move to the next batch
    index = index + batchsize;
end
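The loop above makes one find call per batch, with Skip advancing by batchsize each time. The following sketch works through that arithmetic without a database connection. It assumes totaldocs equals 75,603, the number of structures this example reports for flightdata; the names numbatches, skips, and counts are illustrative.

```matlab
% Assumed value: the document count reported for flightdata in this example
totaldocs = 75603;
batchsize = 15000;

% Number of find calls the while loop makes
numbatches = ceil(totaldocs/batchsize);

% Skip offset passed to each find call
skips = (0:numbatches-1)*batchsize;

% Number of documents each call returns; the last batch is partial
counts = min(batchsize,totaldocs - skips);

disp(numbatches)
disp(counts)
```

Under this assumption, the loop runs six times, and the final call returns fewer documents than batchsize because Limit is only an upper bound.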
Display information about the flightdata variable. The retrieved data is a structure array that contains 75,603 structures. Each structure contains 30 fields of flight data.
whos flightdata
  Name                Size                Bytes  Class     Attributes

  flightdata      75603x1             248848692  struct
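The whos output gives a rough way to judge whether a batch size suits the available memory. The sketch below divides the reported byte count by the number of documents to estimate the average in-memory size of one document, then scales that to one batch. The figures come from this example only, and the variable names are illustrative; other collections will differ.

```matlab
% Values copied from the whos output in this example
totalbytes = 248848692;
numdocs = 75603;
batchsize = 15000;

% Average in-memory size of one retrieved document, in bytes
bytesperdoc = totalbytes/numdocs;

% Approximate size of one full batch, in megabytes (1 MB = 1e6 bytes)
batchMB = bytesperdoc*batchsize/1e6;

fprintf("About %.0f bytes per document, %.1f MB per batch\n",bytesperdoc,batchMB)
```

For this data set, the estimate is roughly 3.3 KB per document and about 49 MB per batch of 15,000 documents. Treat this as a lower bound on working memory, because the accumulated flightdata variable also grows with every batch.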
Close MongoDB C++ Interface Connection
close(conn)
See Also
mongoc | isopen | count | find | close