Jul 15, 2025
How to go from S3 to MongoDB with no code using Unstructured
Unstructured
Modern AI systems—whether you're building Retrieval-Augmented Generation (RAG), semantic search, document intelligence, or agentic workflows—depend on transforming unstructured content into structured, vectorized knowledge. But connecting your cloud storage to a vector database like MongoDB Atlas typically means writing custom code, standing up infrastructure, and managing orchestration.
This tutorial shows you how to skip all of that.
With Unstructured's Workflow builder, you can ingest PDFs, Word docs, HTML, and more from Amazon S3, apply powerful parsing and metadata enrichment, generate embeddings, and send the results directly into MongoDB, all from your browser. Whether you're prototyping a chatbot or deploying a production-grade search system, this is the fastest way to turn unstructured data into AI-ready vectors.
You'll walk through building a full document ingestion pipeline from S3 to MongoDB, entirely in the UI. No orchestration code required.
What You'll Learn
In this tutorial, we'll show you how to:
Pull documents from an S3 bucket
Partition text from PDFs, DOCX, HTML, and other document types
Apply enrichments (image captions, table summaries, etc.)
Generate embeddings
Push data into MongoDB for RAG or search applications
Step 1: Connect Your S3 Bucket in Unstructured
🔑 Retrieve AWS Security Credentials
Navigate to the top bar in AWS and click your account ID in the top right
Scroll down to Security Credentials
Scroll to the Access keys section and click Create access key
You'll receive an Access Key ID and Secret Access Key
Click Download .csv file to keep a local copy of the keys for reference
🪣 Create a New S3 Bucket
This bucket will contain your input PDFs.
In the AWS Console, go to Amazon S3 → Buckets, then click Create bucket
Use a name like nicks-demo-s3-bucket
Keep Block all public access checked
Leave all other settings as default
Click Create bucket
📄 Upload Your Files
Locate your new bucket in the list and click its name
Click the Upload button
Click Add files and select the documents you want to upload (PDF, DOCX, HTML, JPEG, etc.)
Copy the full Destination URI (e.g., s3://nicks-demo-s3-bucket)
Scroll down and click Upload
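If you'd rather script the upload than use the console, a minimal sketch using boto3 (assuming your access keys are already configured locally; the bucket name and file path are illustrative):

```python
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// Destination URI into (bucket, key prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def upload(local_path: str, destination_uri: str) -> None:
    """Upload one local file to the Destination URI copied from the console."""
    import boto3  # deferred so parse_s3_uri works even without boto3 installed

    bucket, prefix = parse_s3_uri(destination_uri)
    key = prefix + local_path.rsplit("/", 1)[-1]
    boto3.client("s3").upload_file(local_path, bucket, key)
```

The same Destination URI you paste into the Unstructured connector can be reused here unchanged.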
🔒 Set S3 Bucket Permissions
Navigate to your bucket
Select the Permissions tab at the top
Note: If you're using access keys tied to an account with full S3 read/write permissions, you can leave the bucket policy blank.
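If your keys instead belong to a more restricted IAM identity, you can grant it read access with a bucket policy along these lines (the account ID, user name, and bucket name are placeholders — substitute your own):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:user/unstructured-ingest" },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::nicks-demo-s3-bucket",
        "arn:aws:s3:::nicks-demo-s3-bucket/*"
      ]
    }
  ]
}
```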
Step 2: Create a New S3 Connector in Unstructured
Go to platform.unstructured.io or your organization's tenant address
In the left sidebar, click Connectors
Click + New, ensure Source is selected, and choose Amazon S3
Set a name like nicks-test-s3-connector
Fill in the Bucket URI, AWS Key, and AWS Secret Key
Check Recursive if you want to ingest nested folders
Leave Custom URL blank
Click Save and Test
Upon success, you'll see a confirmation message
Step 3: Set Up MongoDB Atlas
🧾 Create an Account
Visit MongoDB Atlas and sign up.
📁 Create a New Project
Go to Projects → New Project
Name your project (for example, nicks-test-mongodb)
On Add Members and Set Permissions, leave the defaults and click Create Project
🔐 Whitelist Unstructured's IP Addresses
We need to allow Unstructured's connectors to communicate with your MongoDB cluster.
Go to the Project Home Page
In the left-hand sidebar, navigate to Security → Network Access
Click Add IP Address under the IP Access List section
Add each of the following Unstructured IP addresses to the access list, one at a time, in CIDR notation:
104.42.153.20/30
104.45.176.240/30
20.23.19.236/30
20.88.104.236/30
Paste each into the IP input box. Leave other fields default, then click Confirm
After saving, you should see all IPs listed under the access list
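As a sanity check, the four ranges above can be expanded with Python's standard ipaddress module to see exactly which addresses you are allowing:

```python
import ipaddress

UNSTRUCTURED_CIDRS = [
    "104.42.153.20/30",
    "104.45.176.240/30",
    "20.23.19.236/30",
    "20.88.104.236/30",
]

# Each /30 block covers 4 consecutive IPv4 addresses.
networks = [ipaddress.ip_network(cidr) for cidr in UNSTRUCTURED_CIDRS]
allowed = [str(ip) for net in networks for ip in net]
print(len(allowed))  # 4 blocks x 4 addresses = 16
```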
🗂 Deploy a New Cluster
In the Project Home Page, go to Database → Clusters
Click Build a Cluster
Select the M10 tier
Important: Connectors to M0 and Flex clusters are currently not supported because MongoDB applies a weaker encryption standard to those tiers.
Choose your preferred cloud provider (AWS, Google Cloud, Azure). For this demo:
Cloud Provider: Azure
Tier: M10
(Optional) Set Cluster Name to Cluster0
Ensure Quick Setup → Automate security setup is checked and Preload sample dataset is unchecked
Click Create Deployment
👤 Create a Database User
A window will prompt you to create credentials
Click Create Database User
Record your username and password — you'll use them to configure the Unstructured connector
You should see confirmation that a database was added
🔗 Get the MongoDB URI
Under Choose a connection method, select Drivers
In Connecting with MongoDB Driver, scroll to section 3: Add your connection string into your application code
Copy the URI — you'll need it for Unstructured setup
Example format: mongodb+srv://<username>:<password>@cluster0.xxxxx.mongodb.net/
Click Done to finish
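Before wiring up Unstructured, you can confirm the cluster is reachable with a short pymongo sketch. The cluster host below is a placeholder, and note that usernames and passwords containing special characters must be percent-encoded in the URI:

```python
from urllib.parse import quote_plus


def build_uri(username: str, password: str, host: str) -> str:
    """Build an Atlas SRV connection string, percent-encoding the credentials."""
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}/"


def ping(uri: str) -> None:
    """Round-trip to the cluster (requires pymongo and the IP access list above)."""
    from pymongo import MongoClient  # deferred so build_uri works without pymongo

    MongoClient(uri).admin.command("ping")
```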
👤 (Optional) Manually Add a New Database User
If you skipped the earlier prompt to create a user during the database setup:
Go to Database Access → Add New Database User
Fill in the fields:
Username: testuser
Password: testpassword
Role: Read and Write to any database
📚 Add Your Own Data
Navigate to Database → Clusters
Select your cluster
At the top, click Collections → Add My Own Data
Set:
Database name: your choice
Collection name: your choice
Preferences: Clustered Index Collection
Click Create
🎉 You should now see your cluster and collection listed.

Step 4: Create a MongoDB Destination Connector in Unstructured
Log in to Unstructured
For a full walkthrough of setting up a MongoDB destination connector, see the official guide (Unstructured Docs - MongoDB Destination), or follow the instructions below:
Go to platform.unstructured.io or the tenant address of your organization
Select Connectors in the left-hand panel
Click the + New button in the sidebar
Set a name for the connector
Make sure Destination is highlighted and choose the MongoDB destination connector
Click Continue
Enter the database and collection names that you created for your cluster in MongoDB Atlas.
Add the connection string for your cluster that you created during initialization.
Click Test Connection — you should get a success message
Step 5: Create a Workflow in Unstructured
From the main dashboard, click Workflows → New Workflow
Select Build it for me
Name your workflow, choose the previously created source and destination connectors, then click Continue
Use the automatic partitioning strategy, default embedding model and size
Leave other settings default, then click Complete
Optional: Adjust the Embedder
Go to the Embedder segment of your workflow
Click the gear icon in the top right
Choose your embedding model
Step 6: Run & Test the Workflow
▶️ Full Run
Go to the Workflows page
Click Run next to your workflow
Use the Schedule tab to automate runs
📄 Upload a Sample Document
In your workflow, go to the Source segment
Upload a single document
Click the Results </> icon above the segment to inspect JSON output at every stage
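Each stage emits a list of element records as JSON; the exact fields vary by stage, but a typical record carries a type, the extracted text, and metadata. A small sketch of slicing that output (the sample records below are illustrative, not actual workflow output):

```python
import json

# Illustrative sample of the per-element JSON a workflow stage emits.
sample_output = json.loads("""
[
  {"type": "Title", "text": "Q1 Report", "metadata": {"page_number": 1}},
  {"type": "Table", "text": "North America leads in Q1 sales", "metadata": {"page_number": 2}},
  {"type": "NarrativeText", "text": "Revenue grew 12%.", "metadata": {"page_number": 2}}
]
""")


def elements_of_type(elements, element_type):
    """Filter a stage's JSON output down to one element type."""
    return [el for el in elements if el["type"] == element_type]


tables = elements_of_type(sample_output, "Table")
print(len(tables))  # 1
```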
Step 7: Get More from Your Workflow
🪄 Partitioning Strategy
The default auto strategy detects document structure (titles, tables, images) and selectively applies VLM parsing; see the Unstructured documentation for details.
🖼️ Image Description Enrichment
Generates human-readable captions for diagrams, photos, and visual elements.
When useful:
Instruction manuals with schematics
Research reports with charts
Scanned docs with key visual content
📊 Table Summary Enrichment
Converts tables to natural language summaries (e.g., 'North America leads in Q1 sales').
Ideal for:
Financial reports
Policy documents
Scanned PDFs with tables
🛠️ Additional Options
Table-to-HTML Enrichment
Named Entity Recognition (NER)
Chunking: by title, character, page, or similarity
Contextual Chunking: prepend summaries to chunks
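Once the workflow has written embeddings to your collection, they can be queried with Atlas Vector Search. A sketch of building the aggregation pipeline with pymongo — the index name and embedding field name here are assumptions, so check the field your workflow actually writes and the vector index you create in Atlas:

```python
def vector_search_pipeline(query_vector, index="vector_index", path="embeddings",
                           num_candidates=100, limit=5):
    """Build an Atlas $vectorSearch aggregation pipeline for a query embedding."""
    return [
        {
            "$vectorSearch": {
                "index": index,          # name of your Atlas vector search index (assumption)
                "path": path,            # field the workflow wrote embeddings to (assumption)
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Return just the element text and the similarity score.
        {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


def search(collection, query_vector):
    """Run the pipeline against a pymongo Collection object."""
    return list(collection.aggregate(vector_search_pipeline(query_vector)))
```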
Conclusion
And that's it!
You now have a fully automated pipeline from S3 documents to enriched, vectorized content in MongoDB, built entirely in Unstructured's UI. Whether launching a RAG system or indexing internal files, this is a fast, reliable starting point.
This no-code approach eliminates the complexity of building custom data pipelines while providing enterprise-grade capabilities for document processing and vector generation.