Building an OCR Application with Mistral OCR AI Model, React and TypeScript

Document processing has traditionally been a complex task requiring multiple tools and OCR engines. With Mistral AI's new OCR capabilities, we can build powerful document processing applications with minimal setup. Let's explore how to implement Mistral OCR in a modern Next.js application.
The Challenge
Processing PDFs and extracting structured text has always been challenging. Developers often face:
- Complex OCR library setups
- Inconsistent text formatting
- Poor handling of multi-page documents
- Limited markdown support
- High costs for enterprise solutions
The appeal of Mistral OCR is that it is both developer-friendly and production-ready. We can use it to build our own document processing pipelines and get clean, structured output in markdown format.
How It Works
Our approach offers several powerful features that make document processing a breeze:
- Native Markdown Output - The API returns formatted markdown, so no post-processing is needed to recover structure
- Multi-page Support - Each page comes back as its own entry, so multi-page documents are handled page by page
- Clean Text Formatting - The document's structure is preserved in the markdown output
- Error Handling - Each API response is checked and failures are surfaced to the UI
- Type Safety - The OCR response is described with TypeScript interfaces
Setting Up the Project
Let's get our development environment ready! First, we'll create a Next.js project and install the necessary dependencies. The setup process is straightforward and takes about 5 minutes.
# Create a new Next.js project
npx create-next-app@latest my-ocr-app --typescript --tailwind --app
# Move into the project directory
cd my-ocr-app
# Install required dependencies
npm install react-markdown
Before we start coding, add your Mistral API key to a .env.local file in the project root. The key is only read on the server, so it never reaches the browser:
MISTRAL_API_KEY=your-api-key-here
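A missing key otherwise only shows up later as a failed API request, so it can help to fail fast. Here is a minimal sketch of that check; the helper name requireMistralApiKey and the EnvVars type are hypothetical and not part of the original project:

```typescript
// Hypothetical helper: reads the key from an env-like object and fails fast if it is missing.
type EnvVars = Record<string, string | undefined>

function requireMistralApiKey(env: EnvVars): string {
  // Treat a whitespace-only value the same as a missing one
  const key = env.MISTRAL_API_KEY?.trim()
  if (!key) {
    throw new Error('MISTRAL_API_KEY is not set; add it to your environment and restart the server')
  }
  return key
}

// Usage on the server (assumed): const apiKey = requireMistralApiKey(process.env)
```

Passing the environment in as an argument keeps the helper easy to test without touching the real process environment.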
Complete Code
Here's our implementation. I've broken it down into clear, manageable parts so you can follow along easily:
You can check out the full code in the repo: https://github.com/thestriver/ai-ocr-playground
Backend Implementation
Here's the core implementation for processing documents with Mistral OCR:
interface MistralOCRPage {
  index: number
  markdown: string
  images: any[]
  dimensions: {
    dpi: number
    height: number
    width: number
  }
}

interface MistralOCRResponse {
  pages: MistralOCRPage[]
}
async function processPDF(file: File): Promise<string> {
  // Step 1: Upload the file
  const formData = new FormData()
  formData.append('purpose', 'ocr')
  // A File is already a Blob, so it can be appended directly
  formData.append('file', file, file.name)
  const uploadResponse = await fetch('https://api.mistral.ai/v1/files', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.MISTRAL_API_KEY}`,
    },
    body: formData
  })
  if (!uploadResponse.ok) {
    throw new Error('File upload failed')
  }
  const { id: fileId } = await uploadResponse.json()

  // Step 2: Get signed URL
  const signedUrlResponse = await fetch(
    `https://api.mistral.ai/v1/files/${fileId}/url?expiry=24`,
    {
      method: 'GET',
      headers: {
        'Authorization': `Bearer ${process.env.MISTRAL_API_KEY}`,
        'Accept': 'application/json'
      }
    }
  )
  if (!signedUrlResponse.ok) {
    throw new Error('Failed to get signed URL')
  }
  const { url: signedUrl } = await signedUrlResponse.json()

  // Step 3: Process with OCR
  const ocrResponse = await fetch('https://api.mistral.ai/v1/ocr', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.MISTRAL_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'mistral-ocr-latest',
      document: {
        type: 'document_url',
        document_url: signedUrl,
      },
      include_image_base64: false
    })
  })
  if (!ocrResponse.ok) {
    throw new Error('OCR request failed')
  }
  const ocrData = await ocrResponse.json() as MistralOCRResponse

  // Join the per-page markdown, separating pages with a blank line
  return ocrData.pages?.map(page => page.markdown).join('\n\n') ?? ''
}
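The frontend below posts the file to /api/ocr and expects a JSON body with result and processingTime, so processPDF needs a route around it. Here is one way that route could look, as a sketch under assumptions: the file path app/api/ocr/route.ts, the createOcrHandler factory, the jsonResponse helper, and the choice of milliseconds for processingTime are all hypothetical and may differ from the linked repo.

```typescript
// Hypothetical route handler sketch for app/api/ocr/route.ts.
// The OCR function is injected so the handler can be exercised without network access.
type OcrFn = (file: File) => Promise<string | undefined>

function jsonResponse(body: unknown, status = 200): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { 'Content-Type': 'application/json' },
  })
}

function createOcrHandler(runOcr: OcrFn) {
  return async function POST(request: Request): Promise<Response> {
    const startedAt = Date.now()
    try {
      const form = await request.formData()
      const file = form.get('file')
      // form.get returns a string for plain fields and null when the field is absent
      if (file === null || typeof file === 'string') {
        return jsonResponse({ error: 'No file provided' }, 400)
      }
      const markdown = await runOcr(file)
      // Shape matches the OCRResult interface used by the useOCR hook
      return jsonResponse({
        result: markdown ?? '',
        processingTime: Date.now() - startedAt, // assumed unit: milliseconds
      })
    } catch (err) {
      const message = err instanceof Error ? err.message : 'Unknown error'
      return jsonResponse({ error: message }, 500)
    }
  }
}

// In route.ts this would be wired up as (assumed):
// export const POST = createOcrHandler(processPDF)
```

Injecting the OCR function is a design choice for testability; calling processPDF directly inside the handler would work just as well.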
Frontend Implementation
Let's create a custom hook to manage the OCR state and processing:
// hooks/use-ocr.ts
import { useState } from "react"

interface OCRResult {
  result: string
  processingTime: number
}

export function useOCR() {
  const [isProcessing, setIsProcessing] = useState(false)
  const [result, setResult] = useState<OCRResult | null>(null)
  const [error, setError] = useState<string | null>(null)

  const processDocument = async (file: File) => {
    try {
      setIsProcessing(true)
      setError(null)
      // Send the file to our own API route, which talks to Mistral on the server
      const formData = new FormData()
      formData.append("file", file)
      const response = await fetch("/api/ocr", {
        method: "POST",
        body: formData,
      })
      if (!response.ok) {
        throw new Error(`OCR processing failed: ${response.statusText}`)
      }
      const data = await response.json()
      setResult(data)
    } catch (err) {
      // Anything can be thrown, so only read .message from real Error objects
      setError(err instanceof Error ? err.message : "Unknown error")
    } finally {
      setIsProcessing(false)
    }
  }

  return { isProcessing, result, error, processDocument }
}
And here's how to use it in your page:
// app/page.tsx
"use client"
import { useOCR } from "@/hooks/use-ocr"
import ReactMarkdown from 'react-markdown'

export default function OCRPage() {
  const { isProcessing, result, error, processDocument } = useOCR()

  const handleFileUpload = async (event: React.ChangeEvent<HTMLInputElement>) => {
    const file = event.target.files?.[0]
    if (file) {
      await processDocument(file)
    }
  }

  return (
    <div className="max-w-4xl mx-auto p-4">
      <h1 className="text-2xl font-bold mb-4">Mistral OCR Demo</h1>
      <input
        type="file"
        accept="application/pdf"
        onChange={handleFileUpload}
        className="mb-4"
      />
      {isProcessing && <div>Processing document...</div>}
      {error && (
        <div className="text-red-500 mb-4">{error}</div>
      )}
      {result && (
        <div className="border rounded-lg p-4">
          <div className="prose dark:prose-invert">
            <ReactMarkdown>{result.result}</ReactMarkdown>
          </div>
        </div>
      )}
    </div>
  )
}
Best Practices
- Error Handling: Implement robust error handling at each step
- File Validation: Check file types and sizes before processing
- Progress Tracking: Implement processing status updates
- Cleanup: Delete uploaded files after processing
- Caching: Cache results for frequently accessed documents
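As an illustration of the file validation point, here is a minimal sketch of a check that could run before the upload, either in the browser or in the API route. The validateFile name, the FileLike type, and the 10 MB cap are assumptions for the example, not limits taken from the Mistral API:

```typescript
// Hypothetical limits; adjust them to whatever your app and the OCR provider allow.
const MAX_FILE_BYTES = 10 * 1024 * 1024 // assumed cap of 10 MB
const ALLOWED_TYPES = ['application/pdf']

// Structural subset of the browser File type, so plain objects work too
interface FileLike {
  name: string
  type: string
  size: number
}

// Returns an error message, or null when the file passes every check
function validateFile(file: FileLike): string | null {
  if (!ALLOWED_TYPES.includes(file.type)) {
    return `Unsupported file type: ${file.type || 'unknown'}`
  }
  if (file.size === 0) {
    return 'File is empty'
  }
  if (file.size > MAX_FILE_BYTES) {
    return `File is too large (max ${MAX_FILE_BYTES / (1024 * 1024)} MB)`
  }
  return null
}
```

Returning a message instead of throwing lets the caller decide how to report it, for example by passing it to the error state in the hook or by returning a 400 response from the route.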
Conclusion
Mistral OCR reduces document processing to a short workflow that any developer can implement: upload the file, request a signed URL, and run OCR on it. This approach makes enterprise-grade OCR capabilities accessible without a complex setup.
The combination of powerful AI models and modern web technologies opens up new possibilities for document processing and analysis. Whether you're building a small document processing tool or a large-scale enterprise solution, Mistral OCR provides the capabilities you need.
📫 DM Me for consulting inquiries and professional work.