Ottoman/Arabic End-to-End Pipeline
Unlocking 600 years of history through Artificial Intelligence.
Millions of pages of Ottoman/Arabic documents—spanning books, journals, and historical archives—have remained largely inaccessible for over a century, readable only by a select few experts. Manually transcribing this vast cultural heritage is a task far beyond any realistic constraints of time and budget.
By synthesizing state-of-the-art Computer Vision, Deep Learning, and Natural Language Processing (NLP), we provide an automated workflow to digitize, transcribe, and translate printed Ottoman documents into Modern Turkish.
Segmentation
Before recognition begins, raw document images undergo advanced layout analysis. Our algorithms separate text blocks, detect headers, and perform precise line segmentation to isolate content for the neural networks.
Optical Character Recognition
We utilize OCR architectures specifically trained on millions of Ottoman text lines. This allows for high-accuracy character recognition even on aged or lower-quality historical documents.
NLP & Transliteration
Raw text is processed through a complex NLP pipeline including orthographic normalization, morphological analysis, and semantic resolution. The system converts the Ottoman alphabet to the Latin alphabet and modernizes the lexicon for contemporary readers.
Technical Specifications
Authentication
The Osmanlica.com API uses API keys to authenticate requests. You must include your unique key in the headers of every request you make to the API.
1. Get an API Key
You can create and manage your API keys in the Console. We recommend creating separate keys for development and production environments.
2. Pass the Key in Headers
Pass your key using the `Authorization` header with the `Api-Key` prefix.
curl -X POST https://api.osmanlica.com/health \
-H "Authorization: Api-Key YOUR_API_KEY_HERE" \
-H "Content-Type: application/json" \
-d '{"image_base64": "..."}'
Security Best Practice
Your API key carries many privileges, so be sure to keep it secure! Do not share your secret API keys in publicly accessible areas such as GitHub, client-side code, or front-end JavaScript.
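As a minimal sketch of this practice, a client can read the key from an environment variable rather than hard-coding it. The variable name `OSMANLICA_API_KEY` is our own convention here, not part of the API:

```python
import os

def build_auth_headers() -> dict:
    """Read the API key from the environment and build the request headers.

    Raising when the key is missing ensures no hard-coded fallback ever
    slips into client-side or committed code.
    """
    api_key = os.environ.get("OSMANLICA_API_KEY")
    if not api_key:
        raise RuntimeError("Set the OSMANLICA_API_KEY environment variable")
    return {
        "Authorization": f"Api-Key {api_key}",
        "Content-Type": "application/json",
    }
```

The same headers can then be passed to any HTTP client when calling the API.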
1. Segmentation API
The foundation of accurate OCR lies in precise document layout analysis. The Segmentation API decomposes complex historical documents into their constituent structural elements.
Powered by Deep Learning
Historical Ottoman documents often feature irregular layouts, marginalia, and tight line spacing that confuse standard OCR engines. Our segmentation models have been trained on a proprietary dataset of thousands of verified Ottoman documents.
By mathematically isolating text regions before recognition begins, we dramatically reduce noise and lower the character error rate (CER) of the downstream OCR process.
Line Segmentation
Detects baselines and separates individual lines of text. Essential for feeding the CRNN OCR engine.
Block Segmentation
Identifies paragraphs and text columns to determine distinct regions of content.
Layout Analysis
The Holistic View: Combines block and line segmentation with orientation detection to map the full page structure and establish a logical reading order.
/segmentation/line
Detects and segments individual text lines within a document. Our model allows for the detection of dense, small text lines in historical manuscripts without the information loss typical of standard resizing methods.
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Direct URL to the document image. (Required if image_file is not provided) |
| image_file | file | Binary image file upload (Multipart/form-data). |
| confidence_threshold | float | Minimum confidence score. Default: 0.3 |
Example Request (JSON)
Example Response
Note: The lines array returns a list of polygons. Each
polygon is a list of [x, y] coordinates tracing the exact contour of
the text line, providing higher precision than standard bounding boxes.
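For clients that still need rectangular regions (e.g., for cropping), a polygon can be reduced to its axis-aligned bounding box. A minimal sketch:

```python
def polygon_to_bbox(polygon):
    """Convert a polygon (a list of [x, y] points, as returned in the
    `lines` array) to an axis-aligned bounding box
    (x_min, y_min, x_max, y_max). Precision is lost relative to the
    original contour, but the box is convenient for simple cropping."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return (min(xs), min(ys), max(xs), max(ys))
```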
/segmentation/block
Identifies and isolates distinct regions of text (paragraphs, columns, marginalia) within the document. The architecture is optimized for instance segmentation, allowing the system to distinguish between main content bodies and side notes, which is crucial for establishing the correct logical reading order.
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Direct URL to the document image. (Required if image_file is not provided) |
| image_file | file | Binary image file upload (Multipart/form-data). |
| confidence_threshold | float | Minimum score to detect a block. Default: 0.5 |
Example Request (JSON)
Example Response
Note: The returned blocks are defined by polygon
coordinates rather than bounding boxes. This allows the system to accurately capture
non-rectangular text regions common in artistic or marginal Ottoman script layouts.
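Since Ottoman script reads right to left, a client reassembling block-level results may want a right-to-left, top-to-bottom ordering. The following is a naive heuristic sketch, not the API's internal ordering logic:

```python
def order_blocks_rtl(blocks, row_tolerance=50):
    """Sort block polygons into a right-to-left, top-to-bottom reading order.

    `blocks` is a list of polygons (lists of [x, y] points). Blocks whose
    vertical centers fall within the same `row_tolerance` band are treated
    as one row and ordered right-to-left; rows run top-to-bottom.
    """
    def center(poly):
        xs = [p[0] for p in poly]
        ys = [p[1] for p in poly]
        return (sum(xs) / len(xs), sum(ys) / len(ys))

    centers = [center(b) for b in blocks]
    # Quantize y into coarse rows, then sort by descending x within a row.
    order = sorted(
        range(len(blocks)),
        key=lambda i: (round(centers[i][1] / row_tolerance), -centers[i][0]),
    )
    return [blocks[i] for i in order]
```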
/segmentation/layout
The Holistic Pipeline
This endpoint processes a document through a multi-stage segmentation pipeline. It performs block detection, line-level segmentation, and orientation estimation to reconstruct the structural hierarchy of the page in a machine-readable format.
Processing Logic
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Direct URL to the document image. |
| image_file | file | Binary image file upload. |
Example Request (JSON)
Example Response
Pipeline Note: This endpoint performs automatic coordinate
transformation. Even if the internal model rotates the image to correct orientation
(e.g., 180° flip), the returned coordinates in lines and
points are mapped back to the original image's coordinate
space.
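The endpoint performs this mapping internally; for illustration only, the 180° case reduces to reflecting each coordinate about the image center:

```python
def map_back_180(point, width, height):
    """Map a coordinate from a 180-degree-rotated image back into the
    original image's coordinate space, i.e. the transformation described
    in the note above for the 180-degree flip case."""
    x, y = point
    return (width - x, height - y)
```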
2. OCR API
Convert Ottoman script images into machine-readable digital text.
/ocr/line
Recognizes text from a single, pre-segmented line, powered by our OCR engine. It is ideal for scenarios where segmentation is handled externally or for processing individual snippets of text.
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Direct URL to the cropped line image. |
| image_file | file | Binary image file upload. |
| detect_script | boolean | If true, performs a secondary analysis to determine if the text is handwritten or printed. Default: false |
Example Request (JSON)
Example Response
Performance Note: This endpoint includes an intelligent caching layer. Repeated requests for the same image hash are served instantly from memory, bypassing the OCR inference engine.
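A client can mirror this caching behavior locally to avoid redundant requests entirely. A sketch, where `run_ocr` is a hypothetical stand-in for the actual /ocr/line call:

```python
import hashlib

_cache = {}

def ocr_line_cached(image_bytes, run_ocr):
    """Memoize OCR results by image content hash, mirroring the
    server-side caching described above. `run_ocr` is a stand-in
    callable representing the real /ocr/line request."""
    key = hashlib.sha256(image_bytes).hexdigest()
    if key not in _cache:
        _cache[key] = run_ocr(image_bytes)
    return _cache[key]
```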
/ocr/page
End-to-End Document Understanding
The Page OCR endpoint is the comprehensive solution for full document digitization. It orchestrates the entire AI pipeline—segmentation, layout analysis, orientation correction, and character recognition—into a single API call. It returns a fully reconstructed textual representation of the page, ordered logically from right-to-left and top-to-bottom.
1. Detect Blocks
Isolate paragraphs
2. Detect Lines
Granular extraction
3. Angle & OCR
Correct & Recognize
4. Reassemble
Ordered Output
Performance Optimization
This endpoint includes a skip feature. If you have already processed the
image via /segmentation/layout, you can pass that result in the
layout_analysis parameter. The OCR engine will skip the segmentation
phase and immediately process the text based on your provided coordinates.
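A minimal request-body builder illustrating the skip feature, assuming JSON bodies mirror the documented parameter names:

```python
def build_page_ocr_payload(image_url, layout_analysis=None, detect_script=False):
    """Assemble a JSON body for /ocr/page. Passing a stored
    /segmentation/layout result lets the server skip its own
    segmentation phase."""
    payload = {"image_url": image_url, "detect_script": detect_script}
    if layout_analysis is not None:
        payload["layout_analysis"] = layout_analysis
    return payload
```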
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Direct URL to the document image. |
| image_file | file | Binary image file upload. |
| layout_analysis | object | Optional The exact JSON object returned from the Layout API. If provided, segmentation is skipped. |
| detect_script | boolean | Identify handwriting vs print. Default: false |
Example Request (JSON)
Example Response
Mapping Logic: The ocr_result string contains line
breaks (\n). These lines correspond 1-to-1 with the order of the
polygons in the layout_segmentation[].lines array. This allows you to
highlight the exact location of every recognized sentence on the original image.
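The 1-to-1 mapping above can be exploited with a simple zip over the flattened line polygons. A sketch:

```python
def align_text_to_lines(ocr_result, layout_segmentation):
    """Pair each recognized text line with its source polygon.

    `ocr_result` is the newline-joined text; `layout_segmentation` is the
    list of blocks, each carrying a `lines` list of polygons in the same
    order as the text lines.
    """
    polygons = [line for block in layout_segmentation for line in block["lines"]]
    texts = ocr_result.split("\n")
    return list(zip(texts, polygons))
```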
Try OCR/HTR
Test the flagship endpoint live. Upload a document to see the full pipeline in action.
3. Text Processing API
Beyond character recognition lies true understanding. The Text Processing API transforms raw Ottoman script into fluent, modern Turkish, bridging the gap between archival history and contemporary readership.
Built on Millions of Lines of History
Our linguistic models are not generic. They are distilled from a massive proprietary parallel corpus spanning millions of lines of literary, administrative, and legal Ottoman texts. By leveraging state-of-the-art Transformer-based architectures, we capture the nuance of archaic vocabulary and complex Persian/Arabic grammatical structures.
Contextual Awareness
Disambiguates words based on sentence context.
Orthographic Correction
Normalizes archaic spelling variations automatically.
Low Latency
Optimized inference for real-time applications.
/text/transliteration
Hybrid Transliteration Engine
This endpoint converts Ottoman Turkish script (Arabic letters) into Modern Turkish (Latin letters). Unlike simple dictionary lookups, it employs an Adaptive Routing Architecture that selects the best translation strategy based on text complexity and length.
Neural Transformer
Complex sentences and paragraphs are routed to our fine-tuned Transformer models. These models understand deep context, managing grammatical shifts and archaic vocabulary within the flow of a full sentence.
Morphological NLP Engine
Short phrases, titles, or isolated words are processed via a rigorous Rule-based NLP Engine. This ensures high-precision orthographic conversion without the "hallucination" risks sometimes present in large language models when context is scarce.
Pipeline Intelligence
- Contextual Integrity Preservation: Orphaned words or extremely short lines are automatically merged with neighboring context before translation to prevent fragmentation errors.
- Smart Batching: The engine splits large texts into optimized batches, processing them in parallel for high throughput.
- Structural Alignment: Post-processing algorithms realign the output to match the input line structure, ensuring the translated text visually mirrors the original document's layout.
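The batching idea can be sketched client-side as well, e.g. to pre-chunk very large texts before submission. A naive sketch; the engine's actual sizing logic is internal:

```python
def batch_lines(lines, max_chars=500):
    """Group consecutive lines into batches capped at roughly `max_chars`
    characters, preserving line order so the output can be realigned
    with the input structure afterwards."""
    batches, current, size = [], [], 0
    for line in lines:
        if current and size + len(line) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        batches.append(current)
    return batches
```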
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| text | string | The raw Ottoman Turkish text string to be transliterated. |
Example Request
Example Response
Try Transliteration
Test the hybrid engine interactively. Type or paste Ottoman script on the left.
/text/summarization
Intelligent Archival Indexing
This endpoint is powered by a custom Large Language Model (LLM) fine-tuned for historical taxonomy. It goes beyond simple abstraction; it generates multi-tiered summaries optimized for archival search and extracts semantic metadata (entities and terminology) to structure unstructured history.
Output Feature 1: Multi-Tiered Summaries
Short Summary
Length: ~0.5x
A concise digest retaining core "WH" elements (Who, What, Where). Perfect for list views and mobile interfaces.
Main Summary
The gold standard for archival indexing. It comprehensively answers the 5 Ws (Who, When, Where, Why, How), mimicking the style of expert archivists.
Long Summary
Length: ~2.0x
A detailed elaboration including background context, causes, effects, and secondary actors. Ideal for research deep-dives.
Output Feature 2: Semantic Extraction
Named Entity Recognition (NER)
Identifies and categorizes specific actors and occurrences.
Historical Terminology (Istılah)
Filters out common Turkish words to isolate specific Ottoman historical terms (Istılah Terimleri). This creates a highly specific index of legal, administrative, and literary jargon found in the corpus.
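The filtering step can be pictured as a stopword subtraction over the token stream. A sketch with a caller-supplied common-word set; the API's own lexicon is proprietary:

```python
def extract_istilah(tokens, common_words):
    """Surface candidate historical terms by removing common modern
    Turkish words. Returns unique terms in first-seen order."""
    seen = set()
    terms = []
    for token in tokens:
        word = token.lower()
        if word not in common_words and word not in seen:
            seen.add(word)
            terms.append(word)
    return terms
```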
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| text | string | The Ottoman Turkish text (in Latin script) to be analyzed. |
Example Request
Example Response
4. Orientation API
Historical scanning processes are rarely perfect. The Orientation API uses specialized classification models to detect the cardinal rotation of text regions, ensuring that downstream OCR models receive upright, readable input.
Discrete Classification Logic
Unlike the Segmentation API which handles fine skew correction (e.g., 2° or 3° tilts), this API is designed for Gross Orientation Correction. It utilizes lightweight, high-speed networks to classify input images into one of four cardinal directions: 0°, 90°, 180°, or 270°.
Output Reference Guide
/orientation/line
Determines the cardinal orientation of a single text line. This endpoint uses a specialized network trained on narrow, high-aspect-ratio image strips. It is critical for validating the output of segmentation before passing it to OCR, as upside-down text (180°) will result in gibberish output from the recognition engine.
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Direct URL to the cropped line image. |
| image_file | file | Binary image file upload. |
Example Request
Example Response
Correction Strategy: If the result is 180, you must
rotate the image 180 degrees before sending it to the OCR endpoint. If the result is
0, proceed directly.
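The correction strategy generalizes to all four cardinal classes: apply the counter-rotation before OCR. A sketch:

```python
def correction_angle(detected_orientation):
    """Given the detected cardinal orientation (0, 90, 180, or 270
    degrees), return the counter-clockwise rotation to apply so the
    line is upright before it is sent to the OCR endpoint."""
    if detected_orientation not in (0, 90, 180, 270):
        raise ValueError("orientation must be one of 0, 90, 180, 270")
    return (360 - detected_orientation) % 360
```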
/orientation/block
Detects the orientation of a text block or a full page. This model is trained on macro-level features (paragraph structure, margins, and layout density) rather than individual line strokes.
Use Case: Bulk Page Correction
This endpoint is the most efficient way to pre-process scanned archives. Instead of checking
the orientation of 50 individual lines, you can check the whole page once. If the page
returns 0°, you can safely assume all lines are upright, saving significant
computational resources.
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Direct URL to the block or full page image. |
| image_file | file | Binary image file upload. |
Example Request
Example Response
5. Other Utilities
A suite of auxiliary computer vision tools designed to enhance document quality and extract metadata before the main pipeline processing begins.
Super Resolution
Historical documents are often scanned at low DPI. Our AI upscaler reconstructs missing pixel details to improve OCR accuracy on degraded text.
Style Classification
Automatically routes documents to the correct OCR engine by detecting if a page is Printed (Matbu) or Handwritten (Rika/Nesih).
/image/upscale
Increases the resolution of an image using Deep Convolutional Neural Networks. Unlike standard bicubic interpolation, this model "hallucinates" plausible high-frequency details based on the texture of paper and ink.
Note: This endpoint returns a binary image stream (image/webp),
not JSON. The response time scales linearly with the input image size and the chosen scale
factor.
Body Parameters (Multipart/Form-Data)
| Parameter | Type | Description |
|---|---|---|
| image_file | file | Binary image file. |
| image_url | string | URL to the image (Alternative to file upload). |
| scale_factor | integer | Magnification level. Supported values: 1, 2, 4. Default: 2 |
Example Request (cURL)
curl -X POST /image/upscale \
-F "[email protected]" \
-F "scale_factor=4" \
--output doc_4x.webp
Example Response
/image/script-recognition
Analyzes the visual strokes of the document to determine if it is handwritten or printed. This probability score helps in deciding whether to send the image to a standard OCR engine or a specialized handwritten (HTR) model.
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Direct URL to the document image. |
| image_file | file | Binary image file upload. |
Example Request
Example Response
Logic: The is_handwritten boolean is automatically set
to true if the probability exceeds 0.5.
However, for ambiguous documents, you may want to implement your own threshold logic
using the raw probability score.
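A custom-threshold sketch of the routing decision the note suggests; the `classify_script` helper and the string labels are our own illustration, not part of the API:

```python
def classify_script(probability, threshold=0.5):
    """Decide handwritten vs printed from the raw probability score.

    The API fixes `is_handwritten` at a 0.5 cutoff; raising `threshold`
    routes only confidently handwritten pages to the HTR model, which
    can be useful for ambiguous scans.
    """
    return "handwritten" if probability > threshold else "printed"
```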