Ottoman/Arabic End-to-End Pipeline
Unlocking 600 years of history through Artificial Intelligence.
Millions of pages of Ottoman/Arabic documents—spanning books, journals, and historical archives—have remained largely inaccessible for over a century, readable only by a select few experts. Manually transcribing this vast cultural heritage is a task far beyond any realistic constraints of time and budget.
By synthesizing state-of-the-art Computer Vision, Deep Learning, and Natural Language Processing (NLP), we provide an automated workflow to digitize, transcribe, and translate printed Ottoman documents into Modern Turkish.
Segmentation
Before recognition begins, raw document images undergo advanced layout analysis. Our algorithms separate text blocks, detect headers, and perform precise line segmentation to isolate content for the neural networks.
Optical Character Recognition
We utilize OCR architectures specifically trained on millions of Ottoman text lines. This allows for high-accuracy character recognition even on aged or lower-quality historical documents.
NLP & Transliteration
Raw text is processed through a complex NLP pipeline including orthographic normalization, morphological analysis, and semantic resolution. The system converts the Ottoman alphabet to the Latin alphabet and modernizes the lexicon for contemporary readers.
Technical Specifications
Authentication
The Osmanlica.com API uses API keys to authenticate requests. You must include your unique key in the headers of every request you make to the API.
1. Get an API Key
You can create and manage your API keys in the Console. We recommend creating separate keys for development and production environments.
2. Pass the Key in Headers
Pass your key using the `Authorization` header with the `Api-Key` prefix.
curl -X POST https://api.osmanlica.com/health \
-H "Authorization: Api-Key YOUR_API_KEY_HERE" \
-H "Content-Type: application/json" \
-d '{"image_base64": "..."}'
Security Best Practice
Your API key carries many privileges, so be sure to keep it secure! Do not share your secret API keys in publicly accessible areas such as GitHub, client-side code, or front-end JavaScript.
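As a minimal sketch of this practice, a client can read the key from an environment variable rather than hard-coding it. The variable name `OSMANLICA_API_KEY` is our own convention here, not part of the API:

```python
import os

def build_auth_headers() -> dict:
    """Read the API key from the environment and build the request headers.

    Raising when the key is missing ensures no hard-coded fallback ever
    slips into client-side or committed code.
    """
    api_key = os.environ.get("OSMANLICA_API_KEY")
    if not api_key:
        raise RuntimeError("Set the OSMANLICA_API_KEY environment variable")
    return {
        "Authorization": f"Api-Key {api_key}",
        "Content-Type": "application/json",
    }
```

The same headers can then be passed to any HTTP client when calling the API.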
1. Segmentation API
The foundation of accurate OCR lies in precise document layout analysis. The Segmentation API decomposes complex historical documents into their constituent structural elements.
Powered by Deep Learning
Historical Ottoman documents often feature irregular layouts, marginalia, and tight line spacing that confuse standard OCR engines. Our segmentation models have been trained on a proprietary dataset of thousands of verified Ottoman documents.
By mathematically isolating text regions before recognition begins, we dramatically reduce noise and lower the character error rate (CER) of the downstream OCR process.
Line Segmentation
Detects baselines and separates individual lines of text. Essential for feeding the CRNN OCR engine.
Block Segmentation
Identifies paragraphs and text columns to determine distinct regions of content.
Layout Analysis
The Holistic View: Combines block and line segmentation with orientation detection to map the full page structure and establish a logical reading order.
/segmentation/line
Detects and segments individual text lines within a document. Our model allows for the detection of dense, small text lines in historical manuscripts without the information loss typical of standard resizing methods.
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Direct URL to the document image. (Required if image_file is not provided) |
| image_file | file | Binary image file upload (Multipart/form-data). |
| confidence_threshold | float | Minimum confidence score. Default: 0.3 |
Example Request (JSON)
Example Response
Note: The lines array returns a list of polygons. Each
polygon is a list of [x, y] coordinates tracing the exact contour of
the text line, providing higher precision than standard bounding boxes.
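For clients that still need rectangular regions (e.g., for cropping), a polygon can be reduced to its axis-aligned bounding box. A minimal sketch:

```python
def polygon_to_bbox(polygon):
    """Convert a polygon (a list of [x, y] points, as returned in the
    `lines` array) to an axis-aligned bounding box
    (x_min, y_min, x_max, y_max). Precision is lost relative to the
    original contour, but the box is convenient for simple cropping."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return (min(xs), min(ys), max(xs), max(ys))
```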
/segmentation/block
Identifies and isolates distinct regions of text (paragraphs, columns, marginalia) within the document. The architecture is optimized for instance segmentation, allowing the system to distinguish between main content bodies and side notes, which is crucial for establishing the correct logical reading order.
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Direct URL to the document image. (Required if image_file is not provided) |
| image_file | file | Binary image file upload (Multipart/form-data). |
| confidence_threshold | float | Minimum score to detect a block. Default: 0.5 |
Example Request (JSON)
Example Response
Note: The returned blocks are defined by polygon
coordinates rather than bounding boxes. This allows the system to accurately capture
non-rectangular text regions common in artistic or marginal Ottoman script layouts.
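Since Ottoman script reads right to left, a client reassembling block-level results may want a right-to-left, top-to-bottom ordering. The following is a naive heuristic sketch, not the API's internal ordering logic:

```python
def order_blocks_rtl(blocks, row_tolerance=50):
    """Sort block polygons into a right-to-left, top-to-bottom reading order.

    `blocks` is a list of polygons (lists of [x, y] points). Blocks whose
    vertical centers fall within the same `row_tolerance` band are treated
    as one row and ordered right-to-left; rows run top-to-bottom.
    """
    def center(poly):
        xs = [p[0] for p in poly]
        ys = [p[1] for p in poly]
        return (sum(xs) / len(xs), sum(ys) / len(ys))

    centers = [center(b) for b in blocks]
    # Quantize y into coarse rows, then sort by descending x within a row.
    order = sorted(
        range(len(blocks)),
        key=lambda i: (round(centers[i][1] / row_tolerance), -centers[i][0]),
    )
    return [blocks[i] for i in order]
```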
/segmentation/layout
The Holistic Pipeline
This endpoint processes a document through a multi-stage segmentation pipeline. It performs block detection, line-level segmentation, and orientation estimation to reconstruct the structural hierarchy of the page in a machine-readable format.
Processing Logic
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Direct URL to the document image. |
| image_file | file | Binary image file upload. |
Example Request (JSON)
Example Response
Pipeline Note: This endpoint performs automatic coordinate
transformation. Even if the internal model rotates the image to correct orientation
(e.g., 180° flip), the returned coordinates in lines and
points are mapped back to the original image's coordinate
space.
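The endpoint performs this mapping internally; for illustration only, the 180° case reduces to reflecting each coordinate about the image center:

```python
def map_back_180(point, width, height):
    """Map a coordinate from a 180-degree-rotated image back into the
    original image's coordinate space, i.e. the transformation described
    in the note above for the 180-degree flip case."""
    x, y = point
    return (width - x, height - y)
```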
2. OCR API
Convert Ottoman script images into machine-readable digital text.
/ocr/line
Recognizes text from a single, pre-segmented line, powered by our OCR engine. It is ideal for scenarios where segmentation is handled externally or for processing individual snippets of text.
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Direct URL to the cropped line image. |
| image_file | file | Binary image file upload. |
| detect_script | boolean | If true, performs a secondary analysis to determine if the text is handwritten or printed. Default: false |
Example Request (JSON)
Example Response
Performance Note: This endpoint includes an intelligent caching layer. Repeated requests for the same image hash are served instantly from memory, bypassing the OCR inference engine.
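A client can mirror this caching behavior locally to avoid redundant requests entirely. A sketch, where `run_ocr` is a hypothetical stand-in for the actual /ocr/line call:

```python
import hashlib

_cache = {}

def ocr_line_cached(image_bytes, run_ocr):
    """Memoize OCR results by image content hash, mirroring the
    server-side caching described above. `run_ocr` is a stand-in
    callable representing the real /ocr/line request."""
    key = hashlib.sha256(image_bytes).hexdigest()
    if key not in _cache:
        _cache[key] = run_ocr(image_bytes)
    return _cache[key]
```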
/ocr/page
End-to-End Document Understanding
The Page OCR endpoint is the comprehensive solution for full document digitization. It orchestrates the entire AI pipeline—segmentation, layout analysis, orientation correction, and character recognition—into a single API call. It returns a fully reconstructed textual representation of the page, ordered logically from right-to-left and top-to-bottom.
1. Detect Blocks
Isolate paragraphs
2. Detect Lines
Granular extraction
3. Angle & OCR
Correct & Recognize
4. Reassemble
Ordered Output
Performance Optimization
This endpoint includes a skip feature. If you have already processed the
image via /segmentation/layout, you can pass that result in the
layout_analysis parameter. The OCR engine will skip the segmentation
phase and immediately process the text based on your provided coordinates.
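A minimal request-body builder illustrating the skip feature, assuming JSON bodies mirror the documented parameter names:

```python
def build_page_ocr_payload(image_url, layout_analysis=None, detect_script=False):
    """Assemble a JSON body for /ocr/page. Passing a stored
    /segmentation/layout result lets the server skip its own
    segmentation phase."""
    payload = {"image_url": image_url, "detect_script": detect_script}
    if layout_analysis is not None:
        payload["layout_analysis"] = layout_analysis
    return payload
```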
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Direct URL to the document image. |
| image_file | file | Binary image file upload. |
| layout_analysis | object | Optional The exact JSON object returned from the Layout API. If provided, segmentation is skipped. |
| detect_script | boolean | Identify handwriting vs print. Default: false |
Example Request (JSON)
Example Response
Mapping Logic: The ocr_result string contains line
breaks (\n). These lines correspond 1-to-1 with the order of the
polygons in the layout_segmentation[].lines array. This allows you to
highlight the exact location of every recognized sentence on the original image.
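The 1-to-1 mapping above can be exploited with a simple zip over the flattened line polygons. A sketch:

```python
def align_text_to_lines(ocr_result, layout_segmentation):
    """Pair each recognized text line with its source polygon.

    `ocr_result` is the newline-joined text; `layout_segmentation` is the
    list of blocks, each carrying a `lines` list of polygons in the same
    order as the text lines.
    """
    polygons = [line for block in layout_segmentation for line in block["lines"]]
    texts = ocr_result.split("\n")
    return list(zip(texts, polygons))
```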
Try OCR/HTR
Test the flagship endpoint live. Upload a document to see the full pipeline in action.
3. Text Processing API
Beyond character recognition lies true understanding. The Text Processing API transforms raw Ottoman script into fluent, modern Turkish, bridging the gap between archival history and contemporary readership.
Built on Millions of Lines of History
Our linguistic models are not generic. They are distilled from a massive proprietary parallel corpus spanning millions of lines of literary, administrative, and legal Ottoman texts. By leveraging state-of-the-art Transformer-based architectures, we capture the nuance of archaic vocabulary and complex Persian/Arabic grammatical structures.
Contextual Awareness
Disambiguates words based on sentence context.
Orthographic Correction
Normalizes archaic spelling variations automatically.
Low Latency
Optimized inference for real-time applications.
/text/transliteration
Hybrid Transliteration Engine
This endpoint converts Ottoman Turkish script (Arabic letters) into Modern Turkish (Latin letters). Unlike simple dictionary lookups, it employs an Adaptive Routing Architecture that selects the best translation strategy based on text complexity and length.
Neural Transformer
Complex sentences and paragraphs are routed to our fine-tuned Transformer models. These models understand deep context, managing grammatical shifts and archaic vocabulary within the flow of a full sentence.
Morphological NLP Engine
Short phrases, titles, or isolated words are processed via a rigorous Rule-based NLP Engine. This ensures high-precision orthographic conversion without the "hallucination" risks sometimes present in large language models when context is scarce.
Pipeline Intelligence
- Contextual Integrity Preservation: Orphaned words or extremely short lines are automatically merged with neighboring context before translation to prevent fragmentation errors.
- Smart Batching: The engine splits large texts into optimized batches, processing them in parallel for high throughput.
- Structural Alignment: Post-processing algorithms realign the output to match the input line structure, ensuring the translated text visually mirrors the original document's layout.
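The batching idea can be sketched client-side as well, e.g. to pre-chunk very large texts before submission. A naive sketch; the engine's actual sizing logic is internal:

```python
def batch_lines(lines, max_chars=500):
    """Group consecutive lines into batches capped at roughly `max_chars`
    characters, preserving line order so the output can be realigned
    with the input structure afterwards."""
    batches, current, size = [], [], 0
    for line in lines:
        if current and size + len(line) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        batches.append(current)
    return batches
```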
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| text | string | The raw Ottoman Turkish text string to be transliterated. |
Example Request
Example Response
Try Transliteration
Test the hybrid engine interactively. Type or paste Ottoman script on the left.
/text/summarization
Intelligent Archival Indexing
This endpoint is powered by a custom Large Language Model (LLM) fine-tuned for historical taxonomy. It goes beyond simple abstraction; it generates multi-tiered summaries optimized for archival search and extracts semantic metadata (entities and terminology) to structure unstructured history.
Output Feature 1: Multi-Tiered Summaries
Short Summary
Length: ~0.5x
A concise digest retaining core "WH" elements (Who, What, Where). Perfect for list views and mobile interfaces.
Main Summary
The gold standard for archival indexing. It comprehensively answers the 5 Ws (Who, When, Where, Why, How), mimicking the style of expert archivists.
Long Summary
Length: ~2.0x
A detailed elaboration including background context, causes, effects, and secondary actors. Ideal for research deep-dives.
Output Feature 2: Semantic Extraction
Named Entity Recognition (NER)
Identifies and categorizes specific actors and occurrences.
Historical Terminology (Istılah)
Filters out common Turkish words to isolate specific Ottoman historical terms (Istılah Terimleri). This creates a highly specific index of legal, administrative, and literary jargon found in the corpus.
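The filtering step can be pictured as a stopword subtraction over the token stream. A sketch with a caller-supplied common-word set; the API's own lexicon is proprietary:

```python
def extract_istilah(tokens, common_words):
    """Surface candidate historical terms by removing common modern
    Turkish words. Returns unique terms in first-seen order."""
    seen = set()
    terms = []
    for token in tokens:
        word = token.lower()
        if word not in common_words and word not in seen:
            seen.add(word)
            terms.append(word)
    return terms
```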
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| text | string | The Ottoman Turkish text (in Latin script) to be analyzed. |
Example Request
Example Response
4. Orientation API
Historical scanning processes are rarely perfect. The Orientation API uses specialized classification models to detect the cardinal rotation of text regions, ensuring that downstream OCR models receive upright, readable input.
Discrete Classification Logic
Unlike the Segmentation API which handles fine skew correction (e.g., 2° or 3° tilts), this API is designed for Gross Orientation Correction. It utilizes lightweight, high-speed networks to classify input images into one of four cardinal directions: 0°, 90°, 180°, or 270°.
Output Reference Guide
/orientation/line
Determines the cardinal orientation of a single text line. This endpoint uses a specialized network trained on narrow, high-aspect-ratio image strips. It is critical for validating the output of segmentation before passing it to OCR, as upside-down text (180°) will result in gibberish output from the recognition engine.
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Direct URL to the cropped line image. |
| image_file | file | Binary image file upload. |
Example Request
Example Response
Correction Strategy: If the result is 180, you must
rotate the image 180 degrees before sending it to the OCR endpoint. If the result is
0, proceed directly.
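The correction strategy generalizes to all four cardinal classes: apply the counter-rotation before OCR. A sketch:

```python
def correction_angle(detected_orientation):
    """Given the detected cardinal orientation (0, 90, 180, or 270
    degrees), return the counter-clockwise rotation to apply so the
    line is upright before it is sent to the OCR endpoint."""
    if detected_orientation not in (0, 90, 180, 270):
        raise ValueError("orientation must be one of 0, 90, 180, 270")
    return (360 - detected_orientation) % 360
```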
/orientation/block
Detects the orientation of a text block or a full page. This model is trained on macro-level features (paragraph structure, margins, and layout density) rather than individual line strokes.
Use Case: Bulk Page Correction
This endpoint is the most efficient way to pre-process scanned archives. Instead of checking
the orientation of 50 individual lines, you can check the whole page once. If the page
returns 0°, you can safely assume all lines are upright, saving significant
computational resources.
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Direct URL to the block or full page image. |
| image_file | file | Binary image file upload. |
Example Request
Example Response
5. Other Utilities
A suite of auxiliary computer vision tools designed to enhance document quality and extract metadata before the main pipeline processing begins.
Super Resolution
Historical documents are often scanned at low DPI. Our AI upscaler reconstructs missing pixel details to improve OCR accuracy on degraded text.
Style Classification
Automatically routes documents to the correct OCR engine by detecting if a page is Printed (Matbu) or Handwritten (Rika/Nesih).
/image/upscale
Increases the resolution of an image using Deep Convolutional Neural Networks. Unlike standard bicubic interpolation, this model "hallucinates" plausible high-frequency details based on the texture of paper and ink.
Note: This endpoint returns a binary image stream (image/webp),
not JSON. The response time scales linearly with the input image size and the chosen scale
factor.
Body Parameters (Multipart/Form-Data)
| Parameter | Type | Description |
|---|---|---|
| image_file | file | Binary image file. |
| image_url | string | URL to the image (Alternative to file upload). |
| scale_factor | integer | Magnification level. Supported values: 1, 2, 4. Default: 2 |
Example Request (cURL)
curl -X POST /image/upscale \
-F "[email protected]" \
-F "scale_factor=4" \
--output doc_4x.webp
Example Response
/image/script-recognition
Analyzes the visual strokes of the document to determine if it is handwritten or printed. This probability score helps in deciding whether to send the image to a standard OCR engine or a specialized handwritten (HTR) model.
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| image_url | string | Direct URL to the document image. |
| image_file | file | Binary image file upload. |
Example Request
Example Response
Logic: The is_handwritten boolean is automatically set
to true if the probability exceeds 0.5.
However, for ambiguous documents, you may want to implement your own threshold logic
using the raw probability score.
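A custom-threshold sketch of the routing decision the note suggests; the `classify_script` helper and the string labels are our own illustration, not part of the API:

```python
def classify_script(probability, threshold=0.5):
    """Decide handwritten vs printed from the raw probability score.

    The API fixes `is_handwritten` at a 0.5 cutoff; raising `threshold`
    routes only confidently handwritten pages to the HTR model, which
    can be useful for ambiguous scans.
    """
    return "handwritten" if probability > threshold else "printed"
```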