MarkLogic: Driving New User Experiences

Gary Bloom, President & CEO

It was the 40th anniversary of Saturday Night Live, and NBC’s digital team intended to do “something special” with all the great content the show had generated over the years. Few would have expected what was to come. The network broke new ground by taking high-value content (nearly the entire SNL catalog of skits and performances across every season) and making it all available on its free and ingenious SNL app. The app uses ‘smart content’, in which metadata describes the semantics of the show: seasons, eras, stars, guests, and how they all relate to one another shape the way an individual uses the app. What’s more, the data is perfectly linked to the assets. Behind the scenes, the SNL app is powered by MarkLogic’s operational and transactional enterprise NoSQL database platform, which processes mountains of unstructured data into an audio-visual spectacle and puts semantic data to work to create a customized, continuous streaming experience.

The platform’s semantics and search helped NBC create smart content by enabling the team to integrate, manage, and search a wide range of data (scenes, characters, guest stars, and more) and further enrich the content with user preferences and actions that grew along with the user base. The app not only queued video recommendations but also discovered and recommended additional content based on users’ previous activities.

Today, enterprises such as NBC are realizing that Big Data has a wider role to play than the traditional storage and retrieval of data. Corporations, businesses, and government bodies generate terabytes of data every second in structured and unstructured formats, which makes it difficult to search across that data and extract the value it holds. “What isn't working is the data integration projects that large companies have tried to do with relational databases. Many of those projects have become stalled because of the limitations of rows and columns. We can handle both unstructured and structured data and that means integration is faster and far less costly,” says Gary Bloom, CEO, MarkLogic. Designed to integrate data from silos better and faster, MarkLogic allows firms to integrate data and build their 360-degree view up to four times faster than a traditional database, without sacrificing any of the enterprise features required for storing and managing mission-critical data.

MarkLogic’s database helps enterprises fuse multi-structured data with its built-in search and application services so that the data can be stored, managed, searched, and actually used. Bloom says, “MarkLogic indexes words, values and even the document structure and doesn’t require adherence to a particular schema, creating a new generation of technology for this big data world.”

Living off Innovation

Founded and incorporated in San Carlos, CA, MarkLogic uses XML documents as its data model, and stores the information within an ACID (Atomicity, Consistency, Isolation, Durability), fully transactional NoSQL repository.
People in the NoSQL world have been taught to believe that a NoSQL database cannot have the same characteristics as the granddaddy of databases, the relational database. But when MarkLogic founder Christopher Lindblad designed a “database built for search” nearly 15 years ago, he knew that to be government-grade and enterprise-ready, his database had to have the very features that are most coveted in existing relational databases: consistency and durability. Lindblad realized that enterprises were going to be limited in what they could do with their text-based data unless someone figured out a different way to ingest, index, and store it. “Before Big Data was called Big Data, the MarkLogic team spotted the ‘publishing problem’ that plagues every large enterprise: Digitize content or die. That led the team to build the first-ever NoSQL database.”

He also built search right into the technology stack: if data is going to be stored, it has to be searchable, and bolt-on search engines cause synchronization and latency issues (and consume development resources) that he felt could easily be avoided.

The company has reinforced its approach to data management by adding further enterprise features, including ones that complement and build on clients' investments in Hadoop; for example, clients can easily move data between MarkLogic and Hadoop within applications. The company's NoSQL solution, a specialized and flexible database with world-class authentication, authorization and agility, allows organizations to derive value from their vast stores of Big Data more quickly and easily.

Value Propositions That Set the Company Apart from Its Peers

MarkLogic defines its solution as “a document-centric, transactional, search-centric, structure-aware, schema-agnostic, programmatic, high-performance, clustered, database server.” Utilizing a hierarchical data model, MarkLogic supports “any structured” data in compressed binary “trees.” Ostensibly, MarkLogic handles virtually any kind of structured, unstructured or semi-structured data, from documents, image metadata and video to spreadsheet and financial data.

To manage inserts, updates, and concurrent reads, MarkLogic uses Multi-Version Concurrency Control (MVCC), appending changes and using timestamps to track the birth and death of a document. The benefits of MVCC include support for ACID transactions (critical to the financial industry), fast updates, large sequential block writes (which offer fast ingestion and the ability to run on block storage), point-in-time recovery, fast database rollback, and lock-free reads.
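The birth-and-death bookkeeping described above can be illustrated with a minimal Python sketch. This is not MarkLogic's implementation: the `VersionedStore` class, its method names, and the integer clock are all hypothetical, chosen only to show how appended versions and timestamps allow point-in-time reads without locking.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

FOREVER = float("inf")  # "death" value for a version that is still alive


@dataclass
class Version:
    content: str
    birth: int               # timestamp at which this version became visible
    death: float = FOREVER   # timestamp at which it was superseded or deleted


class VersionedStore:
    """Toy append-only store (hypothetical): updates add versions, never overwrite."""

    def __init__(self) -> None:
        self.clock = 0
        self.versions: Dict[str, List[Version]] = {}

    def _retire_current(self, uri: str) -> None:
        # Mark the live version (if any) as dead at the current timestamp.
        history = self.versions.get(uri, [])
        if history and history[-1].death == FOREVER:
            history[-1].death = self.clock

    def write(self, uri: str, content: str) -> int:
        self.clock += 1
        self._retire_current(uri)
        self.versions.setdefault(uri, []).append(Version(content, birth=self.clock))
        return self.clock

    def delete(self, uri: str) -> int:
        self.clock += 1
        self._retire_current(uri)
        return self.clock

    def read(self, uri: str, at: Optional[int] = None) -> Optional[str]:
        """Return the version alive at timestamp `at` (default: the latest timestamp)."""
        ts = self.clock if at is None else at
        for version in self.versions.get(uri, []):
            if version.birth <= ts < version.death:
                return version.content
        return None
```

Because a reader only compares its own timestamp against each version's birth and death, it never has to wait for a writer, and reading at an older timestamp gives a simple form of point-in-time recovery.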

MarkLogic stores XML document data efficiently, using a binary encoding scheme to maintain high performance and save storage space. According to the whitepaper Inside MarkLogic Server, “The tree structure of a document gets saved using a compact binary encoding. The text nodes get saved using a dictionary-based compression scheme. In this scheme, the text gets tokenized (into words, whitespace and punctuation) and each document constructs its own dictionary, mapping numeric token IDs to token values.

Instead of storing strings as sequences of characters, each string gets stored as a sequence of numeric token IDs. The original string can be reconstructed using the dictionary as a lookup table.”
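The scheme the whitepaper describes can be sketched in a few lines of Python. This is an illustration of the general idea only, not MarkLogic's actual binary format: the tokenizing regular expression and the `encode` and `decode` function names are assumptions made for the example.

```python
import re
from typing import Dict, List, Tuple

# Split text into runs of word characters, runs of whitespace,
# or single punctuation characters, so that no character is lost.
TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]")


def encode(text: str) -> Tuple[List[str], List[int]]:
    """Build a per-document dictionary and the sequence of token IDs."""
    dictionary: List[str] = []        # token ID -> token value
    ids_by_token: Dict[str, int] = {}  # token value -> token ID
    token_ids: List[int] = []
    for token in TOKEN_RE.findall(text):
        if token not in ids_by_token:
            ids_by_token[token] = len(dictionary)
            dictionary.append(token)
        token_ids.append(ids_by_token[token])
    return dictionary, token_ids


def decode(dictionary: List[str], token_ids: List[int]) -> str:
    """Reconstruct the original string using the dictionary as a lookup table."""
    return "".join(dictionary[i] for i in token_ids)
```

Repeated words and separators are stored once in the dictionary and referenced by small integers, which is where the space saving comes from in text with a lot of repetition.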

On Cloud ‘9’

At its MarkLogic World conference in San Francisco, the firm launched version 9 of its namesake NoSQL database. The release adds new data integration, security, and manageability capabilities, and with it the firm plans to target enterprises that need government-grade security. “You can actually describe what entities are, such as ‘This is what a customer is and how it relates to other entities,’” says Joe Pasqua, EVP-Products, MarkLogic. “Today all the focus is on a low-level schema and all that information is encoded into the platform. We’re capturing that in the database to make data more valuable.”

The release also introduced the Optic API, a query mechanism that enables developers to combine documents, triples, and rows flexibly across entities, perform aggregations, and project data into different views. Queries that use the Optic API should run faster because the system now leverages a new underlying index and distributed execution across a cluster. The company also enhanced integration between data from the MarkLogic database, existing SQL tools, and security.

The security enhancements include an advanced encryption capability that uses standards-based cryptography, key management, and separation of duties. Element-level security allows specific elements or properties of XML and JSON documents to be hidden from particular users, providing a more granular level of security than the existing document-level protection; permissions can be specified down to the element level in either format.
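The idea of hiding individual elements from particular users can be illustrated with a short Python sketch built on the standard library's XML parser. It does not use MarkLogic's security API: the `redact` function, the mapping from element names to required roles, and the role names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set


def _prune(parent: ET.Element, protected: Dict[str, str], user_roles: Set[str]) -> None:
    # Copy the child list first, because we may remove children while iterating.
    for child in list(parent):
        required_role = protected.get(child.tag)
        if required_role is not None and required_role not in user_roles:
            parent.remove(child)   # the user may not see this element at all
        else:
            _prune(child, protected, user_roles)


def redact(xml_text: str, protected: Dict[str, str], user_roles: Set[str]) -> str:
    """Return the document with protected elements removed for this user.

    `protected` maps an element name to the role required to read it
    (a hypothetical rule format; the root element itself is not checked).
    """
    root = ET.fromstring(xml_text)
    _prune(root, protected, user_roles)
    return ET.tostring(root, encoding="unicode")
```

Under these assumptions, two users who request the same document receive different views of it: a user holding the required role sees the full document, while any other user sees it with the protected elements omitted.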

Also on the security front, the firm announced a licensing and technology partnership with Cryptsoft, an Australian vendor of enterprise key management security. MarkLogic will embed Cryptsoft’s Key Management Interoperability Protocol (KMIP) technology into MarkLogic 9, giving customers access to a key storage and management suite that complies with the KMIP standard published by the Organization for the Advancement of Structured Information Standards (OASIS).

The ‘Big’ Future

According to Wikibon, a wiki for sharing technology and business knowledge, entities will derive $1.2 trillion in new value from Big Data in the coming decade. “Big Data is big business and companies who do it well will thrive. Those who don’t, will fall behind. Those winning enterprises need to have insight into the data they store and they need to operationalize their business based on this insight,” says Bloom. “We believe NoSQL is the next-generation technology for the modern enterprise. That’s why we’ve spent years engineering our database and honing a winning business model that’s as revolutionary and impactful today as relational was 35 years ago, even more so given the central importance of data in every aspect of the 21st-century business.”

“Our system means enterprises can leave data management to us, whether on-premise or in the cloud, while they grow their business. When we make a change, it’s all fully integrated and the enterprise customer only notices a benefit, nothing more,” says Bloom.

He adds, “For decades to come, enterprises who are data rich but information poor will fall to others who can harness, secure, analyze and act on humungous amounts of insightful data. We will be there, the adult in charge.”


San Carlos, CA

MarkLogic is an operational and transactional Enterprise NoSQL database platform that integrates a firm’s critical data and builds innovative applications on a 360-degree view of that data.
