Amazon DocumentDB Review

Gianluca Della Corte | Systems Architect, Hotels.com in London

Originally published on the Hotels.com Technology blog

On January 9th Amazon announced a new database service called Amazon DocumentDB, which it described as “a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads”.

Is Amazon DocumentDB a real MongoDB?

While it offers a MongoDB-compatible API, DocumentDB does not run MongoDB software. Instead, “Amazon DocumentDB emulates the responses that a client expects from a MongoDB server by implementing the Apache 2.0 open source MongoDB 3.6 API” on top of an undisclosed storage engine. The available information suggests that it is built on the Aurora storage subsystem, which is also used by Aurora MySQL and Aurora PostgreSQL. In fact, the following features and limitations are common to DocumentDB and Aurora:

  • both replicate six copies of data across three AWS Availability Zones
  • both have cluster size limit of 64 TB
  • both do not allow null characters (‘\0’ ) in strings
  • identifiers are limited to 63 letters for both
  • both persist a write-ahead log when writing
  • both don’t need to write full buffer page syncs

High Availability

Amazon DocumentDB is designed for 99.99% availability and replicates six copies of your data across three AWS Availability Zones (AZs). The availability goal is lower when you have fewer instances or when the cluster is deployed in fewer than three AZs:

Fig. 1: DocumentDB availability

An Amazon DocumentDB cluster consists of two components:

  • Cluster volume: each cluster has exactly one cluster volume, which can store up to 64 TB of data.
  • Instances: provide the processing power for the database, writing data to, and reading data from, the cluster storage volume. An Amazon DocumentDB cluster can have 0–16 instances:
     – Primary instance: supports read and write operations and performs all data modifications to the cluster volume. Each Amazon DocumentDB cluster has one primary instance.
     – Replica instance: supports only read operations. An Amazon DocumentDB cluster can have up to 15 replicas in addition to the primary instance.

Fig. 2: Deployment scenario

If the primary instance fails, an Amazon DocumentDB replica is promoted to the primary instance. There is a brief interruption during which read and write requests made to the primary instance fail with an exception. Amazon estimates this interruption is less than 120 seconds.
You can customise the order in which replicas are promoted to the primary instance after a failure by assigning each replica a priority. It is strongly suggested that replicas be of the same instance class as the primary. It is also really important to create one or more Amazon DocumentDB replicas in two or more different Availability Zones, so that your datastore can survive a zone failure.

Scalability & Replication

By placing replica instances in separate Availability Zones, it is possible to scale reads and increase cluster availability.

Compute and storage scale independently. Reads scale by deploying additional replicas, and storage scales up to 64 TB: DocumentDB automatically adds 10 GB whenever the volume reaches capacity.

DocumentDB can also fail over automatically to a read replica in the event of a failure, typically in less than 30 seconds. Currently Amazon DocumentDB doesn’t support any kind of multi-region setup.

Amazon DocumentDB does not rely on replicating data to multiple instances to achieve durability: data is durable whether the cluster contains a single instance or 15 instances.
All writes are processed by the primary instance that executes a durable write to the cluster volume. It then replicates the state of that write (not the data) to each active replica. Writes to an Amazon DocumentDB cluster are atomic within a single document.

Consistency

Reads from Amazon DocumentDB replicas are eventually consistent with minimal replica lag (AWS says usually less than 100 milliseconds) after the primary instance writes the data:

  • reads from an Amazon DocumentDB cluster’s primary instance have read-after-write consistency
  • reads from a read replica have eventual consistency

It is possible to modify the read consistency level by specifying the read preference for the request or connection (it supports all MongoDB read preferences):

  • primary: reads are always routed to the primary instance
  • primaryPreferred: reads are routed to the primary instance under normal operation; in case of failover, a replica is used
  • secondary: reads are only routed to a replica, never the primary instance
  • secondaryPreferred: reads are routed to a read replica when one or more replicas are active. If there are no active replica instances in a cluster, the read request is routed to the primary instance
  • nearest: reads are routed based solely on the measured latency between the client and all instances in the Amazon DocumentDB cluster
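
The routing rules above can be sketched as a small, self-contained Scala model. This is only an illustration of the behaviour described in the list, not the MongoDB driver API: the names Instance, ReadRouting and route are hypothetical, and picking the lowest-latency replica when several replicas are available is an assumption made for the example.

```scala
// Hypothetical model of read preference routing; not the MongoDB driver API.
sealed trait ReadPreference
case object Primary extends ReadPreference
case object PrimaryPreferred extends ReadPreference
case object Secondary extends ReadPreference
case object SecondaryPreferred extends ReadPreference
case object Nearest extends ReadPreference

// One instance of a cluster as seen by the client (illustrative fields).
final case class Instance(name: String, isPrimary: Boolean, available: Boolean, latencyMs: Int)

object ReadRouting {
  // Returns the instance that should serve a read, or None if no instance qualifies.
  def route(pref: ReadPreference, cluster: Seq[Instance]): Option[Instance] = {
    val up = cluster.filter(_.available)
    val primaryInstance = up.find(_.isPrimary)
    // Assumption: among available replicas, prefer the one with the lowest latency.
    val closestReplica = up.filterNot(_.isPrimary).sortBy(_.latencyMs).headOption

    pref match {
      case Primary            => primaryInstance
      case PrimaryPreferred   => primaryInstance.orElse(closestReplica)
      case Secondary          => closestReplica
      case SecondaryPreferred => closestReplica.orElse(primaryInstance)
      case Nearest            => up.sortBy(_.latencyMs).headOption
    }
  }
}
```

For example, with a cluster that has no active replicas, route(SecondaryPreferred, cluster) falls back to the primary instance, while route(Secondary, cluster) returns None.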

Operations

It is possible to create an Amazon DocumentDB cluster using a CloudFormation stack (as described here).

Amazon DocumentDB is a fully managed solution that provides the following features:

  • auto scaling storage (up to 64 TB in 10GB increments)
  • simple compute resource scaling (resources allocated to an instance can be modified by changing instance class)
  • built-in monitoring, fault detection, and failover
  • daily snapshots

AWS DocumentDB vs AWS ElasticSearch

DocumentDB and ElasticSearch have a lot of features in common. In fact, you could even use ElasticSearch as a primary datastore. Some of the features they have in common are:

  • document oriented store
  • schema-free
  • distributed data storage
  • high-availability
  • replication

However, they come from 2 different database families and are made for different purposes. DocumentDB is a document store while ElasticSearch is a search engine.

Here are some key differences between the two:

  1. Indexing: ElasticSearch uses Apache Lucene for indexing, while MongoDB indexes are based on a traditional B+ tree. The real-time indexing and search power of ElasticSearch comes from Lucene, which creates an index on every field of a document by default. In MongoDB, we have to define each index explicitly; an index improves query performance but slows down write operations.
  2. Writing: ElasticSearch is slower at adding new data. In ElasticSearch, indexing semantics are defined on the client side. Indexing cannot be optimised as well as with DocumentDB.

In practice, ElasticSearch is often used together with NoSQL and SQL databases. A datastore is used as persistent storage and source of truth, and ElasticSearch is used for doing complex search queries.

Another key consideration when evaluating DocumentDB vs ElasticSearch is the effort and complexity involved in defining, sizing and maintaining an ElasticSearch domain. This is not straightforward (in fact, it is hard to size storage, shards and instances correctly). AWS provides some good guidelines, but it is more complex than working with DocumentDB, which doesn’t require these considerations.

Hotels.com Architecture team’s advice

Currently in Hotels.com we use many different datastores/search engines, so it is good to summarise our advice on when Amazon DocumentDB is or is not a good option.

Amazon DocumentDB is a good solution when you need to store unstructured data that doesn’t require too many indexes or complex search features. 
A good benefit is that you don’t need to care too much about queries upfront. This is particularly useful when you are not the owner or producer of the data you are storing: you don’t need to adapt your schema to a possible new data structure (as you must with a SQL database like Amazon Aurora), and you can also query data using new fields (something you cannot easily do with another NoSQL solution like Amazon DynamoDB, where your data schema is based on your queries).

It is also a good solution when you don’t need rich indexing capabilities and complex or fast search support (ranked results, full text search with partial matching without using regex, complex geospatial queries with inclusion/exclusion). When you do need them, Amazon ElasticSearch is a better choice.

Currently Amazon DocumentDB has two big drawbacks:

  • no multi-region support
  • just provisioned mode (not available in serverless mode)


Career Check-in: Divya Bhardwaj

Divya Bhardwaj | Supervisor, International Payroll in Gurgaon

What does your typical work day look like?

The beauty of working in a truly global & diverse environment – different time zones – is that there is no “typical” working day. There is a planned itinerary and then there is an unplanned one which spices up the day with new encounters. This is absolutely stimulating to the brain cells. But amidst all this excitement, something that’s never off the radar is keeping a pulse on customer satisfaction through the ServiceNow dashboard.

What have you enjoyed most about working at Expedia Group?

It’s evolving each day – lightning fast! Working on multiple global & regional projects and initiatives sets a prodigious learning ground: “Fasten your seat-belts, we are in for a bumpy ride” – I JUST LOVE IT!

What makes your team unique?

The People! I love the One Team, Group First culture & the appetite for extraordinary customer service advanced from a diet of customer-centric values.

With my team in Tokyo

What accomplishment are you most proud of?

Happy customers make my day and I strongly believe in “First Time Right Philosophy”. I am proud to have lived by both!

Who has influenced you the most?

The list is long & beautiful – all women leaders, starting from my working mother to business leaders like Indra Nooyi to political leaders like Hillary Clinton to my own Expedia Group leaders like Becky and Preet – keep me going and motivated!!


Cultural day 2018 India

How and where do you find inspiration?

Crazy, but I am inspired by risk. An adventurous trip where the destination is not yet confirmed but the journey is bound to be exciting (full of potential failures & experiments)? That’s what gets my heart and imagination pumping.

How did you learn to embrace failure?

Albert Einstein rightly said, “If you’ve never failed, you’ve never tried anything new.” I have witnessed failures, but I am still on my journey to gracefully embrace failures. I believe in assessing potential risks and mitigating them to minimize the chance of failure.

Year end team lunch

What is your favorite piece of career advice?

Avoid being paralyzed by fear – Give wings to your thoughts and you will soar high. It’s a piece of advice I follow, too!

Tell us about your favorite vacation.

One of my most memorable vacations was with my family last year to the Andaman and Nicobar Islands. From the multicultural town of Port Blair to the picturesque pristine beaches and crystal clear water of Neil and Havelock Islands, Andaman offers a perfect choice for an exciting and peaceful vacation, and for the more adventurous, deep-sea diving – this place completely bowled me over for the second time.

What is your favorite weekend?

Lazing around in perfect peace in mesmerizing ambiance of my living room with my family (two naughty chirpy girls and a not so naughty husband) with some quick snacking & chit chat is the perfect weekend for me. Sounds cheesy maybe, but this comes as a gift of motherhood to me.

Career Check-in: Faisal Saiyed

Faisal Saiyed | Director, APAC People Services in Gurgaon

What does your typical work day look like?

In general, I have long days since I handle APAC. Being based in India, my first half is typically about engaging with my team, employees and managers in APAC. Evenings are often about hosting or participating in calls from the US or other locations, and thus I can often be found checking emails late in the night 😊

What have you enjoyed most about working at EG?

The encouragement to think wide, to test and learn. There is a hugely supportive environment that allows one to risk failure without any negativity attached to it. Plus, I get to play out my role with a lot of freedom and autonomy.

What makes your team unique?

My team comprises 6 nationalities and works across multiple time zones in APAC. They are incredibly passionate, driven and highly empathetic. I love their energy and ability to get stuff done.

What accomplishments are you most proud of?

When we started off the People Services team in APAC, there were many things that needed to align better. We were expending a lot of effort, but the impact on employees was sub-optimal. Over the last 18 months, I am incredibly proud of the team that we have built, the technology interventions we have implemented and the process excellence that we have fostered. While we still have a long way to go, we have already started impacting employees in a positive way. Our employee experience is much improved, and that makes me incredibly excited.

Who has influenced you the most?

Growing up, my father was a key influence in my life. Since then, my wife and my daughter have been two big influences on my life, and I have learnt so much from them!

How and where do you find inspiration?

I find inspiration in the little things of every day. A kind gesture, a lovely song or beautiful scenery really charges me up. I often turn to poetry to soothe a troubled day. Finally, I am also inspired by how people surmount challenges and demonstrate an incredible will to live and live well.

How did you learn to embrace failure?

I have always taken failure ‘personally’ and often brood on it. Over time, I have pushed myself to ‘let go’ and not let my ego come in the middle. This has been a really tough and learning experience for me and I am still on that journey.

What is your best career advice?

My most frequent recommendations in terms of career advice are two: (i) strive to be awesome at the role that you are doing, such that you are upheld as a role model, and (ii) create a wider spectrum of skills so that you are able to broaden your capabilities to take on different roles. That way, we can demonstrate excellence in the current role and have a bouquet of skills to offer that can help us go to new or different roles!

Tell us about your favorite vacation.

This has to be Scotland and the Lake District in northern England. Picture-postcard-perfect places, great weather and we had a lovely place to stay.

What is your favorite weekend getaway?

I love the hills, so whenever I get a chance, I relish going into the mountains and spending some quality time.

Three things I learnt being a scrum master

Giuseppe Sorrentino | User Interface Engineer, Hotels.com in Rome

Originally published on The Hotels.com Technology Blog

Introduction

I am very happy to have had the opportunity to work in the Agile world for almost four years, which have been fantastic and challenging.

Being a Scrum master is an invaluable experience and makes you understand and reflect a lot about company processes and software development in general.

It is very hard to discover and address dysfunctions in teams’ processes. In fact, dysfunctions are often sneaky. Metrics and surveys can help, but you need to develop an insight to recognize them, and this helps you improve a lot as a person and as a professional.

I decided to share with you three thoughts I noted down over these years.

1. Training is not enough, make it real by being assertive (when necessary)

In these four years I did tons of training and prepared tons of presentations on the various agile practices and artifacts: Kanban, Scrum, backlog and backlog refinement, and pair programming are only examples.

One thing I learnt is that while training on agile is valuable, practice is more valuable. The ability to make practices real in day-to-day life is fundamental in the Scrum master profession. There are two different and opposing approaches to doing that:

  • wait for a practice to emerge in the team
  • be assertive and contribute actively by pushing for the adoption of some beneficial practices.

Finding the right balance between these two approaches is key to the Scrum master role. In a perfect world the Scrum master would always choose the first approach. But in the real world, this is not always feasible. For example, there could be situations where it is not possible to wait until the team becomes mature enough to adopt a practice. In my honest opinion, these are the occasions when the Scrum master needs to be assertive.

2. If you want to go with Kanban, start with Scrum

I am assuming you are familiar with Tuckman’s stages of group development.

Tuckman’s stages of group development

It is harder to start directly with Kanban than to start with Scrum and transition to Kanban. In fact, Kanban requires much more discipline from the team than Scrum. Pulling stories at the right time and limiting the number of work-in-progress items are very challenging tasks, even for a very small group of people. This makes Kanban more suitable for teams that are in the norming or performing phase, or at least not at their beginning. Scrum, being more prescriptive, is perfect for a team in the forming and storming phases.

It is a good idea to start with Scrum and transition smoothly to Kanban when you feel the team is ready, or rather when the team is entering the norming/performing phase. There are many indicators that a team is transitioning toward the norming/performing phase:

  • stability in practices adopted
  • stability in team composition
  • continuous success of sprints
  • self-organization in main scrum ceremonies
  • stability in velocity and throughput.

3. Scrum application outside the software world often is not clear

While Scrum is supposed to be a universal framework, in the sense that it should be applicable outside of the software world, this application is not always immediately clear.

In Hotels.com we give training on Agile to very different functions, and we have encountered difficulties in finding a way to apply Scrum to certain realities outside of technology. For example, there is not much literature on how backlog items should be documented. Nor is it clear how to manage realities where the work is mostly individual rather than team work.

Conclusion

I had four challenging years as a Scrum master, and this opportunity made me grow as a person as well as an IT professional. During these years I had the opportunity to reflect on some aspects of Scrum master practice.

In particular, I discovered that the Scrum master needs to be assertive and, when necessary, contribute actively by pushing for the adoption of beneficial practices. The natural emergence of all the team practices is simply a Scrum myth.

Furthermore, I think that starting directly with Kanban can be counterproductive for a new team. My suggestion here is to evaluate Scrum as a bootstrap for Kanban.

The last point is that the universality of Scrum (its application outside of IT projects) is not crystal clear. From this point of view, a great community effort is needed to make Scrum more accessible.

Thanks to Gayathri Thiyagarajan.

Finatra in a Haystack

Originally published on The Hotels.com Technology Blog

Ryan Burke | Software Development Engineer, Hotels.com in London

Haystack is an Expedia-backed open source project to facilitate detection and remediation of problems with enterprise-level web services and websites. Haystack uses tracing data to help locate the source of problems, providing the ability to drill down to the precise part of a service transaction where failures or latency are occurring — and find the proverbial “needle in a haystack”. Once you know specifically where the problem is happening, it’s much easier to identify and understand the appropriate diagnostic data, find the problem, and fix it.

Finatra is a web framework created by Twitter, built on top of TwitterServer and Finagle. It is the web framework of choice for the majority of Scala core services at Hotels.com. Recently, we wanted to integrate our services with Haystack in order to have distributed tracing information across service boundaries.

Finatra supports tracing out of the box using the standard Zipkin X-B3-* HTTP headers. In order to report this data to Haystack, we needed to publish the tracing data to a proxy service we run, which forwards it to both Zipkin and Haystack.


zipkin-finagle

Fortunately for us, zipkin-finagle provides functionality for reporting tracing information over a network. This library allows for tracing information to be sent via HTTP, Scribe, or published to a Kafka topic. Creating a new zipkin tracer is simple once you bring in zipkin-finagle as a project dependency:

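import zipkin2.finagle.http.HttpZipkinTracer

// Configure a tracer that reports tracing data over HTTP
val config = HttpZipkinTracer.Config.builder()
  .host("zipkin-host:80")
  .hostHeader("zipkin-host")
  .initialSampleRate(0.0f) // Float literal: a Double will not compile if the builder expects a float
  .compressionEnabled(true)
  .build()
// statsReceiver is assumed to be the application's existing Finagle StatsReceiver
val tracer = HttpZipkinTracer.create(config, statsReceiver)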

In the Finatra app’s HttpServer class you have the ability to set the tracer and label to be used in reporting by overriding the configureHttpServer function.

// Http here is com.twitter.finagle.Http
override def configureHttpServer(server: Http.Server): Http.Server =
  server
    .withLabel("service-name")
    .withTracer(tracer)

After this, sending tracing headers to the service will result in the data being published to Haystack for visualisation. If you’re using Finagle clients to call other services as part of a request, the tracing headers will automatically be propagated and all your dependencies will show up too.

Haystack tracing visualisation

Dealing with Futures

Finatra and Finagle are designed to operate in a non-blocking, asynchronous way, which allows them to scale and keeps the overhead of accepting a new request low. There is no global request thread pool to configure; you just must not block while handling the request. As a result, asynchronous code has no single request thread on which to hang things like MDC, which is how you would normally keep track of per-request state such as tracing information.

When using Scala Future[T] we need some way to manually keep track of the tracing information between thread boundaries. We found there was no elegant way to do this without creating a wrapper around Future which copies a context between execution threads. Alternatively you can create a custom ExecutionContext in which the Future can run that provides the same functionality. Problems arise when you use a third party library or some bit of code that doesn’t allow you to define the ExecutionContext or the return type.
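
As a rough illustration of the custom ExecutionContext option for standard Scala Futures, here is a minimal sketch that uses only the Scala standard library. It is not what Finagle or Finatra do internally: TraceContext, TracePropagatingContext and Demo are hypothetical names, and a thread-local trace id stands in for whatever per-request state (MDC, tracing headers) you need to carry across threads.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical per-request state holder; stands in for MDC or a trace context.
object TraceContext {
  private val current: ThreadLocal[Option[String]] = new ThreadLocal[Option[String]] {
    override def initialValue(): Option[String] = None
  }
  def get: Option[String] = current.get()
  def set(traceId: Option[String]): Unit = current.set(traceId)
}

// Wraps another ExecutionContext and copies the submitting thread's trace id
// onto whichever worker thread ends up running the task.
final class TracePropagatingContext(underlying: ExecutionContext) extends ExecutionContext {
  override def execute(task: Runnable): Unit = {
    val captured = TraceContext.get // read on the submitting thread
    underlying.execute(new Runnable {
      override def run(): Unit = {
        val previous = TraceContext.get // whatever the worker thread held before
        TraceContext.set(captured)
        try task.run()
        finally TraceContext.set(previous) // do not leak into unrelated tasks
      }
    })
  }
  override def reportFailure(cause: Throwable): Unit = underlying.reportFailure(cause)
}

object Demo {
  // Sets a trace id on the calling thread and returns the id observed inside a Future.
  def traceIdSeenInFuture(traceId: String): Option[String] = {
    val pool = Executors.newFixedThreadPool(2)
    implicit val ec: ExecutionContext =
      new TracePropagatingContext(ExecutionContext.fromExecutor(pool))
    try {
      TraceContext.set(Some(traceId))
      Await.result(Future(TraceContext.get), 5.seconds)
    } finally {
      TraceContext.set(None)
      pool.shutdown()
    }
  }
}
```

The sketch only helps for code that runs on this ExecutionContext. Any library that schedules work on its own executor will still lose the context.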

Twitter were an early adopter of Scala and provide a util library which duplicates and builds upon the Scala standard library features. This includes the Twitter Future, a cancellable Future with no ExecutionContext to manage and the built-in ability to keep track of a Context across thread boundaries. The Finatra server uses them at the edge and Finagle clients return Twitter Futures too. If you use them throughout your application instead of the standard Scala Future then you’ll get tracing propagation for free, at the expense of being a little more tied into the Twitter ecosystem.


Twitter Service Loader

One thing to watch out for is the zipkin-finagle library defining a service in the META-INF/services folder. Finatra uses Guice for dependency injection, and if a library defines a file in the services folder then it will auto-magically be created for you and registered in the service registry. This can make it easier to integrate with Zipkin: you can ignore all the code changes above and instead set some environment variables to let the library create and register the service for you.

In my team we tend to prefer explicitly defining behaviour rather than relying on magic components of frameworks to do this for us. It’s why we moved away from Spring, manually wire everything, try to avoid internal shared libraries and write our own request filter logic.

Once we manually wired the tracer using withTracer, we assumed that this would override the one being created from the service loader, but we were wrong. Both were being created and running at the same time, causing the unconfigured default tracer to throw errors (it defaults to sending data to localhost). In order to disable this, we had to modify our Dockerfile to add an additional Java opt:

ENTRYPOINT ["/bin/sh", "-c", "exec java $JAVA_OPTS -Dcom.twitter.finagle.util.loadServiceDenied=zipkin2.finagle.http.HttpZipkinTracer -jar service.jar $0 $@"]

This is a bit nasty: we have a hard-coded class name in our Dockerfile, and if the class ever changes name then the service will start loading two HttpZipkinTracer instances again. That’s the cost of being able to define the tracer ourselves.


Shameless plug

We are hiring! If you’re passionate about software engineering and what we do sounds interesting, check out our roles!