Explore the journey of Honeycomb.io, a Series B startup in the observability space, as they evaluate and adopt the arm64 processor architecture to optimize the cost and performance of their telemetry ingest and indexing workload. Dive into the process of setting up the evaluation, completing the full migration, and improving the surrounding ecosystem over a year-long period. Learn how 92% of all compute workloads were successfully migrated to arm64, resulting in a 40% drop in compute costs and modest improvements in end-user-visible latency. Discover the roadblocks and challenges faced along the way, including incomplete software compatibility, hidden performance quirks, and added complexity. Gain insights into the history of processor architectures, the efficiency of ARM, and the role of Service Level Objectives (SLOs) in protecting user flows. Explore the service architecture, including the Shepherd ingest API service and Retriever, and understand the steps taken to migrate production environments. Examine how AWS instance availability and Kafka affected the migration process. Conclude with valuable lessons learned, including setting measurable goals, acknowledging hidden risks, prioritizing team well-being, and optimizing for safety in large-scale migrations.