Build Telemetry for Distributed Services: Elastic APM

Posted by 北战南征 on 2019-11-30 12:58:09

Official documentation: https://www.elastic.co/guide/en/apm/get-started/current/index.html

 

 

 

Overview

Elastic APM is an application performance monitoring system built on the Elastic Stack. It lets you monitor software services and applications in real time by collecting detailed performance information on response times for incoming requests, database queries, calls to caches, external HTTP requests, and more. This makes it easier to pinpoint and fix performance problems quickly.

Elastic APM also automatically collects unhandled errors and exceptions. Errors are grouped based primarily on the stacktrace, so you can identify new errors as they appear and keep an eye on how many times specific errors happen.
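As a rough illustration of grouping by stack trace, the sketch below derives a grouping key from the exception type and the functions in its traceback. The grouping_key helper and the choice of hashed fields are assumptions made for this example; they are not the algorithm Elastic APM itself uses.

```python
import hashlib
import traceback
from collections import Counter


def grouping_key(exc: BaseException) -> str:
    """Hash the exception type plus the (file, function) of each frame.

    Line numbers and the message are left out so that the same failure path
    maps to the same key even when the message text varies.
    """
    frames = traceback.extract_tb(exc.__traceback__)
    parts = [type(exc).__name__]
    parts += [f"{frame.filename}:{frame.name}" for frame in frames]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def parse_port(value: str) -> int:
    return int(value)  # raises ValueError for non-numeric input


def lookup(mapping: dict, key: str) -> str:
    return mapping[key]  # raises KeyError for a missing key


def capture(func, *args) -> str:
    """Run func and return the grouping key of the exception it raises."""
    try:
        func(*args)
    except Exception as exc:
        return grouping_key(exc)
    raise RuntimeError("expected an exception")


# Two ValueErrors from the same code path share a group; the KeyError does not.
keys = [
    capture(parse_port, "abc"),
    capture(parse_port, "http"),
    capture(lookup, {}, "missing"),
]
error_counts = Counter(keys)
```

Counting occurrences per key is what lets you see both how often a known error happens and when a previously unseen key appears.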

Metrics are another important source of information when debugging production systems. Elastic APM agents automatically pick up basic host-level metrics and agent specific metrics, like JVM metrics in the Java Agent, and Go runtime metrics in the Go Agent.

 

Components and documentation

Elastic APM consists of four components: APM Agents, APM Server, Elasticsearch, and Kibana.

 

 

APM Agents

APM agents are open source libraries written in the same language as your service. You may only need one, or you might use all of them. You install them into your service as you would install any other library. They instrument your code and collect performance data and errors at runtime. This data is buffered for a short period and sent on to APM Server.
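The buffering step can be pictured with a toy model like the one below. It is only a sketch: EventBuffer, max_events, and flush_interval are hypothetical names, and the send callback stands in for the real HTTP call to APM Server. Actual agents have their own queueing logic and settings.

```python
import time
from typing import Callable, Dict, List

Event = Dict[str, object]


class EventBuffer:
    """Toy buffer: hold events briefly, then hand them over in one batch."""

    def __init__(self, send: Callable[[List[Event]], None],
                 max_events: int = 3, flush_interval: float = 10.0) -> None:
        self._send = send
        self._max_events = max_events
        self._flush_interval = flush_interval
        self._events: List[Event] = []
        self._last_flush = time.monotonic()

    def add(self, event: Event) -> None:
        self._events.append(event)
        too_many = len(self._events) >= self._max_events
        too_old = time.monotonic() - self._last_flush >= self._flush_interval
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        if self._events:
            self._send(self._events)
            self._events = []  # start a fresh list; the sent batch is untouched
        self._last_flush = time.monotonic()


# Stand-in for the HTTP call to APM Server: just record each batch.
sent_batches: List[List[Event]] = []
buffer = EventBuffer(send=sent_batches.append)

for i in range(4):
    buffer.add({"type": "span", "id": i})
buffer.flush()  # e.g. on shutdown, send whatever is left
```

Batching like this keeps the per-request overhead inside the service small, at the cost of a short delay before data reaches the server.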

Each agent has its own documentation.

APM Server

APM Server is an open source application that receives performance data from your APM agents. It’s a separate component by design, which helps keep the agents light, prevents certain security risks, and improves compatibility across the Elastic Stack.

After the APM Server has validated and processed events from the APM agents, the server transforms the data into Elasticsearch documents and stores them in corresponding Elasticsearch indices. In a matter of seconds you can start viewing your application performance data in the Kibana APM UI.

The APM Server reference provides everything you need when it comes to working with the server. Here you can learn about installation, configuration, security, monitoring, and more.

Elasticsearch

Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze large volumes of data quickly and in near real time. Elastic APM stores its performance metrics in Elasticsearch and relies on Elasticsearch aggregations to analyze them.

APM Kibana UI

Kibana is an open source analytics and visualization platform designed to work with Elasticsearch. You use Kibana to search, view, and interact with data stored in Elasticsearch.

Since application performance monitoring is all about visualizing data and detecting bottlenecks, it’s crucial you understand how to use the Kibana APM UI. The following sections will help you get started.

 

APM also has built-in integrations with Machine Learning. To learn more about this feature, refer to the Kibana UI documentation for Machine learning integration.

 

Visualizing Application Bottlenecks

Elastic APM captures different types of information from within instrumented applications:

  • Spans contain information about a specific code path that has been executed. They measure from the start to end of an activity, and they can have a parent/child relationship with other spans.
  • Transactions are a special kind of span that have extra metadata associated with them. You can think of transactions as the highest level of work you’re measuring within a service. As an example, a transaction could be a request to your server, a batch job, or a custom transaction type.
  • Errors contain information about the original exception that occurred or about a log created when the exception occurred.
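The relationship between these three types can be sketched as a small data model. The class and field names below are illustrative assumptions, not the schema that APM agents actually send.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """A timed activity; parent_id links it to the span that started it."""
    id: str
    name: str
    start_ms: float
    duration_ms: float
    parent_id: Optional[str] = None


@dataclass
class Transaction(Span):
    """A span with extra metadata; the top-level unit of work in a service."""
    type: str = "request"
    result: str = ""
    context: Dict[str, str] = field(default_factory=dict)


@dataclass
class Error:
    """An exception or log record, tied to the transaction it occurred in."""
    message: str
    transaction_id: Optional[str] = None


def children_of(parent: Span, spans: List[Span]) -> List[Span]:
    return [span for span in spans if span.parent_id == parent.id]


request = Transaction(id="t1", name="GET /orders", start_ms=0, duration_ms=120,
                      result="HTTP 2xx", context={"user": "alice"})
query = Span(id="s1", name="SELECT FROM orders", start_ms=10, duration_ms=45,
             parent_id="t1")
render = Span(id="s2", name="render template", start_ms=60, duration_ms=50,
              parent_id="t1")
failure = Error(message="connection reset", transaction_id="t1")

child_names = [span.name for span in children_of(request, [query, render])]
```

Modeling a transaction as a subclass of span mirrors the description above: it is measured the same way, but carries the additional metadata.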

Each of these information types has a specific page associated with it in the APM UI. These pages display the captured data in curated charts and tables that allow you to easily compare and debug your applications.

 For example, you can see information about response times, requests per minute, and status codes per endpoint. You can even dive into a specific request sample and get a complete waterfall view of what your application is spending its time on. You might see that your bottlenecks are in database queries, cache calls, or external requests. For each incoming request and each application error, you can also see contextual information such as the request header, user information, system values, or custom data that you manually attached to the request.

Having access to application-level insights with just a few clicks can drastically decrease the time you spend debugging errors, slow response times, and crashes.

 

Using APM

APM is designed to be as intuitive as possible, but you might come across certain terms or concepts that don’t feel native to you. Not to worry, we’ve created this guide to help you get the most out of Elastic APM.

APM is available via the navigation sidebar in Kibana.

Services overview

The Services overview gives you quick insights into the health and general performance of each service.

You can add services by setting the service.name configuration in each of the APM agents you’re instrumenting.
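As a hedged sketch of how a service name might be resolved at startup, the snippet below reads it from an environment variable and validates it. It assumes the ELASTIC_APM_SERVICE_NAME variable that several agents read, and it assumes names are limited to letters, digits, spaces, underscores, and hyphens; check your agent's configuration reference for both. The resolve_service_name helper is hypothetical.

```python
import re
from typing import Mapping

# Assumed character set for service names; confirm against your agent's docs.
SERVICE_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9 _-]+$")


def resolve_service_name(env: Mapping[str, str],
                         default: str = "unknown-service") -> str:
    """Return the configured service name, or fail loudly if it is malformed."""
    name = env.get("ELASTIC_APM_SERVICE_NAME", default)
    if not SERVICE_NAME_PATTERN.match(name):
        raise ValueError(f"invalid service name: {name!r}")
    return name


# In a real process you would pass os.environ; a plain dict keeps this testable.
checkout_name = resolve_service_name({"ELASTIC_APM_SERVICE_NAME": "checkout-api"})
fallback_name = resolve_service_name({})
```

Each distinct name shows up as its own row in the Services overview, so it pays to choose names deliberately and keep them stable.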

 

Traces overview

The Traces overview displays the entry transaction for all traces in your application. If you’re using Distributed tracing, this view is key to finding the critical paths within your application. Transactions with the same name are grouped together and only shown once in this table.

By default, transactions are sorted by Impact. Impact helps show the most used and slowest endpoints in your service - in other words, it’s the collective amount of pain a specific endpoint is causing your users. If there’s a particular endpoint you’re worried about, you can click on it to view the transaction details.
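One plausible reading of Impact is the total time spent in a transaction group (request count times average duration), scaled against the largest group. The sketch below follows that reading with hypothetical sample data; it is not necessarily the exact formula Kibana uses.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (transaction name, duration in ms); hypothetical sample data.
samples: List[Tuple[str, float]] = [
    ("GET /products", 40.0), ("GET /products", 60.0), ("GET /products", 50.0),
    ("GET /products", 50.0), ("POST /checkout", 900.0), ("GET /health", 2.0),
    ("GET /health", 2.0),
]


def impact_by_group(rows: List[Tuple[str, float]]) -> Dict[str, float]:
    """Sum durations per name, then scale so the largest group scores 100."""
    totals: Dict[str, float] = defaultdict(float)
    for name, duration_ms in rows:
        totals[name] += duration_ms
    largest = max(totals.values())
    return {name: 100.0 * total / largest for name, total in totals.items()}


impact = impact_by_group(samples)
ranking = sorted(impact, key=lambda name: impact[name], reverse=True)
```

Here the single slow checkout request outranks four faster product requests, because total time is what counts, not frequency or latency alone.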

 

Distributed tracing

Elastic APM supports distributed tracing. Distributed tracing is a key feature of modern application performance monitoring as application architectures are shifting from monolithic to more distributed, service-based architectures.

Distributed tracing allows APM users to automatically trace requests all the way through the service architecture, and visualize those traces in one single view in the APM UI. This is accomplished by tracing all of the requests, from the initial web request to your front-end service, to queries made to your back-end services. This makes finding possible bottlenecks throughout your application much easier and faster.

By definition, a distributed trace includes more than one transaction. You can use the span timeline visualization to view a waterfall display of all of the transactions from individual services that are connected in a trace.

Distributed tracing is supported by all APM agents and there’s no additional configuration needed.
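Conceptually, services are linked into one trace by passing a trace identifier along with each outgoing request. The sketch below illustrates the idea using the W3C Trace Context traceparent layout (version, trace id, parent id, flags). The function names are hypothetical, and the header name your agent version actually sends may differ.

```python
import secrets
from typing import Dict, Optional, Tuple


def new_id(num_bytes: int) -> str:
    return secrets.token_hex(num_bytes)


def parse_traceparent(value: str) -> Tuple[str, str]:
    """Split 'version-traceid-parentid-flags' into (trace_id, parent_id)."""
    _version, trace_id, parent_id, _flags = value.split("-")
    return trace_id, parent_id


def start_transaction(headers: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Join the caller's trace if a header is present, else start a new one."""
    incoming = headers.get("traceparent")
    if incoming:
        trace_id, parent_id = parse_traceparent(incoming)
    else:
        trace_id, parent_id = new_id(16), None
    return {"trace_id": trace_id, "id": new_id(8), "parent_id": parent_id}


def outgoing_headers(transaction: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Headers to attach to a downstream call made within this transaction."""
    return {"traceparent": f"00-{transaction['trace_id']}-{transaction['id']}-01"}


# The front-end service starts the trace; the back-end service continues it.
frontend = start_transaction({})
backend = start_transaction(outgoing_headers(frontend))
```

Because both transactions share a trace id and the second records the first as its parent, a UI can stitch them into a single waterfall.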

 

Transaction overview

A transaction describes an event captured by an Elastic APM agent instrumenting a service. The APM agents automatically collect performance metrics on HTTP requests, database queries, and much more.

Selecting a service brings you to the transactions overview. The time spent by span type, transaction duration, and requests per minute charts display information on all transactions associated with the selected service. The Transactions table, however, provides only a list of transaction groups for the selected service. In other words, this view groups all transactions of the same name together, and only displays one transaction for each group.

Time spent by span type: [beta] This functionality is in beta and is subject to change. The design and code are less mature than official GA features and are being provided as-is with no warranties. Beta features are not subject to the support SLA of official GA features.

Certain agents support breakdown graphs in the APM UI. This graph is an easy way to visualize where your application is spending most of its time. For example, is your app spending time in external calls, database processing, or application code execution?

The time a transaction took to complete is also recorded and displayed on the chart under the "app" label. "app" indicates that something was happening within the application, but we’re not sure exactly what. This could be a sign that the agent does not have auto-instrumentation for whatever was happening during that time.

It’s important to note that if you have asynchronous spans, the sum of all span times may exceed the duration of the transaction.
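A small worked example, with made-up numbers, shows how this happens: two external calls that run concurrently each count in full toward their span type, even though they overlap in wall-clock time.

```python
from typing import Dict, List, Tuple

transaction_duration_ms = 100.0

# (span type, start offset in ms, duration in ms); the two external calls
# run concurrently, so they overlap in wall-clock time.
spans: List[Tuple[str, float, float]] = [
    ("external", 10.0, 80.0),
    ("external", 10.0, 80.0),
    ("db", 92.0, 5.0),
]


def time_by_type(rows: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """Add up span durations per type, ignoring any overlap between spans."""
    totals: Dict[str, float] = {}
    for span_type, _start, duration in rows:
        totals[span_type] = totals.get(span_type, 0.0) + duration
    return totals


breakdown = time_by_type(spans)
total_span_time_ms = sum(breakdown.values())
```

The spans add up to 165 ms inside a 100 ms transaction, which is expected whenever work overlaps.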

If the Time spent by span type chart is missing in the APM UI, it means your agent does not support this feature yet.

 

Transaction duration shows the response times for this service and is broken down into average, 95th, and 99th percentile. If there’s a weird spike that you’d like to investigate, you can simply zoom in on the graph - this will adjust the specific time range, and all of the data on the page will update accordingly.
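To see why the chart shows percentiles alongside the average, consider the sketch below. It uses hypothetical response times and a simple nearest-rank percentile, which may differ from the estimation method Elasticsearch applies.

```python
import math
from statistics import mean
from typing import List


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# 100 hypothetical response times in ms: mostly fast, with a slow tail.
durations_ms = [20.0] * 90 + [200.0] * 6 + [1500.0] * 4

summary = {
    "avg": mean(durations_ms),
    "p95": percentile(durations_ms, 95),
    "p99": percentile(durations_ms, 99),
}
```

The average of 90 ms describes almost no real request here: most finish in 20 ms, while the 95th and 99th percentiles expose the slow tail at 200 ms and 1500 ms.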

 

 

Requests per minute is divided into response codes: 2xx, 3xx, 4xx, etc., and is useful for determining if you’re serving more of one code than you typically do. Like in the Transaction duration graph, you can zoom in on anomalies to further investigate them.
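The grouping behind this chart can be sketched as counting requests per one-minute bucket and per status class. The request log below is hypothetical, and per_minute_by_class is an illustrative helper rather than anything the APM UI exposes.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (seconds since the start of the window, HTTP status code); hypothetical data.
request_log: List[Tuple[int, int]] = [
    (3, 200), (15, 200), (42, 404), (59, 301),
    (61, 200), (75, 500), (90, 503), (118, 200),
]


def per_minute_by_class(rows: List[Tuple[int, int]]) -> Dict[int, Counter]:
    """Count requests in each one-minute bucket, keyed by status class."""
    buckets: Dict[int, Counter] = {}
    for second, status in rows:
        minute = second // 60
        status_class = f"{status // 100}xx"
        buckets.setdefault(minute, Counter())[status_class] += 1
    return buckets


rpm = per_minute_by_class(request_log)
```

In this data the second minute has no more traffic than the first, but half of it is 5xx, which is the kind of shift the chart makes visible.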

 

The Transactions table is similar to the traces overview and shows the name of each transaction occurring in the selected service. Transactions with the same name are grouped together and only shown once in this table. By default, transaction groups are sorted by Impact. Impact helps show the most used and slowest endpoints in your service - in other words, it’s the collective amount of pain a specific endpoint is causing your users. If there’s a particular endpoint you’re worried about, you can click on it to view the transaction details.

 

 

The transaction overview will only display helpful information when the transactions in your service are named correctly.

Elastic APM Agents come with built-in support for popular frameworks out-of-the-box. However, if you only see one route in the Transaction overview page, or if you have transactions named "unknown route", it could be a symptom that the agent either wasn’t installed correctly or doesn’t support your framework.

For further details, including troubleshooting and custom implementation instructions, refer to the documentation for each APM Agent you’ve implemented.

 

Transaction details

Selecting a transaction group will bring you to the transaction details. Transaction details include a high-level overview of the time spent by span type, transaction group duration, requests per minute, and transaction group duration distribution. It’s important to note that all of these graphs show data from every transaction within the selected transaction group.

 A single sampled transaction is also displayed. This sampled transaction is based on your selection in the Transactions duration distribution. You can update the sampled transaction by selecting a new bucket in the transactions duration distribution graph. The number of requests per bucket is displayed when hovering over the graph, and the selected bucket is highlighted to stand out.
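The duration distribution is essentially a histogram. A minimal sketch, assuming fixed-width buckets and made-up durations, shows how requests fall into buckets and how selecting a bucket narrows down the candidates for the displayed sample.

```python
from collections import Counter
from typing import List

BUCKET_WIDTH_MS = 100  # hypothetical bucket size


def bucket_of(duration_ms: float) -> int:
    """Lower bound of the bucket containing this duration."""
    return int(duration_ms // BUCKET_WIDTH_MS) * BUCKET_WIDTH_MS


durations_ms: List[float] = [35, 80, 95, 120, 140, 410, 460, 975]
distribution = Counter(bucket_of(d) for d in durations_ms)

# Selecting a bucket picks one transaction from it as the displayed sample.
selected_bucket = 400
samples_in_bucket = [d for d in durations_ms if bucket_of(d) == selected_bucket]
sampled = samples_in_bucket[0]
```

Picking a bucket in the slow tail is a quick way to inspect a representative slow request instead of a typical one.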

 

For a particular transaction sample, we can get even more information in the metadata tab:

  • Labels - Custom labels added by agents
  • HTTP request/response information
  • Host information
  • Container information
  • Service - The service/application runtime, agent, name, etc.
  • Process - The process id that served up the request.
  • Agent information
  • URL
  • User - Requires additional configuration, but allows you to see which user experienced the current transaction.
  • Custom - You can configure your agent to add custom contextual information on transactions.

All of this data is stored in documents in Elasticsearch. This means you can select "Actions - View sample document" to see the actual Elasticsearch document under the discover tab.

 

Span timeline

A span is defined as the duration of a single event. Spans are automatically captured by APM agents, and you can also define custom spans. Each span has a type, and each type is shown in a different color in the timeline/waterfall visualization.

The span timeline visualization is a bird’s-eye view of what your application was doing while it was trying to respond to the request that came in. This makes it useful for visualizing where the selected transaction spent most of its time.

View a span in detail by clicking on it in the timeline waterfall. For example, clicking on an SQL Select database query displays the actual SQL that was executed, how long it took, and the percentage of the trace’s total time. You also get a stack trace, which shows where the SQL query is issued in your code. Finally, APM knows which files are your code and which are just modules or libraries that you’ve installed. These library frames are minimized by default in order to show you the most relevant stack trace.
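A plain-text approximation of the waterfall helps show what the timeline encodes: each span is a bar positioned by its start offset and sized by its duration, alongside its share of the trace's total time. The span data and the waterfall helper below are hypothetical.

```python
from typing import List, Tuple

# (name, start offset in ms, duration in ms) within one hypothetical trace.
trace_duration_ms = 200.0
spans: List[Tuple[str, float, float]] = [
    ("GET /orders", 0.0, 200.0),
    ("SELECT FROM orders", 20.0, 100.0),
    ("GET inventory-service", 130.0, 50.0),
]


def waterfall(rows: List[Tuple[str, float, float]], total_ms: float,
              width: int = 40) -> List[str]:
    """Render each span as a bar positioned by start time and sized by duration."""
    lines = []
    for name, start, duration in rows:
        offset = int(width * start / total_ms)
        length = max(1, int(width * duration / total_ms))
        share = 100.0 * duration / total_ms
        bar = " " * offset + "#" * length
        lines.append(f"{bar:<{width}} {name} ({duration:.0f} ms, {share:.0f}%)")
    return lines


chart = waterfall(spans, trace_duration_ms)
print("\n".join(chart))
```

Reading the bars top to bottom shows at a glance that the database query accounts for half of the request's time.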

 If your span timeline is colorful, it’s indicative of a distributed trace. Services in a distributed trace are separated by color and listed in the order they occur.

 

Don’t forget, a distributed trace includes more than one transaction. When viewing these distributed traces in the timeline waterfall, you’ll see an icon that indicates the next transaction in the trace. These transactions can be expanded and viewed in detail by clicking on them.

After exploring these traces, you can return to the full trace by clicking View full trace in the upper-right corner of the page.

 

 
