During
the development of performance metrics visualization projects in real-time, we have
made a few changes to deploy the performance metric collector agent (Telegraf.exe)
on multiple Windows Servers. In the development and testing phase, we often approach
new ideas and metrics requirements, so the configuration of Telegraf agent changes
often. To reflect those changes, the configuration file needs to be re-deployed
again and Agent service needs a restart.
Another
challenge is that we need to deploy the agent onto multiple servers without
performing RDP to the remote server. Every time the Telegraf configuration is
updated for any reason, the Telegraf configuration file needs to be replaced
with changes on the remote server too. The Telegraf service needs to be restarted
with the new configuration file for those changes to take effect.
So,
we developed a PowerShell based deployment Script to perform the entire task on
a remote Windows Server regardless of the number of Servers.
Suitability of this deployment
Script:
There
are many applications of this automated deployment script on a remote server.
1.
New deployment of the Telegraf agent.
2.
Upgrading the Telegraf agent.
3.
Restarting/Stopping/removing the Telegraf agent.
4.
Updating “telegraf.conf” file.
5.
Deploying an updated version of the Telegraf agent.
6.
Changing the Telegraf Agent’s data posting URL, and so on.
The
provided script can be changed and enhanced according to your specific needs.
The dashboard search
requirements:
Following
is the dashboard we have developed to monitor more than 300+ production SQL
Servers in real-time. To identify a server, we have a filter mechanism on the Grafana
UI so that a user/viewer can bring a specific server into viewing. In Grafana,
it is called “Grafana Templating”. To do this, we created three templating
variables which are “Server”, “AppEnv” and “App”.
Figure #1: A SQL Server
Dashboard:
Figure # 2: Another SQL Server Dashboard:
The section “global_tags” of the configuration file “telegraf.conf” of Telegraf Agent “Telegraf.exe” needs to be changed for each server and needs to be placed in the remote server so the Telegraf Agent can read the specific configuration associated with this server and send the data to InfluxDB.
The section “global_tags” of the configuration file “telegraf.conf” of Telegraf Agent “Telegraf.exe” needs to be changed for each server and needs to be placed in the remote server so the Telegraf Agent can read the specific configuration associated with this server and send the data to InfluxDB.
The telegraf “telegraf.conf”
file:
The
Telegraf “telegraf.conf” file has a specific section known as [global_tags]
which contains Key = “Value” format data. In our case, the “telegraf.conf” file
has been renamed as “telegraf.baseconf” with the following dummy tag name and
from this template configuration file we will derive the required “telegraf.conf”
for deployment.
[global_tags]
Appenv =
"AppEnvName"
App =
"AppName"
The
PowerShell Script will read each key and will then change according to the
passing server parameters which comes from a predefined CSV file. The components
of the CSV file looks as follows:
"Server","AppEnv","App"
"FIN001", "Finance", "Budget Reconciliation Application"
"FIN002", "Finance", "Payment Disbursement"
"HR0101", "HR", "Employee Performance"
"HR0102\SQL2014", "HR", "Vendor Management"
"HR0103", "HR", "CRM portal"
"HR0201\SHARE2013","HR", "HR SharePoint"
"HR0201","HR", "Insight the Organization"
"FIN001", "Finance", "Budget Reconciliation Application"
"FIN002", "Finance", "Payment Disbursement"
"HR0101", "HR", "Employee Performance"
"HR0102\SQL2014", "HR", "Vendor Management"
"HR0103", "HR", "CRM portal"
"HR0201\SHARE2013","HR", "HR SharePoint"
"HR0201","HR", "Insight the Organization"
PowerShell Script:
The
powerShell scripts “TelegrafDeployV3.ps1” has two functions:
Correct-SQLPerf
Deploy-Telegraf
Script download Link:
“Correct-SQLPerf”:
The
“Correct-SQLPerf” is to correct any missing performance counters on a SQL
Server instance. It has only one parameter which is “$SQLServer”. The value can be default or named instance of SQL Server. This
function is called automatically by the main function internally. You can also
execute it separately.
Function Correct-SQLPerf
{
[CmdletBinding()]
param
(
[Parameter(Mandatory =
$True)][string]$SQLServer
)
“Deploy-Telegraf”:
The
main function of the PowerShell script is “Deploy-Telegraf” which requires
three parameters.
Function Deploy-Telegraf
{
[CmdletBinding()]
param
(
[Parameter(Mandatory =
$True)][string]$SourceFolder,
[Parameter(Mandatory =
$True)][string]$DeployType,
# N = New deployment, U = update existing, R= remove
everything
[Parameter(Mandatory =
$True)][string[]]$SeverList
)
$SourceFolder is the local folder from
where the telegraf will be deployed.
$DeployType is the option for
deployment type.
$SeverList Is the list of Server. The format
of this column is “WindowsServerName\SQLInstanceName”
How to deploy?
Make
sure that the WInRM Service is running on the Remote server. To deploy Telegraf
onto one or multiple remote servers, do the following:
1. The Script will create a
folder called “C:\PerfMonSQL” on a remote computer.
2. Create a folder on your
local desktop, such as “D:\TelefragDeploy”.
3. Copy the “telegraf.exe” into
this folder.
4. Create a text file “serverList.txt”
(CSV) with a list of servers, application environment and application name. An
environment can have multiple applications and an application may have one or
more servers.
5. Copy the telegraf “telegraf.conf”
as “telegraf.baseconf” into this folder.
6. Edit the “telegraf.baseconf”
and add the tags as appenv=”AppEnvName” and app=”AppName”.
The
“D:\TelegrafDeploy” folder should contain the following items:
Execute the Deployment Script:
To
execute the deployment script, open PowerShell ISE with administrator privileges.
Open the PowerShell Script “DeployTelegraf.ps1”. Change the three parameters
according to your needs and settings.
Example of Running the
Deployment Scripts:
To deploy
Telegraf Agent on a Remote Windows Server:
# Deploy telegraf for the first time on a (or multiple) remote
server
Deploy-Telegraf -DeployType "N"
`
-SourceFolder "D:\TelegrafDeploy\"
`
-SeverList "D:\TelegrafDeploy\ServerList.txt"
# Remove telegraf from a (or multiple) remote server
Deploy-Telegraf -DeployType "R"
`
-SourceFolder "D:\TelegrafDeploy\"
`
-SeverList "D:\TelegrafDeploy\ServerList.txt"
# Upgrade telegraf on a (or multiple) remote server
Deploy-Telegraf -DeployType "U"
`
-SourceFolder "D:\TelegrafDeploy\"
`
-SeverList "D:\TelegrafDeploy\ServerList.txt"
To
correct missing SQL PerfMon counters:
#Correct SQL Server missing Perfmon counter
Correct-SQLPerf -SQLServer "SHB_MAIN\SQL2016"
# this for named instance of SQL Server
Correct-SQLPerf -SQLServer "SHB_MAIN"
# this for default instance of SQL Server
Read and Learn More:
Grafana:
http://grafana.org/
Telegraf
and InfluxDB: https://www.influxdata.com/
Golden
Gate Latency monitoring with PowerShell, influxDB and Grafana – PowerShell to
InfluxDB
Sarjen, i've just been introduced to telegraf, grafana and influxdb. It seems that all of the MS Sql dashboards available for download don't name or use the same metrics that you have in your zip file. Is there any way to get a copy of the dashboard you created?
ReplyDelete