Overview of Graylog
Graylog is a fully integrated open source log management platform for collecting, indexing, and analyzing both structured and unstructured data from almost any source.
A brief history:
1. Started by Lennart Koopmann in his free time in 2010 (called Graylog2 at that time)
2. TORCH GmbH founded as company behind Graylog in late 2012
3. Big rewrite that got released as 0.20 in Feb 2014
4. New US based company Graylog, Inc. founded in Jan 2015
5. Renamed from Graylog2 to Graylog
6. Graylog 1.0 release in Feb 2015
Configuration management tools allow us to manage our computing resources in an effective and consistent way.
They make it easy to run hundreds or thousands of machines without having to manually execute the same tasks over and over again.
By using shared modules/cookbooks it is pretty easy to end up with hundreds of managed resources like files, packages and services per node.
Nodes can be configured to check for updates and to apply new changes automatically.
This helps us roll out changes to lots of nodes very easily, but it also makes it possible to quickly break our infrastructure, resulting in outages.
So being able to collect, analyze, and monitor all events that happen sounds like a job for Graylog.
Levels of Log Management:
At the most basic level, log management means grepping through flat files: logs are stored on the host computer system as ordinary "flat files", and you have to parse their structure yourself in order to access and manipulate the data.
Log management can be done on different levels:
Level1: Do not collect logs at all.
Level2: Collect logs. Mostly simple log files from email or HTTP servers.
Level3: Use the logs for forensics and troubleshooting. Why was that email not sent out? Why was that HTTP 500 thrown?
Level4: Save searches. The most basic case would be to save a grep command you used.
Level5: Share searches. Store that search command somewhere so co-workers can find and use it to solve similar problems.
Level6: Reporting. Easily generate reports from your logs: how many exceptions did we have this week compared to past weeks? Results can be presented as charts or PDFs.
Level7: Alerting. Automate some of your troubleshooting tasks. Be warned automatically instead of waiting for a user to complain.
Level8: Collect more logs. We may need more log sources for some use cases. Firewall logs, router logs, even physical access logs.
Level9: Correlation. Manual analysis of all that new data may take too long. Correlate different sources.
Level10: Visual analysis, Pattern detection, interaction visualization, dynamic queries, anomaly detection, sharing and more sharing.
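To make Levels 4 to 6 concrete, here is a minimal sketch in Python of a "saved search" turned into a small report: instead of re-typing a grep every time, the query lives in a reusable function that counts HTTP 500 responses per week. The log format and sample lines are made up for illustration, not a real server's format.

```python
from collections import Counter
from datetime import datetime

# Made-up sample log: "date time method path HTTP status"
SAMPLE_LOG = """\
2015-02-01 10:00:01 GET /index HTTP 200
2015-02-01 10:00:02 GET /buy HTTP 500
2015-02-08 11:30:00 POST /login HTTP 500
2015-02-08 11:30:05 GET /index HTTP 200
"""

def count_status_per_week(log_text, status="500"):
    """A 'saved search': count lines with the given HTTP status, grouped by ISO week."""
    per_week = Counter()
    for line in log_text.splitlines():
        parts = line.split()
        if parts and parts[-1] == status:
            day = datetime.strptime(parts[0], "%Y-%m-%d")
            per_week[day.isocalendar()[1]] += 1
    return per_week

print(count_status_per_week(SAMPLE_LOG))
```

Keeping the query as code is exactly what makes Level 5 (sharing) and Level 6 (reporting) cheap: co-workers can import the function, and the weekly counts feed directly into a chart.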
So we need a central place to send our logs, and a central place to make use of them. This is where Graylog comes in.
How to send logs:
Classic syslog via TCP/UDP
GELF via TCP/UDP
Both via AMQP, or write your own input plugins.
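As a sketch of the classic syslog option, Python's standard library `SysLogHandler` can ship log records over UDP. The host, port, and app name below are assumptions; point them at whatever syslog input you are running.

```python
import logging
import logging.handlers

# Send logs to a syslog UDP listener (address is an assumption --
# replace it with the host/port of your actual syslog input).
handler = logging.handlers.SysLogHandler(address=("127.0.0.1", 5140))
handler.setFormatter(logging.Formatter("myapp: %(levelname)s %(message)s"))

logger = logging.getLogger("myapp")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("user login succeeded")  # fire-and-forget over UDP
```

Because UDP is connectionless, the sender does not block or fail if the collector is briefly down, which is also why TCP is preferred when you cannot afford to lose messages.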
GELF: the Graylog Extended Log Format lets you structure your logs.
Many libraries for different systems and languages available.
{
    'short_message': 'Something went wrong',
    'facility': 'some subsystem',
    'full_message': 'stacktrace and stuff',
    'file': 'some controller.rb',
}
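The fields above can be assembled and shipped with nothing but the Python standard library. A minimal sketch, assuming a GELF UDP input on localhost port 12201 and a hostname of app01.example.com (GELF UDP payloads are JSON, optionally zlib-compressed; 'version' and 'host' are required fields):

```python
import json
import socket
import zlib

def build_gelf_payload(short_message, host, **extra):
    """Build a zlib-compressed GELF (JSON) payload for UDP transport.

    'version' and 'host' are required; extra fields such as facility,
    full_message, or file can be added freely.
    """
    message = {"version": "1.1", "host": host, "short_message": short_message}
    message.update(extra)
    return zlib.compress(json.dumps(message).encode("utf-8"))

payload = build_gelf_payload(
    "Something went wrong",
    host="app01.example.com",            # assumed hostname
    facility="some subsystem",
    full_message="stacktrace and stuff",
    file="some controller.rb",
)

# Fire-and-forget over UDP to a GELF UDP input (address is an assumption).
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(payload, ("127.0.0.1", 12201))
```

In practice you would use one of the existing GELF libraries for your language rather than hand-rolling this, but the wire format really is this simple.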
Log message types:
There are 2 types of log messages.
Type1: Automatically generated by a service. Usually a huge amount of structured but raw data. You have only limited control over what is logged.
Type2: Logs sent directly from within your applications. Triggered, for example, by a log.error() call or an exception catcher. Makes it possible to send highly structured messages via GELF.
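A Type2 setup can be sketched with a custom logging handler that turns each record into a GELF-style dict. This is a hypothetical handler for illustration: a real deployment would serialize and ship each dict over the network instead of collecting it in memory, and the hostname and field names here are assumptions.

```python
import logging

class GelfDictHandler(logging.Handler):
    """Hypothetical handler: turn each LogRecord into a GELF-style dict.

    Collects dicts in memory for illustration; a real handler would
    serialize and send them to a GELF input instead.
    """
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append({
            "version": "1.1",
            "host": "app01",                      # assumed hostname
            "short_message": record.getMessage(),
            "level": record.levelno,
            "_logger": record.name,               # custom field
        })

logger = logging.getLogger("shop")
handler = GelfDictHandler()
logger.addHandler(handler)
logger.setLevel(logging.ERROR)

try:
    1 / 0
except ZeroDivisionError:
    logger.error("checkout failed")  # Type2: triggered by our own code
```

The point of Type2 logging is visible here: the application decides the structure, so the message arrives with exactly the fields you want to search on.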
As presented in the Graylog architecture above, Graylog depends on the following components.
1. Elasticsearch: Elasticsearch is used for storing log messages and full-text search.
2. MongoDB: MongoDB is useful for Metadata Management.
3. Graylog: Graylog ties these together and helps you better understand how your applications are used, improve their security, and reduce costs.
There are a few rules of thumb when scaling resources for Graylog:
1. graylog-server nodes should have a focus on CPU power.
2. Elasticsearch nodes should have as much RAM as possible and the fastest disks you can get. Everything depends on I/O speed here.
3. MongoDB is only being used to store configuration and the dead letter messages, and can be sized fairly small.
4. graylog-web-interface nodes mostly wait for HTTP responses from the rest of the system and can also be rather small.
5. graylog-radio nodes act as workers. They don’t know each other and you can shut them down at any point in time without changing the cluster state at all.
Also keep in mind that messages are only stored in Elasticsearch. If you have data loss on Elasticsearch, the messages are gone – except if you have created backups of the indices.
MongoDB is only storing meta information and will be abstracted with a general database layer in future versions. This will allow you to use other databases like MySQL instead.
This is a minimum Graylog setup that can be used for smaller, non-critical, or test setups. None of the components is redundant, but it is easy and quick to set up.
Bigger Production Setup:
This is a setup for bigger production environments. It has several graylog-server nodes behind a load balancer that share the processing load. The load balancer can ping the graylog-server nodes via REST/HTTP to check if they are alive and take dead nodes out of the cluster.
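The load balancer's health check can be sketched as a plain HTTP GET against each graylog-server node. The endpoint path, port, and hostname below are assumptions for illustration; consult your Graylog version's REST API documentation for the actual load-balancer status resource.

```python
from urllib import request
from urllib.error import URLError

def node_is_alive(body: str) -> bool:
    """Interpret a health-check response body; 'ALIVE' means keep the node."""
    return body.strip().upper() == "ALIVE"

def check_node(url: str, timeout: float = 2.0) -> bool:
    """GET the node's status endpoint; treat any error as a dead node."""
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return node_is_alive(resp.read().decode("utf-8"))
    except URLError:
        return False

if __name__ == "__main__":
    # Hostname, port, and path are assumptions for illustration.
    print(check_node("http://graylog-node1:12900/system/lbstatus"))
```

Treating any connection error as "dead" is deliberate: the load balancer should remove a node that times out just as readily as one that answers with a failure status.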
Reference links: 1. Installation Steps of Graylog - Part 1