分类 IT运维 下的文章

Replica Errors During Replication

If a statement produces the same error (identical error code) on both the source and the replica, the error is logged, but replication continues.
主从都出现相同的错误,同步继续

If a statement produces different errors on the source and the replica, the replication SQL thread terminates, and the replica writes a message to its error log and waits for the database administrator to decide what to do about the error. This includes the case that a statement produces an error on the source or the replica, but not both. To address the issue, connect to the replica manually and determine the cause of the problem. SHOW SLAVE STATUS is useful for this. Then fix the problem and run START SLAVE. For example, you might need to create a nonexistent table before you can start the replica again.

主从产生不同的错误,复制语句中断,写入错误日志并等待管理员操作,错误在主或者从产生,需要手动连接到备库并处理。 查看状态SHOW SLAVE STATUS;启动 START SLAVE

If a temporary error is recorded in the replica's error log, you do not necessarily have to take any action suggested in the quoted error message. Temporary errors should be handled by the client retrying the transaction. For example, if the replication SQL thread records a temporary error relating to a deadlock, you do not need to restart the transaction manually on the replica, unless the replication SQL thread subsequently terminates with a nontemporary error message.

临时错误不需要处理,例如死锁

If this error code validation behavior is not desirable, some or all errors can be masked out (ignored) with the --slave-skip-errors option.
如果不需要验证数据,可以使用--slave-skip-errors

For nontransactional storage engines such as MyISAM, it is possible to have a statement that only partially updates a table and returns an error code. This can happen, for example, on a multiple-row insert that has one row violating a key constraint, or if a long update statement is killed after updating some of the rows. If that happens on the source, the replica expects execution of the statement to result in the same error code. If it does not, the replication SQL thread stops as described previously.
非事务数据库中, 如果有sql导致错误,主从错误不一致,会停止同步进程。

If you are replicating between tables that use different storage engines on the source and replica, keep in mind that the same statement might produce a different error when run against one version of the table, but not the other, or might cause an error for one version of the table, but not the other. For example, since MyISAM ignores foreign key constraints, an INSERT or UPDATE statement accessing an InnoDB table on the source might cause a foreign key violation but the same statement performed on a MyISAM version of the same table on the replica would produce no such error, causing replication to stop.

主从表用不同的存储引擎,可能导致主从停止
Stored functions execute with the same NOW() value as the calling statement. However, this is not true of stored procedures.
存储函数now() 调用相同的语句,但是存储过程不会。
Deterministic loadable functions must be applied on the replicas.
确定性函数

Table definitions must be (nearly) identical on source and replica. See Section 16.4.1.10, “Replication with Differing Table Definitions on Source and Replica”, for more information.
主从表要一样

Advantages of row-based replication
All changes can be replicated. This is the safest form of replication.
所有更改都能复制

Statements that update the information in the mysql system database, such as GRANT, REVOKE and the manipulation of triggers, stored routines (including stored procedures), and views, are all replicated to replicas using statement-based replication.

For statements such as CREATE TABLE ... SELECT, a CREATE statement is generated from the table definition and replicated using statement-based format, while the row insertions are replicated using row-based format.
DDL 主要从语句复制、DML主要是行复制
Fewer row locks are required on the source, which thus achieves higher concurrency, for the following types of statements:
以下语句实现高并发产生更少的行锁
INSERT ... SELECT

INSERT statements with AUTO_INCREMENT

UPDATE or DELETE statements with WHERE clauses that do not use keys or do not change most of the examined rows.
Fewer row locks are required on the replica for any INSERT, UPDATE, or DELETE statement.

Disadvantages of row-based replication
行复制的缺点
RBR can generate more data that must be logged. To replicate a DML statement (such as an UPDATE or DELETE statement), statement-based replication writes only the statement to the binary log. By contrast, row-based replication writes each changed row to the binary log. If the statement changes many rows, row-based replication may write significantly more data to the binary log; this is true even for statements that are rolled back. This also means that making and restoring a backup can require more time. In addition, the binary log is locked for a longer time to write the data, which may cause concurrency problems. Use binlog_row_image=minimal to reduce the disadvantage considerably.
大量数据操作的DML语句会产生更多的BINLOG,备份和恢复需要更多的时间,锁定BINLOG的时间更长,无法写入数据,可能影响并发性能,binlog_row_image=minimal 可以大大减少缺点。但是生产系统上不建议使用,原因有2,一是大表更新过程中取消会有问题,闪回等功能无法使用。因为该参数省略了修改前结果的保存。

原理:

Replication is based on the replication source server keeping track of all changes to its databases (updates, deletes, and so on) in its binary log. The binary log serves as a written record of all events that modify database structure or content (data) from the moment the server was started. Typically, SELECT statements are not recorded because they modify neither database structure nor content.
复制是基于源库BINLOG中记录的所有改变。

Each replica that connects to the source requests a copy of the binary log. That is, it pulls the data from the source, rather than the source pushing the data to the replica. The replica also executes the events from the binary log that it receives. This has the effect of repeating the original changes just as they were made on the source. Tables are created or their structure modified, and data is inserted, deleted, and updated according to the changes that were originally made on the source.
每个恢复连接进程请求一份源的BINARY LOG复制,这是备从源拉取,而不是源主动推送的,备考重做源库的所有操作。

Because each replica is independent, the replaying of the changes from the source's binary log occurs independently on each replica that is connected to the source. In addition, because each replica receives a copy of the binary log only by requesting it from the source, the replica is able to read and update the copy of the database at its own pace and can start and stop the replication process at will without affecting the ability to update to the latest database status on either the source or replica side.
由于恢复的I/O进程是从库主动发起的,主库和从库是独立的,主从不会影响主库和从库的正常运行。

复制方式(格式)

Replication works because events written to the binary log are read from the source and then processed on the replica. The events are recorded within the binary log in different formats according to the type of event. The different replication formats used correspond to the binary logging format used when the events were recorded in the source's binary log. The correlation between binary logging formats and the terms used during replication are:
一般情况用混合复制,特殊情况,可以根据需要指定基于行的或者语句的复制方式。
Advantages of statement-based replication
Disadvantages of statement-based replication

语句复制的优点

Proven technology.
成熟的技术
Less data written to log files. When updates or deletes affect many rows, this results in much less storage space required for log files. This also means that taking and restoring from backups can be accomplished more quickly.
当更新、删除多个行时,需要更少的空间写入日志文件,同时意味着更高效的备份和恢复
Log files contain all statements that made any changes, so they can be used to audit the database.
可以用于数据库审计

语句复制的缺点

Statements that are unsafe for SBR. Not all statements which modify data (such as INSERT DELETE, UPDATE, and REPLACE statements) can be replicated using statement-based replication. Any nondeterministic behavior is difficult to replicate when using statement-based replication. Examples of such Data Modification Language (DML) statements include the following:
不支持全部的sql语句
A statement that depends on a loadable function or stored program that is nondeterministic, since the value returned by such a function or stored program or depends on factors other than the parameters supplied to it. (Row-based replication, however, simply replicates the value returned by the function or stored program, so its effect on table rows and data is the same on both the source and replica.) See Section 16.4.1.16, “Replication of Invoked Features”, for more information.
例如函数和存储过程
DELETE and UPDATE statements that use a LIMIT clause without an ORDER BY are nondeterministic. See Section 16.4.1.17, “Replication and LIMIT”.
使用limit但未包含order by的语句
Deterministic loadable functions must be applied on the replicas.
复制到备库的确定性的函数
不支持的函数:
LOAD_FILE()

UUID(), UUID_SHORT()

USER()

FOUND_ROWS()

SYSDATE() (unless both the source and the replica are started with the --sysdate-is-now option)

GET_LOCK()

IS_FREE_LOCK()

IS_USED_LOCK()

MASTER_POS_WAIT()

RAND()

RELEASE_LOCK()

SLEEP()

VERSION()
语句使用基于sql的复制,失败的报错如下;

[Warning] Statement is not safe to log in statement format.

INSERT ... SELECT requires a greater number of row-level locks than with row-based replication.
INSERT ... SELECT 比基于行的复制产生更多的行级锁

UPDATE statements that require a table scan (because no index is used in the WHERE clause) must lock a greater number of rows than with row-based replication.
没有索引的update 语句会产生大量行锁
For InnoDB: An INSERT statement that uses AUTO_INCREMENT blocks other nonconflicting INSERT statements.
innodb中,自增的insert会阻塞其他的insert

For complex statements, the statement must be evaluated and executed on the replica before the rows are updated or inserted. With row-based replication, the replica only has to modify the affected rows, not execute the full statement.
对于复杂语句,需要影响更多的行,而基于行的复制,只影响被修改的行。

Advantages of row-based replication
Disadvantages of row-based replication
行复制的缺点

https://www.processon.com/view/link/60e4f6d65653bb46e90ece0b
参考:https://dev.mysql.com/doc/refman/5.7/en/replication-implementation.html

1.查看所有数据库容量大小

select
table_schema as '数据库',
sum(table_rows) as '记录数',
sum(truncate(data_length/1024/1024, 2)) as '数据容量(MB)',
sum(truncate(index_length/1024/1024, 2)) as '索引容量(MB)'
from information_schema.tables
group by table_schema
order by sum(data_length) desc, sum(index_length) desc;
  

2.查看所有数据库各表容量大小

select
table_schema as '数据库',
table_name as '表名',
table_rows as '记录数',
truncate(data_length/1024/1024, 2) as '数据容量(MB)',
truncate(index_length/1024/1024, 2) as '索引容量(MB)'
from information_schema.tables
order by data_length desc, index_length desc;
  

3.查看指定数据库容量大小
例:查看mysql库容量大小

select
table_schema as '数据库',
sum(table_rows) as '记录数',
sum(truncate(data_length/1024/1024, 2)) as '数据容量(MB)',
sum(truncate(index_length/1024/1024, 2)) as '索引容量(MB)'
from information_schema.tables
where table_schema='mysql';
  

4.查看指定数据库各表容量大小
例:查看mysql库各表容量大小

select
table_schema as '数据库',
table_name as '表名',
table_rows as '记录数',
truncate(data_length/1024/1024, 2) as '数据容量(MB)',
truncate(index_length/1024/1024, 2) as '索引容量(MB)'
from information_schema.tables
where table_schema='mysql'
order by data_length desc, index_length desc;

概述

数据库的性能瓶颈通常是由于某些sql语句效率不高,在大数据量或者高并发的情况下,会拖慢整体性能。要找到这些语句可以通过查找报告中的“SQL Statistics”章节,此章节按各种资源分别列出对资源消耗最严重的SQL语句,并显示它们所占统计期内全部资源的比例,这给出我们调优指南。例如在一个系统中,CPU资源是系统性能瓶颈所在,那么优化buffer gets最多的SQL语句将获得最大效果。在一个I/O等待是最严重事件的系统中,调优的目标应该是physical IOs最多的SQL语句。

awr报告中SQL优化主要查看如下内容:

SQL ordered by Elapsed Time
总时间消耗最长的sql语句

SQL ordered by CPU TimeCPU
总耗时最长的sql语句

SQL ordered by Gets
读取缓存最多的sql语句

常见sql调优方法:

1、根据查询条件创建索引。
2、用exists、not exists替代in和not in。
3、用表连接代替exists 。
4、减少order by 、union的使用,这两个关键字都很耗性能。
5、避免使用distinct,效率较低。
6、select 避免使用* 。
7、减少访问数据库的次数,绑定变量,预处理。
8、用where子句代替having子句。
9、避免在索引列上使用is null 和is not null。将会使索引失效。
10、避免在索引列上使用计算。如select…from pay where money*10>100

Cache Sizes
shared pool主要包括library cache和dictionary cache。library cache用来存储最近解析(或编译)后SQL、PL/SQL和Java classes等。library cache用来存储最近引用的数据字典。发生在library cache或dictionary cache的cache miss代价要比发生在buffer cache的代价高得多。因此shared pool的设置要确保最近使用的数据都能被cache。

Load Profile
显示数据库负载概况,将之与基线数据比较才具有更多的意义,如果每秒或每事务的负载变化不大,说明应用运行比较稳定。单个的报告数据只说明应用的负载情况,绝大多数据并没有一个所谓“正确”的值,然而Logons大于每秒1~2个、Hard parses大于每秒100、全部parses超过每秒300表明可能有争用问题。
Redo size:每秒/每事务产生的redo大小(单位字节),可标志数据库任务的繁重程序。
Logical reads:每秒/每事务逻辑读的块数
Block changes:每秒/每事务修改的块数
Physical reads:每秒/每事务物理读的块数
Physical writes:每秒/每事务物理写的块数
User calls:每秒/每事务用户call次数
Parses:SQL解析的次数
Hard parses:其中硬解析的次数,硬解析太多,说明SQL重用率不高。
Sorts:每秒/每事务的排序次数
Logons:每秒/每事务登录的次数
Executes:每秒/每事务SQL执行次数
Transactions:每秒事务数
Blocks changed per Read:表示逻辑读用于修改数据块的比例
Recursive Call:递归调用占所有操作的比率
Rollback per transaction:每事务的回滚率
Rows per Sort:每次排序的行数
注:
Oracle的硬解析和软解析
  提到软解析(soft parse)和硬解析(hard parse),就不能不说一下Oracle对sql的处理过程。当你发出一条sql语句交付Oracle,在执行和获取结果前,Oracle对此sql将进行几个步骤的处理过程:
  1、语法检查(syntax check)
  检查此sql的拼写是否语法。
  2、语义检查(semantic check)
  诸如检查sql语句中的访问对象是否存在及该用户是否具备相应的权限。
  3、对sql语句进行解析(parse)
  利用内部算法对sql进行解析,生成解析树(parse tree)及执行计划(execution plan)。
  4、执行sql,返回结果(execute and return)
  其中,软、硬解析就发生在第三个过程里。
  Oracle利用内部的hash算法来取得该sql的hash值,然后在library cache里查找是否存在该hash值;
  假设存在,则将此sql与cache中的进行比较;
  假设“相同”,就将利用已有的解析树与执行计划,而省略了优化器的相关工作。这也就是软解析的过程。
  诚然,如果上面的2个假设中任有一个不成立,那么优化器都将进行创建解析树、生成执行计划的动作。这个过程就叫硬解析。
创建解析树、生成执行计划对于sql的执行来说是开销昂贵的动作,所以,应当极力避免硬解析,尽量使用软解析。

Instance Efficiency Percentages (Target 100%)

本节包含了Oracle关键指标的内存命中率及其它数据库实例操作的效率。
Buffer Nowait表示在内存获得数据的未等待比例。
buffer hit表示进程从内存中找到数据块的比率,监视这个值是否发生重大变化比这个值本身更重要。对于一般的OLTP系统,如果此值低于80%,应该给数据库分配更多的内存。
Redo NoWait表示在LOG缓冲区获得BUFFER的未等待比例。如果太低(可参考90%阀值),考虑增加LOG BUFFER。
library hit表示Oracle从Library Cache中检索到一个解析过的SQL或PL/SQL语句的比率,当应用程序调用SQL或存储过程时,Oracle检查Library Cache确定是否存在解析过的版本,如果存在,Oracle立即执行语句;如果不存在,Oracle解析此语句,并在Library Cache中为它分配共享SQL区。低的library hit ratio会导致过多的解析,增加CPU消耗,降低性能。如果library hit ratio低于90%,可能需要调大shared pool区。
Latch Hit:Latch是一种保护内存结构的锁,可以认为是SERVER进程获取访问内存数据结构的许可。要确保Latch Hit>99%,否则意味着Shared Pool latch争用,可能由于未共享的SQL,或者Library Cache太小,可使用绑定变更或调大Shared Pool解决。
Parse CPU to Parse Elapsd:解析实际运行时间/(解析实际运行时间+解析中等待资源时间),越高越好。
Non-Parse CPU :SQL实际运行时间/(SQL实际运行时间+SQL解析时间),太低表示解析消耗时间过多。
Execute to Parse:是语句执行与分析的比例,如果要SQL重用率高,则这个比例会很高。该值越高表示一次解析后被重复执行的次数越多。
In-memory Sort:在内存中排序的比率,如果过低说明有大量的排序在临时表空间中进行。考虑调大PGA。
Soft Parse:软解析的百分比(softs/softs+hards),近似当作sql在共享区的命中率,太低则需要调整应用使用绑定变量。

Shared Pool Statistics
Memory Usage %:对于一个已经运行一段时间的数据库来说,共享池内存使用率,应该稳定在75%-90%间,如果太小,说明Shared Pool有浪费,而如果高于90,说明共享池中有争用,内存不足。
SQL with executions>1:执行次数大于1的sql比率,如果此值太小,说明需要在应用中更多使用绑定变量,避免过多SQL解析。
Memory for SQL w/exec>1:执行次数大于1的SQL消耗内存的占比。

Top 5 Timed Events
显示了系统中最严重的5个等待,按所占等待时间的比例倒序列示。当我们调优时,总希望观察到最显著的效果,因此应当从这里入手确定我们下一步做什么。例如如果‘buffer busy wait’是较严重的等待事件,我们应当继续研究报告中Buffer Wait和File/Tablespace IO区的内容,识别哪些文件导致了问题。如果最严重的等待事件是I/O事件,我们应当研究按物理读排序的SQL语句区以识别哪些语句在执行大量I/O,并研究Tablespace和I/O区观察较慢响应时间的文件。如果有较高的LATCH等待,就需要察看详细的LATCH统计识别哪些LATCH产生的问题。
在这里,log file parallel write是相对比较多的等待,占用了7%的CPU时间。
通常,在没有问题的数据库中,CPU time总是列在第一个。

概述

性能调优最重要的是通过检查、观察和分析找出性能瓶颈,从而对症下药,解决性能问题。通常引起性能问题的情况如下:
1、中间件、数据库的参数配置不当。
2、jdk选择错误。
3、数据库负载过高,通常是数据量大或者并发量大程序sql较慢引起。
4、Weblogic等负载过高,在参数已调整情况下,通常是由于客户端、服务端交互过多,或者服务端程序存在死循环等问题引起。
5、网络带宽限制,带宽较小,服务端和客户端传输数据量较大,导致等待时间较长。
6、网络问题,网络存在问题,如丢包,无法连接等问题也会导致长时间请求无响应的性能问题。
7、客户端性能不佳时,杀毒软件会严重影响性能,导致客户端运行效率低下。
8、服务器或者客户端机器性能低下,无法支撑系统的正常运行,只能通过更换新机器或者服务器集群来解决。