[Eisfair] Error message when using rsync - NEW
Jürgen Witt
j-witt at web.de
Wed Mar 13 04:33:00 CET 2019
Hello NG,
a few days ago I already posted here about the topic in the subject line
and was hopeful that the matter was settled. Unfortunately, it is not.
The rsync of the data worked exactly once, namely after I renamed the
folder in which the abort occurred (TDAMP) to TDAMP- on the
target/backup server and then recreated the folder TDAMP. The following
evening it failed again with the same error message. And even after
today's replacement of the defective hard disk (1 pending sector), the
rsync backup aborted again at the same place/file.
Email from the backup server eis with the subject "fcron <root at eis>
/usr/bin/server-sichern":
rsync: connection unexpectedly closed (2266062707 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(235) [receiver=3.1.3]
rsync: connection unexpectedly closed (32617488 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(235) [generator=3.1.3]
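For reference, "code 12" is one of the exit values listed in the rsync man page. It says that the protocol stream between the two rsync processes broke, not which side caused it. The following is only a minimal sketch, not part of the original script: the helper name rsync_exit_text is hypothetical, and the table covers just a subset of the documented exit values.

```sh
#!/bin/sh
# Hypothetical helper: map a few rsync exit values (taken from the
# EXIT VALUES section of the rsync man page) to readable text.
rsync_exit_text() {
    case "$1" in
        0)  echo "success" ;;
        10) echo "error in socket I/O" ;;
        11) echo "error in file I/O" ;;
        12) echo "error in rsync protocol data stream" ;;
        23) echo "partial transfer due to error" ;;
        24) echo "partial transfer due to vanished source files" ;;
        30) echo "timeout in data send/receive" ;;
        *)  echo "see the EXIT VALUES section of the rsync man page" ;;
    esac
}

# Possible use inside a backup script (commented out, since it needs the
# real servers):
# rsync -avzu --delete 10.20.3.101:/public /data >/var/log/rsync-public_log
# status=$?
# echo "rsync exit status $status: $(rsync_exit_text "$status")" \
#     >>/var/log/rsync-public_log
```

Writing the exit status into the log would make it visible in the report itself and not only in the fcron mail.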
New disk /dev/sdb:
eis # smartctl -a /dev/sdb
smartctl 7.0 2018-12-30 r4883 [i686-linux-3.16.62-eisfair-1-PAE] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD20EFRX-68EUZN0
Serial Number: WD-WCC4M6NCH89F
LU WWN Device Id: 5 0014ee 2bb0ff2cc
Firmware Version: 82.00A82
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Wed Mar 13 02:22:50 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       1
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       17
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1
194 Temperature_Celsius     0x0022   118   116   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        17         -
# 2  Short offline       Completed without error       00%        17         -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Existing disk /dev/sda, which forms a software RAID 1 with /dev/sdb:
eis # smartctl -a /dev/sda
smartctl 7.0 2018-12-30 r4883 [i686-linux-3.16.62-eisfair-1-PAE] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD20EFRX-68EUZN0
Serial Number: WD-WCC4M4FF0JJZ
LU WWN Device Id: 5 0014ee 2b92310de
Firmware Version: 82.00A82
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Wed Mar 13 02:26:24 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   177   177   021    Pre-fail  Always       -       4125
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       13892
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       292
194 Temperature_Celsius     0x0022   116   108   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     13892         -
# 2  Short offline       Completed without error       00%     13892         -
# 3  Short offline       Completed without error       00%     13869         -
# 4  Short offline       Completed without error       00%     13845         -
# 5  Short offline       Completed without error       00%     13821         -
# 6  Extended offline    Completed without error       00%     13803         -
# 7  Short offline       Completed without error       00%     13797         -
# 8  Short offline       Completed without error       00%     13773         -
# 9  Short offline       Completed without error       00%     13749         -
#10  Short offline       Completed without error       00%     13725         -
#11  Short offline       Completed without error       00%     13701         -
#12  Short offline       Completed without error       00%     13677         -
#13  Short offline       Completed without error       00%     13653         -
#14  Extended offline    Completed without error       00%     13635         -
#15  Short offline       Completed without error       00%     13629         -
#16  Short offline       Completed without error       00%     13605         -
#17  Short offline       Completed without error       00%     13581         -
#18  Short offline       Completed without error       00%     13557         -
#19  Short offline       Completed without error       00%     13533         -
#20  Short offline       Completed without error       00%     13437         -
#21  Short offline       Completed without error       00%     13414         -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
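To compare the two disks without reading the full dumps, the attributes that usually matter for a failing sector or cable (IDs 5, 197, 198 and 199) can be filtered out. This is only a sketch under assumptions: the function name smart_raw_values is hypothetical, and it expects smartctl's native attribute table on stdin, one attribute per line with the raw value in the tenth column.

```sh
#!/bin/sh
# Hypothetical helper: read a smartctl attribute table on stdin and print
# NAME=RAW_VALUE for the reallocation, pending-sector, uncorrectable and
# CRC-error attributes (IDs 5, 197, 198, 199).
smart_raw_values() {
    awk '$1 ~ /^(5|197|198|199)$/ && NF >= 10 { print $2 "=" $10 }'
}

# Possible use on the backup server (commented out, needs real disks):
# smartctl -A /dev/sda | smart_raw_values
# smartctl -A /dev/sdb | smart_raw_values
```

For both disks shown above all four raw values are 0.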
Every day at 23:30, cron starts the script server-sichern on the server
eis (10.20.3.100).
eis # cat /usr/bin/server-sichern
#! /bin/bash
ssh 10.20.3.101 touch /public/TDAMP/DS/end.txt
ssh 10.20.3.101 sleep 45
rsync -avzu --delete 10.20.3.101:/home /data >/var/log/rsync-home_log
rsync -avzu --delete 10.20.3.101:/public /data >/var/log/rsync-public_log
rsync -avzu --delete 10.20.3.101:/var/www/htdocs/webdav /var/www/htdocs >/var/log/rsync-webdav_log
ssh 10.20.3.101 rm /public/TDAMP/DS/end.txt
rm -f /public/TDAMP/DS/end.txt
/usr/bin/server-sichern-report
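In the script above only stdout is redirected into the log files, so rsync's error messages go to stderr and end up in the fcron mail, while the logs show no trace of the failure. A minimal sketch of a wrapper that keeps both streams and the exit status in the same log follows. The name run_step and the "exit status:" line are assumptions for illustration, not part of the original script.

```sh
#!/bin/sh
# Hypothetical wrapper: run_step LOGFILE COMMAND [ARGS...]
# Runs the command, writes stdout and stderr to LOGFILE, appends the
# exit status to the log and returns that status to the caller.
run_step() {
    step_log=$1
    shift
    step_status=0
    "$@" >"$step_log" 2>&1 || step_status=$?
    echo "exit status: $step_status" >>"$step_log"
    return "$step_status"
}

# Possible use in server-sichern (commented out, needs the real servers):
# run_step /var/log/rsync-home_log   rsync -avzu --delete 10.20.3.101:/home /data
# run_step /var/log/rsync-public_log rsync -avzu --delete 10.20.3.101:/public /data
```

With this, the "connection unexpectedly closed" lines would appear directly after the last transferred file in rsync-public_log.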
This script produces the report that is delivered by email.
eis # cat /usr/bin/server-sichern-report
#!/bin/sh
( echo "To: root"
echo "Subject: Praxis Borgweg Abgleich Server + Backup-Server"
echo "Groesse und Datum der log-Dateien"
ls -l /var/log/rsync-*
echo "----------------------------------------"
echo "Sicherung von /home"
# cat /var/log/rsync-home_log
echo "-----------------------------------------"
echo "Zeitpunkt der Sicherung"
ls -la /var/log/rsync-home_log
echo "-----------------------------------------"
echo "Sicherung von /public"
cat /var/log/rsync-public_log
echo "-----------------------------------------"
echo "Zeitpunkt der Sicherung"
ls -la /var/log/rsync-public_log
echo "-----------------------------------------"
) | /usr/lib/sendmail root
Result of an incomplete backup:
Beginning of the emailed report:
Groesse und Datum der log-Dateien
-rw-r--r-- 1 root root 3314 Mar 12 23:30 /var/log/rsync-home_log
-rw-r--r-- 1 root root 186499 Mar 13 00:33 /var/log/rsync-public_log
-rw-r--r-- 1 root root 174 Mar 13 00:33 /var/log/rsync-webdav_log
...
public/TDAMP/DS/daten/prax1/PSI.DBF
public/TDAMP/DS/daten/prax1/QMETA.DBF
public/TDAMP/DS/daten/prax1/QMETA.FPT
-----------------------------------------
Zeitpunkt der Sicherung
-rw-r--r-- 1 root root 186499 Mar 13 00:33 /var/log/rsync-public_log
The report ends here.
The emailed report again ends with the same file, QMETA.FPT, in a
subfolder of public. This file is 2700 MB in size. The report of the
third rsync process (webdav) is not sent, although it does exist and
that process only runs after the second rsync process (public) has
aborted.
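One possible explanation for the missing webdav part lies in the report script quoted earlier: it only prints rsync-public_log. The cat for rsync-home_log is commented out, and rsync-webdav_log is never read. A minimal sketch that prints a section for every log file it finds is shown below. The function name print_log_sections and the limit of 20 lines per log are assumptions.

```sh
#!/bin/sh
# Hypothetical helper: print_log_sections DIR
# Prints a separator, the file name and the last 20 lines of every
# rsync-*_log file found in DIR.
print_log_sections() {
    for log in "$1"/rsync-*_log; do
        [ -f "$log" ] || continue   # skip if the pattern matched nothing
        echo "-----------------------------------------"
        echo "Log: $log"
        tail -n 20 "$log"
    done
}

# Possible use inside server-sichern-report:
# print_log_sections /var/log
```

Using tail keeps the mail short while still showing the last file rsync touched in each run.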
Excerpt from /var/log/messages on the production server debian (10.20.3.101)
Mar 12 23:30:01 debian sshd[30074]: Accepted publickey for root from 10.20.3.100 port 48907 ssh2: RSA SHA256:We20RCnHw5WERqLEu5nbmonNrhYB/AgkeIT+yGXBFLY
Mar 12 23:30:01 debian sshd[30074]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 12 23:30:01 debian sshd[30074]: Received disconnect from 10.20.3.100 port 48907:11: disconnected by user
Mar 12 23:30:01 debian sshd[30074]: Disconnected from user root 10.20.3.100 port 48907
Mar 12 23:30:01 debian sshd[30074]: pam_unix(sshd:session): session closed for user root
Mar 12 23:30:01 debian sshd[30233]: Accepted publickey for root from 10.20.3.100 port 48908 ssh2: RSA SHA256:We20RCnHw5WERqLEu5nbmonNrhYB/AgkeIT+yGXBFLY
Mar 12 23:30:01 debian sshd[30233]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 12 23:30:46 debian sshd[30233]: Received disconnect from 10.20.3.100 port 48908:11: disconnected by user
Mar 12 23:30:46 debian sshd[30233]: Disconnected from user root 10.20.3.100 port 48908
Mar 12 23:30:46 debian sshd[30233]: pam_unix(sshd:session): session closed for user root
Mar 12 23:30:47 debian sshd[31049]: Accepted publickey for root from 10.20.3.100 port 48910 ssh2: RSA SHA256:We20RCnHw5WERqLEu5nbmonNrhYB/AgkeIT+yGXBFLY
Mar 12 23:30:47 debian sshd[31049]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 12 23:30:53 debian sshd[31049]: Received disconnect from 10.20.3.100 port 48910:11: disconnected by user
Mar 12 23:30:53 debian sshd[31049]: Disconnected from user root 10.20.3.100 port 48910
Mar 12 23:30:53 debian sshd[31049]: pam_unix(sshd:session): session closed for user root
Mar 12 23:30:53 debian sshd[31109]: Accepted publickey for root from 10.20.3.100 port 48911 ssh2: RSA SHA256:We20RCnHw5WERqLEu5nbmonNrhYB/AgkeIT+yGXBFLY
Mar 12 23:30:53 debian sshd[31109]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 13 00:33:12 debian sshd[31109]: Received disconnect from 10.20.3.100 port 48911:11: disconnected by user
Mar 13 00:33:12 debian sshd[31109]: Disconnected from user root 10.20.3.100 port 48911
Mar 13 00:33:12 debian sshd[31109]: pam_unix(sshd:session): session closed for user root
Mar 13 00:33:14 debian sshd[25188]: Accepted publickey for root from 10.20.3.100 port 48976 ssh2: RSA SHA256:We20RCnHw5WERqLEu5nbmonNrhYB/AgkeIT+yGXBFLY
Mar 13 00:33:14 debian sshd[25188]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 13 00:33:14 debian sshd[25188]: Received disconnect from 10.20.3.100 port 48976:11: disconnected by user
Mar 13 00:33:14 debian sshd[25188]: Disconnected from user root 10.20.3.100 port 48976
Mar 13 00:33:14 debian sshd[25188]: pam_unix(sshd:session): session closed for user root
Mar 13 00:33:15 debian sshd[25242]: Accepted publickey for root from 10.20.3.100 port 48977 ssh2: RSA SHA256:We20RCnHw5WERqLEu5nbmonNrhYB/AgkeIT+yGXBFLY
Mar 13 00:33:15 debian sshd[25242]: pam_unix(sshd:session): session opened for user root by (uid=0)
Mar 13 00:33:15 debian sshd[25242]: Received disconnect from 10.20.3.100 port 48977:11: disconnected by user
Mar 13 00:33:15 debian sshd[25242]: Disconnected from user root 10.20.3.100 port 48977
Mar 13 00:33:15 debian sshd[25242]: pam_unix(sshd:session): session closed for user root
Mar 13 00:39:32 debian su: pam_unix(su:session): session closed for user root
Excerpt from /var/log/messages on the backup server eis (10.20.3.100)
Mar 12 23:30:00 eis fcron[9517]: Job '/usr/bin/server-sichern' started for user root (pid 9526)
Mar 13 00:05:00 eis fcron[11911]: Job '/var/install/bin/smartmon-plot' started for user root (pid 119
Mar 13 00:05:10 eis fcron[11911]: Job '/var/install/bin/smartmon-plot' completed
Mar 13 00:25:02 eis su: (to root) root on none
Mar 13 00:25:02 eis su: pam_unix(su:session): session opened for user root by (uid=0)
Mar 13 00:33:16 eis fcron[9517]: Job '/usr/bin/server-sichern' completed (mailing output)
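The two fcron lines for server-sichern give the total run time of the failed job. A small sketch of that arithmetic follows. It assumes GNU date (for the -d option), and it takes the year 2019 from the mail date because the syslog lines do not carry one. The function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers, assuming GNU date.
# elapsed_seconds START END: seconds between two timestamps (both read as UTC,
# which is fine here because only the difference matters).
elapsed_seconds() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $((end - start))
}

# format_hms SECONDS: print a duration as H:MM:SS.
format_hms() {
    printf '%d:%02d:%02d\n' $(($1 / 3600)) $(($1 % 3600 / 60)) $(($1 % 60))
}

format_hms "$(elapsed_seconds '2019-03-12 23:30:00' '2019-03-13 00:33:16')"
# prints 1:03:16
```

So the whole job, including the aborted transfer of /public, ran for a little over an hour.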
While writing this, I have once again renamed the folder TDAMP to
TDAMP- on the backup server, then recreated TDAMP and restarted my
script server-sichern. Now just under 86000 MB (that is the size of the
folder TDAMP on the production server) have to be fetched to the backup
server via rsync again. So far 57000 MB have been transferred, and the
file QMETA.FPT (the last line in the rsync report whenever it has failed
so far) is among them. It looks as if it might work this time.
What I do not understand, though, is why the rsync backup failed even
with healthy hard disks in the backup server.
Does anyone have an idea?
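The rename-and-recreate step described here can be written down as a small function. This is only a sketch: the name reset_backup_dir is hypothetical, and the path /data/public in the usage comment is inferred from the rsync target in server-sichern, not stated explicitly.

```sh
#!/bin/sh
# Hypothetical helper: reset_backup_dir PARENT NAME
# Renames PARENT/NAME to PARENT/NAME- and creates a fresh, empty
# PARENT/NAME. Refuses to run if NAME is missing or NAME- already exists,
# so an earlier copy is never overwritten.
reset_backup_dir() {
    reset_parent=$1
    reset_name=$2
    [ -d "$reset_parent/$reset_name" ] || return 1
    [ ! -e "$reset_parent/$reset_name-" ] || return 1
    mv "$reset_parent/$reset_name" "$reset_parent/$reset_name-" &&
        mkdir "$reset_parent/$reset_name"
}

# Possible use on the backup server (path is an assumption):
# reset_backup_dir /data/public TDAMP && /usr/bin/server-sichern
```

Note that this forces a full retransfer of the folder, so it works around the abort but does not explain it.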