Discussion:
[gt-user] Transfer stats: format
Brock Palen
2014-09-11 19:17:45 UTC
Permalink
I made a logstash parser for transfer logs, but ran into issues because it appears that that there are Transfer stats
Entries that do not have full information:

Eg:
What I expect:
[15298] Wed Jul 16 11:08:42 2014 :: Transfer stats: DATE=20140716150842.633689 HOST=flux-xfer1.engin.umich.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140716150841.618581 USER=brockp FILE=/tmp/brockp/linux_x64_64_sfx.exe BUFFER=87380 BLOCK=262144 NBYTES=280590752 VOLUME=/ STREAMS=4 STRIPES=1 DEST=[141.212.30.10] TYPE=RETR CODE=226

What I find and choke on:
[22304] Mon Sep 8 16:45:52 2014 :: Transfer stats: DATE=20140908204552.177237 HOSTS=1944 VOLUME=/ STREAMS=2 STRIPES=1 DEST=[141.212.30.10] TYPE=STOR CODE=226


What are the meanings of these two different formats? Is the first one, with the transfer size etc, the final summary of the transfer? Can I just ignore the others if I want to collect statistics on our gridftp servers?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
***@umich.edu
(734)936-1985
Brock Palen
2014-09-11 19:54:12 UTC
Permalink
Ok now I am really confused,

I have Transfer stats entries with:

START DATE NBYTES
DATE NBYTES
DATE

Why are there three types and why are some not complete information?

Im tracking the performance of our transfers, NYBTES/(DATE - START) but when they are not there kinda hard todo.

We are using the package:
globus-gridftp-server-6.43-1gt.x86_64

Part of the globusonline service.

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
***@umich.edu
(734)936-1985
Post by Brock Palen
I made a logstash parser for transfer logs, but ran into issues because it appears that that there are Transfer stats
[15298] Wed Jul 16 11:08:42 2014 :: Transfer stats: DATE=20140716150842.633689 HOST=flux-xfer1.engin.umich.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140716150841.618581 USER=brockp FILE=/tmp/brockp/linux_x64_64_sfx.exe BUFFER=87380 BLOCK=262144 NBYTES=280590752 VOLUME=/ STREAMS=4 STRIPES=1 DEST=[141.212.30.10] TYPE=RETR CODE=226
[22304] Mon Sep 8 16:45:52 2014 :: Transfer stats: DATE=20140908204552.177237 HOSTS=1944 VOLUME=/ STREAMS=2 STRIPES=1 DEST=[141.212.30.10] TYPE=STOR CODE=226
What are the meanings of these two different formats? Is the first one, with the transfer size etc, the final summary of the transfer? Can I just ignore the others if I want to collect statistics on our gridftp servers?
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
(734)936-1985
Brock Palen
2014-09-16 19:53:48 UTC
Permalink
Is nobody able to comment on this?

Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
***@umich.edu
(734)936-1985
Post by Brock Palen
Ok now I am really confused,
START DATE NBYTES
DATE NBYTES
DATE
Why are there three types and why are some not complete information?
Im tracking the performance of our transfers, NYBTES/(DATE - START) but when they are not there kinda hard todo.
globus-gridftp-server-6.43-1gt.x86_64
Part of the globusonline service.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
(734)936-1985
Post by Brock Palen
I made a logstash parser for transfer logs, but ran into issues because it appears that that there are Transfer stats
[15298] Wed Jul 16 11:08:42 2014 :: Transfer stats: DATE=20140716150842.633689 HOST=flux-xfer1.engin.umich.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140716150841.618581 USER=brockp FILE=/tmp/brockp/linux_x64_64_sfx.exe BUFFER=87380 BLOCK=262144 NBYTES=280590752 VOLUME=/ STREAMS=4 STRIPES=1 DEST=[141.212.30.10] TYPE=RETR CODE=226
[22304] Mon Sep 8 16:45:52 2014 :: Transfer stats: DATE=20140908204552.177237 HOSTS=1944 VOLUME=/ STREAMS=2 STRIPES=1 DEST=[141.212.30.10] TYPE=STOR CODE=226
What are the meanings of these two different formats? Is the first one, with the transfer size etc, the final summary of the transfer? Can I just ignore the others if I want to collect statistics on our gridftp servers?
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
(734)936-1985
Michael Link
2014-09-16 21:42:33 UTC
Permalink
Hi Brock,

It appears that the second log you show is missing the middle. We don't
log a 'HOSTS' entry, so I believe that 'S' is the end of NBYTES.

However, I'm not aware of a bug like that or how that might happen. How
often do you see these corrupted or incomplete entries? Do you detect
any pattern, such as the timestamps of the nearby entries being the
same, or the pid changing right before or after? Can you show the
server config you're using?

Mike
Post by Brock Palen
Ok now I am really confused,
START DATE NBYTES
DATE NBYTES
DATE
Why are there three types and why are some not complete information?
Im tracking the performance of our transfers, NYBTES/(DATE - START) but when they are not there kinda hard todo.
globus-gridftp-server-6.43-1gt.x86_64
Part of the globusonline service.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
(734)936-1985
Post by Brock Palen
I made a logstash parser for transfer logs, but ran into issues because it appears that that there are Transfer stats
[15298] Wed Jul 16 11:08:42 2014 :: Transfer stats: DATE=20140716150842.633689 HOST=flux-xfer1.engin.umich.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140716150841.618581 USER=brockp FILE=/tmp/brockp/linux_x64_64_sfx.exe BUFFER=87380 BLOCK=262144 NBYTES=280590752 VOLUME=/ STREAMS=4 STRIPES=1 DEST=[141.212.30.10] TYPE=RETR CODE=226
[22304] Mon Sep 8 16:45:52 2014 :: Transfer stats: DATE=20140908204552.177237 HOSTS=1944 VOLUME=/ STREAMS=2 STRIPES=1 DEST=[141.212.30.10] TYPE=STOR CODE=226
What are the meanings of these two different formats? Is the first one, with the transfer size etc, the final summary of the transfer? Can I just ignore the others if I want to collect statistics on our gridftp servers?
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
(734)936-1985
Brock Palen
2014-09-18 19:38:04 UTC
Permalink
It appears to almost never happen eg:

[***@flux-xfer1 log]# grep "Transfer stats" gridftp-go.log > /tmp/go.log
[***@flux-xfer1 log]# grep NBYTES /tmp/go.log | wc -l
519370
[***@flux-xfer1 log]# grep START /tmp/go.log | wc -l
519370
[***@flux-xfer1 log]# grep DATE /tmp/go.log | wc -l
519372

So of all the entries there appears to be 2 that have DATE but no NBYTES or START

This is what they appear like, and this appears to be totally reasonable:

[7581] Wed Sep 17 12:35:23 2014 :: Transfer stats: DATE=20140917163523.993[7612] Wed Sep 17 12:35:24 2014 :: Starting to transfer "/scratch/dposselt_flux/hefei/fv.0720x0361.L30.ws.new.ushear.ununi.5.0.5/bld/history_defaults.mod".

[7652] Wed Sep 17 12:35:45 2014 :: Transfer stats: DATE=20140917163545.563085 HOST=flux-xfer1.engin.umich.edu PROG=globus-grid[7651] Wed Sep 17 12:35:45 2014 :: cli.globusonline.org:36507: [SERVER]: 150 Beginning transfer.

So why of the 519,370 transfers do we have these two odd balls?

BTW I attached what we created, there is more below, but we now know exactly where transfers are coming from, and can search and debug them quickly (almost real time).



Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
***@umich.edu
(734)936-1985
Post by Michael Link
Hi Brock,
It appears that the second log you show is missing the middle. We don't log a 'HOSTS' entry, so I believe that 'S' is the end of NBYTES.
However, I'm not aware of a bug like that or how that might happen. How often do you see these corrupted or incomplete entries? Do you detect any pattern, such as the timestamps of the nearby entries being the same, or the pid changing right before or after? Can you show the server config you're using?
Mike
Post by Brock Palen
Ok now I am really confused,
START DATE NBYTES
DATE NBYTES
DATE
Why are there three types and why are some not complete information?
Im tracking the performance of our transfers, NYBTES/(DATE - START) but when they are not there kinda hard todo.
globus-gridftp-server-6.43-1gt.x86_64
Part of the globusonline service.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
(734)936-1985
Post by Brock Palen
I made a logstash parser for transfer logs, but ran into issues because it appears that that there are Transfer stats
[15298] Wed Jul 16 11:08:42 2014 :: Transfer stats: DATE=20140716150842.633689 HOST=flux-xfer1.engin.umich.edu PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20140716150841.618581 USER=brockp FILE=/tmp/brockp/linux_x64_64_sfx.exe BUFFER=87380 BLOCK=262144 NBYTES=280590752 VOLUME=/ STREAMS=4 STRIPES=1 DEST=[141.212.30.10] TYPE=RETR CODE=226
[22304] Mon Sep 8 16:45:52 2014 :: Transfer stats: DATE=20140908204552.177237 HOSTS=1944 VOLUME=/ STREAMS=2 STRIPES=1 DEST=[141.212.30.10] TYPE=STOR CODE=226
What are the meanings of these two different formats? Is the first one, with the transfer size etc, the final summary of the transfer? Can I just ignore the others if I want to collect statistics on our gridftp servers?
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
(734)936-1985
Loading...