APPE bug.

Think you've found a bug? Post a description here.

Moderator: Serin

Post Reply
mdj
Moderator
Posts: 656
Joined: Mon Aug 18, 2003 4:00 am
Location: Denmark
Contact:

APPE bug.

Post by mdj » Thu Aug 13, 2009 3:23 am

I *think* I have found another bug, this is gonna be long, I am afraid...

The setup: I have a 512kbps DSL link and a netgear router with the habbit of closing idle connections after 10 mintes. I try to upload a huge file (0.5-1GB) to my ftp, using FileZilla. Here is the first extract from the log, cleaned for irrelevant entries.

[2009-08-12 21:00:51]:CONNECT [ 161] - Incoming connection request on interface 192.168.1.100
[2009-08-12 21:00:51]:CONNECT [ 161] - Connection request accepted from 87.x.x.x
[2009-08-12 21:00:52]:COMMAND [ 161] - USER x
[2009-08-12 21:00:52]: REPLY [ 161] - 331 User x, password please

[2009-08-12 21:00:52]:COMMAND [ 161] - PASS ***********
[2009-08-12 21:00:52]: REPLY [ 161] - 230 Password Ok, User logged in

[2009-08-12 21:00:52]:COMMAND [ 161] - CLNT FileZilla
[2009-08-12 21:00:52]: REPLY [ 161] - 200 Command okay

[2009-08-12 21:00:52]:COMMAND [ 161] - OPTS UTF8 ON
[2009-08-12 21:00:52]: REPLY [ 161] - 220 UTF8 support on

[2009-08-12 21:00:52]:COMMAND [ 161] - CWD /x
[2009-08-12 21:00:52]: REPLY [ 161] - 250 Change directory ok

[2009-08-12 21:00:52]:COMMAND [ 161] - TYPE I
[2009-08-12 21:00:52]: REPLY [ 161] - 200 Type Binary

[2009-08-12 21:00:52]:COMMAND [ 161] - PASV
[2009-08-12 21:00:52]: REPLY [ 161] - 227 Entering Passive Mode (80,62,39,18,49,178)

[2009-08-12 21:00:52]:COMMAND [ 161] - MLSD
[2009-08-12 21:00:52]: REPLY [ 161] - 150 Opening data connection

[2009-08-12 21:00:52]: REPLY [ 161] - 226 Transfer complete

[2009-08-12 21:00:53]:COMMAND [ 161] - PASV
[2009-08-12 21:00:53]: REPLY [ 161] - 227 Entering Passive Mode (80,62,39,18,49,179)

[2009-08-12 21:00:53]:COMMAND [ 161] - STOR File
[2009-08-12 21:00:53]: REPLY [ 161] - 150 Opening data connection

So far so good. Upload begins. After 10 minutes, my router cuts the control connection, and FileZilla Panicks!

[2009-08-12 21:10:49]:CONNECT [ 162] - Incoming connection request on interface 192.168.1.100
[2009-08-12 21:10:49]:CONNECT [ 162] - Connection request accepted from 87.x.x.x
[2009-08-12 21:10:49]:COMMAND [ 162] - USER x
[2009-08-12 21:10:49]: REPLY [ 162] - 331 User x, password please

[2009-08-12 21:10:49]:COMMAND [ 162] - PASS ***********
[2009-08-12 21:10:49]: REPLY [ 162] - 230 Password Ok, User logged in

[2009-08-12 21:10:49]:COMMAND [ 162] - CLNT FileZilla
[2009-08-12 21:10:49]: REPLY [ 162] - 200 Command okay

[2009-08-12 21:10:49]:COMMAND [ 162] - OPTS UTF8 ON
[2009-08-12 21:10:49]: REPLY [ 162] - 220 UTF8 support on

[2009-08-12 21:10:49]:COMMAND [ 162] - CWD /x
[2009-08-12 21:10:49]: REPLY [ 162] - 250 Change directory ok

[2009-08-12 21:10:49]:COMMAND [ 162] - TYPE I
[2009-08-12 21:10:49]: REPLY [ 162] - 200 Type Binary

[2009-08-12 21:10:49]:COMMAND [ 162] - PASV
[2009-08-12 21:10:50]: REPLY [ 162] - 227 Entering Passive Mode (80,62,39,18,49,180)

[2009-08-12 21:10:50]:COMMAND [ 162] - MLSD
[2009-08-12 21:10:50]: REPLY [ 162] - 150 Opening data connection

[2009-08-12 21:10:50]: REPLY [ 162] - 226 Transfer complete

[2009-08-12 21:10:50]:COMMAND [ 162] - PASV
[2009-08-12 21:10:50]: REPLY [ 162] - 227 Entering Passive Mode (80,62,39,18,49,181)

[2009-08-12 21:10:51]:COMMAND [ 162] - APPE File
[2009-08-12 21:10:51]: REPLY [ 162] - 150 Opening data connection

[2009-08-12 21:10:51]: ERROR [ 162] - Unable to open file: 'The process cannot access the file because it is being used by another process.
'
(What happened with that end quote/eoln? ;-) )
[2009-08-12 21:10:51]: REPLY [ 162] - 500 File transfer failed

- and then FileZilla retries with new passive data connections like pearls on a string (1 per 1-2 seconds) until:

[2009-08-12 21:12:19]:COMMAND [ 162] - PASV
[2009-08-12 21:12:19]: REPLY [ 162] - 227 Entering Passive Mode (80,62,39,18,50,56)

[2009-08-12 21:12:19]:COMMAND [ 162] - MLSD
[2009-08-12 21:12:19]: REPLY [ 162] - 150 Opening data connection

[2009-08-12 21:12:19]: REPLY [ 162] - 226 Transfer complete

[2009-08-12 21:12:20]:COMMAND [ 162] - PASV
[2009-08-12 21:12:20]: SYSTEM [ 161] - File transfer complete
[2009-08-12 21:12:20]: ERROR [ 161] - Unable to send data
[2009-08-12 21:12:20]: ERROR [ 161] - Error closing connection: An existing connection was forcibly closed by the remote host.

[2009-08-12 21:12:20]: REPLY [ 162] - 227 Entering Passive Mode (80,62,39,18,50,57)

[2009-08-12 21:12:20]:COMMAND [ 162] - APPE File
[2009-08-12 21:12:20]: REPLY [ 162] - 150 Opening data connection

[2009-08-12 21:12:20]:CONNECT [ 161] - Connection terminated.

- and then we have 10 more minutes to upload, before the same scenario repeats. It is not Cerberus' fault that 90 seconds must pass until it discovers the broken data connection, this is simply due to the connection being cut "silently", at least on the inner side of the router. When I reproduce this without the router, the 90 seconds don't happen.

BUT: During all these attempts, about 30MB are uploaded every time, so the file goes to 30, then 60, then 90, then 120MB and so on, but sometimes, the file is reset to 0 instead of being appended:

[2009-08-13 00:56:43]:COMMAND [ 193] - PASV
[2009-08-13 00:56:43]: REPLY [ 193] - 227 Entering Passive Mode (80,62,39,18,59,230)

[2009-08-13 00:56:43]:COMMAND [ 193] - MLSD
[2009-08-13 00:56:43]: REPLY [ 193] - 150 Opening data connection

[2009-08-13 00:56:43]: REPLY [ 193] - 226 Transfer complete

[2009-08-13 00:56:44]:COMMAND [ 193] - PASV
[2009-08-13 00:56:44]: REPLY [ 193] - 227 Entering Passive Mode (80,62,39,18,59,231)

[2009-08-13 00:56:44]:COMMAND [ 193] - APPE File
[2009-08-13 00:56:44]: REPLY [ 193] - 150 Opening data connection

[2009-08-13 00:56:44]: SYSTEM [ 192] - File transfer complete
[2009-08-13 00:56:45]: ERROR [ 192] - Unable to send data
[2009-08-13 00:56:45]: ERROR [ 192] - Error closing connection: An existing connection was forcibly closed by the remote host.

[2009-08-13 00:56:46]:CONNECT [ 192] - Connection terminated.

(Note, this is the ONLY time during all the attempts, that the sequence is APPE - 150 - Error closing connection - [Open file = Success], all other attempts have been APPE - 150 - [Open file = Unable to open file] - Error closing connection - APPE - 150 - [Open file = Success], and that timing could be relevant, but I have also seen the issue without recognizing this sequence.)

There is nothing odd about this so far, but let's take a look at the Filemon log from these two points in time, where APPE is issued:

21:12:19 CerberusGUI.exe:908 QUERY INFORMATION x:\x\File SUCCESS Attributes: A
21:12:19 CerberusGUI.exe:908 OPEN x:\x\File SHARING VIOLATION Options: Open Access: All
21:12:19 CerberusGUI.exe:908 CREATE x:\x\File SHARING VIOLATION Options: OverwriteIf Access: All
21:12:20 CerberusGUI.exe:908 CLOSE x:\x\File SUCCESS
21:12:20 CerberusGUI.exe:908 QUERY INFORMATION x:\x\File SUCCESS Attributes: A
21:12:20 CerberusGUI.exe:908 OPEN x:\x\File SUCCESS Options: Open Access: All
21:12:20 CerberusGUI.exe:908 QUERY INFORMATION x:\x\File SUCCESS Length: 30981792
21:12:20 CerberusGUI.exe:908 CLOSE x:\x\File SUCCESS
21:12:20 CerberusGUI.exe:908 OPEN x:\x\File SUCCESS Options: OpenIf Access: All
21:12:20 CerberusGUI.exe:908 SET INFORMATION x:\x\File SUCCESS Length: 30981792
21:12:20 CerberusGUI.exe:908 SET INFORMATION x:\x\File SUCCESS Length: 30981792
21:12:20 CerberusGUI.exe:908 WRITE x:\x\File SUCCESS Offset: 30981792 Length: 5520

- and the upload continues. The last time this happens:

00:56:43 CerberusGUI.exe:908 QUERY INFORMATION x:\x\File SUCCESS Attributes: A
00:56:43 CerberusGUI.exe:908 OPEN x:\x\File SHARING VIOLATION Options: Open Access: All
00:56:43 CerberusGUI.exe:908 CREATE x:\x\File SHARING VIOLATION Options: OverwriteIf Access: All
00:56:44 CerberusGUI.exe:908 QUERY INFORMATION x:\x\File SUCCESS Attributes: A
00:56:44 CerberusGUI.exe:908 OPEN x:\x\File SHARING VIOLATION Options: Open Access: All
00:56:45 CerberusGUI.exe:908 CLOSE x:\x\File SUCCESS
00:56:46 CerberusGUI.exe:908 CREATE x:\x\File SUCCESS Options: OverwriteIf Access: All
00:56:46 CerberusGUI.exe:908 WRITE x:\x\File SUCCESS Offset: 0 Length: 65536

Oups!!! The file is OVERWRITTEN and not appended to. This Filemon log does NOT match the Cerberus log. At 56:43 MSLD causes a QUERY INFORMATION, (the one logged at 56:44 though, I think), and the following open fails due to a sharing violation. At 56:45, [192] discovers that the line is down and closes the file, AND THEN at 56:46, the 150 Opening data connection from [193] issued at 56:44 results in a CREATE, instead of QUERY INFORMATION for Length / OPEN / SET INFORMATION, and the file is restarted. The client knows nothing of this, and apparently sends only the last part of the file (I can't figure out where it got the length from, though, but the upload "finished" normally soon after, and the client was satisfied).

I have saved both logs, if you need more info from them.
Morten Due Jørgensen
http://www.mdjnet.dk

User avatar
Serin
Site Administrator
Posts: 1785
Joined: Sat Jan 01, 2005 6:57 pm
Location: United States
Contact:

Re: APPE bug.

Post by Serin » Fri Aug 14, 2009 10:55 pm

Well, this certainly gives me something to do this weekend :)

I've had one other person report a potential problem with the APPE command so there may very well be something to this. Some sort of race condition.

More to come when I get to this a little later.


Thanks for the detailed analysis!

User avatar
Serin
Site Administrator
Posts: 1785
Joined: Sat Jan 01, 2005 6:57 pm
Location: United States
Contact:

Re: APPE bug.

Post by Serin » Sat Aug 15, 2009 1:45 pm

I've found a few problems with the APPE command that may be contributing to what you were seeing. I've made a few changes that may solve the issue. I'm not convinced we've nailed it but the changes are definitely for the better. We have a more reliable file size checker that supports file sizes greater than 4GB. The APPE command wasn't updated before to 64bit like the rest of the upload and download functions, so that was one potential issue. We've also implemented a more reliable method of checking the file size, even when the file is being used. Hopefully you won't see another file size 0 APPE.

This will be in the official release 3.02 coming this weekend.

mdj
Moderator
Posts: 656
Joined: Mon Aug 18, 2003 4:00 am
Location: Denmark
Contact:

Re: APPE bug.

Post by mdj » Sun Aug 16, 2009 4:34 pm

Great, I will try it out, and give it the same run. The file size I used was 5-600MB and another one at 1GB, not close to 4GB, but I guess is it a good idea to have it 64 bit in any case. :-)

Oh, BTW, you write the "upcoming release 3.02", but that release is dated August 13, 2 days earlier than the time of writing? :-)

- and when I download and install it, it claims to be 3.0.1 in the beginning of the log and in the about box, with a build date of July 31.
Morten Due Jørgensen
http://www.mdjnet.dk

User avatar
Serin
Site Administrator
Posts: 1785
Joined: Sat Jan 01, 2005 6:57 pm
Location: United States
Contact:

Re: APPE bug.

Post by Serin » Sun Aug 16, 2009 9:03 pm

Ah, I updated the download page and accidentally uploaded it. If you check back now you will see it is correct, with the correct files available for download. I just uploaded everything about 20 minutes ago.

No, I didn't mean to imply the 64-bit fix would resolve the problem you were seeing. I actually did a bit more to make sure that the functions being used to calculate file size could do so even if the file was open for write access. Previously, they would fail when trying to determine file size if the file was open for writing. I would zero out the existing file size in that case and then later in the function attempt to open the file for appending. If the file is freed for writing between when I checked the file size (which would fail) and when I attempt to actually open the file for appending, then the file would get incorrectly overwritten. That might take care of the race condition where the file size can't be determined (so I set it to zero) and then the file is freed up for write access between when I check the file size and then attempt to append to it.

The alternative (which is probably safer) is to fail completely if the file is open for writing. I think that is probably the better option but the fix I have in now should be fairly reliable.

mdj
Moderator
Posts: 656
Joined: Mon Aug 18, 2003 4:00 am
Location: Denmark
Contact:

Re: APPE bug.

Post by mdj » Mon Aug 17, 2009 10:01 am

I tried the new version, it apparently is different. I see no file resets anymore, but rather FileZilla with fail the transfer completely (and it has to be restarted manually), but it does so significantly more often than the server earlier used to reset the file, so I don't really understand why it fails, and how it is connected with the fix. Anyway, FileZilla does NOT react too intelligently to this situation, so it is probably a FileZilla bug (too). At least the files are not reset anymore, so I am happy.
Morten Due Jørgensen
http://www.mdjnet.dk

User avatar
Serin
Site Administrator
Posts: 1785
Joined: Sat Jan 01, 2005 6:57 pm
Location: United States
Contact:

Re: APPE bug.

Post by Serin » Tue Aug 18, 2009 10:29 am

mdj,

I did all of my testing with Filezilla, though manually resuming was always necessary because I would cancel the transfer or kill the client to try to mimic the effect your were talking about. I agree, Filezilla does handle the situation very oddly.

I might be able to do something about the timeout issue and will look into it some more.

mdj
Moderator
Posts: 656
Joined: Mon Aug 18, 2003 4:00 am
Location: Denmark
Contact:

Re: APPE bug.

Post by mdj » Wed Aug 19, 2009 2:59 am

To mimic the situation, you should use tcpview and cut only the tcp connection for the control connection instead of killing it off entirely.I have made some more tests with FileZilla, and the log from there shows that timing is very different from case to case, and I am sure that is the reason for FileZilla's different behaviour. I am not sure changing any timing in Cerberus will make any difference.
Morten Due Jørgensen
http://www.mdjnet.dk

Post Reply