Re: [Openvpn-users] OpenVPN tunnel corruption


  • Subject: Re: [Openvpn-users] OpenVPN tunnel corruption
  • From: "James Yonan" <jim@xxxxxxxxx>
  • Date: Tue, 7 Oct 2003 06:42:33 -0000

Brett Johnson <maillist@xxxxxxx> said:

> This is essentially the same setup I had under CIPE awhile back.  I never
> had any corruption there.  OpenVPN was the only change because CIPE had
> reconnect problems.  I have noticed that the corruptions have lessened
> gradually with each new release of OpenVPN.

I really doubt that _any_ VPN package could corrupt an FTP session due to the
combination of cryptographic authentication and TCP checksumming.

As an experiment try this:

OpenVPN actually has the ability to corrupt packets, intentionally.  It is a
debugging mode that is used for testing, and can be activated with the
--gremlin option.

--gremlin will cause OpenVPN to randomly corrupt bits in received packets,
immediately after they are read from the network.

Now try an FTP session over the tunnel and see if the use of --gremlin can
corrupt the FTP transfer.

In fact I just tried this test myself, transferring a 360KB file over an
OpenVPN link with --gremlin specified on both sides of the connection.  I even
purposely turned off encryption/authentication, so that rather than being
dropped immediately due to a failed HMAC test, the packets are passed into the
TCP/IP stack and must be rejected by the IP and TCP checksumming code.

The result is just as one would expect.  The transfer goes very slowly,
because the packet corruption results in packets getting dropped, so they need
to be retransmitted.

But after the transfer is complete, I did a cmp of the file against the
original and they are identical.
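For anyone repeating the experiment without cmp at hand, the same byte-for-byte check can be sketched in Python. This is only an illustration: the function name first_difference and the chunk size are assumptions of this sketch, not part of any tool mentioned above.

```python
def first_difference(path_a, path_b, chunk_size=65536):
    """Return the offset of the first differing byte, or None if identical.

    Hypothetical helper: a rough stand-in for what cmp reports.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the first mismatching byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No mismatch in the common part: one file is shorter.
                return offset + min(len(a), len(b))
            if not a:
                # Both files ended at the same point with no difference.
                return None
            offset += len(a)
```

A result of None corresponds to cmp printing nothing; an integer is the zero-based offset where the transferred copy first diverges from the original.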

Now I try the same test again with encryption and authentication enabled.

I get a lot of messages like this:

Authenticate/Decrypt packet error: packet HMAC authentication failed

but again, the transfer completes, slower than normal, and the transferred file
is identical to the original.
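To illustrate why a corrupted packet never gets past the authentication step, here is a minimal Python sketch of HMAC-SHA1 verification with a randomly flipped bit. The packet layout (tag followed by payload) and the names seal, unseal and flip_bit are assumptions made for the example; this is not OpenVPN's actual wire format or code.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha1().digest_size  # 20 bytes for SHA-1


def seal(key: bytes, payload: bytes) -> bytes:
    """Prepend an HMAC-SHA1 tag to the payload (hypothetical layout)."""
    tag = hmac.new(key, payload, hashlib.sha1).digest()
    return tag + payload


def unseal(key: bytes, packet: bytes):
    """Return the payload if the tag verifies, otherwise None (drop it)."""
    tag, payload = packet[:TAG_LEN], packet[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha1).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return payload


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Flip one bit, roughly imitating what a gremlin-style test does."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)
```

In this sketch unseal(key, flip_bit(seal(key, data), n)) returns None for any bit position n, which is the analogue of the "packet HMAC authentication failed" message: the damaged packet is dropped and TCP retransmits the data.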

> As far as using tcpdump, I can't sit around watching hundreds of megabytes
> of packets going through trying to catch the bad segment that happens once
> every few days.

Obviously, but this is a pattern recognition problem and tcpdump has quite a
number of filters.  Since we are looking for corrupted packets with bad
checksums, that significantly cuts down on the number of packets which would
need to be looked at.
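One rough way to cut the capture down is to post-process verbose tcpdump text and keep only lines that report a checksum problem. The sketch below assumes that verbose output flags such packets with the words "incorrect" or "bad cksum"; that wording, the MARKERS tuple and the function name suspicious_lines are assumptions of this example and should be checked against the tcpdump version in use.

```python
# Assumed wording of checksum complaints in verbose tcpdump output;
# adjust to match what your tcpdump version actually prints.
MARKERS = ("incorrect", "bad cksum")


def suspicious_lines(lines):
    """Yield only the lines that mention a failed checksum.

    Note: verbose output may spread one packet over several lines,
    so the surrounding lines in the capture may still be needed.
    """
    for line in lines:
        if any(marker in line for marker in MARKERS):
            yield line.rstrip("\n")
```

The function accepts any iterable of text lines, so it can be fed an open log file that was saved from a long-running capture and checked every few days.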

> I understand about the UDP layer having its checksum, the VPN layer having
> its checksum (sha), and the TCP connection within the VPN (for FTP in this
> case) also having its checksum.  In reality I find it hard to believe that
> there could be a failure here, but this is the only place where the failures
> are showing up.  The finger keeps pointing back to something within the
> OpenVPN line of events.  That's why I'm calling this a bug.  Believe me, this
> isn't the first time I've sat down to troubleshoot this.  I've been running
> OpenVPN for months now and have just put up with sporadic failures.

Well clearly there's a bug somewhere, but I'm sceptical it's an OpenVPN bug
for the reasons listed above.  Even when I intentionally turn on packet
corruption in OpenVPN using --gremlin, I am unable to corrupt an FTP session
running over the tunnel, even if I also disable encryption/authentication.

> Let's shift gears for a minute.  How and where could it possibly fail?  Can
> OpenVPN modify the data after it comes out of encryption?

It can, but it only does so when you are using special-purpose packet-mangling
options, such as --gremlin or --mssfix.

> Does TCP checksumming really catch this?

TCP checksumming will (with a high degree of accuracy) catch corruption that
is picked up during packet processing and network transmission.  It will not,
however, protect against malicious corruption.  Protecting against that is
the job of HMAC-SHA1.
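The "high degree of accuracy" qualifier can be made concrete with a small Python sketch of the 16-bit one's-complement Internet checksum described in RFC 1071, which is the arithmetic TCP uses. This is a simplification: a real TCP checksum also covers a pseudo-header, and the function name internet_checksum is an assumption of this example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


segment = bytes.fromhex("0001f203f4f5f6f7")

# A single flipped bit always changes the checksum.
damaged = bytearray(segment)
damaged[3] ^= 0x10
single_bit_detected = internet_checksum(bytes(damaged)) != internet_checksum(segment)

# Swapping two aligned 16-bit words does not: the sum is order-independent.
swapped = segment[2:4] + segment[0:2] + segment[4:]
swap_detected = internet_checksum(swapped) != internet_checksum(segment)
```

Here single_bit_detected is True and swap_detected is False. Random bit errors are caught reliably, but the checksum is not cryptographic, so deliberately chosen changes can slip past it; that gap is what the HMAC covers.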

> Could there be a problem with the tun device?

TCP checksumming would catch it, because the tun device sits inside the
"window" that exists after the checksum is applied by the sender and before
it is verified by the receiver.

> Could it flub up the packet?  Would the tap device be better to use?

The tap device on Linux seems not as solid as the tun device, based on some
analysis I've done recently on tap device stalls.  It doesn't corrupt
packets, but it causes TCP connections to occasionally stall.

My best guess is that the corruption is occurring before the packets are
checksummed by TCP on the sending side, or after the packets are verified by
TCP on the receiving side.  That means a problem in the FTP client or server,
a problem in the TCP/IP stack, or even something like file system corruption.

I would try using other file transfer protocols such as rsync or scp, or other
ftp client/server implementations to see if you can replicate the corruption.

At this point I would say we need more data points before we can narrow down
the cause.

James


