When I run iperf from my client to my server over the VPN (OpenVPN over UDP; iperf in either UDP or TCP mode), throughput is slow (6 Mbps). Over TCP without the VPN it is fast (300 Mbps). CPU is fine. This post summarizes the debugging. Notable settings in my OpenVPN server config:
tun-mtu 1440
mtu-disc yes
mssfix 1400
# no compression is enabled. not lzo, not lz4
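As a sanity check on those MTU numbers (assuming a 1500-byte path MTU and plain IPv4/UDP headers; real OpenVPN per-packet overhead varies with cipher and auth), the headroom works out as:

```shell
# A 1500-byte link minus the outer IPv4 header (20 B) and UDP header (8 B)
# leaves this much UDP payload for OpenVPN:
awk 'BEGIN { print 1500 - 20 - 8 }'
# With tun-mtu 1440, this much is left for OpenVPN framing and crypto overhead:
awk 'BEGIN { print 1472 - 1440 }'
```

That 32-byte margin is tight for some cipher/auth combinations, which is one reason fragmentation is worth suspecting later.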
Fast-io
fast-io
Enabling it on both client and server didn’t make a difference (and I didn’t really expect it to; the CPU was not saturated).
Increasing snd/rcv buf
The config below adjusts the send/receive buffers on both the server and the client (the push directives send the settings to the client):
sndbuf 5120000
rcvbuf 5120000
push "sndbuf 5120000"
push "rcvbuf 5120000"
This made a slight difference, but not a big one. Upload, however, is now faster (65 Mbps).
mtu-disc on the client
The server already had mtu-disc and mssfix, but the client did not:
mtu-disc yes
mssfix 1400
That didn’t make a difference 🙁
Check sysctl buffers
I am starting to give up. In an act of desperation, I started looking at rmem_max/wmem_max in the system.
Server config
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
net.ipv4.udp_mem = 762564 1016753 1525128
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_wmem_min = 4096
Client Config
net.core.rmem_max = 212992
net.core.wmem_max = 212992
net.ipv4.udp_mem = 87666 116889 175332
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_wmem_min = 4096
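For reference, these values can be read on each host like so (a sketch reading /proc directly; `sysctl <key>` reports the same values):

```shell
# Print the kernel's socket buffer limits in sysctl notation:
for f in rmem_max wmem_max; do
  printf 'net.core.%s = %s\n' "$f" "$(cat /proc/sys/net/core/$f)"
done
printf 'net.ipv4.udp_mem = %s\n' "$(cat /proc/sys/net/ipv4/udp_mem)"
```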
So it’s clear that the server has larger memory buffers.
Changes
/etc/sysctl.conf:
# 16 MB (sysctl.conf does not support trailing comments, so they go on their own line)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
sysctl --system
Now it at least gets 50 Mbps! Progress!
But why? A ~200 KB default should be enough for a receive window. I am not sure I understand why increasing the receive buffer size increases throughput; for UDP, we should just read data off the socket and move on.
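For the TCP-over-VPN case at least, a quick bandwidth-delay-product estimate suggests the default may not be enough after all (the 50 ms RTT here is an assumption; measure yours with ping):

```shell
# Buffer needed ≈ bandwidth-delay product = bytes_per_second * rtt_seconds
# Assumed: 300 Mbit/s link capacity, 50 ms round-trip time.
awk 'BEGIN { printf "%.0f bytes\n", 300e6 / 8 * 0.050 }'
```

That is roughly 1.9 MB, well above the client’s 212992-byte default, so on a high-RTT path the small default really can cap throughput.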
One thing is suspicious, though. While I see 50 Mbps in vnstat, iperf over UDP reports only 1 Mbps, yet iperf over TCP reaches 15 Mbps. (I am running a background transfer, which sped up from 6 Mbps to 40–50 Mbps, so the tests appear to be competing for bandwidth now.) The UDP number may simply be iperf’s default behaviour: in UDP mode, iperf offers a target load of about 1 Mbit/s unless you raise it with -b.
So progress.
Thinking about it more, could this still indicate fragmentation? If packets are fragmented, maybe more buffer is needed?
That udp_mem setting for the server is rather large:
net.ipv4.udp_mem = 762564 1016753 1525128
So I am going to align it with the client:
net.ipv4.udp_mem = 87666 116889 175332
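Worth remembering when judging these numbers: udp_mem is measured in pages (typically 4 KiB), not bytes, and it caps memory for all UDP sockets combined. The server’s old ceiling works out to roughly:

```shell
# udp_mem limits are in pages; assuming the usual 4096-byte page size:
awk 'BEGIN { printf "%.1f GiB\n", 1525128 * 4096 / (1024*1024*1024) }'
```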
Reading further, I tried increasing the tun device queue length:
txqueuelen 1000
This seems to have helped throughput by about another 20 Mbps (50 → 70). I have reservations about all of these approaches, since larger queues and buffers mean any retransmissions take longer. I still don’t fully understand why the performance is what it is (i.e. where the bottleneck is), and it’s eating me alive.
Encryption?
I already confirmed that CPU is not the problem, but I couldn’t resist testing OpenSSL performance for AES-256 in CBC and GCM modes.
[10.0.7.76] piotr@usa:~ $ openssl speed -evp aes-256-cbc
Doing AES-256-CBC for 3s on 16 size blocks: 166333872 AES-256-CBC's in 2.98s
Doing AES-256-CBC for 3s on 64 size blocks: 48983208 AES-256-CBC's in 2.99s
Doing AES-256-CBC for 3s on 256 size blocks: 12494769 AES-256-CBC's in 2.99s
Doing AES-256-CBC for 3s on 1024 size blocks: 3106704 AES-256-CBC's in 2.97s
Doing AES-256-CBC for 3s on 8192 size blocks: 392200 AES-256-CBC's in 2.98s
Doing AES-256-CBC for 3s on 16384 size blocks: 194062 AES-256-CBC's in 2.96s
version: 3.0.16
built on: Tue Apr 15 19:59:18 2025 UTC
options: bn(64,64)
compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -fzero-call-used-regs=used-gpr -DOPENSSL_TLS_SECURITY_LEVEL=2 -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/reproducible-path/openssl-3.0.16=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
CPUINFO: OPENSSL_ia32cap=0x7ffaf3bfffebffff:0x29c67af
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
AES-256-CBC 893067.77k 1048470.00k 1069786.24k 1071132.96k 1078155.17k 1074159.39k
[10.0.7.76] piotr@usa:~ $ openssl speed -evp aes-256-gcm
Doing AES-256-GCM for 3s on 16 size blocks: 7052334 AES-256-GCM's in 2.98s
Doing AES-256-GCM for 3s on 64 size blocks: 6881244 AES-256-GCM's in 2.97s
Doing AES-256-GCM for 3s on 256 size blocks: 6183174 AES-256-GCM's in 2.98s
Doing AES-256-GCM for 3s on 1024 size blocks: 4426453 AES-256-GCM's in 2.96s
Doing AES-256-GCM for 3s on 8192 size blocks: 1293950 AES-256-GCM's in 2.99s
Doing AES-256-GCM for 3s on 16384 size blocks: 721029 AES-256-GCM's in 2.99s
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
AES-256-GCM 37864.88k 148282.70k 531171.99k 1531313.47k 3545163.34k 3950949.54k
It’s clear that AES is plenty fast in both variants.
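As a sanity check on that claim, converting the best AES-256-GCM figure above (reported in thousands of bytes per second) to line rate:

```shell
# 3950949.54 thousand bytes/s -> bits/s -> Mbit/s
awk 'BEGIN { printf "%.0f Mbit/s\n", 3950949.54 * 1000 * 8 / 1e6 }'
```

That is about 31.6 Gbit/s, two orders of magnitude above the 300 Mbps line rate, so encryption is clearly not the bottleneck.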
So what is it!?!?
Worth noting that TCP over OpenVPN (over UDP) is now pretty fast compared to where it started, reaching up to 80 Mbps. UDP, on the other hand, remains abysmal (around 1 Mbps). My server uses BBR for congestion control.
Anyway, I have now unified all the configs and am getting 80–90 Mbps on TCP over OpenVPN. Time to call it a day:
Server
net.core.rmem_default=1048576
net.core.wmem_default=1048576
net.core.rmem_max=2097152
net.core.wmem_max=2097152
net.ipv4.udp_mem = 87666 116889 175332
tun-mtu 1440
mtu-disc yes
mssfix 1400
txqueuelen 1000
fast-io
sndbuf 2097152
rcvbuf 2097152
push "sndbuf 2097152"
push "rcvbuf 2097152"
The client uses the same sysctl config. Its OpenVPN config sets no sndbuf or rcvbuf (so it takes the pushed values), but fast-io is enabled.