For some use-cases, in particular link-local multicast, it may not be possible to use multicast routing. In that case I recommend trying out:
- Bridging networks, see bridge(8) or Linux bridge - how it works
- igmpproxy, mcproxy, or
- OpenVPN in Layer-2, bridged mode
Make sure to check out the FAQ for the most common problems.
To understand multicast routing one needs to first understand how multicast on a LAN works. In this HowTo the LAN is usually referred to as Layer-2 and routing (between LANs) as Layer-3.
Layer-2 vs Layer-3
First, without any form of regulation multicast is broadcast, which is inherently bad. We don’t want to route broadcast, but even on a LAN we want to keep broadcast to a minimum. Before switches we had hubs which caused all traffic to be broadcast. We can do better, not just for the sake of security but also to increase bandwidth.
Enter IGMP and MLD, the control protocols for multicast on a LAN, for IPv4 and IPv6 respectively. Essentially they act as subscription based protocols: every N seconds a Query is sent out to all nodes on the LAN, “Anyone want multicast?”, to which a node can reply, “Yes please, I’d like to have 126.96.36.199.” MLD (IPv6) works in a similar way.
Most managed switches today support IGMP Snooping, which is just a fancy way of saying: this switch does not flood multicast like broadcast on all ports. However, it also means that everyone on a LAN who wants multicast has to be able to request it.
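As a rough illustration of what "requesting it" looks like from an end node, here is a minimal Python sketch. It assumes a Linux host; the helper names are made up for this example. Joining a group with IP_ADD_MEMBERSHIP is what makes the kernel send the IGMP membership report on the application's behalf.

```python
import socket


def membership_request(group: str, iface_addr: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_addr)


def open_receiver(group: str, port: int, iface_addr: str = "0.0.0.0") -> socket.socket:
    """Bind a UDP socket and join `group`; the join triggers an IGMP report."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_addr))
    return sock
```

A receiver would call something like open_receiver("225.1.2.3", 5001) and then recvfrom() on the returned socket. Both the group and the port here are placeholders, so pick the ones your sender actually uses.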
Multicast Distribution Point
On a LAN with many IGMP Snooping capable switches a Querier election will take place. The switch (or router) with the lowest IP address wins, unless that address is 0.0.0.0. The elected IGMP querier on a LAN becomes the distribution point for all multicast. All other switches on the LAN must forward both known and unknown multicast to the Querier.
Usually the network IP plan (design) is set in such a way that the router has the lowest IP address on the LAN, so that it will always be the elected IGMP querier, hence receiving all multicast so it can route it as required.
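The election rule described above can be sketched in a few lines of Python. This is a toy model only: real switches run the election through IGMP Query messages, and the function name and sample addresses are assumptions for illustration.

```python
import ipaddress


def elect_querier(candidates):
    """Return the lowest IPv4 address among candidates, ignoring 0.0.0.0."""
    unspecified = ipaddress.IPv4Address("0.0.0.0")
    eligible = [ipaddress.IPv4Address(c) for c in candidates]
    eligible = [addr for addr in eligible if addr != unspecified]
    # No eligible address means nobody on this LAN can become querier.
    return str(min(eligible)) if eligible else None
```

With candidates 192.168.1.10, 192.168.1.1 and 0.0.0.0, the router at 192.168.1.1 wins, which is why giving the router the lowest address in the IP plan works.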
Multicast Router Ports
In a network with redundant/multiple multicast routers one cannot rely solely on the IGMP querier election. Instead, most managed switches have a setting called Multicast Router Ports where you can configure on which ports to also flood all multicast.
This feature can also be used to forward multicast to nodes that do not speak IGMP/MLD. But since it will forward all multicast it is usually better to set up static FDB entries per multicast group on the switch instead.
Many switches are limited to filtering multicast based on the multicast MAC equivalent. In our case of 188.8.131.52 it would be mapped to 01:00:5e:01:02:03. For more on this, and the limitations it brings, see RFC 1112.
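To make the mapping, and its main limitation, concrete, here is a small Python sketch. Only the low 23 bits of the group address end up in the MAC, so 32 different groups share every MAC address. The group 225.1.2.3 used below is an assumed example, chosen because it yields the MAC quoted above.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF  # only the low 23 bits survive the mapping
    octets = [0x01, 0x00, 0x5E,
              (low23 >> 16) & 0xFF, (low23 >> 8) & 0xFF, low23 & 0xFF]
    return ":".join(f"{o:02x}" for o in octets)
```

Both multicast_mac("225.1.2.3") and multicast_mac("239.129.2.3") give 01:00:5e:01:02:03, so a switch that filters on MAC alone cannot tell those two groups apart.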
PIM-SM :: IGMP v2 vs. PIM-SSM :: IGMP v3
In the beginning there was darkness and DVMRP conquered the earth. The God of all multicast, which is Steve Deering, was pleased. Then light came upon us, gods were overthrown and PIM-SM was invented.
The story continues but becomes a bit of a blur because so much happened in such a short time frame. There are RFCs that tell the tale better, go read up on them. What eventually came out of it was this:
IGMP v2 was an OK protocol, a client requested a group, 184.108.40.206, and it was unblocked by switches leading up to the Querier and multicast was received.
PIM-SM was OK, many routers could agree on what groups each of them had, set up a distribution tree with magic distribution point/routers, called rendez-vous points (RPs), for one or more, but not necessarily all multicast groups. This could be tweaked by hand as well to optimize distribution.
But what if we had multiple senders for the same group? A ridiculous thought at first, but as deployments grew a need for optimizing also based on the sender (source) grew as well. Enter PIM-SSM for layer-3 and IGMP v3 on layer-2. The biggest change is the ability to forward multicast based on source and group, called (S,G). An end node on a LAN could now send an IGMP report requesting (192.168.10.1, 220.127.116.11).
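The difference between group-only and source-specific forwarding can be pictured as a table lookup. The Python sketch below is a toy model, not how any particular router stores its state; the group 225.1.2.3 and the interface names are assumptions for illustration.

```python
def lookup_route(routes, source, group):
    """Prefer an exact (S,G) entry; fall back to the any-source (*,G) entry."""
    if (source, group) in routes:
        return routes[(source, group)]
    return routes.get(("*", group), [])


# Hypothetical table: one source-specific entry and one any-source entry.
example_routes = {
    ("192.168.10.1", "225.1.2.3"): ["eth1"],
    ("*", "225.1.2.3"): ["eth2"],
}
```

Traffic from 192.168.10.1 to the group goes out eth1, while the same group from any other sender goes out eth2.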
What is TTL and Why Can’t I Route Multicast?
The single most common problem with routing multicast that everybody runs into is the TTL.
The TTL is the Time To Live field in the IP header of a frame.
$ tcpdump -i lo -vvv icmp
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
20:36:50.146377 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.122.1 > 18.104.22.168: ICMP echo request, id 16746, seq 20, length 64
20:36:51.146380 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.122.1 > 22.214.171.124: ICMP echo request, id 16746, seq 21, length 64
The TTL for broadcast and multicast is by default 1, which means that a router should not forward the frame beyond the originating LAN. Again, without any regulation multicast is broadcast and we do not want to route broadcast!
The single best way to fix this problem is for the sender to set a higher TTL. Set it only as high as the number of hops you want this to be forwarded!
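If the sender is your own program, raising the TTL is a one-line socket option. The Python sketch below assumes Linux, where IP_MULTICAST_TTL accepts a plain integer; the helper name is made up for this example.

```python
import socket


def open_sender(ttl: int) -> socket.socket:
    """Create a UDP socket whose outgoing multicast packets carry the given TTL."""
    if not 1 <= ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock
```

A sender would then do something like open_sender(3).sendto(b"hello", ("225.1.2.3", 5001)), where the group and port are placeholders for whatever your receivers have joined.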
Multicast is used for A LOT of protocols, many of which were only ever intended to be either link-local or limited to the LAN. Check with the appropriate standard (RFCs are freely available online) before attempting something foolish that may cause unintended side effects.
For such cases when you want to connect two remote sites and make them into one big LAN you might be better off using, e.g. a bridged SSL VPN on layer-2.
However, when you absolutely cannot change the TTL at the sender and bridging the two LANs is out of the question, then you can try using the firewall. On Linux systems you can mangle matching frames with the following magic rule(s):
iptables -t mangle -A PREROUTING -d 126.96.36.199 -j TTL --ttl-inc 1
or, if the sender runs on a system with Linux you can change the frame as it egresses:
iptables -t mangle -A OUTPUT -d 188.8.131.52 -j TTL --ttl-set 128
The group can also be a range, so 184.108.40.206/8 is possible to enter, and as is shown in these examples, either --ttl-inc or --ttl-set can be used to adjust the TTL value.
Reverse Path Forwarding
Dynamic multicast routing protocols like DVMRP and PIM-SM rely on something called Reverse Path Forwarding to build a multicast distribution tree.
The unicast routing table has the destination in focus, i.e. how to forward an inbound frame towards the destination address.
A multicast router builds tables to instead find the reverse path, from the receiver (who requests multicast) to the source of the multicast distribution tree.
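The reverse path check itself is simple to model. The Python sketch below is a simplified illustration, not the kernel's or any daemon's actual code: it looks up the source address in a unicast routing table (longest prefix wins) and accepts the frame only if it arrived on that interface. The table in the comment mirrors what R3 in the setup further down would roughly look like.

```python
import ipaddress


def rpf_interface(unicast_routes, source):
    """Longest-prefix match: the interface the router would use to reach `source`."""
    src = ipaddress.ip_address(source)
    matches = [(ipaddress.ip_network(prefix), iface)
               for prefix, iface in unicast_routes.items()
               if src in ipaddress.ip_network(prefix)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rpf_check(unicast_routes, source, arrival_iface):
    """Accept a multicast frame only if it arrived on the reverse-path interface."""
    expected = rpf_interface(unicast_routes, source)
    return expected is not None and expected == arrival_iface


# Assumed routing table, e.g. {"172.16.12.0/24": "eth0", "10.1.0.0/24": "eth1"}
```

If the source network is missing from the table, rpf_interface() returns None and the check fails, which is exactly the "no reverse path, no multicast" situation.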
mrouted uses its built-in RIP to construct its distribution tree, and
pimd relies on an external routing protocol like OSPF, RIP, or even a
manually set up routing table on each router.
Note: A common problem with multicast forwarding on Linux based systems is rp_filter. Many systems have this set to ‘strict’ mode by default, to protect against DDoS attacks, which may cause major problems for pimd and mrouted. If the reverse path to the source of multicast cannot be determined the frames will be dropped by the kernel. See the FAQ below for more information.
CORE Network Simulator
These days I do most of my multicast testing with CORE, which is readily available in Debian/Ubuntu. Installing it is as simple as:
sudo apt-get install core-gui
CORE is very simple to get started with:
- Fire up the GUI
- Drag and drop a few router icons to the grid
- Connect them
- BOOM you now have IP addresses automatically assigned!
- Press the Play button – routers are now starting up
Play around a bit to try it out, it’s awesome! I usually start
smcroute manually with a script. (You can access the
shell of each router by right-clicking on them.) The host file system
is reachable from each router in CORE, even though they are isolated and
have their own network namespaces.
Roll your own Cloud
The below setup is done using four Ubuntu 12.04 LTS virtual machines running the linux-virtual kernel package. In the HowTo I mention both pimd and mrouted, since they work out-of-the-box w/o any config changes, but you could just as easily use SMCRoute for the same purpose.
When setting up virtual machines and virtual networking there are many
pitfalls and several requirements for the host to consider. The most
important one, that needs pointing out, is a bug in the IGMP snooping
code in the Linux bridging code: the bridge handles the special case
224.0.0.* well, but all unknown multicast streams outside of that
segment should also be forwarded as-is to all multicast routers. Since
this does not work with the current IGMP snooping code in the Linux 3.13
kernel bridge code you must disable snooping:
host# echo 0 > /sys/devices/virtual/net/virbr1/bridge/multicast_snooping
host# echo 0 > /sys/devices/virtual/net/virbr2/bridge/multicast_snooping
host# echo 0 > /sys/devices/virtual/net/virbr3/bridge/multicast_snooping
Disabling IGMP snooping on the hosts’
virbr3 is not really necessary,
but is done anyway for completeness, and also because I re-use the
same setup in other test cases as well.
It is of course not recommended to disable IGMP snooping on a bridge, but if it’s buggy you really don’t have a choice. Please check this for yourself since it depends on the kernel you run.
    R1                        R2                        R3                        R4
.--------.                .--------.                .--------.                .--------.
|eth0    | 172.16.12.0/24 |eth0    | 172.16.10.0/24 |eth0    |  10.1.0.0/24   |eth0    |
|      .1|----------------|.2    .1|----------------|.2    .1|----------------|.2      |
|    eth1|     virbr1     |    eth1|     virbr2     |    eth1|     virbr3     |        |
'--------'                '--------'                '--------'                '--------'
This setup, or another more advanced, can be used for trying out SMCRoute, pimd and mrouted. Remember there is only one multicast routing socket, so for each router (Rn) you have to choose one of:
pimd -c pimd.conf
mrouted -c mrouted.conf
smcroute -f smcroute.conf
The default configuration files delivered with pimd and mrouted usually suffice, see their respective manual pages or the comments in the .conf file for help.
When you start
mrouted, you’re usually ready to go immediately. But
in the case of
pimd, wait for routers to peer. Then you can test your
setup using multicast
ping from R1 to a
tcpdump on R4:
R1# ping -I eth1 -t 3 220.127.116.11
R4# tcpdump -i eth0
As soon as the PIM routers R2 and R3 have peered you should start seeing ICMP traffic reaching R4. If you don’t, then check the underlying routing protocol, e.g. RIP or OSPF, to make sure the reverse path is known. This is required for PIM since, unlike DVMRP (mrouted), which sort of has RIP built-in, PIM relies on a unicast routing protocol to have populated the routing table.
Now, to the actual test case. The first command for R1 adds a route for all multicast packets, which is necessary for all tools where you cannot set the outbound interface for the multicast stream, in our case iperf:
R1# ip route add 18.104.22.168/4 dev eth1
R1# iperf -u -c 22.214.171.124 -T 3
R4# iperf -s -u -B 126.96.36.199
The -T option is important since it tells iperf to raise the TTL to 3; the default TTL for multicast is otherwise 1 due to its broadcast nature.
The desired output from iperf is as follows:
R1# iperf -u -c 188.8.131.52 -T 3
------------------------------------------------------------
Client connecting to 184.108.40.206, UDP port 5001
Sending 1470 byte datagrams
Setting multicast TTL to 3
UDP buffer size: 160 KByte (default)
------------------------------------------------------------
[ 3] local 172.16.12.1 port 55731 connected with 220.127.116.11 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.25 MBytes 1.05 Mbits/sec
[ 3] Sent 893 datagrams

R4# iperf -s -u -B 18.104.22.168
------------------------------------------------------------
Server listening on UDP port 5001
Binding to local address 22.214.171.124
Joining multicast group 126.96.36.199
Receiving 1470 byte datagrams
UDP buffer size: 160 KByte (default)
------------------------------------------------------------
[ 3] local 188.8.131.52 port 5001 connected with 172.16.12.1 port 55731
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 3] 0.0-10.0 sec 1.25 MBytes 1.05 Mbits/sec 0.268 ms 0/ 893 (0%)
To achieve the same using SMCRoute you need to set up the multicast routing rules manually. The easiest way to do this is to edit /etc/smcroute.conf and start/restart smcroute, or send SIGHUP to an already running daemon. The below example makes use of the source-less (*,G) approach, since we in our limited setup have full control over all multicast senders. There is a slight setup cost associated with this: the time it takes for the kernel to notify SMCRoute about a new source, before the actual multicast route is written to the kernel. In most cases this is acceptable.
smcroute.conf on R2:
mgroup from eth0 group 184.108.40.206
mroute from eth0 group 220.127.116.11 to eth1
smcroute.conf on R3:
mgroup from eth0 group 18.104.22.168
mroute from eth0 group 22.214.171.124 to eth1
Start smcroute on each of R2 and R3 and then proceed to start iperf on R4 and R1, as described above. You should get the same result.
That’s it. Have fun!
It doesn’t work?
Check the TTL.
Why does the TTL in multicast default to 1?
Because multicast is classified as broadcast, which inherently is dangerous. Without proper limitation, like switches with support for IGMP Snooping, multicast IS broadcast.
I cannot change the TTL of the multicast sender, what can I do?!
Ouch, then you may have to use some firewall mangling technique. Here is how you could do it on Linux with iptables:
iptables -t mangle -A PREROUTING -d GROUP[/LEN] -j TTL --ttl-set 64
GROUP is the multicast group address of the stream you want to change the TTL of, with an optional prefix length LEN if you want to specify a range of groups. From this RedHat mailing list entry.
I want to use the loopback interface, but it doesn’t show in
Some interfaces, like lo and tunN, do not have the MULTICAST interface flag set by default. It should work if you enable it:
ip link set lo multicast on
ip link set tun1 multicast on
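To see whether an interface already has the flag, you can look at its flags word in sysfs. The Python sketch below is Linux-only; the helper names are made up for this example, while the IFF_MULTICAST value comes from the kernel's if.h header.

```python
IFF_MULTICAST = 0x1000  # interface flag bit from <linux/if.h>


def has_multicast_flag(flags_text: str) -> bool:
    """Interpret the hex string found in /sys/class/net/<ifname>/flags."""
    return bool(int(flags_text.strip(), 16) & IFF_MULTICAST)


def interface_supports_multicast(ifname: str) -> bool:
    """Read the interface flags from sysfs and test the MULTICAST bit."""
    with open(f"/sys/class/net/{ifname}/flags") as f:
        return has_multicast_flag(f.read())
```

A loopback interface that is merely UP typically reports 0x9, which lacks the bit, so interface_supports_multicast("lo") would return False until the flag is enabled as shown above.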
It still doesn’t work?!
Check your network topology, maybe a switch between the sender and the receiver doesn’t properly support IGMP snooping.
For virtual/cloud setups, see above for disabling IGMP snooping entirely in the Linux kernel bridge.
The PIM routers seem to have peered, and they list the multicast groups I want to forward, the TTL is OK, but I see no traffic?
Could be that your underlying routing protocol (RIP/OSPF) does not know the reverse path to the source. Make sure the sender’s network is listed in the routing table on the receiving side.
Also, on Linux you might get bitten by the rp_filter. It can be modified in your system’s /etc/sysctl.conf file. Check it with:
# sysctl -ar '\.rp_filter'
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.eth0.rp_filter = 0
net.ipv4.conf.eth1.rp_filter = 0
net.ipv4.conf.lo.rp_filter = 1
net.ipv4.conf.pimreg.rp_filter = 0
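When reading that output, keep in mind that 0 means off, 1 is strict and 2 is loose, and that the kernel uses the higher of the "all" value and the per-interface value. The Python sketch below parses sysctl output of this shape and computes the effective setting; the function names are made up for this example.

```python
def parse_rp_filter(sysctl_output: str) -> dict:
    """Turn lines like 'net.ipv4.conf.eth0.rp_filter = 0' into {iface: value}."""
    prefix, suffix = "net.ipv4.conf.", ".rp_filter"
    settings = {}
    for line in sysctl_output.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key.startswith(prefix) or not key.endswith(suffix):
            continue  # skip prompts, blank lines and unrelated keys
        settings[key[len(prefix):-len(suffix)]] = int(value)
    return settings


def effective_rp_filter(settings: dict, iface: str) -> int:
    """The kernel applies the higher of the 'all' and per-interface values."""
    return max(settings.get("all", 0), settings.get(iface, 0))
```

So even with eth0 set to 0, an "all" value of 1 still gives strict filtering on eth0, which is an easy thing to miss when debugging.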
What’s the routing performance of
N/A. Neither of them takes an active part in the actual forwarding of multicast frames. This is what the kernel or dedicated routing HW does. The routing daemons pimd, mrouted, and smcroute only manipulate the multicast routing table(s) of the operating system’s kernel.
Do you have any example of how to set up GRE and
Yes, for all the gory details see this howto