MPI implementations are large, complex beasts. By definition, they span many different layers ranging from the user-level implementation all the way down to sending 1s and 0s across a physical network medium.
However, not all MPI implementations actually “own” the code at every level. Consider: a TCP-based MPI implementation only “owns” the user-level middleware. It cannot see or change anything in the TCP stack (or below). Such an implementation is limited to user-space optimizations.
That being said, there certainly are many optimizations possible at the user level. In fact, user space is probably where most optimizations are possible. Indeed, nothing can save your MPI_BCAST performance if you’re using a lousy broadcast algorithm.
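To make the broadcast point concrete, here is a minimal sketch that counts communication rounds for two textbook broadcast strategies: a linear one, where the root sends to every other rank in turn, and a binomial tree, where every rank that already holds the data forwards it. It rests on a deliberately simplified model (each process sends at most one message per round, and every message costs the same), and the function names are hypothetical. It does not call MPI and does not describe how any particular MPI implementation picks its algorithm.

```python
def linear_bcast_rounds(num_procs: int) -> int:
    """Rounds needed when rank 0 sends to every other rank, one per round."""
    have_data = {0}
    rounds = 0
    for dest in range(1, num_procs):
        have_data.add(dest)  # the root is the only sender in every round
        rounds += 1
    return rounds


def binomial_bcast_rounds(num_procs: int) -> int:
    """Rounds needed when every rank holding the data forwards it each round."""
    have_data = {0}
    rounds = 0
    distance = 1
    while len(have_data) < num_procs:
        # Snapshot the current holders so ranks that receive in this round
        # do not also send in this round.
        for src in sorted(have_data):
            dest = src + distance
            if dest < num_procs:
                have_data.add(dest)
        distance *= 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    for p in (8, 1024):
        print(p, linear_bcast_rounds(p), binomial_bcast_rounds(p))
```

Under this model, 1,024 processes need 1,023 rounds with the linear approach but only 10 with the binomial tree. That gap comes entirely from the algorithm, which lives in user space.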
However, lower-layer optimizations are just as important, and can deliver many things that simply cannot be effected from user space.