ByRef vs ByVal


I could have sworn that I blogged on this topic before, but a peek at my backlog shows that I've not. I'll be dipped! Please accept my apologies about how this posting rambles on... I've had about eight hours of sleep in the past three days (I hate traveling), and so I'm rather incoherent.

ByRef and ByVal are two modifiers used in parameter list declarations that let you specify how the compiler treats the arguments. On the face of things, when you pass something "ByVal", you are passing it "by value", and when you pass something "ByRef", you are passing it "by reference." Of course, this is slightly muddied by the fact that some things (constants and literals) exist only as values, so they cannot be passed ByRef, and other things (objects and arrays) exist only as references, so passing them ByVal still hands the method a reference.

So how do these things work? Let's talk about ByVal first, since that's a bit easier to understand. When you pass something ByVal (which is the default in REALbasic, when you don't specify any modifiers), what really happens is that you get a temporary variable in the method being called. So any modifications to the parameter are really happening on the temporary and don't affect the original. Let's say you have a method signature:

Sub SomeAwesomeMethod( foo as String, bar as Integer )

Now, when you call that method:

SomeAwesomeMethod( someStringProperty, 12 )

You can imagine what really happens is this:


// SomeAwesomeMethod( someStringProperty, 12 )
Push someStringProperty's value, which is "Rock on!"
Push 12
Call SomeAwesomeMethod

// Sub SomeAwesomeMethod( foo as String, bar as Integer )
Dim bar as Integer = 12
Dim foo as String = "Rock on!"
// Do the work

You see, the arguments in the parameter list are actually just local variables that are temps. So if you make changes to bar or foo, then those changes are entirely local to the SomeAwesomeMethod method.
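
As a concrete illustration of those temporaries, here is a minimal sketch in REALbasic. ShowChanged, amount, message, total, and greeting are hypothetical names invented for this example, and the caller code is assumed to live in something like a button's Action event.

```realbasic
// Hypothetical method: both parameters are ByVal (the default)
Sub ShowChanged( amount As Integer, message As String )
  // These assignments touch only the method's local temporaries
  amount = amount + 10
  message = message + " (changed)"
  MsgBox message + ": " + Str( amount )
End Sub

// Caller
Dim total As Integer = 12
Dim greeting As String = "Rock on!"
ShowChanged( total, greeting )

// total and greeting are unchanged here
MsgBox greeting + ": " + Str( total )
```

The first dialog shows the modified temporaries ("Rock on! (changed): 22"), while the second shows that the caller's variables still hold their original values ("Rock on!: 12").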

Let's contrast that with ByRef, which passes things by reference. When you pass something by reference, what you're really doing is saying "hey compiler! You can find the value for that thing over here." So the compiler doesn't pass the actual data -- it passes a reference to where the data can be found (those of you coming from C/C++ know this concept -- it's just pointers!). So modifying the example above to be:

Sub SomeAwesomeMethod( ByRef foo as String, ByRef bar as Integer )

Now we're passing in the parameters by reference. In this case, we have to call the method with things that have storage somewhere (like properties, or local variables -- not constants):


Dim numberTwelve as Integer = 12
SomeAwesomeMethod( someStringProperty, numberTwelve )

You can imagine that what the compiler does instead is:


// Dim numberTwelve as Integer = 12
AllocStackSpace( 4 bytes )
Move 12 into allocated space (at address 0x1234)

// SomeAwesomeMethod( someStringProperty, numberTwelve )
Push someStringProperty's address (let's say 0x5678)
Push 0x1234 (the stack address where the numberTwelve variable lives)
Call SomeAwesomeMethod

// Sub SomeAwesomeMethod( ByRef foo as String, ByRef bar as Integer )
Dim bar as Ptr = 0x1234 (reference to numberTwelve)
Dim foo as Ptr = 0x5678 (reference to someStringProperty)

// Do the work


In this example, when someone assigns something to foo or bar, what they're really doing is assigning to something like bar.Int32( 0 ) or foo.String( 0 ) (because foo and bar "point" to where the values really live). The same goes for retrieving the value itself. Let's take a more concrete example where the caller passes in a reference, and the callee reads it and then modifies it. Finally, the caller will access the value.

dim foo as Integer = 55
DoSomething( foo )
MsgBox Str( foo )

Sub DoSomething( ByRef bar as Integer )
MsgBox Str( bar )
bar = bar + 10
End Sub


Running this code shows a message dialog with the value 55 (from inside DoSomething), then another one with 65 (from the caller). That's because foo was passed by reference to DoSomething, so the method was able to modify the caller's variable. Let's see how the compiler would do this:

// Dim foo as Integer = 55
AllocStackSpace( 4 bytes )
Move 55 into allocated space (at address 0x1234)

// DoSomething( foo )
Push 0x1234
Call DoSomething

// MsgBox Str( foo )
MsgBox Str( foo ) // How this works isn't important

// Sub DoSomething( ByRef bar as Integer )
dim barTemp as Ptr = address where bar lives, 0x1234

// MsgBox Str( bar )
MsgBox Str( barTemp.Int32( 0 ) ) // How this works isn't important

// bar = bar + 10
barTemp.Int32( 0 ) = barTemp.Int32( 0 ) + 10

// End Sub


In the case of passing by reference, what the compiler is really doing is using Ptrs and dereferencing them for you. That means we're actually modifying the data that the parameter references, which is why the caller can see the changes made within the DoSomething method. Had we been passing by value instead of by reference, the bar = bar + 10 line would have modified a local copy, not the original data.
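
To make the "it's really Ptrs" idea tangible, here is a hand-written approximation of what the compiler does for you. It's only a sketch: AddTen, location, storage, and address are hypothetical names, a MemoryBlock stands in for the caller's stack slot, and it assumes a REALbasic version where MemoryBlock.Int32Value and Ptr.Int32 are available.

```realbasic
// Hypothetical method that takes the address explicitly instead of using ByRef
Sub AddTen( location As Ptr )
  // Explicit dereference: read the value at the address, add 10, write it back
  location.Int32( 0 ) = location.Int32( 0 ) + 10
End Sub

// Caller
Dim storage As New MemoryBlock( 4 )  // 4 bytes, playing the role of the variable's storage
storage.Int32Value( 0 ) = 55

Dim address As Ptr = storage         // a MemoryBlock converts to a Ptr to its data
AddTen( address )

MsgBox Str( storage.Int32Value( 0 ) )  // the caller sees the modified value
```

With ByRef you get the same effect without writing any of the pointer plumbing yourself, which is the whole point of the modifier.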

So how does this relate to things which are already references? In that case, the reference parameter (such as an object or an array) is passed by reference automatically, since it's already a reference in the first place!

An interesting thing to understand is the set of compiler errors (or lack thereof) for these modifiers. If you try to pass a constant ByRef, you'll get an error from the compiler. This is because the contract for the method cannot be satisfied. A constant has no storage location in memory, so a method attempting to modify the argument would have nothing to modify. Hence the compile error -- it's impossible to do!
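
Here is a small sketch of the constant case and the usual way around it. Bump, kStart, and start are hypothetical names made up for illustration, and the commented-out calls are the ones the compiler is described as rejecting.

```realbasic
// Hypothetical method with a ByRef parameter
Sub Bump( ByRef counter As Integer )
  counter = counter + 1
End Sub

// Caller
Const kStart = 12

// Bump( kStart )  // compile error: a constant has no storage to reference
// Bump( 12 )      // compile error for the same reason

Dim start As Integer = kStart  // copy the constant into a local, which does have storage
Bump( start )
MsgBox Str( start )  // shows 13
```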

But in the case of passing a reference type ByVal, you don't get a compile error (or even a warning): why is that? Because the alternative sucks worse! REALbasic defaults to the ByVal calling convention, so if you had to be explicit, you'd end up with very verbose method parameter lists. What's more, although the concept makes no sense, it fails in a way that leaves a reasonable alternative. If you pass something which can only ever exist as a reference, it's reasonable to assume that passing it as a reference is a sensible default behavior, right?

The final compile error seems to cause a few people some confusion -- why can you pass "SomeClassProperty", but not "Self.SomeClassProperty"? The explanation is: we want to keep things very, very simple. You cannot pass expressions ByRef because there are some cases where it would make sense and work, and other cases where it wouldn't work. So we make it very easy on you: no expressions! And you can consider anything with a dot in it to be an expression, even if calling it an expression may not seem to make sense. This makes ByRef significantly simpler to use because you don't have to think "hmm, will this work? What would the compiler do?" Instead, just keep in mind: no dots, no constants!
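
The "no dots" rule can be sketched like this. ClearCount is a hypothetical helper, and the calls are assumed to sit inside a method of a class that declares SomeClassProperty As Integer.

```realbasic
// Hypothetical method with a ByRef parameter
Sub ClearCount( ByRef count As Integer )
  count = 0
End Sub

// Inside a method of a class that declares the property SomeClassProperty As Integer
ClearCount( SomeClassProperty )          // accepted: a plain property name
// ClearCount( Self.SomeClassProperty )  // rejected: it has a dot, so it counts as an expression
```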

So to make a long story longer, here are the take-home points for you to remember about ByRef and ByVal:


  • ByRef allows the method being called to modify the parameter such that the caller can see the modifications.

  • ByVal allows the method to modify the parameters in any way it wants without fear of the caller seeing the modifications. Except if the parameter is a reference type!!

  • It's worth repeating: objects and arrays are reference types, so there's no such thing as ByVal for them -- the method can modify these types, and the caller will notice the changes.

  • REALbasic defaults to ByVal passing, whereas VB6 and earlier defaulted to ByRef.

  • You cannot pass a constant or an expression ByRef. Only locals and simple properties can be passed ByRef.

Phew! That was a whole lot of prattling on about something that most people don't give a second thought to. Sorry about the meandering and incoherent explanations. If I've only muddied the waters for you, then please ask questions and I'll do my best to provide better answers for them. If I've cleared some things up for you, then it was a great use of not napping (which is what I'm going to go do now).

13 Comments

I admit I've been bit by the no ByRef "Self.SomeClassProperty" quirk in the past but it's easy enough to work around.

One thing you didn't hit on with objects is that when passing them ByRef the function can actually change the caller's object reference itself, whereas a normal ByVal object is safe from being re-assigned or re-New'ed.

@Frank -- Ah, yes! That's an excellent point. Since ByRef is dealing with the reference to the object, it's entirely possible to replace the reference, as well as alter its contents. But if you instead pass it ByVal, then the reference itself cannot be altered (though the contents of the object can be modified). That's a fine distinction, but still an excellent point.
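
A rough sketch of that distinction, using the built-in Dictionary class so no custom class is needed. FillByVal, ReplaceByRef, original, and current are hypothetical names chosen for this example.

```realbasic
// Hypothetical method: the object reference is passed ByVal (the default)
Sub FillByVal( d As Dictionary )
  d.Value( "touched" ) = True  // modifies the object the caller also refers to
  d = New Dictionary           // rebinds only the local parameter
End Sub

// Hypothetical method: the caller's variable itself is passed ByRef
Sub ReplaceByRef( ByRef d As Dictionary )
  d = New Dictionary           // rebinds the caller's variable
End Sub

// Caller
Dim original As New Dictionary
Dim current As Dictionary = original

FillByVal( current )
If ( current Is original ) And current.HasKey( "touched" ) Then
  MsgBox "ByVal: same object, contents changed"
End If

ReplaceByRef( current )
If Not ( current Is original ) Then
  MsgBox "ByRef: variable now refers to a different object"
End If
```

Both dialogs appear: the ByVal call can change the object's contents but not which object the caller's variable refers to, while the ByRef call swaps the reference itself.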

I wanted to touch on Frank's comment of "bitten by no ByRef Self.SomeClassProperty" quickly. While this particular case might make sense, it's very easy to see why expressions can be extremely confusing. Some people think that you should be able to pass anything that can be an lvalue as a ByRef argument. However, this a) won't work, and b) is confusing because the cases where it won't work are hard to spot. Here's an excellent example: the following code works as an lvalue, but can't work as a ByRef param in the hypothetical case:

SomeMethod( ListBox1.Cell( row, col ) )

This won't work because Cell is a method, not a property. However, it can be an lvalue, because this is legal:

ListBox1.Cell( row, col ) = someValue

Given that computed properties and method getter/setter pairs can never work as ByRef params, it suddenly becomes hard to understand what can be passed and when. It's much easier just to say "no expressions" and be done with it, even though some expressions might make sense.
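
When a value lives behind a getter/setter pair like Cell, one workaround is to copy it into a local, pass the local ByRef, and write the result back. This is only a sketch: MakeUpper and cellText are hypothetical names, and row and col are assumed to be valid Integer indexes into ListBox1.

```realbasic
// Hypothetical method with a ByRef parameter
Sub MakeUpper( ByRef s As String )
  s = Uppercase( s )
End Sub

// Caller
Dim cellText As String = ListBox1.Cell( row, col )  // calls the getter
MakeUpper( cellText )                               // a local has storage, so ByRef works
ListBox1.Cell( row, col ) = cellText                // calls the setter
```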

There are more rules not covered by "no expressions, no constants" when it comes to use with ByRef:
- shared properties do not work
- methods using the "Assigns" modifier

Without having looked very closely at shared properties, that feels like it might be a bug. They definitely have a storage location in the static data section... But "methods using assigns" is definitely covered as an expression -- all method calls are expressions.

"So how does this relate to things which are already references? In that case, the reference parameter (such as an object or an array) is passed by reference automatically, since it's already a reference in the first place!"

This is a common belief but it is not actually correct. The ByVal/ByRef parameter mode is completely independent of the value type / reference type distinction. It is confusing because we use the term "reference" in two different ways.

The value of a reference type is a pointer to some object on the heap (and "object" here includes strings and arrays). It so happens that you can do things to the object through that pointer, but the variable's value is the pointer, not the object. The object's existence is independent of the variable; the variable's value is just the address of the object.

With ByVal, you pass a value; with ByRef, you pass a specific variable. The callee's parameter variable is bound to the caller's argument variable. This is why the argument must be a variable, and not an expression, and this is why the types must match exactly.

The variable's type is irrelevant here. The ByRef mechanism does not care whether the variable contains an object, an integer, a boolean, or anything else. Whatever the variable's type, when you pass a variable ByRef, the generated code simply pushes the address of the variable on the stack. Thus, when you pass an object byref, you are passing the address of a variable containing the address of the object data.

You're absolutely right, Mars -- in my attempt to make the information accessible, I rather missed the mark on the technical front. Thanks for the well-phrased clarification (better than I could have done given my lack of sleep!).

Variants?

Are they handled like Objects?

@Paul -- Variants are objects under the hood, but they behave more like strings than objects when it comes to ByRef handling.

Whoops - I meant computed properties, not shared ones.

@Thomas -- ah, that makes much more sense. Computed properties, like methods, don't have backing storage -- so it won't make sense for them to be passed ByRef.

Any suggestions for string size and function frequency for when to pass ByRef instead of the default ByVal?

Are there any internal optimizations for this common scenario in RB?

@DeanG -- I wouldn't ever use ByRef as an optimization, except when passing structures (which isn't something most RB programmers should be using heavily anyway). When you make something ByRef, you break its encapsulation -- now it's possible for "invisible" things to modify variables entirely by accident, which causes hard-to-track-down bugs.


Disclaimer

I'm currently an employee of REAL Software. My blog is mine. The opinions represented in this blog are mine as well and may not represent my employer's opinions. All original material is copyrighted and property of the author.

REALbasic® is a registered trademark of REAL Software, Inc. REAL SQL Server™ and Lingua™ are pending trademarks of REAL Software, Inc. All rights reserved.