Troubleshooters.Com, Code Corner and Ruby Revival Present

The Ruby_Newbie Guide to Symbols
Copyright (C) 2005 by Steve Litt

Note: All materials in Ruby Revival are provided AS IS. By reading the materials in Ruby Revival you are agreeing to assume all risks involved in the use of the materials, and you are agreeing to absolve the authors, owners, and anyone else involved with Ruby Revival of any responsibility for the outcome of any use of these materials, even in the case of errors and/or omissions in the materials. If you do not agree to this, you must not read these materials.

To the 99.9% of you honest readers who take responsibility for your own actions, I'm truly sorry it is necessary to subject all readers to the above disclaimer.

CONTENTS

Introduction
What do symbols look like?
What do they resemble in other languages?
How are symbols implemented?
What are symbols?
What are symbols not?
What can symbols do for you?
What are the advantages and disadvantages of symbols?
Summary

Introduction

Overwhelmingly, Ruby conforms to Eric Raymond's Rule of Least Surprise. However, the concept of Ruby Symbols recently precipitated a rather lively 29-participant, 97-post (and counting) thread on the ruby-talk@ruby-lang.org mailing list, complete with disagreements and killfile pronouncements. Perhaps some documentation is called for :-)

I'm writing this documentation for a specific audience: People who want to use Ruby but are not Ruby veterans. Maybe they've used Ruby, maybe they haven't, but they're not Ruby veterans. For the understanding of this specific audience, this documentation is written with a minimum of Ruby specific content. Instead, this documentation relies on general programming concepts. In the end, this document will enable the Ruby Newbie to use symbols correctly, every time, so that their code runs and does what they intend it to do. That is the sole goal of this documentation.

Real Ruby veterans understand symbols intuitively, so they don't need this documentation. Indeed, a Ruby veteran might look at the documentation you're now reading and call it inaccurate, because the concepts introduced in this documentation do not match the actual Ruby implementation of symbols. This document does not claim to explain the Ruby implementation of symbols, but instead explains how an application programmer can think of them and use them to attain the desired results, as well as to read code containing symbols.

Symbols can be viewed on many levels:

What do symbols look like?
What do they resemble in other languages?
How are symbols implemented?
What are symbols?
What are symbols not?
What can symbols do for you?
What are the advantages and disadvantages of symbols?

Some of these levels are explained in this document, and several are not necessary to the correct use of symbols. Read on...

What do symbols look like?

This is the one area where everyone agrees. Most symbols look like a colon followed by a non-quoted string:

:myname

Another way to make a symbol is with a colon followed by a quoted string, which is how you make a symbol whose string representation contains spaces:

:'Steve was here and now is gone'

The preceding is also a symbol. Its string representation is:

"Steve was here and now is gone"

#!/usr/bin/env ruby
puts :'I love Ruby.'
puts :'I love Ruby.'.to_i

[slitt@mydesk slitt]$ ./test.rb
I love Ruby.
10263
[slitt@mydesk slitt]$

When using quotes in a symbol, you can use either single or double quotes, as long as the beginning and ending quotes are the same type. Single or double, the string and numeric representations are identical, and the object_id is the same.
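
Here's a minimal sketch of that claim. The variable names are mine, made up for illustration:

```ruby
#!/usr/bin/env ruby

single_quoted = :'I love Ruby.'
double_quoted = :"I love Ruby."

puts single_quoted.to_s                                   # I love Ruby.
puts single_quoted.equal?(double_quoted)                  # true: one and the same object
puts single_quoted.object_id == double_quoted.object_id   # true
```

The equal? method tests object identity, so a true result means both spellings produced the very same symbol.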

Symbols are immutable. Their value remains constant during the entirety of the program. They never appear on the left side of an assignment. You'll never see this:

:myname = "steve"

If you were to try that, you'd get the following error message:

[slitt@mydesk slitt]$ ./test.rb
./test.rb:37: parse error, unexpected '=', expecting $
:myname = "steve"
         ^
[slitt@mydesk slitt]$

Symbols ARE used like this:

mystring = :steveT

Or this:

mystring = :steveT.to_s

Or this:

myint = :steveT.to_i

Or this:

attr_reader :steveT

Now you at least know what we're talking about. Naturally, you still have plenty of questions. Read on...

What do they resemble in other languages?

I'm not qualified to answer this question. In the long run, it doesn't matter. Trying to answer this question at the start of your Ruby career can muddle the issue.

How are symbols implemented?

The only really authoritative answer to this question is to read the C code from which Ruby (actually the ruby executable) is built. However, if you're new to Ruby, or a person who uses Ruby because he likes it but doesn't need to be a foremost Ruby authority, this is an answer you do not need at this time, and it's probably best for the time being to ignore all discussions of how symbols are implemented in Ruby.

What are symbols?

It's a string. No it's an object. No it's a name.

There are elements of truth in each of the preceding assertions, and yet in my opinion they are not valuable, partially because they depend on a deep knowledge of Ruby to understand their significance. I prefer to answer the question "what are symbols" in a language independent manner:

A Ruby symbol is a thing that has both a number (integer) representation and a string representation.

In its actual Ruby implementation, the symbol does not contain either a string or a number -- the string and number are kept somewhere else. That's not important for understanding how it works, however, so feel free to think of the symbol as containing the string and number if that's easier to visualize. In your code, you can derive the number representation with the :mysymbol.to_i syntax, and the string representation with the :mysymbol.to_s syntax. In most situations, a symbol yields the string representation even without the to_s conversion.

The string representation of the symbol is MUCH more important than the number part. As a matter of fact, the number part is seldom used.

Let's explore further using code:

#!/usr/bin/env ruby

puts :steve
puts :steve.to_s
puts :steve.to_i
puts :steve.class

The preceding code prints four lines. The first line prints the string representation because that's how the puts() method is set up. The second line is an explicit conversion to string. The third is an explicit conversion to integer. The fourth prints the type of the symbol. The preceding code results in the following output:

[slitt@mydesk slitt]$ ./test.rb 
steve
steve
10257
Symbol
[slitt@mydesk slitt]$

The first line shows the string representation of the symbol. Note that the string representation is identical to the string following the colon. The second line first converts the symbol to a String object using to_s, and then prints it. The output is the same, but the explicit conversion to String object offers some added capabilities you might (or might not) need. This is discussed later in this document.

The third line shows the integer representation of the symbol. It is a non-meaningful and pretty much non-useful number that cannot be changed. The fourth line shows that the symbol is an object of the Symbol class.
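
One caveat if you're following along on a current interpreter: Symbol#to_i was removed in Ruby 1.9, so the third line of the preceding program raises NoMethodError there. The following is a minimal sketch, assuming Ruby 1.9 or later, that uses object_id as a stand-in for the number part. The stand-in is my assumption for illustration, not the symbol's original integer representation, and the value it prints varies from run to run:

```ruby
#!/usr/bin/env ruby

sym = :steve

puts sym                      # puts converts to the string representation
puts sym.to_s                 # explicit conversion to a String
puts sym.object_id            # an Integer; the actual value varies between runs
puts sym.class                # Symbol
puts sym.respond_to?(:to_i)   # false on Ruby 1.9 and later
```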

Now let's explore some code that proves that the symbol's value cannot be changed at runtime:

#!/usr/bin/env ruby
:steve = "Big Steve"

[slitt@mydesk slitt]$ ./test.rb
./test.rb:2: parse error, unexpected '=', expecting $
:steve = "Big Steve"
        ^
[slitt@mydesk slitt]$

Well, that failed miserably. Maybe if we explicitly change the string representation:

#!/usr/bin/env ruby
:steve.to_s = "Big Steve"

[slitt@mydesk slitt]$ ./test.rb
./test.rb:2: undefined method `to_s=' for :steve:Symbol (NoMethodError)
[slitt@mydesk slitt]$

No go on strongarming the string part. What about the integer? Trying the same kind of assignment with to_i in place of to_s gives this:

[slitt@mydesk slitt]$ ./test.rb
./test.rb:2: undefined method `to_i=' for :steve:Symbol (NoMethodError)
[slitt@mydesk slitt]$

Can't strongarm the integer. Of course, to_s and to_i were never meant to be set methods -- they're get methods (actually they're conversions, but you needn't consider that right now), but it's pretty obvious that a symbol cannot be changed at runtime. In computer science speak, it's immutable.
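
You can also ask the symbol itself. This is a minimal sketch, assuming a current Ruby in which every symbol reports itself as frozen. The variable name is made up for illustration:

```ruby
#!/usr/bin/env ruby

name = :steve

puts name.frozen?                 # true: the symbol cannot be modified
puts name.respond_to?(:upcase!)   # false: no in-place modifiers exist
puts name.respond_to?(:<<)        # false: nothing to append with either
```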

One last point. In a single program, every occurrence of an identically named symbol is actually the same object. This is not true of strings. Watch this:

#!/usr/bin/env ruby

puts :myvalue.object_id
puts :myvalue.object_id
puts "myvalue".object_id
puts "myvalue".object_id

[slitt@mydesk slitt]$ ./test.rb
2625806
2625806
537872172
537872152
[slitt@mydesk slitt]$

As you can see, both times :myvalue was used, it had the same object ID. The two uses of "myvalue", on the other hand, produced two different object IDs, meaning two separate objects. This is how symbols can save memory.
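
The same comparison can be made without reading object IDs by eye. In this sketch (variable names are mine), equal? tests whether two references point to the same object, while == tests whether the values match. I build the strings with String.new so that each one is guaranteed to be a fresh object:

```ruby
#!/usr/bin/env ruby

sym_a = :myvalue
sym_b = :myvalue
str_a = String.new("myvalue")
str_b = String.new("myvalue")

puts sym_a.equal?(sym_b)   # true:  one symbol object, used twice
puts str_a.equal?(str_b)   # false: two separate String objects
puts str_a == str_b        # true:  ...that happen to hold the same text
```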

Based on what's been presented in this section, we can add to the language independent answer to the question "what is a symbol":

A Ruby symbol is a thing that has both a number (integer) representation and a string representation.
The string representation is much more important and used much more often.
The value of a Ruby symbol's string part is the name of the symbol, minus the leading colon.
A Ruby symbol cannot be changed at runtime.
Multiple uses of the same symbol have the same object ID and are the same object.

Now let's inject just a little bit of Ruby specific terminology. Almost everything in Ruby is an object, and symbols are no exception. They're objects.

What are symbols not?

A Symbol is Not a String

A Ruby symbol is not a string. Ruby string objects have methods such as capitalize and center. Ruby symbols have no such methods:

#!/usr/bin/env ruby
mystring = :steve.capitalize
puts mystring

[slitt@mydesk slitt]$ ./test.rb
./test.rb:2: undefined method `capitalize' for :steve:Symbol (NoMethodError)
[slitt@mydesk slitt]$

As an aside, if you want to capitalize the string representation of a symbol, you can first convert it to a string:

#!/usr/bin/env ruby
mystring = :steve.to_s.capitalize
puts mystring

[slitt@mydesk slitt]$ ./test.rb
Steve
[slitt@mydesk slitt]$
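
A symbol and a string holding the same characters don't even compare as equal. The following is a minimal sketch, assuming a current Ruby (where Symbol has picked up a few String-like methods such as capitalize, but not center). The variable names are made up for illustration:

```ruby
#!/usr/bin/env ruby

sym = :steve
str = "steve"

puts sym == str                 # false: a Symbol is never equal to a String
puts sym.to_s == str            # true:  compare the string representation instead
puts str.to_sym == sym          # true:  or convert the string to a symbol
puts sym.respond_to?(:center)   # false: Symbol has no center method
puts str.respond_to?(:center)   # true:  String does
```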

A Symbol is not (Just) a Name

The following illustrates the use of a symbol as a name:

attr_reader :length

You're naming both a get method (length()) and an instance variable (@length).

However, symbols can be used to hold any sort of immutable string. A symbol could be used as a constant (but you'd probably use an identifier starting with a capital letter instead). The point is, symbols are not restricted to just names.

That being said, symbols are used as names quite often, so although equating a symbol to a name is not correct, saying symbols are often used to hold names is a reasonable assertion.
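
Both uses can be sketched in a few lines. The class Box and the :open token are hypothetical, invented for this illustration:

```ruby
#!/usr/bin/env ruby

class Box
  attr_reader :length        # symbol used as a name: getter length, variable @length

  def initialize(length)
    @length = length
  end
end

box = Box.new(12)
puts box.length                             # 12
puts box.instance_variable_get(:@length)    # 12

status = :open               # symbol used as a fixed value, not as the name of anything
puts status == :open                        # true
```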

A Symbol is an Object, but So What?

No doubt about it, a symbol is an object, but so what? Almost everything in Ruby is an object, so saying a symbol is an object says nothing distinctive about symbols.

What can symbols do for you?

A symbol is a way to pass string information, always assuming that:

The string needn't be changed at runtime.
The string doesn't need methods of class String.

Because a symbol can be converted to a string with the .to_s method, you can create a string with the same value as the symbol's string representation, and then you can change that string at will and use all String methods.
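
A minimal sketch of that conversion follows. The variable names are mine, and the dup call is a precaution I've added so the code works on a private, changeable copy no matter what kind of string to_s hands back:

```ruby
#!/usr/bin/env ruby

sym  = :steve
copy = sym.to_s.dup      # a private, changeable String built from the symbol

copy << " was here"      # append in place
copy.upcase!             # modify in place

puts copy                # STEVE WAS HERE
puts sym                 # steve (the symbol itself is untouched)
```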

A great many applications of symbols could be handled by strings. For instance, you can do either the customary:

attr_writer :length

Or you can do the avant-garde:

attr_writer "length"

Both preceding code statements create a setter method called length=, which in turn sets an instance variable called @length. If this seems like magic to you, then keep in mind that the magic is done by attr_writer, not by the symbol. The symbol (or the string equivalent) just functions as a string to tell attr_writer what it should name the method it creates, and what that method should name the instance variable it creates.
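
Here's a short sketch showing the two forms side by side. The class Plank and its area method are hypothetical, made up for this illustration:

```ruby
#!/usr/bin/env ruby

class Plank
  attr_writer :length      # the customary symbol form
  attr_writer "width"      # the string form works too

  def area
    @length * @width
  end
end

plank = Plank.new
plank.length = 4
plank.width  = 3
puts plank.area                                    # 12
puts Plank.instance_methods(false).sort.inspect    # [:area, :length=, :width=]
```

Notice that Ruby reports the method names as symbols either way.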

To see, in a simplified manner, how attr_writer does its "magic", check out this program:

#!/usr/bin/env ruby

def make_me_a_setter(thename)
	eval <<-SETTERDONE
	def #{thename}(myarg)
		@#{thename} = myarg
	end
	SETTERDONE
end

class Example
	make_me_a_setter :symboll
	make_me_a_setter "stringg"

	def show_symboll
		puts @symboll
	end

	def show_stringg
		puts @stringg
	end
end

example = Example.new
example.symboll("ITS A SYMBOL")
example.stringg("ITS A STRING")
example.show_symboll
example.show_stringg

In the preceding, function make_me_a_setter is a greatly simplified version of attr_writer. It does not implement the equal sign, so to use the setter you must put the argument in parentheses instead of after an equal sign. It does not iterate through multiple arguments, so each make_me_a_setter can take only one argument, which is why we call it individually for both :symboll and "stringg".

With the setters made, the application programmer can access the setters as example.symboll("ITS A SYMBOL"). The following is the output of the program:

[slitt@mydesk slitt]$ ./test.rb
ITS A SYMBOL
ITS A STRING
[slitt@mydesk slitt]$
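
For comparison, here's a sketch of the same idea without eval, using Ruby's define_method and instance_variable_set. This is not how attr_writer is actually implemented. The names WriterMaker, make_me_a_writer and WriterExample are hypothetical, invented for this illustration. Unlike make_me_a_setter, this version supports the equal sign and accepts several names at once:

```ruby
#!/usr/bin/env ruby

module WriterMaker
  # Accepts any mix of symbols and strings, one setter per name.
  def make_me_a_writer(*names)
    names.each do |name|
      define_method("#{name}=") do |value|
        instance_variable_set("@#{name}", value)
      end
    end
  end
end

class WriterExample
  extend WriterMaker
  make_me_a_writer :symboll, "stringg"

  def show
    puts @symboll
    puts @stringg
  end
end

example = WriterExample.new
example.symboll = "ITS A SYMBOL"
example.stringg = "ITS A STRING"
example.show
```

Once again the symbol and the string are interchangeable, because all the method does with either one is interpolate it into a name.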

In most situations, you could use a string instead of a symbol. Perhaps using a string would decrease performance to some degree. If a literal string is used repeatedly, it will certainly consume more memory than its symbol counterpart. Perhaps using a string would be less readable, or less customary. But you can usually use a string in any situation you can use a symbol. With one exception...

If you need a "string" (term used loosely) that must not be changed, then you need a symbol, because a symbol's value cannot be changed at runtime.

What are the advantages and disadvantages of symbols?

Symbols generally have performance benefits. Each symbol is identified to the programmer by its name (for instance, :mysymbol), but the program can identify it by its numeric representation, which of course is a quicker lookup.

When two strings are compared, somewhere in some abstraction layer pointers must walk both strings, looking for a mismatch. When two Ruby symbols are compared, if their numeric representation is equal, then the symbols are equal. If you were to use :mysymbol twenty times in your program, every usage of :mysymbol would refer to exactly the same object with exactly the same numeric representation and exactly the same string representation. This can enhance performance.

Because every :mysymbol refers to exactly one object and yet "defines" (I use the term loosely) a literal string, it saves considerable memory over using a literal string every time, because each true literal string consumes memory, whereas once a symbol is defined, additional usages consume no additional memory. So if you use the same literal string tens or hundreds of times, substitute symbols. Hash keys are an excellent example.
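
The hash key case can be sketched like this. The data and variable names are made up for illustration:

```ruby
#!/usr/bin/env ruby

readings = [
  { :sensor => "a", :value => 1 },
  { :sensor => "b", :value => 2 },
  { :sensor => "c", :value => 3 }
]

# Collect the object ID of the :sensor key from every row.
key_ids = readings.map { |row| row.keys.first.object_id }.uniq

puts key_ids.length                           # 1: every row shares one :sensor object
puts readings.sum { |row| row[:value] }       # 6
```

No matter how many rows the array holds, the key :sensor exists only once.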

The granddaddy of all advantages is also the granddaddy of all disadvantages: symbols can't be changed at runtime. If you need something that absolutely, positively must remain constant, and yet you don't want to use an identifier beginning with a capital letter (a constant), then symbols are what you need.

The big advantage of symbols is that they are immutable (can't be changed at runtime), and sometimes that's exactly what you want.

Sometimes that's exactly what you don't want. Most usage of strings requires manipulation -- something you can't do (at least directly) with symbols.

Another disadvantage of symbols is they don't have the String class's rich set of instance methods. The String class's instance methods make life easier. Much easier.

Summary

Ruby symbols generated a 97 post thread on the ruby-talk@ruby-lang.org mailing list. There were many disagreements, some of which got a little heated. Twenty people, most of them smarter than me, had conflicting views of how to explain symbols. So if anyone tells you he has the one true way (TM) to explain symbols, he's probably wrong.

No matter what their understanding of symbols, Ruby veterans know how to use them to get the desired results. The problem is, Ruby newbies don't know how to use symbols to get the desired results, and yet they must listen to and try to learn from the often conflicting explanations from Ruby veterans. Ruby veterans often base their explanations on Ruby specific customs, riffs or even internal implementations, further distancing their explanations from the Ruby newbie.

This document is aimed specifically at the Ruby newbie. It uses very little Ruby specific implementation in its explanation of symbols. It does not argue fine points, but instead bestows the information the newbie REALLY needs to know in order to USE symbols to accomplish his coding goals, as well as to read code containing symbols.

The following statements are handy in using (or not using) symbols:

A Ruby symbol looks like a colon followed by characters. (:mysymbol)
A Ruby symbol is a thing that has both a number (integer) and a string.
The value of a Ruby symbol's string part is the name of the symbol, minus the leading colon.
A Ruby symbol cannot be changed at runtime.
Neither its string representation nor its integer representation can be changed at runtime.
Ruby symbols are useful in preventing modification.
Like most other things in Ruby, a symbol is an object.
When designing a program, you can usually use a string instead of a symbol, except when you must guarantee that the string isn't modified.
Symbol objects do not have the rich set of instance methods that String objects do.
After the first usage of :mysymbol all further usages of :mysymbol take no further memory -- they're all the same object.
Ruby symbols save memory over large numbers of identical literal strings.
Ruby symbols enhance runtime speed to at least some degree.
