Data Types in OCaml

Let's learn data types in OCaml from basic to advanced

Introduction

In the previous article we discussed how OCaml differs from other languages: its expression-oriented nature, its static type system, and some other aspects.

In this article we will mainly discuss the data types of the language.

But before we directly head on to the main topic, let me share some insights about the tools we will be using.

Running & Compiling

💡
We won't be using an IDE or a text editor for now, or in the next few articles. (OCaml has no dedicated IDE comparable to IntelliJ IDEA for Java; the closest options are VS Code with the OCaml Platform extension or Neovim with the Merlin plugin.) Instead, we'll enter our code in utop, an enhanced Read-Eval-Print Loop (REPL) for OCaml.

utop (Universal Toplevel for OCaml) is similar to IPython for Python, with some different features.

You can install utop manually with your system package manager (apt, assuming you are on Ubuntu; on a Windows computer, use WSL2 instead). However, I recommend following every step described on the webpage linked below to install opam and OCaml and to set up an opam switch (similar to a virtual environment in Python 3).

Click here to see the steps. It may seem time consuming, but trust me it's worth it.

Once you have installed and set up OCaml, opam, and utop, you need to know how to run an existing OCaml file.

Run without compiling:

If your program uses no library other than the standard library, you can run it directly, without compiling it first. To do that -

ocaml filename.ml #place the appropriate path to the ocaml file in place of `filename.ml`

Compiling and running:

The compiler we use here is ocamlc, the OCaml bytecode compiler. It first compiles the source file into a bytecode executable, which we then run manually.

ocamlc -o filename.byte filename.ml
./filename.byte

If you look closely, you'll see that two more files are generated as well: filename.cmo and filename.cmi. They are compilation artifacts (the compiled object file and the compiled interface) and are not needed to run the program. We don't need them now, so clean them up using -

rm filename.cmi filename.cmo

Heading Over to the Main Topic

NOTE: Comments in OCaml are written between (* and *), and they can span multiple lines.

e.g. - (* This is a comment *)

Enter utop in the terminal and get ready to write code.

After running utop, you should see its welcome banner followed by a prompt waiting for input.

💡
Remember: In utop, an expression ends with ;; (a double semicolon). Whenever you evaluate an expression in utop, it shows the resulting value and its type on the next line or lines.

Primitive Expression Types

The primitive types are unit, int, char, float, bool, and string.

Unit: Singleton Type

The unit type is the simplest type in OCaml. It contains exactly one value: (). Seems stupid, right? Actually not!

In an expression oriented language, every expression must return a value. Then what about those expressions which perform side effects?

() is used as the value of an expression that is evaluated only for its side effect. It plays a role similar to the void type in C.

print_endline "Let's Learn OCaml";;
(*  This expression prints the specified string to the screen.
    Printing something to screen is seen as a side-effect.
    So, this expression will return a unit. *)
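
To make the idea concrete, here is a minimal sketch of unit showing up as a return type. The names greet and result are made up for illustration and do not come from the standard library.

```ocaml
(* A function that only performs a side effect has return type unit. *)
let greet (name : string) : unit =
  print_endline ("Hello, " ^ name)

(* `let () = ...` matches the result against the single unit value (). *)
let () = greet "OCaml"

(* The result can also be bound to a name; its only possible value is (). *)
let result : unit = greet "again"
```

In a source file (as opposed to utop), `let () = ...` is a common way to run side-effecting code at the top level.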

Int: Integers

This is the type of signed integers. Positive integers (1, 2, 3, 4, ...), negative integers (..., -4, -3, -2, -1), and 0 are all values of type int.

On 64-bit systems, OCaml integers are 63 bits wide and range from

$$-2^{62} \text{ to } 2^{62} - 1$$

These bounds are available as min_int and max_int.

let num = 5;; (* integer expression *)
val num : int = 5 (* utop output *)
💡
Apart from decimal notation, the OCaml compiler also recognises int literals written in binary, octal, or hexadecimal form.
  • int written in binary - starts with 0b
  • int written in octal - starts with 0o
  • int written in hexadecimal - starts with 0x
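
As a quick illustration of the prefixes listed above, the sketch below writes a few literals in each base. The binding names are arbitrary and chosen only for this example.

```ocaml
let from_binary = 0b1010   (* binary literal, value 10 *)
let from_octal = 0o17      (* octal literal, value 15 *)
let from_hex = 0xFF        (* hexadecimal literal, value 255 *)
let readable = 1_000_000   (* underscores in a literal are ignored *)

let () =
  Printf.printf "%d %d %d %d\n" from_binary from_octal from_hex readable
```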

Float: Floating-Point Numbers

A floating-point literal requires a decimal point, an exponent (base 10) introduced by 'E' or 'e', or both. A digit is required before the decimal point, but not after. Let's look at some examples -

31.415926E-1;; (* float value *)
- : float = 3.1415926 (* utop output *)

let number = 2e7;; (* float expression *)
val number : float = 20000000. (* utop output *)

(* float expression with unnecessary type annotation*)
let floating:float = 0.01;;
val floating : float = 0.01 (* utop output *)
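
One practical consequence of int and float being distinct types is that float arithmetic uses its own operators (+., -., *., /.), and an int has to be converted explicitly before it can take part. The following is a small sketch with made-up binding names:

```ocaml
let sum = 1.5 +. 2.5                 (* float addition uses +. *)
let scaled = float_of_int 3 *. 0.5   (* convert the int before multiplying *)
let no_fraction = 3.                 (* no digit is needed after the point *)

(* The line below would be rejected by the type checker,
   because + expects two ints:
   let bad = 1 + 2.5 *)

let () = Printf.printf "%g %g %g\n" sum scaled no_fraction
```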

Char: Characters

The char type represents single 8-bit characters; the values 0 to 127 correspond to the ASCII character set. A character constant is written between single quotes, e.g. 'a', 'x', 'F', ' '.

But there's more to know! Escape sequences, though commonly associated with strings, can also be written as char constants.

Must Know Escape Sequences:

| Sequence | Definition |
| --- | --- |
| '\\' | The backslash character |
| '\'' | The single-quote character |
| '\t' | The tab character |
| '\r' | The carriage-return character |
| '\n' | The newline character |
| '\ddd' | The decimal escape sequence |

A decimal escape sequence must have exactly three decimal digits. It specifies the character with the given decimal code.

Let's see some examples -

let ch = 'x';; (* char expression *)
val ch : char = 'x' (* utop output *)

'\123';; (* decimal escape sequence value *)
- : char = '{' (* utop output *)

'\121';; (* decimal escape sequence value *)
- : char = 'y' (* utop output *)

String: Character Strings

In OCaml, strings are a primitive type represented by character sequences delimited by double quotes. Unlike C, OCaml strings are not arrays of characters and do not employ the null-character '\000' for termination. Strings in OCaml support escape sequences for specifying special characters, akin to those used for individual characters.

let str = "Hello\n World!";; (* string expression *)
val str : string = "Hello\n World!" (* utop output *)

(* The absolute nightmare way to write a hello-world program *)
let greet = "\072\101\108\108\111\044 \087\111\114\108\100\033";;
val greet : string = "Hello, World!"
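
A few everyday string operations, sketched with functions from the standard String module. The binding names are only for illustration.

```ocaml
let greeting = "Hello" ^ ", " ^ "World!"   (* ^ concatenates strings *)
let len = String.length greeting            (* number of characters (bytes) *)
let first = greeting.[0]                    (* s.[i] reads the char at index i *)
let shout = String.uppercase_ascii greeting

let () = Printf.printf "%s has %d chars, starts with %c\n" shout len first
```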

Bool: Boolean Values

The bool type includes true and false, and logical negation is done via the not function. The comparison operators (=, ==, !=, <>, <, <=, >=, >) return true if the relation holds; == tests physical equality (identity), while = tests structural equality. Likewise, != is the negation of == and <> is the negation of =.

| Boolean Expression | What does it signify |
| --- | --- |
| x = y | x is equal to y |
| x <> y | x is not equal to y |
| x == y | x is "identical" to y |
| x != y | x is not "identical" to y |
| x > y | x is strictly greater than y |
| x >= y | x is greater than or equal to y |
| x < y | x is strictly less than y |
| x <= y | x is less than or equal to y |

If you're experienced in Python, Java, or C++, you'll have to get used to writing = in conditions instead of ==.

5.1 = 5.1;; (* boolean expression checking structural equality *)
- : bool = true (* utop output *)

5.1 != 5.1;; (* physical inequality: true here because the two float literals are separate boxed values *)
- : bool = true (* utop output *)
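
The difference between the two kinds of equality is easiest to see with mutable values, where physical equality is well defined. This is a small sketch using references (ref), which are not otherwise covered in this article; the names a, b, and the result bindings are arbitrary.

```ocaml
let a = ref 1
let b = ref 1

let same_contents = (a = b)    (* structural: both refs hold 1 *)
let same_cell = (a == b)       (* physical: two distinct cells in memory *)
let same_with_self = (a == a)  (* physical: a value is identical to itself *)

let () =
  Printf.printf "%b %b %b\n" same_contents same_cell same_with_self
```

For everyday comparisons of numbers, strings, lists, and so on, = and <> are almost always what you want.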

Type Conversion

OCaml provides functions for converting values of one primitive type to another.

From _ to int :

✅ use - int_of_string

int_of_string "145";;
- : int = 145

✅ use - int_of_char

int_of_char 'o';;
- : int = 111

✅ use - int_of_float

int_of_float 1.9999999;; (* truncates toward zero, discarding the fractional part *)
- : int = 1
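
Truncation and floor agree for positive numbers but differ for negative ones. The sketch below contrasts the two using the standard floor function; the binding names are arbitrary.

```ocaml
let positive = int_of_float 1.9999999               (* truncates to 1 *)
let negative = int_of_float (-1.9999999)            (* truncates to -1 *)
let floored = int_of_float (floor (-1.9999999))     (* floor first, giving -2 *)

let () = Printf.printf "%d %d %d\n" positive negative floored
```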

✅ use - Char.code

Char.code 'd';; (* Char is a module which has a function named 
                  `code` to do this *)
- : int = 100

From _ to float :

✅ use - float_of_int

float_of_int 52;;
- : float = 52.

✅ use - float_of_string

float_of_string "5";;
- : float = 5.

float_of_string "0.5";;
- : float = 0.5

From _ to char:

✅ use - char_of_int

char_of_int 55;;
- : char = '7'

char_of_int 97;;
- : char = 'a'

char_of_int 67;;
- : char = 'C'

✅ use - Char.chr

Char.chr 45;;
- : char = '-'

Char.chr 105;;
- : char = 'i'

From _ to string:

✅ use - string_of_int

string_of_int 746;;
- : string = "746"

✅ use - string_of_bool

string_of_bool true;;
- : string = "true"

✅ use - string_of_float

string_of_float 45.0;;
- : string = "45."

From _ to bool:

✅ use - bool_of_string

let wrong = bool_of_string "false";;
val wrong : bool = false
(* `bool_of_string` only works if the provided string is "false" or "true" *)
bool_of_string "";;
Exception: Invalid_argument "bool_of_string". (* throwing an exception/error *)
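
If you would rather not deal with exceptions, the standard library also offers `_opt` variants of the string parsers (available in OCaml 4.05 and later) that return an option instead of raising. A minimal sketch follows; parse_or_zero is a hypothetical helper written for this example.

```ocaml
(* These return None instead of raising on malformed input. *)
let good_int = int_of_string_opt "145"
let bad_int = int_of_string_opt "abc"
let bad_bool = bool_of_string_opt ""
let good_float = float_of_string_opt "0.5"

(* Hypothetical helper: fall back to 0 when the string is not an int. *)
let parse_or_zero (s : string) : int =
  match int_of_string_opt s with
  | Some n -> n
  | None -> 0
```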

Custom Types

We can define custom data types with the type keyword. One common kind of custom type lists a fixed set of alternatives (constructors); such types are called variants.

Example -

(* Defining a type representing different days of the week *)
type day =
  | Monday
  | Tuesday
  | Wednesday
  | Thursday
  | Friday
  | Saturday
  | Sunday
;;
(* `|` is a symbol in OCaml that separates different patterns or cases.
   It is mainly used in type definitions and pattern matching code. *)

(* utop output *)
type day = Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday
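
Variants are usually consumed with pattern matching. The sketch below repeats the day type so that it runs on its own; is_weekend is a hypothetical function name.

```ocaml
type day =
  | Monday | Tuesday | Wednesday | Thursday | Friday
  | Saturday | Sunday

(* Each case of the match handles one or more constructors. *)
let is_weekend (d : day) : bool =
  match d with
  | Saturday | Sunday -> true
  | Monday | Tuesday | Wednesday | Thursday | Friday -> false

let () = Printf.printf "%b %b\n" (is_weekend Sunday) (is_weekend Monday)
```

Listing every constructor explicitly, rather than using a catch-all `_` case, lets the compiler warn you if a new constructor is added later and a match forgets to handle it.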

Composite Data Types

Lists

Lists are homogeneous collections represented by square brackets. They are immutable and support powerful pattern matching operations, making them essential in functional programming.

(* Defining a list of integers *)
let numbers = [1; 2; 3; 4; 5];; 
val numbers : int list = [1; 2; 3; 4; 5]
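
Here is a short sketch of what working with a list looks like: prepending with ::, a recursive function written with pattern matching, and List.map from the standard library. The names extended, sum, total, and doubled are chosen only for this example.

```ocaml
let numbers = [1; 2; 3; 4; 5]

(* :: builds a new list with one more element at the front;
   the original list is left unchanged. *)
let extended = 0 :: numbers

(* A list is either empty or a head element followed by the rest. *)
let rec sum = function
  | [] -> 0
  | x :: rest -> x + sum rest

let total = sum numbers
let doubled = List.map (fun x -> x * 2) numbers
```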

Arrays

Arrays in OCaml are fixed-size, mutable collections of elements of the same type, written between [| and |]. They are zero-indexed, and an element is accessed with the arr.(i) syntax.

let numbers = [|1; 2; 3|];; (* defining an array of integers *)
val numbers : int array = [|1; 2; 3|] (* utop output *)
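
A brief sketch of reading and updating array elements. Unlike lists, arrays can be modified in place; the binding names are arbitrary.

```ocaml
let numbers = [|1; 2; 3|]

let first = numbers.(0)          (* read the element at index 0 *)
let () = numbers.(1) <- 20       (* overwrite the element at index 1 *)
let size = Array.length numbers  (* the size is fixed at creation *)

let () = Printf.printf "%d %d %d\n" first numbers.(1) size
```

Accessing an index outside the array raises an Invalid_argument exception at runtime.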

Tuples

Tuples are ordered, fixed-size collections whose elements may have different types. They offer a convenient way to group heterogeneous data. The components are separated by commas, and the whole tuple is usually wrapped in parentheses.

(* Defining a tuple *)
let credentials = ("Debajyati", 6);;
val credentials : string * int = ("Debajyati", 6) (* utop output *)
(* matching the pattern to access individual elements *)
let (name, roll) = credentials;;
val name : string = "Debajyati" (* utop output *)
val roll : int = 6
(* printing a string with those values *)
Printf.printf "Roll no. of %s is %d\n" name roll;;
Roll no. of Debajyati is 6  (* utop output *)
- : unit = ()
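
For pairs specifically, the standard library provides fst and snd; larger tuples are taken apart with patterns. A small sketch, where point and y are made-up names:

```ocaml
let credentials = ("Debajyati", 6)

let name = fst credentials   (* first component of a pair *)
let roll = snd credentials   (* second component of a pair *)

(* fst and snd only work on pairs; for a triple, use a pattern.
   _ ignores the components we do not need. *)
let point = (1.0, 2.0, 3.0)
let (_, y, _) = point
```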

Records

Records are labeled collections of fields, akin to structs in other languages. They allow for structured data representation and manipulation.

(* Defining a record representing a person *)
type person = {
  name : string;
  age : int;
};;
type person = { name : string; age : int; } (* utop output *)
(* Creating a person record *)
let john = { name = "John"; age = 30 };;
val john : person = {name = "John"; age = 30} (* utop output *)
(* Accessing fields of the record *)
let () = Printf.printf "%s is %d years old\n" john.name john.age;;
John is 30 years old  (* utop output *)
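
Record fields are immutable by default, so "changing" a record means building a new one. The sketch below shows the `with` syntax for that, plus pattern matching on fields; older_john and describe are hypothetical names.

```ocaml
type person = {
  name : string;
  age : int;
}

let john = { name = "John"; age = 30 }

(* Copy john, replacing only the age field; john itself is unchanged. *)
let older_john = { john with age = 31 }

(* Fields can be destructured directly in the function parameter. *)
let describe { name; age } =
  Printf.sprintf "%s is %d years old" name age

let () = print_endline (describe older_john)
```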

Algebraic Data Types (ADTs)

Algebraic data types in OCaml are composite types built by combining simpler types: variant types give sum types (a value is one of several alternatives), while tuples and record types give product types (a value holds several components together).

Variant Types

Variant types enable the creation of sum types, where a value can be one of several possibilities. They are particularly useful for modeling complex data structures and handling multiple cases in pattern matching. We already saw an example of a variant type in the Custom Types section of this article. Let's see another example, where the constructors carry data -

(* Defining a variant type representing shapes *)
type shape =
  | Circle of float
  | Rectangle of float * float;;
type shape = Circle of float | Rectangle of float * float   (* utop output *)

(* Creating instances of shapes *)
let circle = Circle 5.0;;
val circle : shape = Circle 5.  (* utop output *)

let rectangle = Rectangle (3.0, 4.0);;
val rectangle : shape = Rectangle (3., 4.)  (* utop output *)
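
Pattern matching lets a function extract the data carried by each constructor. As an illustration, here is a sketch of an area function (a hypothetical name); it repeats the shape type so it runs on its own and uses Float.pi, which is available in OCaml 4.07 and later.

```ocaml
type shape =
  | Circle of float              (* radius *)
  | Rectangle of float * float   (* width and height *)

let area = function
  | Circle r -> Float.pi *. r *. r
  | Rectangle (w, h) -> w *. h

let () =
  Printf.printf "%g %g\n" (area (Circle 5.0)) (area (Rectangle (3.0, 4.0)))
```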

Recursive Types

Recursive variant types allow for the definition of recursive data structures, such as linked lists and binary trees. One basic example using linked lists -

(* Defining a recursive list type *)
type 'a mylist =
  | Empty
  | Cons of 'a * 'a mylist;;
type 'a mylist = Empty | Cons of 'a * 'a mylist (* utop output *)

(* Creating a list of integers *)
let int_list = Cons (1, Cons (2, Cons (3, Empty)));;
val int_list : int mylist = Cons (1, Cons (2, Cons (3, Empty))) (*utop output*)
💡
NOTE: In OCaml, 'a is a type variable, indicating that the elements of the list can be of any type. It is a placeholder for a concrete type that is fixed when the list is built.
💡
NOTE: In OCaml, Empty & Cons are constructors. Empty represents the end of a list, indicating that there are no more elements left. Cons represents adding an element to the front of a list, combining the new element with the rest of the list.
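
Functions over a recursive type are typically recursive themselves, with one case per constructor. A minimal sketch follows; length and to_list are names chosen for this example, and the type is repeated so the block is self-contained.

```ocaml
type 'a mylist =
  | Empty
  | Cons of 'a * 'a mylist

(* Count the elements: 0 for Empty, 1 + the rest for Cons. *)
let rec length = function
  | Empty -> 0
  | Cons (_, rest) -> 1 + length rest

(* Convert to a built-in list, which has the same shape. *)
let rec to_list = function
  | Empty -> []
  | Cons (x, rest) -> x :: to_list rest

let int_list = Cons (1, Cons (2, Cons (3, Empty)))
```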

Option Types

The option type represents a value that may or may not be present: a value of type 'a option is either None or Some v. OCaml has no null, so option is the usual way to model a missing or undefined value explicitly.

Example:

let maybe_number: int option = Some 42;;
val maybe_number : int option = Some 42 (* utop output *)
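
A typical use of option is a function that cannot always produce a result. The sketch below uses two hypothetical functions, safe_div and describe, to show both producing and consuming an option.

```ocaml
(* Division that returns None instead of raising on a zero divisor. *)
let safe_div (a : int) (b : int) : int option =
  if b = 0 then None else Some (a / b)

(* The caller has to handle both cases explicitly. *)
let describe = function
  | Some n -> "got " ^ string_of_int n
  | None -> "nothing"

let () = print_endline (describe (safe_div 10 2))
let () = print_endline (describe (safe_div 1 0))
```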

Module Types

Wait, there's such a thing as a module type?? Wow! 🫡

In OCaml, modules provide a way to encapsulate related code, data, and types. They serve as containers for organizing and structuring code, much like namespaces in other languages. Module types, then, define the interface or signature of a module, specifying the types and functions that must be implemented by any module that conforms to it.

What is the Use Case? Why Even Use Module Types?

Module types play a crucial role in enforcing abstraction and modularity in OCaml programs. By defining interfaces through module types, developers can separate the concerns of implementation details from the external interface.

Defining Module Types

To define a module type, we use the module type keywords followed by a name and a set of specifications. These specifications list the types and functions that the module must provide. For instance, consider a module type defining the interface for a stack data structure:

module type StackType = sig
  type 'a t
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
  val pop : 'a t -> 'a option * 'a t
end
;;

(* utop output - same actually 🥲 *)
module type StackType =
  sig
    type 'a t
    val empty : 'a t
    val push : 'a -> 'a t -> 'a t
    val pop : 'a t -> 'a option * 'a t
  end

Here, StackType is a module type specifying that any module implementing it must define a type 'a t representing a stack, as well as functions empty, push, and pop for stack manipulation.

Implementing the Module Type We Created Now

Once a module type is defined, we can create modules that adhere to it by providing concrete implementations for its specifications. For example, we can implement the StackType module type as follows:

module Stack : StackType = struct
  type 'a t = 'a list (* the stack is represented by a built-in list *)
  let empty = []
  let push x s = x :: s
  (* `function` defines a one-argument function by pattern matching *)
  let pop = function 
    | [] -> (None, [])
    | x :: xs -> (Some x, xs)  (* `::` is the Cons operator *)
end
;;

module Stack : StackType (* utop output *)
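
To see the abstraction at work, here is a sketch that uses the module through its interface only. It repeats the definitions so that it runs on its own; the bindings s, top, rest, and next are made up for this example.

```ocaml
module type StackType = sig
  type 'a t
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
  val pop : 'a t -> 'a option * 'a t
end

module Stack : StackType = struct
  type 'a t = 'a list
  let empty = []
  let push x s = x :: s
  let pop = function
    | [] -> (None, [])
    | x :: xs -> (Some x, xs)
end

(* Push 1, then 2; the last element pushed comes out first. *)
let s = Stack.push 2 (Stack.push 1 Stack.empty)
let (top, rest) = Stack.pop s
let (next, _) = Stack.pop rest

(* Outside the module, 'a Stack.t is abstract: code like `1 :: s`
   is rejected, because the list representation is hidden. *)
```

Because callers cannot see that a stack is a list, the implementation could later be swapped for another one without breaking any code that uses Stack.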

So, finally we are done. Now you know all the most important and noteworthy data types in OCaml.

Conclusion

Mastering data types in OCaml is essential for writing maintainable & efficient code. From primitive types to algebraic data types and module types, OCaml has many tools for data manipulation & abstraction.

In future blog posts, we will explore advanced topics such as recursion, higher-order functions, and OCaml's module system. Stay tuned!

Until then please follow me on twitter :) & share this article with your friends!

Most importantly, happy coding! 🧑🏻‍💻 👩🏻‍💻

Did you find this article valuable?

Support Debajyati Dey by becoming a sponsor. Any amount is appreciated!
