This guide explains how message length and character sets are handled
for SMS messages. Both directly affect what you pay per message, so
please read through it to make sure you are able to configure the system according
to your preferences.
Introduction
SMS is mostly used to send text messages between mobile phones. When
a text is transmitted, there is a limit on the message length. If
English characters are used, the maximum message length is 160 characters.
When international characters are sent, the maximum length is 70 characters.
This size limit is determined by the character set used to transmit the message.
SMS segmentation and reassembly (SAR)
To increase the size limit of text messages, the SMS technology was improved to support longer
text messages. This improvement is called multipart SMS technology. It
relies on a so-called segmentation and reassembly procedure. If
an English text message that is longer than 160 characters is sent, it is
first segmented by the sending mobile and transmitted through the GSM network
in several SMS messages. The recipient mobile phone, after receiving all message parts,
reassembles the segments and displays the long text as a single message to the user.
If international characters are used, segmentation starts when the message
text becomes longer than 70 characters.
When multipart technology is applied, the cost of each message is determined
by the number of SMS messages used to transmit the text over the wireless network. For example,
if a text message is 240 English characters long, it fits into two SMS messages, so it will
cost twice as much as a single 160 character SMS.
One might expect that if a single SMS can hold 160 characters, a 320 character
message would take two physical SMS messages. This is not the case. When multipart SMS technology
is used, only 153 characters fit into a single SMS, because some space is needed
for the segmentation information that is used to reassemble the message parts
in the correct order. So if a 320 character message is sent, it takes 3 SMS messages. The
first two hold 153 characters each, and the last one holds 14 characters.
For international characters, 67 characters fit into a multipart SMS segment.
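The counting rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code taken from the gateway; the function name sms_count and the encoding labels "gsm7" and "ucs2" are made up for this example.

```python
import math

# Per-message limits quoted in the text above (names are illustrative).
SINGLE_LIMIT = {"gsm7": 160, "ucs2": 70}      # text fits into one SMS
MULTIPART_LIMIT = {"gsm7": 153, "ucs2": 67}   # per segment of a multipart SMS


def sms_count(length: int, encoding: str = "gsm7") -> int:
    """Return how many physical SMS messages a text of `length` characters needs."""
    if length <= SINGLE_LIMIT[encoding]:
        return 1
    return math.ceil(length / MULTIPART_LIMIT[encoding])


print(sms_count(160))         # 1
print(sms_count(240))         # 2
print(sms_count(320))         # 3
print(sms_count(71, "ucs2"))  # 2
```

Since each physical SMS is billed separately, the returned number is also the cost multiplier for the message.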
Terms and definitions, SAR technology in detail
To give more exact information, the terms used so far need
to be defined. When I mentioned English characters, I was referring to the
7-bit GSM SMS alphabet, which contains
English characters and a few international characters for Western Europe and Greece.
These characters are defined in the ETSI GSM 03.38 standard. When
I mentioned international characters, I was referring to the Unicode character set.
Unicode can be used to send special symbols and the characters of
all languages, including Chinese, Arabic, Hebrew, Cyrillic, special Eastern European
characters, etc.
In the GSM SMS system, an SMS message can contain up to 140 bytes (standard 8-bit bytes)
of message data. The 7-bit SMS alphabet makes
it possible to send 160 characters in these 140 bytes. This means that when you
send a text message that only contains characters that
are included in the GSM 7-bit character set, 160 7-bit characters are packed
into 140 8-bit bytes to produce the 160 character limit that we are so familiar with.
(Note: 160 * 7 = 140 * 8 = 1120 bits).
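To make the packing concrete, here is a minimal sketch of how 7-bit values can be packed into 8-bit bytes, least significant bits first, which is the bit order used for GSM 7-bit text. The function name pack_septets is hypothetical, and the input is assumed to be a sequence of character codes that have already been mapped to the GSM 7-bit alphabet.

```python
from typing import Iterable


def pack_septets(septets: Iterable[int]) -> bytes:
    """Pack 7-bit values into 8-bit bytes, filling each byte from its lowest bit."""
    bits = 0    # bit accumulator
    nbits = 0   # number of valid bits currently in the accumulator
    out = bytearray()
    for septet in septets:
        bits |= (septet & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(bits & 0xFF)
            bits >>= 8
            nbits -= 8
    if nbits:
        # Flush the remaining bits, padded with zeros.
        out.append(bits & 0xFF)
    return bytes(out)


# 160 characters of 7 bits each fill exactly 140 bytes.
print(len(pack_septets([0x41] * 160)))  # 140
```

The basic Latin letters have the same codes in the GSM 7-bit alphabet as in ASCII, so a plain English text can be tried out with `pack_septets(text.encode("ascii"))`.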
It is worth noting that ETSI GSM 03.38 also defines a few characters that are
represented by two 7-bit characters when included in a text message, so each of
them counts as two characters toward the length limit. The extension table of
the standard lists these characters, but since there are only a few,
I will also list them here: "^", "{", "}", "\", "[", "]", "~", "|" and "€".
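Because of these two-unit characters, the visible character count and the billed length can differ. The sketch below counts the 7-bit units a text occupies. It assumes that every character of the text is already part of the GSM 7-bit alphabet, and it uses the extension characters of GSM 03.38 ("^", "{", "}", "\", "[", "]", "~", "|" and "€"); the name septet_length is made up for this example.

```python
# Characters from the GSM 03.38 extension table; each occupies two 7-bit units.
EXTENDED = set("^{}\\[]~|€")


def septet_length(text: str) -> int:
    """Number of 7-bit units the text occupies (assumes GSM 7-bit characters only)."""
    return sum(2 if ch in EXTENDED else 1 for ch in text)


print(septet_length("price: 10"))    # 9
print(septet_length("price: [10]"))  # 13, although the text has 11 characters
```

A text of 159 ordinary characters plus one bracket therefore no longer fits into a single 160 character SMS.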
If you want to send a message that contains characters that are not part of the
GSM 7-bit character set, such as Chinese, Arabic, Thai, Cyrillic, etc., then the
entire text of the SMS that actually goes out over the air needs to be encoded
in the Unicode UCS-2 character set. In the UCS-2 character set, each character is
encoded with 16 bits (two 8-bit bytes). This means that an SMS message is
limited to 70 16-bit Unicode characters (70 * 16 = 140 * 8).
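The 70 character limit can be checked with Python's built-in codecs. The sketch below uses the "utf-16-be" codec, which produces the same two bytes per character as UCS-2 for every character UCS-2 can represent; characters that would need more than 16 bits (for example many emoji) are rejected here. The function name ucs2_payload is an assumption made for this example.

```python
def ucs2_payload(text: str) -> bytes:
    """Encode text as big-endian 16-bit units, two bytes per character."""
    data = text.encode("utf-16-be")
    if len(data) != 2 * len(text):
        # At least one character did not fit into a single 16-bit unit.
        raise ValueError("text contains characters outside the UCS-2 range")
    return data


# 70 Cyrillic characters fill the 140 byte payload exactly.
print(len(ucs2_payload("я" * 70)))  # 140
```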
If a message is larger than 140 8-bit bytes, then there are segmentation and
reassembly standards defined, where a single logical message can be sent over
the air using multiple physical SMS messages. The receiving client then has the
ability to reassemble the segmented message so that it again appears as a
single message on the receiving device.
When a long text message is segmented into multiple physical SMS messages, a special
header is added to each physical SMS message so that the receiving client knows
that it is a multipart SMS message that must be reassembled by the client. These
headers are known as segmentation headers, concatenation headers or SAR headers. The SAR headers are 6 bytes long (8 bits each).
They are included in each physical SMS message. These headers are placed in the
User Data Header (UDH) field of the message, but they do count against the overall
size limit of the message.
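As an illustration, a 6 byte concatenation header can be built as shown below. The layout follows the variant with an 8-bit reference number: a length byte (0x05), the element identifier for concatenated messages (0x00), the element length (0x03), then the reference number shared by all parts, the total number of parts and the sequence number of this part. The function name concat_header is hypothetical, and this is a sketch rather than the code the gateway uses.

```python
def concat_header(reference: int, total: int, sequence: int) -> bytes:
    """Build a 6 byte concatenation header with an 8-bit reference number."""
    if not 0 <= reference <= 255:
        raise ValueError("reference must fit into one byte")
    if not 1 <= sequence <= total <= 255:
        raise ValueError("sequence must be between 1 and total (at most 255)")
    return bytes([
        0x05,       # length of the header data that follows
        0x00,       # identifier: concatenated message, 8-bit reference
        0x03,       # length of the three fields below
        reference,  # same value in every part of one long message
        total,      # number of parts
        sequence,   # position of this part, starting at 1
    ])


print(concat_header(0x2A, 3, 1).hex())  # 0500032a0301
```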
If you send a long text message containing only characters that are part of the
GSM 03.38 character set, then each SMS segment can contain up to 153 characters.
(140 bytes - 6 bytes for the concatenation header leaves 134 available bytes,
or 8 * 134 = 1072 bits. At most 153 7-bit characters can be packed into 1072 bits, because 153 * 7 = 1071.)
If you send a long text message that includes any characters that require Unicode encoding,
then each SMS segment can contain up to 67 characters. (67 * 16 = 1072 bits)
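All four limits used in this guide follow from the same calculation: subtract the header from the 140 byte payload, convert to bits and divide by the bits per character, rounding down. A small sketch (the function name segment_capacity is made up for this example):

```python
def segment_capacity(bits_per_char: int, header_bytes: int = 0,
                     payload_bytes: int = 140) -> int:
    """Characters that fit into one SMS after reserving space for a header."""
    return (payload_bytes - header_bytes) * 8 // bits_per_char


print(segment_capacity(7))      # 160  single SMS, GSM 7-bit
print(segment_capacity(16))     # 70   single SMS, Unicode
print(segment_capacity(7, 6))   # 153  multipart segment, GSM 7-bit
print(segment_capacity(16, 6))  # 67   multipart segment, Unicode
```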
Character conversions and character sets
When you use Ozeki NG SMS Gateway, you send SMS messages from your PC.
The character set on your PC is a Windows or Unix charset, for example
UTF-8, ISO-8859-1 or ISO-8859-2, and not the GSM 7-bit alphabet or the
UCS-2 encoding used on the GSM network. In all cases some kind of character conversion
needs to take place to map your PC characters to the appropriate SMS characters.
This conversion determines the type of message (SMS with English characters or
SMS with Unicode characters) that is sent through the GSM network. If this conversion
is not handled carefully, you might run into extra costs.
Ozeki NG SMS Gateway
Ozeki NG SMS Gateway performs the character set conversion for you according
to the policy you select, and does the segmentation and reassembly of long
text messages accordingly. To choose a preferred conversion policy, use the
following options on the "Charsets" tab of the configuration form of the SMS
service provider connection (e.g. the GSM modem configuration form).
Best match:
Convert to preferred character set if lossless conversion is possible.
(Character substitutions are not allowed.)
Transform:
Convert to preferred character set if possible.
(Character substitutions are allowed.)
Enforce:
Always use the preferred charset.
(Character substitutions and character losses are allowed.)
These options, along with the preferred character set setting, allow you to
configure the character set conversion. For example, if you select "GSM 7 bit"
as your preferred charset and "Enforce" as the character set encoding
policy on the configuration form of your service provider connection (e.g. the GSM
modem configuration form), you can be sure that only the English (160 character) message encoding
will be used (Figure 1). This means lower message costs, but it also means that some international characters and
special symbols will not be displayed correctly on the recipient handset, because
the GSM 7-bit alphabet does not have a corresponding character for every symbol.
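To illustrate how the three policies differ, here is a rough sketch with "GSM 7 bit" as the preferred charset. It is one possible reading of the option descriptions above, not the gateway's actual implementation. In particular, the fallback to Unicode, the accent-stripping substitution and the "?" replacement character are assumptions, as are all names (substitute, convert, and the policy labels).

```python
import unicodedata

# Basic GSM 7-bit alphabet plus the two-unit extension characters.
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
GSM_ALL = GSM_BASIC | set("^{}\\[]~|€")


def substitute(ch: str) -> str:
    """Assumed substitution rule: drop accents, e.g. 'ő' becomes 'o'."""
    if ch in GSM_ALL:
        return ch
    base = unicodedata.normalize("NFKD", ch)[0]
    return base if base in GSM_ALL else ch


def convert(text: str, policy: str) -> tuple[str, str]:
    """Return (encoding, text to send) for the given policy."""
    if policy not in ("best_match", "transform", "enforce"):
        raise ValueError("unknown policy: " + policy)
    if all(ch in GSM_ALL for ch in text):
        return "gsm7", text              # lossless, same under every policy
    if policy == "best_match":
        return "ucs2", text              # no substitutions allowed
    replaced = "".join(substitute(ch) for ch in text)
    if all(ch in GSM_ALL for ch in replaced):
        return "gsm7", replaced          # substitutions were enough
    if policy == "transform":
        return "ucs2", text              # conversion not possible without loss
    # enforce: stay in GSM 7-bit and accept character loss
    return "gsm7", "".join(ch if ch in GSM_ALL else "?" for ch in replaced)


print(convert("jó reggelt", "best_match"))  # ('ucs2', 'jó reggelt')
print(convert("jó reggelt", "transform"))   # ('gsm7', 'jo reggelt')
print(convert("日本 ő", "enforce"))          # ('gsm7', '?? o')
```

The cost effect is visible in the first element of the result: "gsm7" allows 160 characters per SMS, "ucs2" only 70.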