Many of the Content Gateway applications are so old that they use ISO-8859-1 (Latin-1) or ISO-8859-15 (Latin-9) instead of UTF-8 as their character set. I’ve been tuning the http_adapter to better serve these applications.
While testing recent service migrations using the http_adapter, we have once again run into problems with the Scandinavian characters ÅÄÖ åäö. Things had been working fine for services that did not:
- expect anything other than the characters A-Z or digits 0-9 in their input
- return a direct response to the handset as part of the response to the http-request (most services so far had sent their response in a separate MT message)
What was actually happening was:
- requests to the backend service were using UTF-8 (even though the request headers claimed it was ISO-8859…)
- the Content-Type header in the http response was ignored and the response was treated as UTF-8
- when the response was sent back through the /send service, the character encoding was once again ignored
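To make the failure mode concrete, here is a minimal sketch of what happens when bytes are written in one character set and read in another. It is not code from the http_adapter; the class and method names are made up for illustration, and the letters are written as Unicode escapes so the example does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the http_adapter.
final class CharsetMismatchDemo {

    /** Encodes text with one charset, then decodes the same bytes with another. */
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] wire = text.getBytes(encodeWith);
        return new String(wire, decodeWith);
    }

    public static void main(String[] args) {
        // a-ring, a-umlaut, o-umlaut
        String text = "\u00e5\u00e4\u00f6";

        // UTF-8 bytes read as Latin-1: every letter becomes two characters.
        String asLatin1 = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 read as Latin-1: " + asLatin1.length() + " chars");

        // Latin-1 bytes read as UTF-8: the bytes are not valid UTF-8,
        // so the decoder substitutes the replacement character U+FFFD.
        String asUtf8 = roundTrip(text, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println("Latin-1 read as UTF-8 contains U+FFFD: " + asUtf8.contains("\uFFFD"));

        // Matching charsets on both sides keep the text intact.
        String same = roundTrip(text, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.println("Matching charsets preserve the text: " + same.equals(text));
    }
}
```

Plain A-Z and 0-9 have the same single-byte encoding in all three character sets, which is why services that only used those characters never showed the problem.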
Fixing Character Set in http-requests to backend services
This TODO note has been in the source code from the beginning:
private static final String[] MOTemplate = {
// CGW style HTTP GET request
"GET ${TARGET_URL} HTTP/1.1",
"Accept: */*" ,
"Character-set: iso8859-1", // TODO: should this be configurable
"User-Agent: CGW Provider Server 4.0 http_adapter",
"Host: ${TARGET_HOST}",
""
};
I finally made it configurable. For this to be useful, the adapter needs to get a suitable value from somewhere, so I added two new config entries to the OpaaliAPIHandler:
- defaultReplyCharset
- targetCharset
[opaaliqr:qr]
# defaultReplyUrl is used to specify how the MT message for a Query Reply is sent,
# this can be the "cgw" service from the top of this configuration file
# notice how the $(MSG) macro is escaped by doubling the $,
# here the macros are expanded in two passes, once when creating the queued request
# (for sender/recipient) and again when the message content is available
defaultReplyUrl=http://localhost:8081/send?to=$(M)&from=$(R)&msg=$$(MSG)
# defaultReplyCharset specifies the character encoding to be used
# for the defaultReplyUrl request (the default is ISO-8859-15)
defaultReplyCharset=ISO-8859-15
# targetCharset specifies the character encoding to be used
# for requests to (external) targets (the default is ISO-8859-15)
targetCharset=ISO-8859-15
The defaults (ISO-8859-15) are chosen so that they should match most legacy Content Gateway services or the /send service of the http_adapter itself. In other words: you normally don’t need to enter these lines into the config file.
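As a rough illustration of why the target character set matters for a GET request, the sketch below percent-encodes a query parameter with a configurable charset. This is an assumption about how such a setting could be applied, not the actual http_adapter code; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not taken from the http_adapter source.
final class TargetCharsetSketch {

    /** Percent-encodes a query parameter value using the configured charset name. */
    static String encodeParam(String value, String targetCharset) {
        try {
            return URLEncoder.encode(value, targetCharset);
        } catch (UnsupportedEncodingException e) {
            // A misspelled charset in the config file ends up here.
            throw new IllegalArgumentException("Unknown charset: " + targetCharset, e);
        }
    }

    public static void main(String[] args) {
        // a-ring, a-umlaut, o-umlaut, written as escapes
        String msg = "\u00e5\u00e4\u00f6";
        System.out.println(encodeParam(msg, "ISO-8859-15")); // %E5%E4%F6
        System.out.println(encodeParam(msg, "UTF-8"));       // %C3%A5%C3%A4%C3%B6
    }
}
```

A legacy backend that decodes the query string as ISO-8859-15 would read the first form correctly and see six garbage characters in the second.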
Using the Content-Type from the http-response
The first time a service (that we tested) returned something to be sent back to the handset, it used ISO-8859-1 encoding and correctly reported it in the Content-Type response header. However, the http_adapter just ignored the Content-Type header and assumed it was UTF-8 (a bug, I guess…). This is now fixed.
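The idea behind the fix can be sketched as follows: look for a charset parameter in the Content-Type header and decode the response body with it, falling back to a default when none is given. This is a simplified sketch with hypothetical names, not the adapter's actual implementation, and it does not cover every corner of the header syntax.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not taken from the http_adapter source.
final class ContentTypeCharset {

    /** Returns the charset named in a Content-Type header value, or the fallback. */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // The value may be quoted, e.g. charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    /** Decodes a response body according to its Content-Type header. */
    static String decodeBody(byte[] body, String contentType, Charset fallback) {
        return new String(body, charsetOf(contentType, fallback));
    }

    public static void main(String[] args) {
        // The bytes for a-ring, a-umlaut, o-umlaut in ISO-8859-1.
        byte[] body = {(byte) 0xE5, (byte) 0xE4, (byte) 0xF6};
        String text = decodeBody(body, "text/plain; charset=ISO-8859-1", StandardCharsets.UTF_8);
        System.out.println(text.equals("\u00e5\u00e4\u00f6")); // true
    }
}
```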
Recap of the character set configuration entries
cgw (send) service character set
The cgw service, which is typically at the URL path /send, has a configuration entry cgwCharset for choosing the expected character set.
- for legacy services which send MT messages using the ISO-8859-15 character set you may want to set it explicitly (although this is the default)
- if you have a new service that uses UTF-8 then set it explicitly using cgwCharset=UTF-8 in the [send:cgw] configuration section.
Character set for Opaali Notifications
There is a setting opaaliCharset for receive and qr services, which was intended for selecting the character set for notification messages coming from Opaali. This is actually not implemented – it is assumed that they are always using UTF-8.
Character set for requests going to target services
There is now a targetCharset setting for receive and qr services to set the encoding used when making a request to a back end service.
Default character set for MT response messages
Another new setting is defaultReplyCharset, which is used to set the character set for requests made to the defaultReplyUrl (which is the URL used to send an MT response to a query).
This is typically the cgw service at /send, so it should match the cgwCharset setting.
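As an example of keeping these settings consistent, a setup where the /send service accepts UTF-8 while the backend is still a legacy Latin-9 service might look roughly like this. Only the character set entries and the defaultReplyUrl from earlier in this post are shown; the other entries that these sections need are left out.

```ini
[send:cgw]
# services posting to /send use UTF-8
cgwCharset=UTF-8

[opaaliqr:qr]
defaultReplyUrl=http://localhost:8081/send?to=$(M)&from=$(R)&msg=$$(MSG)
# must match cgwCharset above, because defaultReplyUrl points at /send
defaultReplyCharset=UTF-8
# the backend service itself still expects ISO-8859-15
targetCharset=ISO-8859-15
```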
This time I did build a new binary release, bumping up the version number to 0.4.0.
