[jacorb-developer] jacorb-developer Digest, Vol 131, Issue 7

Gotthard Witsch Gotthard.Witsch at ith-icoserve.com
Wed Feb 19 08:30:31 CET 2014


Hi Nick,

I did one adjustment for size calculation: In CDROutputStream I change line #902 to the following:
size = 4 + 4 * valueLength + 1; as in the worst case 4 bytes are needed for each char. As in line 931 the size is recalculated, I don't think that anything else needs to be adjusted.

I will do some further testing this week and provide the tests I used.

Sincerely
Gotthard Witsch



-----Ursprüngliche Nachricht-----
Von: Nick Cross [mailto:jacorb at goots.org]
Gesendet: Montag, 17. Februar 2014 16:43
An: Gotthard Witsch
Betreff: Re: AW: [jacorb-developer] jacorb-developer Digest, Vol 131, Issue 7



Thats great thanks. I'll take a look as soon as I can.

Rest of my reply is inline.

On 17/02/14 14:14, Gotthard Witsch wrote:
> Hi Nick,
>
> I found a solution for the conversion problem.
>
> Instead of converting every single character of a string, I convert
> the whole String using the method String.getBytes(Charset charset).
> Therefore I did the following changes:
>
> In org.jacorb.orb.CDROutputStream I removed the loop in line 917 and
> instead call a new created write_string method on the codeSet object.
>
> In the org.jacorb.orb.giop.CodeSet class I created the new method
> write_string in line 579: public void write_string(OutputBuffer
> buffer, String s, boolean write_bom, boolean write_length, int
> giop_minor)

Delegating to the JVM seems reasonable.

> It's default implementation just converts each string's char in a loop
> with the write_char method.

Does the size calculation in cdroutputstream also need to be adjusted?

> However in Utf8CodeSet I override the write_string method and do the
> following. I use the mentioned method String.getBytes(Charset
> charset) to get the String's byte representation for the provided
> charset. The charset is created with Charset.forName(this.getName()),
> where this.getName() returns UTF8. After I received the bytes I just
> write them to the buffer with the OutputBuffer's write_byte method.
> That's it. Now the JVM takes care of converting the string to the
> needed byte array. :D Tested it in my application and it's working
> fine. Some rtesting still needs to be done.  I also thought about
> putting the implementation to the general method. However the

Perhaps its worthwhile extending the current JacORB Codeset tests to cover this scenario?

> following reasons prevented me from taking this step: The call
> Charset.forName('UCS2') throws an exception, hence UCS2 is no valid
> charset identifier for the JVM. However in
> org.jacorb.orb.CDRInputStream in line 1252 a String is created with
> the codeSet.getName(). This call won't be working too if
> codesetEnabled is true and getName returns UCS2! I don't know how this
> implementation compares to the write_char methods on the aspect of
> performance. Maybe you can do some testing?

Hmmm. That might be broken. As you say, more tests are required ;-) While the CDROutputStreamTest covers writing it, the CDRInputStreamTest does not have a corresponding read section. Might be possible to use a
UTF16 instead?

Do you have the tests you are using?  I think a good start would be to enhance the current JacORB CodeSet(On|Off)Test(s) to cover a wider range of scenarios/encodings. The BugJac461Test also attempts to bounce some Japanese text via UTF8.

Regards

Nick





> I attached the changed source files as a zip and will also upload them
> to the bug report http://www.jacorb.org/bugzilla/show_bug.cgi?id=969.
> I am looking forward for some feedback.
>
> Sincerely Gotthard
>
> -----Ursprüngliche Nachricht----- Von: Nick Cross
> [mailto:jacorb at goots.org] Gesendet: Freitag, 14. Februar 2014 13:20
> An: Gotthard Witsch Betreff: Re: [jacorb-developer] jacorb-developer
> Digest, Vol 131, Issue 7
>
>
> Hi,
>
> I'd be happy to work with you on any code / doc / tests etc.
>
> I was thinking of normalizing the
> org.jacorb.orb.standardInterceptors.IORInterceptorInitializer
> configuration so that it is automatically implied by
> jacorb.codeset=on.
>
> There are some existing tests at
> test/regression/src/test/java/org/jacorb/test/orb/Codeset* - perhaps
> they need enhancing ;-)
>
> Regards
>
> Nick
>
>
> On 14/02/14 11:43, Gotthard Witsch wrote:
>> Hi Nick,
>>
>> thanks for your reply.
>>
>> I'm referring to JacORB 3.3 documentation
>> (http://www.jacorb.org/releases/3.3/ProgrammingGuide.pdf). It only
>> says, that enabling jacorb.codeset will do translation on
>> marshalling. We've expected that if client and server both expect
>> UTF-8 and if JVMs file.encoding is UTF-8 there won't be any need to
>> translate. However I saw that the implemenation only expects ASCII if
>> jacorb.codeset is set to off. So you're right we have to enable this
>> option. It would be very helpful if we could mention that in the
>> documentation at the config's description.
>>
>> Thanks for the initialization hint:
>> org.omg.PortableInterceptor.ORBInitializerClass.standard_init=org.jacorb.orb.standardInterceptors.IORInterceptorInitializer.
>> I will have a look at it.
>>
>> So the only problem remaining, is the wrong conversion of UTF-8 4byte
>> characters in CDROutputStream.write_string in line 917 to 920. In
>> detail every character of the string is converted seperatly with the
>> CodeSet.write_char method. This works correctly for up to
>> 3 byte characters. But the implementation of
>> org.jacorb.orb.giop.CodeSet.Utf8CodeSet.write_char cannot handle 4
>> byte characters. It handles every character bigger than 0x07FF as a
>> 3 byte character. However 3 byte characters end at 0xFFFF.
>> Everything above is a 4 byte character and has to be converted
>> separately. Hence 4 byte characters are representated with two chars
>> in Java. So converting only one char will result in wrong encoded
>> characters.
>>
>> There's an article about handling 4 byte characters in Java, that
>> mentions some utitily methods for conversion:
>> http://www.oracle.com/technetwork/articles/javase/supplementary-14265
>> 4
>>
>>
.html I am actually working on a patch that can handle this. I should
>> be able to provide some code with examples next week.
>>
>> Sincerely, Gotthard
>>
>> -----Ursprüngliche Nachricht----- Von:
>> jacorb-developer-bounces~gotthard.witsch=ith-icoserve.com at lists.splin
>> e
>>
>>
.inf.fu-berlin.de
>> [mailto:jacorb-developer-bounces~gotthard.witsch=ith-icoserve.com at lis
>> t
>>
>>
s.spline.inf.fu-berlin.de] Im Auftrag von
>> jacorb-developer-request at lists.spline.inf.fu-berlin.de Gesendet:
>> Freitag, 14. Februar 2014 12:03 An:
>> jacorb-developer at lists.spline.inf.fu-berlin.de Betreff:
>> jacorb-developer Digest, Vol 131, Issue 7
>>
>> Send jacorb-developer mailing list submissions to
>> jacorb-developer at lists.spline.inf.fu-berlin.de
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> https://lists.spline.inf.fu-berlin.de/mailman/listinfo/jacorb-develop
>> e
>>
>>
r
>>
>> or, via email, send a message with subject or body 'help' to
>> jacorb-developer-request at lists.spline.inf.fu-berlin.de
>>
>> You can reach the person managing the list at
>> jacorb-developer-owner at lists.spline.inf.fu-berlin.de
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of jacorb-developer digest..."
>>
>>
>> Today's Topics:
>>
>> 1. Re: Error in String to byte conversion in CDROutputSream and byte
>> to String conversion in CDRInputStream class (Nick Cross)
>>
>>
>> ---------------------------------------------------------------------
>> -
>>
>>
>>
Message: 1
>> Date: Thu, 13 Feb 2014 13:21:10 +0000 From: Nick Cross
>> <jacorb at goots.org> To: Discussions concerning CORBA development with
>> JacORB <jacorb-developer at lists.spline.inf.fu-berlin.de>
>> Subject: Re: [jacorb-developer] Error in String to byte conversion in
>> CDROutputSream and byte to String conversion in CDRInputStream class
>> Message-ID: <52FCC6C6.6080203 at goots.org> Content-Type:
>> text/plain; charset=ISO-8859-1; format=flowed
>>
>>
>> Hi,
>>
>> My reply is inline.
>>
>>
>> On 13/02/14 09:57, Gotthard Witsch wrote:
>>> Dear developers,
>>>
>>> As we are having problems with sending and receiving Strings with
>>> JacOrb 3.3 in our java application, I started to analyse your code.
>>> I got aware of  two classes that cause errors in my
>>> opinion: org.jacorb.orb.CDROutputStream and
>>> org.jacorb.orb.CDRInputStream. Let's start with CDROutpuStream:
>>> The conversion from String to byte array is done in write_string
>>> methode in org.jacorb.CDROutputStream. However I think there are
>>> several bugs in this method: If we set codesetEnabled to true with
>>> the property jacorb.codeset at initialization, we can transmit
>>> Strings with ASCII, 2 byte and 3 byte characters. The transmission
>>> of 4 byte characters is not possible because the implementation of
>>> org.jacorb.orb.giop.CodeSet.Utf8CodeSet.write_char, that does not
>>> handle 4 byte characters. We've already opened a bug in your
>>> bugtracker but did not receive any feedback -
>>> http://www.jacorb.org/bugzilla/show_bug.cgi?id=969 Refering to the
>>
>> I have not had time currently to look in detail at this. I would be
>> happy to look at any pull requests you could provide.
>>
>>> documentation we are not supposed to use jacorb.codeset as Java
>>
>> Where are you referring to please?
>>
>>> receives UTF-8 Strings and the CORBA backend expects UTF-8 encoded
>>> Strings. So in my opinion there's no need for conversion.
>>> However if
>>
>> I think if you're expecting to send outside of the ASCII codeset you
>> should enabled codesets.
>>
>>> we turn jacorb.codeset off we can only transmit ASCII Strings, every
>>> other character not being part of ASCII Standard is encoded wrong. I
>>> think the problem is produced at line number 925 in
>>> org.jacorb.orb.CDROutputStream: value.getBytes(0, valueLength,
>>> buffer, pos); The used method getBytes is deprecated, due to the
>>> fact that it does not properly convert characters into bytes. In the
>>> comment the line before it is mentioned that this methode is
>>> explicitly used for better speed. Can you tell me why this method is
>>> better than String.getBytes(String charsetName)?
>>
>> I think from memory, it is because if codeset translation was
>> disabled, there was no requirement to utilise String charset
>> conversion - and therefore a direct copy into the result byte array
>> was all that was required.
>>
>>> On the outer side there could be a logical bug in the class
>>> org.jacorb.orb.CDRInputStream: In the method read_string the
>>> received bytes are converted to a String. In line number 1247
>>> codesetEnable is checked. If set to true a String is created with
>>> the following statement: result = new String (buffer, start, size,
>>> codeSet.getName()); -> JVM decides on how to create the String! If
>>> codesetEnabled is set to false than a conversion is done with a loop
>>> in line number 1267:  buf[i] = (char)(0xff & buffer[start + i]);
>>> After the conversion the String is created with the resulting byte
>>> array.
>>
>> I believe the idea there is that if codesets are disabled we do not
>> want to do any translation. Hence the loop. Alternatively, if they
>> are enabled, the internal String marshalling is utilised.
>>
>>> So if I understood your code correct the if block should be
>>> something like this:
>>>
>>> if (codesetEnable) { // convert bytes with loop from line 1267 }
>>> else { // create String and let JVM decide how to do this, with code
>>> from line 1252 }
>>>
>>> Can you give me some feedback if I'm wrong or if that's really a
>>> bug.
>>
>>
>> If you could submit a test (or even better adapt some existing test
>> cases) to demonstrate any problems that would be great. I'd be happy
>> to look at any pull requests / patches as well
>>
>>
>>> At initializing JacOrb we set the following properties:
>>
>> Should you not also set
>> org.omg.PortableInterceptor.ORBInitializerClass.standard_init=org.jac
>> o
>>
>>
rb.orb.standardInterceptors.IORInterceptorInitializer
>>
>>
>> Regards
>>
>> Nick
>>
>>
>>
>>> "ORBInitRef.NameService" "corbaloc:" + iiop:url_to_nameservice
>>> "org.omg.CORBA.ORBClass", "org.jacorb.orb.ORB"#
>>> "org.omg.CORBA.ORBSingletonClass", "org.jacorb.orb.ORBSingleton"
>>> "jacorb.codeset", "on" "jacorb.security.support_ssl", "on"
>>> "jacorb.security.jsse.trustees_from_ks", "on"
>>> "jacorb.security.keystore", keyStorePath
>>> "jacorb.security.keystore_password", keystorepass
>>> "jacorb.security.ssl.client.supported_options", "60"
>>> "jacorb.security.ssl.client.required_options", "01"
>>> "jacorb.ssl.socket_factory",
>>> "org.jacorb.security.ssl.sun_jsse.SSLSocketFactory"
>>> "jacorb.ssl.server_socket_factory",
>>> "org.jacorb.security.ssl.sun_jsse.SSLServerSocketFactory"
>>> "jacorb.maxManagedBufSize", "23" "jacorb.reference_caching", "true"
>>> "jacorb.retries", "30" "jacorb.retry_interval", "2000"
>>>
>>> Sincerely,
>>>
>>> Gotthard Witsch Syngo Share Software Development
>>>
>>> E-Mail:
>>> gotthard.witsch at ith-icoserve.com<mailto:stefan.daxenbichler at ith-icos
>>> e
>>>
>>>
r
>>> ve.com>
>>>
>>>
>> _____________________________________
>>> ITH icoserve technology for healthcare GmbH a siemens company - H CX
>>> HS INT CES ITH Innrain 98, 6020 Innsbruck, ?sterreich -
>>> www.ith-icoserve.com<http://www.ith-icoserve.com/> Rechtsform:
>>> Gesellschaft mit beschr?nkter Haftung - Firmensitz: 6020 Innsbruck,
>>> Innrain 98 Firmenbuchnummer: FN 174117f -
>>> Firmenbuchgericht: Innsbruck - DVR: 0983039
>>>
>>> _______________________________________________ jacorb-developer
>>> maillist  -  jacorb-developer at lists.spline.inf.fu-berlin.de
>>> https://lists.spline.inf.fu-berlin.de/mailman/listinfo/jacorb-develo
>>> p
>>>
>>>
e
>>> r
>>>
>>
>>
>> ------------------------------
>>
>> _______________________________________________ jacorb-developer
>> maillist  - jacorb-developer at lists.spline.inf.fu-berlin.de
>> https://lists.spline.inf.fu-berlin.de/mailman/listinfo/jacorb-develop
>> e
>>
>>
r
>>
>>
>> End of jacorb-developer Digest, Vol 131, Issue 7
>> ************************************************
>> _______________________________________________ jacorb-developer
>> maillist  - jacorb-developer at lists.spline.inf.fu-berlin.de
>> https://lists.spline.inf.fu-berlin.de/mailman/listinfo/jacorb-develop
>> e
>>
>>
r
>>
>



More information about the jacorb-developer mailing list