[jacorb-developer] WG: jacorb-developer Digest, Vol 131, Issue 7

Gotthard Witsch Gotthard.Witsch at ith-icoserve.com
Mon Feb 17 16:20:00 CET 2014


Hi Nick,

I found a solution for the conversion problem.

Instead of converting every single character of a string, I convert the whole String using the method String.getBytes(Charset charset). Therefore I did the following changes:

In org.jacorb.orb.CDROutputStream I removed the loop in line 917 and instead call a new created write_string method on the codeSet object.

In the org.jacorb.orb.giop.CodeSet class I created the new method write_string in line 579:
    public void write_string(OutputBuffer buffer, String s, boolean write_bom, boolean write_length, int giop_minor)

It's default implementation just converts each string's char in a loop with the write_char method. 

However in Utf8CodeSet I override the write_string method and do the following.
I use the mentioned method String.getBytes(Charset charset) to get the String's byte representation for the provided charset. The charset is created with Charset.forName(this.getName()), where this.getName() returns UTF8. After I received the bytes I just write them to the buffer with the OutputBuffer's write_byte method.

That's it. Now the JVM takes care of converting the string to the needed byte array. :D Tested it in my application and it's working fine. Some rtesting still needs to be done.
	
I also thought about putting the implementation to the general method. However the following reasons prevented me from taking this step:
The call Charset.forName('UCS2') throws an exception, hence UCS2 is no valid charset identifier for the JVM. However in org.jacorb.orb.CDRInputStream in line 1252 a String is created with the codeSet.getName(). This call won't be working too if codesetEnabled is true and getName returns UCS2!
I don't know how this implementation compares to the write_char methods on the aspect of performance.
Maybe you can do some testing?

I attached the changed source files as a zip and will also upload them to the bug report http://www.jacorb.org/bugzilla/show_bug.cgi?id=969.
I am looking forward for some feedback.

Sincerely Gotthard

-----Ursprüngliche Nachricht-----
Von: Nick Cross [mailto:jacorb at goots.org]
Gesendet: Freitag, 14. Februar 2014 13:20
An: Gotthard Witsch
Betreff: Re: [jacorb-developer] jacorb-developer Digest, Vol 131, Issue 7


Hi,

I'd be happy to work with you on any code / doc / tests etc.

I was thinking of normalizing the
org.jacorb.orb.standardInterceptors.IORInterceptorInitializer
configuration so that it is automatically implied by jacorb.codeset=on.

There are some existing tests at
test/regression/src/test/java/org/jacorb/test/orb/Codeset* - perhaps they need enhancing ;-)

Regards

Nick


On 14/02/14 11:43, Gotthard Witsch wrote:
> Hi Nick,
>
> thanks for your reply.
>
> I'm referring to JacORB 3.3 documentation (http://www.jacorb.org/releases/3.3/ProgrammingGuide.pdf). It only says, that enabling jacorb.codeset will do translation on marshalling. We've expected that if client and server both expect UTF-8 and if JVMs file.encoding is UTF-8 there won't be any need to translate. However I saw that the implemenation only expects ASCII if jacorb.codeset is set to off. So you're right we have to enable this option.
> It would be very helpful if we could mention that in the documentation at the config's description.
>
> Thanks for the initialization hint: org.omg.PortableInterceptor.ORBInitializerClass.standard_init=org.jacorb.orb.standardInterceptors.IORInterceptorInitializer. I will have a look at it.
>
> So the only problem remaining, is the wrong conversion of UTF-8 4byte characters in CDROutputStream.write_string in line 917 to 920. In detail every character of the string is converted seperatly with the CodeSet.write_char method.
> This works correctly for up to 3 byte characters. But the implementation of org.jacorb.orb.giop.CodeSet.Utf8CodeSet.write_char cannot handle 4 byte characters. It handles every character bigger than 0x07FF as a 3 byte character. However 3 byte characters end at 0xFFFF. Everything above is a 4 byte character and has to be converted separately. Hence 4 byte characters are representated with two chars in Java. So converting only one char will result in wrong encoded characters.
>
> There's an article about handling 4 byte characters in Java, that mentions some utitily methods for conversion:
> http://www.oracle.com/technetwork/articles/javase/supplementary-142654
> .html I am actually working on a patch that can handle this. I should 
> be able to provide some code with examples next week.
>
> Sincerely, Gotthard
>
> -----Ursprüngliche Nachricht-----
> Von: 
> jacorb-developer-bounces~gotthard.witsch=ith-icoserve.com at lists.spline
> .inf.fu-berlin.de 
> [mailto:jacorb-developer-bounces~gotthard.witsch=ith-icoserve.com at list
> s.spline.inf.fu-berlin.de] Im Auftrag von 
> jacorb-developer-request at lists.spline.inf.fu-berlin.de
> Gesendet: Freitag, 14. Februar 2014 12:03
> An: jacorb-developer at lists.spline.inf.fu-berlin.de
> Betreff: jacorb-developer Digest, Vol 131, Issue 7
>
> Send jacorb-developer mailing list submissions to
> 	jacorb-developer at lists.spline.inf.fu-berlin.de
>
> To subscribe or unsubscribe via the World Wide Web, visit
> 	
> https://lists.spline.inf.fu-berlin.de/mailman/listinfo/jacorb-develope
> r
>
> or, via email, send a message with subject or body 'help' to
> 	jacorb-developer-request at lists.spline.inf.fu-berlin.de
>
> You can reach the person managing the list at
> 	jacorb-developer-owner at lists.spline.inf.fu-berlin.de
>
> When replying, please edit your Subject line so it is more specific than "Re: Contents of jacorb-developer digest..."
>
>
> Today's Topics:
>
>     1. Re: Error in String to byte conversion in CDROutputSream and
>        byte to String conversion in CDRInputStream class (Nick Cross)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 13 Feb 2014 13:21:10 +0000
> From: Nick Cross <jacorb at goots.org>
> To: Discussions concerning CORBA development with JacORB
> 	<jacorb-developer at lists.spline.inf.fu-berlin.de>
> Subject: Re: [jacorb-developer] Error in String to byte conversion in
> 	CDROutputSream and byte to String conversion in CDRInputStream class
> Message-ID: <52FCC6C6.6080203 at goots.org>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>
> Hi,
>
> My reply is inline.
>
>
> On 13/02/14 09:57, Gotthard Witsch wrote:
>> Dear developers,
>>
>> As we are having problems with sending and receiving Strings with 
>> JacOrb 3.3 in our java application, I started to analyse your code. I 
>> got aware of  two classes that cause errors in my opinion:
>> org.jacorb.orb.CDROutputStream and org.jacorb.orb.CDRInputStream.
>> Let's start with CDROutpuStream: The conversion from String to byte 
>> array is done in write_string methode in org.jacorb.CDROutputStream.
>> However I think there are several bugs in this method: If we set 
>> codesetEnabled to true with the property jacorb.codeset at 
>> initialization, we can transmit Strings with ASCII, 2 byte and 3 byte 
>> characters. The transmission of 4 byte characters is not possible 
>> because the implementation of 
>> org.jacorb.orb.giop.CodeSet.Utf8CodeSet.write_char, that does not 
>> handle 4 byte characters. We've already opened a bug in your 
>> bugtracker but did not receive any feedback -
>> http://www.jacorb.org/bugzilla/show_bug.cgi?id=969 Refering to the
>
> I have not had time currently to look in detail at this. I would be happy to look at any pull requests you could provide.
>
>> documentation we are not supposed to use jacorb.codeset as Java
>
> Where are you referring to please?
>
>> receives UTF-8 Strings and the CORBA backend expects UTF-8 encoded 
>> Strings. So in my opinion there's no need for conversion. However if
>
> I think if you're expecting to send outside of the ASCII codeset you should enabled codesets.
>
>> we turn jacorb.codeset off we can only transmit ASCII Strings, every 
>> other character not being part of ASCII Standard is encoded wrong. I 
>> think the problem is produced at line number 925 in
>> org.jacorb.orb.CDROutputStream: value.getBytes(0, valueLength, 
>> buffer, pos); The used method getBytes is deprecated, due to the fact 
>> that it does not properly convert characters into bytes. In the 
>> comment the line before it is mentioned that this methode is 
>> explicitly used for better speed. Can you tell me why this method is 
>> better than String.getBytes(String charsetName)?
>
> I think from memory, it is because if codeset translation was disabled, there was no requirement to utilise String charset conversion - and therefore a direct copy into the result byte array was all that was required.
>
>> On the outer side there could be a logical bug in the class
>> org.jacorb.orb.CDRInputStream: In the method read_string the received 
>> bytes are converted to a String. In line number 1247 codesetEnable is 
>> checked. If set to true a String is created with the following
>> statement: result = new String (buffer, start, size, 
>> codeSet.getName()); -> JVM decides on how to create the String! If 
>> codesetEnabled is set to false than a conversion is done with a loop 
>> in line number 1267:  buf[i] = (char)(0xff & buffer[start + i]); 
>> After the conversion the String is created with the resulting byte array.
>
> I believe the idea there is that if codesets are disabled we do not want to do any translation. Hence the loop. Alternatively, if they are enabled, the internal String marshalling is utilised.
>
>> So if I understood your code correct the if block should be something 
>> like this:
>>
>> if (codesetEnable) { // convert bytes with loop from line 1267 } else 
>> { // create String and let JVM decide how to do this, with code from 
>> line 1252 }
>>
>> Can you give me some feedback if I'm wrong or if that's really a bug.
>
>
> If you could submit a test (or even better adapt some existing test
> cases) to demonstrate any problems that would be great. I'd be happy 
> to look at any pull requests / patches as well
>
>
>   > At initializing JacOrb we set the following properties:
>
> Should you not also set
> org.omg.PortableInterceptor.ORBInitializerClass.standard_init=org.jaco
> rb.orb.standardInterceptors.IORInterceptorInitializer
>
>
> Regards
>
> Nick
>
>
>
>> "ORBInitRef.NameService" "corbaloc:" + iiop:url_to_nameservice 
>> "org.omg.CORBA.ORBClass", "org.jacorb.orb.ORB"# 
>> "org.omg.CORBA.ORBSingletonClass", "org.jacorb.orb.ORBSingleton"
>> "jacorb.codeset", "on" "jacorb.security.support_ssl", "on"
>> "jacorb.security.jsse.trustees_from_ks", "on"
>> "jacorb.security.keystore", keyStorePath 
>> "jacorb.security.keystore_password", keystorepass 
>> "jacorb.security.ssl.client.supported_options", "60"
>> "jacorb.security.ssl.client.required_options", "01"
>> "jacorb.ssl.socket_factory",
>> "org.jacorb.security.ssl.sun_jsse.SSLSocketFactory"
>> "jacorb.ssl.server_socket_factory",
>> "org.jacorb.security.ssl.sun_jsse.SSLServerSocketFactory"
>> "jacorb.maxManagedBufSize", "23" "jacorb.reference_caching", "true"
>> "jacorb.retries", "30" "jacorb.retry_interval", "2000"
>>
>> Sincerely,
>>
>> Gotthard Witsch Syngo Share Software Development
>>
>> E-Mail:
>> gotthard.witsch at ith-icoserve.com<mailto:stefan.daxenbichler at ith-icose
>> r
>> ve.com>
>>
>>
> _____________________________________
>> ITH icoserve technology for healthcare GmbH a siemens company - H CX 
>> HS INT CES ITH Innrain 98, 6020 Innsbruck, ?sterreich - 
>> www.ith-icoserve.com<http://www.ith-icoserve.com/> Rechtsform:
>> Gesellschaft mit beschr?nkter Haftung - Firmensitz: 6020 Innsbruck, 
>> Innrain 98 Firmenbuchnummer: FN 174117f - Firmenbuchgericht:
>> Innsbruck - DVR: 0983039
>>
>> _______________________________________________ jacorb-developer 
>> maillist  -  jacorb-developer at lists.spline.inf.fu-berlin.de
>> https://lists.spline.inf.fu-berlin.de/mailman/listinfo/jacorb-develop
>> e
>> r
>>
>
>
> ------------------------------
>
> _______________________________________________
> jacorb-developer maillist  -  
> jacorb-developer at lists.spline.inf.fu-berlin.de
> https://lists.spline.inf.fu-berlin.de/mailman/listinfo/jacorb-develope
> r
>
>
> End of jacorb-developer Digest, Vol 131, Issue 7
> ************************************************
> _______________________________________________
> jacorb-developer maillist  -  
> jacorb-developer at lists.spline.inf.fu-berlin.de
> https://lists.spline.inf.fu-berlin.de/mailman/listinfo/jacorb-develope
> r
>



More information about the jacorb-developer mailing list