WIP: Unicode Support #6748

Closed
wants to merge 128 commits into from
128 commits
1241b38
ucs2 implementation, work in progress
frabbit Jan 6, 2016
251e189
add unit tests for ucs2, utf8 and utf16 and make them pass
frabbit Aug 8, 2016
0d323df
more work on utf16
frabbit Aug 8, 2016
8f7f0b3
[i18n] work on conversions, add more unit tests
frabbit Aug 9, 2016
febe263
add unused unit tests (2 folders for easier testing)
frabbit Aug 9, 2016
af88955
only run i18n tests for now
frabbit Aug 9, 2016
7913fe8
port convertUtf8ToUtf16 complete
frabbit Aug 10, 2016
530b5e3
get more unit tests working on js/python
frabbit Aug 17, 2016
83b2e01
add getCodeSize, was missing in last commit
frabbit Aug 17, 2016
f46c1eb
get compilation working for java/cs/cpp (still failing unit tests)
frabbit Aug 17, 2016
d79b528
make sure the returned bytesdata length matches the bytes length
frabbit Nov 8, 2016
29c0964
get conversions between utf16 and utf8 on most platforms working
frabbit Nov 8, 2016
00f474b
add temporary files and fixes to make testing and development of i18n…
frabbit Nov 8, 2016
c438968
make more tests pass, fix some flash bugs
frabbit Nov 8, 2016
399d620
get ucs2 tests working
frabbit Nov 9, 2016
9cd7c22
fix Ucs2
frabbit Nov 9, 2016
d4a00f3
get most unit tests working, some cleanup and refactoring
frabbit Nov 9, 2016
4ac9417
get all string unit tests on most platforms working, further cleanup
frabbit Nov 9, 2016
f2046ac
further work on unicode, move utf8 related functions to Helper class,…
frabbit Nov 16, 2016
6f376fe
prepare Utf8Tools for impl change
frabbit Nov 16, 2016
df9d7b1
prepare impl change for utf8, add bytereader
frabbit Nov 16, 2016
a009b92
change impl of utf8, o(1) access to length
frabbit Nov 16, 2016
47231fc
more cleanup
frabbit Nov 17, 2016
a4ec4d3
use Utf8Tools without alias
frabbit Nov 17, 2016
f31a1ea
compile fix + cleanup
frabbit Nov 17, 2016
60bd2dc
work on utf16 refactoring
frabbit Nov 17, 2016
1b4fda3
further work on utf16
frabbit Nov 17, 2016
d1f5141
change impl of Utf16 (make length o(1)), fix conversion tests
frabbit Nov 21, 2016
1972290
add more inline methods to Utf16, start cleanup Ucs2 impl
frabbit Nov 21, 2016
ad8fb89
get rid of EncodingTools, refactoring
frabbit Nov 21, 2016
300f0ac
let Ucs2 Reader use the underlying string on platforms with native uc…
frabbit Nov 21, 2016
ea359d0
small cleanup + utf16.compare optimizations
frabbit Nov 21, 2016
5efa1bd
faster utf8.compare
frabbit Nov 21, 2016
a52b112
remove old utf8.compare
frabbit Nov 21, 2016
d48dc68
fix more bugs, add more unit tests
frabbit Nov 22, 2016
5dacc05
add a lot more tests, fix a lot of bugs
frabbit Nov 23, 2016
2f0a57a
minor refactoring, use ByteAccess directly in ucs2.(toLowerCase|toUpp…
frabbit Nov 23, 2016
810a7ff
minor cleanup, make Ucs2Reader abstract
frabbit Nov 23, 2016
b7d9ae2
more tweaks, refactoring
frabbit Nov 23, 2016
fb03dda
add basics for Utf32
frabbit Nov 23, 2016
ee21ea6
add setInt32 to js/ByteAccess
frabbit Nov 23, 2016
bbed755
whitespace fix
frabbit Feb 5, 2017
5ceddfc
work on conversions from and to utf32
frabbit Feb 5, 2017
a406209
cleanup
frabbit Feb 5, 2017
c7cf471
cleanup + refactoring
frabbit Feb 5, 2017
f2a0af5
remove no_code_motion, add byteaccess compare and use it for all enco…
frabbit Feb 18, 2017
df3a948
remove some whitespace and unused commented out compare code
frabbit Feb 18, 2017
894eca2
small cleanup
frabbit Feb 18, 2017
f1f9e4c
make Uint8ArrayTools a private class in ByteAccess.js
frabbit Feb 18, 2017
d6b9d30
make BytesDataTools a private class in ByteAccess
frabbit Feb 18, 2017
609e0e2
change order of classes, cleanup whitespace and imports
frabbit Feb 18, 2017
1831a9f
move Ucs2Reader and Ucs2Tools to Ucs2
frabbit Feb 18, 2017
e50ac15
move Utf8Tools and Utf8Reader to Utf8
frabbit Feb 18, 2017
9789d88
move Utf16Reader and Utf16Tools to Utf16
frabbit Feb 18, 2017
bdfa134
move Utf32Reader and Utf32Tools to Utf32
frabbit Feb 18, 2017
81a8ac5
move NativeStringTools to Tools, rename Encoding to Convert
frabbit Feb 18, 2017
5d7f48c
fix cs bug (Convert is Encoding), small cleanup
frabbit Feb 18, 2017
db6ef9a
use strToImplIndex instead of bit shift directly
frabbit Feb 18, 2017
1e114e5
add benchmarks for string functions
frabbit Feb 18, 2017
e66ad2e
make Utf32.split and Ucs2.split a lot faster, add some inlines
frabbit Feb 18, 2017
9bc72b4
add 2 more split tests (see String.unit) and implement the special 'd…
frabbit Feb 19, 2017
cda2858
make Tools classes private
frabbit Feb 19, 2017
3b00e11
cleanup Benchmark and store test results to make sure that the code i…
frabbit Feb 19, 2017
e85e1c0
update hl Bytes to match current hl structure
frabbit Feb 19, 2017
887e769
improve benchmark code, add fastCodeAt for all encodings (at least te…
frabbit Feb 20, 2017
d6a697d
use helper function for toLowerCase and toUpperCase to move iteration…
frabbit Feb 20, 2017
88d5b02
fix for php7
frabbit Feb 20, 2017
d2c0780
add benchmark for php7, use luajit for lua
frabbit Feb 20, 2017
857d007
use fastCodeAt in toBytes
frabbit Feb 20, 2017
c4c7377
add test for php7
frabbit Feb 20, 2017
c7671e0
fix flash bug
frabbit Feb 20, 2017
afe89df
use Vector as underlying type for UTF32
frabbit Feb 20, 2017
c927bb4
fix Check file
frabbit Feb 20, 2017
b342275
add benchmarks for allocation (nativestring -> encoding)
frabbit Feb 20, 2017
34dfa8a
improve allocation of Utf32, use Utf16 allocation for Ucs2 (bytes are…
frabbit Feb 20, 2017
4b15533
replace strict flag enum with bool, improve eval performance
frabbit May 25, 2017
4c52f32
improve Benchmarks
frabbit May 25, 2017
d0a4a43
further optimizations on unicode, still needs a lot of cleanup
frabbit May 26, 2017
be0d62d
use relative paths for haxe executable
frabbit May 26, 2017
3b03347
further work on unicode performance
frabbit May 30, 2017
189fc77
unify names of wrap/unwrap functions in ByteAccess
frabbit May 30, 2017
07d1db9
do more inlining in js
frabbit May 30, 2017
e104c62
cleanup, directly add bytesData to buffer without creating Bytes
frabbit May 30, 2017
b0d5548
[unicode] major cleanup, minor improvements + fixes
frabbit May 30, 2017
467a348
[unicode] more cleanup, adjust licence
frabbit May 30, 2017
25a4417
[unicode] clearer separation of charCodes (codePoints) and codeUnits,…
frabbit May 30, 2017
47c165a
use codePoint wording to differentiate between code units and code po…
frabbit Jun 10, 2017
2bb6e2e
minor adjustments for benchmarks, remove eval from Makefile (it's int…
frabbit Jun 10, 2017
a7992c7
add benchmarks for charAt
frabbit Jun 10, 2017
e8201f4
add optimizations for Utf8 and Utf16 when we know that we only have s…
frabbit Jun 10, 2017
9240c23
remove comment
frabbit Jun 10, 2017
bc5d55e
use native string for utf32 impl on python
frabbit Jun 12, 2017
d3633e5
improve conversions from python utf32 to other encodings
frabbit Jun 12, 2017
df160d3
move implementation of to... functions to the target itself, allows b…
frabbit Jun 12, 2017
e686f6d
add benchmarks for conversions
frabbit Jun 12, 2017
8434d38
add lua to test-all
frabbit Jun 12, 2017
072619d
cleanup tests, add tests + bugfix for conversions from/to nativestrings
frabbit Jun 12, 2017
845ab1b
add substring optimizations for utf8 and utf16 when we have a fixed c…
frabbit Jun 12, 2017
cdccceb
use faster StringBuffer to create Ucs2 from ByteAccess
frabbit Jun 14, 2017
0dd6b30
utf16 is now based on string on ucs2 native platforms like (js, java,…
frabbit Nov 1, 2017
5c46681
only check hl interp for now
frabbit Nov 1, 2017
324e3c2
add iterators for ucs2, utf16, utf8 and utf32
frabbit Nov 1, 2017
59383ae
use little endian instead of big endian for decoding
frabbit Nov 1, 2017
0f4dc4e
add more tests (conversion to/from bytes, to/from native string)
frabbit Nov 1, 2017
08ca952
improve addBytesData, a little bit cleaner/more type safe
frabbit Nov 2, 2017
fcee5ca
minor cleanup
frabbit Nov 2, 2017
7d48eaa
improve iterator code, use iterators to implement eachCode
frabbit Nov 2, 2017
27a9d58
never read single bytes in utf16, ucs2 or utf32, always read blocks o…
frabbit Nov 2, 2017
9c76aed
use little endian for underlying byte-based encodings and also for co…
frabbit Nov 2, 2017
84a151b
add more tests for toBytes
frabbit Nov 2, 2017
d6db4fb
cleanup, split Utf32 into two files
frabbit Nov 2, 2017
13e8cc9
minor cleanup
frabbit Nov 2, 2017
b3fc1c9
try fix cs compile error
frabbit Nov 2, 2017
a39aacc
add i18n classes to host for cppia to fix unit tests (Unknown functio…
frabbit Nov 2, 2017
cd3881c
also add ByteAccessBuffer to hostclasses
frabbit Nov 2, 2017
ce9aaaf
uncomment big endian functions for now
frabbit Nov 2, 2017
0c43003
move i18n bench test to tests/benchs/i18n
frabbit Nov 20, 2017
fd0d0cd
uncomment all tests unrelated to i18n
frabbit Nov 20, 2017
2decaa1
Merge remote-tracking branch 'origin/development' into unicode_vector
frabbit Nov 20, 2017
6422ea1
support same filenames in different packages like Utf8.unit.hx and ha…
frabbit Nov 20, 2017
2c83595
only call eqAbstract if both types are the same
frabbit Nov 20, 2017
7460490
remove some unused functions
frabbit Nov 20, 2017
b7d3e72
Move BytesDataTools into it's own Module, solves cppia issue with the…
frabbit Nov 20, 2017
ef89692
hide functions for js, should fix dox generation
frabbit Nov 20, 2017
dd79bc7
include impl classes, hopefully fix cppia unit test compilation error
frabbit Nov 21, 2017
a672a3a
fix stupid bug and remove trace
frabbit Nov 21, 2017
3472d6f
Merge remote-tracking branch 'origin/development' into unicode_vector
frabbit Nov 22, 2017
669b829
Merge branch 'development' into unicode_vector
Simn Apr 17, 2018
2 changes: 1 addition & 1 deletion .gitignore
@@ -113,6 +113,6 @@ tests/unit/pypy3-*
tmp.tmp

dev-display.hxml

tests/benchs/i18n/bin
.DS_Store
tests/sourcemaps/bin
12 changes: 12 additions & 0 deletions std/cpp/cppia/HostClasses.hx
@@ -130,6 +130,13 @@ class HostClasses
"haxe.Serializer",
"haxe.Unserializer",

"haxe.i18n.ByteAccess",
"haxe.i18n.ByteAccessBuffer",
"haxe.i18n.BytesDataTools",
"haxe.i18n.Ucs2",
"haxe.i18n.Utf8",
"haxe.i18n.Utf16",
"haxe.i18n.Utf32",

"haxe.ds.ArraySort",
"haxe.ds.GenericStack",
@@ -202,6 +209,11 @@ class HostClasses
externs.set("sys.ssl._Socket.SocketOutput",true);
externs.set("haxe.ds.TreeNode",true);
externs.set("haxe.xml.XmlParserException",true);
externs.set("haxe.i18n._Ucs2.Ucs2Impl",true);
externs.set("haxe.i18n._Utf8.Utf8Impl",true);
externs.set("haxe.i18n._Utf16.Utf16Impl",true);
externs.set("haxe.i18n._Utf32.Utf32Impl",true);

for(e in classes)
externs.set(e,true);

26 changes: 25 additions & 1 deletion std/haxe/ds/Vector.hx
@@ -158,6 +158,30 @@ abstract Vector<T>(VectorData<T>) {
dest.toData().blit(destPos,src.toData(), srcPos,len);
#elseif eval
src.toData().blit(srcPos, dest.toData(), destPos, len);
#elseif python
if (src == dest) {
if (srcPos < destPos) {
var i = srcPos + len;
var j = destPos + len;
for (k in 0...len) {
i--;
j--;
python.internal.ArrayImpl.unsafeSet(src.toData(), j, python.internal.ArrayImpl.unsafeGet(src.toData(), i));
}
} else if (srcPos > destPos) {
var i = srcPos;
var j = destPos;
for (k in 0...len) {
python.internal.ArrayImpl.unsafeSet(src.toData(), j, python.internal.ArrayImpl.unsafeGet(src.toData(), i));
i++;
j++;
}
}
} else {
for (i in 0...len) {
python.internal.ArrayImpl.unsafeSet(dest.toData(), destPos + i, python.internal.ArrayImpl.unsafeGet(src.toData(), srcPos + i));
}
}
#else
if (src == dest) {
if (srcPos < destPos) {
@@ -242,7 +266,7 @@ abstract Vector<T>(VectorData<T>) {
#if as3 @:extern #end
static public inline function fromArrayCopy<T>(array:Array<T>):Vector<T> {
#if python
- return cast array.copy();
+ return fromData(array.copy());
#elseif flash10
return fromData(flash.Vector.ofArray(array));
#elseif java
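
The python branch added to `Vector.blit` above copies backwards when source and destination are the same vector and the destination range starts after the source range, so that elements are not overwritten before they are read. A minimal standalone sketch of that idea follows. It uses a plain `Array<Int>` instead of `Vector`, and the class name `BlitSketch` and its `blit` helper are hypothetical names chosen for illustration; they are not part of this PR.

```haxe
// Hypothetical illustration, not code from the PR.
class BlitSketch {
	// Copy len elements from src (starting at srcPos) into dest (starting at destPos).
	// Safe when src and dest are the same array and the ranges overlap.
	public static function blit(src:Array<Int>, srcPos:Int, dest:Array<Int>, destPos:Int, len:Int):Void {
		if (src == dest && srcPos < destPos) {
			// Destination lies ahead of the source: copy from the last element
			// down to the first so unread source elements are never clobbered.
			var k = len;
			while (k-- > 0) {
				dest[destPos + k] = src[srcPos + k];
			}
		} else {
			// Different arrays, or destination lies behind the source: forward copy is safe.
			for (k in 0...len) {
				dest[destPos + k] = src[srcPos + k];
			}
		}
	}

	public static function main() {
		var a = [1, 2, 3, 4, 5];
		blit(a, 0, a, 2, 3);
		// a now holds 1, 2, 1, 2, 3
		trace(a.join(","));
	}
}
```

A naive forward copy on the same input would produce 1, 2, 1, 2, 1, because `a[2]` is overwritten before it is read as a source element.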
185 changes: 185 additions & 0 deletions std/haxe/i18n/ByteAccess.hx
@@ -0,0 +1,185 @@
/*
* Copyright (C)2005-2012 Haxe Foundation
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
*/

package haxe.i18n;

import haxe.io.Bytes;
import haxe.io.BytesData;
import haxe.io.Error;

#if cpp
using cpp.NativeArray;
#end

abstract ByteAccess(BytesData) {

public var length(get, never):Int;

inline function new (length:Int) {
this = BytesDataTools.alloc(length);
}

/* constructors */

public static inline function alloc (length:Int) {
return new ByteAccess(length);
}

public static inline function ofData (data:BytesData):ByteAccess {
return fromImpl(data);
}

public static inline function fromBytes (b:Bytes):ByteAccess {
return fromImpl(b.getData());
}

/* gets */

public inline function get( pos : Int ) : Int {
return BytesDataTools.get(this, pos);
}

public inline function fastGet (pos:Int) {
return BytesDataTools.fastGet(this, pos);
}

public inline function getInt32LE( pos : Int ) : Int {
return get(pos) | (get(pos+1) << 8) | (get(pos+2) << 16) | (get(pos+3) << 24);
}

public inline function getInt16LE( pos : Int ) : Int {
var lower = get(pos);
var upper = get(pos+1) << 8;
return upper | lower;
}

public inline function getString( pos : Int, len : Int ) : String {
return BytesDataTools.getString(this, pos, len);
}

/* sets */

public inline function set( pos : Int, v : Int ) : Void {
BytesDataTools.set(this, pos, v);
}

public inline function setInt16LE( pos : Int, v : Int ) : Void {
BytesDataTools.set(this, pos, v & 0xFF );
BytesDataTools.set(this, pos+1, (v >> 8) & 0xFF );
}

public inline function setInt32LE( pos : Int, v : Int ) : Void {
BytesDataTools.set(this, pos, v & 0xFF );
BytesDataTools.set(this, pos+1, (v >> 8) & 0xFF );
BytesDataTools.set(this, pos+2, (v >> 16) & 0xFF );
BytesDataTools.set(this, pos+3, (v >> 24) & 0xFF );
}

/* sets end */

public inline function sub(i:Int, size:Int):ByteAccess {
return fromImpl(BytesDataTools.sub(this, i, size));
}

public inline function copy ():ByteAccess {
return fromImpl(BytesDataTools.sub(this, 0, length));
}

public inline function blit (pos : Int, src : ByteAccess, srcpos : Int, len : Int):Void {
return BytesDataTools.blit(this, pos, src.impl(), srcpos, len);
}

public inline function append (other : ByteAccess):ByteAccess {
var ba = alloc(length + other.length);
ba.blit(0, fromImpl(this), 0, length);
ba.blit(length, other, 0, other.length);
return ba;
}

/* compare, equal */

public function equal (other:ByteAccess) {
if (this == other.impl()) return true;

var a = fromImpl(this);
var b = other;

if (a.length != b.length) return false;

for (i in 0...a.length) {
if (a.fastGet(i) != b.fastGet(i)) return false;
}
return true;
}

public function compare (other:ByteAccess) {
if (this == other.impl()) return 0;
var a = fromImpl(this);
var b = other;

var min = a.length < b.length ? a.length : b.length;

for (i in 0...min) {
var b1 = a.fastGet(i);
var b2 = b.fastGet(i);
if (b1 < b2) return -1;
if (b1 > b2) return 1;
}
if (a.length < b.length) return -1;
if (a.length > b.length) return 1;
return 0;
}

/* conversions */

public function toString ():String {
var a = fromImpl(this);
var res = [];
for (i in 0...a.length) {

res.push(a.fastGet(i));
}
return res.join(",");
}

public inline function toBytes ():Bytes {
return Bytes.ofData(this);
}

@:allow(haxe.i18n) inline function getData ():BytesData {
return impl();
}

/* internal helpers */

static inline function fromImpl (b:BytesData):ByteAccess {
return cast b;
}

inline function impl ():BytesData {
return this;
}

inline function get_length ():Int {
return BytesDataTools.getLength(this);
}
}
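
The `getInt16LE` / `setInt16LE` accessors in `ByteAccess` above store the least significant byte first. Because `haxe.i18n.ByteAccess` exists only in this PR, the sketch below reproduces the same byte layout on top of the standard `haxe.io.Bytes` class. The class name `LittleEndianSketch` and its two helpers are hypothetical and only mirror the logic shown in the diff.

```haxe
import haxe.io.Bytes;

// Hypothetical illustration, not code from the PR.
class LittleEndianSketch {
	// Write the low 16 bits of v at pos, least significant byte first.
	public static function setInt16LE(b:Bytes, pos:Int, v:Int):Void {
		b.set(pos, v & 0xFF);
		b.set(pos + 1, (v >> 8) & 0xFF);
	}

	// Read two bytes at pos and combine them as a little-endian 16-bit value.
	public static function getInt16LE(b:Bytes, pos:Int):Int {
		var lower = b.get(pos);
		var upper = b.get(pos + 1) << 8;
		return upper | lower;
	}

	public static function main() {
		var b = Bytes.alloc(2);
		// U+20AC (EURO SIGN) fits in a single 16-bit code unit.
		setInt16LE(b, 0, 0x20AC);
		trace(b.get(0)); // 172, i.e. 0xAC, the low byte comes first
		trace(b.get(1)); // 32, i.e. 0x20
		trace(getInt16LE(b, 0) == 0x20AC); // true
	}
}
```

Storing code units in a fixed byte order regardless of the host platform is what lets the byte-backed encodings in this PR compare and convert their contents without per-target endianness checks.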