uniconv(1)


NAME

   uniconv - convert text to native formats through unicode

SYNOPSIS

   uniconv [ -out output-file ] [ -decode input-encoding ]
   [ -encode output-encoding ] [ input-file ] [ -todos ] [ -fromdos ]
   [ -tomac ] [ -frommac ]

DESCRIPTION

   The uniconv program decodes text in one encoding and encodes it with
   another encoding.  The text is a stream of 16-bit, 8-bit or 7-bit
   units.  The converted text is sent to the standard output, even for
   16-bit encodings, unless an output file is specified with the -out
   option.
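
   The decode-then-encode idea can be sketched in Python.  This is only
   an illustration that uses Python's built-in codecs, not uniconv's own
   implementation; the function name transcode and the sample word are
   hypothetical.

```python
def transcode(data: bytes, decode: str = "utf-8", encode: str = "utf-8") -> bytes:
    """Decode bytes to Unicode text, then encode the text with another encoding."""
    text = data.decode(decode)    # bytes -> Unicode code points
    return text.encode(encode)    # Unicode code points -> bytes

# A Hungarian sample word stored as ISO 8859-2 bytes, converted to UTF-8.
original = "\u00e1rv\u00edzt\u0171r\u0151"
latin2 = original.encode("iso-8859-2")
utf8 = transcode(latin2, decode="iso-8859-2", encode="utf-8")
```

   Both parameters default to utf-8 here only to mirror the default
   converter described in this page.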

   The -decode and -encode options are optional; the default converter is
   utf-8.  The program reads the Unicode map helper files (*.my) from the
   default directory /usr/share/data.  Simple 1-to-1 encodings can be
   added on the fly by adding a my-file, or by setting the yudit.datapath
   property in ~/.yudit/yudit.properties or
   /usr/share/yudit/config/yudit.properties.  By default
   /usr/share/yudit/data is searched.

   My-files can be created by a program called makeumap.  The files can be
   converted between dos/unix/mac line-ending variants with the -fromdos,
   -frommac, -todos and -tomac options.  The default (when none is
   specified) is Unix.
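
   The effect of the line-ending options can be sketched in Python.  The
   names LINE_ENDINGS and convert_line_endings are hypothetical, and the
   sketch assumes the usual conventions (LF for Unix, CR LF for DOS, CR
   for classic Mac); it does not reproduce uniconv's code.

```python
LINE_ENDINGS = {"unix": "\n", "dos": "\r\n", "mac": "\r"}

def convert_line_endings(text: str, source: str = "unix", target: str = "unix") -> str:
    """Split text on the source line ending and re-join with the target one."""
    lines = text.split(LINE_ENDINGS[source])
    return LINE_ENDINGS[target].join(lines)

dos_text = "one\r\ntwo\r\n"
unix_text = convert_line_endings(dos_text, source="dos")    # like -fromdos
mac_text = convert_line_endings("a\nb", target="mac")       # like -tomac
```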

ENCODING

   If you received this program through the Yudit distribution, then as of
   today you can convert between the encodings below.

   utf-8  Yudit recommends this format for international information
          exchange.  ASCII text gets through intact, while all other
          unicode characters are encoded as multi-byte sequences with the
          8th bit set in every byte; the length of a sequence depends on
          how high the character's code point is in the Unicode space.
          This is the only transformation format that can encode both
          16-bit (ucs-2) and 31-bit (ucs-4) unicode.
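
          As a small illustration of how the encoded length grows with
          the code point, the following Python sketch uses the built-in
          utf-8 codec.  The sample characters are arbitrary choices.

```python
# U+0041, U+00E9, U+20AC and U+1F600, in increasing code point order.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")
    lengths.append(len(encoded))
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s)")

# Every byte of a non-ASCII character has its 8th bit set.
high_bits = all(b & 0x80 for b in "\u20ac".encode("utf-8"))
```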

   utf-8-s
          Hacker's utf-8 format.  It does not give an error message when a
          surrogate pair is decoded, and it can encode a surrogate pair 'as
          is'.  This is not a recommended encoding format, although it is
          used to encode/decode clipboard data in order to preserve input.
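
          Python's "surrogatepass" error handler behaves in a comparable
          way and can serve as a rough analogy.  Whether uniconv produces
          exactly the same bytes is an assumption that this sketch does
          not verify.

```python
pair = "\ud83d\ude00"   # the two surrogate code units that stand for U+1F600

try:
    pair.encode("utf-8")              # strict utf-8 rejects surrogates
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The lenient variant encodes each surrogate 'as is', three bytes apiece.
lenient = pair.encode("utf-8", "surrogatepass")
restored = lenient.decode("utf-8", "surrogatepass")
```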

   utf-16 Although 16 is bigger than 8, this is still a compromise required
          by OSes like Windows that cannot handle ucs-4.  This encoding
          produces 16-bit unicode streams.  In addition to the BMP it can
          convert 16 planes using the Unicode Surrogate Area.  This
          encoding cannot convert anything above U+10FFFF (Plane 16).
          The input byte order is recognized from the leading BOM
          (byte-order mark) U+FEFF, which occupies the first two bytes.
          This format is used in Windows NT for documents like notepad
          .txt files.
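
          Byte-order detection and the surrogate mechanism can be
          sketched as follows.  The function names are hypothetical and
          the code only illustrates the standard UTF-16 rules, not
          uniconv's internals.

```python
import codecs

def detect_utf16_byte_order(data: bytes) -> str:
    """Return 'big', 'little' or 'unknown' based on a leading U+FEFF mark."""
    if data.startswith(codecs.BOM_UTF16_BE):    # bytes FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):    # bytes FF FE
        return "little"
    return "unknown"

def to_surrogates(code_point: int) -> tuple:
    """Split a code point in U+10000..U+10FFFF into a (high, low) surrogate pair."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

high, low = to_surrogates(0x1F600)
```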

   utf-16-be
          Big endian utf-16 converter.

   utf-16-le
          Little endian utf-16 converter.

   utf-7  This is the recommended format for international information
          exchange when only 7-bit data can be used.  It can only handle
          16-bit (utf-16) unicode; for ucs-4 (above U+10FFFF) you should
          use utf-8 encoding.
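
          The 7-bit property can be checked with Python's built-in utf-7
          codec, used here only as a stand-in for the uniconv converter.
          The sample text is arbitrary.

```python
text = "h\u00e9llo"                   # one non-ASCII character, U+00E9
encoded = text.encode("utf-7")        # Python's built-in UTF-7 codec
seven_bit = all(b < 0x80 for b in encoded)
decoded = encoded.decode("utf-7")
```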

   iso-8859-1
          This is the ISO 8859-1 character  encoding format.  It  is  also
          known as "Latin-1" encoding.

   iso-8859-2
          This   is   the ISO 8859-2 character encoding format. It is also
          known as "Central European" encoding.

   iso-8859-5
          This is the ISO 8859-5 character encoding  format.  It  is  also
          known as "Cyrillic" encoding.

   iso-8859-7
          This  is  the  ISO  8859-7 character encoding format. It is also
          known as "Greek" encoding.

   iso-8859-9
          This is the ISO 8859-9 character encoding  format.  It  is  also
          known as "Turkish" encoding.

   koi8-r This  is the KOI8-R character encoding format. It is mainly used
          in Russia.

   cp-1251
          This is the CP1251 cyrillic character  encoding  format.  It  is
          mainly used in Microsoft Windows and some web sites.

   iso-2022-jp
          This  is  a  Japanese  character  encoding format. It is a 7-bit
          encoding format.
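
          What "7-bit" means here can be illustrated with Python's
          iso2022_jp codec, assumed to be a reasonable stand-in for this
          converter: character sets are switched with escape sequences,
          so no byte needs its 8th bit.

```python
text = "\u65e5\u672c\u8a9e"           # three kanji, U+65E5 U+672C U+8A9E
encoded = text.encode("iso2022_jp")
seven_bit = all(b < 0x80 for b in encoded)
starts_with_escape = encoded.startswith(b"\x1b$B")   # switch to JIS X 0208
decoded = encoded.decode("iso2022_jp")
```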

   iso-2022-jp-3
          This is a Japanese character encoding  format.  It  is  a  7-bit
          encoding format. It is based upon the JIS X 0213 standard.

   euc-jp This  is  a  Japanese  character encoding format. It is an 8-bit
          encoding format.  Mainly used in UNIX systems.

   euc-jp-3
          The official name is EUC-JISX0213 - I just could not read this.
          This is a Japanese character encoding format. It is an 8-bit
          encoding format. It is based upon the JIS X 0213 standard.

   shift-jis
          This is a Japanese character encoding format.  It  is  an  8-bit
          encoding format. Mainly used in MSDOS/Windows.

   shift-jis-3
          The  official  name  is  Shift_JISX0213  - I just could not read
          this.  This is a Japanese character encoding format.  It  is  an
          8-bit encoding format. Mainly used in MSDOS/Windows.

   iso-2022-jp
          This is a Japanese 7-bit character encoding format.  Email
          messages in iso-2022-jp format can be decoded and encoded with
          this converter.

   iso-2022-x11
          This  is a Japanese character encoding format.  It is also known
          as "COMPOUND_TEXT" encoding for the X  Window System. This is  a
          7-bit  encoding  format.  It can be derived from the ISO 2022-JP
          format with some differences.

   ksc-5601-x11
          This is a Korean character encoding format used by the X
          Window System (COMPOUND_TEXT encoding) to encode Korean (KS X
          1001) and US-ASCII.  This is a 7-bit encoding format compliant
          with the ISO-2022 specification for encoding multiple character
          sets.  Please note that this is DIFFERENT from ISO-2022-KR
          (defined in IETF RFC 1557).

   euc-kr This is an 8-bit multibyte encoding for Korean.  It encodes
          US-ASCII (7-bit) in the single-byte range and characters in KS X
          1001 (formerly KS C 5601) in the double-byte range with the MSB
          set (8-bit).  It is used on Unix and the Internet.  Korean
          versions of MS-DOS, MacOS and MS-Windows use a compatible (in
          most cases identical) variant of this encoding.
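
          The single-byte and double-byte ranges can be observed with
          Python's euc-kr codec, assumed here to match this converter for
          these two sample characters.

```python
ascii_bytes = "A".encode("euc-kr")          # US-ASCII stays a single byte
hangul_bytes = "\ud55c".encode("euc-kr")    # U+D55C, a Hangul syllable

ascii_is_single = len(ascii_bytes) == 1 and ascii_bytes[0] < 0x80
hangul_is_double = len(hangul_bytes) == 2 and all(b & 0x80 for b in hangul_bytes)
```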

   johab  This is a Korean encoding specified in KS X 1001 (KS C
          5601-1992), Annex 3, as a supplementary encoding.  It was widely
          used in Korean MS-DOS until the mid-1990s.  It can encode all
          11,172 Hangul syllables of modern Korean as well as all the
          special symbols and Hanja (Chinese ideograms used in Korea)
          defined in KS X 1001.

   uhc    A variant of EUC-KR used in Korean MS-Windows 95/98 (a
          proprietary Microsoft encoding, CP949).  Its character
          repertoire includes all modern syllables of Hangul, the Korean
          script, as well as all the special symbols and Hanja (Chinese
          ideograms used in Korea) defined in KS X 1001.

   gb-18030
          This is a Chinese character encoding format based upon GB 18030.
          It  encodes  the  whole  U+0000..U+10FFFF  range,  while   being
          compatible with gb-2312.
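
          Both claims can be spot-checked with Python's gb18030 and
          gb2312 codecs, used here as stand-ins for the uniconv
          converters.  The sample characters are arbitrary.

```python
ascii_same = "abc".encode("gb18030") == b"abc"

han = "\u4e2d"                                   # a common Chinese character
gb2312_compatible = han.encode("gb18030") == han.encode("gb2312")

top = chr(0x10FFFF)                              # last code point of the range
round_trip = top.encode("gb18030").decode("gb18030") == top
```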

   gb-2312-x11
          This  is a Chinese character encoding format based upon GB 2312.
          It is a 7-bit encoding format.

   gb-2312
          This is a Chinese character encoding format based upon GB  2312.
          It is an 8-bit encoding format.

   big-5  This  is  a  Chinese  character  encoding format based upon BIG5
          encoding.  It is an 8-bit encoding format.

   hz     This is a Chinese character encoding format based  upon  "Hanzi"
          encoding.  It is a 7-bit encoding format.

   viscii This is a Vietnamese character encoding format.

   ucs-2-be
          This  converts  16-bit unicode (ucs-2) streams. The format takes
          care of big-endian  variant.   Yudit  does  not  recommend  this
          format.

   ucs-2-le
          This  converts  16-bit unicode (ucs-2) streams. The format takes
          care of little-endian variant.  Yudit does  not  recommend  this
          format.

   ucs-2  This converts 16-bit unicode (ucs-2) streams.  The input byte
          order is recognized from the leading BOM (byte-order mark)
          U+FEFF, which occupies the first two bytes.  Yudit does not
          recommend this format.

   java   This converts \uxxxx character escapes.  When encoding, all
          characters above U+0080 will be escaped with a string like
          '\u0080'.  When decoding, the same format is decoded but, in
          addition, utf-8 format is also recognized, so it can also be
          used to recover data accidentally saved with the wrong
          encoding.  The U+10000..U+10FFFF area is converted to surrogates
          and vice versa.
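
          The encoding direction can be sketched in Python.  The function
          name java_escape is hypothetical, and the sketch assumes that
          every character from U+0080 upward is escaped and that
          lowercase hex digits are used; the real converter may differ on
          both points.

```python
def java_escape(text: str) -> str:
    """Escape non-ASCII characters as \\uxxxx, using surrogates above U+FFFF."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:                      # assumption: plain ASCII passes through
            out.append(ch)
        elif cp < 0x10000:                 # BMP character: a single escape
            out.append("\\u%04x" % cp)
        else:                              # U+10000..U+10FFFF: surrogate pair
            offset = cp - 0x10000
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            out.append("\\u%04x\\u%04x" % (high, low))
    return "".join(out)

escaped = java_escape("a\u00e9\U0001F600")
```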

   java-s This converts \uxxxx character escapes.  When encoding, all
          characters above U+0080 will be escaped with a string like
          '\u0080'.  When decoding, the same format is decoded but, in
          addition, utf-8 format is also recognized, so it can also be
          used to recover data accidentally saved with the wrong
          encoding.  Surrogates are not treated specially during
          conversion - this is why it is not a recommended conversion.

FILES

   ~/.yudit/yudit.properties or /usr/share/yudit/config/yudit.properties
          can set the yudit.datapath property, which names the directory
          where the map files are kept.  By default /usr/share/yudit/data
          is searched.

SEE ALSO

    makeumap

AUTHOR

   This program  was written by gsinai@yudit.org (Gaspar Sinai), Tokyo,  2
   January, 2001.




