these are the notes i wrote to myself as i was preparing to port a big and old app to utf-8. i do not claim they are correct but they worked for me. most of this is not original but derived and condensed from other web pages as noted below. the purpose of this list is as a cheat sheet or to-do list. feel free to leave comments but try to be polite and don’t yell at me if i got something wrong.

wordpress insists on displaying simple single quote and simple double quote characters in random open/close forms in the following. sorry. please ignore and imagine they were all just the simple vertical versions.

useful web sites

immediately after opening a mysql connection, either:

  • SET NAMES 'utf8';
  • or mysql_set_charset('utf8', $connection_handle);

use <form accept-charset="utf-8"> on every form

convert html, php, js, css and other text files

declare css files as utf-8: @charset "UTF-8";

declare linked js files as utf-8 in the html script tag, e.g. <script src="..." charset="utf-8">

if using htmlspecialchars, use htmlspecialchars($s, ENT_COMPAT, 'UTF-8');

  • use ENT_COMPAT mode so that attribute values containing " don't break the html tag when a script writes them into it.
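a minimal sketch of the attribute case. attr_escape() is a made-up helper name for illustration, not part of php:

```php
<?php
// attr_escape() is a hypothetical helper, not a built-in.
// It escapes a value for use inside a double-quoted HTML attribute.
function attr_escape($s) {
    // ENT_COMPAT converts " to &quot; and leaves ' alone;
    // the third argument tells PHP the input is UTF-8.
    return htmlspecialchars($s, ENT_COMPAT, 'UTF-8');
}

$title = 'say "hello" to <Iñtërnâtiônàlizætiøn>';
echo '<a title="' . attr_escape($title) . '">link</a>', "\n";
```

the double quotes in $title come out as &quot; so they cannot close the title attribute early.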

add to top of every script ?

  • $default_locale = setlocale(LC_ALL, 'en_US.UTF-8');
  • ini_set('default_charset', 'UTF-8');

and just before page output PHPLIBtemplates.inc.php:

  • header('Content-Type: text/html; charset=utf-8');
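the per-script settings and the header could be gathered into one shared include. this is only a sketch: utf8_bootstrap() is a made-up name, the mb_internal_encoding() call is my assumption (it mirrors the mbstring.internal_encoding ini setting), and sending the header this early is a simplification of "just before page output":

```php
<?php
// utf8_bootstrap() is a hypothetical name for a shared include.
function utf8_bootstrap() {
    // setlocale() returns false when the locale is not installed,
    // so the result is handed back for the caller to check.
    $locale = setlocale(LC_ALL, 'en_US.UTF-8');

    ini_set('default_charset', 'UTF-8');

    // Assumption: set the mbstring default at runtime as well.
    mb_internal_encoding('UTF-8');

    // header() only has an effect before any output has been sent.
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
    return $locale;
}

$default_locale = utf8_bootstrap();
```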

in apache config

  • AddDefaultCharset utf-8

in php.ini

  • mbstring.func_overload=7
  • default_charset=UTF-8
  • mbstring.internal_encoding=UTF-8

mbstring.func_overload=7 covers ereg and some string functions as listed in mbstring functions and detailed below. many string functions are still not safe.

PCRE

  • all pregs need the utf8 u modifier: preg_match('/myregex/u', $str)
  • avoid pcre i modifier
  • avoid \w \W \b \B
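a small sketch of what the u modifier changes. the \p{L} and \p{N} property classes are one possible stand-in for \w and are my suggestion, not something taken from the pages above:

```php
<?php
$s = "\xC3\xB1"; // "ñ" written as its two UTF-8 bytes

// Without /u the dot matches a single byte, so ^.$ fails on one
// two-byte character.
$bytewise = preg_match('/^.$/', $s);

// With /u the dot matches one whole UTF-8 character.
$unicode = preg_match('/^.$/u', $s);

// Assumption: Unicode property classes as a substitute for \w.
$is_word = preg_match('/^[\p{L}\p{N}_]+$/u', 'Iñtërnâtiônàlizætiøn');

var_dump($bytewise, $unicode, $is_word);
```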

to find the byte count of a multi-byte string when you are using mbstring.func_overload 2 and UTF-8 strings:

  • mb_strlen($utf8_string, 'latin1');
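a sketch of the difference between the two counts. it uses the '8bit' encoding name, which is my substitution for 'latin1' and should give the same byte count:

```php
<?php
$s = "I\xC3\xB1t"; // "Iñt": 3 characters stored in 4 bytes

// Character count: decode the string as UTF-8.
$chars = mb_strlen($s, 'UTF-8');

// Byte count: treat every byte as one character.
$bytes = mb_strlen($s, '8bit');

echo "$chars characters, $bytes bytes\n";
```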

to validate form input as utf8, http://devlog.info/2008/08/24/php-and-unicode-utf-8 says

  • (strlen($str) AND !preg_match('/^.{1}/us', $str)) // true means bad utf-8

but http://www.phpwact.org/php/i18n/charsets says this cannot be trusted. so use mb_check_encoding() to get a true/false answer
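a minimal sketch of the mb_check_encoding() check. is_valid_utf8() is a made-up wrapper name:

```php
<?php
// is_valid_utf8() is a hypothetical wrapper, not a built-in.
function is_valid_utf8($str) {
    return mb_check_encoding($str, 'UTF-8');
}

$good = "\xC3\xB1"; // ñ
$bad  = "\xC3\x28"; // lead byte followed by a byte that cannot continue it

var_dump(is_valid_utf8($good), is_valid_utf8($bad));
```

a form handler could reject the request, or fall back to sanitizing, whenever this returns false.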

to quietly sanitize utf8 input strings (http://blog.liip.ch/archive/2005/01/24/how-to-get-rid-of-invalid-utf-8-characters.html):

  • $s = iconv("UTF-8", "UTF-8//IGNORE", $s);

this drops the bad bytes without complaint. the result is safe to use, and there is no need to add code that sends the form back to the user for re-entry.
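a variation on the same idea that stays inside mbstring. to be clear, this is a different technique from the iconv call above, and strip_bad_utf8() is a made-up name. i have not compared the two on every platform:

```php
<?php
// strip_bad_utf8() is a hypothetical helper, not a built-in.
function strip_bad_utf8($s) {
    // Remember the current substitute character so it can be restored.
    $previous = mb_substitute_character();

    // 'none' tells mbstring to drop invalid sequences instead of
    // replacing them with '?'.
    mb_substitute_character('none');
    $clean = mb_convert_encoding($s, 'UTF-8', 'UTF-8');

    mb_substitute_character($previous);
    return $clean;
}

$clean = strip_bad_utf8("abc\xFFdef"); // \xFF never occurs in valid UTF-8
var_dump(mb_check_encoding($clean, 'UTF-8'));
```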

test strings

$strs = array(
		'Iñtërnâtiônàlizætiøn',
		'החמאס: רוצים להשלים את עסקת שליט במהירות האפשרית',
		'ايران لا ترى تغييرا في الموقف الأمريكي',
		'独・米で死傷者を出した銃の乱射事件',
		'國會預算處公布驚人的赤字數據後',
		'이며 세계 경제 회복에 걸림돌이 되고 있다',
		'В дагестанском лесном массиве южнее села Какашура',
		'นายประสิทธิ์ รุ่งสะอาด ปลัดเทศบาล รักษาการแทนนายกเทศมนตรี ต.ท่าทองใหม่',
		'ભારતીય ટીમનો સુવર્ણ યુગ : કિવીઝમાં પણ કમાલ',
		'ཁམས་དཀར་མཛེས་ས་ཁུལ་དུ་རྒྱ་གཞུང་ལ་ཞི་བའི་ངོ་རྒོལ་',
		'Χιόνια, βροχές και θυελλώδεις άνεμοι συνθέτουν το',
		'Հայաստանում սկսվել է դատական համակարգի ձեւավորումը',
		'რუსეთი ასევე გეგმავს სამხედრო');

to be lazy, sanitize $_GET and $_POST input with

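function clean_input(&$a) {
    if ( is_array($a) ) {
        // recurse into nested arrays such as $_POST['field'][]
        foreach ($a as &$v)
            clean_input($v);
        unset($v); // drop the dangling reference
    } elseif ( is_string($a) && !mb_check_encoding($a, 'UTF-8') ) {
        // drop byte sequences that are not valid utf-8
        $a = iconv('UTF-8', 'UTF-8//IGNORE', $a);
    }
    return true;
}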

replacement for strtr()

function mystrtr($s, $p1, $p2=false) {
  // 3-param form: map the i-th character of $p1 to the i-th character of $p2
  if ( is_string($p1) && is_string($p2)
        && mb_strlen($p1, 'UTF-8') == mb_strlen($p2, 'UTF-8') ) {
    $t = '';
    $n = mb_strlen($s, 'UTF-8');
    for ( $i=0; $i < $n; $i++ ) {
      // take one character, not one byte
      $c = mb_substr($s, $i, 1, 'UTF-8');
      $j = mb_strpos($p1, $c, 0, 'UTF-8');
      $t .= $j === false ? $c : mb_substr($p2, $j, 1, 'UTF-8');
    }
    return $t;
  } elseif ( $p2 === false && is_array($p1) ) {
    // 2-param form: strtr() with an array is ok on valid utf-8
    return strtr($s, $p1);
  }
  trigger_error('mystrtr() called with bad parameters', E_USER_WARNING);
  return $s;
}
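the array branch above hands off to the built-in strtr(). a short sketch of why the 2-param array form is ok on valid utf-8 while the 3-param form is not:

```php
<?php
// 2-param form: whole substrings are replaced, so multi-byte
// characters stay intact.
$map = array('ñ' => 'n', 'ú' => 'u');
$plain = strtr('ñandú', $map);

// 3-param form: strtr() pairs up single bytes. "ñ" is two bytes and
// "n" is one, so only the first byte of "ñ" is translated and the
// second byte is left behind as an orphan.
$broken = strtr("\xC3\xB1", "\xC3\xB1", 'n');

var_dump($plain, mb_check_encoding($broken, 'UTF-8'));
```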

notes on specific functions learned from own tests, links noted above and in the table

addcslashes DO NOT USE
addslashes DO NOT USE
chop see rtrim
chr only use for ascii
chunk_split SUSPECT, probably works on byte strings
count_chars operates on byte strings, use only on ascii or 8859
crc32 see md5
crypt see md5
echo presumably mb-safe?
explode SAFE, but can use preg_split
fprintf DO NOT USE, http://www.php.net/manual/en/function.sprintf.php#89020
fscanf DO NOT USE, http://www.php.net/manual/en/function.sprintf.php#89020
html_entity_decode DO NOT USE, see htmlspecialchars
htmlentities DO NOT USE, see htmlspecialchars
htmlspecialchars OK but use htmlspecialchars($s, ENT_COMPAT, 'UTF-8')
implode probably OK?
join same as implode
lcfirst DO NOT USE, mb_convert_case
levenshtein SUSPECT, testing needed
localeconv ?
ltrim OK without a $charlist 2nd param. or use preg_replace('/^\s+/u', '', $s);
mb_strtolower DO NOT USE, confirmed buggy! mb_convert_case($s, MB_CASE_LOWER, "UTF-8")
mb_strtoupper DO NOT USE, confirmed buggy! mb_convert_case($s, MB_CASE_UPPER, "UTF-8")
md5_file probably ok
md5 probably ok, i guess it returns the MD5 of the byte string, as one would want
metaphone SUSPECT
money_format ?
nl2br DO NOT USE, preg_replace('/\n/u', '<br>', $s);
number_format ?
ord only use for ascii
parse_str Use mb_parse_str
print presumably mb-safe?
printf RISKY. ONLY use on 7-bit ascii, http://www.php.net/manual/en/function.sprintf.php#89020
quotemeta SUSPECT, preg_replace
rtrim OK without a $charlist 2nd param. or use preg_replace('/\s+$/u', '', $s);
setlocale ALWAYS USE
sha1_file see md5
sha1 see md5
similar_text SUSPECT
soundex SUSPECT
sprintf RISKY. ONLY use on 7-bit ascii, http://www.php.net/manual/en/function.sprintf.php#89020
sscanf RISKY. ONLY use on 7-bit ascii, http://www.php.net/manual/en/function.sprintf.php#89020
str_getcsv OK if locale and LANG set correctly
str_ireplace DO NOT USE, preg_replace
str_pad DO NOT USE
str_repeat SUSPECT
str_replace SAFE, or use preg_replace
str_rot13 DO NOT USE except on 7-bit ascii only
str_shuffle DO NOT USE
str_split > mb_split or use preg_split instead
str_word_count SUSPECT
strcasecmp DO NOT USE
strchr SUSPECT, alias of strstr, use mb_strpos or mb_strstr
strcmp according to comments on php.net, ok if locale is set right
strcoll according to bug reports, ok on posix systems, not windows. but set locale
strcspn DO NOT USE
strip_tags DO NOT USE
stripcslashes DO NOT USE
stripos > mb_stripos
stripslashes DO NOT USE, preg_replace(array('/\x5C(?!\x5C)/u', '/\x5C\x5C/u'), array('','\\'), $s)
stristr > mb_stristr
strlen > mb_strlen, OK unless you need byte length, e.g. to save a file, then use mb_strlen($s, 'latin1');
strnatcasecmp SUSPECT
strnatcmp SUSPECT
strncasecmp SUSPECT
strncmp SUSPECT
strpbrk SUSPECT, use preg
strpos > mb_strpos
strrchr SUSPECT, use
strrev DO NOT USE
strripos > mb_strripos
strrpos > mb_strrpos
strspn DO NOT USE, use preg_match
strstr > mb_strstr
strtok DO NOT USE
strtolower DO NOT USE. mb_strtolower fails on some cases when mb_convert_case($str, MB_CASE_LOWER, "UTF-8") does not
strtoupper DO NOT USE. mb_strtoupper fails on some cases when mb_convert_case($str, MB_CASE_UPPER, "UTF-8") does not
strtr DO NOT USE with 3-params. 2-param version ok with valid utf-8.
substr_compare DO NOT USE
substr_count > mb_substr_count, or preg_match_all?
substr_replace DO NOT USE
substr > mb_substr, see also mb_strcut & mb_strimwidth
trim OK without a $charlist 2nd param. or preg_replace('/(^\s+)|(\s+$)/u', '', $s);
ucfirst DO NOT USE
ucwords DO NOT USE, mb_convert_case($str, MB_CASE_TITLE, "UTF-8")
vfprintf DO NOT USE, http://www.php.net/manual/en/function.sprintf.php#89020
vprintf DO NOT USE, http://www.php.net/manual/en/function.sprintf.php#89020
vsprintf DO NOT USE, http://www.php.net/manual/en/function.sprintf.php#89020
wordwrap SUSPECT
urlencode OK
rawurlencode OK
urldecode SUSPECT
rawurldecode SUSPECT
utf8_encode only use on ascii or 8859-1
utf8_decode ?
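a few of the replacements from the table, side by side. this is a sketch assuming mbstring.func_overload is off, so the plain string functions still count bytes. the preg_split('//u', ...) idiom for splitting into characters is my suggestion, not something from the links:

```php
<?php
$word = 'Iñtërnâtiônàlizætiøn';

// Case conversion through mb_convert_case() with an explicit encoding.
$lower = mb_convert_case('ÉCOLE élan', MB_CASE_LOWER, 'UTF-8');
$title = mb_convert_case('élan vital', MB_CASE_TITLE, 'UTF-8');

// strpos() reports a byte offset, mb_strpos() a character offset.
$byte_pos = strpos($word, 'ë');
$char_pos = mb_strpos($word, 'ë', 0, 'UTF-8');

// mb_substr() cuts on character boundaries.
$first4 = mb_substr($word, 0, 4, 'UTF-8');

// Assumption: split into characters with an empty /u pattern.
$chars = preg_split('//u', 'ñandú', -1, PREG_SPLIT_NO_EMPTY);

var_dump($lower, $title, $byte_pos, $char_pos, $first4, $chars);
```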


i was in the 4/5 35+ race. the pace was pretty strong and i’m glad there were downhill stretches between the ups. it’s 63 miles with very little flat. nice course with good quality surfaces and safe wide downhills.

by 25 miles in there were only about a dozen riders left in the group i was in. having kept close to the front, i was under the impression it was the lead group. at 35 miles i got a flat and pulled over to wait for the support vehicle. it never came.

eventually the support for the 4/5 open race drove by without acknowledging me. later the women came by and a vehicle stopped. an official said she had no support with her but took my number and said the wheel truck is only a minute behind. it too blew past me.

it seems that the error i made was to misconstrue the organizers’ promise of support, as stated in the flyer and then explained to us before the start of the race. i spoke to an official after the race and he explained that the support vehicle only supports the race leaders and vehicles aren’t supposed to help riders in other races.

so there must have been a break ahead of us that i was unaware of. though i rode near the front (i thought) until i flatted i didn’t see them go and i didn’t see the support vehicle pass. i guess it must have been a small number of riders in the lead group.

thus in a relentlessly hilly race like quabbin, in which the field necessarily gets strung out, it seems that when they say that support is provided, this has to be construed as meaning that no support is provided to 95% of the riders. unless confident of being in the money, you must assume that you’re on your own.

i wish i had known that in advance.

anyway, i chased the women’s support truck for 8 miles on a flat without catching it. i stopped to talk to the policeman at the turn in hardwick and asked if there was a way to contact the support crews. he said he had no idea and bemoaned that he had been completely unprepared, that nothing had been explained to him.

a back-marker from the 4/5 open race came past then and offered me co2. i remembered that i had sealant in my tires so i accepted and it worked. the tire stayed inflated to the finish. i’m very grateful for that. i rode on my own except for about the last 8 miles with one of the women from the group i passed.

my other error was: forgetting to get the 3-hour bottle of perpetuem out of the cooler box before going to the start line. with spending half an hour waiting for imagined support i was out of water with more than an hour of hot riding to go and very thirsty. 3 bottles was not enough. i was getting bonkers towards the end. i have only myself to blame for that dumb error.

astonishingly, the results put me 60th out of 70 starters and 67 finishers, 45 minutes behind the winner. i thought my ride was bad enough; i’d love to hear the stories of the 6 behind me.

Regular readers of Philip Dawdy’s excellent Furious Seasons web site will be familiar with his opinion of the DSM’s bipolar II diagnosis. In keeping with his idea of “a free market of ideas in the mental health world” I would like to contribute my opinions on this topic.

First, let me be clear: I admire Philip’s work on Furious Seasons, have supported his fundraisers, and hope he keeps at it.

The opinion that causes some controversy is succinctly put in his interview with Christopher Lane in Psychology Today.

Here’s the quote in full:

I may be the only writer in America who thinks BP2 is controversial and I can hardly think of any doctors who do. For me, it’s a questionable classification and something of a cop-out by the DSM writers for a couple of reasons: One, BP2 isn’t bipolar disorder, properly understood. There’s no mania, there’s no hospitalization for mania, and there’s no one running naked down the street. The most prominent features of BP2 are depression (and that covers the vast majority of a person’s time who is diagnosed with BP2) and bursts of energy, broadly understood. To me, that sounds a whole lot more like depression and agitation than it does manic-depression.

Two, the minute someone gets hit with a bipolar disorder diagnosis of any subtype, then they are faced with a profoundly bad set of social assumptions; they get stigmatized by friends and family; and they lose their jobs. I know of multiple cases along these lines, including one of a sheriff’s deputy in King County, Washington who was fired from her job as soon as the brass learned she had BP2, even though she had a stellar track record as a cop and had done nothing wrong on the job. That hardly seems fair when we’re talking about a disorder that doesn’t involve hallucinations or psychosis and has none of the off-the-charts impulsivity of true manic-depression. While it’s nice of researchers and mental-health advocates to claim that we’ve got to end this kind of stigma, in the real world that would take generations and by then people with BP2 today will have reached the ends of their natural lives.


Why BP2 wasn’t called something else is beyond me, but the diagnosis has sure caused a lot of unfair social damage.

I have a BP2 diagnosis, the comical history of which you can read here, and Philip’s description in the first paragraph doesn’t characterize my experience at all well. The reason I have a BP2 dx rather than BP is that I haven’t suffered “marked functional impairment” in any of my “hypomanic episodes”. If I had then DSM 4’s criteria would have me as BP.

Hospitalization is not a required criterion for diagnosis of mania or BP. Nor is running naked down the street. What I experienced included delusions (e.g. I once began planning to become Prime Minister), paranoia, demented spending (thankfully I had no lines of credit when the behavior was at its worst, when I was younger, or it would have been ruinous), crazy creativity with loss of my self-critical faculty, no sleep, ludicrous self-esteem and embarrassing incidents the memory of which makes me wince decades hence. This is a bit more than a “burst of energy, broadly understood”. And there is suspicion of genetic evidence: my father’s odd behavior and suicide smack of manic depression. I rather agree with my shrink that the criteria of mania and BP are met rather closely except that, because I never lost a job, got kicked out of school, got arrested or was hospitalized, it lacks “marked functional impairment”. In other words, I got away with it. Apparently that makes it BP2.

Nor is this behavior agitated depression. I have a lot of experience with that and it is entirely different. In agitated depression my mood is dysphoric, pessimistic and cynical but I can’t sleep, relax or let up with the negativity whereas in hypomania I am euphoric, self-confident, optimistic and at one with the world. There’s no way to confuse these states, in my experience.

On Philip’s second point, I don’t really disagree but the statement sounds a little sweeping. I’m sure some people have suffered negative and unfair social consequences but I’m not aware of any affecting me, at least not so far and certainly not within the first minute of diagnosis.

Whether or not a different name for this disorder would, on the whole, have been better for patients, I really don’t know. Would the social consequences for something called, say, Major Depression with Hypomania (with, as most new psychiatric disorders have, a three letter abbreviation, say MDH) be any better? I don’t find that very convincing but I honestly don’t know.

Moreover, I imagine there may be benefit to patients from the BP2 name. It seems clear from the reading I’ve done that it’s important to treat BP2 in basically the same way as bipolar, especially in regard to the dangers of antidepressants. I imagine that many (most?) physicians are aware of these concerns in bipolar. My own GP refused to prescribe an antidepressant because of his suspicion of bipolar. He sent me to a psychiatrist who refused to prescribe an antidepressant without first a robust mood stabilizer. It took two years to get that right before I was given the antidepressant. According to, for example, Husseini Manji, this is the safest approach. (He even prefers it in cases of MDD that are familial.)

If BP2 had instead a name that failed to make the association with bipolar, I wonder if some physicians, especially those who aren’t psychiatrists, might be less likely to recognize these risks. Given that most BP2 patients present with depression, the association with the bipolar word may spare them some risk.

Bontrager inForm RXL saddle review

Summary: I tried out a Bontrager inForm RXL saddle for two weeks and took it on two 70+ mile rides. It was ok on short rides but after about 40 miles it started to hurt. By the end of the two long rides I was hurt so bad I needed a couple of days to recover. The saddle also has a fairly slippery cover that I also found undesirable. I prefer a saddle that presents more resistance to lateral forces so I don’t slide around unexpectedly.

Background and requirements: I am 44 years old, male, with 40+ years cycling experience. I ride long distance events and recently started road racing. On my long distance comfort bike I usually ride a Brooks B17. It is generally comfy but puts too much pressure on the perineum when riding low on the drop or on aero bars. I can start to feel my family assets go numb after only about 100 miles on a B17. That’s ok if I’m in no hurry because I can sit up more but I’m planning on riding the Saratoga 24-hour time trial this July and would like to do 400 miles if I can. A B17 isn’t going to work for that. I need a saddle that will be comfortable for 24 hours with a lot of that spent low on the drops or aero bars.

My racing bike has a Specialized Toupé saddle that is pretty good but also not comfortable enough for long rides. After about 80 miles the tissue under my pubic arch (the bone cyclists sit on) gets sore. So I’m looking to solve that problem too.

I was interested in the Bontrager inForm because of their claim to have put some formal scientific study into the physiology and biomechanics relating to saddle design. I was also attracted by their 90-day trial period. I was measured and chose the RXL medium width. It was good as far as reducing pressure on the perineum was concerned. The problem, like the Toupé, was with the tissue under the pubic arch. I became so sore after about 40 miles on both the longer rides that I found myself standing far too often just to relieve the pressure. The pain was present for a couple of days after both rides. It is a wonder that anyone could achieve such an uncomfortable saddle design. I returned it.

So I’m still looking for the right saddle. Fizik Arione has many followers, perhaps the Tri version. And I was recommended Selle Italia Flite Gel Flow and SLC Gel Flow. Any other ideas? Trial and error can get expensive in this game.