converting php/mysql/apache app from latin-1 to utf-8
April 27, 2009
these are the notes i wrote to myself as i was preparing to port a big and old app to utf-8. i do not claim they are correct but they worked for me. most of this is not original but derived and condensed from other web pages as noted below. the purpose of this list is as a cheat sheet or to-do list. feel free to leave comments but try to be polite and don’t yell at me if i got something wrong.
wordpress insists on displaying simple single quote and simple double quote characters in random open/close forms in the following. sorry. please ignore and imagine they were all just the simple vertical versions.
useful web sites
- http://www.phpwact.org/php/i18n/utf-8
- http://www.phpwact.org/php/i18n/charsets
- http://www.phpwact.org/php/i18n/utf-8/mysql
- http://devlog.info/2008/08/24/php-and-unicode-utf-8/
- http://www.sitepoint.com/blogs/2006/08/09/scripters-utf-8-survival-guide-slides/
- http://www.nicknettleton.com/zine/php/php-utf-8-cheatsheet
- http://www.cs.tut.fi/~jkorpela/chars.html
immediately after opening a mysql connection, either:
- SET NAMES 'utf8';
- or mysql_set_charset('utf8', $connection_handle);
use <form accept-charset="utf-8"> on every form
convert html, php, js, css and other text files to utf-8
declare css files as utf-8: @charset "UTF-8";
declare linked js files as utf-8 in the html script tag, e.g. <script src="app.js" charset="utf-8"></script>
if using htmlspecialchars, use htmlspecialchars($s, ENT_COMPAT, 'UTF-8');
- use ENT_COMPAT mode, e.g. so that if putting attribute values with " into html tags from a script, it won’t screw up.
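a minimal sketch of the htmlspecialchars call above in use. the wrapper name h() and the sample value are made up for illustration; the byte escapes just spell "café" as utf-8 so the example doesn't depend on how this page is encoded.

```php
<?php
// h() is a hypothetical shorthand for the call recommended above
function h($s) {
    return htmlspecialchars($s, ENT_COMPAT, 'UTF-8');
}

// "café "quoted" <b>" as utf-8 bytes
$value = "caf\xC3\xA9 \"quoted\" <b>";

// ENT_COMPAT escapes double quotes, so the value cannot break out of the attribute
$html = '<input type="text" name="q" value="' . h($value) . '">';
echo $html, "\n";
```

the multi-byte é passes through untouched; only &, <, > and " are replaced. single quotes are left alone in ENT_COMPAT mode, so this assumes attributes are always written with double quotes.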
add to top of every script ?
- $default_locale = setlocale(LC_ALL, 'en_US.UTF-8');
- ini_set('default_charset', 'UTF-8');
and just before page output PHPLIBtemplates.inc.php:
- header('Content-Type: text/html; charset=utf-8');
in apache config
- AddDefaultCharset utf-8
in php.ini
- mbstring.func_overload=7
- default_charset=UTF-8
- mbstring.internal_encoding=UTF-8
mbstring.func_overload=7 covers ereg and some string functions as listed in mbstring functions and detailed below. many string functions are still not safe.
PCRE
- all pregs need the utf8 u modifier: preg_match('/myregex/u', $str)
- avoid pcre i modifier
- avoid \w \W \b \B
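a small sketch of what the u modifier changes. the test string is a single "ñ" written as byte escapes; the variable names are just for illustration.

```php
<?php
$s = "\xC3\xB1";   // "ñ": one character, two bytes in utf-8

// without /u the dot matches a single byte, so "exactly one character" fails
$bytewise = preg_match('/^.$/', $s);

// with /u the dot matches one whole utf-8 character
$charwise = preg_match('/^.$/u', $s);

// an empty pattern with /u splits a string into characters, not bytes
$chars = preg_split('//u', "a\xC3\xB1b", -1, PREG_SPLIT_NO_EMPTY);

var_dump($bytewise, $charwise, $chars);
```

note that with /u the preg functions return false (not 0) when the subject is not valid utf-8, so check the return value with === where it matters.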
to find the byte count of a multi-byte string when you are using mbstring.func_overload 2 and UTF-8 strings:
- mb_strlen($utf8_string, 'latin1');
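a quick sketch of the difference between character count and byte count. it uses the '8bit' and 'ISO-8859-1' encoding names, both single-byte like the 'latin1' trick above, so mb_strlen counts bytes regardless of whether strlen is overloaded. the sample string is "Iñtërnâtiôn" spelled with byte escapes.

```php
<?php
$s = "I\xC3\xB1t\xC3\xABrn\xC3\xA2ti\xC3\xB4n";   // "Iñtërnâtiôn"

// characters: each multi-byte sequence counts once
$chars = mb_strlen($s, 'UTF-8');

// bytes: a single-byte encoding makes mb_strlen count every byte
$bytes  = mb_strlen($s, '8bit');
$bytes2 = mb_strlen($s, 'ISO-8859-1');

echo "$chars characters, $bytes bytes\n";
```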
to validate form input as utf8, http://devlog.info/2008/08/24/php-and-unicode-utf-8 says
- (strlen($str) AND !preg_match('/^.{1}/us', $str)) // true means bad utf-8
but http://www.phpwact.org/php/i18n/charsets says this cannot be trusted. so use mb_check_encoding() to get a true/false answer
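a minimal sketch of the mb_check_encoding() check. the helper name is_valid_utf8() and the sample byte strings are made up for illustration.

```php
<?php
// hypothetical helper: true when $s is well-formed utf-8
function is_valid_utf8($s) {
    return mb_check_encoding($s, 'UTF-8');
}

$good   = "\xC3\xB1";   // "ñ" in utf-8
$bad    = "\xC3\x28";   // lead byte followed by a byte that is not a continuation byte
$latin1 = "\xF1";       // "ñ" in latin-1, which is not valid utf-8

var_dump(is_valid_utf8($good), is_valid_utf8($bad), is_valid_utf8($latin1));
```

the latin-1 case is the one most likely to turn up while an old app is being ported, e.g. from forms or links that still submit in the old charset.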
to quietly sanitize utf8 input strings (http://blog.liip.ch/archive/2005/01/24/how-to-get-rid-of-invalid-utf-8-characters.html):
- $s = iconv("UTF-8", "UTF-8//IGNORE", $s);
this quietly deals with bad utf-8 input: the result is safe to use, and you don’t need to add code to send the form back to the user for re-entry.
test strings
$strs = array(
    'Iñtërnâtiônàlizætiøn',
    'החמאס: רוצים להשלים את עסקת שליט במהירות האפשרית',
    'ايران لا ترى تغييرا في الموقف الأمريكي',
    '独・米で死傷者を出した銃の乱射事件',
    '國會預算處公布驚人的赤字數據後',
    '이며 세계 경제 회복에 걸림돌이 되고 있다',
    'В дагестанском лесном массиве южнее села Какашура',
    'นายประสิทธิ์ รุ่งสะอาด ปลัดเทศบาล รักษาการแทนนายกเทศมนตรี ต.ท่าทองใหม่',
    'ભારતીય ટીમનો સુવર્ણ યુગ : કિવીઝમાં પણ કમાલ',
    'ཁམས་དཀར་མཛེས་ས་ཁུལ་དུ་རྒྱ་གཞུང་ལ་ཞི་བའི་ངོ་རྒོལ་',
    'Χιόνια, βροχές και θυελλώδεις άνεμοι συνθέτουν το',
    'Հայաստանում սկսվել է դատական համակարգի ձեւավորումը',
    'რუსეთი ასევე გეგმავს სამხედრო'
);
to be lazy, sanitize $_GET and $_POST input with
function clean_input(&$a) {
    // recurse into arrays (e.g. $_GET, $_POST) and strip invalid bytes from strings
    if ( isset($a) && is_array($a) && !empty($a) )
        foreach ($a as $k => &$v)
            clean_input($v);
    elseif ( is_string($a) && !mb_check_encoding($a, 'UTF-8'))
        $a = iconv('UTF-8', 'UTF-8//IGNORE', $a);
    return true;
}
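a variation on clean_input(), as a sketch only: it walks the array with array_walk_recursive and uses mb_convert_encoding() from UTF-8 to UTF-8, which substitutes invalid sequences rather than dropping them. the name scrub_input() and the sample data are made up; i have not compared the two approaches on real traffic.

```php
<?php
// hypothetical alternative to clean_input(): substitute bad bytes instead of dropping them
function scrub_input(array &$a) {
    array_walk_recursive($a, function (&$v) {
        if (is_string($v) && !mb_check_encoding($v, 'UTF-8')) {
            // converting utf-8 to utf-8 replaces invalid sequences with
            // mbstring's substitute character
            $v = mb_convert_encoding($v, 'UTF-8', 'UTF-8');
        }
    });
}

// stand-in for $_GET or $_POST
$input = array(
    'name' => "ok \xC3\xB1",
    'tags' => array("bad \xC3\x28", 'plain'),
);
scrub_input($input);
```

like clean_input(), this only touches values; array keys are left as they arrived.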
replacement for strtr()
function mystrtr($s, $p1, $p2=false) {
    if ( is_string($p1) && is_string($p2)
         && mb_strlen($p1, 'UTF-8') == mb_strlen($p2, 'UTF-8') ) {
        // 3-param form: map the i-th character of $p1 to the i-th character of $p2
        $t = '';
        $n = mb_strlen($s, 'UTF-8');
        for ( $i=0; $i < $n; $i++ ) {
            // mb_substr rather than substr: $i counts characters, not bytes
            $t .= ($j = mb_strpos($p1, $c = mb_substr($s, $i, 1, 'UTF-8'), 0, 'UTF-8')) === false
                ? $c
                : mb_substr($p2, $j, 1, 'UTF-8');
        }
        return $t;
    } elseif ( $p2 === false && is_array($p1) ) {
        // 2-param form: an array of replacement pairs is ok with valid utf-8
        return strtr($s, $p1);
    }
    trigger_error('mystrtr() called with bad parameters strlen(p1)=' . mb_strlen($p1, 'UTF-8')
        . ' strlen(p2)=' . mb_strlen($p2, 'UTF-8'), E_USER_WARNING);
    return $s;
}
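another way to get the same effect, sketched under the assumption that all three strings are valid utf-8: split the two character lists with preg_split and /u, pair them up, and hand the pairs to the 2-param strtr(), which is fine with valid utf-8. the name utf8_strtr() is made up for illustration.

```php
<?php
// hypothetical helper: character-for-character translation of a utf-8 string
function utf8_strtr($s, $from, $to) {
    // split both lists into characters; preg_split returns false on invalid utf-8
    $f = preg_split('//u', $from, -1, PREG_SPLIT_NO_EMPTY);
    $t = preg_split('//u', $to, -1, PREG_SPLIT_NO_EMPTY);
    if ($f === false || $t === false || count($f) !== count($t)) {
        trigger_error('utf8_strtr() called with bad parameters', E_USER_WARNING);
        return $s;
    }
    if (count($f) === 0) {
        return $s;   // nothing to translate
    }
    // the array form of strtr replaces whole byte sequences, so characters stay intact
    return strtr($s, array_combine($f, $t));
}

echo utf8_strtr("\xC3\xB1and\xC3\xBA", "\xC3\xB1\xC3\xBA", "nu"), "\n";   // "ñandú" with ñ=>n, ú=>u
```

this does one pass in strtr() instead of an mb_strpos() call per character, which may matter on long strings.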
notes on specific functions learned from own tests, links noted above and in the table
| function | notes |
| --- | --- |
| addcslashes | DO NOT USE |
| addslashes | DO NOT USE |
| chop | see rtrim |
| chr | only use for ascii |
| chunk_split | SUSPECT, probably works on byte strings |
| count_chars | operates on byte strings, use only on ascii or 8859 |
| crc32 | see md5 |
| crypt | see md5 |
| echo | presumably mb-safe? |
| explode | SAFE, but can use preg_split |
| fprintf | DO NOT USE, http://www.php.net/manual/en/function.sprintf.php#89020 |
| fscanf | DO NOT USE, http://www.php.net/manual/en/function.sprintf.php#89020 |
| html_entity_decode | DO NOT USE, see htmlspecialchars |
| htmlentities | DO NOT USE, see htmlspecialchars |
| htmlspecialchars | OK but use htmlspecialchars($s, ENT_COMPAT, 'UTF-8') |
| implode | probably OK? |
| join | same as implode |
| lcfirst | DO NOT USE, mb_convert_case |
| levenshtein | SUSPECT, testing needed |
| localeconv | ? |
| ltrim | OK without a $charlist 2nd param. or use preg_replace('/^\s+/u', '', $s); |
| mb_strtolower | DO NOT USE, confirmed buggy! mb_convert_case($s, MB_CASE_LOWER, "UTF-8") |
| mb_strtoupper | DO NOT USE, confirmed buggy! mb_convert_case($s, MB_CASE_UPPER, "UTF-8") |
| md5_file | probably ok |
| md5 | probably ok, i guess it returns the MD5 of the byte string, as one would want |
| metaphone | SUSPECT |
| money_format | ? |
| nl2br | DO NOT USE, preg_replace('/\n/u', '<br>', $s); |
| number_format | ? |
| ord | only use for ascii |
| parse_str | Use mb_parse_str |
| print | presumably mb-safe? |
| printf | RISKY. ONLY use on 7-bit ascii, http://www.php.net/manual/en/function.sprintf.php#89020 |
| quotemeta | SUSPECT, preg_replace |
| rtrim | OK without a $charlist 2nd param. or use preg_replace('/\s+$/u', '', $s); |
| setlocale | ALWAYS USE |
| sha1_file | see md5 |
| sha1 | see md5 |
| similar_text | SUSPECT |
| soundex | SUSPECT |
| sprintf | RISKY. ONLY use on 7-bit ascii, http://www.php.net/manual/en/function.sprintf.php#89020 |
| sscanf | RISKY. ONLY use on 7-bit ascii, http://www.php.net/manual/en/function.sprintf.php#89020 |
| str_getcsv | OK if locale and LANG set correctly |
| str_ireplace | DO NOT USE, preg_replace |
| str_pad | DO NOT USE |
| str_repeat | SUSPECT |
| str_replace | SAFE, or use preg_replace |
| str_rot13 | DO NOT USE except on 7-bit ascii only |
| str_shuffle | DO NOT USE |
| str_split | > mb_split or use preg_split instead |
| str_word_count | SUSPECT |
| strcasecmp | DO NOT USE |
| strchr | SUSPECT, alias of strstr, use mb_strstr or mb_strpos |
| strcmp | according to comments on php.net, ok if locale is set right |
| strcoll | according to bug reports, ok on posix systems, not windows. but set locale |
| strcspn | DO NOT USE |
| strip_tags | DO NOT USE |
| stripcslashes | DO NOT USE |
| stripos | > mb_stripos |
| stripslashes | DO NOT USE, preg_replace(array('/\x5C(?!\x5C)/u', '/\x5C\x5C/u'), array('', '\\'), $s) |
| stristr | > mb_stristr |
| strlen | > mb_strlen, OK unless you need byte length, e.g. to save a file, then use mb_strlen($s, 'latin1'); |
| strnatcasecmp | SUSPECT |
| strnatcmp | SUSPECT |
| strncasecmp | SUSPECT |
| strncmp | SUSPECT |
| strpbrk | SUSPECT, use preg |
| strpos | > mb_strpos |
| strrchr | SUSPECT, use mb_strrchr |
| strrev | DO NOT USE |
| strripos | > mb_strripos |
| strrpos | > mb_strrpos |
| strspn | DO NOT USE, use preg_match |
| strstr | > mb_strstr |
| strtok | DO NOT USE |
| strtolower | DO NOT USE. mb_strtolower fails on some cases when mb_convert_case($str, MB_CASE_LOWER, "UTF-8") does not |
| strtoupper | DO NOT USE. mb_strtoupper fails on some cases when mb_convert_case($str, MB_CASE_UPPER, "UTF-8") does not |
| strtr | DO NOT USE with 3-params. 2-param version ok with valid utf-8. |
| substr_compare | DO NOT USE |
| substr_count | > mb_substr_count, or preg_match_all? |
| substr_replace | DO NOT USE |
| substr | > mb_substr, see also mb_strcut & mb_strimwidth |
| trim | OK without a $charlist 2nd param. or preg_replace('/(^\s+)|(\s+$)/u', '', $s); |
| ucfirst | DO NOT USE |
| ucwords | DO NOT USE, mb_convert_case($str, MB_CASE_TITLE, "UTF-8") |
| vfprintf | DO NOT USE, http://www.php.net/manual/en/function.sprintf.php#89020 |
| vprintf | DO NOT USE, http://www.php.net/manual/en/function.sprintf.php#89020 |
| vsprintf | DO NOT USE, http://www.php.net/manual/en/function.sprintf.php#89020 |
| wordwrap | SUSPECT |
| urlencode | OK |
| rawurlencode | OK |
| urldecode | SUSPECT |
| rawurldecode | SUSPECT |
| utf8_encode | only use on ascii or 8859-1 |
| utf8_decode | ? |
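for two of the DO NOT USE rows that have no replacement listed (strrev and ucfirst), here is a sketch of utf-8-aware stand-ins. the names utf8_strrev() and utf8_ucfirst() are made up, and the code assumes valid utf-8 input. it works per code point, so combining accents will not stay attached to their base letter when reversed.

```php
<?php
// hypothetical stand-in for strrev(): reverse by character, not by byte
function utf8_strrev($s) {
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    return implode('', array_reverse($chars));
}

// hypothetical stand-in for ucfirst(): upper-case only the first character
function utf8_ucfirst($s) {
    $first = mb_substr($s, 0, 1, 'UTF-8');
    $rest  = mb_substr($s, 1, mb_strlen($s, 'UTF-8'), 'UTF-8');
    return mb_convert_case($first, MB_CASE_UPPER, 'UTF-8') . $rest;
}

$s = "a\xC3\xB1b";                        // "añb"
echo utf8_strrev($s), "\n";               // "bña"
echo utf8_ucfirst("\xC3\xB1u"), "\n";     // "Ñu"
```

plain strrev() on the same input reverses the two bytes of the ñ as well, which leaves a string that is no longer valid utf-8.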
Neophyte errors at Quabbin Reservoir Road Race
April 26, 2009
i was in the 4/5 35+ race. the pace was pretty strong and i’m glad there were downhill stretches between the ups. it’s 63 miles with very little flat. nice course with good quality surfaces and safe wide downhills.
by 25 miles in there were only about a dozen riders left in the group i was in. having kept close to the front, i was under the impression it was the lead group. at 35 miles i got a flat and pulled over to wait for the support vehicle. it never came.
eventually the support for the 4/5 open race drove by without acknowledging me. later the women came by and a vehicle stopped. an official said she had no support with her but took my number and said the wheel truck was only a minute behind. it too blew past me.
it seems that the error i made was to misconstrue the organizers’ promise of support, as stated in the flyer and then explained to us before the start of the race. i spoke to an official after the race and he explained that the support vehicle only supports the race leaders and vehicles aren’t supposed to help riders in other races.
so there must have been a break ahead of us that i was unaware of. though i rode near the front (i thought) until i flatted i didn’t see them go and i didn’t see the support vehicle pass. i guess it must have been a small number of riders in the lead group.
thus in a relentlessly hilly race like quabbin, in which the field necessarily gets strung out, it seems that when they say that support is provided, this has to be construed as meaning that no support is provided to 95% of the riders. unless confident of being in the money, you must assume that you’re on your own.
i wish i had known that in advance.
anyway, i chased the women’s support truck for 8 miles on a flat without catching it. i stopped to talk to the policeman at the turn in hardwick and asked if there was a way to contact the support crews. he said he had no idea and bemoaned that he had been completely unprepared, that nothing had been explained to him.
a back-marker from the 4/5 open race came past then and offered me co2. i remembered that i had sealant in my tires so i accepted and it worked. the tire stayed inflated to the finish. i’m very grateful for that. i rode on my own except for about the last 8 miles with one of the women from the group i passed.
my other error was: forgetting to get the 3-hour bottle of perpetuem out of the cooler box before going to the start line. with spending half an hour waiting for imagined support i was out of water with more than an hour of hot riding to go and very thirsty. 3 bottles was not enough. i was getting bonkers towards the end. i have only myself to blame for that dumb error.
astonishingly, the results put me 60th out of 70 starters and 67 finishers, 45 minutes behind the winner. i thought my ride was bad enough; i’d love to hear the stories of the 6 behind me.
Commenting on Philip Dawdy’s comments on bipolar II
April 12, 2009
Regular readers of Philip Dawdy’s excellent Furious Seasons web site will be familiar with his opinion of the DSM’s bipolar II diagnosis. In keeping with his idea of “a free market of ideas in the mental health world” I would like to contribute my opinions on this topic.
First, let me be clear: I admire Philip’s work on Furious Seasons, have supported his fundraisers, and hope he keeps at it.
The opinion that causes some controversy is succinctly put in his interview with Christopher Lane in Psychology Today.
Here’s the quote in full:
I may be the only writer in America who thinks BP2 is controversial and I can hardly think of any doctors who do. For me, it’s a questionable classification and something of a cop-out by the DSM writers for a couple of reasons: One, BP2 isn’t bipolar disorder, properly understood. There’s no mania, there’s no hospitalization for mania, and there’s no one running naked down the street. The most prominent features of BP2 are depression (and that covers the vast majority of a person’s time who is diagnosed with BP2) and bursts of energy, broadly understood. To me, that sounds a whole lot more like depression and agitation than it does manic-depression.
Two, the minute someone gets hit with a bipolar disorder diagnosis of any subtype, then they are faced with a profoundly bad set of social assumptions; they get stigmatized by friends and family; and they lose their jobs. I know of multiple cases along these lines, including one of a sheriff’s deputy in King County, Washington who was fired from her job as soon as the brass learned she had BP2, even though she had a stellar track record as a cop and had done nothing wrong on the job. That hardly seems fair when we’re talking about a disorder that doesn’t involve hallucinations or psychosis and has none of the off-the-charts impulsivity of true manic-depression. While it’s nice of researchers and mental-health advocates to claim that we’ve got to end this kind of stigma, in the real world that would take generations and by then people with BP2 today will have reached the ends of their natural lives.
Why BP2 wasn’t called something else is beyond me, but the diagnosis has sure caused a lot of unfair social damage.
I have a BP2 diagnosis, the comical history of which you can read here, and Philip’s description in the first paragraph doesn’t characterize my experience at all well. The reason I have a BP2 dx rather than BP is that I haven’t suffered “marked functional impairment” in any of my “hypomanic episodes”. If I had then DSM 4’s criteria would have me as BP.
Hospitalization is not a required criterion for diagnosis of mania or BP. Nor is running naked down the street. What I experienced included delusions (e.g. I once began planning to become Prime Minister), paranoia, demented spending (thankfully I had no lines of credit when the behavior was at its worst, when I was younger, or it would have been ruinous), crazy creativity with loss of my self-critical faculty, no sleep, ludicrous self-esteem and embarrassing incidents the memory of which makes me wince decades hence. This is a bit more than a “burst of energy, broadly understood”. And there is suspicion of genetic evidence: my father’s odd behavior and suicide smack of manic depression. I rather agree with my shrink that the criteria of mania and BP are met rather closely except that, because I never lost a job, got kicked out of school, got arrested or was hospitalized, it lacks “marked functional impairment”. In other words, I got away with it. Apparently that makes it BP2.
Nor is this behavior agitated depression. I have a lot of experience with that and it is entirely different. In agitated depression my mood is dysphoric, pessimistic and cynical but I can’t sleep, relax or let up with the negativity whereas in hypomania I am euphoric, self-confident, optimistic and at one with the world. There’s no way to confuse these states, in my experience.
On Philip’s second point, I don’t really disagree but the statement sounds a little sweeping. I’m sure some people have suffered negative and unfair social consequences but I’m not aware of any affecting me, at least not so far and certainly not within the first minute of diagnosis.
Whether or not a different name for this disorder would, on the whole, have been better for patients, I really don’t know. Would the social consequences for something called, say, Major Depression with Hypomania (with, as most new psychiatric disorders have, a three letter abbreviation, say MDH) be any better? I don’t find that very convincing but I honestly don’t know.
Moreover, I imagine there may be benefit to patients from the BP2 name. It seems clear from the reading I’ve done that it’s important to treat BP2 in basically the same way as bipolar, especially in regard to the dangers of antidepressants. I imagine that many (most?) physicians are aware of these concerns in bipolar. My own GP refused to prescribe an antidepressant because of his suspicion of bipolar. He sent me to a psychiatrist who refused to prescribe an antidepressant without first a robust mood stabilizer. It took two years to get that right before I was given the antidepressant. According to, for example, Husseini Manji, this is the safest approach. (He even prefers it in cases of MDD that are familial.)
If BP2 had instead a name that failed to make the association with bipolar, I wonder if some physicians, especially those who aren’t psychiatrists, might be less likely to recognize these risks. Given that most BP2 patients present with depression, the association with the bipolar word may spare them some risk.
Bontrager inForm RXL saddle review
April 10, 2009
Summary: I tried out a Bontrager inForm RXL saddle for two weeks and took it on two 70+ mile rides. It was ok on short rides but after about 40 miles it started to hurt. By the end of the two long rides I was hurt so bad I needed a couple of days to recover. The saddle also has a fairly slippery cover, which I found undesirable. I prefer a saddle that presents more resistance to lateral forces so I don’t slide around unexpectedly.
Background and requirements: I am 44 years old, male, with 40+ years cycling experience. I ride long distance events and recently started road racing. On my long distance comfort bike I usually ride a Brooks B17. It is generally comfy but puts too much pressure on the perineum when riding low on the drop or on aero bars. I can start to feel my family assets go numb after only about 100 miles on a B17. That’s ok if I’m in no hurry because I can sit up more but I’m planning on riding the Saratoga 24-hour time trial this July and would like to do 400 miles if I can. A B17 isn’t going to work for that. I need a saddle that will be comfortable for 24 hours with a lot of that spent low on the drops or aero bars.
My racing bike has a Specialized Toupé saddle that is pretty good but also not comfortable enough for long rides. After about 80 miles the tissue under my pubic arch (the bone cyclists sit on) gets sore. So I’m looking to solve that problem too.
I was interested in the Bontrager inForm because of their claim to have put some formal scientific study into the physiology and biomechanics relating to saddle design. I was also attracted by their 90-day trial period. I was measured and chose the RXL medium width. It was good as far as reducing pressure on the perineum was concerned. The problem, as with the Toupé, was with the tissue under the pubic arch. I became so sore after about 40 miles on both the longer rides that I found myself standing far too often just to relieve the pressure. The pain was present for a couple of days after both rides. It is a wonder that anyone could achieve such an uncomfortable saddle design. I returned it.
So I’m still looking for the right saddle. Fizik Airone has many followers, perhaps the Tri version. And I was recommended Selle Italia Flite Gel Flow and SLC Gel Flow. Any other ideas? Trial and error can get expensive in this game.