Valid UTF-8 data (hex:) followed by invalid UTF-8 sequence
OK, this one is a bit geeked out again, but it’s relevant to China. If you’re an American, you could probably go your entire life without ever bumping into codepages, but if your life crosses paths with Asia, you almost certainly will…
As we’re developing a new website and doing our Subversion (version control system) check-ins, I started bumping into a very unusual error.
ryan@116843:/spike/public/news/app/webroot/redv1.0/img/menu$ sudo svn up
svn: Valid UTF-8 data (hex:) followed by invalid UTF-8 sequence (hex: b8 b4 bc fe)
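For anyone puzzling over the wording of that message, here is a rough Python illustration of what it appears to describe: the bytes that decode cleanly as UTF-8, followed by the bytes that do not. This is only a sketch of the idea, not Subversion's actual implementation, and `split_at_invalid_utf8` is a made-up helper name.

```python
def split_at_invalid_utf8(raw):
    """Return (valid_prefix, rest), where rest starts at the first byte that
    cannot be decoded as UTF-8. rest is empty if all of raw is valid."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return raw[:err.start], raw[err.start:]
    return raw, b""


def hexdump(data):
    """Format bytes as space-separated hex pairs, e.g. 'b8 b4'."""
    return " ".join("%02x" % byte for byte in data)


# The four bytes from the error above, followed by the readable part of the name.
name = bytes.fromhex("b8b4bcfe") + b" logo.jpg"
valid, rest = split_at_invalid_utf8(name)
print("valid (hex: %s) invalid (hex: %s)" % (hexdump(valid), hexdump(rest[:4])))
```

In this case the valid part is empty because the very first byte of the filename is already not a legal start of a UTF-8 character, which would explain the odd-looking "(hex:)" in the message.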
Unfortunately, Google didn’t come up with much. The best hit was an October 10th post on the Subversion users mailing list. Basically, the answer is that there’s no answer.
Well, I did an svn up in each child directory of the one causing the problem and eventually tracked the error down through my project’s directory tree. It looks like one of the guys using a Windows system copied a JPEG with a GBK-encoded Chinese filename onto the server. Everything is best kept in UTF-8.
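Instead of running svn up directory by directory, the hunt can be automated. Below is a minimal Python sketch that walks a tree and reports every file or directory name that is not valid UTF-8. The function name `find_non_utf8_names` is my own invention, and the sketch assumes a filesystem (such as those typical on Linux) that stores names as raw bytes.

```python
import os


def find_non_utf8_names(root):
    """Yield the raw byte path of every file or directory under root whose
    name is not valid UTF-8.

    root must be a bytes path so that os.walk hands back names as raw bytes
    instead of decoding them with the current locale.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                yield os.path.join(dirpath, name)
```

Calling `for path in find_non_utf8_names(b"."): print(repr(path))` from the top of the working copy prints each offending path with the bad bytes shown as \x.. escapes, which can be compared against the hex in the svn error.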
Once you’ve found the right file, you have to figure out how to delete a file whose name can’t be typed…
ryan@116843:/spike/public/news/app/webroot/redv1.0/img/menu$ ls
logo02.jpg       ???? logo.jpg  menu_acc_down.jpg      menu_home_down.jpg  menu_work_down.jpg
logo03.jpg       logo.jpg       menu_acc.jpg           menu_home.jpg       menu_work.jpg
logo04.jpg       logo_top1.jpg  menu_cameras_down.jpg  menu_len_down.jpg
logo05.jpg       logo_top2.jpg  menu_cameras.jpg       menu_len.jpg
logo06.jpg       logo_top3.jpg  menu_gall_down.jpg     menu_tech_down.jpg
logo_bottom.jpg  logo_top.jpg   menu_gall.jpg          menu_tech.jpg
In this case, I just used:
rm *\ logo.jpg
since there was only one file matching this pattern… Next, I could commit again!
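If the file is worth keeping, an alternative to deleting it is to rename it so the name is stored as UTF-8. The sketch below assumes the name really is GBK, as guessed above; `reencode_name` is a hypothetical helper, not part of any tool. If I read the bytes correctly, b8 b4 bc fe in GBK is 复件, the "Copy of" prefix that Chinese-language Windows puts on duplicated files.

```python
import os


def reencode_name(directory, raw_name, source_encoding="gbk"):
    """Rename directory/raw_name so that its name is stored as UTF-8.

    directory and raw_name are bytes. raw_name is assumed to be text in
    source_encoding; if that guess is wrong, decoding raises
    UnicodeDecodeError and nothing is renamed.
    Returns the new name as UTF-8 bytes.
    """
    new_name = raw_name.decode(source_encoding).encode("utf-8")
    os.rename(os.path.join(directory, raw_name),
              os.path.join(directory, new_name))
    return new_name
```

For the file above, the call would be `reencode_name(b".", bytes.fromhex("b8b4bcfe") + b" logo.jpg")`. To delete rather than rename, `os.unlink` accepts the same kind of bytes path, so the untypeable name never has to go through the shell.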
ryan@116843:/spike$ sudo svn up
D    public/.htaccess
Updated to revision 38.
Thanks, that was just my problem!
(No other results suggested filename issues.)
I wrote a tiny script to enumerate through the directory, outputting each path and then running ‘svn status’ on it to find the culprit. (I found no way to get svn to output which folder it was about to try before doing it, so the script shows the path before the ‘helpful’ error message appears.)
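The commenter's script was not posted, so the following is only a guess at its general shape, written in Python. It assumes that svn exits with a non-zero status when it hits this error and that every directory visited belongs to the working copy; `svn_status_ok` and `first_failing_directory` are names made up for this sketch.

```python
import os
import subprocess


def svn_status_ok(path):
    """Run 'svn status' on path and return True if svn exited with status 0.
    (Assumption: the UTF-8 error makes svn exit with a non-zero status.)"""
    result = subprocess.run(
        ["svn", "status", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failing_directory(root, check=svn_status_ok):
    """Visit directories deepest-first and return the first one for which
    check() fails, or None if every directory passes."""
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if ".svn" in dirpath.split(os.sep):
            continue  # skip Subversion's own metadata directories
        # Print the path first so it is on screen before any svn error.
        # ascii() keeps undecodable bytes in the path from breaking print().
        print(ascii(dirpath))
        if not check(dirpath):
            return dirpath
    return None
```

Walking deepest-first matters because 'svn status' recurses: starting from the top would fail immediately without narrowing anything down, while starting from the leaves means the first failure is the directory that actually holds the bad name.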
Thanks! I also ran into this problem, and could not see anyone coming up with a solution. I actually thought the problem lay within the files, so that deleting the ones making trouble would fix it. Good thing I spotted your blog first.
Thanks for this post. You saved me quite a bit of time.
Thanks, you saved me a bit of time. You can run into problems importing invalid UTF-8 too: I was importing a WordPress sitemap plugin that gave me problems. The last filename printed before the error was the folder containing the invalid files.
Hope this helps someone.
I’ve just had this problem. I forced UTF-8 encoding on all the files I had last changed, and it works!
Too bad you’ve got a lot of spam on here, but the answer here was perfect. Thanks.
I’ve just gone through and cleaned up some of the remaining spam, but I must admit that I have no clue where some of it comes from… To submit a comment you should be required to fill out the reCAPTCHA field, but it seems that some spammers have found a way around this. Since installing reCAPTCHA the spam has slowed down dramatically, but it does still arrive and the rate does seem to be increasing again.
Maybe there’s a backdoor in reCAPTCHA?
I ran into the same error; however, I cannot find the file causing it. I now even have the same error on all my repositories, including the ones that were not affected before. I still don’t have a clue how to recover my repositories. I already removed the offending project from the affected repository using svndumpfilter, with no effect. Is there a way to rebuild the repository from the dump and identify the offending directory or file?
Regards, Eric
@Eric-
Your best bet is to go through each of the subdirectories of your project and do an “svn up” on the directory one at a time until you find the subdirectory(s) containing the file(s) that have an incompatible encoding.
Another useful tool is the Unix “file” command. Just run “file *” in the directory that you locate with the incompatible encoding and search for the file that comes back to you with an encoding type other than UTF-8.
Best of luck-
-R
Hi Ryan,
I did the exercise as you suggested, but I don’t see any UTF-8 encoded types. The file * command gives me “ASCII C++ program text, with CRLF line terminators”, “XML document text”, etc., but no UTF-8. Besides, when I ran into this trouble I removed from my repository the latest directory that had triggered the error; my current repository contains only the parts that were there before the problem popped up, yet the problem still exists. It even pops up on repositories that were never touched. I switched to another svn client, tried the command line, and even created a new repository to test; all of them give me the same error. Now I’m totally puzzled and stuck with a messed-up Subversion installation.
This is the part of the TortoiseSVN log where the error popped up for the first time.
Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\inc
Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\inc\IBugTraqProvider.idl
Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\inc\IBugTraqProvider_i.c
Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\inc\IBugTraqProvider_h.h
Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\inc\Interop.BugTraqProvider.dll
Adding : C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin\issue-tracker-plugins.txt
Error : Valid UTF-8 data
Error : (hex:)
Error : followed by invalid UTF-8 sequence
Error : (hex: c0 a4 01)
Finished! : 52 kBytes transferred in 0 minute(s) and 2 second(s)
I removed the directories and files added here using svnadmin dump and svndumpfilter; however, the error still persists in the new repository (I removed most of the revisions that handled the above actions, but not all of them could be removed). What I cannot understand is how this can affect a different repository.
@Eric-
I understand your frustration buddy. Just trying to help. You’re not looking for files that ARE UTF-8. You’re looking for files that ARE NOT UTF-8.
In your case, just run:
cd C:\Projects\Visual Studio 2005\Projects\TortoiseRedminePlugin
svn up inc
svn up …
(where “…” is another folder under TortoiseRedminePlugin)
You’ll notice that this will run successfully for most of your files, but there will be some that don’t complete successfully. When you locate the folder where the error starts, then you run “svn up” one file at a time until you find the individual file(s) that are causing the problem. Delete those files from the repository and you’ll be all fixed up.
As for the UTF-8 issue… Again – the key is to find files that are in encodings OTHER THAN UTF-8. You’ve got such a file – it’s just a matter of finding it.
Hi Ryan,
Sorry if I sounded offensive; it’s Subversion that’s frustrating me, giving me no decent clue where to look. I appreciate your help very much. Thanks a lot! I will go through my files again, but what about the project I removed from the repository? Should I add it back so I can go through all the files in there?
Finally I rebuilt Subversion with a patch that showed me the directory that was causing the trouble (found the patch at: http://www.nabble.com/-PATCH–Issue–2748:-non-UTF-8-filenames-in-the-repository-td19531299.html); however, it hasn’t helped me yet. The error I get now is:
Adding D:\Projects\t\TextDocument.txt
Error Error converting entry in directory
Error ‘/dtm/home/svn/svn/t/db/transactions/0-0.txn’ to UTF8
Error Valid UTF-8 data
Error (hex:)
Error followed by an invalid UTF-8 sequence
Error (hex: c0 a4 01)
Now there are at least 2 things I don’t get:
1: The path in the error points to a Subversion repository transaction file, not the project file I’m trying to import.
2: This is a new repository; I deleted all my old repositories and did a fresh import of a single text file edited with vi.
And I still get the same error. I’m totally puzzled. If you have any clue, please…
Thanks, Eric
Yes, filenames with non-Latin characters cause the problem. Deleting them fixes it.
Running strace svn status will give you the name of the offending file. Unfortunately, svn cares about the names of all files that are in one of its directories, even if they are not under revision control.
Evdsande-
For some reason I never got notified about this comment.
The issue as reported above applies certainly to Mac OS X, Linux and other Unix systems, but I haven’t used Windows for more than a few minutes in the entire 21st Century, so I’m of limited help here.
The error does look the same though, so it is most likely one of the files under your “TortoiseRedminePlugin” folder. I would go through the sub-folders one at a time (first to “inc”) and then to the other sub-folders and do the commit one by one.
One of the files should have the encoding issue as described. Delete that file and the commit should proceed properly.
Best-
-R
Great post, I found it via Google. My problem was almost identical. An image got copied into a checked out tree, and svn update began failing due to a wacky character in the image name.
Thanks for taking the time to post this! It’s always a great feeling to find a solution to a strange issue that actually works.
This also happens for files that are not part of SVN!
My program spits out a log file on each run, and an error in the filename string made a bunch of log files with garbled names. I got the same error as the others (valid UTF-8 followed by invalid UTF-8), even though none of these log files were ever checked into the repo!
Just joining in with the thanks!
Top result on Google for svn “followed by invalid UTF-8 sequence”, you should be proud
Thanks for this post; it saved me some time. I just want to add my two cents. Sometimes finding the exact file that’s causing the problem is tough. I have an images directory with 2k files, and one of them has this problem.
So this told me the directory was images/thumbnails. To find which file, I did:
So this told me the filename starts with Nokia-5530-
Hope this helps
Carlos