Hi guys, Christ is risen!
I have some problems with a PHP script which is supposed to store Unicode text in a database.
The most important problem is described in this project's title:
An article's text is first extracted from a Greek-language site, but when it is stored in the database it comes out as some sort of extended Latin characters instead of Greek. I have tried SET NAMES 'utf8' and SET CHARACTER SET 'utf8', but when I use either of those, almost all of the text gets truncated.
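My guess, and it is only a guess, is that the site does not serve UTF-8 at all, so the bytes I fetch are not valid UTF-8 when they reach MySQL. That would explain both symptoms: without SET NAMES the bytes get read as Latin, and with it the text is cut at the first invalid byte. Below is a rough sketch of what I think the fix looks like. The source charset ISO-8859-7, the helper names toUtf8 and storeArticle, and the table articles with columns url and body are all assumptions of mine, not taken from the attached script.

```php
<?php
// Hypothetical helper: make sure fetched text is UTF-8 before it goes to the database.
// ASSUMPTION: the source site uses ISO-8859-7 (Greek); the real value should be read
// from the page's Content-Type header or its <meta> charset declaration.
function toUtf8(string $text, string $sourceCharset = 'ISO-8859-7'): string
{
    // If the bytes already form valid UTF-8, return them unchanged
    // so the text is not encoded twice.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $sourceCharset);
}

// Hypothetical storage step. ASSUMPTION: a table `articles` with text columns
// `url` and `body` that are themselves declared with a utf8/utf8mb4 charset.
function storeArticle(mysqli $db, string $url, string $body): bool
{
    // Tell the connection that the client sends UTF-8 (preferred over a raw SET NAMES query).
    $db->set_charset('utf8mb4');

    // A prepared statement avoids quoting problems with article text.
    $stmt = $db->prepare('INSERT INTO articles (url, body) VALUES (?, ?)');
    if ($stmt === false) {
        return false;
    }
    $utf8Body = toUtf8($body);
    $stmt->bind_param('ss', $url, $utf8Body);
    return $stmt->execute();
}
```

If the site turns out to use a different charset (windows-1253, for example), only the second argument of toUtf8 should need to change.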
I'm attaching the script.
And here are some URLs that the script can use as input:
[login to view URL]
[login to view URL]
[login to view URL]
[login to view URL]
[login to view URL]
[login to view URL]
[login to view URL]
[login to view URL]
[login to view URL]
[login to view URL]
[login to view URL]
[login to view URL]
Now there are three more things I would like to have from this project:
1) The article at the last link contains a photograph. Can the extraction page also extract and store such photographs together with the article?
2) The source code of the original site contains a span tag with the attribute class='headline'. Can the enclosed text be retrieved and stored in a new field of the database table as the title of the article?
3) The time-stamp does not need to be stored in the database.
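To make points 1 and 2 more concrete, here is a minimal sketch of the kind of extraction I have in mind, using PHP's DOM extension. The function name extractHeadlineAndImages is hypothetical, and the sketch assumes the HTML has already been converted to UTF-8. It only collects the image URLs as they appear in the page; downloading the files and resolving relative URLs against the article address would still have to be added.

```php
<?php
// Hypothetical helper: pull the headline text and image URLs out of an article page.
// ASSUMPTION: $utf8Html is already UTF-8 encoded.
function extractHeadlineAndImages(string $utf8Html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely well formed; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    // The XML prefix is a common workaround that makes libxml treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $utf8Html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // First <span> whose class list contains the word "headline".
    $query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' headline ')]";
    $node = $xpath->query($query)->item(0);
    $title = $node !== null ? trim($node->textContent) : null;

    // Every <img> that has a src attribute, in document order.
    $images = [];
    foreach ($xpath->query('//img[@src]') as $img) {
        $images[] = $img->getAttribute('src');
    }

    return ['title' => $title, 'images' => $images];
}
```

The title could then go into the new field of the table, and each image URL could be fetched and stored next to the article.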
Thanks for looking.