[one-users] Migration issue(s) from 3.7 to 4.2

Federico Zani federico.zani at roma2.infn.it
Mon Aug 26 05:53:14 PDT 2013


Hi Carlos,
    the problem is that I can't even get the XML of the VMs.
It seems to be related to how the XML in the "body" column (for both
hosts and VMs) of the database is structured.

Digging into the migration scripts, I solved the hosts problem by
adding the <vms> node (even without children) under the <host> tag of
the body column in the "host_pool" table, but for the VMs I still have
to find a solution.
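For the record, this is roughly what I did to patch the host bodies. A minimal stdlib-only sketch (the helper names add_missing_vms_node / patch_host_pool are my own, not from the migration scripts; it assumes host_pool has the usual oid and body columns — back up one.db first):

```python
import sqlite3
import xml.etree.ElementTree as ET

def add_missing_vms_node(body):
    """Insert an empty <VMS> element under <HOST> if it is absent."""
    root = ET.fromstring(body)
    if root.find("VMS") is None:
        root.append(ET.Element("VMS"))
    return ET.tostring(root, encoding="unicode")

def patch_host_pool(db_path):
    """Apply the fix to every row of host_pool (back up the DB first!)."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT oid, body FROM host_pool").fetchall()
    for oid, body in rows:
        conn.execute("UPDATE host_pool SET body = ? WHERE oid = ?",
                     (add_missing_vms_node(body), oid))
    conn.commit()
    conn.close()
```

The fix is idempotent, so re-running it on an already-patched body is harmless.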

Now, with host access working again, I'm able to submit and control new
VM instances, but I have dozens of running VMs that I can't even
destroy (not even with the force switch turned on).

This is the XML of one of my hosts, as returned by onehost show -x
(sensitive names are redacted with the "[...]" string):

<HOST>
   <ID>15</ID>
   <NAME>[...]</NAME>
   <STATE>2</STATE>
   <IM_MAD>im_kvm</IM_MAD>
   <VM_MAD>vmm_kvm</VM_MAD>
   <VN_MAD>dummy</VN_MAD>
   <LAST_MON_TIME>1377520947</LAST_MON_TIME>
   <CLUSTER_ID>101</CLUSTER_ID>
   <CLUSTER>[...]</CLUSTER>
   <HOST_SHARE>
     <DISK_USAGE>0</DISK_USAGE>
     <MEM_USAGE>20971520</MEM_USAGE>
     <CPU_USAGE>1800</CPU_USAGE>
     <MAX_DISK>0</MAX_DISK>
     <MAX_MEM>24596936</MAX_MEM>
     <MAX_CPU>2400</MAX_CPU>
     <FREE_DISK>0</FREE_DISK>
     <FREE_MEM>5558100</FREE_MEM>
     <FREE_CPU>2323</FREE_CPU>
     <USED_DISK>0</USED_DISK>
     <USED_MEM>19038836</USED_MEM>
     <USED_CPU>76</USED_CPU>
     <RUNNING_VMS>6</RUNNING_VMS>
   </HOST_SHARE>
   <VMS>
     <ID>326</ID>
   </VMS>
   <TEMPLATE>
     <ARCH><![CDATA[x86_64]]></ARCH>
     <CPUSPEED><![CDATA[1600]]></CPUSPEED>
     <FREECPU><![CDATA[2323.2]]></FREECPU>
     <FREEMEMORY><![CDATA[5558100]]></FREEMEMORY>
     <HOSTNAME><![CDATA[[...]]]></HOSTNAME>
     <HYPERVISOR><![CDATA[kvm]]></HYPERVISOR>
     <MODELNAME><![CDATA[Intel(R) Xeon(R) CPU E5645  @ 2.40GHz]]></MODELNAME>
     <NETRX><![CDATA[16007208117863]]></NETRX>
     <NETTX><![CDATA[1185926401588]]></NETTX>
     <TOTALCPU><![CDATA[2400]]></TOTALCPU>
     <TOTALMEMORY><![CDATA[24596936]]></TOTALMEMORY>
     <TOTAL_ZOMBIES><![CDATA[5]]></TOTAL_ZOMBIES>
     <USEDCPU><![CDATA[76.8000000000002]]></USEDCPU>
     <USEDMEMORY><![CDATA[19038836]]></USEDMEMORY>
     <ZOMBIES><![CDATA[one-324, one-283, one-314, one-317, one-304]]></ZOMBIES>
   </TEMPLATE>
</HOST>

As you can see, every host now reports its connected VMs as "zombies",
probably because it can't query them.

I'm also sending you the XML contained in the "body" column of the
vm_pool table for a VM I can't query with onevm show:

<VM>
    <ID>324</ID>
    <UID>0</UID>
    <GID>0</GID>
    <UNAME>oneadmin</UNAME>
    <GNAME>oneadmin</GNAME>
    <NAME>[...]</NAME>
    <PERMISSIONS>
       <OWNER_U>1</OWNER_U>
       <OWNER_M>1</OWNER_M>
       <OWNER_A>0</OWNER_A>
       <GROUP_U>0</GROUP_U>
       <GROUP_M>0</GROUP_M>
       <GROUP_A>0</GROUP_A>
       <OTHER_U>0</OTHER_U>
       <OTHER_M>0</OTHER_M>
       <OTHER_A>0</OTHER_A>
    </PERMISSIONS>
    <LAST_POLL>1375778872</LAST_POLL>
    <STATE>3</STATE>
    <LCM_STATE>3</LCM_STATE>
    <RESCHED>0</RESCHED>
    <STIME>1375457045</STIME>
    <ETIME>0</ETIME>
    <DEPLOY_ID>one-324</DEPLOY_ID>
    <MEMORY>4194304</MEMORY>
    <CPU>9</CPU>
    <NET_TX>432290511</NET_TX>
    <NET_RX>2072231827</NET_RX>
    <TEMPLATE>
       <CONTEXT>
          <ETH0_DNS><![CDATA[[...]]]></ETH0_DNS>
          <ETH0_GATEWAY><![CDATA[[...]]]></ETH0_GATEWAY>
          <ETH0_IP><![CDATA[[...]]]></ETH0_IP>
          <ETH0_MASK><![CDATA[[...]]]></ETH0_MASK>
          <FILES><![CDATA[[...]]]></FILES>
          <HOSTNAME><![CDATA[[...]]]></HOSTNAME>
          <TARGET><![CDATA[hdb]]></TARGET>
       </CONTEXT>
       <CPU><![CDATA[4]]></CPU>
       <DISK>
          <CLONE><![CDATA[YES]]></CLONE>
          <CLUSTER_ID><![CDATA[101]]></CLUSTER_ID>
          <DATASTORE><![CDATA[nonshared_ds]]></DATASTORE>
          <DATASTORE_ID><![CDATA[101]]></DATASTORE_ID>
          <DEV_PREFIX><![CDATA[hd]]></DEV_PREFIX>
          <DISK_ID><![CDATA[0]]></DISK_ID>
          <IMAGE><![CDATA[[...]]]></IMAGE>
          <IMAGE_ID><![CDATA[119]]></IMAGE_ID>
          <IMAGE_UNAME><![CDATA[oneadmin]]></IMAGE_UNAME>
          <READONLY><![CDATA[NO]]></READONLY>
          <SAVE><![CDATA[NO]]></SAVE>
          <SOURCE><![CDATA[/var/lib/one/datastores/101/3860dfcd1bec39ce672ba855564b44ca]]></SOURCE>
          <TARGET><![CDATA[hda]]></TARGET>
          <TM_MAD><![CDATA[ssh]]></TM_MAD>
          <TYPE><![CDATA[FILE]]></TYPE>
       </DISK>
       <DISK>
          <DEV_PREFIX><![CDATA[hd]]></DEV_PREFIX>
          <DISK_ID><![CDATA[1]]></DISK_ID>
          <FORMAT><![CDATA[ext3]]></FORMAT>
          <SIZE><![CDATA[26000]]></SIZE>
          <TARGET><![CDATA[hdc]]></TARGET>
          <TYPE><![CDATA[fs]]></TYPE>
       </DISK>
       <DISK>
          <DEV_PREFIX><![CDATA[hd]]></DEV_PREFIX>
          <DISK_ID><![CDATA[2]]></DISK_ID>
          <SIZE><![CDATA[8192]]></SIZE>
          <TARGET><![CDATA[hdd]]></TARGET>
          <TYPE><![CDATA[swap]]></TYPE>
       </DISK>
       <FEATURES>
          <ACPI><![CDATA[yes]]></ACPI>
       </FEATURES>
       <GRAPHICS>
          <KEYMAP><![CDATA[it]]></KEYMAP>
          <LISTEN><![CDATA[0.0.0.0]]></LISTEN>
          <PORT><![CDATA[6224]]></PORT>
          <TYPE><![CDATA[vnc]]></TYPE>
       </GRAPHICS>
       <MEMORY><![CDATA[4096]]></MEMORY>
       <NAME><![CDATA[[...]]]></NAME>
       <NIC>
          <BRIDGE><![CDATA[br1]]></BRIDGE>
          <CLUSTER_ID><![CDATA[101]]></CLUSTER_ID>
          <IP><![CDATA[[...]]]></IP>
          <MAC><![CDATA[02:00:c0:a8:1e:02]]></MAC>
          <MODEL><![CDATA[virtio]]></MODEL>
          <NETWORK><![CDATA[[...]]]></NETWORK>
          <NETWORK_ID><![CDATA[9]]></NETWORK_ID>
          <NETWORK_UNAME><![CDATA[oneadmin]]></NETWORK_UNAME>
          <VLAN><![CDATA[NO]]></VLAN>
       </NIC>
       <OS>
          <ARCH><![CDATA[x86_64]]></ARCH>
          <BOOT><![CDATA[hd]]></BOOT>
       </OS>
       <RAW>
          <TYPE><![CDATA[kvm]]></TYPE>
       </RAW>
       <REQUIREMENTS><![CDATA[CLUSTER_ID = 101]]></REQUIREMENTS>
       <TEMPLATE_ID><![CDATA[38]]></TEMPLATE_ID>
       <VCPU><![CDATA[4]]></VCPU>
       <VMID><![CDATA[324]]></VMID>
    </TEMPLATE>
    <HISTORY_RECORDS>
       <HISTORY>
          <OID>324</OID>
          <SEQ>0</SEQ>
          <HOSTNAME>[...]</HOSTNAME>
          <HID>15</HID>
          <STIME>1375457063</STIME>
          <ETIME>0</ETIME>
          <VMMMAD>vmm_kvm</VMMMAD>
          <VNMMAD>dummy</VNMMAD>
          <TMMAD>ssh</TMMAD>
          <DS_LOCATION>/var/datastore</DS_LOCATION>
          <DS_ID>102</DS_ID>
          <PSTIME>1375457063</PSTIME>
          <PETIME>1375457263</PETIME>
          <RSTIME>1375457263</RSTIME>
          <RETIME>0</RETIME>
          <ESTIME>0</ESTIME>
          <EETIME>0</EETIME>
          <REASON>0</REASON>
       </HISTORY>
    </HISTORY_RECORDS>
</VM>

I think it would be a great help to have the updated XSD files for all
the body columns in the database: I'd be able to validate the XML
structure of all the tables and highlight migration problems.
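In the meantime, since I don't have the XSDs, I can at least flag rows whose body isn't well-formed XML. A rough stdlib-only sketch (check_bodies is a hypothetical helper of mine; it assumes each pool table has oid and body columns, as the SQL errors in the log suggest):

```python
import sqlite3
import xml.etree.ElementTree as ET

def check_bodies(db_path, pools=("host_pool", "vm_pool")):
    """Return (table, oid, error) for every row whose body is not
    well-formed XML. Full XSD validation would need the schemas."""
    conn = sqlite3.connect(db_path)
    bad = []
    for table in pools:
        for oid, body in conn.execute(
                "SELECT oid, body FROM %s" % table):
            try:
                ET.fromstring(body)
            except ET.ParseError as err:
                bad.append((table, oid, str(err)))
    conn.close()
    return bad
```

This only catches parse errors, not structural mismatches like the missing <VMS> node, but it narrows down which rows to inspect by hand.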

Thanks! :)

F.


Il 21/08/2013 12:13, Carlos Martín Sánchez ha scritto:
> Hi,
>
> Could you send us the xml of some of the failing vms and hosts? You 
> can get it with the -x flag in onevm/host list.
>
> Send them off-list if you prefer.
>
> Regards
>
> --
> Join us at OpenNebulaConf2013 <http://opennebulaconf.com> in Berlin, 
> 24-26 September, 2013
> -- 
> Carlos Martín, MSc
> Project Engineer
> OpenNebula - The Open-source Solution for Data Center Virtualization
> www.OpenNebula.org <http://www.OpenNebula.org> | 
> cmartin at opennebula.org <mailto:cmartin at opennebula.org> | @OpenNebula 
> <http://twitter.com/opennebula>
>
>
> On Thu, Aug 8, 2013 at 11:29 AM, Federico Zani 
> <federico.zani at roma2.infn.it <mailto:federico.zani at roma2.infn.it>> wrote:
>
>     Hi,
>       I am experiencing some issues after the upgrade from 3.7 to 4.2
>     (frontend on CentOS 6.4 and KVM hosts); this is what I did:
>
>      - Stopped one and sunstone and backed up /etc/one
>      - yum localinstall opennebula-4.2.0-1.x86_64.rpm
>     opennebula-java-4.2.0-1.x86_64.rpm
>     opennebula-ruby-4.2.0-1.x86_64.rpm
>     opennebula-server-4.2.0-1.x86_64.rpm
>     opennebula-sunstone-4.2.0-1.x86_64.rpm
>      - duplicated im and vmm for kvm mads as specified here
>     http://opennebula.org/documentation:archives:rel4.0:upgrade#driver_names
>
>      - checked for other mismatches in one.conf, but found nothing to
>     be fixed
>      - onedb upgrade -v --sqlite /var/lib/one/one.db (no errors, just
>     a few warnings about manual fixes needed - which I did)
>      - moved vm description files from /var/lib/one/[0-9]* to
>     /var/lib/one/vms/
>
>     Then I tried to fsck the sqlite db but got the following error :
>     --------------
>     onedb fsck -f -v -s /var/lib/one/one.db
>     Version read:
>     4.2.0 : Database migrated from 3.7.80 to 4.2.0 (OpenNebula 4.2.0)
>     by onedb command.
>
>     Sqlite database backup stored in /var/lib/one/one.db.bck
>     Use 'onedb restore' or copy the file back to restore the DB.
>       > Running fsck
>
>     Datastore 0 is missing fom Cluster 101 datastore id list
>     Image 127 is missing fom Datastore 101 image id list
>     undefined method `elements' for nil:NilClass
>     Error running fsck version 4.2.0
>     The database will be restored
>     Sqlite database backup restored in /var/lib/one/one.db
>     -----------
>
>     I also tried to reinstall ruby gems with
>     /usr/share/one/install_gems but still got the same issue.
>
>     After some searching, I tried to start one and sunstone-server
>     anyway, and this is the result:
>      - I can do "onevm list" and "onehost list" correctly
>      - When I do a "onevm show" on a terminated vm it shows me the
>     correct information
>      - When I do a "onevm show" (on a running vm) or "onehost show",
>     it returns a "[VirtualMachineInfo] Error getting virtual machine
>     [312]." or either "[HostInfo] Error getting host [30]."
>
>     In the log file (/var/log/oned.log) I can see the following
>     errors, when issuing those commands :
>     ----------
>     Tue Aug  6 12:49:40 2013 [ONE][E]: SQL command was: SELECT body
>     FROM host_pool WHERE oid = 30, error: callback requested query abort
>     Tue Aug  6 12:49:40 2013 [ONE][E]: SQL command was: SELECT body
>     FROM vm_pool WHERE oid = 312, error: callback requested query abort
>     ------------
>
>     I am still able to see datastore information and the overall
>     status of my private cloud through the Sunstone dashboard, but it
>     seems I cannot access information related to running vms and
>     hosts: this leads to an unusable private cloud (can't stop vms,
>     can't run new ones, etc.)
>
>     Any clues?
>
>     Federico.
>
>     _______________________________________________
>     Users mailing list
>     Users at lists.opennebula.org <mailto:Users at lists.opennebula.org>
>     http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>
>


